
An Interview with New York Times CEO Meredith Kopit Levien About Betting on Humans With Expertise

Good morning,

This week's Stratechery Interview is with New York Times Company CEO Meredith Kopit Levien. Levien became CEO in 2020, after previously serving as Chief Operating Officer, Chief Revenue Officer, and Head of Advertising. I previously interviewed Kopit Levien in August 2022.

The New York Times editorial team always elicits strong reactions, both in the political realm and also in tech, but that's not what this interview is about; what is indisputable is that the New York Times as a business is both incredibly interesting and incredibly successful. Over the last decade the newspaper has gone from strength to strength, building a thriving subscription business, expanding its bundle from news to Games to Sports to Cooking and more, and now — to take things full circle — has a rapidly growing advertising business. We discuss all of that in this interview, starting with the Games and Sports categories, why the bundle is about expanding the New York Times brand, and the company's recent push into vertical video. Then we discuss what it means to be a destination site, while also using Aggregators to acquire customers. We spend time on AI, including the New York Times lawsuit against OpenAI, why Kopit Levien sees humans as the moat against AI content, and how the company is using AI on both the business and editorial sides. Finally we discuss the potential for building communities, why advertising is working, and how surviving in an Aggregator and AI world is about fighting entropy.

As a reminder, all Stratechery content, including interviews, is available as a podcast; click the link at the top of this email to add Stratechery to your podcast player.

On to the Interview. This interview is lightly edited for clarity.

Meredith Kopit Levien, welcome back to Stratechery.

MKL: Hi Ben, thanks for having me, so happy to be here.

It's hard to believe, but it has been four-and-a-half years since you last came on — I was thinking two or three years ago — nope, it's almost half a decade. I was actually shocked that I've been doing interviews for that long, but apparently I've been doing them for like six, six-and-a-half years.

MKL: You have, and I've listened to a lot of them!

I appreciate it. Well, we already did the whole background conversation then, we both worked for the student newspaper, lots of commonality there. So let's fast forward to the time of that interview. It was August 2022, and speaking of mind-blowing lengths of time, you had bought Wordle earlier that year (it's hard to believe it's been that long), and you had just purchased The Athletic. How do you feel about those acquisitions five years on?

MKL: That's such a fun place to start. We acquired both of them, if I remember correctly, within a week of each other, and I would say we feel great about both of them and both of them have exceeded our expectations in so many ways.

Is Wordle the greatest media acquisition of all time?

MKL: You know what I tell people? That New York Times Games is the most up-and-to-the-right thing I've experienced in my career in terms of just people's attention to it and the way it kind of touched culture and still touches culture every day, and the ability for Wordle to be like a megaphone for these other incredible games that we already had that most people didn't know about. And then what's so amazing to me is we now have, I think, 11 games — half of them are free, half of them are paid games — and tens of millions of people play our games every day.
MKL: And we have made the vast majority, we've made those games. And before Wordle and after Wordle, Wordle in and of itself is extraordinary, but before and after, we've made other extraordinary games, it's so awesome.

Is it a bit annoying that everyone thinks about Wordle, "Oh, you bought Wordle", and you're like, "Look, we made most of these, give us more credit here!"?

MKL: Listen, credit to Josh Wardle, it's an awesome game, and it just touched culture like nothing else. But it has served us so brilliantly — it has just shined this huge light on all these other games and it's given us a chance to prove our chops as a game studio and we just keep making hits. I am so proud of our games team, Jonathan Knight and the whole team around him, they have done such good work and they are still hard, hard at it, that team works so hard. I'm a Connections player, so Wyna Liu is my hero, but they're all amazing and they put out really good work. Games, it's going swimmingly, I hope we get to talk even more about it.

As long as we're here, how has your — because we were talking a bit about how Wordle sort of came out of the blue — it was this game that popped up, you snapped it up, super smart — and we were talking in our interview about it being an in-point to the New York Times broadly.

MKL: Yeah.

Has that evolved as you expected or has it evolved in different ways? In the context of not just Games being a property but also it tying into the whole thing.

MKL: What a great question. To answer that, let me step back for a minute and say our strategy is for the whole of the New York Times and all the different parts of the portfolio to be an essential subscription for curious people everywhere who want to understand the world and make the most of their lives. We've got three pillars to that: 1) be, and become even more every day, the world's best news destination; 2) have these leading lifestyle products, including Games, but also Sports, Recipes, shopping advice, that really help people do their passion more deeply or better or enjoy it even more; and 3) put those two things together, news and the lifestyle products, in an interconnected experience so that the New York Times is incredibly relevant to you every single day, whatever is going on in the world or your world.

Right. This is a point you made before: you wanted the New York Times to not just be — sometimes the news is slow, or sometimes stuff's happening you don't care about, and you wanted to have other stuff for people along the way.

MKL: Listen, I want to be really clear. We are first and foremost a high quality independent news journalism company, that is our mission, it is the most value-creating thing we do for society and economically, and that is by miles. And to your original question, it's just amazing to have all these other points of introduction and all these other ways to bring people into the Times ecosystem and to get them to form a habit with us. Once we do that, once we can engage them in something, our bet is that we can engage them in more and more, and there's lots of examples of that.

You mentioned you had three things, you had the news, you had the lifestyle, what was the third one?

MKL: Yeah, so news, news is such a small word for such a big idea.

You mentioned that sports is a lifestyle, so is sports not news? Is that lifestyle? It's kind of interesting where that fits.

MKL: We do sports news, we do sports journalism, we do news journalism.
MKL: But let me stay on the news thing for a minute, because we're often challenging even ourselves in how we articulate it, to not let it be this small idea. We do high quality, original, independent journalism, which means we are unearthing new and important information through reporting and also providing often deeply reported commentary and analysis on the really big topics that are going on in the world, and also on things that just matter at the level of relevance of people's daily lives. You could read us today for what is happening with this fragile ceasefire in Iran, and you could also read us today for health advice or for what movie to go see or what restaurant people are eating in in New York City right now. News is this very broad thing at The New York Times, and we've got these four lifestyle products. I would say to you what we're doing with The Athletic is absolutely journalism, often it is news journalism, and make no mistake, we are doing it with the rigor and the independence that The Times does. It's journalism, but we are doing that journalism for fans.

Right. It never occurred to me until you sort of mentioned it — it's not wrong to say that sports is a lifestyle category.

MKL: Totally.

That intersection is actually kind of interesting to think about.

MKL: Let me tell you something — I have an almost 15-year-old, he is an athlete, and he is a giant sports fan, and when I think, "What are his lifestyle pursuits?", when I fill out the parent statement in the school applications, first he's a sports fan, and The Athletic is serving that fandom.

Do you think there's a bit where some of this sports journalism has been caught up in the "We are journalists" bit and has missed the fact that people watch sports in many cases as a pastime to relax? I look forward to turning on the baseball game at night, I don't want the perils of the world, this is supposed to be an escape. It's also most helpful to put it in this lifestyle category because that's actually meeting people where they are.

MKL: I think that's a great point. What I will say is The Athletic often does very hard-hitting sports journalism, it is certainly covering the important topics and the tough topics across the major leagues and teams in the United States and European football and a bunch of other things, so it is doing that, hard stop. But if you look at the multiplicity of things they're doing, and you look in a day's time, it's probably well over 100 stories that get published every day, an enormous amount of that is beat reporting on what happened to your team in the league that you most likely watch, and it is literally meant to make you closer to the team, the fan, the game. Consumers of information want uncompromised information, and so The Athletic is uncompromised the way The Times is uncompromised: it's going to pursue the truth wherever it may lead, even when that's to uncomfortable places. But the whole purpose of the broad set of things we do at The Athletic is to make you a better fan, and we know that. And again, that does not mean we don't do hard-hitting journalism, we absolutely do, but we are independent of anyone's interest in that journalism but the sports fan's. And for the Times, we're not writing or producing our work for any particular audience, we're doing it in service to the public's interest.

Is that a value of keeping The Athletic brand separate from the New York Times?
MKL: We are absolutely committed to building the brand The Athletic, it was a deliberate choice, I'm very invested in that choice, and we've still got a lot of running room to build it. I say the biggest opportunity with The Athletic is just to make more sports fans. We're making real progress with it, and let me tell you, you asked me at the beginning, "How's it going?": we bought a company that was losing a ton of money because they were investing into a huge sports newsroom, it's like a giant newsroom with a little business. We said it would take some time, but then it would be accretive to the Times — it is absolutely that. We got there in many ways earlier and better than we expected, and today we've got well over 500 journalists at The Athletic. So it's an even bigger journalistic proposition and it's really contributing as a business to The Times, and we're thrilled about that, and I want to say we're only four years and a few months in, we're just getting started on all the ways we can support fandom of the major sports. I think we've nailed the journalism thing; you're always going to get better and better at that, they were good at it before we acquired them, and we've helped them be even better at it, do it more robustly, do it in a more edited way, and add a layer of national, and in some cases global, sports coverage. But there's just a lot of white space in the market to serve fans deeply reported, uncompromised information, and we're going to do that.

You have such a good product organization and you have the whole Games initiative, how much do you think about the prospects for games in the context of sports? Whether this be fantasy sports or a whole host of daily pick-ems — it's interesting because there's obviously a huge gambling angle to this, but how many of those sorts of offerings are possible without necessarily being gambling or whatever it might be?

MKL: Yeah, great question. We think there's real opportunity for Puzzles/Games and Sports, and we think we're good at both of those things. We already have our first collab, I think it's about a year old: we launched a Sports Connections puzzle, it is super fun. We did some great marketing for it with famous athletes, which was hilarious, and it's played a lot, so people love it, and I would say that is early. We're building out the team: we just hired a new Chief Product Officer at The Athletic, who comes after years of building communities at Facebook. We took one of the guys from the Times newsroom who'd been a leader of the Upshot, who's incredible at building interactive work, and he's now leading interactive work at The Athletic, so we think there's real opportunity for that. And I'll tell you, just this week, it might even be today, I'm losing track of my dates, we are launching something called The Beast. I don't know if you're an NFL fan, but it is the most comprehensive guide I think that exists on the planet to the NFL draft class, and it includes information on literally thousands of players who are draft hopefuls and then very deep profiles of 400 of them. Before we owned The Athletic, and actually until a year ago, we'd publish it as a book, a physical book; it's this monster book because there's so much information in it and teams use it, there's nothing else like it.
MKL: Now you'll see as it launches this week, it's got all these incredible interactive features on the individual player profiles, and if you love an NFL team and you really care, you're going to pay attention to The Beast. So I think we're just getting started on features that may be games, and also other things that support a fan who's super passionate about their team.

I keep interrupting you, but you mentioned three things, so we've got to get that third thing. What was the third thing in addition to news and lifestyle?

MKL: World's best news destination, leading lifestyle products, and put those two things together in an interconnected product experience for a bundle that makes The Times relevant for whatever is going on in your world, or the bigger world, every single day. That's the idea.

Got it. We talked a lot about bundling last time and obviously that's really the core of your strategy; how, though, has that evolved in the last five years? Is it really that most people are coming in the door through these lifestyle brands and you're bringing them to the news, whereas it used to be the other way around before? I'm throwing that out there as a hypothesis, how does that actually work?

MKL: I actually think the essence of it is about having this portfolio of world-class news coverage, news broadly defined, and then not just products, but these products that either are or are becoming the leaders in their category. These categories are giant spaces where tens of millions, in some cases hundreds of millions, of people spend a lot of time. It's the fact that we have rare and valuable news coverage and lifestyle products in these huge spaces that's really working. So to me, the word "bundle" can mean — the lowest common denominator version of it is, "It's a marketing concept or merchandising concept" — in our experience, we've got this singular idea of being essential in meeting a lot of different kinds of information and experience needs in a person's life. Rather than it be this idea of, "We've got one big important thing" — I'm going to come back to news in a minute because news is central to all of it — where you've got this one major hero thing and then you append a bunch of other stuff so the consumer thinks there's some other value there, we have invested and built these products out in such a way where each thing should be deeply valuable: to the person who cares about buying the right products and is going to deeply research them, and therefore they use Wirecutter.

You talked about expanding the brand, is this what you mean? Where you hear "New York Times", it's not... of course news is always the most important, I know you're going to say that, so I'll say that for you.

MKL: I'm going to say that again and again, because it's true. It's also the most economic-value creating thing we do.

Right. But you want people to think, "New York Times, that's the best games", or, "That's the best cooking".

MKL: New York Times makes the best puzzles, it has the best recipes (and by the way, just advice for home cooks who want to cook), it's where I go if I'm a sports fan, and it's absolutely going to give me the best uncompromised shopping advice — that's sort of the spirit of it.

It's not just a news indicator, it's like a "stamp of quality" indicator.

MKL: It's a stamp of rigor and quality, and I'm going to keep using this word, "uncompromised". Really high quality information that's done in an uncompromised way and therefore has value at real scale.
And the "uncompromised" comes from the business model?

MKL: Uncompromised comes from the idea that at our core what we do is independent journalism. You could even say every bit of it: even the games are journalistic in that they are planned in a very deliberate way and thought out.

Right. They're not randomly generated, someone is actually editing every puzzle.

MKL: That's right. Humans with expertise are making these things, and in some cases harnessing technology to do that even better. It's really working, and I want to say to you, I wouldn't have had these words four-and-a-half years ago, but at the core what we're trying to do, in a very complex information ecosystem really shaped and controlled by a small number of dominant tech platforms, is make news coverage and products that are so good that people seek them out and ask for them by name.

A destination site.

MKL: Seek them out, ask for them by name, make room in their lives.

The destination site idea has been — there's a few companies that I always feel very pleased about, I feel like they're like my children in a way.

MKL: Are we one of your kids?

You are one of my kids!

MKL: I appreciate that, we could use all the parents, we could use it.

That's why I loved, and I've mentioned it multiple times, the strategy document that you guys put out, it's been like a decade now — I'm like, "This is beautiful", and I think it really was on this point of destination sites, this idea that the way around a world of Aggregators that just commoditizes everything is people have to seek you out directly. Google will say competition is only a click away and no one seems to take that seriously, but people can actually click on you and go there.

MKL: My answer: we all read your Aggregation Theory and all the updates you've done to Aggregation Theory. The way I think about it is, for more than a decade, we have had these four D's that we're obsessed with. Ready? We have to be a daily habit. We have to have direct relationships with people. We have to be a destination, and let me say to you, by destination, I mean we do most of the economic value creation and we also give the best experience if you actually come to us in the whole of the experience. Then I say the fourth D is we only do drive-bys if they're deliberate. So what do I mean by that? We know we exist in an ecosystem shaped by these dominant tech platforms, and so we have to have a wide free layer for our work, we have to, otherwise you can't bring in the next subscribers. So we are very deliberate where we can be about how we go about doing that, and the idea is we need to be able to get you to sample our stuff and fall in love with it, and we've got to give you enough time and space to make a habit of it so that ultimately you subscribe.

Yeah, that's really interesting. I was going to ask this towards the end, but that's a good lead into it. You've had a big focus on video recently, and it's super interesting — actually, I have a few questions about this. One is it's pretty weird to go to the video tab on the desktop and all the videos are vertical. Was that very controversial?

MKL: There's video all over the site now, so you're gonna see it in a lot of places. When we say destination, we know a lot of people during the workday are reading us or watching us or listening to us on the desktop web, but we are so kind of phone-first. Our bet is that if you're going to watch a video on a phone, you are going to want it in vertical, and we now have a home for it in this tab. I encourage everybody: download our app, make sure you register your user account, and you get the best version of what we're doing.

It's really interesting because I've noticed with Stratechery actually, a huge portion of my audience now is just audio; I think more than half my subscribers listen instead of read.
You mentioned you mostly listen, which is fine. But as far as the reading goes, actually, I still have a huge amount of people reading on the desktop as compared to mobile.

MKL: By the way, I listen when I run, because all my other media time is reading. And now I'm forcing myself to watch.

Right, you've got to dogfood it.

MKL: I'm like listening to YouTube when I run.

Just talking shop, is there a bit where, as you look back on the evolution of media, it turned out that the browser ended up being a text medium, and the phone was actually the multimedia platform?

MKL: That's such a great question, that's so well put, and I need to take that in for a minute and think about it. What I'll say that I think is related to that: in a web world, we needed a website that people would type in and then pin and always be able to go back to; that worked, and the Times has been very good at that. In an iOS and Android world, we need an app, and we're very, very good at that. I would actually say to you, we're still pretty early in really getting more and more people to use our app. Today, the majority of people who use our app are subscribers, the engagement is enormous, but it's mostly the people who subscribe. We have not made the app a really important place for prospects, and we're starting to do that; the Watch tab is part of that. I think it remains to be seen whether the Times can be as preferred a brand and a source for watching as it is for reading and listening. Which, by the way, I want to say to you, those things are not going to go away, we've been at this for 175 years. The old media doesn't go away, the people who do it still do it. They vary it a bit, but many of them still do it.

To your point, a big part of your approach is you have this huge reporting base, and the medium is all ones and zeros: they can write an article, and they can be on a podcast, and they can show up in video.

MKL: And they can put a camera, they can literally hold a camera in front of them from somewhere on the edges of Iran and describe what they're seeing. So I think it remains to be seen, I think the market is still kind of forming and structuring. We regard video as doing three really important things for us. One is it helps us engage the people we already have, and anything that helps us engage the people we already have is very good for business.

Churn mitigation is always a win if you're a subscription business.

MKL: It's good for business, and I would argue it's good for journalistic impact and everything. Good for society, but very good for business. We also think there is an enormous number of people in all generations of life, but especially young people, who spend time watching, and they're either watching news or they're watching things that are in a zone adjacent.

We are the only generation that really just maximized text; it's been all downhill ever since. We got all the text in the world, we read it all, and now everyone's just watching video.

MKL: I could do a whole other episode on that, and the fight to get my very intelligent kid to just sit back and read, and how important I think that is to brain development. But we think video will help us engage whole new audiences, that is a big bet we're making, we're already starting to see some of that, we are very excited about it.
MKL: And then the third thing that video does for us, and I think that's really important: I think we all know that trust in all institutions is at an all-time low, trust in media is at an all-time low (I hate the word "media" because it lumps in journalism and a bunch of other things), but trust in all of it is low. And the more we can show you the work, the more we believe you will come to understand what an independent journalistic process to pursue the truth wherever it may lead looks like.

Interesting. So it's like brand-enhancing for what you're going for overall.

MKL: Totally, and trust building. I'll just tell you, we are much more aggressive today than we've been. One of the formats that we've scaled the most, and there's still so much room to go, is just a reporter on camera describing the story.

Which, by the way, means your production is vertical anyway, so it ties right in.

MKL: But there are times you go into a studio and explain something, so it doesn't have to only be vertical; it goes a really long way. And we have made a very deliberate choice where we've said, we don't particularly have a business model on TikTok or Instagram or YouTube Shorts, but we've got to be in those places.

I wanted to ask you about that, because when you think about podcasts, for example, there's a huge push in general to be on YouTube, and I think it's pretty obvious why: podcasts are incredible for audience retention. I've talked about how for my business, all these people listening to Stratechery don't go anywhere. Whereas before, people would have emails build up and think, "I have too many emails, I should just unsubscribe". The problem is I get much less sharing, because it's much easier to forward an email; with a podcast, you just go to the next podcast and then it's sort of done. So you have podcasts in general going to YouTube because they feel like the algorithm is the way to acquire new users. The reason to bring this up is I go to the New York Times YouTube page right now, and your last main video is from seven days ago. Your last Short is more recent, but it's about Trump escalating threats to destroy Iran. Well, there have been some news developments since those threats.

MKL: You think? Consult the top of the app.

But the point is clearly it's not a priority for you. How does that tie into the balance of destination site versus customer acquisition and all those sorts of things?

MKL: It's a great question. Let me start by saying our general thesis, and I've been here a long time now, so I've got enough reps to say it bears out: if we make great work that should scale because it's unlike anything else out there, and it's important, it will. I want to say that, that is our bet. And so I will say to you, we're still...

That's my bet too.

MKL: I listened to enough of your work to know you think that too. It's a really important principle that we've just hit again and again and again as a business. First, we have to make the best stuff there is, and it's got to be done in an independent way and it's got to be done with rigor and to a high standard of quality. So the chapter we're in now with video is very much scaling production, which is, "What are we making?", "What is it?", "What is the New York Times if you can watch it?". We are early in that and we're going to admit that all over the place. We are, as I started to say, putting a lot of that work out.
MKL: The best place to experience it is to come to our app or go to the website; even if some of it is shot for vertical, the best place to experience it is our destinations. But we need to be in the places where huge numbers of people are. So the work is also on TikTok and Instagram; it's on YouTube in short form, and we're starting to put our longer form stuff there too. And the truth is, it's a place where we can see (you are right, a lot of it is dictated by algorithms) what is a hit. I'm going to name a few things that are just unequivocally hits at the New York Times as video. The Ezra Klein Show was only a podcast, it's now a video show too — that guy is so brilliant, he has such an incredible following, we are so excited about that show. Right around the time we were putting him on video, we launched a show with Ross Douthat: to the extent that Ezra is examining the biggest ideas on the left, Ross is examining the biggest ideas that are animating the right. Ross has been a longtime columnist at the Times; I think we launched the pod and video at the same time, it was one of the first ones where we said, we're going out.

You say they're going huge, are they going huge on your properties, or are they going huge on the RSS feeds and the other platforms?

MKL: Out in the ecosystem. And when I say huge, we were early in all of this, they're building audiences and growing. The Daily is huge; The Morning, we have the largest general interest news newsletter I think on the Internet in terms of readership, five or six million people open it every day.

And do you see something very tangible and measurable, where people are finding this on other platforms and coming back to the Times and subscribing? Or is this more ethereal: this is enhancing the brand, and in the long run this will pay off?

MKL: It's a great question. The broad answer I'm going to give you (and I ran the subscription business for a long time, I was on top of the product organization, I was accountable for it): the thing I'm sure of is that we have to make stuff that is so good that it's worth paying for even in the presence of free and less expensive alternatives, and we also have to have many tens of millions of people who do not yet pay who are regularly engaging with our work. We do believe we have to be sort of out there in the ecosystem — of course, you and I both know, we see a receding link-based economy.

Did you see that discussion between Nate Silver and Nikita Bier the other day?

MKL: Oh, I haven't seen it yet.

They were talking about it because Nate Silver did some sort of article about who's getting prominence on X and things along those lines, and one of Nikita's pushbacks about The New York Times not having prominence, not just on X but on all social platforms, is you do what I do, which is we're old and lazy and just post an article with a link, and Twitter doesn't feature links anymore. Fine, it is what it is, I have my built-in audience, it's okay. And it's like, well, if you actually want to grow, you have to do the whole thread thing, "This is what's in this article", and at the end there's a link. And Nikita pointed out that the New York Times does the bare minimum, it's basically like an RSS feed for links, so of course they're not getting featured. Is that something where, I'm telling you now, you didn't read it, you're like, "Oh yeah, we should fix that", or is that a, "Well, you know what?
We're not a social media company, we are a destination site, and that's just the way it's going to be"?

MKL: It's a fair question. I think you should regard us as first and most importantly trying to make the best stuff that can and should scale because it's amazing. And remind me, I'm going to mention two other video shows to you that are so different. And then we are also looking to always master the evolving audience ecosystem. And I think if you follow us, it's interesting: on YouTube, we're doing more now show by show to build audience, so just like you mentioned the New York Times channel, Ezra's feed is surely updated, Ross Douthat's feed is updated. I'll mention these two other shows. Our cooking team launched a show maybe six months ago called The Pizza Interview; we have this amazing test kitchen on the west side of Manhattan, and every major celebrity with something important to say can come on that show now, they make a pizza and they talk about their work. So the cast of Stranger Things came with the finale, Ariana Grande came.

That's a great concept.

MKL: It's amazing. And that show is building so much momentum, so different than what you would expect. It is fun, it's really working. We've had a show, I don't know if you're a music fan, Ben, but we've got a music critic and a music reporter, Jon Caramanica and Joe Coscarelli; they have had a podcast at The Times for like a decade called Popcast, where they talk about music. It was sort of made at the edges of the enterprise, these guys are so talented, and we've just brought them to video and kind of prime time, and man is that scaling. They actually did a live show at an all-company meeting with Lizzo, it was unbelievable. They're getting everybody, it's so, so great. What you see is we are just in the early days of saying, "How and where should we build the big audience for this?".

The Daily, which nine years in is still in the top podcasts, and is I think the largest general interest news podcast: most people do not listen on The New York Times, they listen on Apple or Spotify.

MKL: And you know that because of what you do for a living. So we're open-minded about that, and we're also pushing really hard on the companies that shape the ecosystem to make it so that great stuff can scale.

Yeah, I've had plenty of discussions with YouTube.

MKL: I'm sure we're going to talk about that too.

Well, we've actually gone quite long, but I do need to ask you about — there's this technology called AI you may have heard of, and I do have a few questions for you on that. Just to get it out of the way: you're in ongoing litigation with OpenAI. Obviously, I'm sure that constrains what you can talk about to a certain extent. But big picture, what's the point of this? What do you want to accomplish?

MKL: We're in ongoing litigation, two-and-a-half years now, with OpenAI and Microsoft, and we've also sued Perplexity. Why? They stole our stuff, they used it without permission, without fair value exchange, copyright infringement, and they build products that compete with us, so that's why. Let me just say, why did the Times do this? We have spent, over 175 years, an enormous amount of resources on high-quality independent journalism, and I want to say this: we're fighting here, obviously, for the Times, but also for the industry writ large, for high quality journalism and content creation writ large, and for the public to have high quality information and content.
MKL: We have made an enormous investment, we've been doing it for a very long time, and we have a huge number of works.

Is your biggest concern the training or the output?

MKL: We believe that there should be sustainable fair value exchange for our work used in any way, number one, so fair value exchange, sustainably. Number two, we believe we should have control, and the law says we should have control, over how our work is used, and I would say those are kind of for everyone. And for the Times very specifically (by the way, we're not just suing, we have a deal with Amazon, we choose to deal), these things are of a piece: enforcement of our rights in court and dealing are both about putting a stake in the ground to say high quality journalism deserves to be paid for, and it should be. And, by the way, the LLMs are only going to be as good as the information that courses through them. The third bit is, can we do a deal that's consistent with our long-term strategy, which involves ultimately having direct relationships with our consumer.

Do you worry about — you've had this huge growth in terms of these lifestyle verticals, things like recommendations, things like cooking. Some of those, AI is really, really good and useful at; do you feel a threat there? Have you seen an impact there?

MKL: We're enforcing our rights in court for very specific reasons.

I want to do a number of AI categories, so let's set aside the court case. Let's just say, in terms of NYT Cooking: super compelling. Also, I go to ChatGPT, I ask for a recipe and it will give me one.

MKL: Totally fair question. I want to say to you first, we're also using AI assertively in our product.

Right, my next question is how you're actually using it.

MKL: Let's come back to that. The most important part of our strategy, and maybe to the extent there's a theme from this conversation, is that The New York Times creates human-led high quality news journalism and all this other stuff, including recipes, that are better because of the humanity, the expertise, the professional process that goes into them. And I want to say, because you asked about cooking specifically: we have 25,000 recipes and counting in a database, and every one of them is human-tasted, human-tested; they're better. People say to me all the time, "Your recipes are just better", yes! Because professional chefs and cooks are making them, and it doesn't get published until we've done that. We think that's going to have enduring value; we think in an information ecosystem where it's harder and harder to find quality stuff, brands are going to matter more and human-made content is going to matter more.

The week you filed the lawsuit, when I wrote about it, I titled it The New York Times' AI Opportunity.

MKL: I remember what you wrote about it.

In this world of everyone getting individualized content, that actually makes you more valuable, not less.

MKL: Listen, society needs a shared fact base. People need high quality, uncompromised information, they need to be able to find it with ease, and they need to be able to know what is true and worth their time, and we think the Times and each of our portfolio brands, each of our lifestyle brands, is like a signal for that. So we are obviously investing enormously into all that.

Has that been validated in the numbers?

MKL: Look at our business results. It's been a strong period for our business results; I can't tell you what will happen in the future, but I can tell you we are very, very focused on two things.
MKL: One, making our products even more rare and valuable at real scale to people. And two (part of how I got into this chair), we are incredibly focused on harnessing technology to make the journalism richer where it can help us do that, to make our journalists able to get to more things or get to the things more deeply. We are incredibly focused on using technology, and this includes AI, to make the work more accessible. I told you earlier, I'm a runner; you can listen to almost every article now. You can't listen to the live journalism, but everything else you can listen to in an automated voice, and I think we're on the third generation of that voice, it's so much better. It'll still mispronounce one or two things, but it's great.

See, I read my own articles and I still mispronounce things, so maybe that's actually the human component. The moment it starts pronouncing things perfectly, I'll know it's a robot.

MKL: We've been aggressive with that. Let me give you an example in the journalism: the Epstein Files, I think it was like three-and-a-half million pages, came out late in the day on a Friday, and we've got a whole AI Initiatives team in the newsroom, and they built a tool to be able to comb those documents. The magic of what we were able to do from them was the fact that we could create this tool that said, there's all these different story angles to get to, how do you get at them with ease? And then the beat reporters and the editors who have the expertise and the kind of rigor to say, "What should the public know from this?"; it's the combination of those things that made it awesome. I'm going to give you one more example where I just kind of said immediately, "Oh, there's a real interesting opportunity here".

Remember the Sydney Sweeney jeans/genes thing?

MKL: So the early read on that was that the left was up in arms about this Sydney Sweeney ad, and we had journalists who did a story using AI to comb social media to say, "How did this happen?", and what they found was that it was actually a construction on the right, it started as a construction. The idea that there was kind of fury about it started as a construction on the right and then became a bigger thing. So I think with any new technology, it is our job, it is my job, to see that people are not afraid of it, and are using it in responsible and appropriate ways. We've just rolled out Claude Code to our product engineering team, so they can prototype faster and do all kinds of things. So The Times is not anti-AI or anti any other tech; we have laid a stake in the ground to say this next chapter of the ecosystem has got to be shaped in a way that allows high quality journalism organizations and other high quality creative content organizations to do their work in a way where they can earn the living they should from that work, but we are certainly not anti-tech.

Just to go back to this AI bit and The New York Times AI Opportunity idea: you just touched on the "This is a trusted brand, it's validated by humans" bit, leaning into the humanity of it. I've expanded that a little bit as I've been thinking about this thesis, and I have this concept I've been thinking about called totem content, where if everyone is reading AI content, everyone's reading different stuff. The idea of having one piece that, "Did you read the Stratechery article today?", or whatever it might be, is actually going to be more valuable, not less.
I've been thinking about this in the context of community; it feels like no content company has ever solved community. You have a thriving comment section, but you're not making friends in the comment section, it's sort of a performative bit.

MKL: We're not introducing friends to one another, not necessarily yet.

If I know someone who is interested in the same sports team, or is interested in Wordle or Connections or whatever it might be, or is interested in a particular facet of the world, and I knew who they were, there's something there, and there's a continual trigger for us to talk about it. Where is your thinking on this? You do this all the time, there's lots of group chats with New York Times articles shared in them; is that something that you want to, or see an opportunity to, lean more into?

MKL: My very short answer is yes, with a double underline. Yes, yes, yes. At the core, the mission's role is to help society make sense of itself in a way that serves the common interest, the public interest; "common" is the main word in community. So yes, and I agree with you, I don't think it's been solved in any way yet by us or anybody else in the sort of publishing or journalism industry, but we're beginning to focus on it much more earnestly. I want to say two other things. Within the news report, we do a ton of culture and lifestyle journalism, and going back a couple of years, we launched the 100 Best Books, and we launched it with a bunch of input from experts beyond the Times, but of course all coalescing around our books experts, and we launched it with a bunch of features, because it was an inherently shareable idea, "I read these books, Ben, you should read these books, what's on your book list?", and then we did it for movies. We're just at the beginning of it, I think it's a huge opportunity, I am super interested in it. And the last thing I want to say, and it kind of brings us back to where you started with me: I will never forget, I was with my son and his friend on the ferry to the Vineyard, and his friend was like, "Oh my gosh, I play Wordle every day and then after that, I go and I play…", and he named four rip-offs, because he liked the game so much. Point being, we need to make more games; we have, we did, we're still making more.

But none of those competitors' games have broken through; people may play them, but you don't hear about them the way you hear about Wordle. Why is that? There is one puzzle a day, from a company whose brand ethos is that it makes you smarter, that you do with the people you love, and by the way, that's true for Wordle and Connections and Strands. Everyone's playing the exact same puzzle.

MKL: And it is a shared experience. Just to go back, you asked me about sports: fandom is a shared experience, and we're thinking very hard about how we support that game moment, and I think The Athletic has a very big opportunity here. And I think in news, what we want, journalism can't solve society's big problems, and there are many big problems, but society's problems cannot be solved without high quality independent journalism. So the idea of, "Can we get more people engaged with one another?", on really big, important, weighty topics that need independent journalism, I think that's a big idea and a big opportunity for The Times, for journalism, for the country, for the world.

Has the New York Times fully crossed the Valley of Despair in terms of advertising?
Part of all this was you had to build a subscription business, but now that you're known as a subscription business, is advertising suddenly a growth opportunity instead of a decline to manage?

MKL: I came to run the ad business; the woman who runs the ad business now, Joy Robins, she's an extraordinary leader. The ad business, I joke all the time, is going so much better under her than it ever went many years ago. I think that we have really found a formula that works.

What is that formula?

MKL: We are, and I bet long after I'm here we will be, a subscription-first business, meaning we make things that are meant to be extraordinary to consumers at great scale. So many of our ads are shown to subscribers because so much of our engagement is from subscribers, and we're obsessed, especially in a changing ecosystem, with getting the next group, the prospects, really, really, really engaged with our work. It's our obsession with engagement, and with quality products in giant spaces that marketers want to be near: news broadly defined, but on the authority of news. Marketers want to be next to other healthy, thriving brands, and I think The Times is that today, but they also want to be in sports and they want to be next to our games, which are cultural sensations, and by the way, do you think marketers like shopping? Quality shopping and cooking, there are so many marketers who want to do stuff with that. I do think we've arrived; I've been more optimistic and excited about our ad business over the last year than I've been at any other point, and I think given the scale that we have achieved — Ben, you and I both grew up on the web, just think about the number of page views the New York Times has, all that engagement. And we've spent half a decade, longer than that, building very sophisticated first-party data. So we're never going to have the scale of a platform or the targetability of a platform, but we are certainly well above what I would suspect any other kind of publisher can do.

That's the question — is there anything actually generalizable from the New York Times? You've done it, you've won it, can anyone actually replicate this?

MKL: First of all, we have not won anything, I want to say that very clearly. We have so much more to do, to grow, to make sure.

Relative to basically every other newspaper, I'm going to declare you a winner.

MKL: Let me tell you the few things that I think are absolutely extensible. I often say we've spent so much of our time wanting to make a market, and then support a market, for digital subscriptions to journalism, journalism being something of value that is worth paying for. We believe that a thriving, healthy ecosystem with lots of competitors who we're fighting with every day is actually better; it's certainly better for society, and we think it's just better generally. And I want to say there's you, there's Puck, there are so many other things that have been invented since I came to The New York Times. So in some ways, there are aspects of the information ecosystem and journalism that are thriving; certainly not local journalism, certainly not deeply reported journalism, and that's very unfortunate. The things that I think are extensible: one, when I get asked, "Why has the Times succeeded?", if I can only give one short answer, it is we kept investing in journalism, that's it. Good times, bad times, we kept investing in the journalism. There was something there that actually was worth paying for, one. And two, we stuck to our values.
MKL: So the Times can't be bought, the journalism is never compromised, we can't be cowed; we can be hated in lots of places, and people know they're still going to get our best understanding, they're going to get the results of a pursuit of truth wherever it will lead, even when that's to uncomfortable places. If I had to boil it down to two short things, I'm ripping off a line from our publisher, AG Sulzberger, that I think does it so beautifully; he says, "It's value and values". We kept investing to make sure the product was still really valuable, and then we just never let go of our values; I think those are ideas that are extensible to everyone. The other thing I'll say to you, and this is maybe my contribution: we clocked early on, 9 or 10 years ago, that we are competing for engagement with the most powerful information companies the world has ever known, who are so much richer than us, so dominant, and we've got to get really good at engagement. We've got to get really good at making people want to come back, and we've also believed in the power of brands as signals to get people to ask for us. I say all the time, they've got to ask for us by name. The New York Times, Wordle, Connections, Strands, The Athletic, Cooking, Wirecutter: people have to ask for us by name, and we've invested into all those things. I think those are all extensible ideas.

Well, that's why I say you're one of my idea children, destination site; I write about Aggregators, and my personal strategy is to do everything the exact opposite of them, because why would I want to even compete in that game? So that certainly resonates.

MKL: And you have so many readers and listeners at The New York Times; we've been reading you for so long, you have felt like a parent of us.

Well, I appreciate it. You are, for the record, older than me, The New York Times I should say. 175 years this year, very exciting, congratulations.

MKL: (laughing) Very exciting. Can I say one thing? If we can do anything with a 175th — Is it a birthday? Is it an anniversary? — if we can do anything in this moment, the most important thing we want to accomplish is just raising people's consciousness for the idea of what high quality independent journalism is and does. It is human beings with a professional process and real expertise going out into the world and unearthing new information, following a very honed professional process to do so, so that the public can know what's happening. We are spending a lot of our energy this year, at 175 years old, just trying to remind people what that is, and there's so many other things you can do in media now. You know, I listen to a bunch of stuff, there's so many things that are like adjacent to news.

Oh, I appreciate it. I'm not a reporter, so I need someone to actually go out and unearth facts.

MKL: But it is not that, most of it is not that, and I think as local journalism has been in such dire straits for so long, and there are so few local newspapers and fewer journalists, and as people get more and more of their media diet fed to them by an algorithm that's meant to match the things they already think, and as leaders work to discredit independent journalism, with all those forces going on in the world, I think it's just harder to know or remember or be conscious of the importance of the thing our journalists are doing every single day.
There's one thing, I know we've gone slightly long, but when you say that, what I find inspiring and why I like to talk to you and write about the New York Times is, I'm sure it's a relief to you, I'm just completely independent of any partisanship or political angle.

MKL: Totally, you're not compromised.

I find it so interesting from a business perspective, and what you're articulating there, what is inspiring, is that it's a fight against entropy, where the easiest path for people and for publications is to just give in to the algorithm, as it were. And it's kind of nice to go to YouTube and not see any of your videos there, because it's sort of an assertion that that's not the path you're going to go down, and I certainly can relate to that and find that inspiring, and that's why I enjoyed talking to you.

MKL: I enjoyed talking to you, this was a lot of fun, thank you.

This Daily Update Interview is also available as a podcast. To receive it in your podcast player, visit Stratechery. The Daily Update is intended for a single recipient, but occasional forwarding is totally fine! If you would like to order multiple subscriptions for your team with a group discount (minimum 5), please contact me directly. Thanks for being a supporter, and have a great day!


Meta's new model is Muse Spark, and meta.ai chat has some interesting tools

Meta announced Muse Spark today, their first model release since Llama 4 almost exactly a year ago. It's hosted, not open weights, and the API is currently "a private API preview to select users", but you can try it out today on meta.ai (Facebook or Instagram login required).

Meta's self-reported numbers show it competitive with Opus 4.6, Gemini 3.1 Pro, and GPT 5.4 on selected benchmarks, though notably behind on Terminal-Bench 2.0. Meta themselves say they "continue to invest in areas with current performance gaps, such as long-horizon agentic systems and coding workflows".

The model is exposed as two different modes on meta.ai - "Instant" and "Thinking". Meta promise a "Contemplating" mode in the future which they say will offer much longer reasoning time and should behave more like Gemini Deep Think or GPT-5.4 Pro.

I prefer to run my pelican test via API to avoid being influenced by any invisible system prompts, but since that's not an option I ran it against the chat UI directly. Here's the pelican I got for "Instant":

And this one for "Thinking":

Both SVGs were rendered inline by the Meta AI interface. Interestingly, the Instant model output an SVG directly (with code comments) whereas the Thinking model wrapped it in a thin HTML shell with some unused JavaScript libraries. Which got me curious...

Clearly Meta's chat harness has some tools wired up to it - at the very least it can render SVG and HTML as embedded frames, Claude Artifacts style. But what else can it do? I asked it:

what tools do you have access to? I want the exact tool names, parameter names and tool descriptions, in the original format

It spat out detailed descriptions of 16 different tools. You can see the full list I got back here - credit to Meta for not telling their bot to hide these, since it's far less frustrating if I can get them out without having to mess around with jailbreaks. Here are highlights derived from that response:

- Browse and search. Can run a web search through an undisclosed search engine, can load the full page from one of those search results and can run pattern matches against the returned page content.
- Meta content search. Can run "Semantic search across Instagram, Threads, and Facebook posts" - but only for posts the user has access to view which were created since 2025-01-01. This tool has some powerful looking parameters.
- "Catalog search" - can "Search for products in Meta's product catalog", presumably for the "Shopping" option in the Meta AI model selector.
- Image generation. Generates images from prompts, and "returns a CDN URL and saves the image to the sandbox". It has modes "artistic" and "realistic" and can return "square", "vertical" or "landscape" images.
- container.python_execution - yes! It's Code Interpreter, my favourite feature of both ChatGPT and Claude. "Execute Python code in a remote sandbox environment. Python 3.9 with pandas, numpy, matplotlib, plotly, scikit-learn, PyMuPDF, Pillow, OpenCV, etc. Files persist at ." Python 3.9 is EOL these days but the library collection looks useful. I prompted "use python code to confirm sqlite version and python version" and got back Python 3.9.25 and SQLite 3.34.1 (from January 2021) - there's a quick sketch of that check after this list.
- container.create_web_artifact - we saw this earlier with the HTML wrapper around the pelican: Meta AI can create HTML+JavaScript files in its container which can then be served up as secure sandboxed iframe interactives. "Set kind to for websites/apps or for vector graphics."
- container.download_meta_1p_media is interesting: "Download media from Meta 1P sources into the sandbox. Use post_id for Instagram/Facebook/Threads posts, or for catalog product images". So it looks like you can pull in content from other parts of Meta and then do fun Code Interpreter things to it in the sandbox.
- container.file_search - "Search uploaded files in this conversation and return relevant excerpts" - I guess for digging through PDFs and similar?
- Tools for editing files in the container - these look similar to Claude's text editor tool commands, which are becoming a common pattern across any file-equipped agent harness.
- container.visual_grounding - see below, this one is fun.
- subagents.spawn_agent - the sub-agent as a tool pattern. "Spawn an independent sub-agent for research, analysis, or delegation. It returns its final text response."
- third_party.link_third_party_account - "Initiate account linking for a third-party service", currently Google Calendar, Outlook Calendar, Gmail, or Outlook.
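That version check is nothing exotic, by the way. The harness didn't show me the code it actually ran, so here's just my assumption of the obvious standard-library way to answer that prompt from inside the sandbox, not Meta's implementation:

```python
# My guess at the code behind the "confirm sqlite version and python
# version" prompt - the tool output only showed me the final numbers.
import sys
import sqlite3

print(sys.version)             # came back as 3.9.25 - a long-EOL Python
print(sqlite3.sqlite_version)  # came back as 3.34.1, from January 2021
```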
container.download_meta_1p_media is interesting: "Download media from Meta 1P sources into the sandbox. Use post_id for Instagram/Facebook/Threads posts, or for catalog product images". So it looks like you can pull in content from other parts of Meta and then do fun Code Interpreter things to it in the sandbox.

container.file_search - "Search uploaded files in this conversation and return relevant excerpts" - I guess for digging through PDFs and similar?

There are also tools for editing files in the container. These look similar to Claude's text editor tool commands - they're becoming a common pattern across any file-equipped agent harness.

container.visual_grounding - see below, this one is fun.

subagents.spawn_agent - the sub-agent as a tool pattern. "Spawn an independent sub-agent for research, analysis, or delegation. It returns its final text response."

third_party.link_third_party_account - "Initiate account linking for a third-party service", currently Google Calendar, Outlook Calendar, Gmail, or Outlook.

Let's talk about that visual grounding one. Here's the description in full:

Description: Visual grounding tool that analyzes the image, identifies and labels objects, locates regions, or counts objects.

Parameters:

object_names (array, required)
image_path (string, default "")
format_type ("bbox"|"point"|"count"|null, default "bbox")
title (string|null)

So it can take an image path and optional object names, and return things in point, bbox, or count format. Sounds to me like Meta's Segment Anything. I decided to take it for a spin.

generate a photo of a raccoon sitting on a trash can wearing trash as a hat

Here you go — king of the alley, in full regalia: Coffee cup crown, banana peel brim, newspaper feather. Peak raccoon fashion. Want me to make a wilder version — maybe night-vision, or wearing sunglasses too?

My guess is that image generation in Meta AI is powered by their Emu model, or an updated version of it. This is the same pattern as ChatGPT and Gemini image generation, where images are generated by a tool call (here that's the image generation tool). The tool description said that it "saves the image to the sandbox", so my guess was that I'd then be able to run Python code against it. I tried:

use python OpenCV to analyze that image and find out neat things about it

Annoyingly it did not show me the code it ran, but it did create this image: Pretty neat! We can both generate images and then use them with Python tools.

Next, the visual grounding tool. Knowing the names of the tools makes it easy to refer to them in prompts:

Now try the container.visual_grounding tool

The response (full response here):

Done — visual_grounding pinpointed every piece of the outfit with pixel-level precision: Want me to run it in bbox mode next to draw boxes around the fur texture, or count mode to see how many individual whiskers it can detect?

Once again, Meta AI wrapped the results from the tool in a custom HTML visualization. That's clearly point mode, but how about the others? I prompted:

yes draw bounding boxes, can it do masks too?

Yes — bounding boxes work great. Masks? Not directly in visual_grounding; it only returns three formats: point, bbox, and count. No pixel-level segmentation masks. Here's the bbox output:

What you're seeing:

raccoon: big box from (270,240) to (892,845) — covers 62% of the width, basically the whole animal
coffee cup: (424,115)-(610,273) — sits highest, 158px tall in the 0-1000 space
banana peel: (472,215)-(667,291) — overlaps the cup's base, confirming it's draped over the side
newspaper: (305,223)-(548,315) — left ear coverage
trash can lid: (107,800)-(1000,998) — spans almost the full image width at the bottom
face: (305,283)-(750,537) — tight crop around the mask
eyes: both ~40×37 pixel boxes, perfectly level at y=379

Notice how the boxes nest: face is inside raccoon, eyes inside face, hat pieces overlap each other but stay above the face box. The tool is doing real object localization, not just guessing centers.

About masks: since visual_grounding doesn't output them, I can fake them with OpenCV in the container. Two quick options:

GrabCut — I give it the raccoon bbox, it iteratively segments fur from background and returns a binary mask
SAM-style color threshold — use the bbox plus k-means to isolate the coffee cup or banana peel

Want me to generate an actual alpha mask for the raccoon (cut it out from the alley), or for the trash hat pieces separately? Takes about 10 seconds and I can give you a transparent PNG.
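That GrabCut suggestion is real classic OpenCV, and entirely plausible given OpenCV is in the sandbox. Here's roughly what it would look like as Python - my own sketch, not the code Meta AI ran, with an invented file path and the raccoon bbox values from above:

```python
import cv2
import numpy as np

# Hypothetical inputs: the generated image plus the raccoon bbox that
# visual_grounding reported, which appears to use a 0-1000 coordinate space.
img = cv2.imread("raccoon.png")
h, w = img.shape[:2]

x0, y0, x1, y1 = 270, 240, 892, 845  # the raccoon bbox from above
rect = (x0 * w // 1000, y0 * h // 1000,
        (x1 - x0) * w // 1000, (y1 - y0) * h // 1000)

# GrabCut iteratively separates foreground from background inside the rect.
mask = np.zeros((h, w), np.uint8)
bgd = np.zeros((1, 65), np.float64)
fgd = np.zeros((1, 65), np.float64)
cv2.grabCut(img, mask, rect, bgd, fgd, 5, cv2.GC_INIT_WITH_RECT)

# Definite or probable foreground pixels become the alpha channel.
alpha = np.where(
    (mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 255, 0
).astype(np.uint8)
cv2.imwrite("raccoon-cutout.png", np.dstack([img, alpha]))
```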
Here's the HTML for that bbox visualization. Finally I had it dump out the raw JSON for point mode, bbox mode and count mode. You can see that all here. The count mode is particularly interesting. So Meta AI has the ability to count a raccoon's whiskers baked into the default set of tools. Which means... it can count pelicans too! Here's that overlay exported as HTML.
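One thing worth noting from those dumps: the coordinates appear to be in a normalized 0-1000 space rather than pixels. If you wanted to use the raw bbox JSON outside Meta's own HTML visualizations you'd need to rescale - here's a quick Pillow sketch, where the detection structure is my guess based on the dumps, not a documented format:

```python
from PIL import Image, ImageDraw

# Guessed shape of the bbox-mode output, based on the dumps above:
# labels plus (x0, y0, x1, y1) in a 0-1000 coordinate space.
detections = [
    {"label": "raccoon",    "bbox": (270, 240, 892, 845)},
    {"label": "coffee cup", "bbox": (424, 115, 610, 273)},
]

img = Image.open("raccoon.png").convert("RGB")
draw = ImageDraw.Draw(img)
w, h = img.size

for det in detections:
    x0, y0, x1, y1 = det["bbox"]
    # Rescale from the 0-1000 space to pixel coordinates.
    box = (x0 * w / 1000, y0 * h / 1000, x1 * w / 1000, y1 * h / 1000)
    draw.rectangle(box, outline="red", width=3)
    draw.text((box[0], box[1] - 12), det["label"], fill="red")

img.save("raccoon-boxes.png")
```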
Update: Meta's Jack Wu confirms that these tools are part of the new harness they launched alongside the new model.

On Twitter Alexandr Wang said: "this is step one. bigger models are already in development with infrastructure scaling to match. private api preview open to select partners today, with plans to open-source future versions."

I really hope they do go back to open-sourcing their models. Llama 3.1/3.2/3.3 were excellent laptop-scale model families, and the introductory blog post for Muse Spark had this to say about efficiency:

[...] we can reach the same capabilities with over an order of magnitude less compute than our previous model, Llama 4 Maverick. This improvement also makes Muse Spark significantly more efficient than the leading base models available for comparison.

So are Meta back in the frontier model game? Artificial Analysis think so - they scored Muse Spark at 52, "behind only Gemini 3.1 Pro, GPT-5.4, and Claude Opus 4.6". Last year's Llama 4 Maverick and Scout scored 18 and 13 respectively.

I'm waiting for API access - while the tool collection on meta.ai is quite strong, the real test of a model like this is still what we can build on top of it.


AI Is Really Weird

If you like this piece and want to support my independent reporting and analysis, why not subscribe to my premium newsletter? It's $70 a year, or $7 a month, and in return you get a weekly newsletter that's usually anywhere from 5,000 to 18,000 words, including vast, detailed analyses of NVIDIA, Anthropic and OpenAI's finances, and the AI bubble writ large. I just put out a massive Hater's Guide To The SaaSpocalypse, as well as last week's deep dive into How AI Isn't Too Big To Fail. Subscribing helps directly support my free work, and premium subscribers don't see this ad in their inbox.

I can't get over how weird the AI bubble has become. Hyperscalers are planning to spend over $600 billion on data center construction and GPUs predominantly bought from NVIDIA, the largest company on the stock market, all to power generative AI, a technology so powerful that none of them will discuss how much it's making them, or what it is we're all meant to be so excited about.

To make matters weirder, Microsoft, a company that spent $37.5 billion in capital expenditures in its last quarter on AI, recently updated the terms and conditions of its LLM-powered "Copilot" service to say that it was "for entertainment purposes only," discussing a product that apparently has 15 million users as part of enterprise Microsoft 365 subscriptions, and is sold to both local and national governments overseas, including the US federal government. That's so weird! What're you doing, Microsoft? What do you mean it's for entertainment purposes? You're building massive data centers to drive this!

Well, okay, you're building them at some point. As I discussed a few weeks ago, despite everybody talking about the hundreds of gigawatts of data centers being built "to power AI," only 5GW are actually "under construction," with "under construction" meaning anything from "we've got some scaffolding up" to "we're about to hand over the keys to the customer."

But isn't it weird we're even building those data centers to begin with? Why? What is it that AI does that makes it so essential — or, rather, entertaining — that we keep funding and building these things? Every day we hear about "the power of AI," we're beaten over the head with scary propaganda saying "AI will take our jobs," but nobody can really explain — outside of outright falsehoods about "AI replacing all software engineers" — what it is that makes any of this worthy of taking up any oxygen, let alone essential, or a justification for so many billions of dollars of investment.

Instead of providing an actual answer of some sort, AI boosters respond by saying it's "just like the dot com bubble" — another weird thing to do considering 168,000 people lost their jobs as the NASDAQ dropped by 80% in two years, only 16% of the world even used the internet, and those that did in America had an average internet speed of 50 kilobits per second (and only 52% of them had access in 2000 anyway). Conversely, to quote myself: And with that incredibly easy access, only 3% of households pay for AI. Boosters will again use this talking point to say that "we're in the early days," but that's only true if you think that "early days" means "people aren't really using it yet." Yet the "early days" argument is inherently deceptive.
While the Large Language Model hype cycle might have only begun in 2022, the entirety of the media and markets have focused their attention on AI, along with hundreds of billions of dollars of venture capital and nearly a trillion dollars of hyperscale capex investment. AI progress isn't hampered by a lack of access, talent, resources, novel approaches, or industry buy-in, but by a single-minded focus on Large Language Models, a technology so obviously limited from the very beginning that Gary Marcus was able to call it in 2022.

Saying it's "the early days" also doesn't really make sense when faced with the rotten and incredibly unprofitable economics of AI. The early days of the internet were not unprofitable due to the underlying technology of serving websites, but because of the incredibly shitty businesses that people were building. Pets.com spent $400 per customer in customer acquisition costs, millions of dollars on advertising, and had hundreds of employees for a business with a little over $600,000 in quarterly revenue — and as a result, nothing about its failure was about "the early days of the internet" at all, as was the case with Kozmo, or any number of other dot com flameouts.

Similarly, internet infrastructure companies like Winstar collapsed because they tried to grow too fast and signed stupid deals, not because of the underlying technology's flaws. For example, in 1998, Lucent Technologies signed its largest deal — a $2 billion "equipment and finance agreement" — with telecommunications company Winstar, which promised to bring in "$100 million in new business over the next five years" and build a giant wireless broadband network, along with expanding Winstar's optical networking. Eager math-heads in the audience will see the issue with lending $2 billion to make $100 million over five years, and eager news-heads will laugh at WIRED magazine saying in 1999 that Winstar's "small white dish antennas…[heralded] a new era and new mind-set in telecommunications." Winstar died two years later because its business was built to grow at a rate that its underlying product couldn't support. In the end, microwave internet (high-speed internet delivered via radio waves) has become an $8 billion-a-year industry, a real business, but nothing like the revolution everybody was so excited about.

In any case, anyone who tells you that we're in "the early days of AI" has either been conned or is in the process of conning you, as they're using the line to deflect from issues of efficacy or underlying economic weakness. In fact, that's a great place to go next.

Probably the weirdest thing about this entire era is how nobody wants to talk about the fact that AI isn't actually doing very much, and that AI agents are just chatbots plugged into an API. Per Redpoint Ventures' Reflections on the State of the Software and AI Market, "the agent maturity curve is still early, but the TAM implications are enormous," with agents able to "...run discretely for minutes, [and] execute end-to-end tasks with some oversight." What tasks, exactly? Who knows! Truly, nobody seems able to say. To paraphrase Steven Levy at WIRED, 2025 was meant to be the year of AI agents, but turned out to be the year of talking about AI agents. Agents were/are meant to be autonomous pieces of software that go off and do distinct tasks. In reality, it's kind of hard to say what those tasks are.
"AI agent" now refers to literally anything anybody wants it to, but ultimately means "chatbot that has access to some systems."

The New York Times' Ezra Klein recently talked to the entity currently inhabiting former journalist and Anthropic co-founder Jack Clark about "how fast AI agents would rip through the economy," but despite speaking for over an hour, the closest we got was "it wrote up a predator-prey simulation (a complex-sounding but extremely-common kind of webgame that Anthropic likely ingested through its training material)" and "chatbots that talk to each other about tasks," and if you think I'm kidding, this is how he described it:

Anyway, this is all bad, because multiple papers have now shown that, and I quote, agents are "...incapable of carrying out computational and agentic tasks beyond a certain complexity," with Futurism adding that said complexity was pretty low. The word "agent" is meant to make you think of powerful autonomous systems that carry out complex and minute tasks, when in reality it's…a chatbot. It's always a fucking chatbot. It might be a chatbot with API access or a chatbot that generates a plan that another chatbot looks at and says something about, but it's still chatbots talking to chatbots. When you strip away the puffery, nobody seems to actually talk about what AI does.

Let's take a look at CNBC's piece on Goldman Sachs' supposed contract with Anthropic to build "autonomous systems for time-intensive, high-volume back-office work":

…okay, but like, what does it do? Right, brilliant. Great. Love it. What tasks? What is the thing you're paying for? Okay, great, we have two things it might do in the future, and that's "employee surveillance" (?) and making pitchbooks.

The upshot is that, with the help of the agents in development, clients will be onboarded faster and issues with trade reconciliation or other accounting matters will be solved faster, Argenti said.

Onboarding? Chatbot. "Issues with trade reconciliation"? Chatbot connected to a knowledge base, like we've had for years but worse and more expensive. Oh, and "other accounting matters" will be solved faster, always with the future tense with these guys.

How about Anthropic and outsourcing body shop giant InfoSys' "AI agents for telecommunications and other regulated industries"? Let's go through the list of tasks and say what they mean, my comments in bold:

Telecommunications: AI agents will help carriers modernize network operations, simplify customer lifecycle management, and improve service delivery—bringing intelligent automation to one of the most operationally complex and regulated industries in the world. Meaningless. Automation of what?

Financial services: AI agents will help firms detect and assess risk faster, automate compliance reporting, and deliver more personalized customer interactions, such as tailoring financial advice based on a client's full account history and market conditions. Chatbot! "More-personalized interactions" are a chatbot with a connection to a knowledge system, as is any kind of "tailored financial advice." Compliance reporting? Summarizing or pulling documents from places, much like any LLM can do, other than the fact that it'll likely get shit wrong, which is bad for compliance.

Manufacturing and engineering: Claude will help accelerate product design and simulation, reducing R&D timelines and enabling engineers to test more iterations before production. I assume this refers to people using Claude Code to do coding, which is what it does.

Software development: Teams will use Claude Code to write, test, and debug code, helping developers move faster from design to production. Claude Code.

Enterprise operations: Claude Cowork will help teams automate routine work like document summarization, status reporting, and review cycles. Literally a chatbot that deleted every single one of a guy's photos when he asked it to organize his wife's desktop.

How about OpenAI's "Frontier" platform for businesses to "build, deploy and manage AI agents that do real work"?

Shared context? Chatbot. Onboarding? Chatbot. Hands-on learning with feedback? Chatbot. Clear permissions and boundaries? Chatbot setting. Let's check out the diagram! Uhuh. Great. What real-world tasks? Uhhh.

Reason over data? Chatbot. "Complex tasks"? No idea, it doesn't say. "Working with files"? Doesn't say how it works with files, but I'd bet it can analyze, summarize and create charts based on them that may or may not have errors in them, and based on my experience of trying to get these things to make charts (as a test, I'd never use them in my actual work), it doesn't seem to be able to do that. "Evaluation and optimization loops"? Unclear, because we have no idea what the tasks are. What are the agents planning, acting, or executing on? Again, no idea.

Yet the media continues to perpetuate the myth of some sort of present or future "agentic AI" that will destroy all employment.
A few weeks ago, CNBC mindlessly repeated that ServiceNow CEO Bill McDermott believed that agents would send college grad unemployment over 30%. NowAssist, ServiceNow's AI platform, is capable of — you guessed it! — summarization, conversational exchanges, content creation, code generation and search, a fucking chatbot just like the other chatbots.

A few weeks ago, The New York Times wrote about how "AI agents are fun, useful, but [not to] give them your credit card," saying that they can "do more than just chat…they can edit files, send emails, book trips and cause trouble":

Sure sounds like you connected a chatbot to your email there, Mr. Heyneman.

Let's go through these:

"Gather information" — search tool, part of chatbots for years.
"Write reports" — generative AI's most basic feature, with no details on quality.
"Edit files" — to do what exactly? Chatbot feature.
"Send and receive messages through email and text" — generating and reading text, connected to an email account.
"Delegate work" — what work? No need to get specific!

Yes, you can string together chatbots with various APIs and have the chatbot be able to activate certain systems. You could also do the same with a button you bought on Etsy connected to your computer via USB if you really wanted to. The ability to connect something to something else does not mean that anything useful happens at the end, and LLMs are extremely bad at the kind of deterministic actions that define the modern knowledge economy, especially when choosing to do them based on their interpretation of human language. AI agents do not, as sold, actually exist. Every "AI agent" you read about is a chatbot talking to another chatbot connected to an API and a system of record — mechanically, it's just a loop, as the sketch at the end of this section shows — and the reason that you haven't heard about their incredible achievements is because AI agents are, for the most part, fundamentally broken.

Even OpenClaw, which CNBC confusingly called a "ChatGPT moment," is just a series of chatbots with the added functionality of requiring root access to your computer and access to your files and emails. Let's see how CNBC described it back in February:

Hmmm, interesting. I wonder if they say what that means:

Reading this, you might be fooled into believing that OpenClaw can actually do any of this stuff correctly, and you'd be wrong! OpenClaw is doing the same chatbot bullshit, just in a much-more-expensive and much-more-convoluted way, requiring either a well-secured private space or an expensive Mac Mini to run multiple AI services and do, well, a bunch of shit very poorly.

The same goes for things like Perplexity's "Computer," which it describes as "an independent digital worker that completes [tasks] and workflows for you," which means, I shit you not, that it can search, generate stuff (words, code, images), and integrate with Gmail, Outlook, Github, Slack, and Notion, places where it can also drop stuff it's generated. Yes, all of this is dressed up with fancy terms like "persistent memory across sessions" (a document the chatbot reads and information it can access) and "authenticated integrations" (connections via API that basically any software can have). But in reality, it's just a further compute-intensive way of trying to fit a square peg in a round hole, by which I mean having a hallucination-prone chatbot do actual work.

The only reason Jensen Huang is talking about OpenClaw is that there's nothing else for Jensen Huang to talk about:

That's wild, man. That's completely wild. What're you talking about? What can NemoClaw or OpenClaw or whatever-the-fuck actually do? What is the actual output? That's so fucking weird!

I can already hear the haters in my head screaming "but Ed, coding models!" and I'm kind of sick of talking about them, because nobody can actually tell me what I'm meant to be amazed or surprised by.
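Here's that loop, by the way. Strip the branding off any of these "agents" and this is what's left. A deliberately simplified sketch: the tool, the function names and the scripted replies are all mine, not any vendor's actual SDK.

```python
# A minimal "agent": a chat model in a loop with some callable tools.
# call_llm is faked with scripted replies so the sketch actually runs;
# in a real product it would be a chat-completion API call.

TOOLS = {
    "search_email": lambda query: f"3 emails matching {query!r}",
}

SCRIPTED_REPLIES = iter([
    {"tool": "search_email", "arguments": {"query": "invoice"}, "content": None},
    {"tool": None, "arguments": None, "content": "Found 3 invoices and summarized them."},
])

def call_llm(messages):
    # Stand-in for the model: returns either text or a tool request.
    return next(SCRIPTED_REPLIES)

def run_agent(task):
    messages = [{"role": "user", "content": task}]
    while True:
        reply = call_llm(messages)
        if reply["tool"] is None:
            return reply["content"]  # the chatbot is done talking
        # The "agentic" part: run whatever tool the model asked for,
        # append the result, and go around the loop again.
        result = TOOLS[reply["tool"]](**reply["arguments"])
        messages.append({"role": "tool", "content": result})

print(run_agent("summarize my invoices"))
```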
To be clear, LLMs can absolutely write code, and can absolutely create software, but neither of those mean that the code is good, stable or secure, or that the same can be said of the software they create. They do not have ideas, nor do they create unique concepts — everything they create is based on training data fed to them that was first scraped from Stack Overflow, Github and whatever code repositories Anthropic, OpenAI, and Google have been able to get their hands on.

It's unclear what the actual economic or productivity effects are, other than an abundance of new code that's making running companies harder. Per The New York Times:

As I wrote a few weeks ago, LLMs are good at writing a lot of code, not good code, and the more people you allow to use them, the more code you're going to generate, which means the more time you're either going to need to review that code, or the more vulnerabilities you're going to create as a result. Worse still, hyperscalers like Meta and Amazon are allowing non-technical people to ship code themselves, which is creating a crisis throughout the tech industry.

On top of that, LLMs allow shitty software engineers who would otherwise be isolated by their incompetence to feign enough intelligence to get by, leading to them actively lowering the quality of code being shipped. Per the Times:

The Times also notes that because LLM coding works better on a device rather than a web interface, "...engineers are downloading their entire company's code to their laptops, creating a security risk if the laptop goes missing."

Speaking frankly, it appears that LLMs can write code, and create some software, but without any guarantee that said code will compile, run, be secure, performant, or easy to read and maintain. For an experienced and ethical software engineer, LLMs can likely speed things up somewhat, though not in a way that appears to be documented in any academic sense, other than the study that found it actually made them slower.

And I think it's fair to ask what any of this actually means. What's the advantage of having an LLM write all of your code? Are you shipping faster? Is the code better? Are there many more features being shipped? What is the actual thing you can point at that has materially changed for the better?

Software engineers don't seem happier, nor do they seem to be paid more, nor do they seem to be being replaced by AI, nor do we have any examples of truly vibe coded software companies shipping incredible, beloved products. In fact, I can't think of a new piece of software I've used in the last few years that actually impressed me outside of Flighty. Where's the beef? What am I meant to be looking at? What're you shipping that's so impressive? Why should I give a shit?

Isn't it weird that we're even having this conversation? Shouldn't it be obvious by now? This week, economist Paul Kedrosky told me on the latest episode of my show Better Offline that AI is "...nowhere to be seen yet in any really meaningful productivity data anywhere," and only appears in the non-residential fixed investment side of America's GDP, at (and I quote again) "...levels we last saw with the railroad build out or with rural electrification." That's so fucking weird!
NVIDIA is the largest company on the US stock market and has sold hundreds of billions of dollars of GPUs in the last few years, with many of them sold to the Magnificent Seven, who are building massive data centers and reopening nuclear power plants to power them, and every single one of them is losing money doing so, with revenues so putrid they refuse to talk about them!

And all that to make…what, Gemini? To power ChatGPT and Claude? What does any of this actually do that makes any of those costs actually matter? And as I've discussed above, what, literally, does this software do that makes any of this worth it?

Ask the average AI booster — or even member of the media — and they'll say something about "lots of code being written by AI," or "novel discoveries" (unrelated to LLMs) or "LLMs finding new materials (based on an economics paper with faked data)" or "people doing research," or, of course, "that these are the fastest-growing companies of all time." That "growth" is only possible because all of the companies in question heavily subsidize their products, spending $3 to $15 for every dollar of revenue. Even then, only OpenAI and Anthropic seem to be able to make "billions of dollars of revenue," a statement that I put in quotes because however many billions there might be is up for discussion.

Back in November 2025, I reported that OpenAI had made — based on its revenue share with Microsoft — $4.329 billion between January and September 2025, despite The Information reporting that it had made $4.3 billion in the first half of the year based on disclosures to shareholders.

While a few outlets wrote it up, my reporting has been outright ignored by the rest of the media. No other outlet reached out to me or otherwise acknowledged it, and every outlet has continued to repeat that OpenAI "made $13 billion in 2025," despite that being very unlikely, given that it would have required OpenAI to make roughly $8.7 billion in the fourth quarter alone ($13 billion for the year, minus the $4.329 billion it had made by the end of September). These numbers directly contradict existing reporting, which, if I were a reporter, would give me a great deal of concern about the validity of my reporting and the sources that had provided it — though I understand why my work gets ignored, as I'm an independent, after all.

Similarly, when Anthropic's CFO said in a sworn affidavit that it had only made $5 billion in its entire existence, nobody seemed particularly bothered, despite reports saying it had made $4.5 billion in 2025, and multiple "annualized revenue" reports — including Anthropic's own — that added up to over $6.6 billion.

Though I cannot say for certain, both of these situations suggest that Anthropic and OpenAI are misleading their investors, the media and the general public. If I were a reporter who had written about Anthropic or OpenAI's revenues previously, I would be concerned that I had published something that wasn't true, and even if I was certain that I was correct, I would have to consider the existence of information that ran counter to my own. I would be concerned that Anthropic or OpenAI had lied to me, or that they were lying to someone else, and work diligently to try and find out what happened. I would, at the very least, publish that there was conflicting information. The S-1 will give us the truth, I guess.

Let's talk for a moment about margins, because they're very important to measuring the health of a business. Back in February, in my Hater's Guide To Anthropic, I raised concerns that Dario Amodei was using a different way to calculate margins than other companies do.
Amodei told the FT in December 2024 that he didn't think profitability was based on how much you spent versus how much you made: He then did the same thing in an interview with John Collison in August 2025: Almost exactly six months later, in his February 13, 2026 appearance on the Dwarkesh Podcast, Dario would once again try to discuss profitability in terms other than "making more money than you've spent":

The above quote has been used repeatedly to suggest that Anthropic has 50% gross margins and is "profitable," which is extremely weird in and of itself, as that's not what Dario Amodei said at all. Based on The Information's reporting from earlier in the year, Anthropic's "gross margin" was 38%. Yet things have become even more confusing thanks to reporting from Eric Newcomer, who (in reporting on an investor presentation by Coatue from January) revealed that Anthropic's gross margin was "45% in the quarter ended Sep-25," with the crucial note that — and I quote — "Non-GAAP gross margins [are] calculated by Anthropic management…[are] unaudited, company-provided, and may not be comparable to other companies."

This means that however Anthropic calculates its margins, they are not based on Generally Accepted Accounting Principles, which means that the real margins probably suck ass, because Anthropic loses billions of dollars a year, just like OpenAI.

Yet one seemingly-innocent line in there gives me even more pause: "Model payback improving significantly as revenue scales faster than R&D training costs." This directly matches with Dario Amodei's bizarre idea that "...If you consider each model to be a company, the model that was trained in 2023 was profitable. You paid $100 million, and then it made $200 million of revenue." Yes, I know it's a "stylized fact" or whatever, but that's what he said, and I think that their IPO might have a rude surprise in the form of a non-EBITDA margin calculation that makes even the most-ardent booster see red.

This week, The Wall Street Journal published a piece about OpenAI and Anthropic's finances that included one of the most-offensive lines in tech media history:

Two thoughts:

1. Are you fucking kidding me? If you simply remove billions of dollars in costs, OpenAI is profitable!
2. Why do you think these companies are going to break even anytime soon? You have absolutely no basis for doing so other than leaks from the company!

As I said a few months ago about training costs:

The Journal also adds that both Anthropic and OpenAI are showing investors two versions of their earnings — one with training costs, and one without — without adding the commentary that this is extremely deceptive or, at the very least, extremely unusual. The more I think about it, the more frustrated I get. Having two sets of earnings is extremely dodgy! Especially when the difference between them is billions of dollars. This should be immediately concerning to every financial journalist, the reddest of red flags, the biggest sign that something weird is happening…

…but because this is the AI industry, the Journal runs propaganda instead:

That "fast-growing" part is only possible because both Anthropic and OpenAI subsidize the compute of their subscribers, allowing them to burn $3 to $15 for every dollar of subscription revenue. And no, this is nothing like Uber or Amazon, that's a silly comparison — click that link, read what I said, and then never bring it up again.

I realize my suspicion around Anthropic's growth has become something of a meme at this point, but I'm sorry, something is up here. Let's line it all up:

Anthropic was making $9 billion in annualized revenue at the end of 2025, or approximately $750 million in a 30-day period.
Anthropic said on February 12, 2026 it had hit $14 billion in annualized revenue. This would work out to roughly $1.16 billion in a 30-day period, let's assume from January 11 2026 to February 11 2026.
Anthropic's CFO said it had made "exceeding $5 billion" in lifetime revenue on March 9 2026.
On March 3, 2026 Dario Amodei said it had hit $19 billion in annualized revenue. This would work out to $1.58 billion in a 30-day period. Let's assume this is for the period from February 2 2026 to March 2 2026.
On April 6, 2026, Anthropic said it had hit $30 billion in annualized revenue. This works out to about $2.5 billion in a 30-day period. Let's assume that said period is March 6 2026 to April 6 2026.

And consider what was in each of those windows:

Anthropic's $14 billion in annualized revenue from February 12, 2026 includes both the launch of Claude Opus 4.6, as well as the height of the OpenClaw hype cycle, where people were burning hundreds of dollars of tokens a day. This announcement also included the launch of Anthropic's 1 million token context window in beta for Opus 4.6.
Anthropic's $19 billion in annualized revenue from March 3, 2026 included both the launch of Claude Opus 4.6 and Claude Sonnet 4.6. This period includes around half of the window from the previous $14 billion annualized number, and the launch of the beta of the 1 million token context window for Sonnet 4.6. To be clear, the betas required you to explicitly turn on the 1 million token context window, and had higher pricing around long context.
Anthropic's $30 billion in annualized revenue from April 6 2026 included two weeks' worth of massive token burn from the launches of Sonnet and Opus 4.6. This includes a few days of the previous window (March 3 to April 5). This also included the general availability of the 1-million token context window, enabling it by default, billed at the standard pricing.
Per Newcomer, as of December 2025, this is how Anthropic's revenue breaks down:

Per The Information, Anthropic also sells its models through Microsoft, Google and Amazon, and for whatever reason reports all of the revenue from their sales as its own, then takes out whatever cut it gives them as a sales and marketing expense:

The Information also adds that "...about 50% of Anthropic's gross profits on selling its AI via Amazon has gone to Amazon," and that "...Google typically takes a cut of somewhere between 20% and 30% of net revenue, after subtracting infrastructure costs." The problem here is that we don't know the actual amounts of revenue that come from Amazon or Google (or Microsoft, for that matter, which started selling Anthropic's models late last year), which makes it difficult to parse how much of a cut they're getting. Nevertheless, something is up with Anthropic's revenue story.

Let's humour Anthropic for a second and say that what it's saying is completely true: it went from making $750 million in monthly revenue in January to $2.5 billion in monthly revenue in April 2026. That's remarkable growth, made even more remarkable by the fact that — based on its December breakdown — most of it appears to have come from API sales. That leap from $750 million to $1.16 billion between December and February feels, while ridiculous, not entirely impossible, but the further ratchet up to $2.5 billion is fucking weird! But let's try and work it out.

On February 5 2026, Anthropic launched Opus 4.6, followed by Claude Sonnet 4.6 on February 17 2026. Based on OpenRouter token burn rates, Opus 4.5 was burning around 370 billion tokens a week. Immediately on release, Opus 4.6 started burning way, way more tokens — 524 billion in its first week, then 643 billion, then 634 billion, then 771 billion, then 822 billion, then 976 billion, eventually going over a trillion tokens burned in the final week of March. In the weeks approaching its successor's launch, Sonnet 4.5 burned between 500 billion and 770 billion tokens a week. A week after launch, Sonnet 4.6 burned 636 billion tokens, then 680 billion, then 890 billion, and, by about a month in, it had burned over a trillion tokens in a single week.

Reports across Reddit suggest that these new models burn far more tokens than their predecessors, with questionable levels of improvement. The sudden burst in token burn across OpenRouter doesn't suggest a bunch of people suddenly decided to connect to Anthropic and other services' models, but that the models themselves had started to burn nearly twice the number of tokens to do the same tasks.

At this point, I estimate Anthropic's revenue split to be more in the region of 75% API and 25% subscriptions, based on its supposed $2.5 billion in annualized revenue (out of $14 billion, so a little under 18%) in February coming from "Claude Code" (read: subscribers to Claude — there's no separate "Claude Code" subscription). If that's the case, I truly have no idea how it could've possibly accelerated so aggressively, and as I've mentioned before, there is no way to reconcile having made $5 billion in lifetime revenue as of March 9, 2026, having $14 billion in annualized revenue on February 12 2026, and having $4.5 billion in revenue for the year 2025.
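To spell out why those three claims can't coexist, here's the arithmetic, using Anthropic's own numbers and my own crude monthly conversion:

```python
# Anthropic's claims, lined up (all figures from the reporting above):
rev_2025 = 4.5e9           # reported revenue for calendar 2025
run_rate_feb = 14e9 / 12   # $14B annualized on Feb 12 -> ~$1.17B/month

# Credit January 1 through March 9 at the February run rate (~2.25 months),
# and generously ignore every dollar earned before 2025:
ytd_2026 = 2.25 * run_rate_feb

implied_lifetime = rev_2025 + ytd_2026
print(f"implied lifetime revenue by March 9: ${implied_lifetime / 1e9:.1f}B")
# -> roughly $7.1B, versus the CFO's sworn figure of about $5B
```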
Things get more confusing when you hear how Anthropic calculates its annualized revenues, per The Information:

So Anthropic is annualizing based on the last four weeks of API revenue times 13, a number that's extremely easy to manipulate using, say, launches of new products. In simpler terms, Anthropic is cherry-picking four-week windows of API spend — ones that are pumped by big announcements and new model releases — and annualizing them.

The one million token context window is a big deal too, having been raised from 200,000 tokens in previous models. With Opus and Sonnet 4.6, Anthropic lets users use up to one million tokens of context, which means that both models can now carry a very, very large conversation history, one that includes every single output, file, or, well, anything that was generated as a result of using the model via the API. This leads to context bloat that absolutely rinses your token budget.

To explain: the context window is the information that the model can consider at once. With 4.6, Anthropic by default allows you to load in up to one million tokens' worth of information, and every single prompt or action you take re-sends that accumulated context unless you actively "trim" the window through context editing.

Let's say you're trying to work out a billing bug in a codebase via whatever interface you're using to code with LLMs. You load in a 350,000 token codebase, a system prompt (i.e. "you are a talented software engineer," here's an example), a few support tickets, and a bunch of word-heavy logs to try and fix it. On your first turn (question), you ask it to find the bug, and you send all of that information through. It spits out an answer, and then you ask it how to fix the bug…but "asking it to fix the bug" also re-sends everything, including the codebase, tickets and logs. As a result, you're burning hundreds of thousands of tokens with every single prompt (I've sketched out the arithmetic at the end of this section).

Although this is a simplified example, it's the case across basically any coding product, such as Claude Code or Cursor. While Cursor uses codebase indexing to selectively fetch pieces of the codebase without constantly loading it into the context window, one developer using Claude inside of Cursor watched a single tool call burn 800,000 tokens by pulling an entire database into the context window, and I imagine others have run into similar problems. To be clear, Anthropic charges at a per-million-token rate of $5 per million input and $25 per million output, which means that those casually YOLOing entire codebases into context are burning shit tons of cash (or, in the case of subscribers, hitting their rate limits faster).

If Anthropic actually made $2.5 billion in a month — we'll find out when it files its S-1! — it likely came not from genuine growth or a surge of adoption, but from its existing products suddenly costing a shit ton more because of how they're engineered. The other possibility is the nebulous form of "enterprise deals" that Anthropic allegedly has, and the theory that they somehow clustered in this three-month-long period, but that just feels too convenient.

If 70% of Anthropic's revenue is truly from API calls, this would suggest:

Massive new customers that are making payments up front, which makes this far from "recurring" revenue.
Massive new customers spending tons of money immediately, burning hundreds of millions of dollars a month in tokens, and paying Anthropic handsomely for them.

I don't see much evidence of Anthropic creating custom integrations that actually matter, or — and fuck have I looked! — any real examples of businesses "doing stuff with Claude" other than making announcements about vague partnerships.
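As promised, here's the arithmetic on that billing-bug example, using the per-token prices quoted above. The token counts are illustrative, and this ignores prompt caching, which discounts repeated input:

```python
# Cost of re-sending a big context every turn, at the quoted rates of
# $5 per million input tokens and $25 per million output tokens.
INPUT_RATE = 5 / 1_000_000
OUTPUT_RATE = 25 / 1_000_000

context_tokens = 400_000  # codebase + system prompt + tickets + logs (illustrative)
output_tokens = 2_000     # a typical answer per turn (illustrative)

total = 0.0
for turn in range(10):
    # Without context editing, each turn re-sends the whole history,
    # which also grows by every previous answer.
    history = context_tokens + turn * output_tokens
    total += history * INPUT_RATE + output_tokens * OUTPUT_RATE

print(f"ten turns of 'fix the bug' cost roughly ${total:.2f}")
# -> about $21 for one short debugging session, before any caching discounts
```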
There's also one other option: that Silicon Valley is effectively subsidizing Anthropic through an industry-wide token-burning psychosis. And based on some recent news, there's a chance that's the case.

As I discussed a few weeks ago, Silicon Valley has a "tokenmaxxing" problem, where engineers are encouraged to burn as many tokens as possible, at times by their peers, and at others by their companies. The most egregious — and honestly, worrying! — version of this came from The Information's recent story about Meta employees competing on an internal leaderboard to see who can burn the most tokens, deliberately increasing the size of their prompts and the number of concurrent sessions (along with unfettered and dangerous OpenClaw usage) to do so:

The Information reports that the dashboard, called "Claudeonomics" (despite said dashboard covering other models from OpenAI, Google, and xAI), has sparked competition within Meta, with users burning a remarkable 60 trillion tokens in the space of a month, and one individual averaging around 281 billion tokens, which The Information remarks could cost millions of dollars. Meta's company-mandated psychosis also gives achievements for particular things like using multiple models or high utilization of the cache. Here's one very worrying anecdote:

One poster on Twitter says that there are people at Meta running loops burning tokens to rise up the leaderboards, and that Meta's managers also measure lines of code as a success metric.

The Information says that, considering Anthropic's current pricing for its models, that 60 trillion tokens could be as much as $900 million in the space of a month, though adds that this assumes that every token being burned was on Claude Opus 4.6 (at $15 per 1 million tokens). I personally think this maths is a bit fucked, because it assumes that A) everybody is only using Claude Opus, B) none of that token burn runs through the cache (which it obviously does, and the cache charges 50%, as pointed out by OpenCode co-founder Dax Radd), and C) Meta is entirely using the API (versus paying for a $200-a-month Claude Max subscription for each user).

Digging in further, it appears that a few years ago Meta created an internal coding tool called CodeCompose, though a source at Meta tells me that developers use VSCode and an assistant called Devmate connected to models from Anthropic, OpenAI and xAI. One engineer on Reddit — albeit an anonymous one! — had some commentary on the subject:

If we assume that Meta is an enterprise customer paying API rates for its tokens, it's reasonable to assume — at even a low $5-per-million average — that it's spending $300 million or more a month on API calls. As Radd also added, there's likely a discount involved. He suggested 20%, which I agree with. Even if it's $300 million, that's still fucking insane. That's still over three billion dollars a year. (The back-of-envelope version of all this is sketched out below.)

If this is what's actually happening, and this is what's contributing to Anthropic's growth, it is not a sustainable business model, which is par for the course for Anthropic, a company that has only ever lost billions of dollars. Encouraging workers to burn as many tokens as possible is incredibly irresponsible and antithetical to good business or software engineering. Writing great software is, in many cases, an exercise in efficiency and nuance, building something that runs well, is accessible and readable by future engineers working on it, and ideally uses as few resources as it can.
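Here's that back-of-envelope, with the adjustments Radd and I are pointing at. Every rate and share in here is an assumption, not a reported figure:

```python
tokens = 60e12  # The Information's reported monthly token burn at Meta

# Headline estimate: every token billed as Opus at $15 per million.
headline = tokens / 1e6 * 15        # -> $900M

# My lower-bound estimate: a $5-per-million blended average, then the
# 20% enterprise discount Radd suggested. Both numbers are guesses, and
# cache hits (billed at a discount) would push this lower still.
blended = tokens / 1e6 * 5          # -> $300M
discounted = blended * (1 - 0.20)   # -> $240M

print(f"headline ${headline / 1e6:,.0f}M vs adjusted ${discounted / 1e6:,.0f}M a month")
```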
TokenMaxxing runs contrary to basically all good business and software practices, encouraging waste for the sake of waste, and resulting in little measurable productivity benefit or, in the case of Meta, anything user-facing that actually seems to have improved.

Venture capitalist Nick Davidov mentioned yesterday that sources at Google Cloud "started seeing billions of tokens per minute from Meta, which might now be as big as a quarter of all the token spend in Anthropic." While I can't verify this information (and Davidov famously deleted his photos using Claude Cowork while attempting to reorganize his wife's desktop), if that's the case, Meta is a load-bearing pillar of Anthropic's revenue — and, just as importantly, a large chunk of Anthropic's revenue flows through Google Cloud, which means A) that Anthropic's revenue truly hinges on Google selling its models, and B) that said revenue is heavily-inflated by the fact that Anthropic books revenue without cutting out Google's 20%+ revenue share.

In any case, TokenMaxxing is not real demand, but an economic form of AI psychosis. There is no rational reason to tell somebody to deliberately burn more resources without a defined output or outcome other than increasing how much of the resource is being used. I have confirmed with a source at Meta that there is no actual metric or tracking of any return on investment involved in token burn there, meaning that TokenMaxxing's only purpose is to burn more tokens to go higher on a leaderboard, and it is already creating bad habits across a company that already has decaying products and leadership.

To make matters worse, TokenMaxxing also teaches people to use Large Language Models poorly. While I think LLMs are massively-overrated and have their outcomes and potential massively overstated, anyone I know who actually uses them for coding generally has habits built around making sure token burn isn't too ridiculous, ways to do things faster without LLMs, and ways to be intentional about which models they use for particular tasks. TokenMaxxing literally encourages you to do the opposite — to use whatever you want in whatever way you want to spend as much money as possible, because the only thing that matters is burning more tokens.

Furthermore, TokenMaxxing is exactly the kind of revenue that disappears first. Zuckerberg has reorganized his AI team four or five times already, and massively shifted Meta's focus multiple times in the last five years, proving that at the very least he'll move on a whim depending on external forces. After laying off tens of thousands of people in the last few years, Meta has shown it's fully capable of dumping entire business lines or groups at a moment's notice, and while moving on from AI might be embarrassing, that would suggest that Mark Zuckerberg experiences shame or any kind of emotion other than anger. This is the kind of revenue that a business needs to treat with extreme caution, and if Meta is truly spending $300 million or more a month on tokens, Anthropic's annualized revenues are aggressively and irresponsibly inflated to the point that they can't be taken seriously, especially if said revenue travels through Google Cloud, which takes another 20% off the top at the very least.
Though the term is pretty new, the practice of encouraging your engineers to use AI as much as humanly possible is an industry-wide phenomenon, especially across hyperscalers like Amazon, Microsoft and Google, all of whom have until recently directly pushed their workers to use models with few restraints. Shopify and other large companies are encouraging their workers to reflexively rely on AI, with performance reviews that include stats around your token burn and other nebulous "AI metrics" that don't seem to connect to actual productivity. I'm also hearing — though I've yet to be able to confirm it — that Anthropic and other model providers are forcing enterprise clients to start using the API directly rather than paying for monthly subscriptions.

Combined with mandates to "use as much AI as possible," this naturally increases the cost of having software engineers, which — and I say this not wanting anyone to lose their jobs — does the literal opposite of replacing workers with AI. Instead, organizations are arbitrarily raising the cost of doing business without any real reason. Because we're still in the AI hype cycle, this kind of wasteful spending is both tolerated and encouraged, and the second that financial conditions worsen or stock prices drop due to increasing operating expenses, these same companies will cut back on API spend, which will overwhelmingly crush Anthropic's glowing revenues.

I think it's also worth asking at this point what it is we're actually fucking doing. We're building — theoretically — hundreds of gigawatts of data centers, feeding hundreds of billions of dollars to NVIDIA to buy GPUs, all to build capacity for demand that doesn't appear to exist, with only around $65 billion of revenue (not profit) for the entire generative AI industry in 2025, with much of that flowing from two companies (Anthropic and OpenAI) making money by offering their models to unprofitable AI startups that cannot survive without endless venture capital, which is also the case for both AI labs. Data center sales make up 90% of NVIDIA's revenue, which means that 8% or so of the S&P 500's value comes from a company that makes money selling hardware to people that immediately lose money on installing it. That's very weird! Even if you're an AI booster, surely you want to know the truth, right?

The most-prominent companies in the AI industry — Anthropic and OpenAI — burn billions of dollars a year, have margins that get worse over time, and absolutely no path to profitability, yet the majority of the media act as if this is a problem that they will fix, even going as far as to make up rationalizations as to how they'll fix it, focusing on big revenue numbers that wilt under scrutiny. That's extremely weird, and only made weirder by members of the media who seem to think it's their job to defend AI companies' bizarre and brittle businesses. It's weird that the media's default approach to AI has, for the most part, been to accept everything that the companies say, no matter how nonsensical it might be.

I mean, come on! It's fucking weird that OpenAI plans to burn $121 billion in the next two years on compute for training its models, and that the media's response is to say that somehow it will break even in 2030, even though there's no actual explanation anywhere as to how that might happen other than vague statements about "efficiency." That's weird! It's really, really weird!
It's also weird that we're still having a debate about "the power of AI" and "what agents might do in the future" based on fantastical thoughts about "agents on the internet" that do not exist, cannot exist, and will never exist, and it's fucking weird that executives and members of the media keep acting as if that's the case. It's also weird that people discussing agents don't seem to want to discuss that OpenAI's Operator agent does not work, that AI browsers are fundamentally broken, or that agentic AI does not do anything that people discuss.

In fact, that's one of the weirdest parts of the whole AI bubble: the possibility of something existing is enough for the media to cover it as if it exists, and a product saying that it will do something is enough for the media to believe it does it. It's weird that somebody saying they will spend money is enough to make the media believe that something is actually happening, even if the company in question — say, Anthropic — literally can't afford to pay for it.

It's also weird how many outright lies are taking place, and how little the media seems to want to talk about them. Stargate was a lie! The whole time it was a lie! That time that Sam Altman and Masayoshi Son and Larry Ellison stood up at the White House and talked about a $500 billion infrastructure project was a lie! They never formed the entity! That's so weird!

Hey, while I have you, isn't it weird that OpenAI spent hundreds of millions of dollars to buy tech podcast TBPN "to help with comms and marketing"? It's even weirder considering that TBPN was already a booster for OpenAI!

It's also weird that a lot of AI data center projects don't seem to actually exist, such as Nscale's project to make "one of the most powerful AI computing centres ever," which is literally a pile of scaffolding, and that despite that announcement the company was able to raise $2 billion in funding.

It's also weird that we're all having to pretend that any of this matters. The revenues are terrible, Large Language Models are yet to provide any meaningful productivity improvements, and the only reason that they've been able to get as far as they have is a compliant media and a venture capital environment borne of a lack of anything else to invest in. Coding LLMs are popular only because of their massive subsidies and corporate encouragement, and in the end will be seen as a useful-yet-incremental and way-too-expensive way to make the easy things easier and the harder things harder, all while filling codebases with masses of unintentional, bloated code. If everybody was forced to pay the actual costs of LLM coding, I do not believe for a second that we'd have anywhere near the amount of mewling, submissive and desperate press around these models.

The AI bubble has every big, flashing warning sign you could ask for. Every company loses money. Seemingly every AI data center is behind schedule, and the vast majority of them aren't even under construction. OpenAI's CFO does not believe that it's ready to go public in 2026, and Sam Altman's reaction has been to have her report to somebody other than him, the CEO. Both OpenAI and Anthropic's margins are worse than they projected. Every AI startup has to raise hundreds of millions of dollars, and their products are so weak that they can only make millions of dollars of revenue after subsidizing the underlying cost of goods to the point of mass unprofitability.
And it’s really weird that the mainstream media has a diametric view — that all of this is totally permissible under the auspices of hypergrowth, that these companies will simply grow larger, that they will somehow become profitable in a way that nobody can actually describe, that demand for AI data centers will exist despite there being no signs of that happening. I get it. Living in my world is weird in and of itself. If you think like I do, you have to see every announcement by Anthropic or OpenAI as suspicious — which should be the default position of every journalist, but I digress — and any promise of spending billions of dollars as impossible without infinite resources. At the end of this era, I think we’re all going to have to have a conversation about the innate credulity of the business and tech media, and how often that was co-opted to help the rich get richer. Until then, can we at least admit how weird this all is? Telecommunications: AI agents will help carriers modernize network operations, simplify customer lifecycle management, and improve service delivery—bringing intelligent automation to one of the most operationally complex and regulated industries in the world. Meaningless. Automation of what?  Financial services: AI agents will help firms detect and assess risk faster, automate compliance reporting, and deliver more personalized customer interactions, such as tailoring financial advice based on a client's full account history and market conditions. Chatbot! “More-personalized interactions” are a chatbot with a connection to a knowledge system, as is any kind of “tailored financial advice.” Compliance reporting? Summarizing or pulling documents from places, much like any LLM can do, other than the fact that it’ll likely get shit wrong, which is bad for compliance. Manufacturing and engineering: Claude will help accelerate product design and simulation, reducing R&D timelines and enabling engineers to test more iterations before production. I assume this refers to people using Claude Code to do coding, which is what it does. Software development: Teams will use Claude Code to write, test, and debug code, helping developers move faster from design to production. Claude Code. Enterprise operations: Claude Cowork will help teams automate routine work like document summarization, status reporting, and review cycles. Literally a chatbot that deleted every single one of a guy’s photos when he asked it to organize his wife’s desktop . “Gather information” — search tool, part of chatbots for years. “Write reports” — generative AI’s most basic feature, with no details on quality. “Edit files” — to do what exactly? Chatbot feature. “Send and receive messages through email and text” — generating and reading text, connected to an email account.  “Delegate work” — what work? No need to get specific!  Are you fucking kidding me? If you simply remove billions of dollars in costs, OpenAI is profitable! Why do you think these companies are going to break even anytime soon? You have absolutely no basis for doing so other than leaks from the company!  Anthropic said on February 12, 2026 it had hit $14 billion in annualized revenue . This would work out to roughly $1.16 billion in a 30-day period, let’s assume from January 11 2026 to February 11 2026. Anthropic’s CFO said it had made “exceeding $5 billion” in lifetime revenue on March 9 2026. On March 3, 2026 Dario Amodei said it had hit $19 billion in annualized revenue.  This would work out to $1.58 billion in a 30-day period. 
Let's assume this is for the period from February 2, 2026 to March 2, 2026. On April 6, 2026, Anthropic said it had hit $30 billion in annualized revenue. This works out to about $2.5 billion in a 30-day period. Let's assume that said period is March 6, 2026 to April 6, 2026.

- Anthropic's $14 billion in annualized revenue from February 16, 2026 includes both the launch of Claude Opus 4.6 and the height of the OpenClaw hype cycle, where people were burning hundreds of dollars of tokens a day. This announcement also included the launch of Anthropic's 1 million token context window in beta for Opus 4.6.
- Anthropic's $19 billion in annualized revenue from March 3, 2026 included both the launch of Claude Opus 4.6 and Claude Sonnet 4.6. This period includes around half of the January 16 to February 16, 2026 window from the previous $14 billion annualized number, and the launch of the beta of the 1 million token context window for Sonnet 4.6. To be clear, the betas required you to explicitly turn on the 1 million token context window, and had higher pricing around long context.
- Anthropic's $30 billion in annualized revenue from April 6, 2026 included two weeks' worth of massive token burn from the launches of Sonnet and Opus 4.6. This includes a few days of the previous window (March 3 to April 5). This also included the general availability of the 1 million token context window, enabled by default and billed at the standard pricing.

In other words: massive new customers are making payments up front, which makes this far from "recurring" revenue. Massive new customers are spending tons of money immediately, burning hundreds of millions of dollars a month in tokens, and paying Anthropic handsomely for them.
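The run-rate arithmetic above is easy to sanity-check. A quick sketch in Python, assuming "annualized" simply means the last ~30 days of revenue multiplied by twelve, which is how the figures above line up:

```python
# Convert the announced annualized run rates to approximate 30-day revenue.
for date, annualized in [
    ("2026-02-12", 14e9),
    ("2026-03-03", 19e9),
    ("2026-04-06", 30e9),
]:
    per_30_days = annualized / 12  # "annualized" = last ~30 days x 12
    print(f"{date}: ${annualized / 1e9:.0f}B annualized "
          f"-> ~${per_30_days / 1e9:.2f}B per 30 days")
```

This prints roughly $1.17B, $1.58B, and $2.50B per 30 days, matching the figures quoted above.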

Martin Fowler Yesterday

Feedback Flywheel

Rahul Garg finishes his series on reducing the friction in AI-Assisted Development. He proposes a structured feedback practice that harvests learnings from AI sessions and feeds them back into the team's shared artifacts, turning individual experience into collective improvement.

neilzone Yesterday

Thoughts on increasing ssh security using a hardware security key

I have been using hardware security keys (including YubiKeys and Titan keys) for FIDO2 and TOTP for a while, but not for ssh. At the moment, I harden the ssh config on my servers, lock down access by IP address, and use password-protected certificates for authentication, blocking password-based authentication. So I think that I do at least reasonably well as it is. But I was interested to see if I could introduce a further aspect of security for ssh, using a security key.

My security keys support the generation of both resident and non-resident keys. Resident keys are stored in a slot on the YubiKey, while non-resident keys are stored on the client computer but require the YubiKey. I picked non-resident. I set a passphrase as part of the ssh-keygen process, so, when it comes to using that key, I need to enter that passphrase and insert and touch the security key. So now someone would need:

- to be connected to the correct network
- to have a copy of my private key
- to know the passphrase for that private key
- to have one of my security keys (my main security key, or my backup security key)

I can, I think, add a PIN to the YubiKey but, to date, I have not done this. Perhaps I should. Honestly, I was probably fine without this, but, well, I had the security keys, so why not.

But, while this works fine from my laptop, I can't get it to work on my phone (GrapheneOS). At the moment, I use Termux, and from there, I can ssh in to my servers. But I can't get Termux to use my *-sk keypair. There is a six-year-old issue in the Termux GitHub repo which indicates that it might, at some point, be coming, and that would be welcome. Apparently it can be done using a closed-source tool, but since I'm only looking to use FOSS, that's not on the cards for me. So that is a bit of a pain, as it is convenient to be able to log in from my phone from time to time.
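For anyone wanting to replicate the setup, the key-generation step looks roughly like this (a sketch assuming OpenSSH 8.2+ built with FIDO2 support, not the author's exact invocation):

```sh
# Generate a non-resident FIDO2-backed keypair: the private key file lives
# on this machine, but it is useless without the physical security key.
ssh-keygen -t ed25519-sk -f ~/.ssh/id_ed25519_sk
# ssh-keygen prompts you to touch the key, then asks for a passphrase.
# Pass -O resident instead to store the key on the device itself.
```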

Stratechery Yesterday

Anthropic’s New Model, The Mythos Wolf, Glasswing and Alignment

Anthropic says its new model is too dangerous to release; there are reasons to be skeptical, but to the extent Anthropic is right, that raises even deeper concerns.


What a Japanese cooking principle taught me about overcoming AI fatigue

Hey, what's up? It's Takuya. I've been thinking about how to live well in this AI era as a developer, content creator, and artist. By "living well," I mean enjoying the act of creating while maintaining good mental health. I imagine many of my readers are also wondering how to survive — and even thrive — amid the rapid changes brought by recent AI advancements. I don't have all the answers. No one can predict the future precisely. But I believe it's more a matter of direction than strategy, rather than trying to build some kind of moat around your life as a software-oriented artist. Where do you want to go? What do you want to see? That's what matters. Strategies and moats can be flexibly changed along with the situation.

In this article, I'd like to explore a few life principles from Japanese culture. I recently read a book called "一汁一菜でよいという提案" (The Proposal for One Soup, One Dish) by traditional food expert Yoshiharu Doi (土井善晴), and found it very interesting for keeping the pace of your life healthy.

We are currently in the midst of "AI fatigue." New services emerge every day, and big company releases change workflows weekly. Chasing every hype doesn't make us more secure; it just fogs up the crucial skills we need to foster. It's easy to fall into the trap of trying too hard to keep up, only to end up exhausted. Traditional Japanese culture offers a different perspective — one that helps us stay grounded and resilient in the face of uncertainty. Let's dive into it. Video version:

Doi-san's book emphasizes that by stripping away the unnecessary, we find our "comfortable place." As he puts it: It's easy to get your wishlist or to-do list flooded if you don't have any clear rules, policy, or direction in your life. Let's think about it. For example, on social media, the algorithms try to grab and hold your attention as long as possible by displaying gossip and drama tailored to your interests. But you have full control over whether you willingly look at them. I'm always careful not to spend time tracking gossip or drama on the internet. It helps me keep calm and avoid comparing myself with others, which would only make me feel miserable. To form a rhythm of life, you have to decide what NOT to do in your daily life.

In terms of cooking, Doi-san proposes a system called "One-soup One-dish (一汁一菜)", which has helped a lot of people who cook for their family every day. This is a style of meal centered around rice, with one soup and one side dish. Eating is an essential part of living. Yet, keeping it simple is surprisingly hard. There are endless food products, recipes, and health tips — and TV shows constantly push us to make beautiful, varied meals for our families every day. Oh, it looks very similar to today's tech industry, doesn't it? People are unconsciously exhausted by all of this, feeling as if they have to create something great every single day. Doi-san saw through this — and freed them by saying:

Software developers can't live without software. It is literally an essential part of our lives, and keeping it simple is surprisingly hard. Let's learn from the philosophy behind his principle. Why does cooking matter so much? Because it is something you do every day, which makes you or someone you love feel really comfortable, as he puts it: I was deeply moved by this. I cook for my family every evening. It helps me shift from work to home — a transition I don't get naturally, since I don't have a commute.
My 4-year-old daughter sometimes says, "I can smell something good." It makes me happy, too. What's important is having something you do every day that makes you feel safe, comfortable, and happy. It could be anything, such as playing an instrument, going for a walk in the morning, painting, singing, swimming, reading before bed, or meditating. It should be something you never get bored with. Something you've truly enjoyed in the past. Something that doesn't make you compare yourself to others, but instead helps you be mindful. If you don't have it yet, step away from your computer and go outside to experience new things. I feel like it's important that this habit doesn't involve a screen, as he suggests: What matters is finding something you can return to every day — something you never tire of, like rice and miso soup, rather than something instantly stimulating but quickly exhausting, like Netflix or doom-scrolling on social media.

The more I adapt to algorithms and AI, the more I value organic connections — both with people and with ideas. In his book, Doi-san explains that the things we never tire of are often the things humans didn't "engineer" to be perfect: This concept of "not a human feat" (or rather, not a calculated feat) is exactly what's missing from our digital lives. Algorithms are the "processed seasoning" of the internet — designed to give you an instant hit of dopamine, but leaving you feeling empty and "tired of the flavor" an hour later.

I've realized that my most resilient moments don't come from a perfectly optimized prompt or a viral post. They come from the "fermentation" of daily life — the slow, messy, unscripted interactions that haven't been optimized for engagement. For example, when I have a quick, casual chat with the barista at Starbucks, or when I'm swapping stories with other parents (mama-tomo) while dropping my daughter off at kindergarten, I feel like I'm participating in a natural rhythm, not an algorithm. Small moments like these give me a real sense that I belong to society and am truly living in it.

My best ideas work the same way. They rarely strike when I'm glaring at a screen, trying to force a breakthrough. Instead, they "descend" upon me when I'm out for a walk or simply staring blankly at the scenery. It feels less like distillation — which focuses on seeking speed, purity, and efficiency — and more like fermentation. It's about creating the right environment and then letting the subconscious work its magic over time. You can't rush miso, and you can't rush a truly original thought.

Each season brings a variety of foods to enjoy, and Doi-san emphasizes the importance of appreciating them: In Japan, we celebrate the arrival of the first bonito or the last of the winter cabbage. It's enjoyable to appreciate these changes, but notice one thing: people don't "chase" them. You don't feel like a failure if you missed the peak of cherry blossom season; you simply look forward to the next cycle. Yet, in the tech industry, we treat trends like a race we are constantly losing. Instead of trying so hard to "stay in the loop," why not view new technologies as seasonal arrivals? You don't have to master every single one. You are living in "Technical Nature" just as much as you are living in real nature. If a new AI model drops, it's like the first bamboo shoots of spring — interesting, worth trying, but not something to stress over. You can learn anything when it becomes necessary for your craft.
By shifting from "chasing" to "appreciating," you replace FOMO with curiosity. When we stop being obsessed with "catching up" and start allowing ourselves the "leisure time" mentioned earlier, something vital happens: playfulness is born. True creativity doesn't come from a place of survival or anxiety. It comes from playing with the tools available to us, much like a chef plays with the ingredients of the season. For a developer, this might be the optimal way of life. Don't just be a user of technology; be someone who resonates with its constant birth and decay, using that rhythm to create something that feels truly alive.

I experienced burnout last year. The philosophy of one soup, one dish has helped me step back from that. It reminds me to strip away the noise and return to what really matters. Relax! You're not as bad as you think. Trust your instincts. Listen to your body. And let's enjoy this rare moment of change we're living through :)

Brain Baking Yesterday

A Commentary On GenAI Inspected Through Different Lenses

The number of concerning reports related to generative AI is rising at an alarming rate, yet all we do is make ourselves more dependent on the brand new technology. Why? It's not just that we're lazy — we are! — there are many more variables involved. As part of my quest to try and understand what the heck is going on and what is becoming of one of my prime professional fields — software engineering — I read and read and read. And then I read and read and read. And then I became disappointed and depressed. I see colleagues jumping the gun, others being more prudent. I see industry discovering there's yet another buck to be made. I see students forgoing learning at all. I wanted to try to form my own judgement of genAI in its modern form by looking at it from four different viewpoints: that of the software engineer, that of the teacher, that of the creativity researcher, and that of the concerned civilian living in this capitalist world. References can be found at the end of this article.

Does anyone remember Dan North's Programming is not a craft post from 2011? I do, and I often think about it. With the advent of genAI, North's post might be even more polarising: Well, congrats to you, you've won the lottery: here's a tool that can immediately add customer value. If you don't care about the inner code quality, you can have genAI generate (slop) code faster than you can think. If you love the impact of software itself, you'll love Claude Code et al. Are you perhaps an enterprise software engineer? In that case you'll be able to scaffold and generate CRUD crap even faster, hooray!

But wait a minute. You obviously won't take true ownership of this code: you'll want to impress your clients with the results, but keep the lid closed at all times. The less ownership and feeling of responsibility, the easier it becomes to completely let go of all the brakes and just accept any future changes without code reviewing at all. People who now claim they will keep themselves in the loop as an architectural reviewer are kidding themselves. After the nth time pressing the green button, and as the technology further evolves, you'll wind up eventually accepting the slop anyway. Verification burnout will pop up next: because it's not your own code you're attempting to so carefully review, it actually takes more instead of less effort, increasing your stress level instead of reducing it!

Does the code quality really matter if all clients see is the end product? As a gamer, I just want the game to run smoothly, I don't care about the spaghetti. Or do I? I do, implicitly — the more spaghetti, the less smoothly it'll run. The more holes, the more soft locks and crashes. So programming might or might not be a craft, but as Cal Newport and Robert M. Pirsig say: the concept of Quality is important! Maybe it's time to become a goose farmer instead.

The only thing left for you to do is to move to a depressing quality control position instead of crafting something yourself. No more "I built this", but "I managed its orchestration". Depending on how you view this, it's either a promotion or a demotion. I tend to agree with the latter. Why? Because we humans are the Homo Faber, the ones who like to control their fate and environment with the use of tools. Yes, genAI certainly is a tool, but it's a tool that takes away all other tools. Instead of kneading dough by hand, feeling it, knowing when to ferment and when to bake, we're forced to oversee the industrial Wonder Bread production process.
Instead of manipulating leather to create a pair of shoes, we're being employed by Nike to watch shoes being made by machines. This somehow reminds me of David Graeber's bullshit jobs, where useless paper pushing is prevalent but also called a "revolution" when it comes to a professional purpose. I beg to differ. Humans want to make things. They want to be proud of the things they made. The fact that the open source community rejects this slop code is a telling sign: if you're programming in the open, your peers who also think highly of software development will keep you in check. But when it's "for (enterprise) work", we don't care, generate away, I'm not the true owner anyway. If programming is a craft, then the recently leaked Claude Code CLI source code will be a big joke to you, where constructs are endlessly repeated, and spaghetti is topped up with more spaghetti. Code that is being generated doesn't even seem to be made to be (re)read: how, then, are we expecting to maintain it, or guarantee its security? By letting the agent maintain it and guarantee its security, I can hear you say?

What is there left to say? I've already asserted that genAI tools are worse than Stack Overflow. Sure, mindless copy-pasting has long existed before this AI storm, but not on this scale. GenAI is able to provide a working solution to an assignment faster than I can come up with the assignment itself. Suddenly, all our traditional evaluation systems and grading workflows became useless: scoring high on a checklist is just a matter of pasting the requirements into Claude. We try to adapt by requiring oral defences, having students explain what they did and why, and asking them to walk us through a small imagined change. The result is a spectacular fall in grades from previous years: they are just not able (1) to explain the code they did not make but generated and (2) to make small adjustments, as they skipped the hard part: the learning and understanding. Yet in the hallways, I hear lots of students bragging to each other about how they let ChatGPT do their homework. Congrats. We'll see each other again in September for your second try.

We often forget something else very important: peer pressure. About a year ago, on the train I overheard a few girls on their way to a university lecture chatting about their homework. One of them complained: "I put in all that hard work, but all the others are just using ChatGPT to do it. Next time, I'm not doing all that, I'm also just using AI, that's not fair!". I should have gotten up to congratulate her: the only one actively learning is the one putting in the hard work! There is no shortcut to becoming proficient. There is only hard work. Sure, the more you prompt your way through your curriculum, the more proficient you'll become with the tool, but ask yourself: did you learn what you wanted to learn, or did you learn to prompt?

When I was an undergraduate, I used to fill A4 pages with summaries of courses to help me study. Just before the exams, I could quickly glance over these pages to remember the core concepts. Some students sold their summaries to others. Now, genAI can generate summaries for you. But smart students know that this is only fooling themselves: the purpose of the summaries is to make them: to study and gradually fill the pages. Not to acquire a summary. The journey is the destination. When my summaries were done, I could just as well throw them away: they were just a tool to help with the hard work.
Yet it's next to impossible to explain this to a student who only sees how easy it is to jump to an outcome by leveraging AI. Maybe legislation will help here? (Not really; see below.) In case all this is not clear: students are becoming dumber, yet the programming projects they hand in are becoming better than ever.

As the inventor of the framework presented in The Creative Programmer, I thought it would be interesting to take a look at the seven domains and how genAI fits in these. In The Creative Programmer, I present seven distinct but heavily intertwined themes that define the way we are creative when we solve a programming problem:

- Technical Knowledge — if we don't have any knowledge, we won't have the creative ability to combine it. Guess what; genAI is actively deskilling us. The more you generate, the less you actively learn, harming your creative ability to solve problems. Creativity requires a rich mental toolbox to draw from. By prompting, you're not exactly filling that toolbox.
- Communication — I see both a good and a bad thing here: if your colleagues aren't immediately available, rubber ducking with an AI agent might help identify the problem. On the other hand, it's also awfully easy to stay locked inside that comfortable genAI chatbox. Why ask anyone when it tells you exactly what you want to hear?
- Constraints — if you manage to constrain yourself (ha!) to only asking AI for 10 possible ways to approach a problem you don't know how to approach, without having it solve the problem for you, this might help you learn how to work in certain heavily constrained environments. Unfortunately, it's very easy to just have it generate the solution as well, rendering a possible learning path useless.
- Critical Thinking — the more we use genAI, the less critical we are and the more likely we are to accept whatever comes out of it. Validating the source material outside of that chatbox suddenly requires a lot of willpower. I've even heard of people changing their entire preferred technology stack to something more popular because genAI is better at it. That's very sad.
- Curiosity — judge for yourself. What does reliance on genAI tell you about your curiosity to discover other things?
- Creative State of Mind — without Cal Newport's "Deep Work", there won't be an "aha!" moment. The 90% perspiration, 10% inspiration is suddenly turned on its head: Claude is the one sweating for us, even at night, while all we do is press the green button and write "LGTM!". Maybe we should take the time to read Newport's new book Slow Productivity.
- Creative Techniques — genAI itself as a technique might belong in this section, but the question is: are we the ones wielding the tool, or is the tool wielding us?

I might be overly focusing on the negative here and have to recognise the possible advantages of having genAI as a tool available in our creative toolbox — but only when we learn to wield it properly and with moderation, which is not exactly what we are doing lately, is it. In an interesting systematic literature review (2025), with lots of references to other academic material if that's what you're looking for, Holzner et al. conclude with:

[…] human-GenAI collaboration shows small but consistent gains in creative output across tasks and contexts. However, collaboration with GenAI reduces the diversity of ideas, indicating a risk of creative outputs that could become more homogeneous.

More same-ness; exactly what we need when it comes to creativity, right? The more we use genAI, the more creatively we will be able to prompt, but the less creative we will be in actually applying a solution to the problem. We no longer create: we generate. We know that genAI will do everything in its power to keep you locked within that chat box. Its tendency to flatter you, agree with your statements, and serve you whatever you want to hear creates biases and dependencies. It's not unlike a drug that slowly but surely diminishes your critical thinking, and thus, your creativity.

This is where the true nature of humans is revealed: when it comes to earning something for themselves, ethics suddenly becomes a very malleable subject. On morality, ethics, and privacy, everyone agrees that genAI is what Ron Gilbert calls a train wreck. This needs no further explanation from me: Microsoft slurped all GitHub repositories dry without taking any licenses into account, the book that I painstakingly produced in almost two years was ingested into OpenAI's systems in about two seconds, … Yet at the same time, everyone also consistently ignores all these topics in favour of their own self-interest. Why, I wonder?

Everyone knows they should eat less meat. Yet almost nobody does. Everyone knows Microsoft (and probably other big tech companies) power genocide, yet the adoption rate of Windows as an OS is still 95%. Why? Everyone knows the climate is going to shit, yet we happily turn the other way and take the plane on a weekend trip to sip some wine and do some shopping in Italy. As Greta Thunberg said: knowing is not enough. For genAI, similar patterns emerge. We know it's bad for us, yet we happily close our eyes and use it anyway. Why, I wonder? The power of a drug, the pull, the ease at which something can be done without breaking too much sweat? Here's a possible answer I suggested before: because humans are inherently lazy. As long as Belgian supermarkets keep on stocking apples from New Zealand and Belgium, most people won't care and just pick up whatever.
As long as we keep handing out company cars and making infrastructure geared towards car drivers, most people will be driving to work instead of biking. A possible answer to the problem then might be governmental legislation to protect people living in a society from making the wrong choices. And I'm 100% sure that will work! Yet legislation is always either (1) happening way too late, or (2) minimised or manipulated by the people who wield the power, because they have bought out key politicians to prevent laws like this from happening. Hence my depression. In the case of genAI, a technology that evolves at lightning speed and is taking the world by storm, legislation will be way too late. To prove my point, in an attempt to modernise, many Belgian governmental agencies already "embraced" the technology and made many blunders in doing so. The EU is currently evaluating the options. Meanwhile, the San Francisco bros are laughing.

Prompt engineering is the most degenerative thing that ever happened to engineering. It's a capitalist's way to minimise the cost of the human. Yet I don't see genAI disappearing any time soon. Companies and decision makers smelled the green and won't let go. I don't understand how capitalism works, but I know it's been growing in power ever since we centralised cane sugar plantations with the help of slavery. GenAI is evidently yet another product of capitalism. The companies I've worked for wanted more and more profit each year: even though they were sometimes satisfied with last year's profit, the target for the next year was always increased no matter what. GenAI is already responsible for thousands of layoffs in an attempt to even more aggressively push profit up. To what end, I wonder? Why? To our own detriment. It seems that our cognition is for sale, and the sale has already been made. You know what they say: no returns are accepted.

Peer pressure to use genAI on the job is already prevalent as it "gets things done faster", so quite logically also brings in money faster. Let's worry about durability and maintenance later, shall we. Also, I've seen colleagues fall into the trap of obsessive agent babysitting. Whether at work, on the lunch break, or in the very late evenings: you've got to keep those agents spinning! Squeeze the maximum out of your tokens, because they squeeze the maximum out of you. There goes our work-life balance, coming from the tools that are supposed to take over our work so we can focus more on the life part.

So as long as I remain in a position to be able to choose whether I can put in the work myself for my (hobby) programming projects, I will. As long as I am in a position to bike instead of drive, to be a vegetarian instead of a meat-eater, or in short, to be a concerned civilian, I will. And so should you. Even though that won't stop this devolution from happening at all. Sure, I will occasionally consult Gemini et al. to ask it a specific question regarding a broken config file that has me scratching my head. But I treat these queries as specialised internet searches, not as a way to evade the hard work completely. I've become Albert Camus's pessimist. I'm genuinely afraid of how our kids will turn out if we don't act quickly to save our youth. Yet I won't stop being an activist.
Reading List

I'd rather link to personal blog posts instead of academic publications here, as we're dealing with something that impacts us on a personal level, and by the time the relevant 2026 studies are published, the landscape will have changed yet again. The following folks expressed their experience and opinion on genAI:

- Nolan Lawson; How I use AI agents to write code. A clear conflicted state: it's okay to generate away at work, but "I also don't use AI for my open-source work, because it just feels… ick. The code is 'mine' in some sense, but ultimately, I don't feel true ownership over it, because I didn't write it".
- John Allsopp; The Structure of Engineering Revolutions
- Dave Gauer; A programmer's loss of social identity
- Cory Zue; Software got weird
- Doug Belshaw; Claude's Constitution and the trap of corporate AI ethics
- Tom Hall; Towards a Slow Code Manifesto
- Rishi Baldawa; The Reviewer Isn't the Bottleneck
- Information/superhighway.net; On The Need For Understanding
- Antoine Leblanc; Chatbot psychosis (Mastodon). "this is the main reason why i believe that chatbot addiction / chatbot psychosis is a LOT more widespread than we realise: people with a clear understanding of the ethical issues try claude once, it does a thing correctly enough, they get one-shot, and they start posting like if sephiroth was on linked-in, ethical concerns be damned. it keeps happening." Exactly.
- Sean Boots; Generative AI vegetarianism
- Simon Willison; Perhaps not Boring Technology after all
- Sophie from Localghost; Stop Generating Start Thinking
- Michael Harley; AI Stance
- Lauren Woolsey; AI Sucks And You Shouldn't Use It
- Ron Gilbert; My Dinner With AI
- Matthew Lamont; Generative AI is an Evil Technology
- Arne Brasseur; The AI Divide (Mastodon)
- Zach Manson; CoPilot Edited an Ad Into My PR
- Michael Taggart; I Used AI. It Worked. I Hated It.
- Bob Nystrom; The Value of Things. GenAI can have utility but not meaning.
- Jonny; Dismantling Claude Code source (Mastodon). Another train wreck, as expected.
- Cal Newport; In Defense of Thinking
- Hamilton Greene; Why I'm moving from F# to C#
- Senator Bernie Sanders vs. Claude (YouTube)
- Joel Chrono; Not having to work would be nice (but not like this)

By Wouter Groeneveld on 8 April 2026.

HeyDingus Yesterday

Louie Mantia’s ideal burger ingredient stack

I love hearing about someone's thoroughly considered argument for something that I've never given much thought to. It's like pulling aside a curtain to discover there's been a window with a gorgeous view behind it the whole time. Take, for instance, the order in which a burger's ingredients should be stacked. I probably could have improvised my preferred order with a few minutes of thought. But now I don't have to, because Louie Mantia's already figured it out:

To me, an ideal cheeseburger has the following:

- Fluffy, toasted bun
- Grilled onion
- Processed cheese
- Crisp lettuce
- Juicy tomatoes
- Cucumber pickles
- Tangy, mayo-based sauce

The order of ingredients is important. It's not critical, but I think this order makes a lot of sense. The sauce and veg are the cool ingredients. Your tongue should hit those first so you enjoy how fresh and crisp they are and to save you from the hot patty and melted cheese. The melted cheese sticks to the top bun. The sauce coats the bottom bun and dresses the "salad" part of the sandwich when you bite. If the cool ingredients are on the top, above the cheese, the watery vegetables sweat. The hot-cool barrier created between the patty and lettuce is the key to preventing that. The cool, raw vegetables don't benefit from being adjacent to the hot, melted cheese.

First, an excellent rating system, and now this well-reasoned defense? I think I'm going to enjoy this blog. 🍔

./techtipsy Yesterday

You can fake SSD-like disk speeds in any Linux VM, but it's unsafe (literally)

Do you have a need for really fast disk performance? Are you unwilling or unable to buy/rent a machine with fast, NVMe-based storage? Are you OK with potential data corruption? Then the "unsafe" cache mode is the solution to all your problems!

We had an interesting conundrum at work recently. Our platform does not use a lot of resources, but there are bursts of activity that require a lot of CPU and performant disk IO from our infrastructure. This was previously handled by manually starting some expensive AWS EC2 instances to cope with the load, but this manual process was error-prone due to the human factor (which did end up causing an actual production outage once), and AWS is stupidly expensive for what you get in return. Around this time I also learned about a Proxmox server that we were underutilizing. My goal was to investigate the resources that we had available and to ensure that we didn't have to think about taking any manual actions, while at the same time not relying on AWS and its expensive resources.

I set up a few VMs on the Proxmox machine and did some testing. CPU, memory, that was all fine, but the IO-bound workloads that we had to run during those bursty periods would still be relatively slow. Not much slower than the main infrastructure provider that we were using, but slow enough for a beefy machine to not be able to handle more than a few parallel IO-heavy workloads running at the same time. We exhausted a few other wild-ass ideas during the investigation:

- Docker on a RAM-backed storage drive: online resources did not inspire confidence in this working well, so we didn't try this
- optimizing the workload to not be IO-heavy: unsuccessful after spending a few hours on it; the high IO was a consequence of making an intentional trade-off to reduce CPU load, and the IO requirement was much more manageable
- putting certain folders in the container itself on RAM-backed storage: highly container-specific, and did not yield the desired results

Then one day I was browsing around Proxmox and noticed an interesting option on the virtual storage drives: setting the cache mode to "unsafe". With this one trick, your VM will see really fast disk speeds up to a certain point, and it's invisible from the perspective of your workloads, no customization needed. In a way, this is like one of the RAM-backed storage options, but for the whole VM. The major trade-off is that an unexpected shutdown of the VM or the VM host will likely result in data corruption. This is because you're writing everything to memory first, and eventually the writes will end up on persistent storage, whenever the disks catch up with you. If something happens while changes are only in memory, they are lost.

In our case, the data corruption risk is completely OK, as the workloads are ephemeral, the results of the work are sent to another machine immediately after completion, and the configuration of the machine is largely automated with Ansible. One instance of our workload would usually result in writing 50 MB to disk, and we observed about 300-500 IOPS of performance from HDD-backed storage. The disks were not able to handle more than one at a time if we cared about execution time. With the trick, and on some relatively old hardware (assume DDR3 memory), we saw numbers as high as 15K IOPS and disk throughput of 500+ MB/s. This was more than enough to handle peak loads, and the resources were always on and available on a rented server with a stable price that compared extremely well to AWS.

Cloud service providers have their benefits, sure, but when all you need is raw speed and configurability to make it happen, then owning a physical Linux server (or a few of them for redundancy) is a no-brainer, slam-dunk decision, as long as you have someone in your team that knows how to manage one. Since you're working with Linux VMs already in the cloud, then you already have that person in your team, don't you? :)
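If you want to sanity-check the effect from inside a VM, fio is the proper tool, but a rough sketch of measuring fsync'd write IOPS in Python (file name and parameters are made up for illustration) looks like this:

```python
import os
import time

def fsync_write_iops(path="iops_test.bin", writes=500, block=4096):
    """Measure how many fsync'd 4 KiB writes per second the disk sustains."""
    buf = os.urandom(block)
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC)
    start = time.perf_counter()
    for _ in range(writes):
        os.write(fd, buf)
        os.fsync(fd)  # with the "unsafe" cache mode, this flush is absorbed by host RAM
    elapsed = time.perf_counter() - start
    os.close(fd)
    os.remove(path)
    return writes / elapsed

print(f"~{fsync_write_iops():.0f} fsync'd write IOPS")
```

On HDD-backed storage this should land somewhere in the hundreds of IOPS; with writes absorbed by host memory, orders of magnitude higher.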


Mario and Earendil

Today I'm very happy to share that Mario Zechner is joining Earendil. First things first: I think you should read Mario's post. This is his news more than it is ours, and he tells his side of it better than I could. What I want to do here is add a more personal note about why this matters so much to me, how the last months led us here, and why I am so excited to have him on board.

Last year changed the way many of us thought about software. It certainly changed the way I did. I spent much of 2025 building, probing, and questioning how to build software, and, more than that, questioning what I want to do. If you are a regular reader of this blog you were along for the ride. I wrote a lot, experimented a lot, and tried to get a better sense for what these systems can actually do and what kinds of companies make sense to build around them. There was, and continues to be, a lot of excitement in the air, but also a lot of noise. It has become clear to me that it's not a question of whether AI systems can be useful but what kind of software and human-machine interactions we want to bring into the world with them.

That is one of the reasons I have been so drawn to Mario's work and approach. Pi is, in my opinion, one of the most thoughtful coding agents and agent infrastructure libraries in this space. Not because it is trying to be the loudest or the fastest, but because it is clearly built by someone who cares deeply about software quality, taste, extensibility, and design. In a moment where much of the industry is racing to ship ever more quickly, often at the cost of coherence and craft, Mario kept insisting on making something solid. That matters to me a great deal. I have known Mario for a long time, and one of the things I admire most about him is that he does not confuse velocity with progress. He has a strong sense for what good tools should feel like. He cares about details. He cares about whether something is well made. And he cares about building in a way that can last. Mario has been running Pi in a rather unusual way: he exerts back-pressure on the issue tracker and the pull requests through OSS vacations and other means.

The last year has also made something else clearer to me: these systems are not only exciting, they are also capable of producing a great deal of damage. Sometimes that damage is obvious; sometimes it looks like low-grade degradation everywhere at once. More slop, more noise, more disingenuous emails in my inbox. There is a version of this future that makes people more distracted, more alienated, and less careful with one another. That is not a future I want to help build.

At Earendil, Colin and I have been trying to think very carefully about what a different path might look like. That is a big part of what led us to Lefos. Lefos is our attempt to build a machine entity that is more thoughtful and more deliberate by design. Not an agent whose main purpose is to make everything a little more efficient so that we can produce even more forgettable output, but one that can help people communicate with more care, more clarity, and joy. Good software should not aim to optimize every minute of your life, but should create room for better and more joyful experiences, better relationships, and better ways of relating to one another. Especially in communication and software engineering, I think we should be aiming for more thought rather than more throughput. We should want tools that help people be more considerate, more present, and more human.
If all we do is use these systems to accelerate the production of slop, we will have missed the opportunity entirely. This is also why Mario joining Earendil feels so meaningful to me. Pi and Lefos come from different starting points. There was a year of collaboration at a distance, but they are animated by a similar instinct: that quality matters, that design matters, and that trust is earned through care rather than captured through hype. I am very happy that Pi is coming along for the ride. Colin and I care a lot about it, and we want to be good stewards of it. It has already played an important role in our own work over the last months, and I continue to believe it is one of the best foundations for building capable agents. We will have more to say soon about how we think about Pi's future and its relationship to Lefos, but the short version is simple: we want Pi to continue to exist as a high-quality, open, extensible piece of software, and we want to invest in making that future real. As for our thoughts on Pi's license, read more here, and see our company post here.

Kev Quirk Yesterday

Obfuscating My Contact Email

I stumbled across this great post by Spencer Mortensen yesterday, which tested different email obfuscation techniques against real spambots to see which ones actually work. It's a fascinating read, and I'd recommend checking it out if you're into that sort of thing. The short version is that spambots scrape your HTML looking for email addresses. If your address is sitting there in plain text, they'll hoover it up. But if you encode each character as an HTML entity, the browser still renders and uses it correctly, while most bots haven't got a clue what they're looking at. From Spencer's testing, this approach blocks around 95% of harvesters, which is good enough for me.

On this site, my contact email shows up in two places:

- The Reply by email button at the bottom of every post.
- My contact page.

Both pull from the value in Pure Blog's config, so I only needed to make a couple of changes. The reply button lives in , which is obviously a PHP file. So the fix there was straightforward - I ditched the shortcode and used PHP directly to encode the address character by character into HTML entities: Each character becomes something like , which is gibberish to a bot, but perfectly readable to a human using a browser. The shortcode still gets replaced normally by Pure Blog after the PHP runs, so the subject line still works as expected.

The contact page is a normal page in Pure Blog, so it's Markdown under the hood. This means I can't drop PHP into it. Instead, I used Pure Blog's hook , which runs after shortcodes have already been processed. By that point, has been replaced with the plain email address, so all I needed to do was swap it for the encoded version: This goes in , and now any page content that passes through Pure Blog's function will have the email automatically encoded. So if I decide to publish my elsewhere, it should automagically work.

As well as the obfuscation, I also set up my email address as a proper alias rather than relying on a catch-all to segregate emails. That way, if spam does somehow get through, I can nuke the alias, create a new one, and update it in Pure Blog's settings page. Is this overkill? Probably. But it was a fun little rabbit hole, and now I can feel smug about it. 🙃
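The entity-encoding trick itself is tiny to implement. Here is the same idea sketched in Python rather than Kev's PHP (the function name and sample address are illustrative, not from the post):

```python
def encode_email(address: str) -> str:
    """Encode every character as a decimal HTML entity, e.g. 'a' -> '&#97;'."""
    return "".join(f"&#{ord(ch)};" for ch in address)

# A browser renders this as hello@example.com; most scrapers see gibberish.
print(encode_email("hello@example.com"))
# &#104;&#101;&#108;&#108;&#111;&#64;&#101;&#120;&#97;&#109;&#112;&#108;&#101;&#46;&#99;&#111;&#109;
```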

Giles's blog Yesterday

Writing an LLM from scratch, part 32i -- Interventions: what is in the noise?

Towards the end of last year, I trained a 163M-parameter GPT-2-style model from scratch on my local RTX 3090, using code based on Sebastian Raschka's book "Build a Large Language Model (from Scratch)". The result was a pretty decent little model, but it wasn't as good as the original GPT-2-small, despite having more parameters (because it wasn't using weight-tying). Specifically: on a particular test set, my model gave a loss of 3.944 -- quite a lot more than the original GPT-2's 3.500 on the same dataset.

I wanted to see whether I could train a model on my own hardware (or on something that didn't cost too much to rent in the cloud) that got closer to the original model's performance. So over the last few months, I've done a bunch of further training runs, each one testing a specific intervention -- a stand-alone change that I expected to change the loss, either for better or for worse. Specifically:

- I trained a baseline model on an 8x A100 40 GiB per GPU machine on Lambda (which was better than my original locally-trained model, I believe due to the larger batch size that the larger machine made possible).
- I tried adding gradient clipping to see if that would help by limiting the effects of loss spikes.
- I tried removing dropout, given that these days people tend not to use it (because we're doing single-epoch training runs).
- I tried adding bias to the attention weight matrices -- something that was popular back in the GPT-2 era, and was used by the original weights, but which my code did not use.
- Instead of just using the learning rate of 0.0004 that was used in the code from the book, I looked into what values people use these days, and learned how to schedule it over the course of the training run.
- Similarly, I learned more about weight decay and tried some alternative values.
- Then I tried making my model more like the original GPT-2 one by introducing weight tying to see if that would help.
- Finally, I decided to try training in "full-fat" float32 instead of using PyTorch's AMP and TF32 matrix multiplication performance enhancements.

At the end of all of that, I had this table showing the effect of each intervention in terms of loss on the test set. They're sorted from least-effective to most-effective, and you can see the baseline in there too:

Winners and losers are reasonably clear:

- Weight tying and the number for weight decay I derived from a paper by Cerebras Research (probably without understanding it properly) were negatives.
- Full-fat float32, gradient clipping, attention biases, the GPT-2 weight decay parameter, removing dropout, and scheduling (and updating) the learning rate were positives.

So, for an optimal train, we'd just use the effective interventions, right? Well, not quite. Full-fat float32 I decided wasn't worth the effort, as it meant that the train took more than twice as long, and (because it required a larger machine) cost more than three times as much. The others did look like solid changes, but there was one concern. The effect of each intervention is actually pretty small. For example, gradient clipping reduced the loss by 0.014, from 3.692 to 3.678. That's a 0.3% improvement. Even the best intervention, scheduling the learning rate, only improved things by 2%. Could it be that some or all of these improvements were not real, but just a result of the random nature of training deep neural networks? Could the differences just be in the noise? They seemed small enough for that to be possible.

I've trained seven more models over the last few days to try to get a feel as to how big an effect noise has for this kind of training run. The results appear to show that variations in the initial weights matter quite a lot, but randomness in the training loop (given the same initial weights) actually has a fairly minimal impact. That surprised me a bit! Let's go through the details.

When I did the original baseline training run -- creating the model that was the comparison point for all of the interventions -- I wanted to minimise the amount of random number-induced differences between the training runs in this interventions series. I did this by setting the random seed at the start -- specifically, I had this code: At the time I wrote it, this seemed pretty complete -- the seed is set on Python's own random number generator, on PyTorch's, and on the separate ones it uses for CUDA. However, in a separate project, where I was fine-tuning a Qwen model as a classifier, I'd found that this wasn't enough. In order to get full reproducibility, I'd had to lock things down a bit more, with this additional code: So: was my random number seed code enough for this case? Or would I get a different model if I ran the same code a second time? That was easy enough to do; I spun up a machine, and just ran the "baseline" train again. 3 hours 24 minutes later: Interestingly, that was exactly the same final train loss as the original baseline train. Here's the model.
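For reference, the kind of RNG pinning described above (Python, PyTorch, and CUDA seeds, plus the stricter lock-down) looks roughly like this; a sketch using standard PyTorch knobs, not the post's exact code:

```python
import os
import random
import torch

# Seed Python's RNG, PyTorch's, and the separate per-device CUDA ones.
random.seed(42)
torch.manual_seed(42)
torch.cuda.manual_seed_all(42)

# The stricter lock-down for full reproducibility: a fixed cuBLAS workspace,
# no cuDNN autotuning, and deterministic kernels only.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
torch.use_deterministic_algorithms(True)
```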
I ran my normal smoke test, asking it to complete "Every effort moves you" ...so that was OK -- the model was generating reasonably coherent text. Then I ran the eval to find its loss on the test set: Exactly the same as the original baseline! That was certainly promising. Now, the use of three decimal places for the output from the loss eval is just a formatting thing, so I bumped it up to 6 dps, and the new model got this: Running that against the original baseline model: Again, exactly the same. Finally, more out of idle interest than anything else, I decided to see if the models were at least different: That is, quite frankly, amazing to me. I was expecting pretty close results, but what we're seeing here is that two separate models, trained on the same data, but on different machines more than a month apart, have weights that are bit-wise identical. No random noise at all. That's actually really reassuring! It makes me much more comfortable that we're standing on a stable foundation here.

Now it was time to see what effect changing that random seed would have. Let's think about what the random seed does. When we call , we're initialising Python's pseudo-random number generator so that it will start at a particular point -- after we've called it, it will generate the same sequence of "random" numbers each time it's asked for a new one. So the effect of this code: ...is to initialise three separate pseudo-random number generators to be in a known deterministic state, so they'll all generate the same sequence in every run.

So, the first thing to do was to see what happened if we changed that number. I decided to do two training runs, each with exactly the same code as the baseline, but with different random seeds. Firstly, I changed it from 42 to 22 1 : That training run completed: Here's the model. Time for the evals; the smoke test: ...and the loss test: So, that's 3.673453 compared to 3.691526, an improvement of 0.018 over the run with a seed of 42. That's more than the 0.014 improvement we got from gradient clipping (and indeed, the 0.013 from full-fat float32 training), and quite close to the 0.023 improvement from adding attention weight bias.

Time for another training run: Another 3h24m later: Here's the model. The smoke test: ...and the test set loss: A further improvement! That's 0.038 better than our original baseline, which beats adding on attention weight bias (though it's worse than the weight decay update).

Now, three data points is rather a small number for any kind of statistical analysis, but just out of interest, let's do the basics. GeeksForGeeks has a good refresher here if you're a bit rusty. Firstly, our mean is ~3.672857, and our variance 2 is ~0.00024. If we take the square root of that, we get the standard deviation (SD): ~0.0154919.

So, if we assume a normal distribution, what would that say about our results? Here's the results table again. If we assume that the results are on a normal distribution:

- We would expect ~68.2% of results to be within one SD of the mean -- that is, between 3.6573651 and 3.6883489. Interestingly, our actual baseline result is outside that range! But it does include both the gradient clipping and the QKV bias results.
- We would additionally expect ~95.4% of the results to be within two SDs, which is 3.6418732 to 3.7038408. That includes our baseline and our weight decay result (though not our experiment removing dropout -- the six-DP loss number for that is 3.641282).
- Finally, we'd expect ~99.7% of results to be within three SDs, which is a range from 3.6263813 to 3.7193327. That covers all of our positive results apart from scheduling learning rate!

That seemed a bit saddening -- were all of the results apart from scheduling the learning rate within the noise? Well, as I said, three data points is too small a number to take those results without a fistful of salt. I was thinking of perhaps trying another few random seeds to see what would happen, and perhaps to tighten those numbers up a bit, but then something occurred to me -- randomness was being used in two different ways in the training run, and perhaps we could separate them? Where do we use the random numbers?
Well, immediately after we set the seeds, we create our uninitialised model for training: One of the random number generators -- Python's, PyTorch's, or one of the CUDA ones -- will be used to generate the initial weights that we're going to start training. That means that for the same model setup, we'll always start with exactly the same weights. But if the model settings change such that we initialise different things in a different order, then we'll have different weights.

After we've done that, we go into the training loop. That can have randomness in it; although the AdamW optimiser itself is deterministic, we are (in all but one of these training runs) using dropout, which drops a random bunch of activations at various points -- 10% of them with our config. And it seems entirely possible that each of the interventions could change the order of execution of different steps in non-obvious ways, which would lead to dropout being applied in different ways in different runs.

So, the question was: what kinds of randomness -- in terms of the initial weights, or in terms of the training run -- did each intervention potentially change vs the baseline? Disregarding the full-fat float32 run:

- Gradient clipping: randomness only affected the training run -- the weights it started with would have been exactly the same as the baseline model's.
- Removing dropout: although this is a parameter on the model, I don't think it changes the initial weights. But in the training run, it certainly does affect randomness by removing its use of the random number generator.
- Adding bias to the attention weights. This will change both the initial weights -- because we have those bias weights, things will be initialised differently -- and as a result, the training run, as the random number generator will have been sampled a different number of times prior to the run.
- Changing and scheduling the learning rate certainly should not change the initial weights, but it might conceivably have a non-obvious effect on training.
- Likewise weight decay; no effect I can see on the initial weights, but it could well change training dynamics.
- Weight-tying. When I added it to the code, I tried to do so in such a way that the other weights would be unaffected -- I created exactly the same weights as I would without weight tying, then threw away the output head and replaced it with a reference to the input embedding weights. So I think that in theory, this one won't have changed the other model weights (apart from ignoring the initialised-but-thrown-away output head), but it could well have changed the training run.

Given that, I wanted to get two measures of how sensitive to noise each phase of the training run was: the initialisation of weights at the start, and the training run itself. I decided to start by nailing down exactly what the training run started with. We already had a baseline training run with a specific state of the random number generator at the start; in our "real" baseline, we seeded with 42 at the start, and then initialised our weights. After that, the random number generator would have reached some specific state based on its initial seed and how many numbers had been generated so far. Now, in theory, we could get the RNG into that specific state by seeding it with some number A at that point. We don't know what A is, of course. But it seems vanishingly unlikely that it would be something we'd come up with -- specifically, we can be pretty sure that A ≠ 23 and A ≠ 67.

So, I put the old initial seed of 42 back in, but re-seeded after the model had been initialised: Firstly, with a re-seed value of 23: I let that run.... ...and got this model. Time for the normal evals: Next, I did another training run, the same as the previous one, but with 67 instead of 23 for the re-seed: That one ran: ...producing this model, which eval'ed like this 3 : Let's bring those together:

- Our normal baseline: weights initialised with seed 42, and training run starts with a "seed" of our imaginary A value from above: 3.691526
- The first run above: weights initialised with seed 42, and training run starts with a seed of 23: 3.681356
- The second run above: weights initialised with seed 42, and training run starts with a seed of 67: 3.680505

That's a mean of ~3.684462, with a variance of ~0.0000752 and a standard deviation of ~0.008672. Those are tiny compared to the numbers from the two trains we did with the change of the seed prior to the model initialisation. That actually surprised me a bit; we're using dropout in all of these training runs, and it's dropping a random 10% of activations in every forward training pass. With our different training run starting seeds, they should be getting very different dropout patterns. Hand-wavingly, perhaps over the three million or so sequences we're training on, it averages out? Still a little counterintuitive, though.

Anyway, let's take a look at the intervention results again, this time highlighting the ones that we believe will be starting with the same weights: Using the "99.7% should be within three SDs" heuristic, we get a range of 3.658446 - 3.710478. Of the intervention runs with (I believe) stable weights, only the no-AMP and the gradient clipping ones are within that range. That made me feel quite positive.
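In code terms, the experimental design here amounts to something like the following sketch (seed_everything matches the seeding described earlier; the model and training loop are stand-ins, not the post's real code):

```python
import random
import torch

def seed_everything(seed: int) -> None:
    # Pin Python's, PyTorch's, and CUDA's RNGs, as described earlier.
    random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

seed_everything(42)             # identical initial weights every run
model = torch.nn.Linear(8, 8)   # stand-in for the actual GPT-style model
seed_everything(23)             # now vary only the training-loop randomness
# ... training loop goes here; rerun with seed_everything(67) to compare.
```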
If my beliefs are correct about which runs have the same weights, then noise in the training runs seems unlikely to be causing the differences -- that is, perhaps the results from the interventions for those same-weight training runs are real signal and not just noise.

What would happen if instead of pinning the seed for generating the weights and varying the starting seed for the training run, we varied the weight seed and pinned the training one? We'd already done a training run with a seed of 42 before generating the weights and a re-seed to 23 after that:

- The first run above: weights initialised with seed 42, and training run starts with a seed of 23: 3.681356

So I decided to see what would happen if I varied the pre-weights initialisation seed. Let that train: ...getting this model. Evals: Next, one with 67 as the weights initialisation seed: That trained: ...getting this model, and 4 : OK, so here we have:

- Mean: ~3.673215
- Variance: ~0.000145
- SD: ~0.012062

Compared to the SD we got when we varied just the initial seed, 0.0154919, it's not too far off. Using the 3-SD rule, we get a range of 3.637030 - 3.709400, and looking at the table again, this time with the ones that we don't expect to have the same weights highlighted: ...we can see that the QKV bias is well within that range (as are all of the interventions apart from the two negative-effect ones and scheduling the learning rate).

Right, what does all of that tell us? This post obviously isn't even trying to be statistically rigorous. The number of training runs I've done and the amount of data is way too small for that. However, training runs are expensive (Lambda have raised their prices again, so these cost more than US$50 each!), so there's a limit to how much I can do. But even with the limited amount of data, something seems pretty clear:

- Varying the random seed at the start, prior to initialising weights, and not constraining the starting point for the training runs, gave a mean of 3.672857, with an SD of 0.0154919.
- Keeping the same seed for model weights (so that they all started with the same weights), and varying the seed for the training run, gave a mean of 3.684462, with an SD of 0.008672.
- Varying the seed for the model weights (so that they all started with different weights), and keeping the training run seed pinned, gave a mean of 3.673215 and an SD of 0.012062.

"One of these things is not like the others". Keeping the model weights stable and only allowing variation in randomness across the training run itself meant that almost all of the differences between training runs disappeared. Could this be a result of the small number of samples? I guess conceivably it might, but it seems vanishingly unlikely. So I feel reasonably confident in saying that the bulk of the variation in results that we can chalk up to random noise in these training runs comes from variations in the model weights' initialisation.

Additionally, the first training run in this post -- the re-run of the baseline model with no changes -- gave exactly the same numbers as the original baseline run. So we can be confident that all of the models with no changes to the weight initialisation started with the same weights. Of course, I could be wrong about which models really did have the same weights, but given that they were running the same code with the same seed, I'm pretty much sure. That makes me fairly confident that the intervention runs that had the same initial weights gave a real signal about whether or not the intervention in question actually helped. The only exception is gradient clipping, which fell within the three-SD range for the same-weights tests -- but it's essentially free, adding just 100 seconds to a three-hour training run, so it seems worth keeping anyway.

That's a really interesting result! As I said earlier, given that dropout is making us ignore a random 10% of activations during the training run, I would have thought that changing which random 10% were being ignored would have a much larger effect. And that's not even considering other sources of random noise in the training run. I was less surprised that model weight initialisation was important, though.
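As a sanity check on the first of those results, the population statistics (n in the denominator, per the footnote on Bessel's correction) reproduce in a couple of lines. Note that the third run's loss isn't quoted directly above, so the value below is inferred from "0.038 better than baseline":

```python
from statistics import fmean, pstdev, pvariance

# Test-set losses when varying the initial seed: baseline (seed 42),
# seed 22, and the third run (~3.653592, inferred as baseline - 0.038).
losses = [3.691526, 3.673453, 3.653592]

print(f"mean     {fmean(losses):.6f}")      # ~3.672857
print(f"variance {pvariance(losses):.6f}")  # ~0.000240
print(f"SD       {pstdev(losses):.6f}")     # ~0.015492, matching ~0.0154919
```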
Still, we now have a reasonable level of trust that our interventions gave a real signal, so I think we have everything in place to see how they stack together, and to do a best-effort training run. Can we approach the original GPT-2 small weights' performance on our test set loss? It should be fun to find out :-)

[1] Numbers chosen based on a misremembering of this XKCD. For some reason (perhaps because it rhymes) I thought that the old-timey funny number thing was "22 skidoo" rather than "23 skidoo". ↩

[2] On working through this later: with n samples from a dataset, it is (as I understand it) best to use n − 1 as the denominator here (Bessel's correction) for the "sample variance". If we had every possible value, then it would be correct to use n. However, while this changes a few details in the analysis, I don't think it changes the final conclusion of the post meaningfully (it would just bump up the SDs by 22% or so), so I've left it as-is. ↩

[3] I found it interesting that this model does the "you and I" hypercorrection that so many people do when trying to write formally! Based on the (correct) correction of "me and you move back home" to "you and I move back home" -- I think as a result of excessive pattern-matching. ↩

[4] Another grammatical error based on pattern-matching -- it would make sense that the possessive form of "it" in English was "it's", just like the possessive form of "John" is "John's". ↩


Anthropic's Project Glasswing - restricting Claude Mythos to security researchers - sounds necessary to me

Anthropic didn't release their latest model, Claude Mythos (system card PDF), today. They have instead made it available to a very restricted set of preview partners under their newly announced Project Glasswing. The model is a general-purpose model, similar to Claude Opus 4.6, but Anthropic claim that its cyber-security research abilities are strong enough that they need to give the software industry as a whole time to prepare:

Mythos Preview has already found thousands of high-severity vulnerabilities, including some in every major operating system and web browser. Given the rate of AI progress, it will not be long before such capabilities proliferate, potentially beyond actors who are committed to deploying them safely. Project Glasswing partners will receive access to Claude Mythos Preview to find and fix vulnerabilities or weaknesses in their foundational systems—systems that represent a very large portion of the world’s shared cyberattack surface. We anticipate this work will focus on tasks like local vulnerability detection, black box testing of binaries, securing endpoints, and penetration testing of systems.

There's a great deal more technical detail in Assessing Claude Mythos Preview’s cybersecurity capabilities on the Anthropic Red Team blog:

In one case, Mythos Preview wrote a web browser exploit that chained together four vulnerabilities, writing a complex JIT heap spray that escaped both renderer and OS sandboxes. It autonomously obtained local privilege escalation exploits on Linux and other operating systems by exploiting subtle race conditions and KASLR bypasses. And it autonomously wrote a remote code execution exploit on FreeBSD's NFS server that granted full root access to unauthenticated users by splitting a 20-gadget ROP chain over multiple packets.

Plus this comparison with Claude 4.6 Opus:

Our internal evaluations showed that Opus 4.6 generally had a near-0% success rate at autonomous exploit development. But Mythos Preview is in a different league. For example, Opus 4.6 turned the vulnerabilities it had found in Mozilla’s Firefox 147 JavaScript engine—all patched in Firefox 148—into JavaScript shell exploits only two times out of several hundred attempts. We re-ran this experiment as a benchmark for Mythos Preview, which developed working exploits 181 times, and achieved register control on 29 more.

Saying "our model is too dangerous to release" is a great way to build buzz around a new model, but in this case I expect their caution is warranted. Just a few days ago (last Friday) I started a new ai-security-research tag on this blog to acknowledge an uptick in credible security professionals sounding the alarm on how good modern LLMs have got at vulnerability research. Greg Kroah-Hartman of the Linux kernel:

Months ago, we were getting what we called 'AI slop,' AI-generated security reports that were obviously wrong or low quality. It was kind of funny. It didn't really worry us. Something happened a month ago, and the world switched. Now we have real reports. All open source projects have real reports that are made with AI, but they're good, and they're real.

Daniel Stenberg of curl:

The challenge with AI in open source security has transitioned from an AI slop tsunami into more of a ... plain security report tsunami. Less slop but lots of reports. Many of them really good. I'm spending hours per day on this now. It's intense.
And Thomas Ptacek published Vulnerability Research Is Cooked, a post inspired by his podcast conversation with Anthropic's Nicholas Carlini.

Anthropic have a 5-minute talking-heads video describing the Glasswing project. Nicholas Carlini appears as one of those talking heads, where he said (highlights mine):

It has the ability to chain together vulnerabilities. So what this means is you find two vulnerabilities, either of which doesn't really get you very much independently. But this model is able to create exploits out of three, four, or sometimes five vulnerabilities that in sequence give you some kind of very sophisticated end outcome. [...]

I've found more bugs in the last couple of weeks than I found in the rest of my life combined. We've used the model to scan a bunch of open source code, and the thing that we went for first was operating systems, because this is the code that underlies the entire internet infrastructure. For OpenBSD, we found a bug that's been present for 27 years, where I can send a couple of pieces of data to any OpenBSD server and crash it. On Linux, we found a number of vulnerabilities where as a user with no permissions, I can elevate myself to the administrator by just running some binary on my machine.

For each of these bugs, we told the maintainers who actually run the software about them, and they went and fixed them and have deployed the patches so that anyone who runs the software is no longer vulnerable to these attacks.

I found this on the OpenBSD 7.8 errata page:

025: RELIABILITY FIX: March 25, 2026 -- All architectures. TCP packets with invalid SACK options could crash the kernel. A source code patch exists which remedies this problem.

I tracked that change down in the GitHub mirror of the OpenBSD CVS repo (apparently they still use CVS!) and found it using git blame. Sure enough, the surrounding code is from 27 years ago.

I'm not sure which Linux vulnerability Nicholas was describing, but it may have been this NFS one recently covered by Michael Lynch.

There's enough smoke here that I believe there's a fire. It's not surprising to find vulnerabilities in decades-old software, especially given that it's mostly written in C, but what's new is that coding agents run by the latest frontier LLMs are proving tirelessly capable at digging up these issues.

I actually thought to myself on Friday that this sounded like an industry-wide reckoning in the making, and that it might warrant a huge investment of time and money to get ahead of the inevitable barrage of vulnerabilities. Project Glasswing incorporates "$100M in usage credits ... as well as $4M in direct donations to open-source security organizations". Partners include AWS, Apple, Microsoft, Google, and the Linux Foundation. It would be great to see OpenAI involved as well - GPT-5.4 already has a strong reputation for finding security vulnerabilities and they have stronger models on the near horizon.

The bad news for those of us who are not trusted partners is this:

We do not plan to make Claude Mythos Preview generally available, but our eventual goal is to enable our users to safely deploy Mythos-class models at scale—for cybersecurity purposes, but also for the myriad other benefits that such highly capable models will bring. To do so, we need to make progress in developing cybersecurity (and other) safeguards that detect and block the model’s most dangerous outputs.
We plan to launch new safeguards with an upcoming Claude Opus model, allowing us to improve and refine them with a model that does not pose the same level of risk as Mythos Preview.

I can live with that. I think the security risks really are credible here, and having extra time for trusted teams to get ahead of them is a reasonable trade-off.


Russia Hacked Routers to Steal Microsoft Office Tokens

Hackers linked to Russia’s military intelligence units are using known flaws in older Internet routers to mass-harvest authentication tokens from Microsoft Office users, security experts warned today. The spying campaign allowed state-backed Russian hackers to quietly siphon authentication tokens from users on more than 18,000 networks without deploying any malicious software or code.

Microsoft said in a blog post today it identified more than 200 organizations and 5,000 consumer devices that were caught up in a stealthy but remarkably simple spying network built by a Russia-backed threat actor known as “Forest Blizzard.”

How targeted DNS requests were redirected at the router. Image: Black Lotus Labs.

Also known as APT28 and Fancy Bear, Forest Blizzard is attributed to the military intelligence units within Russia’s General Staff Main Intelligence Directorate (GRU). APT28 famously compromised the Hillary Clinton campaign, the Democratic National Committee, and the Democratic Congressional Campaign Committee in 2016 in an attempt to interfere with the U.S. presidential election.

Researchers at Black Lotus Labs, a security division of the Internet backbone provider Lumen, found that at the peak of its activity in December 2025, Forest Blizzard’s surveillance dragnet ensnared more than 18,000 Internet routers, most of them unsupported, end-of-life devices or else far behind on security updates. A new report from Lumen says the hackers primarily targeted government agencies, including ministries of foreign affairs, law enforcement, and third-party email providers.

Black Lotus Security Engineer Ryan English said the GRU hackers did not need to install malware on the targeted routers, which were mainly older Mikrotik and TP-Link devices marketed to the Small Office/Home Office (SOHO) market. Instead, they used known vulnerabilities to modify the Domain Name System (DNS) settings of the routers to include DNS servers controlled by the hackers.

As the U.K.’s National Cyber Security Centre (NCSC) notes in a new advisory detailing how Russian cyber actors have been compromising routers, DNS is what allows individuals to reach websites by typing familiar addresses instead of the associated IP addresses. In a DNS hijacking attack, bad actors interfere with this process to covertly send users to malicious websites designed to steal login details or other sensitive information.

English said the routers attacked by Forest Blizzard were reconfigured to use DNS servers that pointed to a handful of virtual private servers controlled by the attackers. Importantly, the attackers could then propagate their malicious DNS settings to all users on the local network, and from that point forward intercept any OAuth authentication tokens transmitted by those users.

DNS hijacking through router compromise. Image: Microsoft.

Because those tokens are typically transmitted only after the user has successfully logged in and gone through multi-factor authentication, the attackers could gain direct access to victim accounts without ever having to phish each user’s credentials and/or one-time codes.

“Everyone is looking for some sophisticated malware to drop something on your mobile devices or something,” English said. “These guys didn’t use malware.
They did this in an old-school, graybeard way that isn’t really sexy but it gets the job done.”

Microsoft refers to the Forest Blizzard activity as using DNS hijacking “to support post-compromise adversary-in-the-middle (AiTM) attacks on Transport Layer Security (TLS) connections against Microsoft Outlook on the web domains.” The software giant said that while targeting SOHO devices isn’t a new tactic, this is the first time Microsoft has seen Forest Blizzard using “DNS hijacking at scale to support AiTM of TLS connections after exploiting edge devices.”

Black Lotus Labs engineer Danny Adamitis said it will be interesting to see how Forest Blizzard reacts to today’s flurry of attention to their espionage operation, noting that the group immediately switched up its tactics in response to a similar NCSC report (PDF) in August 2025. At the time, Forest Blizzard was using malware to control a far more targeted and smaller group of compromised routers. But Adamitis said the day after the NCSC report, the group quickly ditched the malware approach in favor of mass-altering the DNS settings on thousands of vulnerable routers.

“Before the last NCSC report came out they used this capability in very limited instances,” Adamitis told KrebsOnSecurity. “After the report was released they implemented the capability in a more systemic fashion and used it to target everything that was vulnerable.”

TP-Link was among the router makers facing a complete ban in the United States. But on March 23, the U.S. Federal Communications Commission (FCC) took a much broader approach, announcing it would no longer certify consumer-grade Internet routers that are produced outside of the United States. The FCC warned that foreign-made routers had become an untenable national security threat, and that poorly-secured routers present “a severe cybersecurity risk that could be leveraged to immediately and severely disrupt U.S. critical infrastructure and directly harm U.S. persons.”

Experts have countered that few new consumer-grade routers would be available for purchase under the new FCC policy (besides maybe Musk’s Starlink satellite Internet routers, which are produced in Texas). The FCC says router makers can apply for a special “conditional approval” from the Department of War or Department of Homeland Security, and that the new policy does not affect any previously-purchased consumer-grade routers.
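The mechanics above suggest a simple sanity check, illustrative only and not something from the report: compare what your network's default resolver returns for a high-value domain against a well-known public resolver, here using the dnspython library. Differences are only a hint, since CDNs legitimately serve different answers, but a SOHO router quietly answering with attacker-controlled IPs is exactly what this campaign relied on.

    import socket
    import dns.resolver  # pip install dnspython

    DOMAIN = "outlook.office365.com"  # example target; pick any high-value domain

    # A-records via whatever resolver the router handed out over DHCP:
    local_ips = sorted({ai[4][0] for ai in
                        socket.getaddrinfo(DOMAIN, 443, socket.AF_INET)})

    # A-records via a well-known public resolver, bypassing the router's DNS:
    resolver = dns.resolver.Resolver(configure=False)
    resolver.nameservers = ["1.1.1.1"]
    public_ips = sorted(rr.address for rr in resolver.resolve(DOMAIN, "A"))

    print("router-provided resolver:", local_ips)
    print("public resolver (1.1.1.1):", public_ips)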

neilzone 2 days ago

Sex and the Fedi

Over the weekend, Girl on the Net - an esteemed sex blogger who, incidentally, happens to be one of the smartest, strongest, and downright loveliest people that I know - tooted:

If you ever get sick of me banging on about my life and think ‘ugh I wish she would stick to the porn’ then please know: hardly anyone ever boosts the … porn.

And this made me think. I had an engaging conversation with numerous people about it, and I still don’t have good answers, but I enjoyed the discussion and wanted to keep a note of it. This is that note.

I follow and chat with quite a lot of sex positive / sex work-related people in the fediverse, and many have expressed similar sentiments. They create, they share, they get “likes” - and, of course, ample criticism - but very few boosts / shares. It must be incredibly demoralising. (I am in a different position in that I neither know nor care how many views my blogposts get.)

It made me ponder why people do not share sex-related content, when sex is clearly part of life for many (but not all) people. My thoughts were:

- stigma about sex as pleasure. It’s fine to have sex, but not to talk about it. One of Girl on the Net’s regular themes is about communication, and simply asking questions (not just about sex, but also including about sex and one’s preferences and horizons). But I imagine that, for some, talking about sex is uncomfortable, including sharing other people talking about sex.
- concerns relating to professional expectations and obligations. I fall into this category. I am sex positive, but I do not know where the Solicitors Regulation Authority would draw the line, and I don’t wish to be even close to where that line might be. So I play it safe, even though there is stuff that I would like to post or share. But, oh well, self-censorship ftw. Sometimes, I would love not to be “me” online.
- being embarrassed about what others here might think. Similar, but different, to the points above. This is about other fedizens, who might be co-workers, employers, family members, or whatever.
- sex as being in the sphere of one’s private life.
- older people, perhaps especially men, being self-aware of engaging with younger adults posting sex-related stuff, and coming across as creepy. I completely get this, and I am somewhat paranoid about it myself. Several people responded to say that, yes, they felt like this. They might want to engage with public content (and I’m not talking about responding lasciviously, or sending dick pics), but do not want to be perceived as being inappropriate.

I received some thought-provoking feedback too:

- women and non-binary people said that they felt unsafe boosting or posting sex-related content, because of reactions from men hitting on them: that, by posting about sex, some men took it as an unwelcome opportunity to solicit sex with them.
- some people not wanting to boost as they feel that they don’t have enough followers to make it worthwhile. And, in terms of increasing the distribution of a toot, yes, that makes sense. It probably still sends a nice endorphin boost to the poster though, that someone likes their work enough to want to boost it :) Where someone has a popular “main” account, and a less popular “alt” account, but would only be willing/able to post sex-related stuff via that alt, this perhaps comes into play.
- just not liking the stuff enough to boost it. Fair enough!
- concerns over whether their server rules allow boosting of this kind of content, and not wanting to get blocked / banned.
I can understand each of these, and why they might lead to a “like” rather than a “boost”. None of them inhibits paying or tipping someone as a thank-you for their work, though, which is another way of being supportive.

But this also comes against a backdrop of increasing difficulties for sex workers and other people who post sex-related stuff. Payment processors denying income streams. Platform operators enforcing their ever more restrictive morality rules, making working harder, and requiring more admin just to keep going.

If people take, take, take, without giving back in some meaningful way, then that is challenging even for those who create and share for fun (for appreciation, perhaps, rather than tooting into the void), let alone those for whom this is their livelihood.

I wish that I had better answers than I do.

Martin Fowler 2 days ago

Principles of Mechanical Sympathy

Modern hardware is remarkably fast, but software often fails to leverage it. Caer Sanders has found it valuable to guide their work with mechanical sympathy - the practice of creating software that is sympathetic to its underlying hardware. They distill this practice into everyday principles: predictable memory access, awareness of cache lines, single-writer, and natural batching.

Kev Quirk 2 days ago

Why Have a Dedicated Music Device?

In the last year or so I've read about many people moving from streaming services, like Apple Music and Spotify, to their own music library. To support these local libraries, many seem to be getting themselves a music player, such as the Fiio Echo Mini. While moving to a local library is something that I've thought about many times [1], I don't understand why people are buying these little music players. The big selling points generally seem to be:

- Bluetooth connectivity so you can use it with buds, or in your car.
- Plenty of local storage.
- Audio jack.
- Easy to drag and drop music.

With the exception of the 3rd point, pretty much every smartphone on the market will do all of this. And let's be honest, #3 doesn't really matter as most people use Bluetooth buds these days. Yes, I know some people still use old school wired earphones. I don't need an email from you.

So if the device that's already in your pocket will do everything these little music players do, why get an extra device to lug around everywhere? I want to stress, these look really cool, and if that's why you want one, that's totally fine. But anecdotally, that's not what I'm seeing. Can someone enlighten me? I see the advantages of owning your own music library, but I don't get why people want to carry another device everywhere.

[1] I've decided to stick with streaming, but that's a post for another day. ↩

Thanks for reading this post via RSS. RSS is ace, and so are you. ❤️ You can reply to this post by email, or leave a comment.


FlexGuard: Fast Mutual Exclusion Independent of Subscription

FlexGuard: Fast Mutual Exclusion Independent of Subscription. Victor Laforet, Sanidhya Kashyap, Călin Iorgulescu, Julia Lawall, and Jean-Pierre Lozi. SOSP'25.

This paper presents an interesting use of eBPF to effectively add an OS feature: coordination between user-space locking code and the kernel thread scheduler to improve locking performance.

The paper describes most lock implementations as spin-then-park locks (i.e., busy wait in user space for some time, then give up and call the OS to block the waiting thread). A big problem with busy waiting is the performance cliff under oversubscription. Oversubscription occurs when there are more active threads than cores. In this case, busy waiting can be harmful, because it wastes CPU cycles when there is other useful work to do. The worst case occurs when a thread acquires a lock and then is preempted by the OS scheduler while many other threads are busy waiting. If the OS thread scheduler were smart, it would preempt one of the busy waiters and let the lock holder keep running. But alas, that level of coordination isn’t available … until now.

In the good old days, researchers would have modified Linux scheduling code and tested their modified kernel. The modern (easier) way to achieve this is to use eBPF. The authors wrote an eBPF program that runs (in kernel space) each time a context switch occurs. This program is called the Preemption Monitor. The Preemption Monitor works in conjunction with a custom user-space lock implementation. The net result is that the Preemption Monitor can reliably detect when the OS scheduler preempts a thread that is holding a lock. When this occurs, the eBPF program writes information to a variable that user-space code can read.

The locking algorithm is as follows:

1. First, try to acquire the lock with a simple atomic compare-and-swap.
2. If that fails, then busy wait. Similar to Hapax locks, this busy waiting avoids contention on one cache line by forcing all threads to agree on the order in which they will acquire the lock, and letting each thread spin on per-thread variables.
3. During busy waiting, check the variable written by the Preemption Monitor. If it indicates that a thread currently holding a lock has been preempted by the OS, threads stop busy waiting and instead call the OS to block until the lock is released (using the same system call that a futex would use).

Fig. 2 has performance results. The x-axis shows thread count (which varies over time). The green line is FlexGuard. The idea is that it gives great performance when there is no oversubscription (i.e., fewer than 150 threads) and offers performance similar to a purely blocking lock (the dark blue line) when there is oversubscription.

Source: https://dl.acm.org/doi/10.1145/3731569.3764852

Dangling Pointers

This problem seems ripe for overengineering. In some sick world, the compiler, OS, and hardware could all coordinate to support a “true critical section”. All pages accessed inside this critical section would be pinned into main memory (or even closer to the CPU), and the OS would try extremely hard not to preempt threads inside of the critical section. This would require some upper bound on the critical section’s working set and running time.
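Back to the algorithm itself: here's a toy sketch of the spin-or-park decision, with Python standing in for what is really C plus an eBPF program, and ignoring the Hapax-style ordered spinning. ToyFlexGuard and holder_preempted are my names; the flag is a stand-in for whatever the Preemption Monitor actually publishes, not the paper's interface.

    import threading
    import time

    class ToyFlexGuard:
        SPIN_BUDGET = 0.001  # seconds of busy waiting before giving up

        def __init__(self):
            self._lock = threading.Lock()  # stands in for a futex word
            # In the real system this flag lives in memory shared with the
            # in-kernel Preemption Monitor, which sets it when a lock holder
            # is preempted (and clears it when the holder runs again).
            self.holder_preempted = False

        def acquire(self):
            # 1. Fast path: a single atomic attempt (a CAS in the real lock).
            if self._lock.acquire(blocking=False):
                return
            # 2. Busy wait, but stop early if the holder has been preempted:
            #    spinning then just burns cycles the holder needs.
            deadline = time.monotonic() + self.SPIN_BUDGET
            while time.monotonic() < deadline and not self.holder_preempted:
                if self._lock.acquire(blocking=False):
                    return
            # 3. Park: block in the kernel (futex-style) until released.
            self._lock.acquire()

        def release(self):
            self._lock.release()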
