Posts in Ui (20 found)
Gabriel Weinberg 1 week ago

As AI displaces jobs, the US government should create new jobs building affordable housing

We have a housing shortage in the U.S., and it is arguably a major cause of long-term unrest about the economy. Putting aside whether AI will eliminate jobs on net, it will certainly displace a lot of them. And the displaced people are unlikely to be the same people who will secure the higher-tech jobs that get created. For example, are most displaced truck drivers going to get jobs in new industries that require a lot of education?

Put these two problems together and maybe there is a solution hiding in plain sight: create millions of new jobs in housing. Someone has to build all the affordable homes we need, so why not subsidize jobs and training for those displaced by AI? These jobs will arguably offer an easier onramp and are sorely needed now (and likely for the next couple of decades as we chip away at this housing shortage). Granted, labor may not be the primary bottleneck in the housing shortage, but it is certainly a factor and one that is seemingly being overlooked. There are many bills in Congress aimed at increasing housing supply through new financing and relaxed regulatory frameworks. A program like this would help complete the package.

None of this has been happening via market forces alone, so the government would therefore need to create a new program at a large scale, like the Works Progress Administration (WPA) at the end of the Great Depression, but this time squarely focused on affordable housing (and otherwise narrowly tailored to avoid inefficiencies). There are a lot of ways such a program could work (or not work), including ways to maximize the long-term public benefit (and minimize its long-term public cost), but this post is just about floating the high-level idea.

So there you have it. I’ll leave you though with a few more specific thought starters:

- Every state could benefit since every state has affordable housing issues. Programs become more politically viable when more states benefit from them.
- Such a program could be narrowly tailored, squarely focused on affordable housing (as mentioned above), but also keeping the jobs time-limited (the whole program could be time-limited and tied to overall housing stock), and keeping the wages slightly below local market rates (to complement rather than compete with private construction).
- It could also be tailored to those just affected by AI, but that doesn’t seem like the right approach to me. The AI job market impact timeline is unclear, but we can nevertheless start an affordable-housing jobs program now that we need today, which can also serve as a partial backstop for AI-job fallout tomorrow. It seems fine to me if some workers who join aren't directly displaced by AI, since the program still creates net new jobs we will need anyway and to some extent jobs within an education band are fungible. We will surely need other programs as well to help displaced workers specifically (for example, increased unemployment benefits).

Stratechery 4 weeks ago

An Interview with Rivian CEO RJ Scaringe About Building a Car Company and Autonomy

Good morning, Today’s Stratechery Interview is with Rivian founder and CEO RJ Scaringe. Last week Rivian held their Autonomy and AI Day, where the company unveiled its plans for a fully integrated approach to self-driving. Rivian is building everything from its own chips to its own sensors — including video, LiDAR, and radar — and if all goes well, the company will supply a multitude of companies, particularly Volkswagen. In this interview we cover all aspects of Rivian, including the long path to starting the company, production challenges, and why partnerships with Amazon and Volkswagen are so important, and point to relationships in the future. We also dive into autonomy, and why Rivian is taking a different path than Tesla, plus I ask why CarPlay isn’t available on Rivian vehicles, and what that reveals about their nature. As an aside to podcast listeners: due to a mind-boggling mistake by me, the first 20 minutes of this podcast are considerably lower audio quality. I forgot to hit ‘Record’, so the segment that remains is what the Rivian PR representative captured on her phone. I’m incredibly grateful for the save. As a reminder, all Stratechery content, including interviews, is available as a podcast; click the link at the top of this email to add Stratechery to your podcast player. On to the Interview: This interview is lightly edited for content and clarity. RJ Scaringe, welcome to Stratechery. RJS: Happy to be here. We are here to talk about Rivian and your recent Autonomy and AI Day. Before we get to that, however, I want to learn more about you and your background, and how you ended up sitting with me today. You were, as I understand it, into cars at a very early age. RJS: I’ve been around cars as long as I could remember. As a kid I was restoring and working on cars. I spent time in restoration shops helping and slowly learning how to do more than “help”, but actually really help. And then around the age of, I guess 10-ish, decided I wanted to start a car company. Oh, okay. So there was no like, “Oh, I got a computer and started typing out BASIC”, this is straight cars all the way. RJS: Yeah, I knew I wanted to start a car company. And at that point when you’re a kid, you have no idea what it entails, you have no idea what the business is going to be, but I just knew that it was something I wanted to do and I sort of started charting out my future path with that as the end state goal, with that as the context. So I went and worked as a machinist, I ultimately went to school for engineering, I did a degree in mechanical engineering, then I went and did a master’s and a PhD focused on automotive. And then the day after I finished my PhD, I officially started Rivian. So why did you think it was necessary to do that level of education? Not just a bachelor’s or not just gain experience, but to go all the way through to the PhD? RJS: It was actually pretty intentional. I knew that to start a car company, I was intellectually honest with myself that it would take a lot of money and I knew that I didn’t have any money. So that meant for me to do this and be successful, I would need to get other people to invest money into the idea and typically in the tech space, you could start something with not a lot of capital that you can make a very crude version of your first product… Because it’s software not hardware. RJS: Exactly, and I also knew that I didn’t want to go work for 25 or 30 years to accumulate experience that would make me credible.
So I was like, “What’s the fastest path to credibility?”, and my thought was it would be a PhD. I said, “If I get a PhD from a top school” — I went to MIT — I thought that would be some earned credibility that would make it more likely that investors would want to get into the company. I didn’t grow up around venture capital, I had no idea around any of these things, that was my hypothesis. And amazingly, it proved to be a key element in Rivian’s journey, because one of our early investors, one of our earliest large investors, I should say, was someone that I was introduced to through MIT and was an alumni of the school and I was connected with them through the provost, and that was ultimately what led to some of the really critical early capital into the business. So is it the end state that the PhD was totally worth it, but the actual academics was completely incidental to this introduction? RJS: Yeah, and I think as is the case I think with all higher education, the biggest takeaway is to learn how to learn, and to learn how to solve complex problems. I think undergrad, you’re learning how to learn. Graduate school, particularly for technical degrees, you identify a problem and you work really hard to solve that problem, and you have broad responsibility and broad scope on doing that activity, and you build confidence and you build skillsets around problem solving. But the problems are going to change in the course of a life or in the course of your career, the things I was working on 20 years ago have nothing to do with Rivian whatsoever. Well, I’m actually kind of curious about that. What were the things that you were working on 20 years ago, and why aren’t they applicable? RJS: Well, in the case of automotive research, in 2005 the work that was funded, which is the kind of work that you do as a PhD student, so you get sponsored by companies or by grants, was to look at making engines, internal combustion engines more efficient, that was primarily the focus. And so I was working on something called a homogeneous charge compression ignition engine, which is a different type of combustion. We’d compression ignite a pre-mixed fuel air mixture, very hard to get- Like diesel? RJS: It’s like diesel efficiency with gasoline-like cleanliness is the idea. Obviously, it’s not a technology that has any runway and makes any sense in the future. (laughing) I’m hearing about if for the first time right now. RJS: Yeah, so it was an interesting project. It was really a study in software controls, because that was the challenge of this project. But I didn’t take a single piece of that and use it in starting Rivian. Now, that’s different, some folks turn their PhD into the foundation of a business and start a business off the back of it, I had the benefit of just being in the automotive lab, I had the benefit of working closely with car companies. Big, large car companies were funding a lot of the work and it further solidified my view that I didn’t want to go work at one of those companies, and I thought the likelihood of me learning the necessary skills is lower working at one of those places than me learning by doing, by just going and starting a company as a 26-year-old PhD graduate. Right. So if you start out with cars as a child and you’re coming all the way up, at what point did you know that the car — you wanted to start a car company, at what point did you know it was going to be electric and not internal combustion? RJS: That was far less clear. 
I wanted to make cars very efficient and I wanted to design cars that would essentially help define what the future of state would look like. But when you’re 10, you have no idea what that means. When you’re 20 — and at this point, this was early 2000s, it still wasn’t that clear — and so it didn’t really become clear until I started the business. Even before starting the business, one of the concepts that was competing for what Rivian ultimately would become was this idea I had for a pedal-powered car, which at the time I was thinking could be a hybrid-electric, except the hybrid, it was human-plus-electric drive and amazingly, full circle, that happens with e-bikes. E-bike is the most popular electric— Oh right, yeah. RJS: But 25 years ago, 20-plus years ago, that wasn’t clear that e-bikes were going to be an explosive success as they’ve been. And then I created within Rivian, a skunkworks team that’s now spun out into a new company to actually focus on this pedal and pedal-hybrid electric vehicles. So we have a quadricycle we’re doing with Amazon as a first big customer, but the name of this company is Also, and the idea of this spin-out from Rivian is that if you want to electrify the world, you need to electrify vehicles, but you also need to electrify everything else, and so Also is doing everything else. So Tesla started in 2003. Was there any inspiration or connection there, or is it just incidental that it ended up being kind of around the same time? RJS: Yeah, so they started in ’03, I started in ’09. Of course I was aware of Tesla, but Tesla launched their first car, the Roadster , before I even started Rivian and so they launched the Roadster and then they were working on the Model S , which it doesn’t get talked about a lot, but there was a time when the Model S was considering using a series hybrid architecture as well. Ultimately, they went pure EV but that was in like 2008, 2009, and I started the company just as, my view is there’s going to need to be a lot of successful choices, and I’d been on that mission for a while — what I didn’t expect is just the process of raising capital is really hard. So is that actually where they did help you a lot, just because eventually once they got over the hump and it was a successful venture, did that make it easier for you in the long run? RJS: I think so, and I think Tesla was the existence proof that I’d say more than raising capital, what Tesla did is they showed that electric cars could be cool. RJS: And they did that with the Roadster. So they launched this Roadster, they took a Lotus Elise , they re-engineered it, they made it electric. It was super fast, it was really cool, this was way before anybody had thought about electric vehicles as something that could be fun or fast accelerating. Now it’s hard to believe that 20 years ago this was the case, but at the time they really took electric cars from this perspective like golf carts, to like, “Oh, this can be a highly capable performance machine”, and that just shifted mindset, and that was important. So you start Rivian in 2009, I believe the first vehicle comes out in 2021 . That’s a long time period, what was going on for those 12 years? Are these painful memories? RJS: No, no, no, no, they’re useful memories. In the beginning you have no capital, so you can’t realistically make progress on building something like a car unless you have some level of capital. 
So if you’re spending $1 million a year, you need to spend 5,000 times more than that to ultimately launch a car, something like that, maybe more, and so you’re not able to actually make real progress, you’re just working on demos and proof of concepts. And we didn’t even start with $1 million, we started with zero. So the first financing was I refinanced the house that I owned, which is comical when I think back now that my level of conviction and optimism. No, it’s awesome. RJS: I thought, “I’ll refinance my house, take the $100,000 to get out of it and use that to start a car company”, so that was what we did. But it’s very hard to then hire, we’re just getting a semblance of traction to have some capital, some amount of money that we could actually make real progress on a product. Right. And this was all — you still didn’t really know for sure what you were going to build, right? RJS: That’s what I was going to say, it’s actually really helpful that in those years we didn’t have capital, because we could have started building the wrong thing, so it provided me this few year period where I was learning how to run a company. I’d never run a company before, I was learning how to lead teams, I was learning how to hire, I was learning how to have hard conversations, I was learning how to raise capital, I was learning about strategy and design and brand and all these things. And so it was a really wonderful period of time for me because we were iterating so dramatically, so significantly on the strategy, the product, the type of company we’re building, the skillsets we want to accumulate and build in-house, in ways that we couldn’t today. I couldn’t walk in the door to Rivian, say, “All right, everybody, we’re going to do a completely different set of products, get ready”. You were on an e-bike back then, you could sort of go where you wanted to. The bigger you get, the more locked in you are. RJS: Yeah, the whole team could fit in one little room with one little table, investor management was very straightforward, it just gave us the freedom to be very iterative. And I look back and I’m so thankful that happened because this squiggly path led us to what Rivian ultimately became, this idea of building a really strong brand around enabling and inspiring adventure that scales across different price points and form factors. We came up with the concept for R1 as the flagship product, then we would follow that with R2 , which is now about to launch, and then R3 , which is going to launch shortly thereafter, and things took a lot longer. Once we even got all that defined, we still had to raise a lot more capital. We then raised a lot of capital and we’re on the path of execution, and there’s some big unplanned events. COVID was very, very, very challenging and maybe the worst possible time you could imagine it. Right. You’re just about to launch. RJS: Yeah, so trying to build a plant starting in 2020, which is sort of wild, and then turn on a supply chain with a bunch of suppliers that didn’t want to work with us, we had to pay extremely high prices to them just to get them to provide us components. Just a bunch of things that when you’re planning it years before, you don’t think, “Well, there’s going to be a supply chain crisis, there’s going to be a pandemic”, and there’s going to be all these externalities that make it really hard to start in that moment. You mentioned getting to this adventure brand identity, the R1T, R1S being your initial products, what was the process of honing in on that? 
Why did you decide that was the way to go? I mean, from the outside, you view obviously, Rivian, you’re always going to be compared to Tesla in a certain respect. They have this futuristic car looking very aerodynamic, and Rivian comes along, it’s like, “Yes, thank God, a pickup truck”, not a Cybertruck, they got an SUV. That’s my perception of the outside, but what was it like inside? RJS: Yeah, I mean, it wasn’t too different from that. We recognized that in order for us to earn the right to exist, we needed to do something that was unique and could stand on its own, and so some of the early things we thought about, we’d originally thought about doing a sports car, we realized that we were just going to be too close to what Tesla had done, and what Tesla had done well, by the way. So we went through deep soul searching to say, “What are the things we’re passionate about?”, “What are the things we want to enable?”, “What are the things that are going to matter?”, once everything’s electric, imagine every car on the road is electric, you can’t say we’re differentiated because we’re electric. “Why are you differentiated?”, “What is the reason for someone to choose to buy our products?”, and so we went through a lot of those thought processes and came out of it with this idea of preserving and inspiring the adventures that you want to have in your lifetime, the kinds of things you want to take photographs of. The reason why you want cars is you can go anywhere— RJS: Yeah, you can do all this, you can go to your grandparents’ house, you can go to the beach, you can go climbing. So that led to this really clear vision, which then led to product requirements. “Okay, if we want a car that’s going to enable and inspire adventure, what does it want to look like?”, “What are the features it needs to have?”, so storage becomes a really big consideration, being able to drive on any type of terrain becomes a big consideration, and then you say, “Okay, what’s the vehicle form factors that are going to do that?” — a re-imagination of a pickup truck and a re-imagination of a large SUV, that’s a great flagship. So it wasn’t always as direct as that single sentence, sometimes it took us a month to get there or more, but a lot of iteration, a lot of the product concepts, some of the early R1T stuff that we put together looked really futuristic and not inviting, which is the word we use all the time, like inviting you to use it. Not wanting to get dirty, or it didn’t want to get used or you don’t want to put a surfboard on the top so we became really very intentional around, “What is a Rivian?”, “What is not a Rivian?”, so we do all these exercises from a design aesthetic point of view, which of course now we know our aesthetic, but in 2015, we had no idea what a Rivian aesthetic was, so we had to define that. So we do these is/is not exercises, it’s all things that it’s amazing sitting here today to see it having played out where people actually connect and resonate with the brand that we were hoping to. It’s an incredibly strong brand, you can identify it right away. You mentioned the COVID production challenges, there’s also a bit where actually scaling up production is just really hard, even if everything is perfect. How do you distribute the blame between COVID and between the fact that actually, “This is much harder than I thought it would be”, in terms of the challenges in getting out the door? RJS: Yeah. 
I think we made a tactical or strategic error, which is we decided to launch three vehicles at the same time. Yep, that’s one of my questions coming up. RJS: Launching any vehicle is really hard. So just to put this into perspective, you have around 30,000 discrete components, which you purchase as a company as maybe 3,000 items and the reason there’s more discrete components is you buy something like a headlight as a single assembly, but it has many components in it. But all those tier two, tier three, tier four supply components, any one of those can stop production if they’re missing and so you still have to think about it considering the full complexity of all the parts, every single mechanical part that’s in the vehicle and so turning on a supply chain for the first time is hard for any product. For a car, it’s really hard. And for a car when the supply chain doesn’t want to work with you, meaning business is thriving, it’s very different than let’s say 2010 or ’11 or ’12 when the suppliers were all beat up from the recession and were willing to take any business. In 2020, they were busy, they didn’t want to take on this new customer Rivian with an unproven brand and unproven product. So it was very, very hard to get them to work with us and so just getting all those suppliers to ramp at the same rate on one car would’ve been tough. And the reason I say same rate, if some ramp faster than others and you have inventory issues. Right, then you have these working capital problems. RJS: You have to ramp at the same time so you can make a complete car and sell it, sounds so simple. (laughing) No, don’t worry. It does not sound simple at all. RJS: So we were doing that across three vehicles at the same time, that was already a big — the R1T, R1S launched at the same time plus a commercial van . And then on top of that, we had COVID, which made everything more challenging. Yeah, so it was maybe the most perfect of perfect storms for difficulty and so I wouldn’t use COVID as an excuse or I’m not putting blame there, I’m just saying it’s just a reality, it was just a reality. Oh, for sure. No question, I don’t think anyone is denying that. RJS: But if I were to do it over again, I would’ve launched probably the SUV first, spaced out maybe 12 months, then launched the truck, spaced out probably 12 months, launched the van, and had smoother launches that consume less capital that would allow us to get to profitability faster. But hey, you learn. And so here we are in R2- RJS: Yeah. So R2 is like, we’re launching one build combination, we’re launching a launch edition, we’re not launching R3 at the same time. You’ll laugh at this, Ben, there was a lot of debate like, “Oh man, R3 is so cool”, we had thousands of customers like, “Oh, we can’t wait to get an R3 as well, can you guys launch that quicker?”, and we’re like, “Should we try to launch R2 and R3?”, “No, no, don’t do it! Don’t do it! Hold it back!”, and so we held back. It’s like you needed to hire someone back in 2021 that says, “If we ever consider doing this again, stomp on the table and say ‘No’.” RJS: As a product person, you have all these ideas, you want to see them out in the world as fast as possible so simplicity and focus has been a major emphasis for us and so the entire business is laser locked on launching R2 and it’s a beautiful thing. We don’t have other programs that we have to manage, it’s like, “Let’s get that, that has to ramp quickly, that’s what’s going to drive us to profitability”. 
It’s key for cash flow, we have this enormous R&D spend we’ve created intentionally to build out all these vertically integrated technologies , whether it’s our chips, our software, our compute platforms, our high voltage architectures, that the scale that R&D necessitates the scale of more than just R1, more than just a flagship product that needs a mass market product, and that’s what R2 brings us. Got it. So you mentioned the van. Ideally from a production standpoint, you do SUV, then you do truck, then you do van. The van though came with a lot of money from Amazon , is that a critical component in why you launched it maybe sooner than you should have? RJS: No, at the time I didn’t think it was sooner. I mean, at the time I thought it was the right thing to launch them all at the same time. Amazon’s still our largest shareholder and they’ve been a great partner and they were an investor in us when we were private, as you said. But what’s so exciting about that program is it took a space that has the logistics based on last-mile e-commerce space that has such a clear value prop for electrification, meaning the vehicles start and end the day in the same spot, which is a great thing from a charging point of view. You know what they’re going to do, that you can deterministically control what they’re going to do in terms of mileage. You know your 99th percentile route in terms of number of miles, and you know your one percentile, so you can really optimize it for total cost of ownership, and so that’s what we did. So we went about and said, “Let’s make the ultimate delivery van, let’s make it the most cost-effective way to deliver”. So you talk about the complexity of doing three vehicles. Is that just in terms of getting started or is there a production capability, like you only do so many things, or is that part fine? It’s just the part of getting started? RJS: Part of the challenge is when you’re launching a manufacturing and supply chain infrastructure for the first time, in our case, we didn’t fully appreciate all the things you need to be really good at to do it and so we tried to very quickly learn how to be able to launch multiple programs at the same time, which eventually Rivian should be able to launch multiple vehicles in the same year at the same time, but we just didn’t have the maturity of process, maturity of our organization or the depth of teams to be able to support that. The issue is not things you can plan for, it’s all the little things you don’t plan for, and there’s all these little things, each of which requires problem solving. So I used to describe it in 2021 and 2022, is it’s not like there’s some giant unlock. Like, “If we just solve this, we will make more vehicles and we’ll get our cost structure in line”. There was just a stack of thousands, truly thousands of little things that needed to be adjusted or changed or negotiated and I think the thing that compounded all this that was really hard, is a lot of those issues were at our suppliers, and then those suppliers had a lot of leverage over us, because they know that in that time where they couldn’t get enough- If this doesn’t get done, you’re done. RJS: We broke up with a lot of these suppliers, but some of them would just say, “We want you to pay us twice what we previously negotiated if you want parts”, and we’d say, “No”. and they’d say, “Okay, fine, we just won’t send you the parts”, and we’d say, “Okay, how about one and a half?”. We just had no leverage. 
So that’s changed so dramatically and we see it with R2. R2’s the first, I’d say, clean sheet from a supply chain point. Even with the updates we made to R1, we were able to get rid of a lot of that and they call it inflation-related, COVID-related cost growth that was born out of a lack of leverage that we had, R2 is the first time we were able to really reset the negotiations. You think of it from the perspective of if you’re Volkswagen, the leverage is the other way around, which is Volkswagen has so much scale and so many diverse sets of suppliers that they could say, “Hey, if you don’t bring your costs down, we’re just going to switch to another supplier”, we didn’t have that. We couldn’t say, “Well, look, we’re going to pull this other program from you” — it was no leverage, so we sort of were complete takers in that. Yeah, that makes sense. Before we got into the AI stuff, I did want to ask about the VW partnership . This includes access to your electric vehicle software, electrical architectures, you get supply chain expertise from them. How do you characterize this deal as a whole? I should also mention a sort of massive investment on their side as well, give me the framework of that deal and why it’s important. RJS: Yeah, it’s a $5.8 billion deal, some of which is technology licensing, some of which are investments. Right, I was going to ask about that. Some of it is just actually putting money in the company, and some of it is they’re going to license your software and things like that going forward. RJS: Yeah, and a lot of those are upfront licensing fees, most of which have already been paid and before I get to the business of it, it’s important to talk about the mission of it. We’ve spent a lot of time developing what we call a zonal architecture, but essentially think of it as a number of computers consolidating into one that perform a wide array of functions across a physical zone of the vehicle and it allows us to do things like over-the-air updates very seamlessly because rather than having a bunch of smaller function or domain-based electronic control units, little mini computers run the software for different functions, we run all this software for those functions on one computer on our OS, which makes it much easier to update. And so the strategy there was, “Boy, we’ve spent a mountain of investment building this tech stack, it’d be really nice to see it applied in another way. Yup. You need to get leverage on that investment and you just don’t have the volume by yourself. RJS: And it aligns to our mission in terms of enabling more electric vehicles to get highly compelling electric vehicles on the road and then it gives us a lot of scale, scale for sourcing the components that are shared and then it gives us the benefits of other, what we think of as joint sourcing agreements, so sourcing partnerships that can exist with Volkswagen. It’s been a great relationship, those types of relationships are very, very hard to build because it does require buy-in from the top so one of the things that allowed us to work so well with Amazon, I mean, you think about Amazon and it’s one of the largest companies in the world, certainly the largest e-commerce company in the world, and imagine they go out and say, “We’re going to build our future logistics network around a van that’s being not dual sourced, but single sourced to one company” — this is in 2019 — “has never built a car before at scale, and they’re like a startup”. 
But that was born out of a great relationship that I had with [Former Amazon CEO] Jeff [Bezos] and Jeff’s trust in supporting us and that enabled them to really lean in with us and lean in in defining the product, defining what it was, that was a really big leap. So we’ve built, I’d say, organizationally, really great capability of taking the strengths of being a fast-moving startup and working with very large companies as partners and in the case of Volkswagen, my relationship with Oliver Blume , the CEO of the group — so Volkswagen Group is, we think of VW as a brand, but they’re a group — they’ve got Porsche, Audi, Lamborghini, SEAT, Škoda, these are brands that aren’t sold in the United States, but it’s the second-largest car company in the world, largest industrial company in Europe, a huge company. But having Oliver and I aligned just allowed us to really move through the deal mechanics and the deal structuring quite quickly. So this bit, as you sort of zoom out, the deal makes a lot of sense to me. Actually, I think it makes a lot of sense for both sides. RJS: Yeah, it’s a win-win. They get expertise that they’re not going to develop internally. I’ve had plenty of German cars, the software is okay for what it is, I don’t think it’s going to sort of go where you’re going to go. You also have on your side, you can do these huge investments like you talked about last week , and we’re about to transition into that, with the promise of scale that is much more than you can certainly deliver today. Is there a bit of you though is like, “If we had ramped up correctly, if we had not done multiple vehicles, we could actually be at scale, we could keep this all to ourselves”, or is this ultimately the best outcome that you’re sharing with them in the long run? RJS: I think in hindsight, I wish we’d ramped up more quickly, there’s things you’d change, but they’re also all things you’d learn from. We don’t spend any time lamenting them or anything like that. But to be clear, both in the case of our in house software and zonal controllers, which is what we’ve done with in Infotainment, which is what we’ve done with Volkswagen, as well as our autonomy platform and AI platforms , which is separate from the Volkswagen venture. Is that part of the deal? RJS: No, that’s not part of the deal. RJS: That’s 100% Rivian. RJS: But in both cases, we developed them thinking that we would eventually leverage this, not just with our own products, but with other companies as well. Got it. Okay. RJS: And so Volkswagen was, in many ways, the ideal first customer. And the reason I say it’s the ideal first customer, 1) it’s huge, as we’ve already described, but 2) it has the complexity of managing across many different brands, and so being able to support a company like Volkswagen Group, which spans very premium brands, like Porsche, down to one of the products that’s been announced that we’re doing together, the Volkswagen ID1 , which is a $22,000 EV, it’s the existence proof that we, Rivian, can support working across large complex organizations, across large ranges of price and product features, and across very different vehicle form factors. And if you’re another car company, you couldn’t look at Rivian and say, maybe before you could have, but now you couldn’t, say, “Well, I don’t think you could do this at this price point” — well, actually we cover every price point across the spectrum. So there’s an opportunity for other car companies to do the same thing. RJS: Absolutely, yeah. 
And now on the autonomy front, I think the opportunity there is actually bigger because this is a very, very hard problem to solve, it requires vertical integration in ways that are not typically — it’s just things that OEMs typically don’t do. Tell me your vertical integration story, because it is really interesting. You’re on last week, you talk about everything from your chip to your sensors to your software, you talked about building your own compiler. We are talking total front-to-back, end-to-end vertical integration. Why is that important? RJS: Yeah. It’s important to just talk about how autonomy is now being developed, and I do think for anyone listening to this, it’s very, very important to understand this because there’s perhaps some histories to how it was done before. The idea of a vehicle driving itself isn’t a new idea, that’s been something, it’s been in sci-fi movies for decades, but in terms of actual technology development, it started in, call it early 2010s, in that time range, so roughly 15, 20 years ago. The early platforms and what was done in terms of the approach up until the very early 2020s was something that was designed around a rules-based approach and so what you would have is you’d have a set of sensors, perception that would identify objects in the world, so all the things in the scene, so that’s cars, people, bikes, kids, balls bouncing on the street, everything that you can see, it would identify all those objects, it would classify the objects as to what they are, it would then associate vectors to those objects, acceleration and velocity, and it would hand all those objects and their classifications and their vector associations to a rules-based planner. The rules-based planner was a team of software developers’ attempt to codify what are the rules of the road. So, I’m going to oversimplify here, but think of it as a whole series of if/then statements. Totally deterministic, but the biggest spaghetti code mess you’ve ever seen because there’s so many possible exceptions and issues. RJS: It’s a giant, giant code base that’s trying to describe how the world works. And so, it wasn’t actually AI as we think about AI today, there was machine vision. Machine learning, neural nets, yeah. RJS: Yeah, there was machine vision for the object detection classification, but in terms of the planning and the actuation of the vehicle was very much a rules-based environment. Then along came the idea of neural nets, and the idea of transformers to do encoding, and that happened, of course, in the LLM world, but that’s also happening in the physical world. Everything can be a token. We think about it, everyone thinks of it in the context of letters and words, but everything can be a token. RJS: Yeah, everything can be tokenized and the whole world changed in self-driving, so everything that was done prior to, call it 2020, 2021 is largely throwaway, meaning the way the systems are now developed is you build, you need to have complete vertical control, it needs to be one developer that controls all the perception, because you don’t want a pre-processed set of outputs from a camera, you want the raw signals from a camera. If you have other modalities like a radar or LiDAR, you want the raw signals from those, you want to feed it in through a transformer-based encoding process early, so fuse all that information early, and build a complex, it’s hard to imagine in our human brains, but it’s a complex multidimensional neural net that describes how the vehicle drives.
Then you want to train that with lots and lots of data, and you’re training it offline. The word that gets used all the time is end-to-end, so it’s trained end-to-end from the vehicle through the human drivers back to the model and so, to do that well, you need a few ingredients, you need this vertically-controlled perception platform, you need a really robust onboard data infrastructure that can both trigger interesting data events, hold them, do something to them to make them a little easier to move off the vehicle, ideally through Wi-Fi, and a worst case through LTE, but mostly through Wi-Fi, all that data gets moved off the vehicles, and this is happening at millions and millions of miles accumulating just in the course of a day. And so all that data is moving off the vehicle, and then you’re training it on thousands and thousands of GPUs. You’re going around and around and around, and it gets better and better and better. That’s an approach that is so different, as I said from what was done before, but to do that, you need all those ingredients. Well, you need cars on the road. RJS: You need cars on the road. So, we looked at it, we launched in 2021 with our Gen 1 architecture, we almost immediately after that realized we needed a complete rethink of our self-driving approach. Right, that’s exactly what I was going to ask. Was this an issue where in some respects you launched later than you wanted to because all the supply chain issues, but then you actually launched earlier than you wanted to because you didn’t have the right sort of stuff on your cars? RJS: Well, we launched — and we didn’t realize, and this is the thing, and even some of our Gen 1 customers are not happy with this, but when we developed the Gen 1 system, this was on 2018, 2019, we didn’t know this big technical massive shift was going to happen. So, our Gen 1 architecture uses a Mobileye front-facing camera, and it uses — it’s a collection of things, it’s very classical rules-based approach, if you’re going to develop something around AI, it’s a completely different architecture, not a single shared line of code, not a single shared piece of hardware. So we started working in the beginning of 2021, right after launching on a whole new clean sheet, everything new, we didn’t try to morph anything over, it’s a complete melt and re-pour. In that new architecture, we designed cameras, we designed a new radar, we designed a new compute platform, we built, we call this our Gen 2 architecture. We built it around an Nvidia processor, we designed a data flywheel, we designed an offline training program. The vehicle launched in the middle of 2024, the features then were trained on a very small number of miles, which was our own internal fleet and now over the course of last year, we’ve built up enough data that’s allowed us this flywheel starting to spin. Yep. And that data is only coming from the Gen 2 vehicles, right? Not from the Gen 1 ones? RJS: Only Gen 2. Gen 1, it’s asymptoted, both in terms of capability and it has no value to us in terms of data, so only Gen 2. And so, in parallel to kicking off this Gen 2 platform, which we said, we need to get this in the field as fast as possible because we need to start the data flywheel, we also need to get better hardware so that when we have the model built, we can run it with a higher ceiling. That kicked off updates to the cameras that are going to our Gen 3 architecture, very importantly, an in-house silicon program. Why is that very important? 
RJS: Compute inference on the vehicle, we wanted to have — what would we have in our Gen 2 is around 200 TOPS [Trillions of Operations per Second], we wanted that to be closer to 200 TOPS per chip, so 400 TOPS total, sparse TOPS. Well, what’s going to be in Gen 3 will be 1,600 sparse TOPS, but importantly, we designed it specifically around a vision-based robotic platform. And so, the utilization of those TOPS is very high, much higher than what we see in other platforms that are more generalized, and then the power efficiency is very high, and then the cost is much lower. So we have a very, very high capability, low cost platform for which we can afford to put enormous compute in. All that is true, but the actual development of that is very expensive. Is this going to pay off with those lower unit costs, and that increased capability with just your vehicles, or like the VW deal, is this something that you’re going to be looking to sell broadly? RJS: Well, this is an interesting one. Even on its own within Rivian, just R1 and R2, it’ll pay off because the cost savings are so significant on the chips. But more than that, we believe we’re very, very — we’re spending billions of dollars in developing our self-driving platform, our level of conviction as this being one of the most important, I shouldn’t even say one of the most, the most important shift in transportation and transportation technology means that we want it to control the whole platform. Then once we control the whole platform, it makes it a very interesting system that can be provided to other manufacturers. And so, I think in time, the number of companies that will have all the ingredients to do what I’ve just described, they’d be very limited, I think there’ll be less than five in the West. Did you get any of this thinking from Jeff Bezos? Because there is a bit here where our cars are where we get out and develop this and prove it out, but the real payoff is to do the platform at scale across other entities. It sounds a little Amazon-like. RJS: Amazon’s our largest shareholder, and Jeff’s somebody I look to for a lot of inspiration on these kinds of things. So, certainly I think there’s some of that. We think of our vehicles as our own dog food, but we’re going to make a platform that’s so darn good that we think others will- You’ll sell a lot of vehicles. RJS: And if others aren’t buying our platform, we’ll monetize it through selling more vehicles, and we’ll grab market share. I think on both sides of that, we can win. I do think that it’s going to move far faster than anyone realizes. I think, the way I describe it is if you look at the last three or four years of development in autonomy, and you try to draw a line to represent the slope of improvement, and you look at the next three or four years, the two lines are completely unrelated. Totally agree. RJS: But the acceleration is going to be so fast. And what I’m surprised is people aren’t — I say this, I don’t think people fully realize it, but the LLM space should teach us that. Yeah. GPT-1, GPT-2, GPT-3, GPT-4. RJS: But look at the 1.0 architectures. Oh, which are rules-based. Yeah, to your real point, it’s exactly what it is. RJS: Rules-based. And look at the progress that was made on Alexa, let’s say, relative to the progress that’s happened on GPT-3, 4 now, and beyond, it’s just like they’re not even closely related. 
And so the same thing is happening in the physical world with cars, and if you don’t have a data flywheel approach, you’re just not in the game and there’s no way you can compete. And so, very few people have that, far fewer I think is right. A big differentiator between what you’re doing and what Tesla is doing, and we have to sort of come back to it, they shifted to the pure neural network approach, but they’re doing vision only. Do you just think that’s a fundamentally flawed decision? RJS: We have a different point of view. Right. Because you have radar and LiDAR too, is the difference there. RJS: Yeah. There’s a lot of alignment, and we both agree, and we’re both approaching it as building a neural net. So, I want to call that out that we have a very aligned view. Right. Your core philosophy is absolutely the same. And I think there’s an extent where Waymo is getting there as well. RJS: The same philosophy. And then it’s like, “How can we teach the brain as fast as possible?” is our question. They have the biggest fleet of data acquisition in the world, they have fewer cameras, that have far less dynamic range. When I say dynamic range, I mean performance on very low light conditions, and very bright light conditions. Right, yep. RJS: We have much better dynamic range that of course adds bill of material cost, but we did that intentionally. And then, we have the benefit of our whole fleet, all Gen 3 R2s, think of those as ground truth vehicles. They’ll have LiDAR and radar on them. Tesla just has a few ground truth vehicles that do have radar and LiDAR, but they’re trying to service the whole fleet. RJS: Yeah, I’m looking out the window here at El Camino and you just have to stand at the corner and see Teslas driving around and around everywhere. One will go by eventually, yeah. So that’s the question, is the benefit of putting radar and LiDAR on all your cars, is that just something you need to do now so you can just gather that much more data that much more quickly? Or is that going to be a necessary component for at scale, everyone has an autonomous vehicle and they need to have radar and LiDAR? RJS: Yeah, I think, the way I look at it is, in the absolute fullness of time, I think the sensor set will continue to evolve. But in the process of building the models and until cameras can become meaningfully better, there’s very low cost, very fast ways to supplement the cameras that solve their weaknesses. So seeing through fog we can solve with a radar, seeing through dense snow or rain we can solve with a radar, seeing extremely far distances well beyond that of a camera or human eye, we can solve that with a LiDAR, our LiDAR is 900 feet. And then the benefit of having that data set from the radar and the LiDAR is you can more quickly train the cameras. The cameras, when I say train, it doesn’t mean we’re in there writing code to do this. I think my audience broadly gets how this works, yeah. RJS: The model understands this and so you feed this in and the neural net understands because you have the benefits of these non-overlapping modalities that have different strengths and weaknesses to identify, “Is that blurry thing out there actually a car?”, “Is it a person?”, “Is it a reflection off of a building?”, and when you have the benefit of radar and the benefit of LiDAR, that blurry thing way off in the distance that the camera sees starts to become — you can ground truth that much faster. And then you teach your camera to figure out what it is. 
RJS: Then your cameras become better, and so that’s our thesis. And of course, that’s important that we have a thesis that’s different than Tesla, if we had an identical thesis to Tesla on perception- They just have way more cars out there. RJS: Yeah, the only way to catch up is with building a fleet of millions of vehicles, we want to catch up faster than that. So is it also sort of this advantage that — to what extent do you feel the auto industry, you start out and you’re sort of the outsider, you can’t get suppliers to help you, they’re ripping you off, all the sorts of problems you talked about. Now you’re like, I can imagine Volkswagen at a minimum is looking at you, “Please figure this out, we have a relationship, we can sort of jump on if need be” — do you get that sense more broadly from the industry? Because I don’t think anyone expects Tesla to share their technology, Google is sort of its own thing, do you have the potential to be the industry champion in some ways? RJS: We hope. I mean, I think every manufacturer has three choices, it’s pretty simple. They’re either going to develop their own autonomy platforms, they’re going to buy an autonomy platform, or they’re going to make this not a priority and they’re going to lose market share. But the last one, you have to accept that in not too much time, if you don’t prioritize this, you will lose market share. It’d be like trying to sell a house without electricity, it’s going to become so fundamental to the functioning of the vehicle. Why do you think that autonomy is so tightly tied to electric vehicles? Because there’s no reason an ICE vehicle couldn’t be autonomous. RJS: No, no. It’s more coincidence, it’s funny. I’d say autonomy, connectivity and modern infotainment and electrification are all completely separate topics, so there’s no reason they have to converge into one thing. It’s more just coincidence that all these things happen to be occurring at the same time and the electric vehicles tend to be the more advanced vehicles because they’re on new architecture. So it’s why you start to see from other non-pure-EV manufacturers that their EVs tend to be the most advanced but autonomy doesn’t care if it’s an engine or if it’s an electric motor. Right. It makes sense that’s just how it happened historically. I do need to ask this question, I think I know what the answer is, but people will be mad at me if I don’t ask. Why is there no CarPlay in Rivians? RJS: It is a good question, we get asked that a lot. We’re very convicted on this point. We believe that the aggregation of applications and the experience, and importantly now with AI acting as a web that’s integrating all these different applications into a singular experience where you can talk to the car and ask for things and where it has knowledge of the state of health of the vehicle, the state of charge, distance, outside temperature, everything becomes much more seamless in time if the vehicle is its own singular ecosystem versus having a window within the vehicle that’s into another ecosystem. And is that the issue, just the implementation effort on your side or that the customers are actually short-circuiting themselves?
RJS: We could turn on CarPlay really quickly, but then you end up with — you either enter into the CarPlay environment and it’s like Apple’s, they get to play the role of aggregating what apps are there and how they decide what’s integrated, how it’s done, versus us, and I think where it becomes really important is when AI happens. Our view is a lot of the applications will start to go away and you’ll have your AI assistant. There may be things happening below agent to agent under the covers, but when you say, “Rivian, tell me what’s on my schedule for later today”, you don’t care that it has to go agent to agent to Google Calendar to pull that out, you just want the information, that interface becomes really important, it becomes so fundamental to the user experience and the whole user journey. So as we’ve thought about this, inserting any sort of abstraction layer or aggregation layer that’s not our own just is extremely risky and you start to build dependencies on that that are hard to reverse. Is there a bit where Tesla covered for you because they don’t have CarPlay either, but now there’s a rumor they might add it and it might make it a little more difficult to hold your convictions? RJS: Maybe. As it’s always the case on these things, I think there’s people that are really used to having CarPlay and our goal is to make it such that the car is so good that they don’t even think of that. And if they were to go back to CarPlay, they’d miss having the integrated holistic experience that we can create. It’s interesting because I thought you were just going to go more on the — you just gave this strong pitch for integration and top-to-down, side-to-side, that wasn’t the core to your answer. I think your answer made a lot of sense in the future best interface, I can see your customers getting themselves on a local maxima because that’s what they’re used to and it’s there and they’re missing how much better it can be. But I guess it goes to your point, infotainment and electrification and autonomy, those are all separate areas. RJS: So think of it like this. The challenge is CarPlay is not everything, so if you have CarPlay and the vehicle’s driving itself, in most CarPlay instances, it takes over the whole screen. RJS: There are instances where you could have a screen in a screen, but then that is very — I always joke, this is something Apple would never do. They would never have a screen in a screen on their own devices. They would say, we want to have one experience and so you have one screen that’s putting up information that’s very specific to the vehicle operation that are things that are like, “Is the door open or closed?”, and then you have another that’s mapping— It’s competing. RJS: It’s like you have two different UIs playing out and I just think it’s poor UI, it’s a poor user experience. The only reason people want that is they’ve been trained because they’re in cars that have such bad UI that the life raft to escape the horrible UI that is embedded in the car is CarPlay, and CarPlay is a really important function for that. If I’m in a non-Rivian or non-Tesla and I get in, it’s like a disaster and I’m like, “Oh thank goodness there’s CarPlay”. It has some thoughtful UI, but we have a really thoughtful UI and the few things that are missing we’ve been adding. So we brought Google Maps in, which was a big one, there’s more mapping platforms that’ll come in over time. We’ve got all the music platforms, including Apple Music, natively integrated. 
But soon with AI integration, I just think a lot of this fades away because you want a singular layer and that may mean we’re running ChatGPT to do some portions, we may be running Gemini to do other portions, but we get to be the arbiter of all this stuff under the surface. What are we using for onboard diagnostics? What are we using for on the edge knowledge? What are we using for cloud knowledge? All that we get to build and decide on ourselves. And I think importantly, given how fast the models are moving, we have the ability to plug or unplug different models at our discretion, we can decide what’s the best model to use. For the record, I agree with your decision. And I think if Tesla added CarPlay it would be a bad decision. And the reason is, I think unless you own one of these vehicles, I have a Tesla, I don’t have a Rivian, but the tangible difference is, and people say this, but until you experience it it’s not quite clear, it is a computer on wheels, and the way I think about it is for ICE cars that I’ve had, automatic windshield wiping is like a luxury feature or automatic lights. If you step back it’s like, “Wait, this is a software thing that we can do it once and do it generally, of course even your cheapest Tesla is going to have this feature and then you get to remove the physical control and you should never even need to interface with that”. And if your car is a computer first and foremost, you have to go in on the user interface, it’s nuts to put something else there, even if people are crapping about it in the short run. So there’s my pitch for you for that answer next time. RJS: And I also think that people that are in Teslas and Rivians that are actually driving it, the number of people that actually complain about it is very, very low. The number of people that say they’re not buying Rivian because of CarPlay is a higher number, but once you get into it, you’re like, “Oh, what was I worried about? This is really good!”, and I think the same trend exists for Tesla. Yeah. RJ, it was very good to talk to you, thanks for coming on, I’m excited to see how this develops. RJS: Yeah, this has been great. Thanks so much. I appreciate the time, Ben. This Daily Update Interview is also available as a podcast. To receive it in your podcast player, visit Stratechery . The Daily Update is intended for a single recipient, but occasional forwarding is totally fine! If you would like to order multiple subscriptions for your team with a group discount (minimum 5), please contact me directly. Thanks for being a supporter, and have a great day!

0 views
Simon Willison 1 months ago

Useful patterns for building HTML tools

I've started using the term HTML tools to refer to HTML applications that I've been building which combine HTML, JavaScript, and CSS in a single file and use them to provide useful functionality. I have built over 150 of these in the past two years, almost all of them written by LLMs. This article presents a collection of useful patterns I've discovered along the way. First, some examples to show the kind of thing I'm talking about: These are some of my recent favorites. I have dozens more like this that I use on a regular basis. You can explore my collection on tools.simonwillison.net - the by month view is useful for browsing the entire collection. If you want to see the code and prompts, almost all of the examples in this post include a link in their footer to "view source" on GitHub. The GitHub commits usually contain either the prompt itself or a link to the transcript used to create the tool. These are the characteristics I have found to be most productive in building tools of this nature: The end result is a few hundred lines of code that can be cleanly copied and pasted into a GitHub repository. The easiest way to build one of these tools is to start in ChatGPT or Claude or Gemini. All three have features where they can write a simple HTML+JavaScript application and show it to you directly. Claude calls this "Artifacts", ChatGPT and Gemini both call it "Canvas". Claude has the feature enabled by default, ChatGPT and Gemini may require you to toggle it on in their "tools" menus. Try this prompt in Gemini or ChatGPT: Or this prompt in Claude: I always add "No React" to these prompts, because otherwise they tend to build with React, resulting in a file that is harder to copy and paste out of the LLM and use elsewhere. I find that attempts which use React take longer to display (since they need to run a build step) and are more likely to contain crashing bugs for some reason, especially in ChatGPT. All three tools have "share" links that provide a URL to the finished application. Examples: Coding agents such as Claude Code and Codex CLI have the advantage that they can test the code themselves while they work on it using tools like Playwright. I often upgrade to one of those when I'm working on something more complicated, like my Bluesky thread viewer tool shown above. I also frequently use asynchronous coding agents like Claude Code for web to make changes to existing tools. I shared a video about that in Building a tool to copy-paste share terminal sessions using Claude Code for web . Claude Code for web and Codex Cloud run directly against my simonw/tools repo, which means they can publish or upgrade tools via Pull Requests (here are dozens of examples ) without me needing to copy and paste anything myself. Any time I use an additional JavaScript library as part of my tool I like to load it from a CDN. The three major LLM platforms support specific CDNs as part of their Artifacts or Canvas features, so often if you tell them "Use PDF.js" or similar they'll be able to compose a URL to a CDN that's on their allow-list. Sometimes you'll need to go and look up the URL on cdnjs or jsDelivr and paste it into the chat. CDNs like these have been around for long enough that I've grown to trust them, especially for URLs that include the package version. The alternative to CDNs is to use npm and have a build step for your projects. I find this reduces my productivity at hacking on individual tools and makes it harder to self-host them. 
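To make the single-file shape concrete, here is a trivial but complete example in the same spirit (a sketch, not one of the tools linked in this post): a JSON pretty-printer with a copy-to-clipboard button, no build step and no framework. A third-party library would just be one more CDN script tag.

```html
<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>JSON pretty-printer</title>
  <style>
    body { font-family: sans-serif; max-width: 40rem; margin: 2rem auto; }
    textarea { width: 100%; height: 8rem; }
    pre { background: #f6f6f6; padding: 1rem; overflow-x: auto; }
  </style>
</head>
<body>
  <h1>JSON pretty-printer</h1>
  <textarea id="input" placeholder="Paste JSON here"></textarea>
  <button id="format">Format</button>
  <button id="copy">Copy to clipboard</button>
  <pre id="output"></pre>
  <script>
    const input = document.getElementById("input");
    const output = document.getElementById("output");
    document.getElementById("format").onclick = () => {
      try {
        // Re-serialize the parsed JSON with two-space indentation.
        output.textContent = JSON.stringify(JSON.parse(input.value), null, 2);
      } catch (err) {
        output.textContent = "Invalid JSON: " + err.message;
      }
    };
    document.getElementById("copy").onclick = async () => {
      // One-touch copy, which is especially handy on mobile.
      await navigator.clipboard.writeText(output.textContent);
    };
  </script>
</body>
</html>
```

Everything lives in the one file, which is what makes the copy-and-paste hosting workflow described below so frictionless.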
I don't like leaving my HTML tools hosted by the LLM platforms themselves for a couple of reasons. First, LLM platforms tend to run the tools inside a tight sandbox with a lot of restrictions. They're often unable to load data or images from external URLs, and sometimes even features like linking out to other sites are disabled. The end-user experience often isn't great either. They show warning messages to new users, often take additional time to load and delight in showing promotions for the platform that was used to create the tool. They're also not as reliable as other forms of static hosting. If ChatGPT or Claude are having an outage I'd like to still be able to access the tools I've created in the past. Being able to easily self-host is the main reason I like insisting on "no React" and using CDNs for dependencies - the absence of a build step makes hosting tools elsewhere a simple case of copying and pasting them out to some other provider. My preferred provider here is GitHub Pages because I can paste a block of HTML into a file on github.com and have it hosted on a permanent URL a few seconds later. Most of my tools end up in my simonw/tools repository which is configured to serve static files at tools.simonwillison.net . One of the most useful input/output mechanisms for HTML tools comes in the form of copy and paste . I frequently build tools that accept pasted content, transform it in some way and let the user copy it back to their clipboard to paste somewhere else. Copy and paste on mobile phones is fiddly, so I frequently include "Copy to clipboard" buttons that populate the clipboard with a single touch. Most operating system clipboards can carry multiple formats of the same copied data. That's why you can paste content from a word processor in a way that preserves formatting, but if you paste the same thing into a text editor you'll get the content with formatting stripped. These rich copy operations are available in JavaScript paste events as well, which opens up all sorts of opportunities for HTML tools. The key to building interesting HTML tools is understanding what's possible. Building custom debugging tools is a great way to explore these options. clipboard-viewer is one of my most useful. You can paste anything into it (text, rich text, images, files) and it will loop through and show you every type of paste data that's available on the clipboard. This was key to building many of my other tools, because it showed me the invisible data that I could use to bootstrap other interesting pieces of functionality. More debugging examples: HTML tools may not have access to server-side databases for storage but it turns out you can store a lot of state directly in the URL. I like this for tools I may want to bookmark or share with other people. The localStorage browser API lets HTML tools store data persistently on the user's device, without exposing that data to the server. I use this for larger pieces of state that don't fit comfortably in a URL, or for secrets like API keys which I really don't want anywhere near my server - even static hosts might have server logs that are outside of my influence. CORS stands for Cross-origin resource sharing . It's a relatively low-level detail which controls if JavaScript running on one site is able to fetch data from APIs hosted on other domains. APIs that provide open CORS headers are a goldmine for HTML tools. It's worth building a collection of these over time. 
Here are some I like: GitHub Gists are a personal favorite here, because they let you build apps that can persist state to a permanent Gist through making a cross-origin API call. All three of OpenAI, Anthropic and Gemini offer JSON APIs that can be accessed via CORS directly from HTML tools. Unfortunately you still need an API key, and if you bake that key into your visible HTML anyone can steal it and use to rack up charges on your account. I use the secrets pattern to store API keys for these services. This sucks from a user experience perspective - telling users to go and create an API key and paste it into a tool is a lot of friction - but it does work. Some examples: You don't need to upload a file to a server in order to make use of the element. JavaScript can access the content of that file directly, which opens up a wealth of opportunities for useful functionality. Some examples: An HTML tool can generate a file for download without needing help from a server. The JavaScript library ecosystem has a huge range of packages for generating files in all kinds of useful formats. Pyodide is a distribution of Python that's compiled to WebAssembly and designed to run directly in browsers. It's an engineering marvel and one of the most underrated corners of the Python world. It also cleanly loads from a CDN, which means there's no reason not to use it in HTML tools! Even better, the Pyodide project includes micropip - a mechanism that can load extra pure-Python packages from PyPI via CORS. Pyodide is possible thanks to WebAssembly. WebAssembly means that a vast collection of software originally written in other languages can now be loaded in HTML tools as well. Squoosh.app was the first example I saw that convinced me of the power of this pattern - it makes several best-in-class image compression libraries available directly in the browser. I've used WebAssembly for a few of my own tools: The biggest advantage of having a single public collection of 100+ tools is that it's easy for my LLM assistants to recombine them in interesting ways. Sometimes I'll copy and paste a previous tool into the context, but when I'm working with a coding agent I can reference them by name - or tell the agent to search for relevant examples before it starts work. The source code of any working tool doubles as clear documentation of how something can be done, including patterns for using editing libraries. An LLM with one or two existing tools in their context is much more likely to produce working code. I built pypi-changelog by telling Claude Code: And then, after it had found and read the source code for zip-wheel-explorer : Here's the full transcript . See Running OCR against PDFs and images directly in your browser for another detailed example of remixing tools to create something new. I like keeping (and publishing) records of everything I do with LLMs, to help me grow my skills at using them over time. For HTML tools I built by chatting with an LLM platform directly I use the "share" feature for those platforms. For Claude Code or Codex CLI or other coding agents I copy and paste the full transcript from the terminal into my terminal-to-html tool and share that using a Gist. In either case I include links to those transcripts in the commit message when I save the finished tool to my repository. You can see those in my tools.simonwillison.net colophon . 
I've had so much fun exploring the capabilities of LLMs in this way over the past year and a half, and building tools in this way has been invaluable in helping me understand both the potential for building tools with HTML and the capabilities of the LLMs that I'm building them with. If you're interested in starting your own collection I highly recommend it! All you need to get started is a free GitHub repository with GitHub Pages enabled (Settings -> Pages -> Source -> Deploy from a branch -> main) and you can start copying in pages generated in whatever manner you like. Bonus transcript : Here's how I used Claude Code and shot-scraper to add the screenshots to this post. You are only seeing the long-form articles from my blog. Subscribe to /atom/everything/ to get all of my posts, or take a look at my other subscription options . svg-render renders SVG code to downloadable JPEGs or PNGs pypi-changelog lets you generate (and copy to clipboard) diffs between different PyPI package releases. bluesky-thread provides a nested view of a discussion thread on Bluesky. The anatomy of an HTML tool Prototype with Artifacts or Canvas Switch to a coding agent for more complex projects Load dependencies from CDNs Host them somewhere else Take advantage of copy and paste Build debugging tools Persist state in the URL Use localStorage for secrets or larger state Collect CORS-enabled APIs LLMs can be called directly via CORS Don't be afraid of opening files You can offer downloadable files too Pyodide can run Python code in the browser WebAssembly opens more possibilities Remix your previous tools Record the prompt and transcript Go forth and build A single file: inline JavaScript and CSS in a single HTML file means the least hassle in hosting or distributing them, and crucially means you can copy and paste them out of an LLM response. Avoid React, or anything with a build step. The problem with React is that JSX requires a build step, which makes everything massively less convenient. I prompt "no react" and skip that whole rabbit hole entirely. Load dependencies from a CDN. The fewer dependencies the better, but if there's a well known library that helps solve a problem I'm happy to load it from CDNjs or jsdelivr or similar. Keep them small. A few hundred lines means the maintainability of the code doesn't matter too much: any good LLM can read them and understand what they're doing, and rewriting them from scratch with help from an LLM takes just a few minutes. ChatGPT JSON to YAML Canvas made with GPT-5.1 Thinking - here's the full ChatGPT transcript Claude JSON to YAML Artifact made with Claude Opus 4.5 - here's the full Claude transcript Gemini JSON to YAML Canvas made with Gemini 3 Pro - here's the full Gemini transcript hacker-news-thread-export lets you paste in a URL to a Hacker News thread and gives you a copyable condensed version of the entire thread, suitable for pasting into an LLM to get a useful summary. paste-rich-text lets you copy from a page and paste to get the HTML - particularly useful on mobile where view-source isn't available. alt-text-extractor lets you paste in images and then copy out their alt text. keyboard-debug shows the keys (and values) currently being held down. cors-fetch reveals if a URL can be accessed via CORS. exif displays EXIF data for a selected photo. icon-editor is a custom 24x24 icon editor I built to help hack on icons for the GitHub Universe badge . It persists your in-progress icon design in the URL so you can easily bookmark and share it. 
word-counter is a simple tool I built to help me write to specific word counts, for things like conference abstract submissions. It uses localStorage to save as you type, so your work isn't lost if you accidentally close the tab. render-markdown uses the same trick - I sometimes use this one to craft blog posts and I don't want to lose them. haiku is one of a number of LLM demos I've built that request an API key from the user (via the function) and then store that in . This one uses Claude Haiku to write haikus about what it can see through the user's webcam. iNaturalist for fetching sightings of animals, including URLs to photos PyPI for fetching details of Python packages GitHub because anything in a public repository in GitHub has a CORS-enabled anonymous API for fetching that content from the raw.githubusercontent.com domain, which is behind a caching CDN so you don't need to worry too much about rate limits or feel guilty about adding load to their infrastructure. Bluesky for all sorts of operations Mastodon has generous CORS policies too, as used by applications like phanpy.social species-observation-map uses iNaturalist to show a map of recent sightings of a particular species. zip-wheel-explorer fetches a file for a Python package from PyPI, unzips it (in browser memory) and lets you navigate the files. github-issue-to-markdown fetches issue details and comments from the GitHub API (including expanding any permanent code links) and turns them into copyable Markdown. terminal-to-html can optionally save the user's converted terminal session to a Gist. bluesky-quote-finder displays quotes of a specified Bluesky post, which can then be sorted by likes or by time. haiku uses the Claude API to write a haiku about an image from the user's webcam. openai-audio-output generates audio speech using OpenAI's GPT-4o audio API. gemini-bbox demonstrates Gemini 2.5's ability to return complex shaped image masks for objects in images, see Image segmentation using Gemini 2.5 . ocr is the first tool I built for my collection, described in Running OCR against PDFs and images directly in your browser . It uses and to allow users to open a PDF in their browser which it then converts to an image-per-page and runs through OCR. social-media-cropper lets you open (or paste in) an existing image and then crop it to common dimensions needed for different social media platforms - 2:1 for Twitter and LinkedIn, 1.4:1 for Substack etc. ffmpeg-crop lets you open and preview a video file in your browser, drag a crop box within it and then copy out the command needed to produce a cropped copy on your own machine. svg-render lets the user download the PNG or JPEG rendered from an SVG. social-media-cropper does the same for cropped images. open-sauce-2025 is my alternative schedule for a conference that includes a downloadable ICS file for adding the schedule to your calendar. See Vibe scraping and vibe coding a schedule app for Open Sauce 2025 entirely on my phone for more on that project. pyodide-bar-chart demonstrates running Pyodide, Pandas and matplotlib to render a bar chart directly in the browser. numpy-pyodide-lab is an experimental interactive tutorial for Numpy. apsw-query demonstrates the APSW SQLite library running in a browser, using it to show EXPLAIN QUERY plans for SQLite queries. ocr uses the pre-existing Tesseract.js WebAssembly port of the Tesseract OCR engine. sloccount is a port of David Wheeler's Perl and C SLOCCount utility to the browser, using a big ball of WebAssembly duct tape. 
More details here . micropython is my experiment using @micropython/micropython-webassembly-pyscript from NPM to run Python code with a smaller initial download than Pyodide.

1 views
annie's blog 1 months ago

Fish bowl

Our very brains, our human nature, our desire for comfort, our habits, our social structures, all of it, pushes us into being fish bowl swimmers. Tiny people moving in tiny circles. Staying in the circumscribed ruts of our comfort. Ignoring a whole big world of what's different and new and interesting just beyond. That's the problem: stuff out there might be new, and interesting, but it's also different. The newness — which is really not new, at all, it's just new to us, so — the differentness, of another mindset or culture, language or belief system, method or opinion or morality or lifestyle, sends our inward threat-o-meter into overdrive. We interpret new and different as scary and difficult , because in terms of our emotions and our mental somersaulting, it is. We don't know how to act. We don't know how to evaluate. We don't know what is safe. We don't know where we fit in. We don't know how our safe, comfortable fish bowl living is affected by this new, different, expanded puddle. Sameness makes us comfortable. And comfort is the height, the very pinnacle, the crowning achievement in our pursuit of happiness. What I mean is that we've mistaken comfort for happiness. All the ways we could pursue happiness, all the freedom and technology and abilities we have to pursue meaning and joy and interaction and challenge and exploration and improvement and aliveness … All of that, at our fingertips, and being comfortable tends to top the list of what we actually want, what we're willing to put effort towards. This seems pathetic. It is pathetic. But also: We're working hard all the time in ways we often don't acknowledge. We have infinite options but finite agency. We have endless information access and very little processing power. We get fucking worn out. It's a lot of work to make a string of decent choices for 10 or 12 hours at a time. It's a lot of effort, some days (most days), to do what is required of us to feel like decent human beings, and the idea of putting in more effort, expending more energy, is exhausting. So we value comfort highly. We're tired. We're exhausted by constant inputs, invisible demands, and the burden of infinite options. Of course we don't leap out of our comfort zones when the opportunity arises: we've already been out of it for so long, on high alert. Our brains are efficiency machines. By valuing comfort so highly, and by equating comfort with sameness, we have programmed our brains to ignore the unfamiliar. Ever wondered why you can feel bored when you have constant stimulation? This is why. We carefully allocate our energy to the highest priorities. Things that aren't familiar don't help. So we ignore them. Of course, we can't always ignore stuff that is different. Sometimes it is right there, glaringly obvious, annoyingly immune to our discomfort, and we are forced to see it, acknowledge it, encounter it, at least mentally. But don't worry! We have defenses! Oh baby, do we have defenses. If we can't keep these alien objects from encroaching upon our consciousness, we can, at least, quickly evaluate the threat they pose and deal with them appropriately. Threat is precisely how we see things that are different. Comfort is bolstered, even built, by the familiar. All things unfamiliar are threats to our comfort. So we're quick to see other groups, philosophies, lifestyles, belief systems, family structures, choices, etc., as weird and wrong. We want to believe they are wrong, because we want to believe that pursuing our own comfort is right. 
We want to believe we have our priorities in check. Our very desire for comfort creeps into our logical reasoning, so deeply does the desire go. So insidiously does it carry out its programmed mission: to keep us from being uncomfortable, our brains will subvert objectivity and keep us from seeing the fallacies in our own thinking, keep us from recognizing that we are, at heart, selfish and misguided creatures whose greatest delight is sitting around and feeling pretty good about ourselves. If needed, then, we will happily sacrifice the validity and value of every thing, person, or choice that is different from what we know and define as normal. We will, for the sake of our own rightness, define all different things as wrong. We don't even hesitate. Hesitation is a sign that you might be starting to see the truth of your own motivation. If you start hesitating before defining, before casting judgment, before categorizing and labeling, look out: your comfort is at stake. Your brain is scurrying, be sure of it, to come up with great reasons for you to resist this awful urge to be fair. Fair. Fair? Fair! Fair has no place in the pursuit of comfort. Equality is not a factor here. If we value all people equally, we must admit that our own comfort is not the highest priority. We must admit that others, too, have valid needs, valid ideas, that the fact of their differentness is not adequate reason for us to deny them the same respect and autonomy we demand for ourselves. We can't have that. That sort of thinking gets us in trouble. That sort of thinking demolishes the layer upon layer of defensive triggers and traps that we have laid, so carefully, over the entire course of our lives. We are aware, so very aware, of how it could all fall apart. We know the reasons are thin. We know, deep down, the very idea of a fish bowl is absurd. We live in an ocean, and it's big, and it's full of creatures, and we're terrified. We want to believe we can limit what is around us. We want a fish bowl so we can feel like the biggest fish in it. It is the only way we know to feel safe. But there is another way: to see, first, that the fish bowl is an illusion of our own making, with imaginary walls upheld by discriminatory defense systems. If we can begin to see that the walls are not even real, we can see a way out. Maybe we can stop putting so much work into keeping them in place. It's scary. It is being alive. The threat only exists when we think we have something of our own, something utterly more important than all else, to protect and defend. But we don't. We are swimming in this together, all of us. There is no safer ocean, only this one.

0 views
Ruslan Osipov 1 months ago

Turns out Windows has a package manager

I have a Windows 11 PC, and something that really annoyed me about Windows for decades is the inability to update all installed programs at once. It’s just oh-so-annoying to have to update a program manually, which is worse for things I don’t use often - meaning every time I open a program, I have to deal with update pop-ups. I was clearly living under a rock, because all the way back in 2020 Microsoft introduced a package manager which lets you install, and more importantly update, packages. It’s as simple as opening a command line (ideally as administrator, so you don’t have to keep hitting yes on the permission prompt for every program) and running a single command. Yup, that’s it. You’ll update the vast majority of software you have installed. Some software isn’t compatible, but when I ran the command for the first time, Windows updated a little over 20 packages, which included the apps I find myself having to update manually the most often. To avoid having to do this manually, I’ve used Windows Task Scheduler to create a new weekly task which runs a file consisting of a single line (see the sketch at the end of this post). I just had to make sure Run with the highest privileges is enabled in task settings. So long, pesky update reminders. My Windows apps will finally stay up-to-date, hopefully.
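For reference, the relevant piece is winget’s upgrade subcommand. A minimal sketch of the kind of one-line scheduled file described above (the filename and the weekly schedule are whatever you choose):

```bat
:: update-all.bat -- scheduled weekly via Task Scheduler, "Run with highest privileges" enabled
winget upgrade --all
```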

0 views
Kev Quirk 3 months ago

Ten Pointless Facts About Me

I've seen this doing the rounds on a few blogs recently, so wanted to add my own version because I'm a narcissist. 🙃 Pete Moore did his version yesterday, and David did his version all the way back in April. I actually had this in draft from around then, but never got around to finishing it (there’s always something more fun to write). Well, I don’t have anything more fun to write at the moment, so Pete’s post prompted me to get it done. So here’s Ten Pointless Facts About Me… Kinda. A pet hate of mine is having food stuck in my teeth. So I always clean them out with a toothpick every time I eat. 🤢 All 3. I mostly drink water and coffee, but do enjoy a cup of tea with breakfast at the weekend. Crocs! I love Crocs! But I don’t wear them outdoors - they’re more like comfy slippers for around the house for me. When I’m out of the house, it’s usually trainers or walking shoes. Usually the latter as I’ll take comfort over fashion any day. My personal favourites are Merrell and Columbia. Anything lemon flavoured. Usually lemon drizzle, or lemon cheesecake (not the American kind though 🇬🇧). I always have a pint of water next to the bed. So the first thing I always do is to take a drink to freshen my mouth, then go to the bathroom to get rid of the water I drank the night before. Probably 28…ish. I think late 20s is a good balance between health, disposable income, and level of responsibility. I actually don’t know. 8 maybe? I have a few winter hats, a cap, some summer hats, and my old beret from when I was in the Army. A photo of one of the watches that I’m selling. I don’t take a lot of photos really. When I do, they’re mostly of my pets, my kids, or my motorbikes. No idea. I have a pretty low bar when it comes to TV and movies. I can usually find something I enjoy in pretty much everything I watch. The worst movie I’ve watched though was Dog Man; absolute steaming pile of dog shit (pun intended). 💩 I didn’t have any serious aspirations to be honest. I was too busy being a child to worry about adult stuff. I did want to be a doctor for a while, but then I realised that I don’t like blood, and that I’m not clever enough. And that’s it, those are the Ten Pointless Facts About Me. Maybe you found it interesting and learned something about me? If you want to take part, here are the questions in a copy/paste format to dump into your own blog post… Thanks for reading this post via RSS. RSS is great, and you're great for using it. ❤️ Reply to this post by email Do you floss your teeth? Tea, coffee, or water? Footwear preference? Favourite dessert? The first thing you do when you wake up? Age you’d like to stick at? How many hats do you own? Describe the last photo you took? Worst TV show? As a child, what was your aspiration for adulthood?

0 views
Ruslan Osipov 3 months ago

Thoughts on 3D printing

A few months back my wife gifted me a 3D printer: an entry level Bambu Lab A1 Mini . It’s a really cool little machine - it’s easy to set up, and it integrates with Maker World - a vast repository of free 3D models. Now that I’ve lived with a 3D printer for nearly half a year, I’d like to share what I’ve learned. After booting up the printer, printing benchy - a little boat which tests printer calibration settings - and seeing thousands of incredible designs on Bambu Lab’s Maker World, I thought I would never have to buy anything ever again. I was wrong. While some stuff printed on a 3D printer is fantastic, it’s not always the best replacement for mass produced objects. Many mass produced plastic items are made using injection molding - liquid plastic that gets poured into a mold - and that produces a much stronger final product. That might be different if you’re printing with tougher plastics like ABS, but you also wouldn’t be using beginner-friendly machines like the A1 Mini to do that. So yeah, you still need to buy the heavy duty plastic stuff. And even as you print things, I wouldn’t say it’s cheaper than buying things from a store. It’s probably about the same, given the occasional failed prints, the cost of the 3D printer, the need for multiple filaments, and the fact that by having a 3D printer you’re more likely to print things you don’t exactly need. Oh, I’ve printed so many useless things - it’s amazing. The Elden Ring warrior jar Alexander planter. A Solaire of Astora figurine. A beautiful glitch art sculpture. I even got a 0.2mm nozzle (smaller than the default 0.4mm) and managed to 3D print passable wargame and D&D miniatures. Which was pretty awesome, although you have to pay for the nicest looking models, which does take away from the enjoyment of making plastic miniatures appear in your house “out of nowhere”. I’m not against artists getting paid, they certainly deserve it, but the printed models were comparable to a mid-range Reaper miniature if you know what I mean, which certainly isn’t terrible, but it’s harder to justify breaking even. Maybe I could get better at getting the small details printed nicely. Oh, and if you’re into wargames - this thing easily prints incredible terrain. A basic 3D printer will pay for itself once you furnish a single battlefield. Once you’re done with printing basic things, you do need to start fiddling with the settings. Defaults only take you so far, and if you want a smoother surface, smaller details, or improvement in any other quality indicator - you have to tinker with the settings and produce test prints. It’s a hobby in its own right, and it’s fun and rewarding, but this can get in the way when you’re just trying to print something really cool. But the most incredible feeling of accomplishment came when I needed something specific around the house and was able to design it. We bought some hanging plants, and I wished I could just hang them on the picture rail of our century home. I was able to design a hanger, and it took me 3 iterations to create an item that fits my house perfectly and that I love. My mom needed a plastic replacement part for a long-discontinued juicer. I was able to design the thing (don’t worry, I covered the PLA in food-safe epoxy), and the juicer will see another few decades of use. Door stops, highly specific tools, garden shenanigans - the possibilities are endless. 
It took me a few months to move past using others’ designs and start making my own - Tinkercad has been sufficient for my use cases so far, although I’m sure I’ll outgrow it as my projects get more complicated. 3D printers aren’t quite a consumer product yet, but my A1 Mini showed me that this future is getting closer. Some day, we might all have a tiny 3D printer in our home (or a cheap corner 3D printing shop?) to quickly and effortlessly create many household objects. Until then, 3D printers remain a tinkerer’s tool, but a really fun one at that, and modern printers are reducing the barrier to entry, making it much easier to get into the hobby.

0 views
codedge 3 months ago

Random wallpaper with swaybg

Setting a wallpaper in Sway, with swaybg, is easy. Unfortunately there is no way of setting a random wallpaper automatically out of the box. Here is a little helper script to do that. The script is based on a post from Sylvain Durand 1 with some slight modifications. I just linked the script in my sway config instead of setting a background there. The script spawns a new swaybg instance, changes the wallpaper, and kills the old instance. With this approach there is no flickering of the background when changing; a rough sketch of such a script follows below. An always up-to-date version can be found in my dotfiles . Original script from Sylvain Durand: https://sylvaindurand.org/dynamic-wallpapers-with-sway/   ↩︎
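For illustration, here is roughly the shape of such a script: a sketch in the spirit of the approach described above, not the author's actual script (that lives in his dotfiles). The wallpaper directory, the one-second delay, and the exec_always line are assumptions.

```sh
#!/bin/sh
# Pick a random wallpaper, start a new swaybg, then kill the old one.
# Starting the new instance first is what avoids the flicker.
WALLPAPER_DIR="$HOME/Pictures/wallpapers"
IMAGE="$(find "$WALLPAPER_DIR" -type f \( -name '*.jpg' -o -name '*.png' \) | shuf -n 1)"

OLD_PIDS="$(pgrep -x swaybg)"          # remember the currently running instance(s)

swaybg -i "$IMAGE" -m fill &           # draw the new wallpaper

sleep 1                                # give it a moment to render
[ -n "$OLD_PIDS" ] && kill $OLD_PIDS   # then remove the old instance
```

Hooked into sway with something like exec_always ~/.config/sway/random-wallpaper.sh, or bound to a key or a timer, depending on how often you want the background to change.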

0 views

LLMs Eat Scaffolding for Breakfast

We just deleted thousands of lines of code. Again. Each time a new LLM model comes out, it’s the same story. LLMs have limitations, so we build scaffolding around them. Each model introduces new capabilities, so old scaffolding must be deleted and new scaffolding added. But as we move closer to super intelligence, less scaffolding is needed. This post is about what it takes to build successfully in AI today. Every line of scaffolding is a confession: the model wasn’t good enough. LLMs can’t read PDFs? Let’s build a complex system to convert PDF to markdown. LLMs can’t do math? Let’s build a compute engine to return accurate numbers. LLMs can’t handle structured output? Let’s build complex JSON validators and regex parsers. LLMs can’t read images? Let’s use a specialized image-to-text model to describe the image to the LLM. LLMs can’t read more than 3 pages? Let’s build a complex retrieval pipeline with a search engine to feed the best content to the LLM. LLMs can’t reason? Let’s build chain-of-thought logic with forced step-by-step breakdowns, verification loops, and self-consistency checks. Etc, etc... millions of lines of code to add external capabilities to the model. But look at models today: GPT-5 is solving frontier mathematics, Grok-4 Fast can read 3000+ pages with its 2M context window, Claude Sonnet 4.5 can ingest images or PDFs, and all models have native reasoning capabilities and support structured outputs. The once-essential scaffolding is now obsolete. Those tools are baked into the models’ capabilities. It’s nearly impossible to predict what scaffolding will become obsolete and when. What appears to be essential infrastructure and industry best practice today can turn into legacy technical debt within months. The best way to grasp how fast LLMs are eating scaffolding is to look at their system prompts (the top-level instructions that tell the AI how to behave). Looking at how the prompt used in Codex, OpenAI’s coding agent, changed from the o3 model to GPT-5 is mind-blowing. The o3 prompt: 310 lines. The GPT-5 prompt: 104 lines. The new prompt removed 206 lines, a 66% reduction. GPT-5 needs way less handholding. The old prompt had complex instructions on how to behave as a coding agent (personality, preambles, when to plan, how to validate). The new prompt assumes GPT-5 already knows this and only specifies the Codex-specific technical requirements (sandboxing, tool usage, output formatting). The new prompt removed all the detailed guidance about autonomously resolving queries, coding guidelines, and git usage. It’s also less prescriptive. Instead of “do this and this” it says “here are the tools at your disposal.” As we move closer to super intelligence, the models require more freedom and leeway (scary, lol!). Advanced models require simple instructions and tooling. Claude Code, the most sophisticated agent today, relies on a simple filesystem instead of a complex index and uses bash commands (find, read, grep, glob) instead of complex tools. It moves so fast. Each model introduces a new paradigm shift. If you miss a paradigm shift, you’re dead. Having an edge in building AI applications requires deep technical understanding, insatiable curiosity, and low ego. By the way, because everything changes, it’s good to focus on what won’t change. The context window is how much text you can feed the model in a single conversation. Early models could only handle a couple of pages. Now it’s thousands of pages, and it’s growing fast. 
Dario Amodei, the founder of Anthropic, expects 100M+ token context windows, while Sam Altman has hinted at billions of context tokens. It means LLMs can see more context, so you need less scaffolding like retrieval-augmented generation. November 2022: GPT-3.5 could handle 4K of context. November 2023: GPT-4 Turbo with 128K context. June 2024: Claude 3.5 Sonnet with 200K context. June 2025: Gemini 2.5 Pro with 1M context. September 2025: Grok-4 Fast with 2M context. Models used to stream at 30-40 tokens per second. Today’s fastest models like Gemini 2.5 Flash and Grok-4 Fast hit 200+ tokens per second. A 5x improvement. On specialized AI chips (LPUs), providers like Cerebras push open-source models to 2,000 tokens per second. We’re approaching real-time LLMs: full responses on complex tasks in under a second. LLMs are becoming exponentially smarter. With every new model, benchmarks get saturated. On the path to AGI, every benchmark will get saturated. Every job can be done and will be done by AI. As with humans, a key factor in intelligence is the ability to use tools to accomplish an objective. That is the current frontier: how well a model can use tools such as reading, writing, and searching to accomplish a task over a long period of time. This is important to grasp. Models will not improve their language translation skills (they are already at 100%), but they will improve how they chain translation tasks over time to accomplish a goal. For example, you can say, “Translate this blog post into every language on Earth,” and the model will work for a couple of hours on its own to make it happen. Tool use and long-horizon tasks are the new frontier. The uncomfortable truth: most engineers are maintaining infrastructure that shouldn’t exist. Models will make it obsolete, and the survival of AI apps depends on how fast you can adapt to the new paradigm. That’s where startups have an edge over big companies. Bigcorps are late by at least two paradigms. Some examples of scaffolding that are on the decline: Vector databases: Companies paying thousands per month when they could now just put docs in the prompt or use agentic search instead of RAG (my article on the topic). LLM frameworks: These frameworks solved real problems in 2023. In 2025? They’re abstraction layers that slow you down. The best practice is now to use the model API directly. Prompt engineering teams: Companies hiring “prompt engineers” to craft perfect prompts when current models just need clear instructions and open tools. Model fine-tuning: Teams spending months fine-tuning models only for the next generation of out-of-the-box models to outperform their fine-tune (cf. my 2024 article on that). Custom caching layers: Building Redis-backed semantic caches that add latency and complexity when prompt caching is built into the API. This cycle accelerates with every model release. 
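To make the “just put docs in the prompt” point above concrete, here is a minimal sketch of the no-scaffolding approach (one API call with the whole document in context) using the OpenAI Python SDK. The file name, question, and model name are placeholders, and any provider with a long-context model works the same way.

```python
from openai import OpenAI

client = OpenAI()

# No chunking, no embeddings, no vector database: read the whole document
# and hand it to a long-context model in a single request.
document = open("handbook.md").read()  # placeholder file

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder: any long-context model
    messages=[
        {"role": "system", "content": "Answer using only the provided document."},
        {"role": "user", "content": document + "\n\nQuestion: What is the refund policy?"},
    ],
)
print(response.choices[0].message.content)
```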
The best AI teams have mastered a few critical skills. Deep model awareness: They understand exactly what today’s models can and cannot do, building only the minimal scaffolding needed to bridge capability gaps. Strategic foresight: They distinguish between infrastructure that solves today’s problems and infrastructure that will survive the next model generation. Frontier vigilance: They treat model releases like breaking news. Missing a single capability announcement from OpenAI, Anthropic, or Google can render months of work obsolete. Ruthless iteration: They celebrate deleting code. When a new model makes their infrastructure redundant, they pivot in days, not months. It’s not easy. Teams are fighting powerful forces. Lack of awareness: Teams don’t realize models have improved enough to eliminate scaffolding (this is massive, btw). Sunk cost fallacy: “We spent 3 years building this RAG pipeline!” Fear of regression: “What if the new approach is simple but doesn’t work as well on certain edge cases?” Organizational inertia: Getting approval to delete infrastructure is harder than building it. Resume-driven development: “RAG pipeline with vector DB and reranking” looks better on a resume than “put files in prompt”. In AI, the best teams build for fast obsolescence and stay at the edge. Software engineering sits on top of a complex stack. More layers, more abstractions, more frameworks. Complexity was a sign of sophistication. A simple web form in 2024? React for UI, Redux for state, TypeScript for types, Webpack for bundling, Jest for testing, ESLint for linting, Prettier for formatting, Docker for deployment… AI is inverting this. The best AI code is simple and close to the model. Experienced engineers look at modern AI codebases and think: “This can’t be right. Where’s the architecture? Where’s the abstraction? Where’s the framework?” The answer: the model ate it, bro, get over it. The worst AI codebases are the ones that were best practices 12 months ago. As models improve, the scaffolding becomes technical debt. The sophisticated architecture becomes the liability. The framework becomes the bottleneck. LLMs eat scaffolding for breakfast and the trend is accelerating. Thanks for reading! Subscribe for free to receive new posts and support my work. 

0 views
ptrchm 3 months ago

Event-driven Modular Monolith

The main Rails app I currently work on has just turned eight. It’s not a huge app. It doesn’t deal with web-scale traffic or large volumes of data. Only six people working on it now. But eight years of pushing new code adds up. This is a quick overview of some of the strategies we use to keep the codebase maintainable. After the first few years, our codebase suffered from typical ailments: tight coupling between domains, complex database queries spread across various parts of the app, overgrown models, a maze of side effects triggered by ActiveRecord callbacks , endlessly chained associations (e.g. ) – with an all-encompassing model sitting on top of the pile. Modular Monolith Pub/Sub (Events) Patterns Service Objects Repositories for Database Queries Slim and Dumb Models Bonus: A Separate Frontend App How Do I Start?
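To give a flavor of the “Pub/Sub (Events)” strategy listed above, here is a minimal sketch using Rails’ built-in ActiveSupport::Notifications. The post doesn’t say which event mechanism it uses, and the PlaceOrder, Order, and OrderMailer names are placeholders; a dedicated event bus or gem would work just as well.

```ruby
# One domain publishes an event instead of triggering side effects
# in other domains via ActiveRecord callbacks.
class PlaceOrder
  def call(order)
    order.update!(status: :placed)
    ActiveSupport::Notifications.instrument("orders.placed", order_id: order.id)
  end
end

# Another module subscribes (e.g. in an initializer) without the
# publisher knowing anything about it.
ActiveSupport::Notifications.subscribe("orders.placed") do |_name, _start, _finish, _id, payload|
  OrderMailer.confirmation(payload[:order_id]).deliver_later
end
```

The point is the direction of the dependency: the ordering code no longer needs to know that mailers, analytics, or billing care about the event.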

0 views
Luke Hsiao 4 months ago

Berkeley Mono Variable (TX-02) in Ghostty

Inspired by Michael Bommarito’s post , I’m just dropping some quick notes on getting Berkeley Mono Variable (TX-02) to work in Ghostty. Specifically, Berkeley Mono , released on 2024-12-31, running in Ghostty . Using the variable version of the font is highly convenient: it is very fast to change styles and tweak things until it is exactly how you like it, without having to iterate on installing static fonts. I suggest you read his post first for lots of nice context on fonts and their features. Then, the key bit of information I needed to get this working was the following. TX-02, the updated version of Berkeley Mono, has different OpenType features than the original. There is no documentation I could find on exactly what they mean, but via some trial and error, I’ve landed on the following config. Specifically, I found that and appear to change what the stylistic sets do (to different things than my comments). , , and don’t do anything that I noticed. So, effectively, it seems to me that the only two features you actually care about are your settings, and then whether you want ligatures with , and what style of / you want via stylistic sets. Another tip: if you have static versions of Berkeley Mono installed, I noticed that this sometimes stops Ghostty from loading Berkeley Mono Variable. I’m unsure why, but I was able to resolve it by removing the static fonts, configuring things, and then putting them back.

0 views
Uros Popovic 4 months ago

RTL generation for custom CPU Mrav

Overview of how SystemVerilog RTL code is generated in the build flow for the Mrav custom CPU core.

0 views
Maurycy 5 months ago

Optimized cyanotypes:

As far as I’m aware, this is the most sensitive cyanotype formula on the internet, and it is just about usable for in-camera photography (ISO 0.0001): The sensitizer solution must be protected from blue and UV light. The developer is very slightly light sensitive, but realistically, it should be fine. The paper should be protected from stray light during the process. The developer solution can be reused multiple times: apply it liberally and collect the excess. My version is around 5 times as sensitive, and has well preserved highlights, allowing it to achieve comparable results in 1/20th the time of the classic formula: enough to turn what would be a 3 hour exposure into a 10 minute exposure. Using sunlight, a good exposure is between 100 kilolux seconds and 1000 kilolux seconds, and the effective ISO is around 0.0001. (The original method has an ISO of around 0.000005.) It doesn’t get as dark as the classic formula, maxing out at the dark blue shown in the image. This can actually be an advantage for photography because it keeps the contrast manageable: the original formula tends to have very dark shadows, bright highlights and little in the way of midtones. The standard iron-ferricyanide/cyanotype formula has a number of problems: (1) Because the pigment is formed during the exposure, it blocks light and slows down the reaction. The result is that it needs an exposure that’s much longer than it needs to be. (2) A lot of pigment gets lost during washing. Even though they are insoluble, small particles can get suspended in water and carried away — resulting in missing highlights at best and the entire image disappearing at worst. (3) Alkaline buffered paper just doesn’t work. The base affects the photochemistry itself, leading to a blotchy appearance, and also bleaches the pigment over time. (4) The final problem is that citrate really isn’t a good electron donor for photo-reduction. Of all the carboxylic acids, iron (III) oxalate is best at responding to light. The reaction is also pH sensitive, and works best in an acidic environment, something that isn’t present in the classic formula. [1] can be fixed by using a two step process, where the iron (III) salt is applied to paper, exposed and only then treated with ferricyanide. For [4], ferric ammonium oxalate is available, but it’s easier to just add oxalic acid to ferric ammonium citrate. The excess acid also takes care of the pH issue. As a bonus, the oxalic acid also takes care of [2] because it results in larger pigment crystals and [3] because it neutralizes any buffers that may be present. Iron (III) oxalate based formulas tend to leave a yellow stain composed of iron (II) oxalate on the paper, which can be dissolved in citric acid. Doing this during development also allows the otherwise trapped iron to contribute to image formation. Slowest to fastest: I did not test Mike Ware’s “New Cyanotype”, because I don’t have ferric ammonium oxalate, and don’t want to play with dichromate. This test puts it between classic and two step. This is similar to Herschel’s original, but with a different ratio of citrate to ferricyanide. Probably the most common contemporary mixture. Note: A concentrated solution should be prepared, which will form crystals of ferric potassium oxalate. These need to be discarded, and then the remaining liquid is diluted before using. The sensitized paper is blue due to the lack of the intense yellow of ferricyanide and the presence of trace Prussian blue. Similar mixtures are commonly used in commercial blue printing. 
The main product is the reduced form, Prussian white, so the print must be oxidized with hydrogen peroxide before viewing. This formula produces a slightly fogged result. No frills (and least sensitive) two-step process. Popularized by hands-on-pictures.com. Acidified two-step process: more sensitive than the standard two-step. This is a usable alternative if you don’t have oxalic acid. Two step acidified with oxalic acid, which is quite strong, and the resulting oxalate ion is better than citrate at photoreduction. Current record holder in my testing. Spread the sensitizer on the paper. It doesn’t take much, just slightly wet the surface. I find spreading with a glass rod works better than brushing it on. Let the paper dry in a dark area. Expose the paper. Apply the developer solution. No finesse required: just pour it on. Wash the print with water for a minute or so to remove the unreacted chemicals. Even an invisible amount of residue can fog the image. The reaction is self limiting. Pigment washout. Limited paper compatibility. Classic [18% of max @ 25s in sun] Mike Ware’s “New Cyanotype” 2-Step classic “Cyanotype Rex” 2-Step: Ferric ammonium citrate + citric acid Blue sheet: Classic with ferrocyanide 2-Step blue sheet: Ferrocyanide developer. 2-Step: Ferric ammonium citrate + oxalic acid [18% of max @ 1s in sun]

0 views
underlap 5 months ago

Arch linux take two

After an SSD failure [1] , I have the pleasure of installing arch linux for the second time. [2] Last time was over two years ago (in other words I remember almost nothing of what was involved) and since then I’ve been enjoying frequent rolling upgrades (only a couple of which wouldn’t boot and needed repairing). While waiting for the new SSD to be delivered, I burned a USB stick with the latest arch iso in readiness. I followed the instructions to check the ISO signature using gpg (a sketch of what these checks look like appears after the footnotes). This looked plausible, but to be on the safe side, I also checked that the sha256 sum of the ISO matched that on the arch website. My previous arch installation ran out of space in the boot partition, so I ended up fiddling with the configuration to avoid keeping a backup copy of the kernel. This time, I have double the size of SSD, so I could (at least) double the size of the boot partition. But what is a reasonable default size for the boot partition? According to the installation guide , a boot partition isn’t necessary. In fact, I only really need a root (/) partition since my machine has a BIOS (rather than UEFI). Since there seem to be no particular downsides to using a single partition, I’ll probably go with that. Then I don’t need to choose the size of a boot partition. The partitioning guide states: If you are installing on older hardware, especially on old laptops, consider choosing MBR because its BIOS might not support GPT. If you are partitioning a disk that is larger than 2 TiB (≈2.2 TB), you need to use GPT. My system BIOS was dated 2011 [3] and the new SSD has 2 TB capacity, so I decided to use the BIOS/MBR layout, especially since this worked fine last time. Here are the steps I took after installing the new SSD. Boot from the USB stick containing the arch ISO. Check ethernet is connected using ping. It was already up to date. Launch and set the various options: I then chose the Install option. It complained that there was no boot partition, so I went back and added a 2 GB fat32 boot partition. Chose the install option again. The installation began by formatting and partitioning the SSD. Twelve minutes later, I took the option to reboot the system after installation completed. After Linux booted (with the slow-painting grub menu, which I’ll need to switch to text), I was presented with a graphical login for i3. After I logged in, it offered to create an i3 config for me, which I accepted. Reconfigured i3 based on the contents of my dotfiles git repository. Installed my cloud provider CLI in order to access restic/rclone backups from the previous arch installation. At this point I feel I have a usable arch installation and it’s simply a matter of setting up the tools I need and restoring data from backups. I wanted to start dropbox automatically on startup and Dropbox as a systemd service was just the ticket. The failed SSD had an endurance of 180 TBW and lasted 5 years. The new SSD has an endurance of 720 TBW, so I hope it will last longer, although 20 years (5*720/180) seems unlikely. ↩︎ I was being ironic: it was quite painful the first time around. But this time I know how great arch is, so I’ll be more patient installing it. Also, I have a backup and a git repo containing my dot files, so I won’t be starting from scratch. ↩︎ There was a BIOS update available to fix an Intel advisory about a side-channel attack. However, I couldn’t confirm that my specific hardware was compatible with the update, so it seemed too risky to apply the update. Also, browsers now mitigate the side-channel attack. 
In addition, creating a bootable DOS USB drive seems to involve either downloading an untrusted DOS ISO or attempting to create a bootable Windows drive (for Windows 10 or 11, which may require a license key), neither of which I relish. ↩︎

0 views
Jason Fried 5 months ago

A fly and luck

There was a tiny fly right by the drain, and I was about to wash my hands. Turning on the water would have sent it right down the hole. A quick end, or an eventual struggled drowning, hard to know. But that would be that, there was no getting out. Somehow, for a moment, I slipped into contemplation. I could just turn on the water, I could rescue it, I could use a different sink

0 views
Blargh 6 months ago

Software defined KISS modem

I’ve kept working on my SDR framework in Rust called RustRadio, which I’ve blogged about twice before. I’ve been adding a little bit here, a little bit there, with one of my goals being to control a whole AX.25 stack. As seen in the diagram in this post, we need:

Applications, client and server — I’ve made those.
AX.25 connected mode stack (OSI layer 4, basically) — The kernel’s sucks, so I made that too.
A modem (OSI layer 1–2), turning digital packets into analog radio — The topic of this post.

Applications talk in terms of streams. The AX.25 implementation turns that into individual data frames. The most common protocol for sending and receiving frames is KISS.

I’ve not been happy with the existing KISS modems for a few reasons. The main one is that they just convert between packets and audio. I don’t want audio, I want I/Q signals suitable for SDRs. On the transmit side it’s less of a problem for regular 1200bps AX.25, since either the radio will turn audio into an FM-modulated signal, or, if using an SDR, it’s trivial to add the audio-to-I/Q step. On transmit you do have to trigger PTT, though. You can do VOX, but it’s not optimal.

But on the receive side it’s a completely different matter. Once it’s audio, the information about the RF signal strength is gone. That makes it impossible to work on more advanced reception strategies such as whole packet clock recovery, or soft decoding. Soft decoding would allow things like “CRC doesn’t match, but this one bit had a very low RF signal strength, so if flipping that bit fixes the CRC, then that’s probably correct.”

Once you have a pluggable KISS modem you can also innovate on making the modem better. A simple example is to just run the same modem in multiple copies, thereby increasing the bandwidth (both in the Hz sense and the bps sense). Since SDRs are not bound to audio as a communication medium, they can also be changed to use more efficient modulations. Wouldn’t it be cool to build a QAM modulation scheme, with LDPC and “real” soft decoding?

Yes, an SDR-based modem does have two main challenges:

Power. SDRs don’t transmit at high power, so you need to get it through a power amplifier.
Duplex. Most TX-capable SDRs have two antenna ports: one for TX, one for RX. You’ll need to have two antennas, or figure out a safe way to transmit on the same antenna without destroying the RX port.

For the duplex problem, the cheap and simple solution is to use frequencies on different bands, and put a band-pass filter on the receive port, thus blocking the transmitted power. SDR outputs are not clean, so you’ll need a filter on the transmit path too anyway. In other words, you can just use a diplexer. It gets harder if RX and TX need to be on the same band, or worse, the exact same frequency. Repeaters tend to use cavity filters, but that’s a bit bulky for my use cases, and in any case they don’t work if the frequency is exactly the same. More likely a better approach here is to use half duplex, with a relay switching from RX to TX and back. But you need to synchronize it so that there’s no race condition that accidentally plows 10W into your receive port, even for a split second. That’s a problem for the future, though. For now I’m just using two antennas.

I’ve implemented it. It works. It’s less than 250 lines of Rust, and the actual transmitter and receiver are really easy to follow. Well… to me at least.

In order to not introduce too many things at a time, here’s how to use the regular Linux kernel stack with my new bell202 modem. Bell202 is the standard and most used amateur radio data mode, often just referred to as “1200bps packet”.

Build and start the modem:
Create the Linux AX.25 config:
Attach the kernel to the modem:
Now use it as normal:
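The soft-decoding trick mentioned above is easy to sketch in code. The following is a minimal illustration in TypeScript rather than Rust (so it is not RustRadio code), and the bits, confidence, and crcOk inputs are hypothetical stand-ins for whatever the real demodulator and AX.25 stack provide:

```typescript
// Minimal sketch of single-bit soft decoding: if the CRC fails, try flipping
// the least-confident bit and see whether that fixes the frame.
// `bits` are hard decisions (0/1), `confidence` is a per-bit confidence value
// from the demodulator, and `crcOk` is whatever frame check the stack uses
// (e.g. the AX.25 FCS). All three are assumptions for illustration.
function trySingleBitCorrection(
  bits: number[],
  confidence: number[],
  crcOk: (bits: number[]) => boolean,
): number[] | null {
  if (crcOk(bits)) return bits; // frame is already good

  // Find the bit the demodulator was least sure about.
  let weakest = 0;
  for (let i = 1; i < confidence.length; i++) {
    if (confidence[i] < confidence[weakest]) weakest = i;
  }

  // Flip it and re-check the CRC.
  const repaired = bits.slice();
  repaired[weakest] ^= 1;
  return crcOk(repaired) ? repaired : null; // null = give up, drop the frame
}
```

A real soft decoder would consider more than one candidate bit, or feed the confidences into a proper FEC decoder, but this captures the “flip the least-confident bit if that fixes the CRC” idea the post describes.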

0 views
Cassidy Williams 7 months ago

Generating open graph images in Astro

Something that always bugged me about this blog is that the open graph/social sharing images used this for every single post: I had made myself a blank SVG template (of just the rainbow-colored pattern) for each post literally years ago, but didn’t want to manually create an image per blog post. There are different solutions out there for this, like the Satori library, or using a service like Cloudinary, but they didn’t fit exactly how I wanted to build the images, and I clearly have a problem with control. So, I built myself my own solution!

Last year, I made a small demo for Cosynd with Puppeteer that screenshotted websites and put them into a PDF for our website copyright offering, aptly named screenshot-demo. I liked how simple that script was, and thought I could follow a similar strategy for generating images. My idea was to:

And then from there, I’d do this for every blog title I’ve written. Seemed simple enough? Reader, it was not. BUT it worked out in the end!

Initially, I set up a fairly simple Astro page with HTML and CSS:

With this, I was able to work out what size and positioning I wanted my text to be, and how I wanted it to adjust based on the length of the blog post title (both in spacing and in size). I used some dummy strings to do this pretty manually (like how I wanted it to change ever so slightly for titles that were 4 lines tall, etc.). Amusing note: this kind of particular design work is really fun for me, and basically impossible for AI tools to get right. They do not have my eyes nor my opinions! I liked feeling artistic as I scooted each individual pixel around (for probably too much time) and made it feel “perfect” to me (and moved things in a way that probably 0 other people will ever notice).

Once I was happy with the dummy design I had going, I added a function to generate an HTML page for every post, so that Puppeteer could make a screenshot for each of them. With that strategy, everything worked well. But my build times were somewhat long, because altogether the build was generating an HTML page per post (for people to read), a second HTML page per post (to be screenshotted), and then a screenshot image from that second HTML page. It was a bit too much.

So, before I get into the Puppeteer script part with you, I’ll skip to the part where I changed up my strategy (as the kids say) to use a single page template that accepted the blog post title as a query parameter. The Astro page I showed you before is almost exactly the same, except:

The new script on the page looked like this, which I put on the bottom of the page in a tag so it would run client-side:

(That function is an interesting trick I learned a while back where tags treat content as plaintext to avoid accidental or dangerous script execution, and their gives you decoded text without any HTML tags. I had some blog post titles that had quotes and other special characters in them, and this small function kept them from breaking in the rendered image!)

Now, if you want to see a blog post image pre-screenshot, you can go to the open graph route here on my website and see the rendered card!

In my folder, I have a script that looks mostly like this:

This takes the template ( ), launches a browser, navigates to the template page, loops through each post, sizes it to the standard Open Graph size (1200x630px), and saves the screenshot to my designated output folder. From here, I added the script to my : I can now run to render the images, or have them render right after !
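The script itself isn’t reproduced in this syndicated copy, so here is a rough sketch of the screenshot step as described above, written in TypeScript for Node with Puppeteer. The template URL, port, post titles, and output folder are placeholders rather than the author’s actual values:

```typescript
// Sketch of the screenshot step: open the single OG template page with the
// post title as a query parameter, size the viewport to 1200x630, and save
// one PNG per post. Paths and URLs below are made up for illustration.
import puppeteer from "puppeteer";

const posts = ["Generating open graph images in Astro", "Another post title"];
const templateUrl = "http://localhost:4321/og"; // hypothetical Astro route
const outputDir = "./public/og";

async function generateImages() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.setViewport({ width: 1200, height: 630 }); // standard OG size

  for (const title of posts) {
    await page.goto(`${templateUrl}?title=${encodeURIComponent(title)}`, {
      waitUntil: "networkidle0", // let fonts and CSS settle before capturing
    });
    const slug = title.toLowerCase().replace(/[^a-z0-9]+/g, "-");
    await page.screenshot({ path: `${outputDir}/${slug}.png` });
  }

  await browser.close();
}

generateImages().catch(console.error);
```

In the real setup the list of titles would come from the blog’s content collection, and the script would be wired into the build as the post describes.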
This is a GitHub Gist of the actual full code for both the script and the template! There was a lot of trial and error with this method, but I’m happy with it. I learned a bunch, and I can finally share my own blog posts without thinking, “gosh, I should eventually make those open graph images” (which I did literally every time I shared a post). If you need more resources on this strategy in general: I hope this is helpful for ya!

0 views
JSLegendDev 8 months ago

How to Build a Sonic Themed Infinite Runner Game in TypeScript With KAPLAY - Part 2/2

In the previous part of the tutorial, we finished implementing Sonic’s movement and jumping logic. We also implemented platforms and background infinite scrolling. In this part, we will finish what remains to be implemented.

Table of Contents
Implementing Rings for Sonic to Collect
Implementing a Scoring System
Adding Enemies
Implementing Collision Logic With Enemies
Finishing The Scoring UI
Implementing The Game Over Scene

In the file, add the following code: Import the needed assets in . The function creates a ring game object. The component adds an method to that game object (used later in the tutorial) to destroy it once it leaves the screen. “ring” is a tag used to identify the game object in the collision logic, which we will cover later. Multiple game objects can have the same tag.

In , add the following logic in the game scene to spawn rings. We create a recursive function called . When called, it first creates a ring by calling . In KAPLAY, you can set an update loop specific to a game object; the loop is destroyed if the game object is destroyed. In that update loop, we make the ring move to the left at the same rate as the game’s speed. This gives the illusion that Sonic is approaching the ring while in reality it’s the contrary. We use the method to destroy the ring when it exits the screen to the left. Using KAPLAY’s function, we’re able to get a random number between 0.5 and 3 representing the time to wait before spawning another ring. KAPLAY’s function is used to only call the function once the wait time has elapsed.

Implementing a Scoring System

Now that the rings are spawned, we need to write the logic for Sonic to collect them, which implies keeping track of the score. In , under our game scene, add the following code: In addition to creating variables related to the score, we created a game object acting as our score UI. Using the component, we’re able to display text on the screen. The second param of that component is used to set the font and sizing needed. Finally, we use the component to make sure the score UI is always displayed on top of other game objects by setting its layer to 2. You should have the following result.

Now, let’s update the score every time Sonic collides with a ring. Add the following code in our game scene: We used Sonic’s built-in method, which takes as its first param the tag of the game object you want to check collisions with. The second param is a function that will run if a collision occurs. Here, we play the “ring” sound and then destroy the ring game object Sonic collided with. Finally, we increment the score and change the score UI’s text to reflect the new score. If you run the game now, you should see the score updating every time Sonic collides with a ring.

Adding Enemies

The code needed for adding enemies to our game is very similar to the code for adding rings. The only difference is that, contrary to rings, if Sonic touches an enemy, it’s game over. However, if Sonic jumps on that enemy, the enemy gets destroyed. In , add the following code: Here, we defined a function for creating our enemy, the “Motobug”. We used components that should now be familiar to you. However, you might have noticed that we pass an object to the area component. This is something you can do to define a custom hitbox shape. Here, we’re setting the shape of the hitbox to be a rectangle using KAPLAY’s Rect constructor. It allows you to set the hitbox’s origin relative to the game object.
If you pass k.vec2(0,0), the origin will be the same as the game object’s. The second and third params of the constructor are used to set the width and the height of the hitbox. Once we add enemies to the game, you’ll be able to use the debug mode to view how our hitbox configuration for the Motobug is rendered.

Add the following code to : The logic for spawning “Motobugs” is mostly the same as the one for “rings”. However, the Motobug’s update loop is slightly different. When the game’s speed is lower than 3000, we make the “Motobug” move faster than the scrolling of the platforms, so that it appears to be moving along the platforms towards Sonic. Otherwise, it would look like Sonic is the one moving towards stationary “Motobugs”. However, when the game’s speed gets really fast, it isn’t possible to really tell the difference. In that case, we simply make the “Motobug” move at the same rate as the scrolling platforms. At this point, you should see enemies spawn in your game.

Implementing Collision Logic With Enemies

At the moment, nothing happens if Sonic collides with an enemy, and likewise if he jumps on one. Let’s add the following code in : If you run the game now, you should be able to jump on enemies, and if Sonic hits an enemy while grounded, you will be transitioned over to an empty game over screen. You’ll notice that we added logic to multiply the player’s score if they jump on multiple enemies before hitting the ground. We’re also registering the current player score in local storage so we can display it later in the game over scene.

Since our game is very fast paced, it’s hard for players to keep track of how many rings they’re collecting. They would have to look up at the top left of the screen while risking not seeing an enemy in time to avoid it or jump on it. To mitigate this and give the player a better sense of what they’re doing, I opted to display the number of rings collected after every collision with a ring or a jump on an enemy. This will also make combos easier to understand. Add the following code in to implement this feature: Now, if you run the game, you should see a +1 appear every time Sonic collides with a ring and a +10, x2, x3, etc… when he jumps on one or many “Motobugs”.

An important concept present in the code above is that game objects can have child game objects assigned to them in KAPLAY. This is what we do here: Instead of calling the function to create a game object, we can call the method to create a child game object of an existing game object. Here, we create the as a child of Sonic so that its position is relative to him.

Finally, for our game over screen, let’s display the player’s current vs. best score and allow them to try the game again if they wish. In the game over scene code in , add the following: While it should be relatively easy to figure out what the code above does, I’d like to explain what we do here: Using KAPLAY’s function, we’re able to get the data we previously set in local storage. However, when the player plays the game for the first time, they will not have a best score. That’s why we set to be 0 if returns null, which is possible. We do the same with currentScore. Now, if you run the project, you should have the following game over screen appear after getting hit by an enemy. After 1 sec, you should be able to press the “Jump” button (in our case, click or press the space key) to play the game again.
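The tutorial’s actual scene code is not included in this feed, but a rough sketch of the game over scene described above could look like the following. It assumes the KAPLAY context is named k, uses KAPLAY’s local-storage helpers (getData/setData), and the key names, scene names, and layout values are made up for illustration:

```typescript
// Hypothetical sketch of a KAPLAY game over scene, not the tutorial's exact
// code: read current vs best score back from local storage, show both, and
// let the player restart after a one-second delay.
import kaplay from "kaplay";

const k = kaplay(); // in the tutorial the context is created once, elsewhere

k.scene("gameover", () => {
  // getData returns nothing on a first play, so fall back to 0.
  const currentScore = Number(k.getData("current-score") ?? 0);
  let bestScore = Number(k.getData("best-score") ?? 0);

  if (currentScore > bestScore) {
    bestScore = currentScore;
    k.setData("best-score", bestScore); // persist the new best score
  }

  k.add([k.text("GAME OVER", { size: 64 }), k.pos(k.center().x, 200), k.anchor("center")]);
  k.add([
    k.text(`Score: ${currentScore}   Best: ${bestScore}`, { size: 48 }),
    k.pos(k.center().x, 320),
    k.anchor("center"),
  ]);

  // Only accept the "jump" input (space or click) after a short delay.
  k.wait(1, () => {
    k.onKeyPress("space", () => k.go("game"));
    k.onClick(() => k.go("game"));
  });
});
```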
Deployment

Assuming you want to publish the game on web portals like itch.io, you can make a build by creating a vite.config.ts file at the root of your project’s folder and specifying the base as . (A minimal example config is sketched at the end of this post.) Now, run the command and you should see a folder appear in your project files. Make sure your game still works by testing the build using . Finally, once ready to publish, zip your folder and upload it to itch.io or to other web game platforms of your liking.

Hope you enjoyed learning how to make games in TypeScript with KAPLAY. If you’re interested in seeing more web development and game development tutorials from me, I recommend subscribing so you don’t miss out on future releases. Subscribe now

If you’re up for it, you can check out my beginner React.js tutorial.
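For reference, a minimal vite.config.ts along these lines might look like this. The relative base "./" is an assumption (it is the usual choice for builds served from a subdirectory, such as an itch.io upload), since the exact value is missing from this syndicated copy:

```typescript
// vite.config.ts — minimal sketch for bundling the game for a web portal.
// base: "./" makes asset URLs relative; adjust if your hosting setup differs.
import { defineConfig } from "vite";

export default defineConfig({
  base: "./",
});
```

By default, vite build writes the bundle to a dist folder, and vite preview serves that build locally for a final check before zipping and uploading.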

0 views