Latest Posts (20 found)
iDiallo Today

What Do You Charge For?

I've written before about my journey to learn how to charge a fair price for building a website. But even after landing on a strategy, one question remains unanswered: what should I charge for? Are you charging for the product itself, as in the bare cost of building a website? Or are you charging enough to make a living? This question applies to any field, whether you are a consultant, a mechanic, or a private chauffeur.

I once worked with a company that built websites for non-profits. Their price tag? $35,000 for a standard WordPress site. Lucky for me, I got a first-hand view of their price breakdown because they were trying to expand their reach to smaller customers. They needed to figure out how to lower the price, so they invited me to the meeting.

Every single person in the room was involved in building the website. The standard time frame to complete it was 6 weeks. The manager named each person, their title, and how much time they spent on the project. There were the designers, the copywriters, the consultants who gathered the information. There were the salespeople who started the process, and the two developers, one of whom was me. Everyone at the table was indispensable. Then he gave a ballpark estimate of salaries using Glassdoor standards, and the price jumped to $35k. It was completely fair.

"What if we have Ibrahim as the sole developer on this tier?" the director asked. "And we use only one designer, and we can reuse copy." The manager crunched the numbers, and we were still going to charge $25k. "What if I don't get involved at all in this tier?" The manager removed the director's name from the list. He contributed only a couple of hours of work, yet the number went down to $22k.

I originally thought $35k was an astronomical amount for a website, but their breakdown showed it didn't even include profit. The salary costs alone ate up the budget.
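The arithmetic from that meeting can be sketched in a few lines. The roles, rates, and hours below are made-up stand-ins, not the agency's actual numbers, but the mechanism is the same: remove a name from the list and the price drops.

```python
# Cost-plus pricing: sum each role's (hourly rate x hours on the project),
# and that total becomes the price. All figures here are hypothetical;
# the real breakdown used Glassdoor salary estimates.
team = {
    # role: (hourly_rate_usd, hours_over_6_weeks)
    "designer":    (75, 80),
    "copywriter":  (60, 40),
    "consultant":  (90, 30),
    "salesperson": (65, 20),
    "developer_1": (70, 120),
    "developer_2": (70, 120),
    "manager":     (100, 25),
}

# The quoted price is just the labor total; no profit line anywhere.
price = sum(rate * hours for rate, hours in team.values())
print(f"${price:,}")  # -> $31,700
```

Dropping any entry from `team` and re-running reproduces the "what if I don't get involved?" exercise from the meeting.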
The actual profit for the company came later, from managing the marketing campaign. This is cost-plus pricing: you add up what it costs to make the thing, and that becomes the price. It feels logical, but it relies entirely on your costs, not the value you provide.

Then there is another way: market-based pricing. Take a car, for example. A vehicle costs $35k because that is what the market is willing to pay for that specific make and model. The materials and labor to build the car might be significantly cheaper, or on occasion even more expensive (Rivian), than the sticker price. The price is dictated by the buyer's perceived value, not just the manufacturer's receipt. This method became clearer to me after I started consulting. When I got a new client, I initially tried to price based on the old model of calculating what I thought my time was worth from a salaried perspective. I later found that the recruiting company I worked with was charging clients $78 per hour for my services, while paying me $40. The market (or the recruiter's markup) was valuing my time at nearly double what I was charging myself.

You know the mechanic is gonna charge you extra for that flat. Then, there is the wild card method. I've been the unlucky guy who finds himself out of town with a flat tire. I stop at the first tire shop I can find, and the worker doesn't size up my car; he sizes me up. He decides how much to charge based on how desperate I look. In those misadventures, the price has ranged anywhere from $20 to $150. I'm usually in no position to argue when I'm stranded on the side of the road. But how do they decide on those numbers? Are they making a profit? Or are they just charging whatever they think fills their quota for the day? This is opportunistic pricing: highly effective for a quick buck, but I don't think you can build trust like that.

All these methods for charging have their pros and cons.
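The recruiter spread in that consulting example is easy to quantify using the two numbers given above ($78 billed, $40 paid):

```python
# Market-based pricing gap: the recruiter billed the client $78/hour
# while paying out $40/hour (figures from the story above).
bill_rate = 78.0
pay_rate = 40.0

markup = bill_rate / pay_rate                 # what the market said the hour was worth
margin = (bill_rate - pay_rate) / bill_rate   # share of the bill the recruiter kept

print(f"{markup:.2f}x markup, {margin:.0%} of the bill kept")
# -> 1.95x markup, 49% of the bill kept
```

In other words, the buyer-facing price was almost double the cost-based one, which is the whole argument for pricing off perceived value rather than your own receipts.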
My goal isn't to tell you which number to pick, but to encourage you to decide how you pick that number. My advice, in the simplest terms, is this: be consistent. Once you choose a method, it becomes your standard. Do not deviate. If you charge based on value today but switch to charging based on your mood tomorrow, your clients will never trust your pricing. They will always wonder if they are getting the "real" price or just the price you felt like charging that morning. They will start looking for other consultants. Pick the method that works for you, stick to it, and let your clients know exactly where they stand. Personally, I apply value-based pricing with my clients, where the cost is tied to their specific needs and the time required to meet them. It's a method that requires trust and communication, but it can be the most fair and profitable for both parties when applied consistently. When they end up with an obscene bill, at the very least they are prepared.


New Thinkpad Means Back to Mac OS

On Wednesday I picked up a new (to me) Thinkpad P14s Gen 4. I was excited to finally get off my System76 Pang12, a computer that works, but has a long list of hardware and reliability issues. Thinkpad in hand, I installed Ubuntu 25.10 and immediately put it to work with a night of trimming down my client request backlog. The computer was incredible! Amazing keyboard, vastly better trackpad, perfect 14” form factor, and everything worked out of the box on Ubuntu. Heck, it even had a usable webcam! But like a majority of things in my life, something always goes wrong. I knew it was too perfect, and wondered what I was going to find that ruined the joy. How about complete system crashes when you plug or unplug the charger? Yep, that’ll do it. I spent all of yesterday and this morning debugging. Multiple distros, a long list of kernel params, different chargers, and tweaking BIOS settings. Nada. About 50% of the time when you unplug, Gnome will slowly start to lock up, then the system restarts. Looking at logs it’s caused by a . At first I thought it might be related to the WiFi chip (based on pre-crash logs). Disabled it via BIOS and it still crashes. I’ve tested RAM, SSD and battery, all good. I have a new battery coming Monday just in case, but fully expect it won’t help. I’m out $500 USD, and honestly, I’m done with Linux for now. I love Gnome and Fedora+Ubuntu, but it’ll be a few years before I buy a new laptop after throwing away money on the Thinkpad (and the Pang12 two years ago). Back to Mac OS Tahoe it is. Liquid ass and all. I’m hopeful that the Thinkpad problems are Linux-specific; my wife has been wanting a laptop, and since she’s not ready to jump off Windows, it could be the perfect computer for her.


s/sed/ed

Read on the website: ed is a stupid simple text editor. sed is a nice streaming text-processing tool. Why would one even want to use ed for anything, let alone for text processing, if there's sed?
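As a minimal taste of the difference the post is teasing, here is the same substitution done both ways (file names are arbitrary): sed streams the file and writes the result to stdout, while ed takes editing commands on stdin and modifies the file itself.

```shell
# Sample file (name is arbitrary)
printf 'hello world\nhello ed\n' > demo.txt

# sed: a stream editor -- reads the file, writes the transformed text to stdout
sed 's/hello/goodbye/' demo.txt > demo-sed.txt

# ed: a line editor -- commands arrive on stdin; the file is changed in place.
# ',s/.../.../' applies the substitution to every line, 'w' writes, 'q' quits.
printf '%s\n' ',s/hello/goodbye/' w q | ed -s demo.txt
```

After running this, `demo-sed.txt` holds sed's output and `demo.txt` itself has been rewritten by ed, which is exactly the in-place niche where ed can beat a `sed | mv` dance.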


2026.17: He Came, He Saw, He Cooked

Welcome back to This Week in Stratechery! As a reminder, each week, every Friday, we’re sending out this overview of content in the Stratechery bundle; highlighted links are free for everyone. Additionally, you have complete control over what we send to you. If you don’t want to receive This Week in Stratechery emails (there is no podcast), please uncheck the box in your delivery settings. On that note, here were a few of our favorites this week.

This week’s Stratechery video is on Mythos, Muse, and the Opportunity Cost of Compute.

The End of the Tim Cook Era. My son, who is old enough to be on a multi-day school trip to Washington D.C., messaged me in shock that Tim Cook would be stepping down as CEO of Apple this September: that, more than anything, made me realize just how long we have been in the Tim Cook era. He was Apple’s CEO longer than my son has been alive, and a year longer than Steve Jobs. That, needless to say, is worth reflection. — Ben Thompson

On Stratechery, I wrote about Cook’s Impeccable Timing and, in an Update, why John Ternus makes sense as the next CEO. On Sharp Text, Andrew wrote a fantastic reflection on how Cook’s competence was both correct and boring, and representative of the overall maturation of the tech industry. On Dithering, John and I published our instant reactions on Tuesday, and additional reflections on Friday.

Can Cursor and SpaceX Join the Model Wars? When I first heard the news that SpaceX was partnering with Cursor (with an option to buy Cursor outright for $60 billion), my first reaction was to throw up my hands at the logic and broader plan. Forget it Jake, it’s Elontown, etc. That noted, I loved it when Ben’s Daily Update on Wednesday explained why, in theory, there is an obvious synergy between Cursor and SpaceX. Furthermore, I’m reminded that more AI competition would be a good thing, and for that reason alone I’m rooting for a deal like this to work.
We went deeper on the topic during the second segment of Friday’s Sharp Tech, including bear and bull cases, and an attempt to nail down SpaceX’s core business as the company prepares to IPO and seeks a $1.75 trillion valuation. — Andrew Sharp

The Various Fronts of Cold War 2.0. Most of our shows cover lots of ground, but this week’s episode of Sharp China was especially dense with updates and takes. The big news is that Xi is now publicly calling for the re-opening of the Strait of Hormuz, while several reports indicate China may be providing weapons to the IRGC in the interim. Elsewhere, Beijing passed new laws to crack down on decoupling (Bill says these laws have interested parties “freaked out”), while the U.S. is considering legislation that would close global loopholes on the sale of advanced semiconductor manufacturing equipment to China. My favorite part, though, was a segment on a cake controversy, a physical altercation between Pinduoduo staff and Shanghai regulators, and Xinhua reporting that provides a fascinating look at how the Chinese economy works in 2026. — AS

TSMC Earnings, New N3 Fabs, The Nvidia Ramp — TSMC’s earnings suggest that the company’s leadership is not truly bought into the AI growth story.
Tim Cook’s Impeccable Timing — Tim Cook had an extraordinary run — and impeccable timing, both in terms of when he became CEO, and when he is stepping down.
John Ternus and Apple’s Hardware-Defined Future, SpaceXAI and Cursor — The elevation of John Ternus suggests that Apple’s future is about hardware differentiation; then, the SpaceX-Cursor deal makes a lot of sense.
An Interview with Google Cloud CEO Thomas Kurian About the Agentic Moment — An interview with Google Cloud CEO Thomas Kurian about Google’s cloud priorities, enterprise agent platform, and Google’s integration advantage.
Tim Cook Personified Big Tech’s Maturity — For better and worse, Tim Cook’s Apple epitomized an era in which big tech companies grew up, took fewer risks, and took over the world.
Tim Cook Steps Down
How Tim Cook Changed Apple
Itanium: Intel’s Great Successor
South Korea Defied the Gods to Build its Steel Colossus
Xi Wants the Strait of Hormuz Re-Opened; Cakes and An E-Commerce Crackdown; The Next Stage of Decoupling; The MATCH Act in Congress
Play-In Chaos and Knueppel Slippage, Anyone But the Thunder, Title Picks and Awards
Resolution Panic Rankings: Pistons Picking Up the Pieces, Rockets on the Ropes, Blazers Pinching Pennies, and More from the NBA Playoffs
Tim Cook’s Exit and What Comes Next, A SpaceX Deal with Cursor, Q&A on Vibe Coding, TSMC, WhatsApp


Premium: How OpenAI Kills Oracle

Soundtrack — Brass Against — Karma Police

It was January 21, 2025. Per The Information, Larry Ellison, CEO of Oracle, had just flown to Washington DC from Florida, and had to borrow a coat “...so he wouldn’t freeze during an interview he did on the White House lawn, according to two people who were involved in the event.” He was there to announce a very big — some might even say huge — new project standing next to SoftBank CEO Masayoshi Son and OpenAI CEO Sam Altman. “Together, these world-leading technology giants are announcing the formation of Stargate, so put that name down in your books, because I think you’re gonna hear a lot about it in the future. A new American company that will invest $500 billion at least in AI infrastructure in the United States and very, very quickly, moving very rapidly, creating over 100,000 American jobs almost immediately,” said President Donald Trump. After he was done, Ellison stepped to the podium. “The data centers are actually under construction, the first of them are under construction in Texas. Each building’s a half a million square feet, there are ten buildings currently being built, but that will expand to 20.” Following Ellison, SoftBank’s Masayoshi Son added that Stargate would “...immediately start deploying $100 billion dollars, with the goal of making $500 billion dollars within [the] next four years, within your town!” turning to Donald Trump with his hands extended. It was unclear what town he was referring to. Altman added that it would be “an exciting project” and that “...we’ll be able to do all the wonderful things that these guys talked about, but the fact that we get to do this in the United States is I think wonderful,” though it’s unclear what “the wonderful things” or “this” refers to. It’s been 15 months, and Stargate LLC has never been formed.
SoftBank and OpenAI have contributed no capital to the project, other than SoftBank’s own acquisition of a former electric vehicle manufacturing plant in Lordstown, Ohio that it intends to turn into a data center parts manufacturing plant with Foxconn, which is best known for effectively abandoning a $10 billion factory in Wisconsin back in 2021. Oh, and Project Freebird, a SoftBank-built project that exists to funnel money to its subsidiary SB Energy, though I can’t imagine how SoftBank actually funds it. No government money was ever involved, no funding ever left anyone’s bank account, no "initiative" ever existed, and OpenAI, Oracle and SoftBank have, in my opinion, conspired to mislead the general public about the existence and validity of a project for marketing purposes. The “data centers actually under construction” referred to a 1.2GW project in Abilene, Texas that had been under construction since the middle of 2024, and had originally been earmarked by Elon Musk and xAI, except Musk pulled out because he felt that Oracle was moving too slow. While Ellison said that there were ten buildings under construction with plans to expand to twenty, only eight were actually being built (each holding around 50,000 GB200 GPUs across NVL72 racks), with the extension up in the air until March 2026, when Microsoft agreed to lease 700MW — so another seven buildings — that were meant to go to OpenAI. These buildings will not make Oracle any money, as Oracle is, despite spending so much money, leasing whatever land it uses from Crusoe. As far as those eight buildings go, only two are actually online and generating revenue, though sources with direct knowledge of Oracle’s infrastructure have informed me that work is still being done on both buildings despite CNBC reporting that they were “operational” in September 2025. Let’s break this down.
Based on a presentation by landowner Lancium from May 2025, the Stargate Abilene campus was meant to have 1.2GW of AI data centers online by year-end 2025. Based on reporting from DatacenterDynamics, the first 200MW of power was meant to be energized “in 2025.” As time dragged on, occupancy was meant to begin in the first half of 2025, had “potential to reach 1GW by 2025,” complete all 1.2GW of capacity by mid-2026, be energized by mid-2026, have 64,000 GPUs by the end of 2026, as of September 30, 2025 had “two buildings live,” and as of December 12, 2025, Oracle co-CEO Clay Magouyurk said that Abilene was “on track” with “more than 96,000 NVIDIA Grace Blackwell GB200 delivered,” otherwise known as two buildings’ worth of GPUs.

Four months later, on April 22, 2026, Oracle tweeted that “...in Abilene, 200MW is already operational, and delivery of the eight-building campus remains on schedule.” It is unclear if that’s 200MW of critical IT capacity or the total available power at the Abilene campus, and in any case, this is only enough power for two buildings, which means that Oracle is most decidedly not “on schedule.”

Sources familiar with Oracle infrastructure have confirmed that while construction has finished on building three, barely any actual tech has been installed. It also appears that while construction has begun on a power plant of some sort, it’s unclear whether it’s the 360.5MW gas power plant or the 1GW substation. In any case, Abilene needs both to turn on the GPUs, if they ever get installed. Abilene is, for the most part, the only part of the Stargate project that’s anywhere near complete. I say that because the other data centers — Shackelford, Texas; Port Washington, Wisconsin; Doña Ana County, New Mexico; Saline, Michigan; and Milam County, Texas — are patches of land with a few steel beams, if that. To be explicit, every single Stargate data center is funded by Oracle and its respective financial backers.
Oracle is taking on a massive amount of debt to build these data centers, working with a labyrinthine network of financiers and construction partners to pull together the capacity necessary to get paid for its five-year-long $300 billion compute deal with OpenAI.

Oracle has also, per Bloomberg, deliberately raised money using “project financing” loans that are repaid using the projected cashflow, allowing it to keep the massive amount of debt off of its balance sheet. This is remarkable — and offensive! — because it’s borrowing over $38 billion to fund construction of its Wisconsin and Shackelford data centers (the largest debt deal of its kind on record), and said debt will now effectively not exist despite its massive drag on Oracle’s cashflow, which sat at negative $24.7 billion in its last quarterly earnings. Based on estimates ($30 million in critical IT and $14 million in construction per megawatt) from TD Cowen’s Jerome Darling, the total cost of building Oracle’s 7.1GW of data center capacity will be somewhere in the region of $340 billion.

All of these data centers are being built for a single tenant — OpenAI — which expects, per The Information, to lose over $167 billion (assuming it hits annual revenues of over $100 billion) by the end of 2028, and as a result does not actually have the money to pay Oracle for its compute on an ongoing basis. In addition to its commitments to Oracle, OpenAI has also made commitments to spend $138 billion with Amazon over eight years, $250 billion on Microsoft Azure over an unspecified period, $20 billion with Cerebras over three years, $22.4 billion with CoreWeave over five years, and a non-specific amount with Google Cloud.

All of this is happening as Oracle’s core businesses plateau, even after Oracle reshuffled them in Q3 FY25 to represent Cloud, Software, Hardware and Services segments, the latter three of which have barely moved in the last 9 months as low-to-negative-margin cloud compute revenue grows.
In other words, Oracle’s only growth comes from a segment requiring hundreds of billions of dollars of compute. To make matters worse, every single one of these data centers is behind schedule. Stargate Abilene was meant to be done at the beginning, middle, and now the end of this year, yet sources tell me there’s no way it’s finished before April 2027. Bloomberg also reported late last year that Oracle had delayed several data centers from 2027 to 2028, but here in reality, every other Stargate data center is somewhere between a patch of dirt, a single steel beam, multiple steel beams, or less than half of a shell of a single building. Considering it’s taken two years for Stargate Abilene to build two buildings, I don’t see how it’s possible that these are built before the beginning of 2029. And at that point, where exactly will we be in the AI bubble? What GPUs will be available? What other kinds of silicon will exist? What will the demand be for AI compute? I don’t think that OpenAI exists for that long, and even if it does, it will have to raise at least $200 billion in the space of three years to possibly keep up with its commitments. I’m surprised that nobody (outside of JustDario, at least) has raised the seriousness of this situation. Stargate, as it stands, will kill Oracle, barring OpenAI becoming the literal most-profitable and highest-revenue-generating company of all time within the next two years. Even then, by the time Abilene is built, its 450,000 GB200 GPUs will be two years old, and entirely obsolete far before its debts are repaid. A similar fate awaits whatever GPUs are put in the other Stargate data centers.
Today’s newsletter is a thorough review and analysis of the ruinous excess of Stargate, a name that only really means “data centers being built for OpenAI in the hopes that OpenAI will pay for them.” Oracle is mortgaging its entire future on their construction, and even if it gets paid, I see no way that the cashflow from OpenAI’s compute spend can recover the cost before its GPU capex is rendered obsolete, let alone cover the debt associated with the buildout. I’m Larry Ellison — Welcome To Jackass. Welcome to the end of Oracle, or Sell The Compute To Who, Larry? Fucking Aquaman?

The total estimated cost of Oracle’s Stargate capacity is around $340 billion.
OpenAI needs to make, in total, $852 billion in both revenue and funding through the end of 2030 to keep up with its compute costs with Oracle, Amazon, Google, CoreWeave and Microsoft.
Oracle cannot afford to pay for the cost of construction and equipment out of cashflow, and has had to take on over $100 billion in debt and sell $20 billion in shares.
Across a potential 7.1GW of planned Stargate capacity, Oracle stands to make around $75 billion in annual revenue.
Abilene is expected to generate around $10 billion a year in revenue on completion, for a project that will likely cost in excess of $58 billion.
Stargate Abilene is extremely behind schedule, and likely won’t be finished until Q2 2027.
Oracle estimated in 2024 that Abilene would cost it $2.14 billion a year in colocation and electricity fees.
Oracle has spent over $5 billion in construction costs on the first two buildings of Abilene, with sources saying that it will likely spend over $10 billion to finish them, suggesting an overall cost of around $48 million per megawatt.
Oracle’s remaining Stargate sites are barely under construction, and will likely not be finished before the end of 2028.
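The headline cost figures above are easy to sanity-check. Everything in this sketch is one of the article's own estimates, rounded; nothing here is independently confirmed.

```python
# Two routes to the ~$340B total cost of 7.1 GW of Stargate capacity.
mw_total = 7_100  # 7.1 GW of planned capacity, expressed in megawatts

# Route 1: TD Cowen's per-MW estimate ($30M critical IT + $14M construction)
td_cowen_total_bn = (30 + 14) * mw_total / 1_000
print(td_cowen_total_bn)  # 312.4 -- "somewhere in the region of $340 billion"

# Route 2: the ~$48M/MW implied by Abilene's construction spend
abilene_total_bn = 48 * mw_total / 1_000
print(abilene_total_bn)   # 340.8 -- matching the ~$340B headline figure
```

The two per-megawatt estimates bracket the total at roughly $310–340 billion, which is why the piece treats "around $340 billion" as the working number.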
Even if Oracle builds the data centers and OpenAI pays for them, the incredible upfront cost and NVIDIA’s yearly upgrade cycle will render much of the GPU capacity worthless within the next ten years.  And if OpenAI fails to pay, Larry Ellison likely has over $20 billion in personal loans collateralized by over $60 billion in Oracle shares, meaning that margin calls will follow with the collapse of Oracle's stock.


The Reading Room is Open

We’re launching something new: The Reading Room, a book club right here in The Coder Cafe community. We’re kicking things off with one of my all-time favorite technical books: Designing Data-Intensive Applications, since the second edition just got released. If you’re interested, here’s how it works:

One chapter every two weeks (no pressure, no guilt). You can find the full schedule here. Discussion happens in the #ddia-v2 channel on Discord. O’Reilly is kindly sponsoring the reading group! 🎉 3 participants will be randomly selected at the start to receive a free digital copy of the book. Depending on engagement, we may also organize a live session every half of the book to discuss together. A shared reading experience with other engineers who care about the same stuff as you.

Next steps: To join, add a 👍 to this message in the Discord. Not in the server yet? Join here. To have a chance to win one of the 3 free copies, fill in this form (O’Reilly requires an email address to send the free digital copy). The random draw will happen on May 1st. We will start reading the first chapter on May 4th. See you in The Reading Room.


GESS Stenography for Russian and English

Read on the website: GESS is a Soviet / Russian standard for stenography (fast handwriting). I want to use it for both Russian and English. And I dare say it works!


Nicolas Solerieu

This week on the People and Blogs series we have an interview with Nicolas Solerieu, whose blog can be found at slrncl.com/blog. Tired of RSS? Read this in your browser or sign up for the newsletter. People and Blogs is supported by the "One a Month" club members. If you enjoy P&B, consider becoming one for as little as 1 dollar a month.

I’m a dad, designer, cyclist, texture guy – currently living in San Luis Obispo, CA. My oldest kid just learned to blow his nose. The other one is in his prime baby time. These days I daydream about bikepacking and permaculture. Born and raised in France, I landed in California in 2016. An odd mix of work ethic and ego led me to define myself through the stuff I make: all sorts of combinations of rectangles and text boxes, mostly for screens, solely because I got good enough to get paid for it. While I'm filled with gratitude for my career, I spend a humorously uncomfortable amount of time torn between ascetic ideals and pragmatism. While I’m not a technologist, I’m not a monk either. I’m way too fidgety. Time outdoors, family life, movement, and occasional meditation keep me sane.

I adopted this domain name in 2016 as I didn't like having my real name spelled out in the URL; it felt weird. I bought my initial domain back in 2012: nicolas-soleri.eu, I thought it was clever. SLRNCL is a concatenation of my last name and my first name without the vowels. It's hard to remember, which is great since I'm not trying to play the SEO game.

I truly started to put effort into writing in 2022. The birth of my first child probably had a lot to do with it – and getting off instagram. I couldn’t fathom the idea of being a dad with an instagram account. But I’d love for my kids to one day read the blog of their silly dad. Self-awareness and an allergy to grandiosity create a tension between craft, skepticism, and my embodied experience which I love to put into words. The blog-therapy is (still) working.
It’s eating up most of my creative ego and filling my feed. Nowadays I use the default iOS notes app. I write whenever. I edit little. I used to have a notes.txt file on my desktop where I was putting down all interesting nuggets, like a wine cellar, hoping for them to mature. Instead, they mostly degenerated and created a bunch of anxiety from doing nothing with them. I breed an uncomfortably large amount of thoughts daily. Most of them are unexceptional. I cultivate poor writing hygiene because I do not want to truly get into writing. Yet, there seems to be something that keeps bringing me back to words. To tame my ego and avoid creating a generational supply of passable notes, I use my blog as a graveyard. Typos are my own, I’m working on it. With AI it now feels like a mark of authenticity. Sometimes I ask my wife to proofread, but that is rare because we end up arguing; worth it.

Following the flow of life is what makes my creative juice flow. I often write on the toilet or in public parks while keeping an eye on my kids. I thrive in “white-space” time - time in between things. So I jot down notes when I’m out and about. I’m not a coffee shop person and I hate my home office.

My website is home cooked. It runs mostly on PHP. I still have jQuery installed but I’m slowly removing all Javascript dependencies. I'm not a great dev and prefer to stay 5 years behind trends. My website is constrained by my skills. This has kept me grounded and covered most of my needs and ambitions. I don't recommend inspecting my code; it's really not great, but decently light. Building stuff is a great way to keep myself grounded in the process. I use Inter as the only font because it's nice, plain, and open source. It will default to the system font if Inter isn't available, because I don't want to import anything custom or use a CDN. I'm not better than Inter (and few out there are, IMO). The site is hosted by OVH in France.
I’m considering self-hosting since my house produces excess solar power. I’d use bearblog if I was not a pretentious web designer and had to start over. I recommended it to my wife, and she likes it. The simplicity and authenticity of the project is lovely. That said, I do not regret the torturous process of having redesigned my website tirelessly over the last decade. The process taught me a lot about myself. My domain name + hosting cost under 20 euros/year. I do not run ads or track anything - I don’t plan to change this, ever. That means my website has had an incredible ROI considering the career opportunities it gave me. The many people who hired me all visited my website (and told me about it). I had some rewarding connections with internet strangers. My gratitude is larger than an html file can hold, and definitely magnitudes greater than what it cost me to run my website.

Money is important, and I’m a lucky bastard. I don’t have anything against people monetizing their thoughts - though I’m rarely compelled by a paywall. Digital patronage and crowdfunding seem highly relevant to get out of the social media hell realm of today. It has pitfalls, the main one being that it requires mass adoption, which seems highly delusional. But hope and compassion are contagious while big tech fights entropy. Social media always comes back in a different form; meanwhile, html is still there. It’s the cockroach business model.

There are so many goodies out there, one link away. Sharing is fun, side projects too. In my case it took me a decade to get my head out of my own butt and realize the cost of my own ventures. I believe a lot of us are similar to me, moving through life and accumulating stuff. Cleaning up, giving up, and passing along are necessary processes. So as a closing thought I’d suggest you sit, close your eyes, and think of all your stuff. If you’re comfortable with it, great. Otherwise, spring is coming.
Now that you're done reading the interview, go check the blog and subscribe to the RSS feed. If you're looking for more content, go read one of the previous 138 interviews. People and Blogs is possible because kind people support it.

These two crack me up and make me think: Keenan and Taylor Town (already seen on P&B). Some wholesome Aussie stories: Beau Miles. Maggie Appleton always gets me interested. The only design blog I've ever read: Tobias Van Shneider. I'm a fan of Faircompanies stories and mission. Not a blog, but worth checking out: the James Low audio archive.

Nicky Reinert Yesterday

Why I Cancelled Claude: Token Issues, Declining Quality, and Poor Support

First enthusiasm: A couple of weeks ago I subscribed to Claude Code, and during the first few weeks I had a really nice experience. It was fast, the token allowance was fair, and the quality was good. I learned they had raised the token allowance for non-rush hours, and since they opposed some …


DeepSeek V4 - almost on the frontier, a fraction of the price

Chinese AI lab DeepSeek's last model release was V3.2 (and V3.2 Speciale) last December. They just dropped the first of their hotly anticipated V4 series in the shape of two preview models, DeepSeek-V4-Pro and DeepSeek-V4-Flash. Both are Mixture of Experts models with a 1 million token context. Pro is 1.6T total parameters, 49B active. Flash is 284B total, 13B active. They're using the standard MIT license.

I think this makes DeepSeek-V4-Pro the new largest open weights model. It's larger than Kimi K2.6 (1.1T) and GLM-5.1 (754B) and more than twice the size of DeepSeek V3.2 (685B). Pro is 865GB on Hugging Face, Flash is 160GB. I'm hoping that a lightly quantized Flash will run on my 128GB M5 MacBook Pro. It's possible the Pro model may run on it if I can stream just the necessary active experts from disk.

For the moment I tried the models out via OpenRouter, using llm-openrouter. Here's the pelican for DeepSeek-V4-Flash: And for DeepSeek-V4-Pro: For comparison, take a look at the pelicans I got from DeepSeek V3.2 in December, V3.1 in August, and V3-0324 in March 2025.

So the pelicans are pretty good, but what's really notable here is the cost. DeepSeek V4 is a very, very inexpensive model family. Here's DeepSeek's pricing page. They're charging $0.14/million tokens input and $0.28/million tokens output for Flash, and $1.74/million input and $3.48/million output for Pro. Here's a comparison table with the frontier models from Gemini, OpenAI and Anthropic: DeepSeek-V4-Flash is the cheapest of the small models, beating even OpenAI's GPT-5.4 Nano. DeepSeek-V4-Pro is the cheapest of the larger frontier models.
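To put those per-million-token prices in perspective, here's a quick sketch that works out what a single large prompt would cost at the rates quoted above (the dictionary keys are just labels I made up for this example):

```python
# Prices quoted in the post, in USD per million (input, output) tokens.
PRICES = {
    "deepseek-v4-flash": (0.14, 0.28),
    "deepseek-v4-pro": (1.74, 3.48),
}

def cost_usd(model, input_tokens, output_tokens):
    """Cost in USD for one request, given per-million-token pricing."""
    p_in, p_out = PRICES[model]
    return (input_tokens * p_in + output_tokens * p_out) / 1_000_000

# A 100k-token input with a 10k-token response on Flash:
print(round(cost_usd("deepseek-v4-flash", 100_000, 10_000), 4))  # 0.0168
```

Under two cents for a 100k-token prompt on Flash, which is the point: at these prices the cost of experimentation is close to negligible.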
This note from the DeepSeek paper helps explain why they can price these models so low - they've focused a great deal on efficiency with this release, especially for longer context prompts: In the scenario of 1M-token context, even DeepSeek-V4-Pro, which has a larger number of activated parameters, attains only 27% of the single-token FLOPs (measured in equivalent FP8 FLOPs) and 10% of the KV cache size relative to DeepSeek-V3.2. Furthermore, DeepSeek-V4-Flash, with its smaller number of activated parameters, pushes efficiency even further: in the 1M-token context setting, it achieves only 10% of the single-token FLOPs and 7% of the KV cache size compared with DeepSeek-V3.2. DeepSeek's self-reported benchmarks in their paper show their Pro model competitive with those other frontier models, albeit with this note: Through the expansion of reasoning tokens, DeepSeek-V4-Pro-Max demonstrates superior performance relative to GPT-5.2 and Gemini-3.0-Pro on standard reasoning benchmarks. Nevertheless, its performance falls marginally short of GPT-5.4 and Gemini-3.1-Pro, suggesting a developmental trajectory that trails state-of-the-art frontier models by approximately 3 to 6 months. I'm keeping an eye on huggingface.co/unsloth/models as I expect the Unsloth team will have a set of quantized versions out pretty soon. It's going to be very interesting to see how well that Flash model runs on my own machine. You are only seeing the long-form articles from my blog. Subscribe to /atom/everything/ to get all of my posts, or take a look at my other subscription options .

Justin Duke Yesterday

What's Up, Doc?

What's Up, Doc? is, I guess, just a perfect film. I can remember exactly one other movie of its ilk that I watched with sheer glee — amazed by how contemporaneously funny it was, by how awful it was, and by how obviously, in retrospect, it influenced so much of the genre: The Thin Man. But even more so than that film, What's Up, Doc? is all gas, no brakes. The commitment to screwball never wavers, not even for a single second, ramping up and up and up in abject silliness until — as Babs says in a memorable closing line — you simply surrender to its tidal wave.

Here's a confession I'll offer in lieu of anything interesting to say about this terrific, hilarious film that I recommend wholeheartedly: I don't think I've actually ever seen anything with Barbra Streisand in it before. In one of those self-reflexive memes, I know her more for the Streisand effect — literally the name — than any specific work of art. Until now. And she is so completely winning in this, in a way that I don't think I've actually seen from any other lead actress. It is rare for Hollywood to let a lead actress be funny, horny, and charming all at once. The industry, if it deigns to let women be sexual and possessed of a sense of humor, usually consigns them to the realm of the character role, or tries to diffuse things with some other means — i.e. fat jokes. But Babs here, who is in many ways the original manic pixie dream girl (albeit perhaps more of a nightmare), is an absolute tornado. I'm not sure I would find her as charming as her male retinue does, diegetically, but she commands every scene she's in and demands your attention, never letting pesky things like pathos or logic get in the way of her Looney Tunes sensibilities. Just an absolute delight.


Greenfield and Iterative Development

Crossposted from Prime Radiant's blog – I'm really excited about all of the stuff we are doing at Prime Radiant. For the most part we're blogging about it over there, but I'm going to continue to lift the occasional post back to my personal blog.

Today, we're pleased to share the initial research previews of two new pieces of technology we've built at Prime Radiant:

- Greenfield – our suite of tools for turning existing software into behavioral specifications.
- Iterative Development – an agentic methodology for building bigger software products from detailed specifications without dropping requirements.

Both of these projects are brand new. We've used and tested them internally, but they are not yet hardened production-grade software. We're releasing them today to start to gather feedback on how well they work for your projects. Greenfield and Iterative Development grew out of our work on Superpowers. Greenfield works as a standalone tool and Iterative Development depends on Superpowers for some of its magic. (Superpowers started life as my personal agentic development methodology. I'm the Founder and CEO of Prime Radiant. Superpowers is now a Prime Radiant project.)

We first designed Greenfield as an experiment in agentic "clean room" reverse engineering. It's built to tease apart a software product, starting from a codebase, documentation, API clients, and other collateral. It turns all of that input into a corpus of behavioral specs for everything from public API contracts to user journeys. Just as importantly, it works hard to make sure that it doesn't include the product's internals in those specs. While you can use Greenfield to explore any codebase, we're most excited about the possibilities it opens up for extracting design and intent from under-documented historical "brownfield" codebases, making it possible to build new, clean implementations.

Greenfield is incredibly token-hungry. Using it to generate specs from a non-trivial codebase with a Claude Max 20x subscription will almost certainly exhaust your five-hour window several times over. While we have some ideas for how to make it significantly more efficient, we're very focused on making its outputs as good as they can be and only then optimizing for token spend.

One sample project we tested Greenfield + Iterative Development against was Ghost Pepper, Matt Hartman's excellent local-first dictation app for MacOS. We chose Ghost Pepper as an example because it's an open source app that I've been doing a significant amount of work on lately. It exercises enough UI complexity, OS framework integration, and third-party library usage to be non-trivial, but isn't so large that results are hard to evaluate. Also, because of how it was built, it had no significant design documentation. Over the course of a few hours, Greenfield generated approximately 500k of human-readable textual specs. We've published a snapshot of those specs and the regenerated version of "Ghost Pepper 1.9.0" on GitHub. You should not use this version of Ghost Pepper. It's just there so you can see what the generated output looks like.

If you've spent any significant time using an agent to build software, you are likely aware of the pain that comes when you hand your agent a spec that's too big. It skips steps, misses features, and generally just fumbles the implementation. Even Superpowers tends to cap out at plans that are a small fraction of a Greenfield-generated specification. To that end, we're open-sourcing the first version of Iterative Development, a new set of skills and tools designed to augment Superpowers so it can take big spec packages, parse out individual requirements into something a little bit like "user stories", bundle those into development epics that coding agents can wrap their heads around, and then execute the heck out of an implementation.

Iterative Development is very, very young, but our first experiences with it have been really promising. We've been testing it with both Claude Code and Codex and have been pretty happy with the early results. It builds working software from gigantic specs and has done a great job of not skipping requirements. The most recent run of "rebuild Ghost Pepper 1.9.0" built a fully working implementation of the product with dramatically better test coverage than the original, which was great. Manually testing the Ghost Pepper reimplementation, however, was a little tricky, because the auto-updater configuration was correct and the reimplementation kept trying to "update" itself to the latest release of the real Ghost Pepper! One thing that wasn't yet as good about the rebuilt Ghost Pepper was that it ended up with a more complex internal API surface to support that better test coverage. Right now, a lot of the tuning we're doing to Iterative Development is around improving its engineering taste and architecture.

If you try out Greenfield or Iterative Development, we'd love to hear from you. Drop us a line at [email protected].
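The requirements-to-epics flow described above could be modeled something like this. To be clear, every name here and the token-budget heuristic are my own illustration of the idea, not Iterative Development's actual internals:

```python
# Sketch: pack story-sized requirements into agent-sized "epics".
# A real system would estimate sizes and order by dependencies; this
# just greedily fills a fixed per-epic budget.
from dataclasses import dataclass, field

@dataclass
class Story:
    requirement: str       # one behavioral requirement from the spec
    tokens_estimate: int   # rough size of the work for an agent

@dataclass
class Epic:
    stories: list = field(default_factory=list)

def bundle(stories, budget=4000):
    """Greedily pack stories into epics small enough for one agent run."""
    epics, current, used = [], Epic(), 0
    for story in stories:
        if used + story.tokens_estimate > budget and current.stories:
            epics.append(current)
            current, used = Epic(), 0
        current.stories.append(story)
        used += story.tokens_estimate
    if current.stories:
        epics.append(current)
    return epics

stories = [Story(f"req-{i}", 1500) for i in range(5)]
print(len(bundle(stories)))  # 3: two epics of two stories, one of one
```

The point of the exercise: a coding agent that fumbles a 500k spec can usually handle a 4k-token epic, so the bundling step is what makes "execute the heck out of an implementation" tractable.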

Sean Goedecke Yesterday

Software engineering may no longer be a lifetime career

I don’t think there’s compelling evidence that using AI makes you less intelligent overall 1. However, it seems pretty obvious that using AI to perform a task means you don’t learn as much about performing that task. Some software engineers think this is a decisive argument against the use of AI. Their argument goes something like this:

1. Using AI means you don’t learn as much from your work.
2. AI-users thus become less effective engineers over time, as their technical skills atrophy.
3. Therefore we shouldn’t use AI in our work.

I don’t necessarily agree with (2). On the one hand, moving from assembly language to C made programmers less effective in some ways and more effective in others. On the other hand, the transition from writing code by hand to using AI is arguably a bigger shift, so who knows? But it doesn’t matter. Even if we grant that (2) is correct, this is still a bad argument.

Until around 2024, the best way to learn how to do software engineering was just doing software engineering. That was really lucky for us! It meant that we could parlay a coding hobby into a lucrative career, and that the people who really liked the work would just get better and better over time. However, that was never an immutable fact of what software engineering is. It was just a fortunate coincidence.

It would really suck for software engineers if using AI made us worse at our jobs in the long term (or even at general reasoning, though I still don’t believe that’s true). But we might still be obliged to use it, if it provided enough short-term benefits, for the same reason that construction workers are obliged to lift heavy objects: because that’s what we’re being paid to do. If you work in construction, you need to lift and carry a series of heavy objects in order to be effective. But lifting heavy objects puts long-term wear on your back and joints, making you less effective over time. Construction workers don’t say that being a good construction worker means not lifting heavy objects. They say “too bad, that’s the job” 2.

If AI does turn out to make you dumber, why can’t we just keep writing code by hand? You can! You just might not be able to earn a salary doing so, for the same reason that there aren’t many jobs out there for carpenters who refuse to use power tools. If the models are good enough, you will simply get outcompeted by engineers willing to trade their long-term cognitive ability for a short-term lucrative career 3. I hope that this isn’t true. It would be really unfortunate for software engineers. But it would be even more unfortunate if it were true and we refused to acknowledge it.

The career of a pro athlete has a maximum lifespan of around fifteen years. You have the opportunity to make a lot of money until around your mid-thirties, at which point your body just can’t keep up with it. A common tragic figure today is the professional athlete who believes the show will go on forever and doesn’t prepare for the day they can’t do it anymore. We may be in the first generation of software engineers in the same position. If so, it’s probably a good idea to plan accordingly.

1. If you’re thinking “wait, there’s research on this”, you can likely read my take on the paper you’re thinking of here, here or here. ↩
2. Of course, construction workers do have layers of techniques for avoiding lifting heavy objects when possible (cranes, dollies, forklifts, and so on). There’s a natural analogy here to a set of techniques for staying mentally engaged that software engineers are yet to discover. ↩
3. In theory labor unions could slow this process down (and have forced employers to slow down this race-to-the-bottom in other industries). But I’m pessimistic about tech labor unions for all the usual reasons: the job is too highly-paid, you can work (and thus scab) from anywhere on the planet, and so on. ↩


Extract PDF text in your browser with LiteParse for the web

LlamaIndex have a most excellent open source project called LiteParse, which provides a Node.js CLI tool for extracting text from PDFs. I got a version of LiteParse working entirely in the browser, using most of the same libraries that LiteParse uses to run in Node.js.

Refreshingly, LiteParse doesn't use AI models to do what it does: it's good old-fashioned PDF parsing, falling back to Tesseract OCR (or other pluggable OCR engines) for PDFs that contain images of text rather than the text itself. The hard problem that LiteParse solves is extracting text in a sensible order despite the infuriating vagaries of PDF layouts. They describe this as "spatial text parsing" - they use some very clever heuristics to detect things like multi-column layouts and group and return the text in a sensible linear flow.

The LiteParse documentation describes a pattern for implementing Visual Citations with Bounding Boxes. I really like this idea: being able to answer questions from a PDF and accompany those answers with cropped, highlighted images feels like a great way of increasing the credibility of answers from RAG-style Q&A.

LiteParse is provided as a pure CLI tool, designed to be used by agents. You run it like this: I explored its capabilities with Claude and quickly determined that there was no real reason it had to stay a CLI app: it's built on top of PDF.js and Tesseract.js, two libraries I've used for something similar in a browser in the past. The only reason LiteParse didn't have a pure browser-based version is that nobody had built one yet...

Visit https://simonw.github.io/liteparse/ to try out LiteParse against any PDF file, running entirely in your browser. Here's what that looks like: The tool can work with or without running OCR, and can optionally display images for every page in the PDF further down the page. The process of building this started in the regular Claude app on my iPhone.
I wanted to try out LiteParse myself, so I started by uploading a random PDF I happened to have on my phone along with this prompt: Regular Claude chat can clone directly from GitHub these days, and while by default it can't access most of the internet from its container it can also install packages from PyPI and npm. I often use this to try out new pieces of open source software on my phone - it's a quick way to exercise something without having to sit down with my laptop. You can follow my full conversation in this shared Claude transcript.

I asked a few follow-up questions about how it worked, and then asked: This gave me a thorough enough answer that I was convinced it was worth trying to get it to work for real. I opened up my laptop and switched to Claude Code. I forked the original repo on GitHub, cloned a local copy, started a new branch and pasted that last reply from Claude into a new file called notes.md. Then I told Claude Code: I always like to start with a plan for this kind of project. Sometimes I'll use Claude's "planning mode", but in this case I knew I'd want the plan as an artifact in the repository so I told it to write directly. This also means I can iterate on the plan with Claude.

I noticed that Claude had decided to punt on generating screenshots of images in the PDF, and suggested we defer a "canvas-encode swap" to v2. I fixed that by prompting: After a few short follow-up prompts, here's the plan.md I thought was strong enough to implement. I prompted: And then mostly left Claude Code to its own devices, tinkered with some other projects, caught up on Duolingo and occasionally checked in to see how it was doing. I added a few prompts to the queue as I was working. Those don't yet show up in my exported transcript, but it turns out running in the relevant folder extracts them.
Here are the key follow-up prompts with some notes:

- I've written more about red/green TDD here. (it was messing around with pdfium)
- I had a new idea for how the UI should work - see below
- it's important to credit your dependencies in a project like this!
- it was testing with Playwright in Chrome, turned out there was a bug in Safari
- dropping screenshots in of small UI glitches works surprisingly well
- it still wasn't working in Safari... but it fixed it pretty quickly once I pointed that out and it got Playwright working with that browser

I've started habitually asking for "small commits along the way" because it makes for code that's easier to understand or review later on, and I have an unproven hunch that it helps the agent work more effectively too - it's yet another encouragement towards planning and taking on one problem at a time.

While it was working I decided it would be nice to be able to interact with an in-progress version. I asked a separate Claude Code session against the same directory for tips on how to run it, and it told me to use . Running that started a development server with live-reloading, which meant I could instantly see the effect of each change it made on disk - and prompt with further requests for tweaks and fixes.

Towards the end I decided it was going to be good enough to publish. I started a fresh Claude Code instance and told it: After a bit more iteration here's the GitHub Actions workflow that builds the app using Vite and deploys the result to https://simonw.github.io/liteparse/. I love GitHub Pages for this kind of thing because it can be quickly configured (by Claude, in this case) to turn any repository into a deployed web-app, at zero cost and with whatever build step is necessary. It even works against private repos, if you don't mind your only security being a secret URL.

With this kind of project there's always a major risk that the model might "cheat" - mark key features as "TODO" and fake them, or take shortcuts that ignore the initial requirements. The responsible way to prevent this is to review all of the code... but this wasn't intended as that kind of project, so instead I fired up OpenAI Codex with GPT-5.5 (I had preview access) and told it: The answer I got back was enough to give me confidence that Claude hadn't taken any project-threatening shortcuts.

... and that was about it. Total time in Claude Code for that "build it" step was 59 minutes. I used my claude-code-transcripts tool to export a readable version of the full transcript which you can view here, albeit without those additional queued prompts (here's my issue to fix that).

I'm a pedantic stickler when it comes to the original definition of vibe coding - vibe coding does not mean any time you use AI to help you write code, it's when you use AI without reviewing or caring about the code that's written at all. By my own definition, this LiteParse for the web project is about as pure vibe coding as you can get! I have not looked at a single line of the HTML and TypeScript written for this project - in fact while writing this sentence I had to go and check if it had used JavaScript or TypeScript. Yet somehow this one doesn't feel as vibe coded to me as many of my other vibe coded projects:

- As a static in-browser web application hosted on GitHub Pages the blast radius for any bugs is almost non-existent: it either works for your PDF or doesn't.
- No private data is transferred anywhere - all processing happens in your browser - so a security audit is unnecessary. I've glanced once at the network panel while it's running and no additional requests are made when a PDF is being parsed.
- There was still a whole lot of engineering experience and knowledge required to use the models in this way. Identifying that porting LiteParse to run directly in a browser was critical to the rest of the project.

Most importantly, I'm happy to attach my reputation to this project and recommend that other people try it out. Unlike most of my vibe coded tools I'm not convinced that spending significant additional engineering time on this would have resulted in a meaningfully better initial release. It's fine as it is!

I haven't opened a PR against the origin repository because I've not discussed it with the LiteParse team. I've opened an issue, and if they want my vibe coded implementation as a starting point for something more official they're welcome to take it.
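The "spatial text parsing" problem LiteParse tackles can be illustrated with a toy sketch. This is my own simplification, not LiteParse's actual heuristics: given text spans with page coordinates, a naive top-to-bottom sort interleaves the columns, so you first assign each span to a column and only then sort by vertical position:

```python
# Toy reading-order heuristic for a multi-column page.
# Each span is (x, y, text) with y increasing down the page.
def linearize(spans, page_width, columns=2):
    col_width = page_width / columns
    def key(span):
        x, y, _ = span
        # Column index first, then top-to-bottom, then left-to-right.
        return (int(x // col_width), y, x)
    return [text for _, _, text in sorted(spans, key=key)]

spans = [
    (10, 100, "left-bottom"), (310, 50, "right-top"),
    (10, 50, "left-top"), (310, 100, "right-bottom"),
]
print(linearize(spans, page_width=600))
# → ['left-top', 'left-bottom', 'right-top', 'right-bottom']
```

A plain sort on y alone would have produced left-top, right-top, left-bottom, right-bottom - the interleaved garbage you often see in naive PDF extraction. LiteParse's real heuristics also have to detect how many columns there are in the first place, handle headers and footers that span the full width, and so on.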

alikhil Yesterday

How to Quickly Prepare for Software Engineering Interviews

A few months ago, I found myself needing to prepare for a series of job interviews within a very limited timeframe. It was a stressful experience, but it ultimately worked out well. I decided to share my notes and reflections in case they’re helpful to others in a similar situation. This is especially relevant if you’re not actively job hunting and suddenly receive an interview invitation, leaving you with limited time to prepare but a strong desire to maximize your chances of success.

Disclaimer: The tips described in this post may be more useful for senior engineers with hands-on experience and engineering intuition.

The internet is full of articles listing all possible HR interview questions. I recommend spending a bit of time on them just to understand what to expect and not be surprised. However, in my humble opinion, there are two main points to focus on during HR interview preparation. First, you need a short story that tells your experience briefly. Avoid listing every bullet point from your CV. Instead, focus on highlighting your key achievements. Also, your story must be aligned with the position you are applying for. Yes, you might need to adjust your story for different jobs at different companies. Second, it’s important to have a clear motivation. Why do you want to change your job, and why this company/role? What kind of job are you looking for?

Whether you have some experience with System Design interviews or have never done one, start by learning the Delivery framework. Understand each section. Watch at least one video on how it’s done - the more, the better. The videos from the Hello Interview channel are really good. If you are applying to a FAANG company, you may search for leaked system design questions from that company and spend some time preparing for them. But there is no guarantee that you will get the same topic, so I would not recommend spending all your time here. If you can, do a mock interview.
Ask a friend or find someone to practice with. If you can’t, then try to walk through alone, but talk through everything out loud. During the interview, treat the interviewer as a colleague, ask questions, and make sure you understand the problem and have not missed any important requirements before building the design of the system. Don’t rush.

This part is really tricky. If the company tends to use LeetCode-style interviews, there is no shortcut here. You need to solve hundreds of them to really feel confident. You may need to refresh your memory on algorithms you feel less confident about (for example, I always forget about corner cases for binary search). Again, if it’s a big / well-known company, you can try to search for leaked coding interview questions.

S.T.A.R (situation, task, action, result) & C.A.R.L (context, action, result, learning)

There are dozens of questions you could be asked in behavioral interviews. And you’re expected to structure your answers using the STAR framework. This means you need to tell a story by defining a context, your actions, and results. You could go and prepare a STAR-format answer to every such question, but that would take a lot of time, and it’s suboptimal. Fortunately, the same stories can be used for different questions, which makes the situation easier: you can prepare 7–10 stories that will cover most of the questions. During preparation, you can write them as text, but don’t read them during the interview - it tends to sound unnatural. When telling your story using the STAR method, make sure your final sentence clearly highlights a positive outcome. Adjust your tone to emphasize this closing part so it stands out. The STAR framework is a standard, but also check CARL: in some questions it would be good to tell what you have learned from that story. Here are some materials that helped me to prepare for a behavioral interview:

- Hello Interview - Behavioral Interview Discussion with Ex-Meta Hiring Committee Member - must watch
- Behavioral interview - although I would recommend watching it even before the HR interview, because it gives a bunch of helpful tips about self-presentation
- https://thebehavioral.substack.com/ - Strategies, tips, and resources to prepare for your next behavioral interview from a FAANG+ insider

Some companies have such an interview stage. It’s quite unpopular but still exists. You’re asked to present a project or problem you worked on. You explain the context, problem, solution, results, and your role in this story. It’s like showing the result of your work to colleagues from different departments/teams. This stage is very open-ended. You are not given specific instructions, and there is not much information on the internet with recommendations on how to prepare and conduct such interviews. When I found out I would have this interview, I was initially shocked and unsure how to prepare, as I didn’t know what to expect - until I realized that in reality, it’s you, the interviewee, who rules this interview. You choose the project, decide what to include and omit, and control the level of detail, and you come up with a story you know, with answers to all possible questions, because it’s your story. So, make the most of this stage. Prepare your story, and make a few slides / notes / architecture sketches. Don’t dig into details too much - leave space for questions. And even if there is no dedicated interview, you may be asked to talk in detail about a certain problem/project you were working on. So, be prepared. Have your story!

When answering open-ended questions, aim to tell stories where the scale of the problem matches the level of the role you’re applying for. For example, if you are asked, “Tell me about a challenging/interesting problem/task you were working on recently,” optimizing an SQL query by adding an index may be fine for junior roles, but it won’t carry enough weight for senior positions. Interviewers would expect to hear something bigger: challenging, higher stakes, and often involving cross-team collaboration, such as migrating a large system to Kubernetes.

Ask questions back. You should ask questions to learn more about the company, their culture, the hiring manager’s management style, and what they like or dislike about their work. Prepare a list of questions before the interview.

Start preparing in advance. Even if you’re not planning to change jobs anytime soon, you can begin investing in your future by:

- solving one LeetCode problem a day
- keeping track of tasks/projects you’ve completed, along with your achievements (many companies require this anyway for performance reviews) - this would be a foundation for your stories in behavioral and project walkthrough interviews
- keeping your CV and LinkedIn up to date
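As a concrete instance of the binary-search corner cases mentioned earlier (empty input, single element, target absent, target at either end), here's a minimal off-by-one-safe sketch worth having in muscle memory:

```python
def binary_search(arr, target):
    """Return the index of target in sorted arr, or -1 if absent."""
    lo, hi = 0, len(arr) - 1       # inclusive bounds
    while lo <= hi:                # <= so a single element is still checked
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        if arr[mid] < target:
            lo = mid + 1           # +1/-1 prevent infinite loops
        else:
            hi = mid - 1
    return -1

print(binary_search([1, 3, 5, 7], 7))  # 3 (target at the end)
print(binary_search([], 5))            # -1 (empty input)
```

The corner cases hide in three places: the `<=` in the loop condition, the `mid + 1` / `mid - 1` updates, and the empty-array case where the loop never runs.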


A pelican for GPT-5.5 via the semi-official Codex backdoor API

GPT-5.5 is out. It's available in OpenAI Codex and is rolling out to paid ChatGPT subscribers. I've had some preview access and found it to be a fast, effective and highly capable model. As is usually the case these days, it's hard to put into words what's good about it - I ask it to build things and it builds exactly what I ask for!

There's one notable omission from today's release - the API:

API deployments require different safeguards and we are working closely with partners and customers on the safety and security requirements for serving it at scale. We'll bring GPT‑5.5 and GPT‑5.5 Pro to the API very soon.

When I run my pelican benchmark I always prefer to use an API, to avoid hidden system prompts in ChatGPT or other agent harnesses from impacting the results.

One of the ongoing tension points in the AI world over the past few months has concerned how agent harnesses like OpenClaw and Pi interact with the APIs provided by the big providers. Both OpenAI and Anthropic offer popular monthly subscriptions which provide access to their models at a significant discount to their raw API. OpenClaw integrated directly with this mechanism, and was then blocked from doing so by Anthropic. This kicked off a whole thing. OpenAI - who recently hired OpenClaw creator Peter Steinberger - saw an opportunity for an easy karma win and announced that OpenClaw was welcome to continue integrating with OpenAI's subscriptions via the same mechanism used by their (open source) Codex CLI tool.

Does this mean anyone can write code that integrates with OpenAI's Codex-specific APIs to hook into those existing subscriptions? The other day Jeremy Howard asked:

Anyone know whether OpenAI officially supports the use of the endpoint that Pi and Opencode (IIUC) uses?

It turned out that on March 30th OpenAI's Romain Huet had tweeted:
That means in the app, in the terminal, but also in JetBrains, Xcode, OpenCode, Pi, and now Claude Code. That’s why Codex CLI and Codex app server are open source too! 🙂

And Peter Steinberger replied to Jeremy that: OpenAI sub is officially supported.

So... I had Claude Code reverse-engineer the openai/codex repo, figure out how authentication tokens were stored and build me llm-openai-via-codex, a new plugin for LLM which picks up your existing Codex subscription and uses it to run prompts! (With hindsight I wish I'd used GPT-5.4 or the GPT-5.5 preview, it would have been funnier. I genuinely considered rewriting the project from scratch using Codex and GPT-5.5 for the sake of the joke, but decided not to spend any more time on this!)

Here's how to use it: install the plugin, then prompt the new model. All existing LLM features should also work - attaching images, ongoing chats, viewing logged conversations, and tool support.

Let's generate a pelican! Here's what I got back: I've seen better from GPT-5.4, so I tagged on a higher reasoning effort setting and tried again. That one took almost four minutes to generate, but I think it's a much better effort. If you compare the SVG code (default, xhigh), the xhigh one took a very different approach, which is much more CSS-heavy - as demonstrated by those gradients. xhigh used 9,322 reasoning tokens where the default used just 39.

One of the most notable things about GPT-5.5 is the pricing. Once it goes live in the API it's going to be priced at twice the cost of GPT-5.4 - $5 per 1M input tokens and $30 per 1M output tokens, where GPT-5.4 is $2.50 and $15. GPT-5.5 Pro will be even more: $30 per 1M input tokens and $180 per 1M output tokens. GPT-5.4 will remain available; at half the price, GPT-5.4 feels like it is to GPT-5.5 what Claude Sonnet is to Claude Opus.

Ethan Mollick has a detailed review of GPT-5.5 where he put it (and GPT-5.5 Pro) through an array of interesting challenges.
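Those per-million-token prices turn into a quick back-of-the-envelope calculator. This is my own sketch using only the prices quoted above; the token counts are made up for illustration:

```python
# Per-1M-token prices quoted above: (input, output) in dollars.
PRICES = {
    "gpt-5.4": (2.50, 15.00),
    "gpt-5.5": (5.00, 30.00),
    "gpt-5.5-pro": (30.00, 180.00),
}

def prompt_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single prompt at the listed per-1M-token prices."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Illustrative prompt: 1,000 input tokens, 10,000 output tokens.
for model in PRICES:
    print(f"{model}: ${prompt_cost(model, 1_000, 10_000):.4f}")
```

Because both rates double in lockstep, GPT-5.5 comes out at exactly twice GPT-5.4 for any mix of input and output tokens, which matches the "twice the cost" framing above.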
His verdict: the jagged frontier continues to hold, with GPT-5.5 excellent at some things and challenged by others in a way that remains difficult to predict.

To try the plugin yourself: install Codex CLI, buy an OpenAI plan and log in to Codex, install LLM, install the new plugin, and start prompting.

Kev Quirk Yesterday

Update on My Coffee Ridden Framework 13

A week or so ago, I talked about how I might have killed my Framework 13 by dumping a full mug of coffee over it while it was running. In that last post I explained how I'd stripped the laptop down and was waiting for some isopropyl alcohol (IPA) to be delivered so I could more thoroughly clean it.

Well dear reader, the IPA turned up, I cleaned it as best I could, and left it for 24 hours to dry off. The next day I came back to it, re-assembled it and hit the power button with a fair amount of trepidation. I think it's dead, Jim. And I can't help thinking that turning the laptop on in haste after the first clean is what completely screwed it. Oh well, we live and learn.

In my desperation, I contacted Framework support and explained the whole saga to see if there was anything I was missing. There wasn't. They told me that the LED pattern I was seeing when powered on was indicative of a communication error with the board, so it's dead and needs to be replaced.

Problem is, a new board is £700 (~$950) and I didn't fancy shelling out that much money from my own pocket, so I contacted my home insurance provider to make a claim, and to be fair they were great. A case was logged and a couple of days later I had a payout that would cover the whole amount. The payout from the insurance was more than the repair cost, so I decided to upgrade from my current Ryzen 7 7840 to an AI 300 series board instead - nice little upgrade!

The Framework site said it would be shipped in 5 days, and would probably be subject to delays of a further 7 days due to global freight disruptions. So I bought myself a ThinkPad T480 to see me through (which I'm typing this post on) as I couldn't bear to be on MacOS for another second. Framework overachieved again and the board is due for delivery tomorrow (Friday 24th April 2026). Once the board is delivered and my beloved Framework is (hopefully) working again, this nice little ThinkPad will go to my wife as an upgrade from her 2014(!)
Gen 2 X1 Carbon.

I've had a few people reach out telling me that they'd done something similar and their devices had survived. Unfortunately I wasn't as lucky, so what happened? I think it's because I didn't spill the coffee on my laptop, but next to it. Then as the puddle of coffee made its way over my desk and inevitably under my laptop, the spinning fan must have sucked it up and perfectly spread the coffee all over the main board. Thanks for that. Stupid fan. 🤣️

Had I spilled the coffee on my laptop, it would have had to make its way through the keyboard and chassis before it got to the board, by which point I would have had the laptop switched off and draining. I can't say for sure, but that's my theory.

So anyway, wish me luck with the new board, folks!

Thanks for reading this post via RSS. RSS is ace, and so are you. ❤️ You can reply to this post by email, or leave a comment.


SQLAlchemy 2 In Practice - Chapter 6: A Page Analytics Solution

This is the sixth chapter of my SQLAlchemy 2 in Practice book. If you'd like to support my work, I encourage you to buy this book, either directly from my store or on Amazon. Thank you!

The goal of this chapter is to use the concepts you have learned to build a web traffic analytics solution. This will serve as reinforcement of the techniques demonstrated in previous chapters, as well as an example of a more complex and realistic database design.

Stratechery 2 days ago

An Interview with Google Cloud CEO Thomas Kurian About the Agentic Moment

Good morning,

This week’s Stratechery Interview is with Google Cloud CEO Thomas Kurian. Kurian joined Google to lead the company’s cloud division in 2018; prior to that he was President of Product Development at Oracle, where he worked for 22 years. I previously spoke to Kurian in March 2021, April 2024, and April 2025. The occasion for these interviews, at least for the last three years, is Kurian’s annual keynote at Google Cloud Next. You can watch the keynote here, and read the blog about Google’s announcements here.

I spoke to Kurian a week ago, on April 15, and at that time only had access to the afore-linked blog post. With regards to the keynote, which I have since watched, I thought it was a powerful opening: Kurian returned to last year’s theme, about a unified architecture, but emphasized that the use cases were no longer theoretical or pilots but running at scale for real users. He also emphasized — in a foreshadowing of a point we discussed below — that Google itself was running on the same infrastructure as Google Cloud. Google CEO Sundar Pichai, meanwhile, talked about Google’s capex investment, and that (1) half of it was going towards Google Cloud, and (2) that Google Cloud was running the same stack as Google itself. I sense a theme! Pichai also emphasized security, a point that Kurian was also careful to raise in our talk, before discussing the shift to agents.

To that end, in this interview — which again, was conducted before the keynote — we discuss agents. Specifically, I wanted to get Kurian’s take on the quality of Gemini’s harness (unsurprisingly, he thinks it’s great). Google has an integration advantage, but is it paying off in such a large company? I was also curious about how Google thinks about TPUs specifically and the cloud business generally in terms of balancing its internal needs with external customers like Anthropic.
We also talk about the software ecosystem, why Google still believes in partnerships, and why the company was ready to seize the AI moment (hint: it’s because of Kurian). As a reminder, all Stratechery content, including interviews, is available as a podcast; click the link at the top of this email to add Stratechery to your podcast player.

On to the Interview: This interview is lightly edited for clarity.

Thomas Kurian, welcome back to Stratechery. I promise I have recording turned on this year — in fact, I have two recordings turned on.

TK: Thank you so much, Ben. Good to see you, thanks for taking the time.

Well, I look forward to talking to you. It’s good to talk to you for multiple interviews, much better than talking to you multiple times in one interview, so we’re already doing better this year. But like last year, we are recording before your Google Next keynote. We’re actually quite a bit ahead, I think we’re several days ahead, but this podcast won’t be released until after the keynote. Therefore, I’m going to ask the exact same question I asked last year. Specifically, I like watching keynotes, not for the announcements, but for the framing that happens up front. Last year, that framing was infrastructure, [Google CEO] Sundar Pichai actually delivered that at the opening, then you came in and talked about that, and that was the context for everything that you talked about. What is the framing this year?

TK: The framing this year is that as AI models have become more sophisticated, we see customers evolving the use of AI models from being used to answer questions in a chatbot-like fashion, to actually automating tasks on their behalf, and to automate process flows within the organization. By automating process flows, you both get efficiency improvements, productivity improvements, frankly, you can also change the way that you introduce new products and services to market, for example.
In order to do that well, the technology, what you need is a world-class agent platform and to underpin the agent platform, you need world-class infrastructure. You need the way that the agents interact with your company’s data and your business — so you need capabilities to help an agent really understand the company’s business information and context. I think, as you’ve seen in the press, AI and cyber have become very contextual now, there’s a lot of concerns that AI will accelerate the speed of cyber attacks on people’s systems, and so we’re going to be talking about how we’re bringing AI and our cyber technology together to protect, including the integration of Wiz, and then we’re introducing Gemini Enterprise and our agent platform to customers. That’s sort of the theme of what we’re talking about.

You mentioned agents last year, everyone was talking about them to a degree, what has really changed from last year to this year that makes this different? I read your whole blog post, it’s very long, and I think the word “agent” may appear in every single paragraph.

TK: There’s three or four big things that have changed. The first is capabilities of models — Gemini is able to reason much more effectively as new versions of Gemini have come out. Second, they’re able to maintain long-running memory, which you require if you have an agent that’s automating tasks over many, many steps, it has to maintain a lot of state in memory. Third, their interaction with tools and the rest of the world, there have been good abstractions, skills, tools, MCPs [Model Context Protocol], as they’re called, they’re all abstractions for how an agent reasons and interacts with the rest of a company’s systems.
All of them have advanced, and so the core capabilities of the models themselves have gotten a lot better, the ability to use tools and interact with the rest of the world has become a lot better, the abstractions that the world exposes to the model have improved, and so now you have models with these capabilities to do these very complex tasks.

That all makes sense and certainly tracks. A lot of these announcements, though, as I was going through them, a lot was about the infrastructure around agents, which makes sense — the orchestration, registry, identity, security, all these bits and pieces. All of this is clearly necessary for large enterprises, something they’re going to worry about and ask about. But the agents have to actually work; do Gemini agents actually work? Because there’s a lot of talk, you know, Gemini was the belle of the ball four months ago, but over the last little bit, it’s been mostly a lot about Anthropic and Claude, Codex, a lot of talk about that, and Gemini, not much talk. What’s your feeling about your actual capabilities, not just agents in general?

TK: I’ve always said when people ask us about it, I always say, “Let our customers talk about it, rather than we talk about it”, I think you’re going to hear from 500 customers telling their stories at Next. Even people building agents, we have a whole range of them, from Citigroup to Bosch to eBay to Virgin Voyages to Walmart, there’s a whole range of them, Food and Drug Administration, etc., Comcast, Unilever, all of them are going to be talking about specific business problems they had. For example, for Citi, they’ll be talking about a new wealth advisor, Investment Management, where they’re using our agents to research a person’s investment priorities.
So a person says, “Here’s my priorities for investment, my kids are going to school, I need this kind of cash flow in order to fund it”, and then it researches your financial portfolio and interacts with you to give you recommendations. If you look at Comcast, they’re using us for all of the work that they do for consumer services — this is repair, scheduling appointments, dispatching field technicians, there are very complex flows that have many, many steps and interact with a lot of complex systems. If you look at some of these flows, they require all of the capabilities I talked about. So as an example, I want the capability to call a set of tools, and those tools may be: I want to book an appointment, so I need a calendar; if I’m dispatching a technician, I need to look up spare parts, so I need to pull from my spare parts inventory; I need to schedule that to be available at the same time as the person who’s going out; and I need to update my inventory that I have taken something out of it. I mean, these are very, very complex steps.

What’s interesting about all these complex steps and going through all these bits and pieces, it sounds like you’re saying that almost the more constraints there are, the more things you’re bumping up into, is that actually a better environment for instituting these sort of flows just because what you need to do is clearly defined?

TK: Just being perfectly frank, Ben, having constraints requires the model to be even more intelligent. Just as an example, in a process flow that’s complicated, with many, many steps, the number of variants and the number of different idiosyncratic situations that you may encounter are large, so you cannot a priori program every one of them. You need to teach the model, for example, to be able to spin up a virtual machine and use a tool in the virtual machine to generate code to deal with some of these situations.
So the most sophisticated thing is where you can give the model a high level set of instructions and have it goal seek an outcome. So you say, “I need to schedule this appointment”, and it turns out there may be 19 different conditions that occur when you’re trying to schedule an appointment and as part of that, you can’t a priori tell the model every single possible condition deterministically. So you need to teach the model, “Okay, the user did not tell you what to do, but the goal was to schedule an appointment, so here is how you generate code to then create a collection of things that can interact with the model and understand what to do”.

This is very interesting, you’re walking through this process, this makes a lot of sense. How do you have that conversation with DeepMind? You’re connecting the, “This is the workflow that is needing to happen, these are what we need the model to do, this is where it does well, where it doesn’t”, what’s the working relationship there?

TK: We have a harness in which all these flows and journeys, for example, as we see them with customers, we put them into the harness and they get into the reinforcement loop for Gemini.

How tight is that process?

TK: Very tight. We have people sitting next to [DeepMind CEO] Demis’ [Hassabis] team, in fact I just came from a meeting with them, that loop is what allows us — we are in a unique position in the market. We’re unique in three different ways, we’re unique because we have the whole stack of AI technology. In order to do agents well, you need to have a model that takes all these journeys and puts them into the harness that handles the improvement, as we call it, hill climbing, literally every hour of every day, and the complexity of the journeys we see is in some ways much greater because in companies, you have many different systems, different conditions, different flows, you may not see that in other domains, like in a pure consumer domain.
In order to do these well, you also need, for example, models need to spin up compute, models need to now hold on to tokens for longer because they need to hold, for example, a KV cache that holds memory about what’s happening during the transaction flow. Having awesome infrastructure, both classical, what we call classical compute machines, and TPUs gives us real strength there. Third, as you walk through these, one of the things you find is a lot of the systems these models interact with are things like databases and enterprise applications. So understanding the context of these, like for example, “How much inventory do you have?”, defining “What is inventory?”, “What part are you talking about?”, “What part number are you talking about?”, those things require you to have technology that understands the business graph and the dictionary of all the objects and the sources of information in your company. Our strength in data processing gives us some technology that we’re going to be talking about next week around something we call Knowledge Catalog, think of it as your global dictionary for all information within the company, that’s a unique strength. And then obviously you don’t want information that’s critical to your company exposed on the Internet, you don’t want your model to get attacked because now it’s handling very complex process flows, you don’t want it hijacked, and so all the anxiety around cyber, we have very specific tools for, so our differentiation is all these pieces working together.

That makes sense, the integration is a big part of your pitch.
At the same time, you’re also a big, sprawling company and I think there’s maybe a perception, that I maybe hold, that some of the frontier labs are much more focused, they’re much more top-down about, “This is how our harness is going to work, the way it’s going to use tooling”, and all the things you’re talking about having this feedback flow back in sounds great unless there’s so many different takes on the way it should work and then you have your own internal customers as well. How do you balance having a point of view versus getting stuck in the muck?

TK: Every product that Google has is on the same Gemini version, on the same day, on the same hour, every one of us is using the same harness.

And you feel good that that harness is where it needs to be — it’s not getting pulled in 50 million directions thanks to all your customers and Google’s workloads?

TK: Absolutely not, we are very focused on working with Demis and [DeepMind CTO] Koray [Kavukcuoglu], who lead our team, to make sure they see the sophistication of these scenarios and we work literally side-by-side, hour-to-hour with them. There’s been a lot of speculation on whether we are distracted as a company…

I don’t think you’re distracted, I think it’s more just a matter of it’s a classic big company versus small company bit. Like a startup comes in and you have a very clear point of view and you don’t have all the enterprise stuff, you don’t have all this protecting the data, or permissions and all those structures, and yet that stuff sort of gets pulled along because there’s such demand to use your product that works really well and then over here it’s like, “Hey, we have everything protected and we have all these things around it”, but does the core product actually deliver?

TK: The core product is being used by lots of people. The proof of that — we generate 16 billion tokens a minute, up from 10 just last December or January.

Well, your financial results certainly showed that as well.
There’s a bit where you’re doing so well, I have to be a little hard on you here.

TK: A lot of people told us we were dead in 2023 — we’re still living.

I think you’re doing more than living, you’re doing very well.

TK: And so we never say anything negative about anybody else, our results speak for themselves. I always say, let our customers tell the story, they’re doing amazing things with Gemini in companies, enterprise, and they see the value of what we’re delivering for them.

You mentioned that everyone in Google is on the same version of Gemini, using the same harness. Does that also apply to all this infrastructure around agents you’re doing, around sort of identity and security?

TK: Yeah, in the enterprise, the way that all the infrastructure works is we have configurable mechanisms. Like for example, when you configure an agent, a very simple thing is you want to configure the agent with a different identity from a person, just a very simple example, so that you can track, “Who did this transaction? Was it the human or the agent?”, because there’s issues like liability. You may want to revoke permissions for the agent at a certain point in time, you want to allow it to only do certain tasks and not everything that the human does, so there are controls you want to put around an individual agent and a collection of things that’s separate from the person. As we bring agents to consumers as part of our Gemini app, very similar concepts want to be exposed, and so the architecture that we use allows us to have those things. The sources of that may be different. In the consumer world, they may use the Google login account, in the enterprise world, they may use a directory to store it, but that’s just an abstraction of our technology to the rest of the world.

We’ve been talking a lot about Gemini agents and the whole Gemini platform, but you also have just the broader Google Cloud platform.
One of your major tenants is a company I was just sort of referring obliquely to, which is Anthropic, they’re doing a lot of inference on TPUs in particular. If Anthropic wins deals at the expense of Gemini, is that still a win?

TK: We sell different parts of our stack. One of the things people don’t realize is we monetize many different parts of the stack in different ways. Like Anthropic, there’s a lot of labs that use our stack — in fact, most of the large AI labs use our stack. So if somebody uses TPUs either to train their model or to use it for inference, we’re monetizing that part of the stack, that gives us resources to then fund our R&D and other investments. Some of the labs use our TPU and our Gemini model, others may use our TPU and then buy our cybersecurity protection for their models. So as a platform player, we have to allow our technology to be monetized in as many ways as possible and we don’t see it as a zero sum.

Sometimes, though, if you have the SaaS layer and the platform layer and the infrastructure, is there one that is the most important? On one hand, SaaS has the highest margins, it kind of decreases going down. On the other hand, that infrastructure needs to be used, you’re spending a lot of money on it, you want full utilization. How do you think about that in terms of what’s the most important? I know they’re all important, but how do you think about that tradeoff?

TK: If we were making TPUs just for ourselves, we would have lower volume than we do as a general purpose TPU supplier, which means there would be times of day that we would not be using those TPUs. Do you follow me? Like if you think how chat systems work, they’re very diurnal in nature, because you ask questions when you’re awake. We have a great search business and we have a great Gemini app business, but there would be a certain diurnality to it: during the daytime, there’d be a lot of questions, but what about in the evening?
Because we sell TPUs in the market, we’re able to offer them at spot to the rest of the world because we have such a large business. We’re able to also get better manufacturing terms with suppliers and other things because we’re a real volume player, and that in turn lowers our cost of goods sold. So there are many more dynamics. The company is very focused on ensuring we win every part of this, not just one part of it. Gemini is obviously a super important initiative for us, and you’ll see the big announcements are around—

For sure, it’s almost all Gemini.

TK: But I wouldn’t assume that if we do that, the only way to do that is to offer our chips along with our model. We see a strong business offering our chips to many other people and you’ll see all of this is what’s accelerating our differentiation, and you see it in our financial results.

Your financials are incredible, your revenues are up, margins are up hugely, I’ve been posting that chart of them for a long time, last quarter was amazing. I do have to ask about TPUs, though. You talk about selling your TPU chips, to date that has meant TPU instances on GCP, but now there’s talk about actually selling TPU chips, what’s the status of that? What’s the official word, can I go buy a TPU?

TK: I’ll explain a little bit what we see. So let me talk briefly about the announcements we’re making, what the product is being used for, and then how we bring some of it to market.

We’re introducing two big new TPUs next week. One is TPU 8t, where “t” stands for training, it’s more optimized for training; think of it as 9,600 TPU chips in a single pod, as we call it, and it has three times better performance than the current generation, which is already the leading one in the market. Then there’s 8i, where “i” is for inference, it’s 1,152 chips, three times the SRAM, and it has a new thing called the Collectives Engine, which gives you super efficient calculation performance for inference.
Now, along with that, we are introducing Nvidia VR200, and we’re also introducing more ARM capability for classical compute, because people who use models increasingly need to spin up a VM in order to do tasks, and we see interest in those VMs. We’re introducing not just new compute families, but also new storage; there are two new storage offerings. There’s one, the fastest Lustre solution in the market, it’s 10 terabits per second — just to give you a sense, that’s like five times number two. We’re also introducing a new thing for ultra low latency — when you do inference, you want super low latency in accessing storage, we call it Rapid Storage, it can give you 15 terabits per second with ultra low latency, like microsecond latency.

So why are we introducing all this stuff? TPUs, definitely a big market is the AI labs, but we’re seeing interest from new segments of the market. So a big new segment is financial services, and when I say financial services, capital markets, and the reason is that today, if you’re a trading firm, a capital markets firm, you spend a lot of time running algorithmic trading, and algorithmic trading is running numerical algorithms on traditional Intel type cores, x86 cores. Now what they find is that models can do inferencing and the inference performance is actually better than traditional numerical computing. So that’s one new segment. The second segment is high performance compute. We see a ton of people wanting to do energy modeling, computational fluid dynamics, solid state, there’s a whole bunch of parameters there too. What’s interesting about those is, you will see at our event, Citadel Securities for example, talk in the keynote about how they’re using TPU. Citadel, as you know, is a large capital markets firm. Department of Energy, they have a mission called Genesis, which is the new national lab mission on changing the energy infrastructure for the United States.
There’s the largest utility in Brazil, Axia — all of them are examples of people who are part of just the keynote talking about how they use TPUs. When we look at that, there’s a couple of different things we see. Capital markets firms say, “Hey, if we’re going to replace our algorithmic trading solution, you have to bring TPU to where the venue is”.

Right, because they care about the latency of going to a data center, that’s why they’re all in New Jersey.

TK: Secondly, if you’re a national lab, you have so much data you’ve collected over the last X number of years with your experiments — saying you have to bring all that data to the cloud to reason on it doesn’t make sense, so you will see us putting TPU in other people’s venues, and when we do that, we’re introducing new ways of people also procuring it. When I say procuring it, you buy it as a system, you don’t have to buy it just as a cloud source.

How does this new way of selling work? It’s almost like a third way: you have TPUs in Google’s data centers, you have bringing TPUs to customers, but then you have a deal like last week between Anthropic and Broadcom and Google, where this is going in their data centers. There’s these sort of renegade data centers that have access to power, maybe they were doing Bitcoin or whatever it might be, and there’s been a big push to get TPUs into those. Where does that fit into this?

TK: I would not assume everything you read in the press is true.

Well, the Anthropic announcement was definitely a big announcement.

TK: Just to be honest with you, we have a flavor that runs in the cloud and a flavor that runs in third-party data centers. The technology, the machines are identical.

My question here is, where is that coming from? Is that part of your TSMC allocation? Is that Broadcom’s? Because no one can get enough compute, so ultimately that goes all the way back to the root.

TK: The chips are all part of our global — TPU is a Google chip, as you know.
So it’s part of global allocation; Broadcom is the partner who manufactures the TPUs with us, and so it’s just part of the overall business. The new thing we’re talking about is just that you can run TPU in other venues.

Makes sense. Will we ever have enough compute? Last year you said, “I think we’re going to resolve it shortly”, it doesn’t seem very resolved, what’s the status there?

TK: We’ve worked super hard as an organization; our team that’s done our compute infrastructure, our global data centers, machines, all that, they’ve done an amazing job. There’s always a shortage, there’s never enough. But it doesn’t mean that we’re not — we would not be growing at the rate we are if we didn’t have enough compute. And so there’s more that we want, but there’s also the reality that our teams have done an amazing job, and our customers who are using it will tell you they’re seeing the benefits of the hard work our teams have done.

There’s potential customers in the market, maybe current customers, who may be willing to pay basically any price for compute at this point. How do you think about the short term, “Wow, we can actually just make a lot of money right now”, versus, “We need to invest in our products” — you had Microsoft, who I’m not going to ask you to comment on, but last quarter they’re like, “Yeah, we allocated less to Azure because we had our own internal workloads”. These are real trade-offs that you need to think about, how do you think about that in terms of GCP?

TK: We run a balanced portfolio, we want to grow different parts of our business, we sit down as an executive team and also with Sundar and work through how we’re going to balance the different parts of our portfolio. We see, broad brush, three to four buckets of things.
One bucket of things is where we want to grow Gemini as a business. Our core Gemini business is doing super well: 16 billion tokens a minute, up 40% since last quarter. Even this product called Gemini Enterprise, which is our core agent platform, has grown 40% sequentially quarter-over-quarter. So that part of the business, we’re committed to making it super successful; it’s a priority for us. The second segment of the business is where Gemini is being used inside some of our core products, so I’ll give you an example. We’ve introduced Gemini inside our threat intelligence tools. Why is that? Because we have real expertise at Google scanning the dark web to identify threats; the problem is there are so many of them that an average organization doesn’t know which of those many threats apply to them. So we use Gemini to process and prioritize which threats might affect you. It’s 98% accurate and has processed 3.9 million threats in the last year, so that’s an example of Gemini being used as an embedded capability.

Right. The whole SaaS, PaaS, IaaS — the SaaS bit is still important.

TK: There’s that capability, and there are people who want to use Gemini to reason on data in our analytics infrastructure, so there’s a second big set where Gemini is an embedded capability, and that in turn depends on chips and TPUs and GPUs. And the third one is offering our compute platform to people. We balance across those because we want all of them to be successful. By bringing our hardware, our machines, to other people’s venues, we’re broadening our TAM, total addressable market; in that part of the business we also see a different cash flow model than if you were putting in CapEx, so there are a lot of different parameters we have to balance.

All those are ones you listed for you to make trade-offs on, but then you also have to get in a meeting with Sundar and the other leaders of Google to make trade-offs with DeepMind and their R&D and with the consumer products. What are those meetings like?
TK: We have a regular cadence of meetings where we balance the different priorities, and we want to be successful on many different dimensions. I wouldn’t assume all of these dimensions are zero-sum. For example, when we offer our product in other venues, we drive cash flow in a different way than putting in CapEx — so to some extent, that changes the capital boundary we have as a company also. So I think there’s a general view that there’s a compute shortage, and if you give to one, you will have to take from another; I think that’s an overly simplistic view of it, having been in this for long enough, and my team does both parts. We are responsible for delivering all the infrastructure for Alphabet, and they’ve done an amazing job doing that, and I’m also responsible for running the cloud business, and you can tell our differentiation. I come back to this: it would be a different problem if we didn’t have demand. Whenever people ask us to prove that we’ve got demand, I always say, “Look at our results”.

Well, that’s been the biggest change even since January, where there was still some sort of latent skepticism about, “Is all this CapEx worth it?”. It feels like those questions have been completely erased at this point. Speaking of markets in the last couple months, all these SaaS companies are getting killed in the market. You have a big SaaS business, and you’re definitely not getting killed in the market; why are you escaping it?

TK: I think we have transitioned. The core fundamental is this — and this is the way we approach our product portfolio, I’ll give you a very simple example. In 2022, we said, “Hey, we’re not just going to build a secure cloud, we’re also going to start offering cybersecurity products”. When we entered the market, we looked at what drives the value of cyber, and it’s driven by two dimensions.
Dimension one is, “What is it protecting?”, because it has to protect high-value things, and the other element is, “How good is it at protecting?”, “What’s the technology that it’s going to use to protect?”. So we said, “There are only two valuable places to protect. There’s the endpoint”, which is your desktop on which apps run, and other people are doing a good job there, “and the rest of the world is moving all their applications and data to the cloud, so let’s protect that”. Second, we said AI is going to find vulnerabilities, because at the end of the day, finding vulnerabilities is a question of a model really understanding code, and if vulnerabilities can be found at a much more accelerated rate, people need to fix vulnerabilities at an incredibly aggressive, fast rate. So we started a set of work back then, and to ensure that we have the leading product portfolio, we said, let’s acquire Wiz. We’re now working on a number of things you’ll see announced. There’s the Threat Intelligence Agent that allows us to understand the threat landscape and use Gemini to prioritize what you should pay attention to, where a lot of people are using Gemini to actually scan their code. And then we’re introducing three new Gemini-powered agents with Wiz: one called Red Agent, think of it as continuous red-teaming of your infrastructure; a Blue Agent that says, “Okay, I looked at what’s happening with the Red team and I know what you need to go fix”; and a Green Agent that says, “I’ll fix it for you”. That’s going to cut the cycle time. Like our Threat Intelligence Agent — you will see reference customers like the Chicago Mercantile Exchange, there’s a whole bunch of them talking next week — it takes an investigation that used to take 30 minutes and does it in 30 seconds, and that allows you to respond.
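The find–prioritize–fix loop TK describes can be sketched in a few lines. This is purely illustrative: the function names and data shapes here are my own assumptions, not actual Wiz or Gemini APIs.

```python
# A minimal sketch of the Red/Blue/Green agent cycle described above.
# All names and structures are illustrative assumptions.

def red_agent(infra):
    """Continuous red-teaming: probe the infrastructure for weaknesses."""
    return list(infra["weaknesses"])

def blue_agent(findings):
    """Triage: look at what the red team found, decide what to fix first."""
    return sorted(findings, key=lambda f: f["severity"], reverse=True)

def green_agent(plan, infra):
    """Remediation: apply the fixes the blue agent prioritized."""
    for finding in plan:
        infra["weaknesses"].remove(finding)
    return infra

# One iteration of the loop: find, prioritize, fix.
infra = {"weaknesses": [{"id": "open-bucket", "severity": 7},
                        {"id": "stale-key", "severity": 9}]}
plan = blue_agent(red_agent(infra))
infra = green_agent(plan, infra)
```

The cycle-time claim amounts to running this loop continuously rather than waiting for a quarterly human red-team engagement.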
Now, this is an example of where, when we started, people said, “Why would a hyperscaler want to become a cyber company?”, and we were like, “It’s not about being a hyperscaler, it’s about solving that problem at the intersection of: AI is going to accelerate cyber threats, and you cannot do repair the old way”.

Yep, it really answers the question that people had when you acquired Wiz, which is, “Why do you need to buy it, why can’t you just build it?”. It’s like, “Well, in two years, it’s going to be too late”. That, I think, is also felt very tangibly right now.

TK: Today, we are where we are because we made that bet.

TK: So when people ask, “Why are you guys growing even in sectors that may be struggling?”, it’s because we have differentiation and we made those decisions early.

That makes sense. One of the interesting product announcements this year is this cross-cloud lakehouse, which lets customers leave their data in AWS and Azure while still being queryable by your services instantly. Is this the final admission that even if enterprises love your AI and love Gemini, they’re not going to shift all their workloads if they’re already on other clouds? Lots of your products have been about that in the past — even Wiz is about that to a certain extent — but is that just the reality? There’s not going to be a huge amount of spillover as far as pulling things from other clouds to Google.

TK: If you use BigQuery today, you don’t have to move your transactional applications to BigQuery. If you’re using Gemini today, you can keep your applications in another cloud and use Gemini to reason on it. The problem we were trying to solve is a very specific problem. Today, when people talk about lakehouses, they say, “We have a multi-cloud lakehouse”. What they really mean is their lakehouse can be run on any cloud, but when it’s running on a particular cloud, you can only access the data in that cloud.
And then people say, “That’s crazy, because I’ve got data in a SaaS app like Salesforce”, “I’ve got data in an ERP system”, “I’ve got data in Azure and Amazon, and I’d like to use analysis across all this”. One choice for customers is to copy all that data out, but that’s expensive for them because of the egress tax that everybody imposes. So we said, “Keep your data there, we can still give you world-class analysis”, and so it’s solving that custody problem. The customer wants to do analysis, and there are four things we’re giving them. One, keep your data where it is, no matter how many clouds; we’re not talking about a single-cloud lakehouse, we’re talking about doing analysis across all the clouds and across all your SaaS apps. Two, people said, “How fast can you run?”; the proof that we’re going to show is we’re 2x better in price performance than the market leader, right out of the gate. Three, people said, “I’m not an expert on writing Python and Spark, can you give me essentially vibe coding for Python and Spark?”; yes, you’ll see us introduce an agent manager to generate Python and Spark code using Gemini. And then the last one: people said, today, Ben, if you ask a question — I was using that example of field service, I’m running a query on, “How much inventory do I have in parts?”, before I send the technician — that information sits inside an application in a set of tables in a database. Most organizations have thousands of databases, and you have to teach the model which system has what information; the notion of a part is split across 10 different tables in this particular database, so you need a system that builds that semantic graph of all the information in your company.

Right, this is the Knowledge Catalog.

TK: That’s the catalog, and that gives you super good accuracy when you’re researching information. So we put all this together, and back to it, we’ve always been super pragmatic.
I always say enterprises have certain problems that they see independent of a cloud. For example, security: they don’t want to buy three different security tools from three different hyperscalers. Analytics: they don’t want to buy three different analytic tools from three different hyperscalers. Others have chosen to say, “My stuff only works with my cloud”. That’s why enterprises often choose us, because we work across all the clouds and all the security environments you have, and you can keep stuff wherever you are and use Gemini to access and automate stuff for you. So all that is just part of listening to customers.

This all makes perfect sense; particularly this bit about the Knowledge Catalog definitely fits how I’ve been thinking. I wrote a few years ago about the importance of this whole layer and understanding it, and it’s a bit of a big lift to get this in place. You have some sort of analog, say, with a Palantir that’s putting in their ontology thing. They have FDEs out on site on multi-month projects doing this. You have OpenAI talking about Frontier, their agent layer, and they’re partnering with all the tech consultancies to build this out. Is this going to entail a lot of boots on the ground to get this graph working and functional in a way that your agents can operate effectively across it?

TK: We’re not competing with Palantir; we’re not building a semantic dictionary or an ontology. What we’re doing is, today — I’ll give you the closest analogy.

TK: Today when you use a model, let’s say you use Gemini, and you ask a question, Gemini goes through reasoning, and then it shows you a citation.
A citation is, “How did I answer the question, and what’s the source I derived it from?”. Now imagine that citation was a query that needed to go to a folder in, for example, a storage system, because there are some documents there, and to a database. For example, for a part number: think about a part number document that lists all the part numbers and sits in a drive, and then you need to fetch that part number out to say it’s the modem that the guy is coming to repair, and that’s mapped to a table in a database. So what the graph does (we use Gemini, so we don’t need humans) is say, “Hey, go and read all these documents in these drives and extract the information from them, and then match that to the database table that has the reference to the part number”. Then when Gemini gets a query about how much inventory of modems there is, the first thing it does is go to the Knowledge Catalog, which says modem is part number one, two, three, four, five, and then it says, “By the way, the table in the database that has the inventory information about this part number is this table, here’s the SQL”. That makes the quality of what we generate higher, and then when it answers the question it shows, back to your “Trust my data” point, a grounding citation saying, “That’s where we got it from.”

What do you need from everyone in the ecosystem if this is going to work, all these SaaS applications and all these entities, not just what’s in your databases, but what’s in an SAP database or whatever it might be? How do you get them on board so you can understand their data and build this Knowledge Catalog?
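The part-number grounding flow TK walks through above — term, to part number, to table, to SQL, to citation — can be sketched as a simple lookup structure. Every name here (`KnowledgeCatalog`, `CatalogEntry`, the table and document names) is an illustrative assumption, not a Google API.

```python
# A hedged sketch of the Knowledge Catalog flow: a catalog maps a business
# term ("modem") to the part number extracted from documents and the database
# table holding its inventory, so a model can emit grounded SQL plus a citation.
from dataclasses import dataclass

@dataclass
class CatalogEntry:
    term: str          # business concept, e.g. "modem"
    part_number: str   # identifier extracted from a document in a drive
    table: str         # database table holding inventory for that part
    source_doc: str    # where the mapping was learned, used for citations

class KnowledgeCatalog:
    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[entry.term] = entry

    def ground(self, term):
        """Resolve a term to (sql, citation) the model can use."""
        e = self._entries[term]
        sql = (f"SELECT SUM(quantity) FROM {e.table} "
               f"WHERE part_number = '{e.part_number}'")
        citation = f"part number from {e.source_doc}; inventory from {e.table}"
        return sql, citation

catalog = KnowledgeCatalog()
catalog.register(CatalogEntry("modem", "12345", "parts_inventory", "parts_list.pdf"))
sql, citation = catalog.ground("modem")
```

The point of the citation field is exactly the "Trust my data" step: the answer carries the provenance of both the part-number mapping and the inventory table it queried.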
TK: Really easy. The first thing is that for the lakehouse, we support a standard format the industry is very standardized on; it’s called Iceberg, so anybody who supports Iceberg we can talk to, and that’s pretty much the whole world right now, so we don’t need them to do anything special to make it work. Second, all of these business systems have API specifications, and our Catalog can learn off of those API specifications. We just teach Gemini to process those, and so we can build a catalog pretty quickly.

There are reports that OpenAI on Amazon Bedrock has been massively popular. Are we going to get OpenAI on Vertex?

TK: We would love to have them. We are announcing a variety of third-party models on Vertex, including Anthropic, including open source; we’re open to any model provider on Vertex.

I believe you. That’s going to be great, when and if it happens. Just one last question. We’ve talked in this interview series previously about how I think, and this is before your time, it’s not your fault, that Google Cloud missed the boat in terms of being a point of integration for the Silicon Valley enterprise ecosystem. I think last year I asked you if AI represented a new opportunity to do that. However, is there a bit where the models — and you’re in this game because you have one of the leading models — are just going to eat everything and gradually expand to do the jobs, and everyone else is just going to be a system of record? It’s going to be all one interface, and the integration, such as it is, is all under the surface; it’s not necessarily tying things together in user space. Is Gemini going to be all the user needs in the long run?

TK: We don’t see it that way.
In fact, one announcement you’ll see us make next week is how many third-party SaaS vendors and ISVs [independent software vendors] are embedding Gemini not just as a model, but as an agent platform, because they want to build agents, and you can use our agent platform to build agents — not just our own agents, they can use it — and there are a lot of independent software vendors embedding those agents.

And do they see you as like, “Hey, you’re another established guy, let’s go with you because we don’t know what these other folks are up to, they want to eat all of us”?

TK: It’s also the capabilities. The differentiation, I would say, is this: think about being a bank or an insurance company, and think about being a SaaS vendor or an independent software vendor selling to them. There are a number of things around identity and policy management. For example, if you’re a bank and you have documentation about a person and their credit, you cannot have that egress the bank’s boundary, so we have a gateway that protects against that; that’s part of our agent platform. You want to have auditability on the agent, to say which agent did what task on what system when; that’s built into the platform. You want to have a registry where you expose all your skills so that people are not duplicating building all these things; we have a registry that does that.

This is sort of the bit we started with at the beginning: it’s not just going to benefit your agents, it’s going to benefit all agents. That’s sort of the pitch.

TK: So one of the things that people like is the fact that we built all that plumbing for them, and so they don’t have to invest in it; they can focus on the value add that they have on their agent side.
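The three platform capabilities TK lists — an egress gateway, an audit trail, and a skill registry — can be sketched together in one toy class. All names and the classification scheme here are illustrative assumptions, not the actual platform’s APIs.

```python
# A minimal sketch of the agent-platform "plumbing" described above:
# egress gateway, audit log (which agent did what task on what system, when),
# and a shared skill registry.
import datetime

class AgentPlatform:
    def __init__(self):
        self.audit_log = []   # auditability: who did what, where, when
        self.registry = {}    # skills exposed so teams don't duplicate work

    def register_skill(self, name, fn):
        self.registry[name] = fn

    def egress_gateway(self, payload, boundary="bank"):
        """Block sensitive data from leaving the organization's boundary."""
        if payload.get("classification") == "confidential":
            raise PermissionError(f"egress blocked at {boundary} boundary")
        return payload

    def run(self, agent, skill, system, payload):
        """Execute a registered skill and record it in the audit trail."""
        self.audit_log.append({
            "agent": agent, "task": skill, "system": system,
            "when": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        })
        return self.registry[skill](payload)

platform = AgentPlatform()
platform.register_skill("summarize", lambda p: f"summary of {p['doc']}")
result = platform.run("credit-agent", "summarize", "loan-db", {"doc": "application"})
```

An ISV building on top of this only supplies the skill function; the boundary enforcement and audit record come for free from the platform, which is the pitch TK is making.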
Additionally, for companies in this broader ecosystem, there’s the cost of agents — it becomes part of their bill of materials, if you will, the cost of goods sold — and the fact that we have these super efficient chips that run inference with such efficiency eventually translates into cost efficiency for a third party that’s building on top of us. You can see that with all of those benefits, we’re taking away all that complexity for these guys, so we definitely don’t see that all the ecosystem is going to die; we definitely don’t see that, we see us facilitating that ecosystem. You’ll see us announcing a number of things, including a substantial investment in dollars to accelerate the partner ecosystem around our platform.

Thomas Kurian, great to talk to you again.

TK: Thanks so much, Ben. And just in closing, the work that we announce every year at Next is a testament to all those customers and partners who gave us a shot to work with them. You’ll see them telling their story, and it’s a testament to all those people at our organization that made a bet to solve a technical problem a different way, or to bring our technology out — we’ve hugely expanded our go-to-market organization — and doing all that while growing top line and operating income at the same time is a testament to the demand we see for our products and services. I mean, six, seven years ago, people used to tell us, “You have no shot in the market”; I think we are now truly uniquely positioned. Name one other player that has the stack of technology to do AI. When I look forward, I think there’s no question in people’s minds that the central problem that companies and technology providers need to solve is how good the capability you offer for AI is. We’re the only ones with chips, models, the context to feed the models from all of the data infrastructure, the cyber tools, and then a world-class agent platform.

I would also add, you’re actually an enterprise company now.
The things you talked about — pragmatism, listening to customers, all these pieces — GCP did not have at all a decade ago. There’s a bit where Wiz was ahead of its time, for sure, being forward-looking, but there’s a bit where the organization is ready for this moment in a way I don’t think it would have been previously. I find it very impressive.

TK: We are very proud of the team. Also, for Alphabet to do AI well, you have to do a couple of things. One, see the breadth of problems that we see: we see all of the consumer problems, we see the enterprise problems, we see the problems that Search sees, we see the problems that YouTube needs solved, and we’re solving all those with AI. That gives us a breadth of capability in the model, and over time that is a real strength because of the diversity of problems we’re solving. Second, in order to do AI well, you have to invest, and in order to invest, you need to monetize in as many different ways as possible. I think our team is very confident; we do not have any hubris, but we are confident in where we stand.

I think it’s very impressive. I look forward to your keynote.

TK: Thanks so much, Ben. It’s a privilege to talk to you every year, and it’s great that you took the time to speak with me.

And it’s all recorded, I can promise you that!

This Daily Update Interview is also available as a podcast. To receive it in your podcast player, visit Stratechery. The Daily Update is intended for a single recipient, but occasional forwarding is totally fine! If you would like to order multiple subscriptions for your team with a group discount (minimum 5), please contact me directly. Thanks for being a supporter, and have a great day!

Zak Knill 2 days ago

SSE token streaming is easy, they said

I wrote about AI having ‘durable sessions’ to support async agentic applications, and in the comments everyone said: “Token streaming over SSE is easy”. So I figured I’d dig into that claim. Agents used to be a thing you talked to synchronously. Now they’re a thing that runs in the background while you work. When you make that change, the transport breaks.
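For reference, the "easy" part is the SSE wire format itself, defined in the WHATWG HTML spec: each event is one or more `data:` lines terminated by a blank line, optionally preceded by an `id:` line that clients echo back as `Last-Event-ID` to resume. Here is a minimal sketch of serializing and parsing token events; the function names are mine, not from the post.

```python
# A minimal sketch of token streaming over SSE (Server-Sent Events).
def sse_event(token, event_id=None):
    """Serialize one token as an SSE event frame."""
    frame = ""
    if event_id is not None:
        frame += f"id: {event_id}\n"   # lets clients resume via Last-Event-ID
    frame += f"data: {token}\n\n"      # blank line terminates the event
    return frame

def sse_parse(stream):
    """Recover the data payloads from a raw SSE stream."""
    tokens = []
    for block in stream.split("\n\n"):
        for line in block.split("\n"):
            if line.startswith("data: "):
                tokens.append(line[len("data: "):])
    return tokens

# A token stream is just these frames concatenated on a long-lived response.
stream = "".join(sse_event(t, i) for i, t in enumerate(["Hello", "world"]))
```

The hard part, as the post argues, is everything around the frames: keeping the session durable when the connection drops mid-run and the agent keeps working in the background.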
