Latest Posts (20 found)

how i enjoy movies

I'm not much of a movie watcher. I somehow prefer spreading multiple episodes of a TV show across a few hours to investing 2 hours in a movie. I get antsy in the second half of a movie, and episodic stuff can more easily be paused for a break. My wife has gotten me into more movies the past few years though, especially in recent months. Catching up on classics like all the Star Wars movies, Lord of the Rings 1-3, American Psycho, Fight Club, some popular Studio Ghibli movies, some old genre-defining horror movies, and more. What makes movies a lot more bearable to me is talking about them while watching them, even pausing the movie while discussing. I know many people hate this and just want to watch something in peace, not tear it apart as it plays or even be interrupted. Understandably, they don't want the fantasy and make-believe to be broken partway through. But my wife and I are on the same wavelength about this. She is my favorite person to watch movies with because of it. It would bore me to death to sit through 2+ hours in silence, just staring, and then both of us moving on from it and just saying "Yeah, it was good." I need to have some breaks to readjust my position, get something from the kitchen, drink some water, and have minutes in-between just psychoanalyzing characters, giving our interpretations of things that are still unclear, or saying what we would do if we were the characters. Also discussing the broader context, production, if something was real or CGI... I love it. It keeps me engaged, and it makes the movie more memorable for me. I also learn so much more about it, and plot details I would have otherwise missed get revealed to me. I especially love watching something with my wife when it's something she is really interested in or has seen multiple times. Last night, we watched an Indiana Jones movie (Raiders of the Lost Ark), and I got so much info from her during it. "Harrison Ford improvised this scene because he was tired of reshooting it all the time." "In this scene you can spot C-3PO and R2-D2 in the background. And you can see the Ark in the background of a Clone Wars episode." "I think this shot is actually a matte painting on glass." I'm more of a Lara Croft person, and so we also talked about the similarities and differences between the two, especially with Lara's reboot content and her grappling with the fact that her work tends to cause more harm than good, something Indiana doesn't seem to have to face that much. We also discussed some silly stuff, like how the snakes would realistically survive in that pit, and whether a bunch of snakes are flammable or not. All while watching it and occasionally pausing. Technically, we also do this for TV shows. Severance and Pluribus especially, but even X-Files. It's just so good! I just need to engage with someone about what I'm seeing and pick their brain about an aspect of it. Acknowledging something was produced, these were all actors, this didn't really happen, this was CGI, this is a plot inconsistency, etc. doesn't ruin the entertainment for us at all :) Reply via email Published 11 Apr, 2026

iDiallo Today

Your friends are hiding their best ideas from you

Back in college, the final project in our JavaScript class was to build a website. We were a group of four, and we built the best website in class. It was for a restaurant called the Coral Reef. We found pictures online, created a menu, and settled on a solid theme. I was taking a digital art class in parallel, so I used my Photoshop skills to place our logo inside pictures of our fake restaurant. All of a sudden, something clicked. We were admiring our website on a CRT monitor when my classmate pulled me aside. She had an idea. A business idea. An idea so great that she couldn't share it with the rest of the team. She whispered, covering her mouth with one hand so a lip reader couldn't steal this fantastic idea: "what if we build websites for people?" This was the 2000s, of course it was a fantastic idea. The perfect time to spin up an online business after a market crash. But what she didn't know was that, while I was in class in the mornings, my afternoons were spent scouring Craigslist and building crappy websites for a hundred to two hundred dollars a piece. I wasn't going to share my measly spoils. If anything, this was the perfect time to build that kind of service. That's a great idea , I said. There is something satisfying about having an idea validated. A sort of satisfaction we get from the acknowledgment. We are smart, and our ideas are good. Whenever someone learned that I was a developer, they felt this urge to share their "someday" idea. It's an app, a website, or some technology I couldn't even make sense of. I used to try to dissect these ideas, get to the nitty-gritty details, scrutinize them. But that always ended in hostility. "Yeah, you don't get it. You probably don't have enough experience" was a common response when I didn't give a resounding yes. I don't get those questions anymore, at least not framed in the same way. I have worked for decades in the field, and I even have a few failed start-ups under my belt. I'm ready to hear your ideas. But that job has been taken, not by another eager developer with even more experience, or maybe a successful start-up on their résumé. No, not a person. AI took this job. Somewhere behind a chatbot interface, an AI is telling one of your friends that their idea is brilliant. Another AI is telling them to write out the full details in a prompt and it will build the app in a single stroke. That friend probably shared a localhost:3000 link with you, or a Lovable app, last year. That same friend was satisfied with the demo they saw then and has most likely moved on. In the days when I stood as a judge, validating an idea was rarely what sparked a business. The satisfaction was in the telling. And today, a prompt is rarely a spark either. In fact, the prompt is not enough. My friends share a link to their ChatGPT conversation as proof that their idea is brilliant. I can't deny it, the robot has already spoken. I'm not the authority on good or bad ideas. I've called ideas stupid that went on to make millions of dollars. (A ChatGPT wrapper for SMS, for instance.) A decade ago, I was in Y Combinator's Startup School. In my batch, there were two co-founders: one was the developer, and the other was the idea guy. In every meeting, the idea guy would come up with a brand new idea that had nothing to do with their start-up. The instructor tried to steer him toward being the salesman, but he wouldn't budge. "My talent is in coming up with ideas," he said. We love having great ideas. 
We're just not interested in starting a business, because that's what it actually takes. A friend will joke, "here's an idea," then proceed to tell me their idea. "If you ever build it, send me my share." They are not expecting me to build it. They are happy to have shared a great idea. As for my classmate, she never spoke of the business again. But over the years, she must have sent me at least a dozen clients. It was a great idea after all.


BlogLog April 10 2026

I added a new page to my blog's header showing all the specifications of my homelab and self-hosted services. It will be updated as I continue to change my services or infrastructure. Fixed misspellings in the Overview of My Homelab post.

Stratechery Yesterday

2026.15: Myth and Mythos

Welcome back to This Week in Stratechery! As a reminder, every Friday we send out this overview of content in the Stratechery bundle; highlighted links are free for everyone. Additionally, you have complete control over what we send to you. If you don't want to receive This Week in Stratechery emails (there is no podcast), please uncheck the box in your delivery settings. On that note, here were a few of our favorites this week. This week's Sharp Tech video is on why OpenAI's enterprise pivot makes sense.

Anthropic Anthropic Anthropic. In the current AI era, it feels like a new company is crowned the winner every few months, and right now Anthropic is wearing the crown. However, a point I make on Sharp Tech is that Anthropic's exponential growth includes the part of the curve everyone misses: the company has been on this once-barely-visible trajectory for nearly two years now. Now the company has what is undoubtedly the most powerful model in the world, so powerful, in fact, that Anthropic says it can't release it publicly. There's reason for cynicism, given Anthropic's history, but the part of the "Boy Who Cried Wolf" myth everyone forgets is that the wolf did come in the end. — Ben Thompson

The New York Times and Another Paradigm Shift. If you're interested in media, this week's Stratechery Interview with New York Times CEO Meredith Kopit Levien is a fantastic listen. The Times has nailed the internet era better than any media company in the world, and they've succeeded by making deliberate choices — a paywall before it was cool, a clear point of view, integrated business and editorial strategies — to differentiate themselves from a sea of commoditized content in an era of aggregators and content abundance. That playbook worked wonders for the Times in the previous generation of the internet, and I enjoyed hearing Levien's thoughts on updating it for an era dominated by AI and video. — Andrew Sharp

The New Yorker Explains Sam Altman. This week's Sharp Text hit a few different beats, including thoughts on the Strait of Hormuz and a fun bit of E-ZPass history, but I opened with a take on the sprawling Sam Altman profile from The New Yorker. The 16,000-word profile is certainly an exhaustive recital of questions that have been asked about Altman for more than a decade, but better topics went unexplored. It's frustrating — and representative of too much tech coverage — that so much effort went into what's effectively a well-written Wikipedia entry, anchored by a predetermined conclusion, and ignoring more dramatic questions than whether Sam Altman is a good person. — AS

OpenAI Buys TBPN, Tech and the Token Tsunami — OpenAI's purchase of TBPN makes no sense, which may be par for the course for OpenAI. Then, AI is breaking stuff, starting with tech services.
Anthropic's New TPU Deal, Anthropic's Computing Crunch, The Anthropic-Google Alliance — Anthropic needs compute, and Google has the most: it's a natural partnership, particularly for Google.
Anthropic's New Model, The Mythos Wolf, Glasswing and Alignment — Anthropic says its new model is too dangerous to release; there are reasons to be skeptical, but to the extent Anthropic is right, that raises even deeper concerns.
An Interview with New York Times CEO Meredith Kopit Levien About Betting on Humans With Expertise — An interview with New York Times Company CEO Meredith Kopit Levien about human expertise as a moat against Aggregators and AI.
Hormuz, Rushmore and a Sam Altman Story That Missed the Story — On the New Yorker's profile of Sam Altman, the future in the Middle East, and the power of E-ZPass history.
OpenAI Buys TBPN
Mythos, Altman, New York Times
VLIW: The "Impossible" Computer
Gas Turbine Blades and their Heat-Defying Single-Crystal Superalloys
A Ceasefire and Reports of PRC Pressure; Another Politburo Investigation; Mythos, DeepSeek, and a Token Crunch
An Exclusive Hornets-Suns Report and Mail on LeBron, Wemby, the Pistons, ABS in the NBA, Bulls Fandom for Kids
Malone to Carolina and Karnisovas Out in Chicago, Cooper and Kon Battling to the Finish, A Jokic-Wemby Classic in Denver
Mythos and Project Glasswing, The Year of Anthropic Continues Apace, Q&A on the NYT, Altman, De-globalization


Premium: The Hater's Guide to OpenAI

Soundtrack: The Dillinger Escape Plan — Setting Fire To Sleeping Giants

In what The New Yorker's Andrew Marantz and Ronan Farrow called a "tense call" after his brief ouster from OpenAI in 2023, Sam Altman seemed unable to reckon with a "pattern of deception" across his time at the company. No, he cannot. Sam Altman is a deeply untrustworthy individual, and, like OpenAI, lives on the fringes of truth, using a compliant media to launder statements that are, for legal reasons, difficult to call "lies" but certainly resemble them. For example, back in November 2025, Altman told venture capitalist Brad Gerstner that OpenAI was doing "well more" than $13 billion in annual revenue when the company would do — and this is assuming you believe CNBC's source — $13.1 billion for the entire year. I guarantee you that, if pressed, Altman would say that OpenAI was doing "well more than" $13 billion of annualized revenue at the time, which was likely true based on OpenAI's stylized math (per The Information). This means that, per CNBC's reporting, OpenAI barely scratched $10 billion in revenue in 2025, and that every single story about OpenAI's revenue other than my own reporting (which came directly from Azure) massively overinflates its sales. The Information's piece about OpenAI hitting $4.3 billion in revenue in the first half of 2025 should really say "$3.44 billion," but even then, my own reporting suggests that OpenAI likely made a mere $2.27 billion in the first half of last year, meaning that even that $10 billion number is questionable.

It's also genuinely insane to me that more people aren't concerned about OpenAI, not as a creator of software, but as a business entity continually misleading its partners, the media, and the general public. To put it far more bluntly, the media has failed to hold OpenAI accountable, enabling a company built on deception, rationalizing and normalizing ridiculous and impossible ideas just because Sam Altman said them. Let me give you a very obvious example. About a month ago, per CNBC, "...OpenAI reset spending expectations, telling investors its compute target was around $600 billion by 2030." This is, on its face, a completely fucking insane thing to say, even if OpenAI was a profitable company. Microsoft, a company with hundreds of billions of dollars of annual revenue, has about $42 billion in quarterly operating expenses. OpenAI cannot afford to pay these agreements. At all. Hell, I don't think any company can! And instead of saying that, or acknowledging the problem, CNBC simply repeats the statement of "$600 billion in compute spend," laundering Altman and OpenAI's reputation as it did (with many of the same writers and TV hosts) with Sam Bankman-Fried. CNBC claimed mere months before the collapse of FTX that it had grown revenue by 1,000% "during the crypto craze," with its chief executive having "...survived the market wreckage and still expanded his empire." You might say "how could we possibly know?" and the answer is "read CNBC's own reporting that said that Bankman-Fried intentionally kept FTX in the Bahamas," which said that Bankman-Fried had intentionally reduced his stake in Canadian finance firm Voyager (which eventually collapsed on similar terms to FTX) to avoid regulatory disclosures around (Bankman-Fried's investment vehicle) Alameda's finances.
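To see how far apart "annualized" and calendar-year figures can drift, here is a small illustrative calculation in Python. The monthly numbers below are invented purely for illustration (they are not OpenAI's actual monthlies); the point is just that an annualized run rate multiplies the latest month by twelve, so a company whose revenue ramps through the year can truthfully quote an annualized figure well above what the calendar year actually delivered.

```python
# Illustrative only: hypothetical monthly revenue ramping through the year,
# in $ billions. These are NOT OpenAI's real numbers.
monthly = [0.5, 0.55, 0.6, 0.7, 0.75, 0.8, 0.9, 1.0, 1.05, 1.1, 1.15, 1.2]

calendar_year_total = sum(monthly)        # what was actually booked: $10.30B
annualized_run_rate = monthly[-1] * 12    # "annualized revenue" at year end: $14.40B

print(f"Calendar-year revenue: ${calendar_year_total:.2f}B")
print(f"Annualized run rate:   ${annualized_run_rate:.2f}B")
```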
This piece was written by a reporter that has helped launder the reputation of Stargate Abilene , claiming it was “online” despite only a fraction of its capacity actually existing.  The same goes for OpenAI’s $300 billion deal with Oracle that OpenAI cannot afford and Oracle does not have the capacity to serve . These deals do not make any logical sense, the money does not exist, and the utter ridiculousness of reporting them as objective truths rather than ludicrous overpromises allowed Oracle’s stock to pump and OpenAI to continue pretending it could actually ever have hundreds of billions of dollars to spend. OpenAI now claims it makes $2 billion a month , but even then I have serious questions about how much of that is real money considering the proliferation of discounted subscriptions (such as ones that pop up when you cancel that offer you three months of discounted access to ChatGPT Plus ) and free compute deals, such as the $2500 given to Ramp customers , millions of tokens in exchange for sharing your data , the $100,000 token grants given to AI policy researchers , and the OpenAI For Startups program that appears to offer thousands (or even tens of thousands) of dollars of tokens to startups . While I don’t have proof, I would bet that OpenAI likely includes these free tokens in its revenues and then counts them as part of its billions of dollars of sales and market spend . I also think that revenue growth is a little too convenient, accelerating only to match Anthropic, which recently “hit” $30 billion in annualized revenue under suspicious circumstances . I can only imagine OpenAI will soon announce that it’s actually hit $35 billion in annualized revenue , or perhaps $40 billion in annualized revenue , and if that happens, you know that OpenAI is just making shit up.  Regardless, even if OpenAI is actually making $2 billion a month in revenue, it’s likely losing anywhere from $4 billion to $10 billion to make that revenue. Per my own reporting from last year, OpenAI spent $8.67 billion on inference to make $4.329 billion in revenue , and that’s not including training costs that I was unable to dig up — and those numbers were before OpenAI spent tens of millions of dollars in inference costs propping up its doomed Sora video generation product , or launched its Codex coding environment. In simpler terms, OpenAI’s costs have likely accelerated dramatically with its supposed revenue growth. And all of this is happening before OpenAI has to spend the majority of its capital. Oracle has, per my sources in Abilene, only managed to successfully build and generate revenue from two buildings out of the eight that are meant to be done by the end of the year, which means that OpenAI is only paying a small fraction of the final costs of one Stargate data center. Its $138 billion deal with Amazon Web Services is only in its early stages, and as I explained a few months ago in the Hater’s Guide To Microsoft , Redmond’s Remaining Performance Obligations that it expects to make revenue from in the next 12 months have remained flat for multiple quarters, meaning that OpenAI’s supposed purchase of “ an incremental $250 billion in Azure compute ” are yet to commence. In practice, this means that OpenAI’s expenses are likely to massively increase in the coming months. 
And while the "$122 billion" funding round it raised — with $35 billion of it contingent on either AGI or going public (Amazon), and $60 billion of it paid in tranches by SoftBank and NVIDIA — may seem like a lot, keep in mind that OpenAI had received $22.5 billion from SoftBank on December 31, 2025, a little under four months ago. This suggests that either OpenAI is running out of capital, or has significant up-front commitments it needs to fulfil, requiring massive amounts of cash to be sent to Amazon, Microsoft, CoreWeave (which it pays on net 360 terms) and Oracle. And if I'm honest, I think the entire goal of the funding round was to plug OpenAI's leaky finances long enough to take it public, against the advice of CFO Sarah Friar.

One under-discussed part of Farrow and Marantz's piece was a quote about OpenAI's overall finances. As I wrote up earlier in the week, OpenAI CFO Sarah Friar does not believe, per The Information, that OpenAI is ready to go public, and is concerned about both revenue growth slowing and OpenAI's ability to pay its bills. To make matters worse, Friar also no longer reports to Altman — and god is it strange that the CFO doesn't report to the CEO! — and it's actually unclear who she reports to at all, as the executive she now nominally reports to, Fiji Simo, has taken an indeterminately long medical leave of absence. Friar has also, per The Information, been left out of conversations around financial planning for data center capacity. These are the big, flashing warning signs of a company with serious financial and accounting issues, run by Sam Altman, a CEO with a vastly documented pattern of lies and deceit. Altman is sidelining his CFO and rushing the company to go public so that his investors can cash out and the larger con of OpenAI can be dumped onto public investors. And beneath the surface, the raw economics of OpenAI do not make sense.

You'll notice I haven't talked much about OpenAI's products yet, and that's because I do not believe they can exist without venture capital funding both them and the customers that buy them. These products only have market share as long as other parties continue to build capacity or throw money into the furnace. To explain: While OpenAI is not systemically necessary, the continued enabling and normalization of its egregious and impossible promises has created an existential threat to multiple parties named above. Its continued existence requires more money than anybody has ever raised for a company — private or public — and in the event it's allowed to go public, I believe that both retail investors and large equity investors like SoftBank will be left holding the bag. OpenAI has a fundamental lack of focus as a business, despite how many articles have claimed over the last year that it's working on a "SuperApp" and has some sort of renewed plan to take on whoever it is that OpenAI perceives as the competition in any given calendar month. Everything OpenAI does is a reaction to somebody else. Its Atlas browser was a response to Perplexity's Comet browser, its first (of multiple!) Code Reds in 2025 was a reaction to Google's Gemini 3, and its rapid deployment of its Codex model and platform was to compete with Anthropic's Claude Code. I've read about this company and the surrounding industry for hours a day for several years, and I can't think of a single product that OpenAI has launched first.
Even its video-generating social network app Sora was beaten to market by five days by Meta's putrid and irrelevant "Vibes." Actually, that's not true. OpenAI did have one original idea in 2025 — the launch of GPT-5, a much-anticipated new model launch that included a "model router" to make it "more efficient," except it turned out that it boofed on benchmarks and that the model router actually made it (as I reported last year) more expensive, which led to the router being retired in December 2025.

I tend to be pretty light-hearted in what I write, but please take me seriously when I say I have genuine concerns about the dangers posed by OpenAI. I believe that OpenAI is an incredibly risky entity, not due to the power of its models or its underlying assets, but due to Sam Altman's ability to con people and find others that will con in his stead. Those responsible for rooting out con artists — regulators, investors, and the media — have not simply failed, but actively assisted Altman in this con. Here're the crucial elements of the con: Sam Altman is a dull, mediocre man that loves money and power. He appears to be superficially charming, but his actual skill is ingratiating himself with others and having them owe him favors, or feel somehow indebted to him otherwise. He remembers people's names and where he met them, and is very good at emailing people, writing checks, or finding reasons for somebody else to write a check. He is not technical — he can barely code and misunderstands basic machine learning (to quote Futurism) — but is very good at making the noises that people want to hear, be they big scary statements that confirm their biases or massive promises of unlimited revenue that don't really make any rational sense. While OpenAI might have started on noble terms, it has since morphed into a massive con led by the Valley's most notable con artist.

I realize that those who like AI might find this offensive, but what else do you call somebody who makes promises they can't keep ($300 billion to Oracle, $200 billion of revenue by 2030), spreads nonsensical financials (promises to spend $600 billion in compute), makes announcements of deals that don't exist (see: NVIDIA's $100 billion funding and the entire Stargate project), and speaks in hyperbolic terms to pump the value of his stock (such as basically every time he talks about Superintelligence)? Altman has taken advantage of a tech and business media that wants to see him win, a market divorced from true fundamentals, desperate venture capitalists at the end of their rope, hyperscalers that have run out of hypergrowth ideas, and multiple large companies like Oracle and SoftBank that are run by people that can't do maths. OpenAI is a pseudo-company that can only exist with infinite resources, its software sold on lies, its infrastructure built and paid for by other parties, and its entire existence fueled by compounding layers of leverage and risk. OpenAI has never made sense, and was only rationalized through a network of co-conspirators. OpenAI has never had a path to profitability, and never had a product that was worthy of the actual cost of selling it. The ascension of this company has only been possible as part of an exploitation of ignorance and desperation, and its collapse will be dangerous for the entire tech industry. Today I'll explain in great detail the sheer scale of Sam Altman's con, how it was executed, the danger it poses to its associated parties, and how it might eventually collapse.
This is the Hater's Guide To OpenAI, or Sam Altman, Freed.

OpenAI's ChatGPT subscriptions are, like every LLM product, deeply unprofitable, which means that OpenAI needs constant funding to keep providing them. I have found users of OpenAI Codex who have been able to burn between $1,000 and $2,000 in the space of a week on a $200-a-month subscription, and OpenAI just reset rate limits for the second time in a month. This isn't a real business.

OpenAI's API customers (the ones paying for access to its models) are, for the most part, venture-backed startups providing services like Cursor and Perplexity that are powered by these models. These startups are all incredibly unprofitable, requiring them to raise hundreds of millions of dollars every few months (as is the case with Harvey, Lovable, and many other big-name AI firms), which means that a large chunk — some estimate around 27% of its revenue — is dependent on customers that stop existing the moment that venture capital slows down.

OpenAI's infrastructure partners like CoreWeave and Oracle are taking on anywhere from a few billion to over a hundred billion dollars' worth of debt to build data centers for OpenAI, putting both companies in material jeopardy in the event of OpenAI's failure to pay or overall collapse. 67% of CoreWeave's 2025 revenue came from Microsoft renting capacity to rent to OpenAI, and $22 billion (32%) of CoreWeave's $66.8 billion revenue backlog comes directly from OpenAI, which requires it to build more capacity to fill. Oracle took on $38 billion in debt in 2025, and is in the process of raising another $50 billion as it lays off thousands of people, with said debt's only purpose being building data center capacity for OpenAI.

OpenAI's lead investor SoftBank is putting itself in dire straits to fund the company, with over $60 billion invested so far, existentially tying SoftBank's overall financial health to both OpenAI's stock price and SoftBank's ability to continue paying (or refinancing) its loans. SoftBank took on a year-long $15 billion bridge loan in 2025, had to sell its entire stake in NVIDIA, and expanded its ARM-stock-backed margin loan to over $11 billion to give OpenAI $30 billion in 2025, and then took on another $40 billion bridge loan a few weeks ago to fund the $30 billion it promised for OpenAI's latest funding round.

The con itself has been sustained by:

Creating a halo of uncertainty around the actual efficacies of LLMs, to the point that a cult of personality grew around a technology that obfuscated its actual outcomes and efficacies to the point that it could be sold based on what it might do rather than what it actually does.

Creating a halo of "genius" around Altman himself, aided by constant and vague threats of human destruction with the suggestion that only Altman could solve them.

Normalizing the idea that it's both necessary and important to let a company burn billions of dollars.

Normalizing the idea that it's okay that a company has perpetual losses, and perpetuating the idea that these losses are necessary for innovation to continue at large.


Moving my mobile numbers to VoIP

For the last year or so I’ve been running three eSIMs on my iPhone: personal, work, and a data-only travel SIM that swaps in whenever I’m abroad. iOS only lets two eSIMs be active at any one time, which meant a small but constant dance of enabling and disabling profiles depending on what I was doing that day. I’ve now ported both my personal and work mobile numbers to VoIP, and the eSIM juggling is gone. The nudge came from Michael Bazzell’s Extreme Privacy: What It Takes to Disappear , which recommends moving your “real” numbers off a carrier and onto a VoIP provider as part of a broader privacy strategy. For Bazzell the point is untangling your identity from the mobile network. For me it’s almost entirely convenience. Whichever phone I pick up in the morning rings for both numbers, and the data SIM can sit wherever it’s most useful without me having to decide which mobile identity to sacrifice for the day. I’m using Andrews & Arnold (AAISP) as the VoIP provider. I’ve used them for broadband on and off for years and they remain one of the few ISPs I’d actively recommend: technically competent, refreshingly honest, and perfectly happy for you to do slightly unusual things with your service. Porting two mobile numbers to them was painless. For the client I’m using Groundwire from Acrobits. I’ve been through plenty of SIP clients over the years and most of them are either ugly, flaky on push, or weirdly hostile to the idea of multiple accounts. Groundwire is the first one that’s felt like a proper phone replacement. Push notifications actually work, call quality is good, and it handles multiple accounts without any drama. AAISP exposes SMS through a plain-text HTTP API, and Groundwire expects messages to be delivered via its own web service hooks in XML. The two formats don’t match, so out of the box sending and receiving text messages just didn’t work: calls were fine, but SMS was effectively dead. I ended up writing a small PHP proxy that sits between them. Outbound messages go from Groundwire into the proxy, get reshaped, and hit the AAISP API. Inbound messages arrive via an AAISP webhook, get stored in SQLite, and are picked up the next time Groundwire polls. It also pokes Acrobits’ push service when something arrives, so iOS actually surfaces the notification rather than silently waiting on the next poll cycle. It’s called aaisp-sms-proxy and it’s on GitHub if anyone else is in the same boat. AAISP credentials stay server-side, each number gets its own token so they’re properly isolated, and there’s a tiny bit of rate limiting and log sanitisation in there because it’s on the public internet. I use it every day now and mostly forget it’s there. The other reason this matters is that I’m planning to move my daily driver to GrapheneOS . If your numbers live on a physical or embedded SIM, switching devices is a faff: SIM swaps, eSIM transfers, carrier-app dances, the lot. With VoIP the numbers live in an account, so I install Groundwire on whichever phone I’m carrying and it just rings. Pixel one day, iPhone the next, both at the same time if I want. The one remaining puzzle is Signal. Signal still treats the phone as the primary device and the desktop clients as tethered secondaries, which is fine for a single-phone setup but doesn’t quite fit mine. I want something closer to proper multi-device: two phones, both independently functional, one potentially offline for weeks at a time without losing messages when it comes back online. 
That isn’t how Signal is designed to work today, so figuring out a sensible workaround is next on the list. If you’re reading Bazzell and coming at this from a privacy angle, AAISP isn’t the answer. They’re a UK telco and they verify you like any other provider, so the number is still firmly tied to your legal identity. Moving off a SIM buys you some separation from the mobile network itself, but not the kind of disappearance the book describes. For that you’d want a provider willing to sell you a number without identity checks, and AAISP explicitly doesn’t. My goal was never to vanish, just to stop playing eSIM Tetris every time I landed in another country. The juggling is gone.
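For anyone wondering what a proxy like the one described above actually involves, here is a rough outline of the same shape of service. The real aaisp-sms-proxy is PHP with SQLite; this sketch is Python and Flask, and the endpoint paths, field names, and payload shapes are placeholders of my own rather than the real AAISP or Groundwire formats.

```python
# Illustrative outline only. Endpoint paths, field names and payloads are
# placeholders -- NOT the real AAISP or Groundwire message formats.
import sqlite3
from flask import Flask, request, jsonify

app = Flask(__name__)
DB = "messages.db"

def db():
    conn = sqlite3.connect(DB)
    conn.execute("CREATE TABLE IF NOT EXISTS inbound (sender TEXT, body TEXT)")
    return conn

def forward_to_carrier(to, body):
    ...  # placeholder for the HTTP call to the SMS provider's API

@app.post("/outbound")
def outbound():
    # Client -> proxy: reshape the softphone's payload and forward it to the
    # carrier's SMS API (credentials stay server-side).
    msg = request.get_json()
    forward_to_carrier(to=msg["to"], body=msg["body"])
    return jsonify(status="sent")

@app.post("/inbound-webhook")
def inbound_webhook():
    # Carrier webhook -> proxy: store the message until the client next polls.
    msg = request.get_json()
    with db() as conn:
        conn.execute("INSERT INTO inbound VALUES (?, ?)", (msg["from"], msg["body"]))
    return jsonify(status="stored")

@app.get("/poll")
def poll():
    # Client poll: hand over anything queued, then clear the queue.
    with db() as conn:
        rows = conn.execute("SELECT sender, body FROM inbound").fetchall()
        conn.execute("DELETE FROM inbound")
    return jsonify(messages=[{"from": s, "body": b} for s, b in rows])
```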

Kev Quirk Yesterday

I've Completed 100 Days To Offload (Again)

I just published my motorbike servicing rant and went over to my Pure Blog Dashboard to take a look at some stats, when I noticed this: 101 posts in the last year, which means I've completed 100 Days to Offload for a second time! 🎉 The whole point of 100 Days to Offload is to challenge you to publish 100 posts on your personal blog in a year. Mission accomplished! If you're interested in taking part in the challenge too, make sure you get yourself added to the hall of fame once you've completed it. Thanks for reading this post via RSS. RSS is ace, and so are you. ❤️ You can reply to this post by email, or leave a comment.

Kev Quirk Yesterday

Motorbike Servicing Rant

So my BMW S1000XR is now a year old and it's going in for its first "full service". It had its "break-in" service after a few weeks of ownership, but that's just an oil change. New bikes come with a very thin oil inside the engine that's used to help with the break-in process. After 500 or so miles, this needs to be swapped out for proper oil. I contacted the dealership for a price and some potential dates, and this is the breakdown they came back with:

Labour - £150
Oil disposal - £20
Oil - £80.60
Sump plug washer - £0.96
Oil filter - £17.29
Brake fluid - £11.92
Tax @ 20% - £56.15
Total: £336.92 (~$455)

So nearly £350 for what's effectively an hour's work and around £50 in parts. I'm mechanically minded and could easily do this at home, but like most modern vehicles, my BMW doesn't come with a service book that is stamped. These days the service history is all stored centrally with BMW, which means the service has to be carried out by them. There is a misconception that home servicing will void the warranty of a new bike. It won't, as long as the person doing the service uses OEM parts and works to the manufacturer's specification - which I always do. But I bought this bike from BMW, so if I hand it back after 3 years with a generic eBay service book that's been stamped by me, even though the work has been done to a high standard, it will affect the trade-in value. Ipso facto, they have me by the balls. I get it, margins are small and this is how dealerships make money, but I wish they would make it accessible for mechanically minded people, like me, to service at home. Thanks for reading this post via RSS. RSS is ace, and so are you. ❤️ You can reply to this post by email, or leave a comment.

David Bushell Yesterday

No-stack web development

This year I've been asked more than ever before what web development "stack" I use. I always respond: none. We shouldn't have a go-to stack! Let me explain why. My understanding is that a "stack" is a choice of software used to build a website. That includes language and tooling, libraries and frameworks, and heaven forbid: subscription services. Text editors aren't always considered part of the stack but integration is a major factor. Web dev stacks often manifest as tooling used to install hundreds of megs of JavaScript, Blazing Fast™ Rust binaries, and never-ending supply chain attacks. A stack is also technical debt, non-transferable knowledge, accelerated obsolescence, and vendor lock-in. That means fragility and overall unnecessary complication. Popular stacks inevitably turn into cargo cults that build in spite of the web, not for it. Let's break that down. If you have a go-to stack, you've prescribed a solution before you've diagnosed a problem. You've automatically opted in to technical baggage that you must carry the entire project. Project doesn't fit the stack? Tough; shoehorn it to fit. Stacks are opinionated by design. To facilitate their opinions, they abstract away from web fundamentals. It takes all of five minutes for a tech-savvy person to learn JSON. It takes far, far longer to learn Webpack JSON. The latter becomes useless knowledge once you've moved on to better things. Brain space is expensive. Other standards like CSS are never truly mastered, but learning an abstraction like Tailwind will severely limit your understanding. Stacks are a collection of move-fast-and-break churnware; fleeting software that updates with incompatible changes, or deprecates entirely in favour of yet another Rust refactor. A basic HTML document written 20 years ago remains compatible today. A codebase built upon a stack 20 months ago might refuse to play. The cost of re-stacking is usually unbearable. Stack-as-a-service is the endgame where websites become hopelessly trapped. Now you're paying for a service that can't fix errors. You've sacrificed long-term stability and freedom for "developer experience". I'm not saying you should code artisanal organic free-range websites. I'm saying be aware of the true costs associated with a stack. Don't prescribe a solution before you've diagnosed a problem. Choose the right tool for each job only once the impact is known. Satisfy specific goals of the website, not temporary development goals. Don't ask a developer what their stack is without asking what problem they're solving. Be wary of those who promote or mandate a default stack. Be doubtful of those selling a stack. When you develop for a stack, you risk trading the stability of the open web platform (that is to say: decades of broad backwards compatibility) for GitHub's flavour of the month. The web platform does not require build toolchains. Always default to, and regress to, the fundamentals of CSS, HTML, and JavaScript. Those core standards are the web stack. Yes, you'll probably benefit from more tools. Choose them wisely. Good tools are intuitive by being based on standards; they can be introduced and replaced with minimal pain. My only absolute advice: do not continue with legacy frameworks like React. If that triggers an emotional reaction: you need a stack intervention! It may be difficult to accept but Facebook never was your stack; it's time to move on. Use the tool, don't become the tool. Edit: forgot to say: for personal projects, the gloves are off. Go nuts! Be the churn.
Learn new tools and even code your own stack. If you’re the sole maintainer the freedom to make your own mistakes can be a learning exercise in itself. Thanks for reading! Follow me on Mastodon and Bluesky . Subscribe to my Blog and Notes or Combined feeds.

Daniel Mangum Yesterday

PSA Crypto: The P is for Portability

Arm’s Platform Security Architecture (PSA) was released in 2017, but it was two years until the first beta release of the PSA Cryptography API in 2019, and another year until the 1.0 specification in 2020. Aimed at securing connected devices and originally targeting only Arm-based systems, PSA has evolved with the donation of the PSA Certified program to GlobalPlatform in 2025, allowing non-Arm devices, such as popular RISC-V microcontrollers (MCUs), to achieve certification.


watgo - a WebAssembly Toolkit for Go

I'm happy to announce the general availability of watgo - the WebAssembly Toolkit for Go. This project is similar to wabt (C++) or wasm-tools (Rust), but in pure, zero-dependency Go. watgo comes with a CLI and a Go API to parse WAT (WebAssembly Text), validate it, and encode it into WASM binaries; it also supports decoding WASM from its binary format. At the center of it all is wasmir - a semantic representation of a WebAssembly module that users can examine (and manipulate). This diagram shows the functionalities provided by watgo:

Parse: a parser from WAT to wasmir
Validate: uses the official WebAssembly validation semantics to check that the module is well formed and safe
Encode: emits wasmir into WASM binary representation
Decode: reads WASM binary representation into wasmir

watgo comes with a CLI, which you can install by issuing this command: The CLI aims to be compatible with wasm-tools [1], and I've already switched my wasm-wat-samples projects to use it; e.g. a command to parse a WAT file, validate it and encode it into binary format: wasmir semantically represents a WASM module with an API that's easy to work with. Here's an example of using watgo to parse a simple WAT program and do some analysis: One important note: the WAT format supports several syntactic niceties that are flattened / canonicalized when lowered to wasmir. For example, all folded instructions are lowered to unfolded ones (linear form), function & type names are resolved to numeric indices, etc. This matches the validation and execution semantics of WASM and its binary representation. These syntactic details are present in watgo in the textformat package (which parses WAT into an AST) and are removed when this is lowered to wasmir. The textformat package is kept internal at this time, but in the future I may consider exposing it publicly - if there's interest. Even though it's still early days for watgo, I'm reasonably confident in its correctness due to a strategy of very heavy testing right from the start. WebAssembly comes with a large official test suite, which is perfect for end-to-end testing of new implementations. The core test suite includes almost 200K lines of WAT files that carry several modules with expected execution semantics and a variety of error scenarios exercised. These live in specially designed .wast files and leverage a custom spec interpreter. watgo hijacks this approach by using the official test suite for its own testing. A custom harness parses .wast files and uses watgo to convert the WAT in them to binary WASM, which is then executed by Node.js [2]; this harness is a significant effort in itself, but it's very much worth it - the result is excellent testing coverage. watgo passes the entire WASM spec core test suite. Similarly, we leverage wabt's interp test suite which also includes end-to-end tests, using a simpler Node-based harness to test them against watgo. Finally, I maintain a collection of realistic program samples written in WAT in the wasm-wat-samples repository; these are also used by watgo to test itself.

Evan Hahn Yesterday

In defense of GitHub's poor uptime

In short: GitHub's downtime is bad, but uptime numbers can be misleading. It's not as bad as it looks; more like a D than an F. 99.99% uptime, or "four nines", is a common industry standard. Four nines of uptime is equivalent to 1.008 minutes of downtime per week. GitHub is not meeting that, and it's frustrating. Even though they're owned by Microsoft, one of the richest companies on earth, they aren't clearing this bar. Here are some things people are saying:

"GitHub appears to be struggling with measly three nines availability"
"World's First Enterprise Solution With Zero Nines Uptime"
"Sure, they may have made the uptime worse, but remember what we got in exchange – when it's up, the UI is slower and buggier."

According to "The Missing GitHub Status Page", which reports historical uptime better than GitHub's official source, they've had 89.43% uptime over the last 90 days. That's zero nines of uptime. That implies more than 2.5 hours of downtime every day! I dislike GitHub and Microsoft, so I shouldn't be coming to their defense, but I think this characterization is unfair. I'm no mathematician, but let's do a little math. Let's say your enterprise has two services: Service A and Service B. Over the last 10 days:

Service A had one day of downtime. That means it has 90% uptime.
Service B had two days of downtime on different days. That means it has 80% uptime.

3 of the last 10 days had outages. That's 70% uptime total. (That's how the Missing GitHub Status Page calculates it.) GitHub's status page lists ten services: core Git operations, webhooks, Issues, and more. Sometimes they're down simultaneously, but usually not. If all ten of those services have 99% uptime and outages don't overlap, it'd look like GitHub had 90% uptime because some part of GitHub is out 10% of the time. That looks much worse! The numbers look better if outages happen at the same time. For example, if Service A and Service B both go down on Saturday and Sunday, you'd have 80% uptime overall instead of 70%. Compared to the previous scenario, Service A is down twice as long, but the uptime number looks better. A downstream effect of this calculation is that your uptime numbers look worse if your services are well-isolated. I think it's good that Service A doesn't take down Service B! I think it's good that a GitHub Packages outage doesn't take down GitHub Issues! But if all you see is one aggregate uptime number, you might miss that. Things look rosier when you look at features individually. Over the last 90 days, core Git operations have had 98.98% uptime, or about 22 hours where things were broken. That's still bad, but not as bad as some people are saying. D tier, not F tier. Also, an incident doesn't mean everything is broken. For example, GitHub recently had an issue where things were slow for users on the west coast of the United States. Not good, but not "everything is broken for all users". Again, the number doesn't tell the whole story. I still think GitHub's uptime is unacceptably low, especially because they're owned by Microsoft, but I don't think we're being honest when we say that GitHub has "zero nines" of availability. To me, it's more like: they have a bunch of unstable services which cumulatively have horrible uptime, but individually have not-very-good uptime. There are better reasons to dislike these companies.
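To make the aggregation effect concrete, here is a quick sketch in Python using made-up per-day outage flags rather than real GitHub data. It reproduces the Service A and Service B numbers above, and shows how the "any service down" aggregate diverges from the per-service figures depending on whether outages overlap.

```python
# Illustrative only: per-day outage flags for two hypothetical services
# over 10 days (True = that service was down that day).

def uptime(downs):
    return 1 - sum(downs) / len(downs)

def aggregate_uptime(*services):
    # A day counts as "down" if ANY service was down that day, which is
    # how an all-in-one status number is often computed.
    down_days = sum(any(day) for day in zip(*services))
    return 1 - down_days / len(services[0])

# Scenario 1: outages fall on different days.
a = [True] + [False] * 9                 # Service A: 1 day down -> 90%
b = [False, True, True] + [False] * 7    # Service B: 2 days down -> 80%
print(uptime(a), uptime(b), aggregate_uptime(a, b))      # 0.9 0.8 0.7

# Scenario 2: outages overlap. A is now down *longer*, yet the aggregate looks better.
a2 = [True, True] + [False] * 8          # Service A: 2 days down -> 80%
b2 = [True, True] + [False] * 8          # Service B: 2 days down -> 80%
print(uptime(a2), uptime(b2), aggregate_uptime(a2, b2))  # 0.8 0.8 0.8
```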


Has Mythos just broken the deal that kept the internet safe?

For nearly 20 years the deal has been simple: you click a link, arbitrary code runs on your device, and a stack of sandboxes keeps that code from doing anything nasty. Browser sandboxes for untrusted JavaScript, VM sandboxes for multi-tenant cloud, ad iframes so banner creatives can't take over your phone or laptop - the modern internet is built on the assumption that those sandboxes hold. Anthropic just shipped a research preview that generates working exploits for one of them 72.4% of the time, up from under 1% a few months ago. That deal might be breaking.

From what I've read Mythos is a very large model. Rumours have pointed to it being similar in size to the short-lived (and very underwhelming) GPT-4.5. As such I'm with a lot of commentators in thinking that a primary reason this hasn't been rolled out further is compute. Anthropic is probably the most compute-starved major AI lab right now, and I strongly suspect they do not have the compute to roll this out more broadly even if they wanted to. From leaked pricing, it's expensive as well - at $125/MTok output (5x more than Opus, which is itself the most expensive model out there). One thing that has really been overlooked with all the focus on frontier-scale models is how quickly improvements in the huge models are being matched by far smaller models. I've spent a lot of time with the Gemma 4 open-weights model, and it is incredibly impressive for a model that is ~50x smaller than the frontier models. So I have no doubt that whatever capabilities Mythos has will relatively quickly be available in smaller, and thus easier to serve, models. And even if Mythos' huge size somehow is intrinsic to the abilities it has (I very much doubt this, given current progress in scaling smaller models), it's only a matter of time before newer chips [1] are able to serve it en masse. It's important to look to where the puck is going. As I've written before, LLMs in my opinion pose an extremely serious cybersecurity risk. Fundamentally we are seeing a radical change in how easy it is to find (and thus exploit) serious flaws and bugs in software for nefarious purposes.

To back up a step, it's important to understand how modern cybersecurity is currently achieved. One of the most important concepts is that of a sandbox. Nearly every electronic device you touch day to day has one (or many) layers of these to protect the system. In short, a sandbox is a so-called 'virtualised' environment where software can execute on the system, but with limited permissions, segregated from other software, behind a very strong boundary that prevents the software from 'breaking out' of the sandbox. If you're reading this on a modern smartphone, you have at least 3 layers of sandboxing between this page and your phone's operating system. First, your browser has (at least) two levels of sandboxing. One is for the JavaScript execution environment (which runs the interactive code on websites). This is then wrapped by the browser sandbox, which limits what the site as a whole can do. Finally, iOS or Android has an app sandbox which limits what the browser as a whole can do. This defence in depth is absolutely fundamental to modern information security, and it's what allows users to browse "untrusted" websites with any level of safety. For a malicious website to gain control over your device, it needs to chain together multiple vulnerabilities, all at the same time. In reality this is extremely hard to do (and these kinds of chains fetch millions of dollars on the grey market).
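To put a rough number on why that layering matters, here is a purely illustrative calculation. If each layer has to be beaten independently, the chance of a full chain landing is the product of the per-layer success rates; the per-layer figures below are hypothetical (they are not Anthropic's benchmark numbers), but they show how quickly a jump in per-layer exploit success compounds across a three-layer chain.

```python
# Purely illustrative: probability that a full exploit chain succeeds when
# each of three sandbox layers must be beaten independently. The per-layer
# success rates here are hypothetical, not measured values.
layers = 3

for per_layer in (0.01, 0.30, 0.70):
    chain = per_layer ** layers
    print(f"per-layer success {per_layer:.0%} -> full chain {chain * 100:.4g}%")

# per-layer success 1%  -> full chain 0.0001%
# per-layer success 30% -> full chain 2.7%
# per-layer success 70% -> full chain 34.3%
```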
Guess what? According to Anthropic, Mythos Preview successfully generates a working exploit for Firefox's JS shell in 72.4% of trials. Opus 4.6 managed this in under 1% of trials in a previous evaluation. Worth flagging a couple of caveats. The JS shell here is Firefox's standalone SpiderMonkey - so this is escaping the innermost sandbox layer, not the full browser chain (the renderer process and OS app sandbox still sit on top). And it's Anthropic's own benchmark, not an independent one. But even hedging both of those, the trajectory is what matters - we're going from "effectively zero" to "72.4% of the time" in one model generation, on a real-world target rather than a toy CTF.

This is pretty terrifying if you understand the implications. If an LLM can find exploits in sandboxes - which are some of the best-secured pieces of software on the planet - then suddenly every website you aimlessly browse through could contain malicious code which can 'escape' the sandbox and theoretically take control of your device - and all the data on your phone could be sent to someone nasty. These attacks are so dangerous because the internet is built around sandboxes being safe. For example, each banner ad your browser loads is loaded in a separate sandboxed environment. This means they can run a huge amount of (mostly) untested code, with everyone relying on the browser sandbox to protect them. If that sandbox falls, then suddenly a malicious ad campaign can take over millions of devices in hours. Equally, sandboxes (and virtualisation) are fundamental to allowing cloud computing to operate at scale. Most workloads these days are not running directly on the physical server they sit on. Instead, AWS et al take the physical hardware and "slice" it up into so-called "virtual" servers, selling each slice to different customers. This allows many more applications to run on a single server - and enables some pretty nice profit margins for the companies involved. This operates on roughly the same model as your phone, with various layers to prevent customers from accessing each other's data and (more importantly) from accessing the control plane of AWS.

So, we have a very, very big problem if these sandboxes fail, and all fingers point towards this being the case this year. I should tone down the disaster porn slightly - there have been many sandbox escapes before that haven't caused chaos, but I have a strong feeling that this time is going to be difficult. And to be clear, when just AWS us-east-1 goes down (which it has done many, many times) it is front-page news globally and tends to cause significant disruption to day-to-day life. This is just one of AWS's regions - if a malicious actor were able to take control of the AWS control plane, it's likely they'd be able to take down all regions simultaneously, and recovery would be infinitely harder with a bad actor in charge than it was for the internal faults behind previous outages, which were themselves extremely difficult to restore from in a timely way.
The current status quo seems to be that these next-generation models will be released to a select group of cybersecurity professionals and related organisations, so they can fix things as much as possible and get a head start. Perhaps this is the best that can be done, but it seems to me to be a repeat of the famous "security through obscurity" approach - and "obscurity is not security" has become a meme in itself in the information security world. It also seems far-fetched to me that the organisations who do have access are going to find even most of the critical problems in a limited time window. And that brings me to my final point. While Anthropic are providing $100m of credit and $4m of 'direct cash donations' to open source projects, it's not all open source projects. There are a lot of open source projects that everyone relies on without realising. While the obvious ones like the Linux kernel are getting this "access" ahead of time, there are literally millions of pieces of open source software (never mind commercial software) that are essential to the operation of a substantial minority of systems. I'm not quite sure where the plan leaves those. Perhaps this is just another round in the cat-and-mouse cycle that reaches a mostly stable equilibrium, and at worst we have some short-term disruption. But if I step back and look at how fast the industry has moved over the past few years - I'm not so sure. And one thing I think is for certain: it looks like we do now have the fabled superhuman ability in at least one domain. I don't think it's the last.

Albeit at the cost of adding yet more pressure onto the compute crunch the AI industry is experiencing ↩︎

Giles's blog Yesterday

Writing an LLM from scratch, part 32j -- Interventions: trying to train a better model in the cloud

Since early February, I've been trying various interventions on a 163M-parameter GPT-2-style model that I trained from scratch on my local RTX 3090 , using code based on Sebastian Raschka 's book " Build a Large Language Model (from Scratch) ". My original model got a loss of 3.944 on my test set, while the original GPT-2 weights got 3.500 on the same dataset. I wanted to see if I could close that gap, and had a list of potential changes to the training setup, and to the model itself. Which of them would help? I found a list of solid-looking interventions, and in my last post I came to the conclusion that the improvements in loss I had seen with all of them -- with two possible exceptions -- seemed unlikely to be in the noise. What would happen if I tried to put them into a new model? Let's start by looking at the results that we have for the interventions so far -- this is the table I've been using as I go through them, but I've updated it to contain the loss figures for each model to six decimal places instead of three, and made each model name link to the associated post. I've also corrected the loss for the model, which was mistakenly using the training loss at the end of the run rather than the loss on the test set 1 . As I've mentioned before, simply moving to training in the cloud improved things markedly, getting loss down from 3.944 to 3.691526; I suspect this was due to having a closer-to-optimal batch size (more about that in my next post). What to do about the other interventions, though? It seemed clear that two of them were not helping: weight tying, and the one using the figure for weight decay that I'd (I suspect incorrectly) derived from a paper by Cerebras Research. The "no-AMP" run (which would be better described as "full-fat float32") had a small positive effect, but was so costly in terms of both time and money that it wasn't worthwhile. So we had five interventions to try: How would they stack up? It seemed pretty unlikely that their independent contributions would just sum up neatly so that we got a total improvement of 0.013209 + 0.022141 + 0.048586 + 0.050244 + 0.089609 = 0.223789 (though that would certainly be nice!). One question to consider was how independent they were. For any set of interventions, you can imagine them being independent and adding up nicely, or pulling in separate directions so that the combined effect is worse than the sum, or pulling in the same direction so that they amplify each other. My intuition was that gradient clipping and removing dropout were pretty independent, at least conceptually. They might affect other interventions indirectly (eg. via changing the training run's use of the random number generator) but they'd be unlikely to have a direct effect. QKV bias I was less sure about, but it seemed -- again, just intuitively -- at least reasonably independent of the others, with one important exception (which I'll get into below). By contrast, weight decay and the learning rate interact together quite strongly, at least in standard gradient descent, and I'd tested them in isolation. The result for changing the weight decay to 0.01 was based on a fixed learning rate of 0.0004, and the result for scheduling the learning rate was based on a weight decay of 0.1. That felt like an issue, and definitely needed some thought. Additionally, there were some issues with which interventions might have not had a real effect, and instead just been the results of the use of randomness. 
While my analysis of how that might have affected things was somewhat limited by the number of test runs I could afford to do, it did turn up two plausible issues:
- Adding gradient clipping looked like it might have been within the training run noise.
- Adding QKV bias would have had a large effect on the model's initial weights. All of the others would have started with essentially the same weights (apart from weight tying, though even that would have had the same values for the initial weights apart from the tied ones). But adding the bias would have completely changed them, and its effect size was comfortably within the range of differences you might expect from that.
After some thought, I came up with a plan. If I were doing this properly and scientifically, I suppose I'd try every combination of interventions, but that would be ruinously expensive 2 , so a sensible minimal set of training runs felt like this:
- Start a training run with all of the interventions apart from QKV bias.
- In parallel (Lambda instance availability permitting), run another one with all of the interventions including QKV bias.
When those completed, I'd find the test set loss for both models. I'd choose the best run, and then do another run with those settings, but with weight decay switched back to the original value of 0.1. I chose to revert weight decay rather than the learning rate stuff because this was the one I was least sure about -- the updated "GPT-2" value of 0.01 is very unusual by today's standards, and I'd come to it via a rather circuitous route -- see the post for more details. The best of the three runs would be the winning combination of interventions. Again, this was not an exhaustive plan 3 . But it seemed to make sense. Let's see how it turned out. Just to recap, the first run had these interventions against the baseline:
- Gradient clipping at 3.5
- Weight decay changed from 0.1 to 0.01
- Dropout removed
- Learning rate changed from 0.0004 to 0.0014, with a warmup over 5% of the run then a cosine decay to 0.00014.
It did not have QKV bias. You can see the config here. I charted the loss over the course of the training run and, as normal with learning rate scheduling, I also charted the LR to make sure it was doing the right thing (you can see that it was). I also tracked the gradient norms -- you can see that there was some clipping happening near the start of the run. At the end of the run, it reported a slightly lower final train loss than normal, and it took 3h10m, which is faster than usual, but about the same as the other train we did without dropout -- that makes sense, as the process of zeroing out random activations isn't free. I downloaded the model -- here it is -- ran the smoke test, and got its loss on the test set. Not bad at all -- the best result we've had so far, albeit not quite up to the standard of the original GPT-2 weights.
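To make that first configuration concrete, here's roughly how the pieces fit together in PyTorch -- a minimal, self-contained sketch with a placeholder model, step count and loss, not the actual training code from my repo. (The second run, below, differs only in constructing the attention's query/key/value projections with bias=True.)

```python
import torch

# Minimal sketch of the first run's setup (placeholder model, step count and
# loss -- hypothetical values stand in for the real training code).
model = torch.nn.Linear(768, 768)          # stand-in for the GPT-2-style model (no dropout anywhere)
peak_lr, min_lr, weight_decay = 0.0014, 0.00014, 0.01
total_steps = 1_000                        # placeholder step count
warmup_steps = int(0.05 * total_steps)     # warmup over 5% of the run

optimizer = torch.optim.AdamW(model.parameters(), lr=peak_lr, weight_decay=weight_decay)

# Linear warmup up to the peak LR, then cosine decay down to the minimum.
warmup = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=1e-3, end_factor=1.0, total_iters=warmup_steps)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=total_steps - warmup_steps, eta_min=min_lr)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, cosine], milestones=[warmup_steps])

for step in range(total_steps):
    optimizer.zero_grad()
    batch = torch.randn(8, 768)            # dummy batch
    loss = model(batch).pow(2).mean()      # dummy loss
    loss.backward()
    # Gradient clipping at a max norm of 3.5, applied before the optimiser step.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=3.5)
    optimizer.step()
    scheduler.step()
```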
Now the next one, with QKV bias. This one had these interventions:
- Gradient clipping at 3.5
- Weight decay changed from 0.1 to 0.01
- Dropout removed
- Learning rate changed from 0.0004 to 0.0014, with a warmup over 5% of the run then a cosine decay to 0.00014.
- QKV bias switched on.
You can see the config here. I charted the loss, the learning rate, and the gradient norms as before (note that we had more clipping this time, about halfway through), and checked the final printout at the end. That final train loss is slightly higher, which is normally an indicator that the test loss will be higher, but we'll have to see. Time to download the model -- here it is -- run the smoke test, and then the moment of truth: what was its loss on the test set? As I suspected from the training loss at the end, slightly worse than the run without QKV bias. So, that meant that we should do the next run, with a weight decay of 0.1, with no QKV bias. Given the above results, this one had these interventions vs the baseline:
- Gradient clipping at 3.5
- Dropout removed
- Learning rate changed from 0.0004 to 0.0014, with a warmup over 5% of the run then a cosine decay to 0.00014.
Weight decay was back to the baseline value of 0.1, rather than the value of 0.01 used in the previous two runs, and QKV bias was switched back off. You can see the config here. The loss chart is much choppier than the previous two runs; that initially surprised me, as the higher weight decay means that we're regularising the model more than we were with those, which I thought would "calm things down". But on reflection, I had it backwards. Hand-waving a bit, a more regularised model fits less closely to every detail of the data it has seen, weighting the typical stuff more than it does the outliers. That means that when something a bit more out-of-distribution appears, it might not yet have learned how to integrate it into its model of the world. Well, it sounds plausible, anyway :-) On to the learning rate (just to double-check), and it's fine. And again, the gradient norms, which, similarly to the loss chart, show more occasions where gradients spiked and had to be clipped -- even towards the end of the training run this time. The final printout at the end: once again, although the final train loss is not definitive, it tends to be indicative of the test loss. It's in between the last two runs, so we'd expect the test loss to be likewise in between theirs. Time to download the model -- here it is -- and on to the smoke test. Hmm. At least vaguely coherent, though I'm not 100% convinced. It looks like ads for personal injury lawyers have crept into FineWeb somehow... Still, it's time for the test loss (drumroll): as predicted from the train loss, it's in between the two runs above. Let's put these three runs into the results table. As a reminder:
- The first run was gradient clipping at 3.5, weight decay changed from 0.1 to 0.01, dropout removed, and the learning rate intervention, but no QKV bias.
- The second was gradient clipping at 3.5, weight decay changed from 0.1 to 0.01, dropout removed, and the learning rate intervention, with QKV bias.
- The third was gradient clipping at 3.5, dropout removed, and the learning rate intervention, but no QKV bias, and no change to weight decay.
You can see that adding on QKV bias actually made the model worse than the learning-rate-only intervention. That pushes me slightly away from the "it's all about the initial weights" direction; perhaps instead the bias adds some kind of stability that the learning rate scheduling also provides, and they fight against each other? Unfortunately I think the only way to pick it apart would be to do a full set of runs, switching each intervention on and off independently, and that would be too costly. The fact that the weight decay change from 0.1 to 0.01 actually did help when combined with the learning rate change and scheduling was a bit of a surprise; because they're coupled when we think about standard gradient descent, I was expecting them to be too intertwined for my tests of them in isolation to have been valid. Quite pleased that it didn't work out that way, though, because sweeping across values for different parameters is much easier than it would be if they were connected. However, at this point it occurs to me that it might be because we're using the AdamW optimiser. As I understand it, its big difference versus Adam is that it decouples weight decay. I don't have a solid mental model of what that means exactly (will read up and post about it eventually), but it certainly seems pertinent here.
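For what it's worth, my rough mental sketch of that decoupling looks something like this -- an illustration of the idea, not either optimiser's actual implementation:

```python
import torch

def adam_like_step(param, grad, m, v, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-style update (bias correction omitted to keep the sketch short)."""
    m.mul_(beta1).add_(grad, alpha=1 - beta1)
    v.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
    return param - lr * m / (v.sqrt() + eps)

param, grad = torch.randn(4), torch.randn(4)
lr, wd = 0.0014, 0.01

# Adam + L2 regularisation: the decay term is folded into the gradient, so it
# also gets rescaled by the adaptive 1/sqrt(v) factor like everything else.
coupled = adam_like_step(param, grad + wd * param, torch.zeros(4), torch.zeros(4), lr=lr)

# AdamW: the decay is applied to the weights directly, outside the adaptive
# machinery -- that separation is the "decoupling".
decoupled = adam_like_step(param, grad, torch.zeros(4), torch.zeros(4), lr=lr) - lr * wd * param
```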
Anyway, I have to say, I'm both pleased with and disappointed by these results. Pleased because we got a result by putting interventions together that was better than any of them in isolation, but disappointed that the end result wasn't even better. The difference between the baseline's loss, at 3.691526, and original GPT-2 small's, at 3.5, was 0.191526. Our best result was 3.577761, so an improvement of 0.113765. That's about 60% of the way there. That said, by sheer chance, while trying out the different sizes of cloud machines, I'd got from a loss of 3.944 training locally to the baseline's value of 3.691526 -- I suspect due to the fact that training in the cloud meant that I could use batch sizes of 96. So a different way of looking at it is that we should include that in the calculations too. From 3.944 to 3.5, the gap with GPT-2 small was 0.444. And we went from 3.944 to 3.577761, an improvement of 0.366239. And that means that we managed to get 82% of the improvement we needed. On the other hand, it means that in terms of my improvements, 0.252474 came from a happy accident, while all of my careful work on interventions only got me 0.113765. :-( Anyway, I think that for now, I'll have to rest happy with that as a result -- and next time around, let's see if we can get to the same level of improvement locally, using gradient accumulation.

1. Luckily the difference was small enough that it doesn't change any of the conclusions I'd made about it. ↩
2. Because there are five interventions, and each can be on or off, it's equivalent to a 5-digit binary number. So that's 2^5 trains, less the five ones I'd already done and the baseline, for a total of 32 − 6 = 26. At US$50-odd for a train, that's definitely a no-go. ↩
3. I did also consider changing the random seed at the start of the code to 67 rather than 42, given that it seemed to provide better initial weights when I was exploring the effects of random noise on the training. I even started the first two training runs with that in place. However, on reflection I realised that it would be one step too far away from scientific rigour. I'm not trying to be 100% rigorous in these posts, but it seemed like a step too far to diligently test all of the interventions against one seed, and then YOLO in a different one for the final training runs. ↩

0 views
Jim Nielsen Yesterday

Fewer Computers, Fewer Problems: Going Local With Builds & Deployments

Me, in 2025, on Mastodon: I love tools like Netlify and deploying my small personal sites with them. But I'm not gonna lie, 2025 might be the year I go back to just doing builds locally and pushing the deploys from my computer. I'm sick of devops'ing stupid stuff because builds work on my machine and I have to spend that extra bit of time to ensure they also work on remote linux computers. Not sure I need the infrastructure of giant teams working together for making a small personal website. It’s 2026 now, but I finally took my first steps towards this. One of the ideas I really love around the “local-first” movement is this notion that everything canonical is done locally, then remote “sync” is an enhancement. For my personal website, I want builds and deployments to work that way. All data, build tooling, deployment, etc., happens first and foremost on my machine. From there, having another server somewhere else do it is purely a “progressive enhancement”. If it were to fail, fine. I can resort back to doing it locally very easily because all the tooling is optimized for local build and deployment first (rather than being dependent on fixing some remote server to get builds and deployments working). It’s amazing how many of my problems come from the struggle to get one thing to work identically across multiple computers. I want to explore a solution that removes the cause of my problem, rather than trying to stabilize it with more time and code. “The first rule of distributed computing is don’t distribute your computing unless you absolutely have to” — especially if you’re just building personal websites. So I un-did stuff I previously did (that’s right, my current predicament is self-inflicted — imagine that). My notes site used to work like this:
- Content lives in Dropbox
- Code is on GitHub
- Netlify’s servers pull both, then run a build and deploy the site
It worked, but sporadically. Sometimes it would fail, then start working again, all without me changing anything. And when it did work, it often would take a long time — like five, six minutes to run a build/deployment. I never could figure out the issue. Some combination of Netlify’s servers (which I don’t control and don’t have full visibility into) talking to Dropbox’s servers (which I also don’t control and don’t have full visibility into). I got sick of trying to make a simple (but distributed) build process work across multiple computers when 99% of the time, I really only need it to work on one computer. So I turned off builds in Netlify, and made it so my primary, local computer does all the work. Here are the trade-offs:
- What I lose: I can no longer make edits to notes, then build/deploy the site from my phone or tablet.
- What I gain: I don’t have to troubleshoot build issues on machines I don’t own or control. Now, if it “works on my machine”, it works, period.
The change was pretty simple. First, I turned off builds in Netlify. Now when I push, Netlify does nothing. Next, I changed my build process to stop pulling markdown notes from the Dropbox API and instead pull them from a local folder on my computer. Simple, fast. And lastly, as a measure to protect myself from myself, I cloned the codebase for my notes to a second location on my computer. This way I have a “working copy” version of my site where I do local development, and I have a clean “production copy” of my site which is where I build/deploy from. This helps ensure I don’t accidentally build and deploy my “working copy” which I often leave in a weird, half-finished state. In my repo I have a deploy command that I run from my “clean” copy. It pulls down any new changes, makes sure I have the latest deps, builds the site, then lets Netlify’s CLI deploy it.
As extra credit, I created a macOS shortcut, so I can just type “Deploy notes.jim-nielsen.com” to trigger a build, then watch the little shortcut run to completion in my Mac’s menubar. I’ve been living with this setup for a few weeks now and it has worked beautifully. Best part is: I’ve never had to open up Netlify’s website to check the status of a build or troubleshoot a deployment. That’s an enhancement I can have later — if I want to. Reply via: Email · Mastodon · Bluesky

0 views
Grumpy Gamer Yesterday

Death by Scrolling Consoles

After some delays with getting console certification, I’m happy to announce that Death by Scrolling comes to Xbox, PlayStation and Switch (alongside a Steam update) on April 16. The console versions and Steam feature a big update that includes a new playable character, a new world, new powerups, new stuff and new fun. We completely reworked your ability to customize your character. It’s a huge update.

0 views

Overview of My Homelab

I've had a homelab for quite some time now, although it hasn't been a linear process. I first got into it when I heard about Plex, which, at first, I was under the impression was a free streaming service with everything on it. I set it up with the installer on my computer and was frustrated and confused to learn that it:
- Wouldn't work unless my PC stayed on
- Didn't really have ad-free, subscription-free streaming. Apparently you had to acquire the content yourself.
I gave up on it for who knows how long. Then, I heard about Jellyfin, which is an open-source alternative that a lot of people seemed to like. I wanted to learn more. I set up Jellyfin on my computer and loaded some movies onto it, then streamed them from the same PC hosting it. Okay, I thought. So it provides a video player, basically. Big deal. I have no idea how to access it from other devices or do anything interesting. So again I gave up. It wasn't until my brother and I went halfsies on a Synology NAS on June 14, 2024 1 and I had a few years of university and self-tinkering knowledge under my belt that I truly got into homelabbing and self-hosting. At that point, I knew full well what a server and a client were, and all about networking. 2 I set up the Synology NAS, at the time living with my parents, and installed both the 8TB HDD that I had bought for my items, and the 16TB HDD that my brother bought for his. 3 I used it as network-attached storage, as intended, at first. Backups and all that. However, I really wanted to get into hosting services. I had been following technical blogs at that point as well as r/selfhosted and really wanted to sink my teeth into it. The Synology NAS has limited resources, being mainly for storage. That didn't stop me from hosting some basic items. I started with Plex, then moved on to Jellyfin. I hosted both at the same time so that if Jellyfin didn't work, I could just use Plex. To this day I use Infuse on my Apple TV and other devices and have it hooked up to my Jellyfin server. Next, I tried Mealie, then switched to Tandoor, since I love to cook and bake at home. I also set up Actual Budget, which is probably one of my top-used services now. It completely changed the way I handle my money. Eventually, I went in on a used Dell PowerEdge R730, which is a 2U rack-mounted enterprise server designed for data center and business-critical workloads. For me, it's a great noise-making machine that has lots of upgrade potential! Here are the boring technical details:
- Server Series: PowerEdge R730
- CPUs: dual Intel Xeon E5-2667 v4 (dual LGA 2011 sockets), 16 cores total
- RAM: 16GB DDR4 (24 slots, up to 768GB supported; frequencies 1333, 1600, 1866, 2133)
- Storage: 8-bay 2.5" SFF chassis, 8 hot-swap bays, max 8 drives (max drive size supported: 43200GB), H730 RAID adapter
- Power: dual 750W PSUs
- Expansion: 3 PCI Express x8 slots, 1 PCI Express x16 slot
- Networking: 10/100/1000 Gigabit LAN
- Misc: DVD optical drive, 4 USB ports (2 front USB 2.0), 1 serial port
A year into using it, and it does exactly what I need it to do every time, no questions asked. Over time, I connected it to an APC UPS to protect it from power outages, and hooked up a used Dell Optiplex I had sitting around to the same UPS. I used to call the Optiplex my "Minecraft Machine," because all it did was run Minecraft servers (and worked excellently). At this point, I've moved all my servers to the PowerEdge, managed by the service Crafty Controller for easy setup and server start-and-stop. The Optiplex now serves as a remote desktop solution, since my lab is at my parents', 4 allowing me to access the network easily. I also use Tailscale to access several services remotely without fully exposing them. When I want to expose a service normally, I use free Cloudflare Tunnels. For my hypervisor, I have Proxmox installed on the PowerEdge, and all of my services run in their own LXC containers. In the future, I hope to migrate most services to a more energy-efficient and compact mini computer running Ubuntu or Debian Server and managed with Docker instead.
For now, Proxmox is very powerful and intuitive, and made it incredibly easy for me to set up snapshots and backups as well as monitor resource usage. Finally, here is a list of my services: It's quite easy to get started yourself making a homelab or self-hosting services. Buying a VPS can make it even easier, like Hostinger's one-click deployment options. You can also simply install Linux with Docker containers on an old laptop or other computer you don't use anymore. I know it's been more than worth it for me. Check out r/selfhosted, the self.hst newsletter, and YouTube if you want to learn more about self-hosting. Subscribe via email or RSS

1. I went through my Amazon order history for this date. ↩
2. I would say my first experience hosting a server was hosting multiple Minecraft servers over the years for me and my friends. This is also where I learned basic networking concepts, like what a LAN is, what TCP/UDP is, port forwarding, etc. ↩
3. I thought this was enough storage to last a lifetime at the time. Scroll through r/DataHoarder and think again. ↩
4. My parents' house is powered by solar panels, making this a much cheaper and more manageable option for my poor student situation. ↩

0 views
ava's blog 2 days ago

the public

Don't you hate it when you go out in public, and the public is there? Jokes aside, my relationship with the public is difficult. I think most interactions are actually neutral; just passing each other, sitting next to each other, exchanging glances, paying for things. Some are good, and they are so rare that it restores a lot of faith in me. I love that the barista at the coffee shop is always so heart warming and genuinely happy and kind; I have brought him a little chocolate Santa before to thank him. The negative experiences unfortunately stick with me longer, and are the first thing I think of. Vomit, dog poop and litter on the sidewalk, loud music in public transport, smoking and spitting everywhere, getting honked at while walking down the street, people under the influence or in a mental health episode harassing others, public spaces filled with either intense perfume/deodorant or piss and sweat smell... just to name a few. I'm very sensitive to smell and sound, and it often feels like my skin is peeling off and my head will explode when I am exposed to these. My home is my retreat, my silent refuge. I go there to recharge. Basically all of my hobbies can be done independently inside by myself. Aside from work, I don't really go out that often because I don't feel welcome or comfortable outside a lot of times. The above negative experiences, together with urban car-centric design, overfilled cafes or restaurants, and infection risks just don't make it that enticing for me. The exceptions are going out in the dark when the streets are empty, or on long walks in the forest. I need my solitude and quiet, and the few people I see in the forest usually have the common decency not to act like teenagers in a small park area do. When I want to do outdoor stuff with my wife or friends, of course I have to step outside. The museums lately were wonderful, for example. I enter the public, but mainly because my focus is spending time with them. When I dress in bright colors, put on one of my colorful wigs, adorn my hair with stuff and put rhinestones on my face, I mainly do that for me and them; any onlooker is welcome to enjoy it too, of course. Maybe it makes someone feel happy or brave to see that. Still, there is this expectation by many that once you put yourself out there, you consent to what happens to you, and that you perform for others... and that can be disappointing and make you question whether you wanna commit to this at all. Like you should have anticipated rude comments if you dress like that, for example (hasn't happened in quite a while, but still!). I find my relationship to the internet similarly complicated, if not even more so. After all, the internet is where the very same public is that I otherwise tend to have issues with. I have to go outside for necessities, work and enjoyment; but do I have to expose myself on and to the online? Why do I do it? Walking outside, I have very rarely wondered what that person on the opposite side of the street thinks about a topic, or their opinion on how I am dressed; yet at home, in my refuge from the public, I open the internet, and invite the public into my safe space via me seeing their stuff. I see their thoughts, despite being at home. I see things and it's like seeing dog poop not picked up on the sidewalk. I put things online about myself, and therefore invite the public to consume it, to comment on it. It feels weird to acknowledge that. The same thing from above applies here: If you make it public, anything goes. 
If you didn't want that , you shouldn't have put that online. Makes sense, depending on what it is. An online presence feels so at odds with being a private person in some ways, or being picky about people, and being intentionally harder to access in real life. It can even feel like a narcissistic shrine to oneself at times, or a hardening cast around you that makes it more difficult to change it and let it grow with you as time goes on. I deal with that right now. Online, you can't really retreat; either you're there or you're not, obscurity by using smaller platforms doesn't help much. It also feels weird because in a way, you are expected to put on a performance for an online crowd once you are there. In the offline public, I simply exist in the space to go where I need to go, or to enjoy a meal or the time at the lake. In the online public, I am content to be consumed. We are invited to criticize people like product reviews, or as if they are annoying ads shoved down our throats (and I guess influencers are that). The reactions to people changing up their online presence seem less like they're about a person and more like anger when the formula of a product you like got changed. If someone comes up to you on the street saying you'll never find a man in that getup, they're rightfully seen as a weirdo, but online, it's discourse and engagement is farmed. Recently, I've been wondering why I put in the effort of putting my stuff online to the very same public I don't particularly care about, or sometimes even dislike, on the street 1 . In the offline world, I don't really give them anything, but online, I give them so much. My art, my thoughts, my research, my help. Is it worth it, is it hypocritical? Is it believable when I say I do this for me, my wife, my friends, and some drive-by eyeballs? I could just keep it to myself, keep it all in the journal, start a password-protected blog elsewhere. I don't have any good answers to this; for now, it seems I have to walk around as a contradiction. In real life, I cannot make myself selectively visible to just a few people (I wish I could!); online, I could find a way, but I don't. That's odd. Maybe there is pride in my work and what I do, an urge to be seen by others who understand me, something to prove I was there too, a way to show people alternative ways of being online, or spreading more awareness about specific rights or health issues. Still, it's curious that I would do this online, but not offline - I would not walk up to a random person and say something, or walk around with a banner, or stand at a town square with a megaphone. But do I have to? Or is online simply the best way for me to find a way to interact with, and be in, the public? It's easy to see the internet as a self-obsessed thing, filled with navelgazing; people might read personal blogs online and go "Why should we listen to you? Who even are you? Who cares, who asked? Why do you think anyone needs to hear this from you? Isn't this just digital garbage? This isn't even an original thought." I understand how this view is fostered in a time when anyone can throw their opinions online in seconds; but in a way, this is unprecedented, and previous generations in history would have appreciated the ability to be so easily heard/seen and making their feelings known to so many people without relying on flyers or a newspaper. So maybe this is a privilege we should not take for granted, especially as tensions and censorship across the globe rise. 
And as always, you have to let in nasty stuff if you also wanna let in love. Close yourself off, and you receive neither. I have to walk past dog poop and sit in sometimes excruciating trams for 45 minutes to reach the nice barista or have a good in-person talk with a coworker. I have received some truly shitty emails over the years 2 , but the good ones outweigh them. I wade through the Discover feed to see some beautiful gems. What makes the online public so difficult is that once it's out there, it's out; even when you change your mind or grow. While we want our online presence to be a continuous process readjusting boundaries, it's more like committing to the most vulnerable piece that is still online, over and over again. In contrast: Before I step into the offline public space, I can readjust how I want to appear every time. The stranger on the street doesn't see all the history attached, doesn't see all the past versions of me that have stepped outside. And here I am, once again, stepping out into the public. Reply via email Published 09 Apr, 2026 Of course I care in the sense that everyone should have a home, money, healthcare, a support network, access to education, fulfilling work etc etc., but that's not what I'm talking about here. ↩ No, socially anxious person reading this and thinking this could be about you, it wasn't you. It could have never been you. The people I am talking about don't care about how they come across and haven't spent a second self-reflecting. You're good. ↩

0 views

What does it mean to create with AI?

For some weird reason, I always had some kind of slight “mental hesitation” with the meaning of data encoding versus decoding. Which one goes in what direction? To be honest, I have the same kind of weirdness with other concepts: daylight saving time, for instance (are we gaining or losing an hour? I can never tell, sometimes even for many days after a change). So I wanted to create a diagram to illustrate the dichotomy between encoding and decoding, for a course I’m creating on software engineering. So one way to “create with AI” would be to ask one: “Can you please create a diagram to illustrate the difference between data encoding and decoding?”
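To keep the direction straight in code terms: encoding goes from the human-facing, in-memory text to bytes suitable for storage or transmission, and decoding goes from bytes back to text. A tiny Python illustration:

```python
# Encoding: from the human-facing representation (str) to bytes for storage or transmission.
data = "café".encode("utf-8")      # b'caf\xc3\xa9'

# Decoding: from bytes back to the in-memory text.
text = data.decode("utf-8")        # 'café'

assert text == "café"
```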

0 views

SQLAlchemy 2 In Practice - Chapter 4 - Many-To-Many Relationships

This is the fourth chapter of my SQLAlchemy 2 in Practice book. If you'd like to support my work, I encourage you to buy this book, either directly from my store or on Amazon. Thank you! Continuing with the topic of relationships, this chapter is dedicated to the many-to-many type, which, as its name implies, is used when it is not possible to identify either of the sides as a "one" side.
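To give a feel for what that looks like in SQLAlchemy 2.0 terms, here is a minimal sketch of a many-to-many mapping -- hypothetical Student and Course models linked through a plain association table, not the example used in the chapter itself:

```python
from sqlalchemy import Column, ForeignKey, Table, create_engine
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship, Session


class Base(DeclarativeBase):
    pass


# Association table linking the two sides; it has no mapped class of its own.
student_course = Table(
    "student_course",
    Base.metadata,
    Column("student_id", ForeignKey("students.id"), primary_key=True),
    Column("course_id", ForeignKey("courses.id"), primary_key=True),
)


class Student(Base):
    __tablename__ = "students"
    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str]
    courses: Mapped[list["Course"]] = relationship(
        secondary=student_course, back_populates="students")


class Course(Base):
    __tablename__ = "courses"
    id: Mapped[int] = mapped_column(primary_key=True)
    title: Mapped[str]
    students: Mapped[list[Student]] = relationship(
        secondary=student_course, back_populates="courses")


engine = create_engine("sqlite://")        # in-memory database for the sketch
Base.metadata.create_all(engine)

with Session(engine) as session:
    # Appending to either side's collection populates the association table.
    alice = Student(name="Alice", courses=[Course(title="Databases")])
    session.add(alice)
    session.commit()
```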

0 views