マリウス 3 days ago

80Retros x HMX Monochrome

After spending a fair amount of time with the KTT x 80Retros GAME 1989 Orange, I figured it was about time to take a closer look at the HMX side of the 80Retros catalogue. The 80Retros x HMX Monochrome have been with me for a while, ever since I picked them up back in Seoul. The switches stand out from the rest of the 80Retros lineup as they don’t ship in a film canister, and they have a fairly boring black and white colorway. The 80Retros x HMX collaboration comprises a handful of linear switches, amongst others the KD200 (a Kodak-yellow homage), the FJ400 (a Fujifilm-green homage), the GAME 1989 Classic (a Game Boy DMG-grey homage with pink stems), the Joker (a green/white/purple character homage), and the Monochrome, which arrived as one of the later releases. While most other 80Retros switches ship in oversized film-canister packaging, which is probably half the reason people bought into the lineup in the first place, the Monochrome break that pattern, as they come in a plain sealed pack. 80Retros have framed this as a practical decision, since a sealed bag preserves the factory lube better than a (non-airtight) film canister. The Monochrome have a white top housing, a black stem, and a black bottom housing. There’s no nostalgia, just a clean, modern, industrial look. It’s probably one of the few switches in the lineup that would feel at home on a build that’s trying to look new rather than old. The interesting thing here is that the Monochrome seem to be materially identical to the KD200, at least from the information I was able to dig up on them. They appear to use the same PA12 top housing, the same LY stem, the same 13.55mm stem length, and the same HMX P2 bottom housing. The only spec that differs on paper is the spring: 42g on the Monochrome versus 45g on the KD200. The Monochrome seem to basically be a KD200 in different clothes with a lighter spring.
Therefore, most of the KD200-flavoured tendencies seem to show up here too. The first thing you notice is just how light they are. 42g is on the gentle end of the linear spectrum these days, and even coming from the GAME 1989 Orange at 40g actuation, the Monochrome feels softer, probably because the PA12 top, HMX P2 bottom, and LY stem combo doesn’t have the same dry, gritty character the KT2 stem gives the Orange. There’s no audible texture in the travel here; it’s just smooth from top to bottom. Stock smoothness is very good. HMX’s factory lube is well applied, with visible coverage on the stem sides and along the spring contact points. Slow-pressing a single switch at ear level reveals nothing worth complaining about: there’s no scratch, no spring ping, and no leaf chatter. This means you can just install them and stop thinking about them, which, for a stock switch, is probably what most people want. Wobble seems to be in line with the rest of HMX’s newer-mold output. There’s a touch of north-south play and a touch of east-west, neither of which is distracting in normal typing. The Monochrome has a sound profile that’s noticeably soft, light, and, for lack of a better word, swooshier. The Korean reviewer who photographed teardowns of the whole 80Retros x HMX lineup described it as a “wave-like” sound. There’s still a clean tonk on the bottom-out, but it sits lower in the mix, and the upper harmonics that make for a louder pop are largely absent. Volume-wise, the Monochrome is on the quieter side. Not silent, not Volume 0-quiet, but noticeably more restrained than, e.g., the GAME 1989 Orange. On softer builds (gasket-mount, Poron-foamed, that sort of thing), it leans firmly into muted thock territory. On more rigid aluminium builds I’d expect it to open up slightly, but my own testing has been on softer cases, so take that with a grain of salt.
In short, where the Orange has audible character, the Monochrome is doing something quieter and a little more uniform. If you enjoy the Orange’s pop, you’ll probably be slightly disappointed with the Monochrome. As for the factory lubing, it is competently done. I peeked into a few switches, and the application is consistent enough that I didn’t feel any particular urge to retune them. If you’re someone who lubes everything regardless, be sparing here, as otherwise you’ll smother what little articulation the switch already has. The switches accept films, like everything else in the lineup, and films do their usual job of tightening housing tolerances and compressing the sound profile slightly. Given how restrained the Monochrome already sounds, I’d hesitate to film them unless the build absolutely needs it; you’d mostly be removing what little air is left in the sound. The 80Retros x HMX Monochrome are soft, gently weighted linears with very few rough edges, and they are relatively quiet. Whether that’s the switch you want depends entirely on what you’re trying to build. If you want acoustic complexity, the GAME 1989 Orange is definitely more interesting. If, however, you want a low-effort, low-noise linear that disappears into the build, the Monochrome fit that role pretty well. I wouldn’t call it an exciting switch, but I would call it a sort of grown-up switch. Disclaimer: I’m not a switch scientist. I don’t own a force curve rig, I can’t tell you the exact durometer of the KT2 blend, and my ears are probably not calibrated to the standards of someone like ThereminGoat. This review is based on my personal experience typing on these switches across a few different boards and ultimately using them on my primary keyboard. Your mileage may vary based on your plate material, case, keycaps, and other factors. Take everything here as one person’s experience and use it as a starting point for your own.

ava's blog 1 week ago

computers, privacy and data protection conference 2026

I attended the Computers, Privacy and Data Protection Conference (CPDP) in Brussels for the first time. The conference has lots of different rooms, mostly in the same building, where multiple panels, workshops and other things happen at the same time in specific slots, so you gotta choose what you participate in (which was difficult at times!). Besides that, there are some fun rooms, some quiet working spaces and spaces to just hang out and talk. Based on the programme, the focus this year was definitely on age verification/youth 'protection', human-AI relationships, consumer rights and marginalized groups. Lots of different groups and people were present: people from the EU Commission and Parliament, AlgorithmWatch, Bits of Freedom, noyb and Max Schrems, IGLYO, EDRi, Equilabs, Equinox Initiative for Racial Justice, INTITEC, the EDPS and Wojciech Wiewiórowski, Privacy International, the International Committee of the Red Cross, the Office of the United Nations High Commissioner for Human Rights, the European Consumer Organization (BEUC), Future of Privacy Forum, AIRegulation.com, data protection authorities of different countries (CNIL, BFDI, etc.), ALTI, European Disability Forum, d.pia.lab, AI Now Institute, OECD, the IAPP, and all kinds of universities, plus companies like Mozilla, Mastodon, Signal, Wikimedia, Microslop, Uber, TikTok, Google and more. I was there for the opening remarks, then went on to visit several sessions. My takeaways/new things learned: Microsoft co-wrote parts of the EU's Energy Efficiency Directive, which allows data centers to keep their energy use confidential under the guise of business secrecy. The draft literally had paragraphs of Microsoft's proposal copied in unchanged. The Dutch government used racial/ethnic profiling via algorithms in the assessment of childcare benefit applications, which led to false allegations of fraud against thousands of families, particularly affecting those from ethnic minorities.
I had heard about this before, but learned more about it that day. To contest it all and defend democracy, we all need to train our AI literacy skills, support good tech journalism that questions and exposes it all (404media is, imo, a good example of what they meant), craft and change the social media narrative around AI and Big Tech, listen to affected people, demand transparency via standards and audits, etc. We cannot forget that officials know; many of the effects we criticize are not accidents or side effects, they are the entire point. When tech predominantly harms marginalized communities, for example, this is a bonus to people in power, not something to be fixed. Workers can resist by reminding their leaders of the liabilities and legal risks, strategic issues, money issues etc. that AI brings, and by demanding a specific definition of the needs that AI will fulfill in the workplace, instead of letting AI become the purpose rather than the tool. Age verification is racist and migrantphobic: many people have issues with their ID, have none, or are undocumented, and age verification in their country requires them to have contact with officials, police, etc. Age verification is transphobic: relying on ID means many trans people are forced to reveal their deadname or to come out, as the ID reveals they are trans if it has not been or cannot be updated. The platforms are harmful, but we have many ways and ideas to counter that which don't take away important spaces and support groups or bar entire groups of people. Age verification lets platforms avoid working on their problems and improving, helps them avoid legislation and regulation, and enables control and surveillance by them; meanwhile, the truth is that you don't suddenly turn 16-18 and know how to handle porn, gore, harassment and all the other negative parts of social media.
The negative sides of social media that are named as the reason for age verification and for banning social media for specific age groups also affect adults negatively. We need to put more effort into education on how to handle these things. Yes, we can protect children's privacy by banning them from platforms, but this also affects their other (digital and offline) rights, and privacy rights don't trump all. Children and teens should learn and be encouraged to control their own spaces and moderation via FOSS: Matrix, Mastodon, etc., where they can also keep to themselves, away from adults, and aren't reliant on Big Tech. Age verification and bans would take this away from them and also make things harder for FOSS projects. If children only ever enter the political discourse as victims, the only response can be rescue; that is why we have to make sure they enter as participants. Protection is not (just) space away from the risk, but confronting the systems that cause harm and eliminating them. 16-18% of US citizens report having engaged romantically with a bot; 45% of them said it made them feel more understood, and 36% said it gave them stronger emotional support than their human partner. Problem: the current version of the AI Act doesn't cover romantic and sexual use, and there is no guidance on safeguards for emotionally responsive AI systems that protect against the risk of suicide, crimes, distress when the service slows down or shuts down or the model changes, discrimination where you get more if you pay, etc.; drafts now mention some of it in Art. 50. With all the talk about becoming emotionally dependent on AI, nudging into harmful behaviors, etc., we cannot forget that you are also vulnerable on other services and in human romantic relationships, where the same routinely happens (weak argument, but to be fair, I also often forget this).
We also cannot forget that it is not always a replacement: it often just supplements social life, and there are also surprisingly many people who just don't want or need romantic or sexual relations with a human; they want bots specifically, and only bots. Disclosure agreements (meaning: labels everywhere that this is just a bot and not real) are most often useless, because people know and intentionally seek it out (exception: Insta/Snap DMs etc.). The latter session about human-AI intimacy was extra interesting because it had someone on the panel who works directly with people who use bots for romance and sex, and her experience has been mostly positive: it helps her clients. Afterwards, I was sadly too overwhelmed, exhausted and in pain to continue, and went back to the apartment to rest. Unfortunately, all the stress around the apartment and the generally more exhausting day triggered my digestive tract badly (Crohn's disease), and on top of that, within the first few hours, all toilets in the venue were out of service due to an issue beyond the venue's or the organizers' control, and the alternative toilets were much further away. I didn't wanna have to deal with that with upset intestines. I missed the 'Designing Fairness' workshop and the 'Consumer Rights at the age of acceleration' panel. Didn't meet anyone that day. Look at this ridiculous Gemini Photobooth they had, which I saw no one use in the entire three days. On the second day, I managed to attend everything on my list, thankfully, as I felt a bit better. My takeaways/new things learned: The digital omnibus is mostly there to enable AI made in Europe, to aid sovereignty and be competitive with the US and China; AI here needs a framework to access data without much regulatory risk. That is what the person from the EU Commission said.
Enforcing the law and making it sharper actually levels the playing field and furthers innovation, because there is a massive concentration of power in a handful of companies that can do what they want, barely pay fines, have the fines suspended because of the US government bargaining with the EU, or see them as a cost of doing business. Competition is impacted this way, as small companies are hit harder than the big ones. If the omnibus goes through with changing definitions of personal data etc., it will take years for case law, literature, standards etc. to catch up, and it wastes money for companies that need to redo everything to comply; so it doesn't simplify anything and makes practice harder. You may set ChatGPT/Claude/Gemini etc. not to send feedback or training data in your settings, but when you give a thumbs up/down in response to their question of whether the output was good, or choose between two different versions, the entire chat log up to that point gets sent for training and potential human review. So these feedback popups override your settings. I need to read more papers by Theodore Christakis. Here is one of them. US and UK discovery and disclosure laws/principles go directly against EU data minimization principles; as long as data is relevant to a case, it should be accessible, which is why in their cases, they can just access millions of people's data if necessary, and in a divorce case, they have the right to ask for AI chat logs. There is no AI protection or privilege: if you use AI for legal stuff, you have no expectation of confidentiality like you would with a lawyer, so it is not safe from discovery. There is tension between tracking for harmful behavior/threats and data privacy rights; what if someone threatens to kill themselves, kill others, etc.? Should the company look for it, track it, report it, alert anyone, suspend the account, send help resources? Still unclear.
There is also tension between people wanting the bonus features/ease of use that come from personalization and free services, while also not wanting to be tracked or charged. Advertisers see themselves as enablers of a good thing, as people want fitting ads, good algorithms, good suggestions, and free access; so in their view, if their business model is challenged or fails, people will get worse access and worse user experiences. They also fear that if their business model is hindered, things will move in a more extreme, embedded, hard-to-avoid direction that you don't control or decide (Black Mirror ad type of stuff). I previously wrote about Consenter on the blog, and one panel had people from it there, showing screenshots; it changed my mind on it a lot and helped me understand the new features and goal better, so I will probably write an update on it some time. There are various other options, old and new, each covering something different about tracking, cookies, consent, or going about things differently: ADPC, GPC, ConStand, Global Privacy Control, DoNotTrack etc.; what's important for new approaches is granular consent that is sent to the website, explanations given to the user, etc. Uninformed decisions and bad practices lead to unfair competition; bad actors erode the level of trust overall, so users resign themselves, experience fatigue, and say yes at the same rates to "good" and "bad" services. Will read soon: Our data after us by the CNIL, and a future release: Model rules on succession and access to digital remains by Eigenmann and Harbinja. Digital remains can be split into assets (copyright, crypto, business tools, money), personal (messages, photos, identities, AI replicas), and third-party data. GDPR only addresses living people; dead people's digital remains are subject to member state laws. There might be a need for something harmonized and European, though.
For good digital hygiene, we should remember death and make it as easy and sensible as possible for the people we leave behind to get the access they need to manage our things the way we want them to. Leave instructions, set up emergency/legacy access where available (Google, Facebook, Instagram and Apple have it), include digital assets in your will, and decide how your data may be used after your death, especially around AI replicas. Hospices, nurses, families etc. should learn to ask affected parties about these things. Thanks to the focus on agentic AI, there is a massive need for inference compute, which is super expensive. Almost all of it is in the control of, or can only be afforded by, the hyperscalers. At the same time, anything that seeks to enable or disable things for AI agents on the web can also affect accessibility programs like screen readers. It is in the best interest of the Big Tech companies to keep things individual, because it distracts from the collective issues and the changes they'd have to make; it is easier to blame the person for agreeing to tracking than to make sweeping changes to how much can be tracked. Individual consent doesn't consider the fact that data doesn't just affect you, but reveals things about your family, friends, partners, coworkers and more, as data is deeply interconnected. If your friend agrees to share his data and it also includes you, that is your data, still going to a service you would have refused. We as users have no collective bargaining tools yet; even big worker unions aren't negotiating with Microsoft about the terms of their employers' use of Microsoft Teams, when they actually should. We should also build up data unions made of users who bargain with the platforms. Strikes could look like boycotting the service, blocking trackers, scrambling data, massive numbers of access requests etc. Look into something called a Worker Data Trust; this was used to prove Uber's predatory dynamic pricing (Worker Info Exchange).
Lots of workers made access requests, and the data was combined and analyzed by researchers. After a failed attempt to meet up during lunch, I managed to meet up with another Country Reporter from noyb for a little while until the next panel, and sadly we didn't go to the same one. At this point, I was miffed about lunch at the conference. They made a big deal at registration about how the event would be mostly vegan and vegetarian to offset the climate impact of everyone traveling there, and they asked you to select your preference. I chose vegan. But for the entire three days, the food wasn't clearly labeled, some food was mislabeled as vegan when it wasn't, and there was way too little of it, and it wasn't restocked. It was more like "vegetarian snacks for birds". Vegan people had no warm food option at all, just sandwiches or wraps all three days that would have been enough for maybe 10 people. I mostly starved, and I accidentally ate real cheese one time too because the food situation was so confusing. Here is one of the buffet menu cards, which were placed a bit to the side, away from the food, partially hidden by other stuff, and incorrect (anything with lactose is not vegan). I have no idea how, in a sea of silver platters with lots of bread, I am supposed to tell apart the vegan gluten-free bread option and the vegetarian gluten-free bread that has scamorza (Italian cheese). It was a roundtable buffet, so everyone was waiting for you to hurry while grabbing stuff; I can't just grab bread, lift off the top to see the ingredients and then put it back, man. At least group the vegan stuff together or put labels directly in front of each thing. Also, while I am not reliant on gluten-free food, I doubt people who are sensitive to gluten or have celiac disease appreciate that either.
I skipped the cocktail parties and the big CPDP party, because it's not really fun when you don't drink alcohol, have trouble just going up to people with your mask and hoping they hear you, and have no one to meet or go with. The last day's programme was rather empty, so I arrived later and left earlier. My takeaways/new things learned: The AI warfare session was a bit of a letdown, because everyone just accepted war as a right, an inevitable thing that has to happen. There was not even a hint of fighting war itself, or banning AI weapons, etc.; it focused more on the dual nature of the data, in which, through surveillance, tracking, etc., not only can the military use it to target people, but NGOs and others can use it to warn, evacuate, render humanitarian aid etc. and document realities on the battlefield. There was also no room for the idea that we could enter an age where drones fight drones automatically and no one needs to get hurt or be traumatized or get to kill people like a game, and that is only because everyone is so attached to the idea that war has to have human casualties. It's hard to legislate and restrict because the data is taken from a whole ecosystem: telecommunications, cloud services, civilian infrastructure, social media etc., and most of the data is collected during times of peace. Warfare is often justified by national security, which in turn is a legitimate interest or fulfills other opening clauses in data protection and privacy laws. It is a problem that the richest men in the world, close to the US administration, lead the biggest companies worldwide, almost all in the US, and control almost all of AI and AI warfare. Project Maven from 2017 was continuously developed and is now the Maven Smart System, which was used in Venezuela and Iran recently. Our Art. 15 GDPR right of access, as it is right now, makes up for Germany's and Austria's lack of discovery and disclosure rights, respectively.
Controllers can usually drag things out and cite trade secrets and the rights of others to evade data access, while the data subject barely has any power. Not having to justify the access request, and it not having to be limited to data protection purposes, is good in this regard and needs to be kept. Otherwise, if any request made for a court case instead of for privacy rights were deemed possibly abusive, there would be too much confusion and too many court cases over whether a request was abusive or not. We don't only need to focus on reidentification in general, but also on the ability to single out people's data; you might not be able to identify them, but you can build a profile anyway. I learned about the term digital twin, or, in terms of user data, a data twin: one that is similar enough to be used for simulation. AI-act-standards.com exists. Many don't know that the AI Act isn't a GDPR for AI, but serves more as a market classification, as it sorts AI into different boxes that have to fulfill different requirements. The details of these requirements are/will be set by CEN/ISO standards and frameworks. You can see the progress of development on these standards on that website, as well as what they cover and how they interact. Hovering over the elements gives additional info. This work is done by the JTC21, and you can also get involved by registering with your national standardization body (in Germany, this is DIN) or when they hold public consultations. Disabled people experience both extremes of AI: it offers better accessibility options, but because they are often more reliant on AI, they are also more subject to surveillance and to having their privacy rights violated, while bad governments can use the data to harm disabled people, all under the guise of research. Marginalized groups are often the first trial group for anything, while not being stakeholders in the tech, or even invited to the table. See: AI used in immigration etc.
And with deregulation and AI everywhere, we see a loss of reasonable suspicion thresholds in law enforcement and other groups. I learned about adversarial auditing. The previous two days, I did the whole fancy dress pants and blazer thing (one black blazer, one dark red/purple blazer), but for the last day and the drive home, I wore my Bearblog shirt and wide orange jeans. Someone from the noyb staff thankfully recognized me and approached me, so we talked for a bit until he had to leave for another lunch meeting. That concludes the human contact I had. And then I left to drive home with my wife. She will hopefully soon write a guest post on my blog about how she navigates a new city in another country without mobile data/a smartphone (she has a tablet with WiFi only), because while I was at the conference, she explored the city on her own. It's kind of difficult to show up to these conferences as someone who isn't sent there for work, who doesn't have coworkers or ex-coworkers also attending, and who has few or no industry contacts yet. Most people there know each other from work or previous/other conferences, and I don't. These events are primarily for networking, keeping in touch, and talking about what you have seen and learned, though. I couldn't discuss anything with anybody present, and it made me feel really lonely and silly. Just going up to people and striking up a conversation is not my strong suit; it's something I am working on, and it has already gotten better, but the mask I usually wear in big crowds and gatherings because I am on immunosuppressive medication is actively keeping me isolated. I know people have trouble understanding me, can't see me smiling at them, and think I am sick, so that keeps both sides hesitant. Unfortunately, if I attend next year, I will have to leave the mask off and maybe try out those protective nose and throat sprays that are supposed to reduce viral load.
It seems like you can only 'afford' to wear a mask if you are already in a group of people. Weeks before the event, I asked some people if they would attend; they said they would, and we had a group chat of 10 to coordinate meetups. But during the entire conference, I was the only one trying to make something happen: saying where I was/would be, identifiers you could spot me by (as we had never met before and you can't see name tags well on the lanyard), meeting points etc., and the two people mentioned were the only ones who took me up on it. The others just ghosted me/ignored my messages. That saddened me a lot during the conference. And unfortunately, these types of events are always exhausting for me beyond the normal amount everyone experiences, because of things that trigger my conditions, my lower energy, my need to lie down sometimes, sensory issues, food restrictions etc., so I really have to weigh whether it's worth it to me. I'm not sure it is, without the social aspect. Many of the panels I chose were not well organized. Instead of short speaker times, precise audience questions, interaction, dialogue, disagreements, different sides, answering the panel's topic and offering solutions etc., it often resulted in every speaker giving a 10-minute monologue saying their piece, the other speakers not reacting or intervening because it's too much, everyone more or less saying the same thing or zoning out, and then too little time to really give much attention to audience questions. Some gathered audience questions to answer them in batches, and predictably, that resulted in nuance being lost and almost nothing being answered precisely. From many panels, I walked away having learned less than I wanted to, just reaffirmed in what everyone knew already. There were almost no further or new resources, or real takeaways about what the next steps should be and how we can tackle or solve an issue.
They say " there should be more transparency " but not how we ask for it, how we legislate it, how it should happen. It's often just a vague " Someone should do more of something, and fast. " It was easy for people from the EU Commission to dodge mine and others' questions about the omnibus bullshit with no convincing answer. (: It disillusioned me a bit about my own goal to be speaking at a panel one day, because so often it felt like it was just there to platform someone to give them a chance to ramble and that's it, or just so that they can put this on their CV. Looking into the panelists, so many of them are genuinely great, very accomplished and admirable people with a lot of expertise, but the way things were set up, it couldn't shine through. You would have been better off talking to them directly. As a final bonus for reading this far, help me delete this (fortune) cookie. Reply via email Published 23 May, 2026 Contesting AI & Defending Democracy ; Possibilities for European AI Futures ( x ) Youth protection through inclusion and empowerment : a rebuttal of the exclusion-based narrative ( x ) Intimacy by Design: Governing Human AI relationships ( x ) Microsoft co-wrote parts of the EU's Energy Efficiency Directive , which allows data centers to keep their energy use confidential under the guise of business secrecy. The draft literally had paragraph's of Microsoft's proposal copied in unchanged. The Dutch government used racial/ethnic profiling via algorithms in the assessment of childcare benefit applications, which led to false allegations of fraud against thousands of families, particularly affecting those from ethnic minorities. I heard about this before, but learned more about it that day. 
To contest it all and defend democracy, we all need to train our AI literacy skills , support and have good tech journalism that questions and exposes it all (404media is, imo, a good example of what they meant), crafting and changing the social media narrative around AI and Big Tech, listening to affected people, demanding transparency via standards and audits etc. We cannot forget that officials know ; many of the effects we criticize are not accidents or side effects, they are the entire point. Like when tech predominately negatively targets marginalized communities, this is a bonus to people in power, and nothing to be fixed. Workers can resist by reminding their leaders of the liabilities and legal risks, strategic issues, money issues etc. that AI brings; demand specific definition of the needs that AI will fulfill at the workplace, instead of letting AI become the purpose instead of the tool. Age verification is racist and migrantphobic : Many people have issues with their ID, or have none, or are undocumented, and age verification in their country requires them to have contact with officials, police, etc. Age verification is transphobic : Relying on ID means many trans people are forced to reveal their deadname or are forced to come out, as it reveals they are trans if the ID is not or cannot be updated. The platforms are harmful, but we have so many ways and ideas against that that doesn't take away important spaces and support groups or bar entire groups of people. Age verification makes it possible for platforms to avoid working on their problems and becoming better, enables avoiding legislation and regulation, and enables control and surveillance by them; meanwhile, the truth is that you don't suddenly turn 16-18 and know how to handle porn, gore, harassment and all other negative parts of social media. 
The negative sides of social media that are named as the reason for age verification and for banning specific age groups from social media also affect adults negatively. We need to put more effort into education on how to handle these things. Yes, we can protect children's privacy by banning them from platforms, but this also affects their other (digital and offline) rights, and privacy rights don't trump all. Children and teens should learn and be encouraged to control their own spaces and moderation via FOSS (Matrix, Mastodon, etc.), where they can also seclude themselves from adults and aren't reliant on Big Tech. Age verification and banning would take this away from them and also make things harder for FOSS projects. If children only ever enter the political discourse as victims, the only response can be rescue; that is why we have to make sure they enter as participants. Protection is not (just) space away from the risk, but confronting the systems that cause harm and eliminating them. 16-18% of US citizens report having engaged romantically with a bot; 45% of them said it made them feel more understood, and 36% said it gave them stronger emotional support than their human partner. Problem: the current version of the AI Act doesn't cover romantic and sexual use, and there is no guidance on safeguards for emotionally responsive AI systems that protect against the risk of suicide, crimes, distress when the service slows down or shuts down or the model changes, discrimination (you get more if you pay), etc.; drafts now mention some of it in Art. 50. With all the talk around becoming emotionally dependent on AI, being nudged into harmful behaviors, etc., we cannot forget that you are also vulnerable on other services and in human romantic relationships, where the same routinely happens (weak argument, but to be fair, I also often forget this).
We also cannot forget that it is not always a replacement; it often just supplements social life, and there are also surprisingly many people who just don't want or need romantic or sexual relations with a human; they want bots specifically, and only bots. Disclosure agreements (meaning: labels everywhere saying this is just a bot and not real) are most often useless, because people know and intentionally seek it out (with the exception of Insta/Snap DMs etc.).

Simplification for Whom? Unpacking the Consumer Impact of the Digital Omnibus (x)
My Chatbot, My Confidant: Protecting User Privacy in Generative AI Conversations (x)
Informed consent: The breakthrough in Art. 88b GDPR / Digital Omnibus and current initiatives in the field of PIMS and technical standardisation (x)
Digital Legacy Beyond GDPR: Succession, Data Protection, Access Rights, and Platform Power (x)
The Agentic Assistant: What does Big Tech’s goal of creating a universal digital intermediary mean for society? (x)
Designing Collective Technology Governance (x)

The digital omnibus is mostly there to enable AI made in Europe, to aid sovereignty and to be competitive with the US and China; AI here needs a framework to access data without much regulatory risk. That is what the EU Commission person said. Enforcing the law and making it sharper actually levels the playing field and furthers innovation, because there is a massive power concentration in a handful of companies that can do what they want, barely pay fines, have the fines suspended because of the US government bargaining with the EU, or see them as a cost of doing business. Competition is impacted this way, as small companies are hit harder than the big ones. If the omnibus goes through with changing definitions of personal data etc., it will take years for case law, literature, standards etc. to catch up, and it wastes money for companies that need to redo everything to comply; so it doesn't simplify anything and makes practice harder.
You may set ChatGPT/Claude/Gemini etc. not to send feedback or training data in your settings, but when you react with a thumbs up/down to their question of whether the output was good, or choose between two different versions, the entire chat log up to that point gets sent for training and potential human review. So these feedback pop-ups override your settings. I need to read more papers by Theodore Christakis. Here is one of them. US and UK discovery and disclosure laws/principles go directly against EU data minimization principles; as long as data is relevant to a case, it should be accessible, which is why in their cases they can just access millions of people's data if necessary, and in a divorce case, they have the right to ask for AI chat logs. There is no AI protection or privilege: if you use AI for legal stuff, you have no expectation of confidentiality like you would with a lawyer, so it is not safe from discovery. There is tension between tracking for harmful behavior/threats and data privacy rights; what if someone threatens to kill themselves, kill others, etc.? Should the company look for it, track it, report it, alert anyone, suspend the account, send help resources? Still unclear. There is also tension between people wanting the bonus features and ease of use that come from personalization and free services, while also not wanting to be tracked or charged. Advertisers see themselves as enablers of a good thing, as people want fitting ads, good algorithms, good suggestions, and free access; so in their view, if their business model is challenged or fails, people will have worse access and worse user experiences. They also fear that if their business model is hindered, things will move in a more extreme, embedded, hard-to-avoid direction that you don't control or decide (Black Mirror ad type of stuff).
I previously wrote about Consenter on the blog, and one panel had people from it there showing screenshots; it changed my mind on it a lot and made me understand the new features and goal better. I will probably write an update on it some time. There are various other options, old and new, each covering something different about tracking, cookies, consent, or going about things differently: ADPC, GPC (Global Privacy Control), ConStand, Do Not Track, etc. What matters for the new ones is granular consent that is sent to the website, user-given explanations, etc. Uninformed decisions and bad practices lead to unfair competition; bad actors erode the overall level of trust, so users resign themselves, experience fatigue, and say yes at the same rates to "good" and "bad" services. Will read soon: Our data after us by the CNIL, and, once it's released, Model rules on succession and access to digital remains by Eigenmann and Harbinja. Digital remains can be split into assets (copyright, crypto, business tools, money), personal (messages, photos, identities, AI replicas), and third-party data. GDPR only addresses living people; dead people's digital remains are subject to member state laws. There might be a need for something harmonized and European, though. For good digital hygiene, we should remember death and make it as easy as possible, or sensible, for the people we leave behind to get the access they need to manage our stuff the way we want them to. Leave instructions, set up emergency/legacy access where available (Google, Facebook, Instagram and Apple have it), include digital assets in your will, and decide how your data is allowed to be used after death, especially around AI replicas. Hospices, nurses, families etc. should learn to ask affected parties about these things. Thanks to the focus on agentic AI, there is a massive need for inference compute, which is super expensive. Almost all of it is controlled by, or can only be afforded by, the hyperscalers.
At the same time, anything that seeks to enable or disable things for AI agents on the web can also affect accessibility programs like screen readers. It is in the best interest of the Big Tech companies to keep things individual, because it distracts from the collective issues and the changes they'd have to make; it is easier to blame the person for agreeing to tracking than to make sweeping changes to how much can be tracked. Individual consent doesn't consider the fact that data doesn't just affect you but reveals things about your family, friends, partners, coworkers and more, as data is deeply interconnected. If your friend agrees to share his data and it also includes you, that is your data too, still going to a service you wouldn't have agreed to. We as users have no collective bargaining tools yet; even big worker unions aren't negotiating with Microsoft about the terms of their employer using Microsoft Teams, when they actually should. We should also build up data unions made up of users who bargain with the platforms. Strikes could look like boycotting the service, blocking trackers, scrambling data, mass access requests, etc. Look into something called a Worker Data Trust; this was used to prove Uber's predatory dynamic pricing (Worker Info Exchange). Lots of workers made access requests, and the data was combined and analyzed by researchers.

Data-driven warfare: AI, civilian risks, and corporate responsibility (x)
Digital Omnibus meets the Charter of Fundamental Rights (x)
Toward a Standard for Fair AI-driven Recruitment (x)
Data protection law as a shield, not a weapon: empowering historically marginalized communities in the EU in times of de-regulation (x) -> this choice was especially rough, because I was also very interested in 'The U.S. Deregulatory Effect' happening elsewhere at the same time

The AI warfare one was a bit of a letdown, because they all just accepted war as a right, an inevitable thing that has to happen.
There was not even a hint of fighting war itself, banning AI weapons, etc.; it focused more on the dual nature of the data: through surveillance, tracking, etc., not only can the military use it to target people, but NGOs and others can use it to warn, evacuate, render humanitarian aid, and document realities on the battlefield. There was also no room for the idea that we could enter an age where drones fight drones automatically and no one needs to get hurt, be traumatized, or get to kill people like in a game, and that is only because everyone is so attached to the idea that war has to have human casualties. It's hard to legislate and restrict because the data is taken from a whole ecosystem (telecommunications, cloud services, civilian infrastructure, social media, etc.), and most of it is collected in times of peace. Warfare is often justified with national security, which in turn counts as a legitimate interest or fulfills other opening clauses in data protection and privacy laws. It is a problem that the richest men in the world, close to the US administration, lead the biggest companies worldwide, almost all in the US, and control almost all of AI and AI warfare. Project Maven from 2017 has been continuously developed and is now the Maven Smart System, which was recently used in Venezuela and Iran. Our GDPR Art. 15 right of access, as it is right now, makes up for Germany's and Austria's lack of discovery and disclosure rights. Controllers can usually drag things out and cite trade secrets and the rights of others to evade data access, while the data subject barely has any power. Not having to justify an access request, and the request not having to be limited to data protection purposes, is good in this regard and needs to be kept. Otherwise, if any request made for a court case instead of for privacy rights is now deemed possibly abusive, there will be too much confusion and too many court cases about whether a request was abusive or not.
We shouldn't only focus on reidentification in general, but also on the ability to single people's data out; you might not be able to identify them, but you can build a profile anyway. Learned about the term digital twin, or in terms of user data, a data twin that is similar enough to be used for simulation. AI-act-standards.com exists. Many don't know that the AI Act isn't a GDPR for AI, but serves more as a market classification: it sorts AI into different boxes that have to fulfill different requirements. The details of these requirements are/will be set with CEN/ISO standards and frameworks. You can see the development progress of these standards on that website, as well as what they cover and how they interact. Hovering over the elements gives additional info. This is done by JTC21, and you can also get involved by registering with your national standardization body (in Germany, this is DIN) or when they do public consultations. Disabled people experience both extremes of AI: better accessibility options, but also, since they are often more reliant on AI, more surveillance and more violations of their privacy rights, while bad governments can use the data to harm disabled people, all under the guise of research. Marginalized groups are often the first trial group for anything, while not being stakeholders in the tech or even invited to the table. See: AI used in immigration, etc. With deregulation and AI everywhere, we see a loss of reasonable suspicion thresholds in law enforcement and other groups. Learned about adversarial auditing.

Tara's Website 1 week ago

Spring 2026 updates

Servus from … a random hotel. I’m sitting cross-legged on the bed in my pyjamas, laptop on my legs. Mx Liebe is sitting beside me, a reminder that he is part of my portable home. I glance out of the window. The sun is starting to set, that small daily proof that the days are getting longer. Outside there is a large walled lorry park, all its lights turned on: a safe place for drivers to spend the night before reaching their final destination.

Jack Vanlightly 1 week ago

Introducing Dimster, a performance benchmarking tool for Apache Kafka

Dimster = DIMensional teSTER for Apache Kafka

On GitHub: https://github.com/dimster-hq/dimster

Most of my career in distributed systems has been as a tester, performance engineer and formal verification specialist. I’ve written performance benchmarking tools in the past for RabbitMQ and Apache Pulsar, but in recent years I’ve used OpenMessagingBenchmark (OMB) to run benchmarks against Apache Kafka and other messaging systems. However, OMB is hard to deploy and has several limitations compared to more sophisticated benchmarking systems I’ve developed in the past. With Claude becoming so much better since Christmas, I decided to write a Kafka-centric performance benchmarking tool, with a lot of inspiration from OMB. I took the bits I like about OMB and the things I like about the tooling I’ve built in the past to make a performance testing tool for Apache Kafka.

In this post I’ll introduce some aspects of Dimster that are core to its design: dimensional testing; shareable, self-contained results with reproducibility in mind; benchmark prep and post-processing; and Kubernetes as a standardized runtime.

A benchmarking and stress testing technique I’ve used for years is something I call “Dimensional Testing”. We can think of all the configs and workload aspects as forming an N-dimensional space. Within that space, we can explore the impact of points along a single dimension, or even along co-varying dimensions. Take a config or an aspect of a workload as a dimension, and run a series of identical benchmarks where a set of points along that dimension is explored (while everything else remains the same). The dimension could be a client config, such as batch.size or acks. It could be an aspect of the workload, such as the number of consumers, the type of consumer, the number of consumer groups, the partition count, the produce rate and so on. There are hundreds of dimensions to explore, which requires some patience and care lest you become overwhelmed.
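Below is a minimal sketch of how the test points for one scenario could be expanded. It assumes a workload is just a flat map of settings. The class, method and key names (DimensionSweep, sweep, coVary, consumerType, and so on) are hypothetical and are not Dimster's workload format. The point is only that each test point is a copy of the base workload with nothing changed except the explored dimension. Treating "co-varying" as stepping several dimensions together in lockstep is also an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: expanding a scenario into test points. Not Dimster's workload format.
class DimensionSweep {

    // One test point per value; every other setting is copied unchanged from the base workload.
    static List<Map<String, Object>> sweep(Map<String, Object> base, String dimension, List<?> values) {
        List<Map<String, Object>> points = new ArrayList<>();
        for (Object value : values) {
            Map<String, Object> point = new LinkedHashMap<>(base);
            point.put(dimension, value);
            points.add(point);
        }
        return points;
    }

    // Co-varying (assumed lockstep): the i-th test point takes the i-th value of every dimension.
    static List<Map<String, Object>> coVary(Map<String, Object> base, Map<String, List<?>> dimensions) {
        int n = -1;
        for (List<?> values : dimensions.values()) {
            if (n == -1) {
                n = values.size();
            } else if (n != values.size()) {
                throw new IllegalArgumentException("co-varying dimensions need the same number of values");
            }
        }
        List<Map<String, Object>> points = new ArrayList<>();
        for (int i = 0; i < n; i++) { // n == -1 (no dimensions) yields no points
            Map<String, Object> point = new LinkedHashMap<>(base);
            for (Map.Entry<String, List<?>> d : dimensions.entrySet()) {
                point.put(d.getKey(), d.getValue().get(i));
            }
            points.add(point);
        }
        return points;
    }

    public static void main(String[] args) {
        Map<String, Object> base = new LinkedHashMap<>();
        base.put("topics", 10);
        base.put("partitionsPerTopic", 6);
        base.put("consumerType", "CONSUMER_GROUP");
        base.put("acks", "all");

        // Scenario 1: a single dimension, consumer type.
        sweep(base, "consumerType", List.of("CONSUMER_GROUP", "SHARE_GROUP"))
                .forEach(System.out::println);

        // Scenario 2: partitions and consumers co-varied together.
        Map<String, List<?>> dims = new LinkedHashMap<>();
        dims.put("partitionsPerTopic", List.of(6, 12, 24));
        dims.put("consumersPerTopic", List.of(6, 12, 24));
        coVary(base, dims).forEach(System.out::println);
    }
}
```

Each map returned would become one separate benchmark with its own fresh topic, warm-up and recorded time. Stepping co-varied dimensions in lockstep, rather than taking their cross product, keeps the number of runs equal to the number of points.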
The figure below depicts just three dimensions, and a set of three scenarios which test performance along one or two dimensions at a time.

Fig 1. Three examples of varying or co-varying an aspect of a workload as dimensions

Each of the above 16 test points (across 3 scenarios) is a separate benchmark, with a fresh topic, warm-up time, recorded time, cooldown time, etc. The generated charts for throughput and various latencies are repeated for each of the three scenarios, with each test point within a scenario plotted as a series/bar on those charts. This makes it easy to compare the performance results of varying the values of a single dimension (or co-varying values across multiple dimensions).

Fig 2. Each scenario maps to a set of charts, with the test points as data series.

With share groups being relatively new, I could compare the performance of regular consumers against share group consumers, with identical benchmarks where the dimension explored is consumer type (CONSUMER_GROUP|SHARE_GROUP). The following test has a base workload of ten topics, each with 6 partitions, 6 consumers and 4 producers. Each scenario changes the producer rate and compares consumer groups to share groups. Record keys are used, so batch sizes will be small, which makes for a tougher workload than a no-key test, which typically results in larger batches.

The charts below show the results for an EKS deployment with Kafka deployed on 3x m6i.2xlarge with 300 MB/s provisioned gp3. At 50 MB/s, we see that p99 end-to-end latency is stable, with roughly 15 ms of overhead for share groups. At 200 MB/s, p99 end-to-end latency exhibits periodic peaks.

Dimster uses environments. The sizing of a test is determined by which environment is used. I ran some share group consumer scaling tests, with full mTLS, on Kafka clusters assigned 2, 4, and 8 CPUs. These are the equivalent of vCPUs, as my Threadripper has SMT (hyperthreading) enabled.
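Whether a given environment has enough network and disk headroom for a produce rate comes down to simple arithmetic. Dimster's resources command, covered later in the post, does this kind of calculation for you, but its exact formulas aren't given here. As an assumption, this sketch uses the usual back-of-the-envelope Kafka sizing:

- producers send W MB/s to partition leaders;
- each of the RF - 1 followers fetches a copy;
- every consumer group reads the full stream once;
- every replica writes to disk.

The names are hypothetical. The sketch ignores compression, protocol overhead, uneven partition placement and consumers fetching from followers.

```java
// Hypothetical back-of-the-envelope Kafka sizing; not the formula used by the Dimster resources command.
class KafkaSizing {

    // All rates in MB/s. Per-broker figures assume partitions and leaders are evenly spread.
    record Demand(double replicationMBps, double consumeMBps, double diskWriteMBps,
                  double perBrokerNetInMBps, double perBrokerNetOutMBps,
                  double perBrokerDiskWriteMBps) {}

    static Demand estimate(double produceMBps, int replicationFactor,
                           int consumerGroups, int brokers) {
        if (produceMBps < 0 || replicationFactor < 1 || consumerGroups < 0
                || brokers < replicationFactor) {
            throw new IllegalArgumentException("invalid workload or cluster shape");
        }
        double replication = produceMBps * (replicationFactor - 1); // follower fetches from leaders
        double consume = produceMBps * consumerGroups;              // each group reads everything once
        double diskWrite = produceMBps * replicationFactor;         // every replica persists the data
        double netIn = (produceMBps + replication) / brokers;       // producer writes + follower copies received
        double netOut = (replication + consume) / brokers;          // follower copies sent + consumer reads
        return new Demand(replication, consume, diskWrite, netIn, netOut, diskWrite / brokers);
    }

    public static void main(String[] args) {
        // Example shape: 50 MB/s produced, RF 3, one consumer group, three brokers.
        Demand d = estimate(50.0, 3, 1, 3);
        System.out.printf("cluster disk writes: %.0f MB/s%n", d.diskWriteMBps());
        System.out.printf("per broker: net in %.0f MB/s, net out %.0f MB/s, disk %.0f MB/s%n",
                d.perBrokerNetInMBps(), d.perBrokerNetOutMBps(), d.perBrokerDiskWriteMBps());
    }
}
```

With three brokers and RF 3, every broker holds a full copy, so per-broker disk writes equal the produce rate. That figure is the one to hold, with some headroom, against an instance's baseline disk throughput, such as the provisioned gp3 throughput above.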
2-CPU environment on my Threadripper:

I ran the following workload with the above environment, with CPU requests/limits of 2, 4 and 8. Then I used the dimster compare command to generate comparison charts based on the JSON result files of each run. Each chart compares a test point side by side across the runs.

10k msg/s, 1000 consumers (6th test point in 1st scenario)

We see that 2 CPUs fare a lot worse than 4 and 8 CPUs.

100k msg/s, 250 consumers (4th test point, 3rd scenario)

The 2-CPU cluster simply can’t keep up with 100k msg/s and 250 consumers. If we unselect 2-CPU, we see that 4-CPU and 8-CPU were OK. Dimster charts are interactive: series can be toggled, and time and percentile ranges can be selected.

One thing I really like about OMB is that it produces a JSON file for the results. These files are easy to store and easy to share. But a lot was missing for full traceability and reproducibility. Dimster includes the following in every test campaign result (a set of files in a result directory):

Results: The JSON result file, which contains all the test point performance results. For each test point, it includes the effective workload and client configuration. It also includes the hardware and other metadata, so you know what the benchmark was run against. A CSV file is generated from the result JSON file (to make it easy to put in a spreadsheet or run custom visualizations).
Source configs: The source workload file itself, as well as any additional files, such as any dedicated client config file, the broker config file, the version of Kafka, the version of the Kafka clients, and the CPU/memory/disk given to the brokers and clients.
Log files: The log files of dimster-core, the benchmarking framework, and each Kafka broker.
Charts: Throughput and latency charts (clickable, zoomable) generated from the result JSON file.
Dashboards: Grafana dashboards converted to interactive HTML files.
I can run a test campaign, then send you the results, and you’ll be able to reproduce them because you know exactly what was run and on what. The results are also completely self-contained: if you want to see the dashboard to look at Kafka metrics during the test, it’s right there as an HTML file in the results. There’s no need for access to Grafana and Prometheus, and no need to keep monitoring infrastructure around; it can be ephemeral.

Dimster comes with four test modes (which all support dimensional testing):

Run: Fixed-throughput benchmarks, plus:
  Live interaction: Run mode also supports live interaction with the user, who can change the producer rate, number of producers and consumers, message size, etc.
  Availability: Optionally measure availability (producer/consumer/aggregate) during the standard run-mode benchmark.
Explore: Discover the highest sustainable throughput while staying under a target end-to-end latency at a given percentile.
Drain-backlog: Build a backlog and time how long it takes for the consumers to drain it. Optionally set a producer rate during the drain phase, such as when testing whether a cluster is big enough to drain a backlog while under normal producer load.
Correctness: Detects data loss, data corruption, out-of-order delivery and duplicates.

Example 1: Peak sustainable throughput, 1 partition, share group consumers

Explore mode on my Threadripper. The idea was to see the bottleneck of a single partition as consumers are scaled out. The rule was for p75 e2e latency to stay below 50 ms.

Example 2: Consumer group vs share group with 1 ms processing time

The prior example was an unrealistic synthetic test where the consumer spent no time processing. This explore test added 1 ms of consumer processing time per message with 300 consumers. It compared a 300-member consumer group with 300 partitions vs a 300-member share group with 5, 10, 25 and 50 partitions.
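How explore mode works internally isn't described in this post. The sketch below shows one common way to find such a limit: a binary search over the producer rate, using a nearest-rank percentile of end-to-end latency as the pass/fail check, in the spirit of Example 1's "p75 below 50 ms" rule. The names (ExploreSearch, runAtRate, maxSustainableRate) are hypothetical and are not Dimster's API. runAtRate stands in for one fixed-rate benchmark that returns its latency samples. The search assumes latency rises monotonically with rate.

```java
import java.util.Arrays;
import java.util.function.LongFunction;

// Hypothetical sketch of an explore-style search; not Dimster's actual algorithm or API.
class ExploreSearch {

    // Nearest-rank percentile: smallest sample such that at least p% of samples are <= it.
    static long percentile(long[] samplesMs, double p) {
        if (samplesMs.length == 0) throw new IllegalArgumentException("no samples");
        if (p <= 0 || p > 100) throw new IllegalArgumentException("p must be in (0, 100]");
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }

    /**
     * Binary-searches for the highest producer rate (msg/s) whose end-to-end latency at
     * percentile p stays below targetMs. Assumes latency rises monotonically with rate.
     * Returns a rate within `step` of the limit, or -1 if even `lo` is not sustainable.
     */
    static long maxSustainableRate(LongFunction<long[]> runAtRate, double p, long targetMs,
                                   long lo, long hi, long step) {
        if (step < 1 || lo > hi) throw new IllegalArgumentException("bad search bounds");
        if (percentile(runAtRate.apply(lo), p) >= targetMs) return -1;
        while (hi - lo > step) {
            long mid = lo + (hi - lo) / 2;
            // In a real tool, each probe would be a full fixed-rate benchmark run.
            if (percentile(runAtRate.apply(mid), p) < targetMs) {
                lo = mid; // sustainable: the limit is at or above mid
            } else {
                hi = mid; // too slow: the limit is below mid
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        // Toy stand-in for real benchmark runs: 100 samples per run, latency jumps past 80k msg/s.
        LongFunction<long[]> toyRun = rate -> {
            long[] samples = new long[100];
            for (int i = 0; i < samples.length; i++) {
                samples[i] = rate < 80_000 ? 5 + i / 10 : 400 + i;
            }
            return samples;
        };
        // p75 below 50 ms, mirroring the rule used in Example 1.
        long rate = maxSustainableRate(toyRun, 75, 50, 1_000, 300_000, 500);
        System.out.println("max sustainable rate ~" + rate + " msg/s");
    }
}
```

For Example 2, a natural upper bound for the search is the theoretical maximum. If each consumer processes messages one at a time at 1 ms each, it can handle at most 1,000 msg/s, so 300 consumers top out at around 300,000 msg/s. Real runs are noisy rather than strictly monotonic, so a practical tool would likely need warm-up, repeated probes or a safety margin on top of a plain binary search.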
Share groups managed the same throughput (95% of the theoretical max based on 1 ms processing time and consumer count) on only 10 partitions. Consumer groups needed 300 partitions.

Personally, explore and run are my bread-and-butter benchmark modes. For a given workload, I usually start by finding the throughput limit where Kafka transitions from normal, stable performance into degraded territory. I either use run mode with live interaction to discover the performance limit, or I use explore, which is slower but can be left running and discovers the limit in an automated way. For latency benchmarks, once I know the limit, I can craft benchmarks that fit inside the performance envelope for that workload on the specific version of Kafka and the specific hardware I am using.

The Dimster CLI has some commands that help before running benchmarks and for post-processing.

Dimster resources command

The resources command calculates the network and disk throughput required to service a workload. This is important in the cloud for selecting the right instances, ensuring that baseline network and disk throughput are greater than the workload’s demands.

Dimster compare command

Compare different runs that were executed on different hardware, different broker configurations, different broker versions, etc.

Dimster pivot command

You can slice and dice the CSV data any way you want. However, you can also pivot the results and generate a chart with the pivot command. This compares the Nth test point across all scenarios.

Dimster is easiest to use with Kubernetes. Dimster has a CLI you use from your laptop which speaks Kubernetes and leverages it to run benchmarks on any hardware, any cloud, any laptop or workstation, using the exact same orchestration logic. All it needs is a properly configured k8s cluster. It could be minikube or k3d on a laptop or workstation, AWS EKS, Google Cloud GKE or your own in-house cluster.
You can tell Dimster to deploy Apache Kafka to a StatefulSet in the k8s cluster:

Fig 3. Dimster architecture in full deploy mode

Or point Dimster (deployed to k8s) at a Kafka service or in-house Kafka cluster. When testing a Kafka service, you can provision a single powerful instance for the Dimster coordinator and worker, and deploy them to a local k8s distro such as Minikube, K3d or Kind. A single worker will happily consume all the cores and memory you give it.

Fig 4. Dimster architecture in external deploy mode

Or run a super-slim full setup in a tiny minikube/kind/etc. local k8s distro:

Fig 5. Dimster deployed in a tiny local k8s cluster

The workflow is the same. If you can provide a k8s cluster, Dimster does the rest. Deployment is really simple, and monitoring, gathering results and troubleshooting are all simplified by a mix of the CLI being relatively capable and k8s providing a well-understood platform.

K8s is not obligatory: you can run dimster-core directly as a Java program and point it at a Kafka cluster that is already provisioned. But you lose many features, such as monitoring, live interaction, automatic gathering of logs, automatic chart and CSV generation and so on. However, you can use the post-processing command dimster chart to generate the charts from a result JSON file. Run the Java program directly via the benchmark script:

./bin/benchmark -w path/to/workload file

I will be regularly publishing blog posts about Dimster and what you can do with it, so stay tuned. I invite you to go and play around with Dimster, even if it’s just running benchmarks on your laptop or workstation. You can get an idea of what charts get produced and what kinds of benchmarks you can run, try out dimensional testing, etc. The docs are pretty decent and should cover most of it. It’s fully featured but still a 0.X version.
A Confluent colleague and I are the only ones who have run it so far, so you may encounter bugs; if you do, please open an issue with repro steps. If you want to run serious benchmarks, you’ll likely need an EKS or GKE type of Kubernetes cluster. Dimster comes with a special CLI for EKS to deploy EKS with node groups for Kafka, Dimster workers/coordinator, and Grafana/Prometheus, as well as storage classes for gp3. While evaluating consumer group vs share group consumers, I’ve been running benchmarks in k3d on my beefy Threadripper 9980X workstation with 64 cores (128 threads), 256 GB RAM and a Samsung 9100 PRO 8TB SSD, which is plenty to run an entire medium-sized Kafka cluster plus workers. I’ll be sharing some share group benchmarks tomorrow. Happy testing!

Max Bernstein 1 week ago

Travel notes: RubyKaigi Hakodate

I just got back from a three-and-a-half-week trip to Japan. It was the longest trip I have ever been on (aside from studying abroad in Germany, which felt different). I made the following wild circuit with only a backpack and a duffel:

This trip was split into three parts: time with my immediate family, going to a conference, and then time with my partner. They were all great, and also I am glad to be home. I’ll post my abbreviated travel notes here, including activity and food recommendations. We started in Tokyo, but we were only there for about 40 hours. We focused our time mostly on arts and crafts: we did a kintsugi workshop, spent time at an artists’ cooperative, and then did a lot of walking around. This was a good intro to the trip, because everyone kept waking up at 4am and crashing at 7pm due to the jet lag. 4am wakeups make for nice morning walks to 7-Eleven. I brought my family to T’s Tantan in Tokyo Station because I’m vegetarian and it’s otherwise hard to find ramen that approaches kosher in Japan. It continues to be great, and I really appreciate having a steady vegetarian option available. Many years ago when I visited Tokyo, there was a place that served a delicious tomato-based vegetarian ramen, but I hear it has since permanently closed. Bummer. We took the shinkansen to Kanazawa. I love the train. It’s fast. It’s quiet. You can eat your snacks on board and gaze out the window as the world whizzes by. It’s nice. We toured a soy sauce factory (meh; they don’t let you in the room where the magic happens) and the old town (pretty!) before eventually ending up at our small hotel in Toyama: Satoyama Auberge Maki No Oto. I highly recommend this hotel. It is beautiful, the staff is lovely, the food was excellent, and they were very accommodating of me being vegetarian. We continued on to Toyama, which is a port town. We got to talking with an older local guy who told us all about his favorite local spots.
We learned after leaving that this guy has extraordinarily fancy taste: his spots were all either Michelin-starred or at least Michelin-rated, with lead times of months. We opted instead to go to a local brewery, which had a ghost pepper beer (!) and pizza. We then moved on via train to Osaka, where we transferred to a car to head (eventually) to our hotel in the hills near Nara. We toured the Daimon sake brewery. They explained every little thing about the process, which was especially interesting to me, as I’ve done a small amount of homebrewing and I bake. The processes sounded similar. We had a tasting and even got to talk to Daimon-san. I recommend going. I also recommend the Akame 48 waterfalls walk/hike, which has some exquisite falls, and Murou Art Forest. They had some really wonderful installations. My brother and I parted ways with the rest of my family in Osaka: they headed further west and we headed north to Itō on the Izu peninsula. We got a surprise, perfectly clear view of Fuji along the way. It’s beautiful there. They don’t seem to welcome foreigners in a lot of their restaurants (we were turned away several times), but one place had a guy who enthusiastically welcomed us in. We ended that evening enjoying some food and a beer while also being stared at by a 300lb, completely tattooed guy. It was a little unsettling, but we left without incident. My brother and I made our way to Tokyo for the day before his flight and before my train north to Hakodate for RubyKaigi. I once again did that thing where I walked around in humid 80F heat with a large backpack and pants and was extraordinarily warm toward the end of the day. After about a liter of Aquarius on the train north, I felt better. I stayed at Yunokawa Prince Hotel Nagisatei, which I would like to especially call out for having an enormous, diverse, and very vegetarian-friendly breakfast. Every morning I got to try new and tasty things and even feel full after. It was great.
Hakodate is beautiful in the spring. I arrived at peak cherry blossom season, and Goryokaku, their star-shaped fort, is absolutely decked out in cherry blossoms. It is also moderately swarmed by tourists (in this case, from three cruise ships). It didn’t feel overcrowded, though. I enjoyed eating at The Bear King, which had a vegetarian-friendly option. The next day was the committer meeting. I don’t remember a ton from it other than people talking at length about the semantics of deep freezing an object (do you freeze its class? its class’s superclass? …?). I picked up my badge and also got to check out my colleague Chris Salzberg’s bar SOLENOID! It’s a neat spot. I headed out to go find some dinner. This is about when I got a message on my phone that there was going to be an earthquake, so I walked back into the bar and said “hey, did you get this?” just before everything started shaking. It was the biggest earthquake I’ve experienced, but I was metaphorically not too shaken up. Then we got the tsunami warning. Chris’s bar is already something like 8 meters above sea level and at the foot of Mt Hakodate. With the city sirens going off and the police directing traffic with batons, though, I decided my best bet was just to march directly up the mountain to get more elevation. Since the tsunami wasn’t expected to arrive for about 20 or 30 minutes and my hotel was across the sea-level part of town, I parked myself on a little concrete post. Chris found me eventually. Someone told us that there was a middle school offering refuge, so we went and hung out on the side of the gymnasium. They were really nice about it. On Wednesday, the conference started. It was really well signposted and organized. My usual complaint with conferences is that there’s nothing to eat for vegetarians (or that we get lumped in with the gluten-free people and each group only gets a salad and bad bread), but that did not happen! They had really stellar vegetarian bento.
They had a lot of leftovers toward the end of lunch so I even went and got a second. This was about when I started freaking out because my speaking slot was approaching and I wasn’t yet feeling my talk. Normally when I give a talk, I get up in front of people and I pace and gesticulate and productively complain and throw in some fun anecdotes and the audience, one way or another, ends up learning about JITs at scale, or Scheme semantics, or something. It’s what I’d done for my little lunch talk at Brown two weeks prior. I even titled that talk “One must imagine compiler engineers happy,” so there was plenty of room for educational complaining. But this RubyKaigi talk was in front of an enormous crowd and toward a more general audience than I was used to addressing. The slides did not feel like they were flowing until about twenty minutes before my talk. In the end it went alright. I realized about 40 seconds in that I had way too much content so I ended up speaking rapidly for 30 minutes straight, completely unaware of the audience (which you can’t see anyway because of the lights). I only really noticed people when I made a dumb six-seven joke and Aaron laughed. The rest of the conference I was able to relax and enjoy other people’s talks. I got some good hallway track in, too. I think there’s a good group of people who are interested in Ruby tracing (for example, Perfetto in ZJIT) so maybe we will make something happen. We had a nice small dinner at Yasai Bar Miruya, which was vegan (!) and had some nice sake. The host was very friendly, too. I nerd-sniped John and J into implementing a VM for the Universal Machine. This was a daunting homework assignment back in undergrad but it was a fun project later in life. S joined toward the end of the conference. She’s also vegetarian so we got some really excellent vegetarian ramen at MAIDO Ramen. Finally, S and I headed south on the shinkansen for Nikko. Nikko is small, beautiful, and a tourist day-trip town.
Dinner closes early. Shops close earlier. Since we were staying there we had to make sure to track down and visit the one or two vegetarian places before they shuttered. S and I, along with J and J, took the bus up from Nikko, up the windiest switchbacks, to the Kegon Falls. We were going to take a boat across the lake, but the water level was too low for the dock on the other side, so we ended up half hiking and half taking a bus. Then we continued our hike through the Senjōgahara Marshland (beautiful), to the Yudaki Cascades (lovely), which also had a surprise restaurant and ice cream shop at the base! It’s called Yutaki Rest House. After some great (vegetarian friendly!! wow!!) udon, we marched up the waterfall and around Yuno Lake at the top to Yumoto Onsen. In order to make the last reasonable bus back to town, we just enjoyed putting our feet in the foot bath. One day was rainy. In the evening, J and I thought it would be fun to continue our Universal Machine implementations. As Norman Ramsey would say, “my implementation is 90 lines long and runs sandmark in under six seconds.” We also enjoyed doing a tour of the shrines right above Nikko. The shrines are resplendent against the backdrop of forest. Pro bus tip: you can pay by either IC card or credit card. No need to grab a ticket if you do that. S and I shipped our bags (thanks, Yamato) before continuing on to the small town of Moka, the staging area for our big pottery festival day. Unfortunately, there was no good way to get there: there was no reasonable series of trains and no taxi would take us. Ultimately we ended up taking the train to Utsunomiya and catching the long local bus to Moka. About twenty minutes into this ride, in the middle of nowhere, bus nearly empty, the bus driver pulled over and ran over to us looking kind of panicked. He asked where we were going and was visibly relieved when we said Moka. I suppose we are not the usual riders. Very nice of him.
Upon arrival, S introduced me to CoCo ICHIBANYA, which is also super vegetarian-friendly. I loved it. We ate really well before walking to our tiny hotel. We did not really know what to expect from the Mashiko pottery festival. The internet said it would be crowded and to arrive early, so we got up at 6:30am for an estimated 7am departure on the tiny train from Moka to Mashiko. On most trains you can pay with an IC card but we were out in the sticks so we asked the only other guy on the platform how to pay for the train. He said he had no idea and that this was his first time here. When the train showed up completely packed to the gills and we had to (politely) push onto it, we started to realize that this was The Event and it was going to be mayhem. Also, fun fact: the way the Moka train payment works is that you grab a little ticket from the train and, upon arrival, wait in line to present your ticket to two very overwhelmed-looking people at a table, who charge you, and you pay in cash. Onto Mashiko: the festival was packed. There’s pottery everywhere the eye can see. There are tents and there are full buildings. It varies in quality and artistry from fine to jaw-droppingly spectacular. You could completely stock your kitchen from this fair alone and it would even be cost-effective. The main bummer for us is that we had to get pottery safely back home. We limited ourselves to a reasonable assortment, but we really wanted to buy a beautiful painted 20-inch plate with a bird on a branch. After a ton of walking around, we took another long, long bus back to Utsunomiya and continued on to Karuizawa. We didn’t know what to expect from Karuizawa but, having been, I could probably concisely describe it as “Aspen for people from Tokyo”. It was… fine. We loved our hotel, Tsuruya Ryokan. The manager was very excited when we borrowed a Studio Ghibli DVD from their collection. We continued on to Tokyo, our final stop.
We did our usual tour of stationery stores and bakeries; the bread was something to write home about (har har). We enjoyed a (vegetarian!! friendly!!) kaiseki meal at Hyoki Shabu-shabu Ginza before enjoying some live music at Rocky Top. We also recommend Jikasei MENSHO for vegetarian ramen. Bakery checklist: BOUL’ANGE NIHONBASHI (check! good croissants), Bricolage bread & co (check! good everything), Brasserie Viron Marunouchi, Beaver Bread, Bartizan Bread Factory, Gontran Cherrier Tokyo Aoyama Shop, Comme’N Tokyo, Shiomi Bakery, The Little BAKERY. We had an uneventful and reasonably easy trip home. Whew. Long post for a long trip. See you next year in Miyazaki!

annie's blog 1 week ago

It’s either a poem or a piece of cheese // Week 20 — 2026

Are these weeknotes? Yes they are! Will I do them again next week? Who knows! Sunday 10 May: Got home from hospital shift around 7:30pm. Exhausted, hangry. Walked into a clean tidy home, flowers and cards, and the kids cooking dinner (spring roll bowls which were so so so good). Plus! a NEW CHAIR for the balcony. We ate and talked and did that thing where you laugh so hard you cry. Then I sat on my new balcony chair & had some nice bourbon while they cleaned everything up. Anyway it was a great Mother's Night 💗 More spaces in my life for uncensored unfettered thinking. Less platform, more workshop. Less stage, more garage. Less producing, more tinkering. Tuesday 12 May: Took a sick day. Felt off, sore throat, achy yesterday. Woke up with the full experience. This was to be an uncomfortably busy day and instead I am canceling all the things I can. Left with a couple of items to do from the comfort of the couch. Hot tea. Window open. Cats sitting in the sun. Breeze and blue sky outside. If I feel enough energy I’ll take a slow walk later. Dreamed about being evicted. Felt very real. Woke up panicked. Relieved to realize it was a dream and I have a two-year lease. Wednesday 13 May: Took my chemistry final. Not as difficult as anticipated! A relief, since I didn’t study as much as planned. “I want you to see all kinds,” he would say to her. “I want you to realize that this whole thing is just a grand adventure. A fine show. The trick is to play in it and look at it at the same time.” “What whole thing?” “Living. All mixed up. The more kinds of people you see, and the more things you do, and the more things that happen to you, the richer you are. Even if they’re not pleasant things. That’s living. 
Remember, no matter what happens, good or bad, it’s just so much” — he used the gambler’s term, unconsciously — “just so much velvet.” —from So Big by Edna Ferber Denial and suffering may be good methods for undoing the old / destructing but they are not good methods for creating / constructing what you actually wish to build. Thursday 14 May: Still sick. Tried to do a bit of work. Mostly just rested. Feeling somewhat better by end of day. Friday 15 May: Mara’s college graduation day. Those two years have flown by. Many feelings! So proud of her. Saturday 16 May: Lily’s birthday! A weekend full of celebrations. Took her and a group of friends to one of those combo bowling / laser tag / arcade / overstimulation places. They did all the things & had fun. I got some studying done. But is it doable? Sunday 17 May: Hiking church. Warm today, 70℉ when we started. Chubb Trail from West Tyson. It is a painful confession but the art of poetry carries its own power without having to break them down into critical listings. I do not mean that poetry should be raffish and irresponsible clown tossing off words into the void. But the very feeling of a good poem carries its own reason for being. …primarily Art is its own excuse, and it’s either Art or it’s something else. It’s either a poem or a piece of cheese. —from On Writing, Charles Bukowski 💪 One gym session (Monday) before the sickness took me out Tues-Thurs, then it was A Weekend of Events. Back to our regularly scheduled program next week, I hope. 👟 A few short walks, and a nice hike. 📺 Unfamiliar (loved it) and season 1 of The Thaw (liked it, will watch the rest). Lots of TV time with sick days. 📚 So Big by Edna Ferber (finished) and On Writing by Charles Bukowski. 🔗 The old world of tech is dying and the new cannot be born // Baldur Bjarnason No matter the flavour of Christianity, a core idea baked into every aspect of the religion is that singular revelatory events can fundamentally change the world.
There’s the “before”. Then the “event”. Then an “after” that has been completely transformed. In Christianity itself this is usually associated with Christ’s chaotic transit schedule – “He is here! He has left! He is about to arrive again! Now he’s leaving again! But he’s also somehow always been here! And not.” – but the mode of thinking is common throughout literature, philosophy, and storytelling in the Christian west. 🔗 Letting things build // Tracy Durnell The way I often read non-fiction — snatches of twenty pages here, twenty pages there, putting a book down for two months (or two years) at a time — is not conducive to *finishing* books, but I do find it conducive to thinking. Rich texts can take a while to sink in, so I’ll jump to another book while I let the first one marinate. 🔗 You are here // Sebastian As I approach my topics and ideas through writing—whether in the form of brief notes or by looking back when I pick up the journal and flip through its pages—a process of contextualization takes place. And that is important. For me, this is a form of metacognition: observing myself as I think and being able to analyze and categorize my thoughts “from the outside.” It doesn’t completely solve the black box problem of self-perception, nor does it eliminate the blind spot of the mind that seeks to explain itself from within itself, but it does make things a lot easier and more accessible.


Freedom from unreal loyalties

In the work against war, Woolf notes that women—unlike many of their brothers—have four great but perhaps misunderstood teachers: And those teachers, biography indicates, obliquely, and indirectly, but emphatically and indisputably none the less, were poverty, chastity, derision, and—but what word covers “lack of rights and privileges?” Shall we press that old word “freedom” once more into service? “Freedom from unreal loyalties,” then, was the fourth of their teachers; that freedom from loyalty to old schools, old colleges, old churches, old ceremonies, old countries which all these women enjoyed, and which, to a great extent, we still enjoy by the law and custom of England. We have no time to coin new words, greatly though the language is in need of them. Let “freedom from unreal loyalties” then stand as the fourth great teacher of the daughters of educated men. Woolf, Three Guineas, page 267 These are strange teachers. We may be forgiven for not seeing them as such when they’ve visited us. Woolf continues: By poverty is meant enough to live upon: That is, you must earn enough to be independent of any other human being and to buy that modicum of health, leisure, knowledge and so on that is needed for the full development of body and mind. But no more. Not a penny more. By chastity is meant that when you have made enough to live on by your profession you must refuse to sell your brain for the sake of money. That is you must cease to practice your profession, or practice it for the sake of research and experiment; or, if you are an artist, for the sake of the art; or give the knowledge acquired professionally to those who need it for nothing. By derision—a bad word, but once again, the English language is much in need of new words—is meant that you must refuse all methods of advertising merit, and hold that ridicule, obscurity, and censure are preferable, for psychological reasons, to fame and praise.
Directly badges, orders, or degrees are offered, fling them back in the giver’s face. By freedom from unreal loyalties is meant that you must rid yourself of pride of nationality in the first place; also, of religious pride, college pride, school pride, family pride, sex pride, and those unreal loyalties that spring from them. Directly the seducers come with their seductions to bribe you into captivity, tear up the parchments; refuse to fill up the forms. Woolf, Three Guineas, page 270 Woolf is echoing what we already know of wealth, fame, and loyalty—namely, that they encourage possessiveness and defensiveness, that they drive us to the violent defense of prestige and power, and that on that road lies war. We see this possessiveness and defensiveness in the whingeing insecurity of the leaders declaiming DEI; in the boss who insists his workers flatter his every decision, however foolish and arbitrary; in the patriarch who demands obedience from his wife and children; in the man who beats his partner when she tries to leave. (The most dangerous time for a woman in an abusive relationship is always when she is trying to leave.) Woolf, again: “the public and the private worlds are inseparably connected…the tyrannies and servilities of the one are the tyrannies and servilities of the other.” 1 If we are to prevent war in our public worlds, then we must also root it out in the private. And we must root it out among ourselves. For we are no more immune to the appeal of tyranny than anyone else: And the facts which we have just extracted from biography seem to prove that the professions have a certain undeniable effect upon the professors. They make the people who practice them possessive, jealous of any infringement on their rights, and highly combative if anyone dares dispute them. Are we not right then in thinking that if we enter the same professions we shall acquire the same qualities? And do not such qualities lead to war?
Woolf, Three Guineas, page 249 In naming these teachers, Woolf transforms a proscription into a refusal. The lack of wealth becomes the refusal of it; the lack of fame, of prestige, of authority becomes the rejection of all those ugly and pernicious forces. (The one benefit of living in an era in which we are bombarded with the lives of the super wealthy is we cannot even for one moment forget that they are deranged.) By claiming that lack as a refusal, we release ourselves from longing for that which we can never have; we end a ravenous hunger that could never be sated. For had we great rank and great wealth and all the rest, we would be as eager for war as the warmongers, as miserable and unhappy as the billionaires. Without, we can see war for the horror it is; we can use our time and attention to imagine other worlds, and other roads to get there. I think these teachers go by other names—frugality, integrity, humility, and solidarity, to name a few. Like the best teachers, they ask a lot of us. Perhaps too much on some days; we may not always be able to hear them, especially through the din of the war drums and the noise of the platforms and the very real fear of precarity that screams ever so loudly in our ears. But I think perhaps that if we make an effort to listen, we will find that they still have much to teach us, that we still have much to learn. 1. Woolf, Three Guineas, page 364

iDiallo 2 weeks ago

Software Engineers are Obsolete

In my first interview for a developer position, I shared a link to my personal project with the interviewer. It was a website for learning how to program. I created it from the ground up. I built the PHP app, designed the database schema, and made a nice design to tie it all together. I wrote down my process, and it became the first tutorial on the site. Then I collected tutorials from all over the web and displayed them on my website, which acted as a portal. There was a section for PHP tutorials, for Ruby on Rails, for .NET, etc. Each one individually curated by me. My interviewer was so impressed. I got the job. Later, I added a section where anyone could submit their own tutorials. It was fascinating how quickly people found my website and started submitting links. The tutorials were coming in so fast that I removed the verification system and let people upload links directly. But then my mind wandered. What if I started a blog? Yes, I had another blog before this one. I built an entire blog engine from scratch. A colleague found my blog. He was so excited that he shared his own with me. At lunch, we would discuss ideas, and that same evening after work, we would buy a domain name and start a new project. We shared tips and tricks on how to rank on Google. We had a skill, being web developers, and we took full advantage. When we had an idea, we would fire up our computers that same night and build it. Friends and family would come to us for validation. We were the ultimate deciders of what was a good idea and what was a bad one. We were the gatekeepers. We knew how to program, and nobody outside our circle could say otherwise. Now, friends and family don't come to us anymore. They go straight to ChatGPT, and it tells them their idea is brilliant. They launch their favorite AI agent, which builds their entire product from a single prompt. Some of them even manage to host it on the web, accessible to the world, and they are seeing their first customers.
People who used to confuse Java with JavaScript now tell me they have a platform. People who don't even know what programming is are standing at the forefront of software innovation, advocating, evangelizing, and making money. This skill I spent years honing has been made obsolete by everyday people. We, the developers, are no longer the gatekeepers. In fact, now we need to keep up or risk being left behind. Some commenters online tell me I'm just jealous, that I need to embrace progress. I don't want to be obsolete. I'm on openclaw, moltenclaw. I have accounts on all the video generation websites. I have accounts on ChatGPT, Claude, Gemini, and Mistral. Just as I'm getting the hang of one tool, my friend who works in a warehouse tells me, "just use Perplexity for that." But Perplexity isn't enough, because another friend says GenSpark is better. For some reason I can't sign into my Manus account anymore. And apparently, to get the most out of it, I need to get Meta Ray-Bans. Everyone is empowered, no one needs me, and that's that. The developer is now obsolete. But then, I opened LinkedIn. My peers, fellow developers who for some reason all have the word "AI" in their job title, are saying the opposite. "Developers are not losing their jobs to AI," they say. "Developers are losing their jobs to other developers who use AI." They are vibe-coping to the max. The history of technology has always been a story of nearly missing out. I remember another job I applied for and totally didn't get. The company had moved all their client-facing apps to Silverlight. If you're wondering what Silverlight is, you might understand why I chuckled when the interviewer described their plight: they were struggling to find developers to help them migrate to HTML and JavaScript. I'm fairly sure that chuckle is why they never called me back. It's one thing to embrace new technology. It's another thing entirely to put all your eggs in one basket.
Companies are betting everything on Silverlight. Sorry, I mean AI. Without thinking through what happens if things don't pan out. AI has lowered the barrier to entry. That's a good thing. More people can now bring a fresh pair of eyes to the software engineering field. But there's a problem. Those new entrants won't become better engineers over time. Why? Because they are not writing code, not reading code, not debugging code. Their growth path, with time and experience, is to become better prompters. What this means is that, amid all the noise, my role as a software engineer may seem obsolete. But in the long run, we will be back to square one, where engineers writing code with their own meatware will hold all the cards. These are the people who learned the hard way: by reading documentation, by debugging broken apps, by having their seemingly perfect Stack Overflow question closed as a duplicate. These are the engineers who will hold the keys to software. Not because they're guarding secrets; there are no secrets. It's simply that the new developer is not, and will never be, interested in learning. While we pride ourselves on producing more software than ever, it doesn't take long to realize that software is never truly finished at delivery. It has to be maintained. It's strange: computers, whose entire purpose is to repeat the same process over and over, perfectly, somehow manage to degrade over time. My tutorial website, seemingly working fine, returned an error when I visited it after months of neglect. I restarted all the services and brought it back up. It was now full of spam and NSFW URLs. An application that worked perfectly yesterday is broken today. It could be a memory leak, unexpected input, or just users with fat fingers. Your completed application is suddenly incomplete, and you have to fix it. In an ideal world, we wouldn't keep producing more software. We would have working software, and less of it to maintain. AI thrives on quantity.
If you need me, I'll be in the back, patiently waiting for you to realize you can't prompt your way out of a Silverlight migration. My rates just doubled.


Where Are All The Data Centers?

If you liked this piece, please subscribe to my premium newsletter. It’s $70 a year, or $7 a month, and in return you get a weekly newsletter that’s usually anywhere from 5,000 to 18,000 words, including vast, detailed analyses of NVIDIA, Anthropic and OpenAI’s finances, and the AI bubble writ large. My Hater's Guides To Private Credit and Private Equity are essential to understanding our current financial system, and my guide to how OpenAI Kills Oracle pairs nicely with my Hater's Guide To Oracle. My last piece was a detailed commentary on the circular nature of the AI economy — and how the illusion of AI demand is just that, an illusion. Subscribing to premium is both great value and makes it possible to write these large, deeply-researched free pieces every week. During every bubble there’s one very obvious thing that keeps happening: things are said, repeated, and then considered fact. Sam Bankman-Fried was the smiling, friendly, “self-made billionaire” face of the crypto industry. NFTs were the future of art, and would change the way people think about the ownership of digital media. The actual evidence, of course, never lined up. NFT trading was dominated by wash trading — market manipulation through two parties deliberately buying and selling an asset to raise the price. Cryptocurrency never took off as anything other than a speculative asset, and altcoins are effectively dead. Sam Bankman-Fried was only a billionaire if you counted his billions of illiquid FTX tokens, but that didn’t stop people from saying he wanted to save the world weeks after the collapse of Terra Luna, a stablecoin that he himself had bet against and may have helped collapse.
Three months before his arrest, a CNBC reporter would fly to the Bahamas to hear SBF tell the story of how he “survived the market wreckage and still expanded his empire,” with the answer being that he had “stashed away ample cash, kept overhead low, and avoided lending,” as opposed to the truth, which was “crime.” The point is that before every scandal there is somebody emphatically telling you that everything’s fine. Everything seems real because there’s enough proof, with “enough proof” being a convincing-enough person saying that “most of FTX’s volume comes from customers trading at least $100,000 per day,” when the actual volume was manipulated by FTX itself, and the “$100,000 a day in customer funds” were being used by FTX to prop up its flailing token. In the end, the “proof” that SBF was rich and that FTX was solvent was that nobody had run out of money and that nothing bad had happened to anybody. SBF was a billionaire sixteen times over because enough people had said that it was true. Anyway, one of the most commonly-held beliefs of the AI bubble is that massive amounts — gigawatts’ worth — of data centers have both already been and continue to be built… …but then you look a little closer, and things start getting a little more vague. While Wood Mackenzie’s report said that there was “25GW of data center capacity added to the funnel” in Q4 2025, it does not say how much came online. CBRE said back in February that “net absorption of 2497MW” happened in primary markets in 2025, with other reports saying that somewhere between 700MW and 2GW of capacity was absorbed every quarter of 2025. At the time, I reached out for any clarity about the methodology in question and received no response. Okay, so, I know data centers are getting built and that they exist. I believe some capacity is coming online. But gigawatts? Or even hundreds of megawatts? How much data center capacity is actually coming online?
Why did Anthropic get so desperate it took on a years-old data center, xAI’s Colossus-1, full of even older chips from a competitor — one whose CEO described the company as “evil,” and that’s currently facing a lawsuit from the NAACP over allegations the facility’s gas turbines are polluting black neighborhoods? Remember, Colossus-1 is an odd data center, with around 200,000 H100 and H200 GPUs and an indeterminate number of Blackwell GB200s, weighing in at around 300MW of total capacity… which isn’t really that much if we’re talking about gigawatts being built every quarter, is it? So, I have two very simple questions to ask: how long does it take to build a data center, and how much data center capacity is actually coming online? These simple questions are surprisingly difficult to answer. There exists very little reliable information about in-progress data centers, and what information exists is continually muddied by terrible reporting — claiming that incomplete projects are “operational” because some parts of them have turned on, for example — and a lack of any investor demand for the truth. Hyperscalers do not disclose how many data centers they’ve built, nor do they disclose how much capacity they have available. I find this utterly inexcusable, given the fact that Amazon, Google, Meta and Microsoft have sunk over $800 billion in capex (and more if you count investments into Anthropic and OpenAI) in the last three years. So I went and looked, and what I found was confusing. So, you’re going to hear people say “well Ed, data centers are being built,” and what I’m talking about is data centers that have been fully constructed and then turned on. It’s really, really easy to find data centers that are under construction, but as I’ve discussed in the past, that can mean everything from a pile of scaffolding to a near-complete data center. Yet finding the latter is very, very difficult.
I’ve spent the last week searching for data centers that broke ground in 2023 or 2024 that have actually been finished, and come up surprisingly empty-handed. Some projects are stuck in construction hell, eternally dueling with planning departments over permitting; some are chugging along with no real substantive updates; some, as is the case with Nscale’s Loughton, England data center, have done effectively nothing for the best part of a year; some are perennially adding more capacity to the order as a means of continuing to rake in construction bills; and some are claiming their data centers are “operational” when only a single phase has turned on. You should also know that even once construction has finished, the buildings themselves must be fully filled with the necessary cooling, power and compute hardware, at which point they can be configured to meet a client’s specifications (which can take months), at which point the unfortunate soul building the facility can actually start making money. I think it’s also worth revisiting how difficult data center construction is, and how large these new projects are. This starts with a very simple statement: nobody has actually built a 1GW data center (to be clear, it’s usually a campus of multiple buildings networked together) yet. There are campuses — such as Stargate Abilene — which promise to reach 1.2GW, but nearly two years in sit at two buildings at around 103MW of critical IT load each with, based on discussions with sources with direct knowledge of Abilene’s infrastructure, a third building sitting fully-constructed but with barely any gear inside it. It’s fundamentally insane how many different companies are trying to build these things considering how difficult even the simplest data center is to build. Take, for example, American Tower Corporation’s edge data center in Raleigh, North Carolina, which I’ll mention a little later.
This is a 1MW facility — or one-thousandth the size of a gigawatt facility — occupying 4,000 sq ft of real estate at first and expanding to 16,000 if ATC actually gets it up to 4MW. That’s about two-and-a-bit times larger than the typical American home. And, from ground-breaking to ribbon-cutting, it took eleven months to complete. And that’s not including all the other necessary time-consuming bits, like finding land, securing permits, and so on. That’s a simple one. People want to build data center campuses a thousand times larger than that. Look at how difficult it is. In fact, it’s so difficult that the companies can’t build all of it at once. Larger data center campuses are almost always divided into “phases,” in part because that’s the smartest way to build them, and in part with the express intention of convincing you that they’re “fully operational.” For example, CNBC’s MacKenzie Sigalos reported in October 2025 that Amazon’s Indiana-based (allegedly) 2.2GW Project Rainier data center was “operational,” but only seven out of a planned 30 buildings were actually operational, “with two more campuses [of indeterminate capacity] underway.” This comment was buried two videos and 600 words into a piece that declared the data center was “now operational,” with the express intent of making you think the whole thing was operational. To give her credit, at least she didn’t copy-paste the outright lie from Amazon, which claimed that Rainier was “fully operational” in a press release the same day. You’ll also note that Amazon never provides any clarity about the actual capacity of Rainier. Sigalos did exactly the same thing when the first (of eight) buildings of Stargate Abilene opened, declaring that “OpenAI’s first data center in $500 billion Stargate project is open in Texas,” burying the comment that only one was operational with another nearly complete several hundred words earlier.
These are intentional attempts to obfuscate the actual progress of the data center buildout, and if I’m honest, I’ve spent months trying to work out why big companies that were supposedly building large swaths of data centers would want to do so. Unless, of course, things weren’t going to plan.

On Microsoft’s last (Q3 FY26) quarterly earnings call, CEO Satya Nadella claimed that “[Microsoft] added another gigawatt of capacity this quarter, and [remained] on track to double [its] overall footprint in two years.” A quarter earlier, he claimed to have added “nearly one gigawatt of total capacity,” with Karl Keirstead of UBS saying that he “...thought the one gigawatt added in the December quarter was extraordinary and hints that the capacity adds are accelerating.”

As I’ll discuss below, I can find no evidence of anything more than a few hundred megawatts of Microsoft’s data center capacity coming online. I’ll humour the idea that it doesn’t announce every new data center, and that there may be colocation and neocloud counterparties (67% of CoreWeave’s revenue comes from Microsoft, for example) that make up the capacity, but, as I’ll also discuss, I don’t know where the hell that might be.

So, to be aggressively fair, I asked Microsoft to answer the following questions on May 4, 2026:

A Microsoft representative from WE Communications promised to “circle back” by 5PM ET on Monday, May 4, but did not respond to further requests for comment via text and email, which is incredibly strange considering the simple and straightforward nature of my questions.

That’s probably because the vast majority of Microsoft’s publicly announced or documented data center capacity doesn’t appear to be getting finished.
In September 2025, CEO Satya Nadella claimed that Microsoft had added 2GW of capacity “in the last year,” and acted as if Fairwater was something to be “announced” rather than “a very expensive project that has taken forever.” Fairwater is a project with two data centers under active construction: one in Wisconsin that broke ground in September 2023, and another in Atlanta that broke ground in July 2024. Nadella also claimed that there are “multiple identical Fairwater datacenters under construction,” though he neglected to name them. To be clear, “Fairwater” refers to a project where multiple data centers are linked with high-speed networking to make one larger cluster, a project that sounds ambitious because it is, and also unlikely because it has yet to be built.

Fairwater Atlanta, the later of the two Fairwaters, was “launched” in November 2025, and it’s unclear how much capacity it has. Cleanview claims it’s at 350MW of capacity, and Microsoft’s own community outreach page claimed construction would be completed by the beginning of October 2025, but, as I’ll get to, it’s unclear whether this is just one phase, given that reporting shows multiple other buildings still under construction. I have serious doubts that Microsoft stood up a 350MW data center in less than a year, given everything else I’m about to explain.

Fairwater Wisconsin is also a data center of indeterminate size, but Cleanview claims Phase 1 is 400MW, quoting a story from FOX6 News Milwaukee from September 2025 that said that Microsoft was “investing an additional $4 billion to expand the campus,” featuring a video of a very-much-under-construction data center saying the following:

So, $3.3 billion (at a rate of around $14 million per megawatt, per analyst Jerome Darling of TD Cowen) is about 235MW of capacity, which is a lot lower than 400MW.
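The capex-to-capacity conversion above is a single division, and it is easy to check. This is a back-of-envelope sketch, assuming (as the text does) a flat ~$14 million per MW; real build costs vary by site, cooling design and how much of the spend is compute hardware, so treat the output as an order-of-magnitude check rather than a capacity figure.

```python
# Back-of-envelope: how many megawatts does a given capex buy?
# Assumption (from the text, not an official figure): a flat ~$14M per MW.
COST_PER_MW_USD = 14_000_000


def capex_to_mw(capex_usd: float, cost_per_mw_usd: float = COST_PER_MW_USD) -> float:
    """Return the megawatts implied by capex_usd at a flat cost per MW."""
    return capex_usd / cost_per_mw_usd


wisconsin_mw = capex_to_mw(3.3e9)   # the $3.3 billion figure discussed above
claimed_mw = 2_000                  # Microsoft's ~2GW claim, expressed in MW
ratio = claimed_mw / wisconsin_mw   # how many Wisconsin-sized builds that implies

print(f"$3.3B buys about {wisconsin_mw:.1f} MW")  # $3.3B buys about 235.7 MW
print(f"2GW is about {ratio:.1f}x that")           # 2GW is about 8.5x that
```

The ~8.5x ratio is where the “roughly eight times” comparison comes from; a higher assumed cost per MW would shrink the capacity estimate and widen the gap further.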
Seven months later, Satya Nadella said that the Fairwater datacenter in Wisconsin was “going live, ahead of schedule,” a sentence written in the present tense, but also said that it “will bring together hundreds of thousands of GB200s in a single seamless cluster,” which is in the future tense.

It’s a great time to remind you that Microsoft claims it brought online roughly eight times that capacity (around 2GW) in the past six months.

To make matters worse, it doesn’t appear that Fairwater Wisconsin is actually operational. Ricardo Torres of the Milwaukee Journal-Sentinel reports that Microsoft has said it isn’t actually online, and that while there “...is equipment inside the data center conducting start-up opportunities…the company anticipates [they] will continue to happen for the next several weeks.”

Epoch AI’s satellite footage of Fairwater Wisconsin notes that as of April 2026, one building appeared to be operational, with a second under construction. (Epoch’s write-up also cites a completely wrong capacity, because Epoch is uniquely terrible at calculating it; it claimed Colossus-1 has 425MW of capacity, for example.) So, that’s one building in Wisconsin that might be complete, and based on the permitting application from August 2023 dug up by Epoch, that building is designed to have 117MW of capacity, which is a lot lower than 235MW. While Epoch didn’t have permitting for building two, it did for buildings three and four, which are designed to have around 719MW of capacity, and which as of April 2026 still appear to be slabs of concrete.

In simpler terms, there’s at most around 117MW of capacity running at Fairwater Wisconsin.

The Fairwater data centers are Microsoft’s most-publicized data centers, yet they’re shrouded in secrecy, with the Atlanta Journal-Constitution having to file an open records request to find the site being developed by QTS, a data center developer owned by Blackstone.
Videos of Fairwater Atlanta from last November show a giant campus with two large buildings and a patch of yet-to-be-developed dirt. DataCenterMap refers to it as “under construction.” Epoch AI’s satellite footage notes that as of February 2026, building four’s roof was complete and “all mechanical equipment appears to be installed,” but “there is still a lot of construction activity around the building.”

Based on air permits filed as part of the project (that Epoch found), it appears that each building is powered by a number of Caterpillar 3516C generator sets at around 2.5MW each, with building one having 47 (117.5MW), building two having 13 (32.5MW), building three having 30 (75MW), and building four having 35 (87.5MW). If we’re very generous and assume that three buildings are complete, that means that Fairwater Atlanta is at around 225MW of capacity (not IT load!).

So, that’s about 342MW of data center capacity across both Fairwater sites (117MW in Wisconsin plus 225MW in Atlanta), built by one of the largest companies in the world, in its most-publicized and written-about data centers. Put another way, for Microsoft to come remotely close to its so-called 2GW of capacity in the last six months, it would have had to bring online a little under six times that capacity.

I’m calling bullshit.

I really did want Microsoft to give me some answers, but I’m very confused as to how it can remotely claim it brought even a gigawatt of capacity online in the last year. I also question whether Microsoft is actually building multiple other “identical” Fairwater data centers, as I can’t find any announcements, pronouncements, mentions or hints as to where they might be.

In fact, I’m having a little trouble finding where else Microsoft has been building data centers, and those I can find are extremely suspicious.
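The generator arithmetic above can be reproduced in a few lines. This sketch assumes, as the text does, a flat 2.5MW per Caterpillar 3516C set, and treats the generator nameplate total as a rough proxy for facility capacity (explicitly not IT load). The “three buildings complete” case uses buildings one to three, the combination that sums to the ~225MW figure, and the 117MW Wisconsin number is carried over from the permit discussed earlier.

```python
# Rough capacity proxy from backup-generator counts in the Fairwater Atlanta air permits.
# Assumption: every generator set is ~2.5MW; nameplate total is NOT critical IT load.
GENSET_MW = 2.5
GENSETS_PER_BUILDING = {1: 47, 2: 13, 3: 30, 4: 35}
WISCONSIN_MW = 117  # building-one permit figure for Fairwater Wisconsin, from the text


def generator_mw(buildings):
    """Sum generator nameplate capacity (MW) for the given building numbers."""
    return sum(GENSETS_PER_BUILDING[b] * GENSET_MW for b in buildings)


atlanta_mw = generator_mw([1, 2, 3])        # generous case: three buildings done
all_four_mw = generator_mw([1, 2, 3, 4])    # if all four were finished
fairwater_total_mw = WISCONSIN_MW + atlanta_mw
shortfall_multiple = 2_000 / fairwater_total_mw  # versus the claimed ~2GW

print(atlanta_mw, all_four_mw, fairwater_total_mw)  # 225.0 312.5 342.0
print(f"{shortfall_multiple:.2f}x")                  # 5.85x
```

Even counting all four Atlanta buildings (312.5MW) plus Wisconsin only gets to about 430MW, still well short of 2GW.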
In Microsoft’s announcement of its Wisconsin data center, it mentioned two other projects: one in Narvik, Norway, which OpenAI had already announced months beforehand, and another with Nscale in Loughton, England, which OpenAI also announced that very same day as part of the entirely fictional Stargate project. If you’re wondering how those are going, Microsoft had to take over the entire Narvik project (which does not appear to have started construction) from OpenAI, and the Loughton data center (which OpenAI also backed out of) is currently a pile of scaffolding.

For two straight quarters, Microsoft has said it’s brought on an entire gigawatt of capacity, and I have to ask: where?

Because when you actually look at the projects it’s announced, very little appears to have been built, and what has been built is nowhere near its theoretical capacity.

To be specific about what Microsoft is claiming, it’s saying it’s brought around 4GW of capacity online in the space of two years. At a PUE of 1.35, that’s about 2.96GW of critical IT load, which works out to the power equivalent of around 284,600 H100 GPUs. That may be possible (after all, Microsoft apparently bought 450,000 H100 GPUs in 2024), but I can’t find much evidence of data centers that could house that many GPUs, either built or under construction.

Let’s dig in.

Microsoft broke ground on three data centers in Catawba County, North Carolina in 2024 (one in Hickory, another in Lyle Creek, and another in Boyd Farms):

Alright, maybe I’m being unfair! Maybe it’s just a North Carolina problem. There must be another that broke ground and got built…right?

Microsoft also broke ground on a data center in Quebec City, Canada in September 2024, and as of April 2026, “generator testing has been completed,” and “civil works will continue until Autumn 2026.”

Okay, well, maybe it’s a Canada problem. What about Microsoft’s New Albany, Ohio data center that broke ground in October 2024?
Well, as of March 2026, “spring activity would resume,” and “beginning soon, soil will be delivered to the site via a designated truck route.” I’ll note that Microsoft specifically says that Ames Construction is currently leading the project, and that Microsoft will “resume the lead role in project communications” once the final phase of construction is done, at some unknown time.

Alright, well, how about the August 2025 groundbreaking in Cheyenne, Wyoming that was allegedly “due to launch in 2026”?

Well, Microsoft hasn’t updated its community page since it said there’d be a community meeting planned for November 2025 and that “neighbors within the vicinity will be notified ahead of construction,” which sounds like construction is yet to commence. Not to worry, though: it announced on April 14, 2026 that it planned to expand the site to “accelerate innovation and economic growth.”

How about that 2023-announced Southwest Hortolândia, Brazil data center? That’s right, the last update was in September 2025, and the update was “construction activities continue to progress in alignment with local regulations.” A piece from Folha de S.Paulo from March 2026 mentioned that Microsoft “had begun operating its first artificial intelligence data centers in Brazil,” but satellite footage shows that it’s barely finished.

What about the Newport, Wales data center it announced in 2022? Well, as of November 2025, a politician was standing on a concrete slab talking about how many jobs it’ll theoretically bring in, which it won’t.

What about Microsoft’s four data centers in Irving, Texas, announced in December 2024? The best I’ve got for you is a news report about a data center in Irving, Texas breaking ground in January 2025.

Its San Antonio data center, announced in July 2024? Well, construction was underway as of December 2025, and it appears that construction will begin in the summer of 2026 on another one in the area.
How about the two data centers outside of Cologne, Germany, announced in November 2024? Well, as of September 2025, Microsoft has… plans to build one of them?

…what about the 900 acres of land it bought in June 2024 in Granger, Indiana? Great news! According to 16NewsNow, Microsoft officials “could break ground on a proposed data center…in late April or early May [2026].”

How about Project Ginger West, a data center planned in Des Moines, Iowa since March 2021? Hope you like waiting, because Microsoft itself says that it’s estimated to finish construction in Summer 2028. Ginger East, announced a few months later? Mid-2028.

Project Ruthenium (announced 2023)? I don’t have shit for you, I’m afraid. Rutheniumkanda Forever!

This company claims it’s built four fucking gigawatts of capacity, but when I go and look at what it’s actually built, I can’t find a single announced data center from the last three years that got turned on outside of its Fairwater Atlanta and Wisconsin sites. To be clear, all of these sites are somewhere in the 200MW to 300MW range. For Microsoft to have brought online 4,000MW of data center capacity in the last two years, it would have had to complete thirteen or more of these projects, all while choosing not to promote them, with every project operating under such a veil of secrecy that no local or national news outlet reported a single one of them.

Based on my research, I truly cannot work out how Microsoft has brought on any more than 500MW of capacity in the last year, and I think Microsoft is deliberately obfuscating whether said capacity was contracted rather than actively in use, much like CoreWeave refers to itself as having 3.1GW of “total contracted power” but only added 260MW of active power capacity in a single quarter at the end of 2025.
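Both conversions applied to Microsoft’s ~4GW claim can be checked in a few lines. This is a back-of-envelope sketch under the text’s own assumptions: that the 4GW is total facility power, that 1.35 is a fair PUE (power usage effectiveness is total facility power divided by IT equipment power, so IT load is facility power divided by PUE), and that comparable projects land between 200MW and 300MW each. None of these inputs comes from Microsoft.

```python
# Sanity-checking a claimed ~4GW (4,000MW) of new capacity.
# Assumptions (from the text, not from Microsoft): the figure is facility power,
# the PUE is 1.35, and comparable sites are 200MW to 300MW each.
CLAIMED_MW = 4_000
ASSUMED_PUE = 1.35


def it_load_mw(facility_mw: float, pue: float) -> float:
    """PUE = facility power / IT power, so IT power = facility power / PUE."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return facility_mw / pue


def projects_needed(total_mw: float, site_mw: float) -> float:
    """Equally sized sites needed to reach total_mw (left unrounded)."""
    return total_mw / site_mw


it_mw = it_load_mw(CLAIMED_MW, ASSUMED_PUE)
print(f"IT load: {it_mw / 1000:.2f} GW")  # IT load: 2.96 GW
for site_mw in (300, 200):
    print(f"{site_mw}MW sites needed: {projects_needed(CLAIMED_MW, site_mw):.1f}")
# 300MW sites needed: 13.3
# 200MW sites needed: 20.0
```

Leaving the project count unrounded makes it clear that “thirteen or more” is the generous end: at 300MW per site you need just over 13, and at 200MW per site you need 20.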
However, the exact verbiage used in Microsoft’s earnings transcripts is that it “added another gigawatt of capacity,” which sounds far more like it’s saying it brought that capacity online…

…but it didn’t, right? It obviously hasn’t. Where are all the data centers, Satya? Where are they? Why are your PR people too scared to tell me?

No, really, where are they?

So, to be fair, analyst Ben Bajarin, one of the friendlier pro-AI posters, argues that all of that capacity actually exists, just behind the scenes, something I’d humour if there were any kind of paper trail to a bunch of Microsoft data centers being built in secret.

I’d also be more willing to humour it if any of the data centers that have been publicized as “breaking ground” had actually been finished, or if Fairwater Atlanta and Wisconsin weren’t both so deceptively marketed. My only devil’s-advocate argument is that Microsoft could, in theory, be working with colocation partners to stand up several gigawatts of capacity through shell corporations and SPVs, but even then, not a single one has any sort of trail to Microsoft? All of that capacity?

It’s really, really weird, and the only answers I get are smug statements about how “Fairwater is ahead of schedule.” But if I’m honest, I’m having trouble even making these numbers add up. Considering how loud, offensive and conspicuous the AI bubble has become, it feels like we should have a far, far better understanding of how much actual capacity has been built.

I also think it’s time to start being realistic about how long these things are taking to build. For example, I was only able to find a few data centers that for sure, categorically, definitively opened, and for the most part, it appears that a data center takes around 18 months to go from groundbreaking to opening. And these, I add, are all relatively modest facilities, at least compared to the kinds of gigawatt-scale campuses that are reportedly in active development.
Digging deeper, I found a lot of projects stuck in development hell:

While there are absolutely data centers under construction, and some, somewhere, are actually being completed, the vast majority of projects I’ve found are either in a mysterious limbo state or, in most cases, still under construction years after breaking ground.

Across the board, the message seems to be fairly simple: it takes about 18 to 24 months to build any kind of data center, and the bigger they are, the less likely they are to be completed on schedule. Those that actually “come online” aren’t fully constructed, but have brought on a single phase, something I wouldn’t begrudge them if they were anything close to honest about it.

In reality, data center companies actively deceive the media and customers about the actual status of projects, most likely because it’s really, really difficult to build a data center.

In any case, what I’ve found amounts to a total mismatch between the so-called “rapid buildout” of AI data centers and reality.

It also doesn’t make much sense when you factor in how many GPUs NVIDIA sold. In October last year, NVIDIA CEO Jensen Huang told reporters that NVIDIA had shipped six million Blackwell GPUs in the last four quarters, though it eventually came out that he was counting two dies for every GPU, making the real number three million. I disagree with the framing (I think it’s incoherent and dishonest), but I’ve confirmed this is what NVIDIA meant.

In any case, counting two dies per GPU, a B200 GPU draws around 1,200W, which works out to around 3.6GW of IT load for 3 million of them. I realize that NVIDIA also sells B100 and B300 GPUs (with similar power draw) and NVL72 racks of 72 GB200 GPUs and 36 CPUs, but bear with me.

Blackwell GPUs only started shipping in any real volume in the first quarter of 2025, which means that a good chunk of these data centers were built with H100 and H200 GPUs in mind.
Nevertheless, I can find no compelling evidence that significant amounts of Blackwell-based data center capacity (anything over 500,000 GPUs) have been successfully brought online.

When I say I struggled to find data centers that had been both announced and brought online, I mean that I spent hours looking, hours and hours and hours, and came up empty-handed.

I want to be clear that I know there is Blackwell capacity actually being built, and I believe that the majority of that capacity comes from retrofits of previous data centers, such as Microsoft’s extension to its Goodyear, Arizona campus (which it began building in 2018), which likely houses Blackwell GPUs.

But I no longer believe that the majority of Blackwell GPUs are doing anything other than collecting dust in a warehouse. Blackwell GPUs require distinct cooling and a great deal more power than an H100, and cost an absolute shit-ton of money, making it unlikely that a 2023- or early-2024-era data center could handle them without significant modifications. I fundamentally do not believe that more than a million (if that!) Blackwell GPUs are actually in service.

If that’s the case, NVIDIA is likely pre-selling GPUs years in advance (experimenting with the dark arts of “bill-and-hold”) and helping certain partners like Microsoft install the latest generation to create the illusion of utility, availability and viability that does not actually exist.

If I’m honest, I also have serious questions about the current status of many H100 and H200 GPUs. Based on what I’ve found, I’d be surprised if more than 3GW of actual capacity was turned on in the last two years, which means that NVIDIA has sold anywhere from double to triple the amount of GPUs that the world can hold.
While the Anthropic-Musk compute deal is an obvious sign of xAI’s lack of demand for compute, it’s also, as I mentioned earlier, a clear sign that AI data centers are mostly not getting finished, and those that do get finished are taking two or three years, even for smaller builds. While it sounds a little wild, I think in reality only a few hundred megawatts (if that) of actual, usable AI compute capacity is being spun up every quarter. If I were wrong, there’d be significantly more progress on, well, anything I could find.

Why can’t Microsoft offer up a data center that isn’t called Fairwater, and why are its Fairwater data centers taking so long? How much actual capacity has Microsoft brought online? Because it certainly isn’t fucking 2GW in six months.

I’m willing to believe that Microsoft has a number of colocation agreements with parties that don’t disclose their involvement. I’m also willing to believe that Microsoft doesn’t publicize every single data center it’s building or has built.

But 2GW of capacity is a lot. It’s nearly ten times the (likely) existing capacity of Fairwater Atlanta. If Microsoft is bringing so much capacity online, why can’t we find it, and why won’t they tell us? And no, this isn’t some super-secret-squirrel “they’re building secret data centers for the government” thing. It’s very clearly a case where “capacity” refers to “something other than data centers that actually got brought online.”

Despite their ubiquity in the media, AI data centers are relatively new concepts, barely five years old. They are significantly more power-intensive than a regular data center, requiring massive amounts of cooling and access to water, to the point that the surrounding infrastructure of said data center is often a massive construction project unto itself.
For example, OpenAI and Oracle’s Stargate Abilene data center is (in theory) made up of two massive electrical substations, a giant gas power plant and eight distinct data center buildings, each with around 50,000 GB200 GPUs.

Every data center requires that power exists (as in, it’s being generated in both the manner and capacity necessary to turn it on, either through external or grid-based power) and is accessible at the data center site.

This means that every single data center, no matter how big, is its own construction nightmare. You’ve got the power, the labor, the permits, the planning, the construction firm, the power company, the specialist gear, the temporary power (because on-site power is slow), the backup power (because you can’t just rely on the grid for something you’re charging millions for!), the cooling, the uninterruptible power supplies: endless lists of shit that needs to go very well or else the bloody thing won’t work.

These are very difficult and large projects to complete. Edged Computing’s (theoretically) 96MW data center in Illinois covers 200,000 square feet across effectively two large squares. For comparison, every single inch of gambling space at Caesars Palace in Las Vegas adds up to around 130,000 square feet.

These things are fucking huge, fucking difficult, and fucking expensive, and all signs point to capacity not coming online.

Let’s go back to Anthropic mopping up Musk’s fallow data center capacity, which stinks of desperation for both companies. If there were modern data centers full of GB200s being turned on and available anywhere in the next month or two, wouldn’t it be more financially prudent to wait for them, even if just on an efficiency level? A franken-center made up of H100s and H200s with some GB200s stapled onto the side feels like a stopgap solution.
I have similar questions about the results of adding this capacity: “...Anthropic plans to use [it] to directly improve capacity for Claude Pro and Claude Max subscribers,” “doubling” (whatever that means) the 5-hour rate limit and removing the recently-added peak rate limits.

What’s the plan here, exactly? Less than a month ago, Anthropic’s Head of Growth, Amol Avasare, said that Anthropic was “looking at different options to keep delivering a great experience for users” because Max accounts were created before the era of Claude Code and Cowork. How does adding 300MW of capacity magically resolve that problem? Was that always the plan?

Or was this a knee-jerk reaction to the surging popularity of OpenAI’s Codex? Because the original justification for peak hours was that Anthropic needed to manage “growing demand for Claude,” demand that I bet Anthropic claims hasn’t gone anywhere.

It’s also important to remember that last year, OpenAI’s margins (which are already non-GAAP), per The Information, were worse than expected because (and I quote) it had “...to buy more expensive compute at the last minute in response to higher than expected demand for its chatbots and models.”

In other words, Anthropic has deliberately tanked its already-negative 2026 gross margins by desperately buying the fallow compute of a company whose CEO threw up the Nazi salute and called Anthropic “misanthropic and evil,” and which has the “right to reclaim the compute” if Anthropic “engages in actions that harm humanity.”

Surely you’d wait a few months for some new, less tainted source of compute, right? And surely it wouldn’t be such a big deal, because new data centers get switched on every day, right?

So, let’s get to brass tacks. Anthropic and OpenAI have now committed to spending $748 billion across Amazon Web Services, Google Cloud, and Microsoft Azure, accounting for more than 50% of those three clouds’ remaining performance obligations.
The very future of hyperscaler revenue depends both on Anthropic and OpenAI’s continued ability to pay, and on both of them having something to actually pay for.

I also think it’s fair to ask why Microsoft’s theoretical gigawatts of new compute aren’t producing tens of billions of dollars of new revenue.

Microsoft’s $37 billion annualized AI run rate (sigh) is mostly taken up by OpenAI’s voracious demand for its compute, and only ever seems to expand based on OpenAI’s compute demands and the now 20 million lost souls paying for Microsoft 365 Copilot. There’s supposedly incredible, unstoppable demand for AI compute, and Microsoft is apparently sitting on gigawatts’ worth, but somehow those gigawatts don’t seem to be translating into gigabillions, likely because they don’t fucking exist.

All of this makes me wonder what Google infrastructure head Amin Vahdat meant last November when he said that Google needed to double its capacity every six months to meet demand. Many took this to mean “Google is doubling its capacity every six months,” but I think it’s far more likely that Google is taking on capacity requests from Anthropic that are making said capacity demands necessary. Similarly, I think CEO Sundar Pichai’s comment that Google would have made more money had it had more capacity to sell was a manifestation of a distinct lack of new capacity, rather than a result of bringing on swaths of new data centers that immediately got filled.

I also need to be blunt on two things:

Look, I know it sounds crazy, but I’m telling you: I don’t think very many data centers are coming online! While I keep wanting to hedge my bets and say “I bet a few gigawatts came online,” I cannot actually find any compelling literature that backs up that statement. I’ve spent hours and hours looking, and I’ve come up with a few hundred megawatts delivered in the past two years.
Every major project is stuck in the mud, a phase or two in, or facing mounting opposition from locals who don’t want a Godzilla-sized cube making a constant screaming sound 24/7 so that somebody can generate increasingly-bustier Garfields.

I’m not even being a hater! It’s just genuinely difficult to find actual data centers that have been announced and have also been fully turned on.

So, humour me for a second: if hyperscalers are only bringing on hundreds of megawatts of capacity a year, then the ever-growing quarterly chunks of depreciation ripped out of their net income are just a taste of what’s to come. Last quarter, Google’s depreciation jumped $400 million to $6.482 billion, Microsoft’s jumped nearly a billion dollars from $9.198 billion to $10.167 billion, and Meta’s rose from $5.41 billion to $5.99 billion. While Amazon’s technically dropped quarter-over-quarter, it still sat at an astonishing $18.94 billion.

Remember: depreciation only begins when an item is actually put into service. If Microsoft, Google, Amazon and Meta are sitting on tens of billions of dollars of yet-to-be-installed GPUs, and said GPUs are only being installed at a snail’s pace every quarter, that means that these depreciation figures are set to grow dramatically. In fact, year-over-year, Google’s depreciation has jumped 30.7%, Amazon’s 24.7%, Microsoft’s 23.9%, and Meta’s an astonishing 34.9%.

And that’s with an extremely slow pace of deployment.

I do kind of see why the hyperscalers are sinking capex into these big AI infrastructure gigaprojects now, though. Shareholders are currently tolerating the capex because they think stuff is coming online, and that’s where the “incredible value” is.
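The point that depreciation only starts once equipment is placed in service can be illustrated with a straight-line schedule. This is a hypothetical sketch, not any hyperscaler’s actual accounting: it assumes a six-year (24-quarter) useful life and compares $12 billion of GPUs placed in service all at once with the same $12 billion trickling into service evenly over eight quarters. All dollar figures here are illustrative, not reported numbers.

```python
# Straight-line depreciation: expense starts only in the quarter an asset
# is placed in service. All figures below are hypothetical, for illustration.
USEFUL_LIFE_QUARTERS = 24  # assumption: six-year useful life


def quarterly_depreciation(placements, horizon=8, life=USEFUL_LIFE_QUARTERS):
    """placements maps quarter index -> dollars placed in service that quarter.

    Returns straight-line depreciation expense for quarters 0..horizon-1.
    """
    expense = [0.0] * horizon
    for start, cost in placements.items():
        per_quarter = cost / life
        for q in range(start, min(start + life, horizon)):
            expense[q] += per_quarter
    return expense


fast = quarterly_depreciation({0: 12e9})                     # all installed at once
slow = quarterly_depreciation({q: 1.5e9 for q in range(8)})  # trickled in over 2 years

print([x / 1e6 for x in fast])  # $M per quarter: 500.0 every quarter
print([x / 1e6 for x in slow])  # $M per quarter: 62.5, 125.0, ... up to 500.0
```

The slow rollout books less expense early on, but its quarterly charge climbs every quarter as each new batch starts depreciating, which is the ratchet the text describes.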
When a $20 billion or $30 billion quarterly depreciation bill first rears its head (as I said, Amazon is close, reporting $18.945bn in depreciation and amortization expenses in the most recent quarter), it’ll become obvious that the only people seeing value from AI are Jensen Huang and the massive construction firms slowly building these projects.

Actually, it’s probably important to state that I don’t think the majority of these projects are doing anything untoward. I just don’t think any of them realized how difficult it is to build a data center, and unlike basically any other problem the tech industry has ever faced, simply throwing as much money as possible at it doesn’t really change the limits of physical construction.

I think every one of these data center projects is its own individual construction nightmare, and thanks to the general market psychosis around the AI bubble, nobody has thought to question the core assumption that these things are actually getting built.

With all that being said, I’m not sure that anyone building these things is moving with much urgency either. Perhaps they don’t need to. Perhaps hyperscalers are happy, because they can both continually string out the AI narrative and put off those massive blobs of depreciation.

But we really do need to reckon with the fact that nearly two years in, Stargate Abilene has only two buildings’ worth of actual, operational, revenue-generating capacity, and nobody has given me an answer as to how it doesn’t have even a quarter of the 1.7GW of power it’ll need to turn everything on, if it ever gets fully built. Maybe they can really pick up the pace, but as of early April, barely any actual gear was in the third building.

And then we get to the other problem: Oracle. As I’ve discussed before, Oracle is building 7.1GW of total capacity for OpenAI, and keeps (laughably!) saying it’ll be done in 2027 or 2028, when at this rate, Stargate Abilene won’t be done until mid-2027, and the rest will either never get finished or will be done in 2030 or later.

This is setting up a horrifying situation where Oracle desperately needs OpenAI to pay it for capacity that doesn’t exist, and if that capacity ever gets built, it’s likely to be years after OpenAI has run out of money. That’s the same problem that Microsoft, Google, and Amazon have with their $748 billion of deals with Anthropic and OpenAI, though thanks to the $340 billion or more necessary to build the Stargate data centers, Oracle’s problems are far more existential.

I’ve repeatedly (and correctly!) said that the problem is that these companies didn’t have the money to pay for their capacity, but Oracle lacks Microsoft or Google’s existing profitable businesses to fall back on if these data centers are delayed, with its existing business lines plateauing and its only real growth coming from theoretical deals with OpenAI and GPU compute with negative 100% margins.

Anthropic’s desperation for new sources of compute also suggests that it’s bonking its head against the limits of its capacity, and will continue to do so as long as it continues to subsidize its users. I also think that the slow pace of construction will eventually lead to OpenAI facing similar problems. These companies need to keep growing in order to keep raising the hundreds of billions of dollars in funding necessary to pay Oracle, Google, Microsoft, and Amazon their respective pounds of flesh.

It’s now very clear that the whole “inference is profitable” and “most compute is being used for training” myths are dead, because if they weren’t, Anthropic would either need way more compute or way higher-quality compute. Colossus-1 was specifically built as a training cluster, yet its current use is “reduce rate limits for our subsidized AI subscriptions,” which is most decidedly inference provided by three-year-old hardware.
Despite writing over 9,000 words and driving myself slightly insane trying to find out, I still haven’t got an answer as to how much actual data center capacity has come online. Hyperscalers have clearly been retrofitting old data centers to fit their new chips, and based on my research, I can find no compelling evidence that they’ve added more than a few hundred megawatts apiece since 2023.

What I do know is that, across the board, a data center of anything above 50MW (or lower, in some cases) takes anywhere from 18 to 36 months to complete, and nobody has actually built a gigawatt data center despite how many people discuss them. For example, Kevin O’Leary (known as “Mr. Dogshit” to his friends) is allegedly building a 9GW data center in Utah, but he may as well say that he’s building a unicorn that shits Toyota Tacomas, as doing so is far more realistic than a project that will likely cost $396 billion, assuming that locals and bankers don’t drag him to The Other Side like Dr. Facilier.

Nobody has built a 1GW data center, so I severely doubt Mr. Dogshit will be able to do anything other than create another scandal and lose a bunch of people’s money.

In other words, any time you hear about a “new data center project,” add a year or two to whatever projection they give. If it’s 2027, assume 2029, or that it never gets built. Anything being discussed as “finished in 2030” may as well not exist.

In any case, what I’m suggesting is that very, very few data centers are actually getting finished, and if that’s true, NVIDIA has sold years’ worth of chips that are yet to be digested. And if that’s true, somebody is sitting on piles of them.

I’m trying to be fair, so I’ll assume that an unknown number of data centers got retrofitted to fit Blackwell GPUs. But I also refuse to believe that even half of the three million Blackwell GPUs that got shipped have actually been installed. Where would they go?
You can’t use the same racks for them that you would with an H100 or H200, because Blackwell requires so much goddamn cooling. Another sign that these things aren’t actually getting installed is the $1.4 billion or so of B200 GPUs left in Supermicro’s inventory from a canceled order from Oracle.

Why not? Isn’t this meant to be a chip that’s extremely valuable? Isn’t there infinite demand? Is there not a place to put them? Apparently Oracle wanted to use faster GB200 GPUs from Dell, but why aren’t there other customers lining up to buy these things?

Also… how was Oracle able to cancel an order of over a billion dollars’ worth of GPUs?

Can anybody do that? Because if they can, one has to wonder whether this will start happening as people realize these data centers aren’t getting built. Pick a data center. It’s probably barely under construction, or if it’s “finished” it’s actually “partly done” with no real guide as to when the rest will finish.

Remember that $17 billion deal that Microsoft and Nebius signed? The one that’s a key reason why Nebius’ stock is on a tear? Well, its existence is based on the continued construction of a data center out in Vineland, New Jersey, which faces massive local opposition, and multiple sources now confirm that construction has been halted due to local planning issues. The data center is horribly behind schedule already, and Microsoft has the option to cancel its entire contract if Nebius fails to meet milestones. That data center is a major reason that people value Nebius’ stock! The deal cannot make a dollar of revenue without it! It has the funds and blessing of Redmond’s finest — the Mandate of Heaven! — and it can’t get things done! This is bad, and indicative of a larger problem in the industry — that it’s really difficult to build data centers, and for the most part, they’re not being fully built! You’ve heard plenty about data centers getting opposed and canceled — how about ones that fully opened?
No, really, if you’ve heard about them please get in touch, because it’s really difficult to find them. Why don’t we know? This is apparently the single most important technology movement since whatever the last justification somebody made up was; shouldn’t we have a tangible grasp of it? Because the way I see it, if these things aren’t coming online at the rate that people think, we have to start asking for fundamental clarity from NVIDIA about where the GPUs are, and when they’re coming online.

NVIDIA’s continually growing valuation is based on the conceit that there is always more demand for GPUs, and perhaps that’s true, but if this demand is based on functionally selling chips two years in advance, that makes NVIDIA’s yearly upgrade cadence utterly deranged. Buy today’s GPUs! They’re the best, for now, at least. By the time you plug them in they’re gonna be old and nasty. But don’t worry, it’ll take two years for you to install the next one too! To be clear, Blackwell GPUs are absolutely being installed! But three million of them?

People love to use “enough to power two cities” to illustrate these points, but I actually think it’s better to illustrate in real data center terms.

Stargate Abilene has taken two years to build two buildings of around 103MW of critical IT load each. Three million B200 GPUs work out to about 3.6GW of IT load. Do you really think that nearly thirty-five Stargate Abilene-scale buildings were built in 2025? If so, where are they, exactly? You may argue that other data centers are smaller, and thus easier to build. So why can’t I find any examples of where they’ve done so?

By all means prove me wrong! It’s so easy! Just show me a data center that was announced or broke ground in 2023 and find obvious proof it turned on. I’ll even give you credit if it’s partially open! The problem is that I keep finding examples of “partially complete,” and those are the only examples of “finished” data centers.

Isn’t this a little insane?
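The building comparison is simple arithmetic, and it is easy to rerun with your own numbers. A small Java sketch follows. The figure of about 1.2kW of IT load per GPU is not stated in the post; it is simply what 3.6GW divided by three million GPUs implies. The 103MW per building is the Abilene figure used above.

```java
final class BlackwellLoad {
    /** Total IT load in megawatts for a GPU count at an assumed per-GPU share of IT load (kW). */
    static double totalMegawatts(long gpus, double kwPerGpu) {
        return gpus * kwPerGpu / 1_000.0;
    }

    /** How many buildings of a given IT capacity that load would fill. */
    static double buildingsNeeded(double totalMw, double mwPerBuilding) {
        return totalMw / mwPerBuilding;
    }

    public static void main(String[] args) {
        // Assumption: ~1.2 kW per GPU, i.e. 3.6 GW / 3,000,000 GPUs.
        double totalMw = totalMegawatts(3_000_000L, 1.2);
        System.out.printf("IT load: %.1f GW%n", totalMw / 1_000.0);
        System.out.printf("103 MW buildings needed: %.1f%n", buildingsNeeded(totalMw, 103.0));
    }
}
```

With these inputs, the result is about 35 buildings, which is where "nearly thirty-five" comes from. A lower per-GPU figure shrinks the count proportionally but still leaves dozens of Abilene-sized buildings.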
This is all we’ve heard about for years, and everybody is ACTING like these things exist at a scale that I’m not sure is actually true!

I expect a fair amount of huffing and “well of course they’re coming online” from the peanut gallery, but come on guys, isn’t this all kind of weird? Even if you want to marry Sandisk and name your children “Western” and “Digital,” why can’t you name, with your whole chest, several data centers that got finished? We have macro-level “proof,” but when you try and look at even a shred of the micro you find a bunch of guys with their hands on their hips saying “sorry mate that’ll be another $4 million.”

Something doesn’t line up, and it’s exactly the kind of misalignment that happens in a bubble — when infrastructural reality disconnects from the financials. NVIDIA is making hundreds of billions of dollars and it’s unclear how much of it is from GPUs installed in operational data centers. It feels like Jensen Huang might have run the largest preorder campaign of all time.

This has massive downstream consequences. Sandisk, Samsung, SK Hynix, Broadcom, AMD, Microsoft, Google, Oracle, and Amazon’s remaining performance obligations total [find] and are dependent on being *able* to sell gigawatts’ worth of computing gear or compute access. If data centers are not getting built in anything approaching a reasonable timeline, that makes the future of these companies only as viable as the construction projects themselves. Even if you truly believe Anthropic will be a $2 trillion company and a $200 billion customer of Google, the compute capacity has to exist to be bought, and it does not appear to have been built or, in many cases, to have progressed any further than the earliest stages of construction.

If they don’t get built in the next few years, there’s no space for that solid-state storage or those Instinct GPUs. There’s no reason for NVIDIA to have reserved most of TSMC’s capacity, either.
There’s also no reason to get excited about Bloom Energy, as it’s not making real revenue on those until Oracle finishes its data centers sometime between the next two years and never.

And if they don’t get built, hundreds of billions of dollars have been wasted, with large swaths of those billions funded by private credit, which in turn is funded by pension, retirement, and insurance funds. I’ve got a bad feeling about this.

Microsoft claims to have brought around 4GW of data center capacity online in the last two years, but it’s unclear how much actually got built. In an analysis of all announced groundbreakings and land acquisitions, it appears that Microsoft has only finished the first phase of its Atlanta and Wisconsin data centers.

It is unclear where this capacity could be. When Mr. Nadella said on his most recent earnings call that Microsoft had (and I quote) “added another gigawatt of capacity this quarter,” did he mean active, revenue-generating capacity?

In the event he did not, what did he mean? How much active, revenue-generating capacity has Microsoft brought online in FY2026 so far? Outside of Fairwater Wisconsin and Atlanta, where has that capacity been built?

Microsoft’s latest update on the Hickory/Stover site is that it “will” begin “initial site setup and earthwork activities” as of February 2026, and it appears the contractor has changed from Ames Construction to Clayco.

The latest Microsoft update on the Boyd Farms site is that it started construction on April 1, 2024. A February 2026 piece from the Charlotte Observer claimed it had started construction again after a 10-month (!) delay.

The latest Microsoft update on the Lyle Creek site, which it says began construction in March 2024, is that its contractor, Whiting-Turner, “will begin initial site preparation once weather conditions allow” as of February 2026.
A press release from a Canadian satellite firm from February 2026 said that it had “identified renewed construction activity at all three of Microsoft’s permitted data center campuses in Catawba County, North Carolina.”

Novva’s 60MW data center in Reno, Nevada, announced in May 2023 and operational as of July 2025, or around 26 months.

Edged Energy’s 36MW Phoenix, Arizona data center that broke ground in August 2024 and opened in April 2026, or around 20 months.

Duos Edge AI’s 450kW (lol) data center in Corpus Christi, Texas that was announced in July 2025 and opened in May 2026, or around 10 months.

Edged Energy’s 24MW Columbus, Ohio-based data center that broke ground in August 2024 and opened in September 2025, or around 13 months.

American Tower’s 1MW (scalable to 4MW!) Raleigh, North Carolina data center that broke ground in June 2024 and came online in May 2025, or around 11 months.

EdgeCore’s 36MW Santa Clara, California data center campus that broke ground in January 2023, said it would be “energized in Q1 2024,” and opened in September 2025, or around 32 months.

Edged Energy’s “180MW” data center in Atlanta broke ground in July 2023, and around 33 months later, in April 2026, it managed to top off a single 42MW building.

EdgeCore’s two-building, 216MW campus that broke ground in August 2023 with plans to complete “as early as late 2025” is, as of March 2026, still under construction.

Edged Energy broke ground on a 100MW data center in Aurora, Illinois in May 2023, and as of February 2025 had successfully opened (per DataCenterDynamics) “phase 1,” 24MW of capacity. But its own press release from the same day referred to it as 96MW, choosing not to mention any phases or separate buildings, something it has done since before the 24MW phase was complete.
CyrusOne’s 40MW Aurora, Illinois data center broke ground in October 2024, which was apparently so significant that CyrusOne announced that it had broken ground a second time on January 28, 2025. Confusingly, CyrusOne links this Bilter Road campus to another on Diehl Road, which may or may not be the same project, and which as of May 2026 is still very much under construction. As of March 2026, locals were still opposing the data centers, slowing down the process further.

Vantage’s “192MW” OH1 data center in New Albany, Ohio broke ground in October 2024, with its first phase due to go live sometime in 2025. As of August 2025, Vantage had topped off the second building, and per its own website about OH1, the first building was meant to be operational in December 2025, but it’s unclear whether it actually opened.

PowerHouse’s 65MW data center campus in Reno, Nevada broke ground in October 2024, and its website states that “delivery” will happen in April 2026, with “construction/delivery” due “Q3 2024 to Q2 2026.”

Oppidan’s Carol Stream, Illinois data center broke ground in November 2024, with the “first phase” due to go live in 2026. Per Cleanview, it is still “planned.”

DataBank’s 20MW Ashburn, Virginia “IAD4” data center that broke ground in July 2024 was “set to go live in Q1 2026,” and as of May 2026 is still referred to in the future tense on DataBank’s website.

Aligned’s 96MW “NEO-01” Ohio-based data center that broke ground in May 2024 was “scheduled to be opened by end of this year” as of March 2026.

Aligned’s 72MW Hillsboro, Oregon data center campus broke ground in October 2023 and topped off the first building in July 2024 (Aligned plans a separate building, too!), and as of May 2026, Cleanview still marks the first one as “planned.”

Flexential broke ground on a Denver-based 22.5MW data center in October 2024, and as of April 8, 2026, a local Facebook group has said that it will be operational by January 2027. Flexential, on the other hand, has been referring to it as “the new build” (in terms that make it sound like it was built) as far back as February 2025.

If hyperscalers are truly not bringing on that much capacity, they cannot make those hundreds of billions of dollars from Anthropic and OpenAI. The current “AI compute demand is insatiable” narrative is utterly false, and a direct result of a lack of capacity coming online.
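The build times in the list above are month counts from announcement or groundbreaking to opening. Because only the month and year are given, they can be rechecked at month granularity with `java.time`. The sketch below is a quick check. It uses only the projects the list describes as opened or online, labeled by city and size. The day of the month is unknown, so each figure could be off by one in either direction.

```java
import java.time.YearMonth;
import java.time.temporal.ChronoUnit;

final class BuildTimes {
    /** Whole months between two "yyyy-MM" dates; days are unknown, so this is month granularity only. */
    static long monthsBetween(String start, String end) {
        return ChronoUnit.MONTHS.between(YearMonth.parse(start), YearMonth.parse(end));
    }

    public static void main(String[] args) {
        String[][] projects = {
            {"Reno, NV (60MW)", "2023-05", "2025-07"},
            {"Phoenix, AZ (36MW)", "2024-08", "2026-04"},
            {"Corpus Christi, TX (450kW)", "2025-07", "2026-05"},
            {"Columbus, OH (24MW)", "2024-08", "2025-09"},
            {"Raleigh, NC (1MW)", "2024-06", "2025-05"},
            {"Santa Clara, CA (36MW)", "2023-01", "2025-09"},
        };
        for (String[] p : projects) {
            System.out.printf("%-30s %2d months%n", p[0], monthsBetween(p[1], p[2]));
        }
    }
}
```

The output reproduces the 26, 20, 10, 13, 11, and 32 month figures quoted above. It also shows that the projects that finished quickly are the small ones.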


How AI Productivity Fails

Most AI users today get ~10–20% more productive no matter how “game changing” they claim it is or how many lines of code they output. Yet I still think 2x or even 10x+ is both real and reasonably expected. Real transformation requires two changes at once: personal practice and organizational refactoring. Whether the output lands as 10x leverage or 10x slop depends on the practice and the org around it. So far in 2026, I’ve seen exponential increases in output but linear increases in realized impact. This post covers some of the issues I’ve found while “debugging” that gap. I’ve grouped observations and recommendations into:

Personal Pitfalls — guidance for individuals
Organization Pitfalls — guidance for organizations and leadership

Cartoon via GPT Image 2

Personal Pitfalls

You don’t shift left, so you don’t understand what you ship. AI removes the friction that used to force planning. Without something pushing back on bad abstractions, the upfront thinking gets skipped silently. You ship systems you can’t debug or extend, full of blanks AI quietly filled in. Outline first (headers, structure, audience, plan, principles, what-done-looks-like), then let AI fill it in. Review shifts up the stack: outcome-vs-plan instead of line-by-line. The interrogation belongs inside planning: spawn subagent critics to red-team the plan before you let anything generate against it. The easy review at the end is only easy because the hard review at the start was good. If you find it taxing to review your own AI-generated output, you didn’t shift left.

Context overhead doesn’t shrink with task size, and small tasks are mostly edges. AI handles the middle 80% of work well but can be brittle on the first and last 10%: setup, edge cases, final-mile review. With AI, a two-line fix often pays the same setup cost as a full feature, so you spend more time briefing the agent than it would take to write the fix yourself. Even then it lacks context, ships something subtly wrong, and you redo it.
Lift task ambition until the context cost is justified. The heuristic: if it’s smaller than a meaningful unit of work (a PR, a section, a chart, a campaign), it’s probably too small. Starting “small” with AI might be the reason you don’t graduate to larger leaps of AI-driven outcomes.

AI scales generation, not your human-bound working memory. The ceiling on parallel agents is how many threads a human can hold without dropping context or understanding, and that number is small. You either single-thread with a coffee break while you watch it code, or overshoot and abandon five threads. Stay at ~three or fewer active in your cognitive window, or close the loop (next section) and pull fully out. If you are only running a single session at a time, you are probably not delegating enough. If you are struggling to manage dozens of sessions, consider whether you even need to be “managing” them, whether sessions should take longer leaps (fewer agents that do more work), or whether to intentionally stay linear until you invest in the right skills and context.

Closing the last 1% takes verification infrastructure and a precise definition of done: a different discipline than generation, so people skip it and layer AI on top of existing handoffs instead. You become the screenshot human, and the agent waits on your click or upload to validate its work. Close the loop end-to-end: tests, queries, sandboxes, browser tools, type checks, real API responses, whatever lets the agent see its own output and iterate without you. Before automating the surrounding process, delete it. Closed loops are how you exceed the ~3-agent ceiling: work runs underneath your cognitive window because it doesn’t need to be in it.
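In code terms, "closing the loop" amounts to a generate/verify/retry cycle where a machine-checkable verifier, not a human, supplies the feedback. Below is a minimal Java sketch. `Generator` and `Verifier` are hypothetical stand-ins for an agent call and for whatever check you wire up (tests, type checks, real API responses). They are not any particular tool's API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

final class ClosedLoop {
    /** Hypothetical: produces a candidate for the task, given all feedback so far. */
    interface Generator { String generate(String task, List<String> feedback); }

    /** Hypothetical: empty if the candidate passes, otherwise an error message to feed back. */
    interface Verifier { Optional<String> check(String candidate); }

    /** Retries until the verifier passes or attempts run out; a human only sees the end result. */
    static Optional<String> run(String task, Generator gen, Verifier verifier, int maxAttempts) {
        List<String> feedback = new ArrayList<>();
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String candidate = gen.generate(task, Collections.unmodifiableList(feedback));
            Optional<String> error = verifier.check(candidate);
            if (error.isEmpty()) {
                return Optional.of(candidate); // verified without a human in the loop
            }
            feedback.add(error.get()); // the agent sees its own failure and iterates
        }
        return Optional.empty(); // give up and escalate to a human
    }

    public static void main(String[] args) {
        // Toy generator that "fixes itself" after two rounds of feedback.
        Generator gen = (task, fb) -> fb.size() >= 2 ? "ok" : "draft-" + fb.size();
        Verifier verifier = c -> c.equals("ok") ? Optional.<String>empty() : Optional.of("rejected " + c);
        System.out.println(run("demo", gen, verifier, 5));
    }
}
```

The design choice that matters is the return path: the person is only pulled in when the loop gives up after `maxAttempts`. That is what lets this kind of work run outside your cognitive window.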
Believe it or not, AI can handle the “fuzzy” verifications well too, given the right pre-defined context. That includes the “does this actually make sense for long-term system architecture” and “is this the right feature to be adding to the product” questions that I often hear naively claimed as something only a human in the loop can do.

Our chat interfaces often treat every session as ephemeral. Taste is a reusable artifact, but the interface gives you nowhere to deposit it. So encoding never starts, or you overcorrect and write one skill per task, ending up with a directory of one-shots nobody else can use. Build skills for the class of task, not the instance. Some are specs (how to do a thing); others are principles (how to think about a class of things). The durable artifact is the skill (often literally a markdown file), not the prompt. The recognition signal is concrete: any time you find yourself editing or “bullying” the output for a better answer, that’s a rule you can codify once and stop bullying forever. Consider even meta-skills that regularly take feedback (from you or across the users of a process) and make the right skill or context modifications from accumulated learnings. Refuse to edit an AI output manually (i.e., literally typing over it) or in any way that doesn’t feed into learnings for next time. Tell it why it’s being dumb and how you think, and make sure that sticks for next time.

Skill comes from cognitive struggle, and AI removes the struggle by completing the thought before you’ve had it. The learning loop never closes—you can’t tell when the model is wrong, you can’t operate without it, the domain skill quietly atrophies. You grow through resistance: edit, interrogate, override. Bootstrap by using AI on tasks where you’re the domain owner, so the friction is real and the corrections you push back are correct. Juniors get hit hardest: they offload cognition before building the capacity to evaluate output.
The people who can evaluate (taste-holders, domain owners) need to be the ones encoding skills and quality bars, even if, traditionally, the most senior ICs and managers didn’t touch the codebase. No task should be permanently too hard for an AI system. As it learns, human effort moves up the stack to architecting the system AI builds: a place where AI is genuinely worse and where the friction lives. AI will get good at that as well, so you just keep moving up to harder and harder meta-level, derivative skill building. Thanks for reading Shrivu’s Substack! Subscribe for free to receive new posts and support my work.

The personal pitfalls above and the organizational ones below are one problem at two scales. AI optimizes individual roles but leaves the process that constrains them intact. Personal practice is where you collapse the steps; organizational design is where you collapse the handoffs. The gains only show up when there’s shared ambition large enough to do both.

Usage is easy to measure; impact is hard. Managers and leadership often carry the cultural pressure to praise visible use over invisible value. Token counts land in perf reviews, and in the next cycle Claude loops get left running to inflate them. Teams ship new AI-shaped systems instead of fixing the existing ones that matter. Reward what shipped, not what got used. Track usage as a leading indicator for enablement and resistance (where to focus training and tooling, especially early in a rollout), but never as the long-term goal. Critically, and this is where I’ve seen the most confusion, pure short-term impact with no recurring AI-powered leverage (often) does not maximize long-term business goals. Weight outcomes by the reusable leverage they leave behind: closed loops, AI-friendly architected systems, codified skills, and shared context that make the next ship exponentially cheaper than this one.

Build cost used to be the de facto curator. Only worthwhile tools got built.
AI removed that filter without replacing it: people keep shipping, and not everyone has the right taste filter. Discovery becomes harder than building another tool, and context gets inconsistently duplicated across teams and roles. An architect-owner at the top should be accountable for the taxonomy across what you build, what you buy, and which providers you standardize on. Pilots at the bottom—small, scoped, not broadcast prematurely—are how new tools earn graduation, with telemetry and explicit retirement criteria so dead tools get pulled. Consolidate tools where you can and enforce a maintained shared context layer they pull from.

Authoring used to require the expertise being authored. AI broke that coupling, so production no longer signals authority. You get ten invokable ways to do anything, knowledge fragmented across wikis and CLAUDE.mds, and personal skills bleeding into shared ones with no quality bar. A top-down architect or domain expert should decide the core skill set and assign ownership to the taste-holders who’ll build it. If the people who know what good looks like aren’t writing the skills, the skills are mediocre by default. Personal vs. shared context is an explicit split: personal skills can be loose, but shared skills are operating practice and have to be built like it, with review and a real quality bar.

Generation outpaces review, AI doesn’t progressively disclose, and authorship gets fuzzy when “the prompt” wrote it. The result is long docs nobody reads, PRs reviewers can’t keep up with, and slop that ships because no one wants to compromise “productivity.” Hold people accountable for the artifact even when AI generated it; build a pushback culture with harsh, specific feedback when output crosses into slop. Use AI, and assume that at the end of the day all content will be AI-generated. Despite this, you have to understand what you ship well enough to defend it under questioning. Reviewers should refuse to be the debugger of last resort.
People shipping AI output need domain ownership themselves, along with consistent reliance on skills built and maintained by domain owners.

Most orgs are organized by function, so getting anything done means handoffs. Coding was always ~20% of the cycle (a strawman figure; see the footnote); the other 80% was approvals, reviews, and syncs. AI (if you are doing it right) compressed the 20% to near-zero, leaving the 80% as the entire bottleneck. A 5-minute fix sits 3 days in review; sync meetings spring up to unblock work AI already finished. Loop ownership should replace function ownership: one person closes the chain from problem to deployment, with the right guardrails so they can move without sacrificing function-level taste. Specialists shift to platform—encoding their taste into the systems, prompts, and context that loop owners’ agents use. Bottom-up speed alone hits a wall here; the org has to reorient around loops to absorb the acceleration. I call this transposing your organization.

Mandates transmit behavior, not taste or judgment. Without ground truth at the top, each layer strips intent on the way down. Engineers get pushed into reviewing AI slop instead of being elevated into architects of it—the job becomes downstream cleanup, not upstream design, and replacement fear takes root underneath. Mandate is still a powerful lever. What’s often missing is clarity: explicit expectations, updated role definitions, and the why behind the importance and urgency of AI adoption. Be intentional about when top-down is the right lever and when bottom-up is. And don’t kill the fun of building. People like building things, and you want people to like what they do — it’s leadership’s job to make sure they are building useful things.

Bottom-up energy needs a target to compound against. Without a shared definition of return, every team optimizes locally and the gains never aggregate. Usage looks great while token spend decouples from business outcomes, and sprawl goes unchecked.
Make ROI legible: every team should articulate the return on its AI investment, even crudely. Wire working behaviors—closed loops, codified skills, outcome-aligned spend—into career pathways so the right behaviors get rewarded structurally, not just culturally.

The 2x version of you isn’t a token-count away. AI multiplies what you already do well and whatever your org already enables. If your practice is loose, AI compounds the looseness. If your org runs on handoffs, AI accelerates the frequency of handoffs. Both have to change at once, or neither change matters. The 10–20% is free at this point. Anything past that is rebuilding: personal practice on one side, organizational design on the other. I think most folks still have an uncomfortable amount of self-refactoring left to do.

This post as a slide. Trying this out using GPT Image 2 in case folks find it useful.

Footnote: This is a made-up strawman number Claude suggested. Pick whatever x% and (100 - x)% split you’d like for whatever AI-augmented task you care about.
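The footnote invites you to pick your own split. One way to see why compressing only the coding share caps the overall gain is Amdahl's law. This is my framing, not the author's. If a fraction p of the cycle gets s times faster, the whole cycle gets 1 / ((1 - p) + p / s) times faster. Here is a minimal Java sketch using the strawman p = 0.2:

```java
final class CycleSpeedup {
    /** Amdahl's law: overall speedup when a fraction p of the work becomes s times faster. */
    static double speedup(double p, double s) {
        if (p < 0 || p > 1 || s <= 0) {
            throw new IllegalArgumentException("need 0 <= p <= 1 and s > 0");
        }
        return 1.0 / ((1.0 - p) + p / s);
    }

    public static void main(String[] args) {
        double p = 0.20; // strawman share of the cycle spent coding (see the footnote)
        for (double s : new double[] {2, 10, 100, Double.POSITIVE_INFINITY}) {
            String label = Double.isInfinite(s) ? "infinitely" : (int) s + "x";
            System.out.printf("coding %s faster -> whole cycle %.2fx faster%n", label, speedup(p, s));
        }
    }
}
```

With p = 0.2, the whole cycle never gets more than 1.25x faster, however fast coding becomes. That is in the same range as the 10–20% the post opens with. Raising the ceiling means shrinking the (1 - p) handoff share itself, which is what the loop-ownership recommendations target.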

David Bushell 3 weeks ago

Unscrewing lightbulbs

“Giving lightbulbs a MAC address was a mistake that I’m living with. I’m literally unscrewing lightbulbs to renew their DHCP lease” (@dbushell.com on Bluesky)

Instead of enjoying the bank holiday Monday I updated my homelab software. I was ‘inspired’ by the Copy Fail Linux bug to run full distro upgrades. This is my self-hosted update for Spring 2026 (rough documentation to give future me a chance). Monday’s fun risked a week of pain. I do have backups, but restoring them on a broken LAN is tricky. I have an ISP-provided WiFi router to dust off in an emergency, along with an absurdly long 15 metre HDMI cable I do not care to unravel. My winter update added a hardware fallback, but that too requires careful rejigging. I have Proxmox hosts, virtual machines, and Raspberry DietPis. They were all on Debian 12 (Bookworm) with a kernel potentially susceptible to the bug. Minimal Debian installs are perfect because I run everything in Docker anyway. Data volumes are easy to back up or network mount. I can change host at will for any service. Debian is just sensible, well-documented, no-fuss Linux. I used to run “minimal” Ubuntu Server. Following 24.04 I found myself debloating most of the Ubuntu part (i.e., snaps). It sounds like the new coreutils are a CVE party. Glad I escaped before that drama! As it happens, this week’s Linux Unplugged episode had Canonical’s VP of Engineering spewing embarrassing AI platitudes. “Ubuntu is not for you” was the only thing said worth remembering. I updated most of my VMs first because they’re easy to restore if anything fails. I followed Lubos Rendek’s guide: start with a full package update, then change the package sources before running another step-by-step upgrade. The only non-Debian sources I have are Docker and Tailscale. Yes, that means I run Docker inside Proxmox VMs — and you can’t stop me! That’s not even my worst crime… After the Trixie upgrade I found VMs were failing to obtain a LAN IP address.
The virtual network device had been renamed from to . I edited and just changed the reference. There is surely a better, more predictable fix, but this was the quickest. The same name was used across all VMs so I guess 18 is the magic number. Everything has been stable so far. If issues arise I’ll just nuke and pave from a Debian 13 ISO. Docker config and volumes are backed up independently of the VM images. DietPi has a long Trixie upgrade post I didn’t read. I just curled to bash: I gave the script a cursory glance before hitting enter. I have a Pi 4 running failover DNS and a Pi 5 running my public Forgejo instance. DietPi is ideal because of the tiny footprint; I run Docker here too. Raspberry Pi still hasn’t merged upstream Copy Fail fixes. I’m already in trouble if this bug can be exploited, but I applied the temporary fix out of caution. I wasn’t going to bother with Proxmox 9, but after a GUI update I was informed that version 8 reaches “end of life” in August 2026. That is soon! I followed the official upgrade guide on my Mini-ITX server. Proxmox has a tool to check compatibility. I saw no red lights, so I stopped all VMs, updated package sources to Trixie, and ran the upgrade. It is critical to run again before rebooting. I ran into the systemd-boot issue. Apparently if this is not removed the system fails to boot. If my particular box fails to boot I’m in big trouble, because I broke video output and have yet to fix it. I have another Proxmox machine running virtualised OPNsense for my home router. I can’t stop the OPNsense VM and upgrade the host to Proxmox 9 because the host would have no network access. I had two options: use my failover VM, or YOLO it live. I specifically set up option 1 for such a purpose. I went with option 2. I figured any software running in memory is still alive until I reboot, right? I didn’t question whether Proxmox would kill any processes itself (it didn’t). The update was suspiciously fast. I ran again and saw a lot of yellow warnings. Yikes.
Eventually I noticed I’d failed to update some sources to Trixie and I’d installed a franken-distro. After fixing my mistakes, all I could do was reboot and pray for an agonising two minutes. OPNsense is the only non-Debian operating system in my homelab. I manage it entirely via the web GUI. The 26.1 update had quite a few significant changes. My DHCP setup was considered “legacy” and my firewall rules required a manual migration. Despite dumbening my smart home, my lightbulbs still demand a WiFi connection. I program them myself to avoid Home Assistant and proprietary apps. Turns out I hard-coded IP addresses (discovery protocols are a joke). Despite having dynamic IPs, they remained stable until the OPNsense 26.1 DHCP update. I had no easy way to identify each light. Why would they name themselves anything useful? That’s how I ended up unscrewing the bulbs one by one to see which MAC address fell off the network. I gave them static IPs on a VLAN for future me to appreciate. And with that, my home network is up to date! Thanks for reading! Follow me on Mastodon and Bluesky. Subscribe to my Blog and Notes or Combined feeds.
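The bulb hunt boils down to a set difference. Snapshot the MAC addresses the router currently sees, unscrew one bulb, and snapshot again. Whatever vanished belongs to that bulb. The Java sketch below is a minimal version. It assumes you already have the two snapshots as lists of strings, and it does not show how you export them from the router. The snapshot should reflect devices that are online right now, not lease history, because a DHCP lease can outlive the device. The MAC addresses in `main` are made up.

```java
import java.util.Collection;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

final class BulbFinder {
    /** Normalises "AA-BB-CC-DD-EE-FF" and "aa:bb:cc:dd:ee:ff" to the same form. */
    static String normalize(String mac) {
        return mac.trim().toLowerCase(Locale.ROOT).replace('-', ':');
    }

    /** MAC addresses present in the "before" snapshot but missing from "after". */
    static Set<String> vanished(Collection<String> before, Collection<String> after) {
        Set<String> gone = new TreeSet<>();
        for (String mac : before) gone.add(normalize(mac));
        for (String mac : after) gone.remove(normalize(mac));
        return gone;
    }

    public static void main(String[] args) {
        List<String> before = List.of("AA-BB-CC-00-00-01", "aa:bb:cc:00:00:02");
        List<String> after = List.of("aa:bb:cc:00:00:02"); // one bulb unscrewed
        System.out.println(vanished(before, after)); // the MAC of the unscrewed bulb
    }
}
```

Once each MAC is matched to a bulb, that mapping is exactly what a static DHCP reservation needs.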

Stavros' Stuff 3 weeks ago

Adding a feature to a closed-source app

I use Audiobookshelf (abbreviated ABS) for all my legal audiobooks that I bought legally, and I really like it. I also use the Smart Audiobook Player (abbreviated SABP) Android app, which I also bought (legally this time) to listen to books, because it has the strongest featureset out of all the apps I’ve tried, particularly when it comes to navigating around books. Unfortunately, there’s one problem: SABP can’t synchronize my reading progress with the ABS server, which is inconvenient for me. I use SABP when cycling or walking, but use other apps that integrate deeply with ABS (mostly Lissen and ABS’s own app) on my car’s Android console, and the lack of syncing between the two is a major pain. The ABS-compatible apps are mostly open source, and what better way to contribute to open source than to submit some patches that add the features I like? “However”, I thought, “why not not do that, and instead see if I can add Audiobookshelf syncing to the app?” “Yes”, I decided, “this sounds reasonable, despite SABP being a closed-source Android app, a platform with which I have zero familiarity”. What I do have familiarity with, though, is telling Claude what to do and steering it along. Therefore, I decided I would do the impossible , and use LLMs to add ABS syncing to SABP ! The first step was to see whether this is possible at all. Android apps come as APKs, which are just zip files containing bytecode. The first thing I did was to ask Claude to decompile the app (even though I didn’t really know if that was possible, or how it was done). Luckily, all this required was to run and on the files in the APK. is a utility that turns bytecode into a textual representation (called smali) so that it can be edited. This is a lossless, reversible process (which means you can edit the resulting code and recompile it back into the app), but the textual representation is basically assembly, and pretty hard to work with. 
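Because an APK is an ordinary zip archive, the JDK's `java.util.zip` can already peek inside one before any decompiler gets involved. The minimal sketch below lists the top-level dex files, which hold the compiled bytecode. An app has `classes.dex`, plus `classes2.dex`, `classes3.dex`, and so on when it ships more than one. The path passed on the command line is whatever APK you pulled from your device.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

final class ApkDexLister {
    /** Matches top-level classes.dex, classes2.dex, ...; nested paths don't match. */
    static boolean isDexEntry(String entryName) {
        return entryName.matches("classes\\d*\\.dex");
    }

    static List<String> dexEntries(String apkPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile apk = new ZipFile(apkPath)) { // an APK opens like any other zip
            Enumeration<? extends ZipEntry> entries = apk.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (isDexEntry(name)) names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        for (String name : dexEntries(args[0])) System.out.println(name);
    }
}
```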
, on the other hand, decompiles to (hopefully) readable Java, but is useful only for illustration; you can’t recompile it back into an app, and you can’t really edit it in any way. Some developers use obfuscation tools (like ProGuard) to make their decompiled code much more opaque and hard to read. So, the question at this stage was whether the app could be decompiled, and how readable the resulting output would be. Running the tools gave some promising results: the app was fairly readable, with even human-readable class names having been partially preserved! A lot of the code was obfuscated, with names like , , , but I lucked out and enough relevant code was readable that I didn’t have to spend hours piecing things together. This was encouraging, but I still didn’t know whether I could easily inject syncing code into the app. To begin my due diligence, I asked Claude to trace whether there was a point where we could add a hook to send our position to the server. After a bit of digging around, it discovered that one function, , was being called by every code path that saved progress to disk: regular ticks, pauses, file changes, and backgrounding all saved progress using it. The existence of this code path was a stroke of luck, as it meant that I had found a natural point to hook my progress updating into, but Claude did a lot of work to verify that the code paths actually converged. This was great: we had found a single spot where we could hook things, but how could we do the hooking itself? We can’t edit or recompile the decompiled Java, and smali, which we can edit and recompile, is a real pain to write anything significant in. Still, though, the impossible was slowly drifting within my reach. The second part of due diligence was to see for myself how the ABS API worked, so I knew what to send in the payload if I ended up being able to hook into the syncing. I sent a few requests by hand, but kept getting some weirdness.
The times I was submitting didn’t match what I was getting back, and the progress indicator was out of sync with the submitted position in seconds. This was surprising to me, because I know ABS progress syncing works fine with other apps. After some trial and error, I realized that during my testing I had accidentally set to on the book I was testing with, and ABS was resetting the progress when the book transitioned from “finished” to “not finished”. This is a surprising thing to happen, since I’d expect the server to reset when I’m going the other way (i.e. when I finish the book), but I guess the rationale is that I’m starting the book fresh if I mark as on an already-finished book. When I used a non-finished book as the target, the API started responding reasonably, and I had all the info on the endpoints I needed, with their payload shapes, which I gave to Claude. It’s important for me to do this sort of experimentation myself, as edge cases often hide at these API contract boundaries, and I want to build a good mental model of how the change will work before I ask the LLM to implement it. Having the API calls was good, but writing smali code to perform an HTTP request and send/receive JSON would still be taxing work, even for an LLM, and I couldn’t really help here. Luckily, Claude knew that Android makes modding significantly easier than other platforms do: We didn’t have to write smali at all! We could write all the syncing code in bog-standard Java, compile it with into bytecode, create the necessary file with (which ships with the regular Android SDK!), and put that into the tree. Then, we just needed a tiny bit of smali code in to jump to our compiled Java code, and everything should work. This works because Android itself natively supports multiple files in one APK, so you don’t have to hack around anything. The investigation was finished, but now we also needed to actually build the thing (an affair whose success was still not guaranteed).
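To make the “write the hook in plain Java” idea concrete, here is a minimal sketch of what the Java side of such a hook might look like. Everything in it is an assumption for illustration, not the post’s actual ~550 lines: the class and method names are hypothetical, and the endpoint path and JSON field names reflect my understanding of Audiobookshelf’s per-item progress API rather than anything shown in this post. It also clamps the position to the book’s length, a check that comes up in the review later on. Networking is left out so the payload logic can be tested on its own.

```java
import java.util.Locale;

/**
 * Hypothetical Java side of a progress-sync hook. The smali patch would only
 * need a static call into a method like onProgressSaved(); everything else is
 * ordinary Java. Endpoint path and JSON field names are assumptions.
 */
public final class AbsSync {
    private AbsSync() {}

    // Keep a position inside the book: never negative, never past the end.
    public static double clampSeconds(double positionSec, double durationSec) {
        return Math.max(0.0, Math.min(positionSec, durationSec));
    }

    // Assumed ABS route for per-item progress: PATCH /api/me/progress/<libraryItemId>
    public static String progressPath(String libraryItemId) {
        return "/api/me/progress/" + libraryItemId;
    }

    // Build the JSON body by hand so the injected code needs no extra libraries.
    // isFinished is left out on purpose: the experiment described above found
    // that flipping it on the server can reset a book's progress.
    public static String progressBody(long positionMs, long durationMs) {
        double duration = durationMs / 1000.0;
        double current = clampSeconds(positionMs / 1000.0, duration);
        double progress = duration > 0 ? current / duration : 0.0;
        return String.format(Locale.ROOT,
                "{\"currentTime\":%.3f,\"duration\":%.3f,\"progress\":%.4f}",
                current, duration, progress);
    }

    // Hypothetical entry point the smali patch would call after progress is saved.
    // A real hook would send this request on a background thread; here it only
    // returns a description of the request so the logic stays testable.
    public static String onProgressSaved(String libraryItemId, long positionMs, long durationMs) {
        return "PATCH " + progressPath(libraryItemId) + " " + progressBody(positionMs, durationMs);
    }

    public static void main(String[] args) {
        // Position past the end of a one-hour book gets clamped to the duration.
        System.out.println(onProgressSaved("li_example", 3_725_500L, 3_600_000L));
    }
}
```

Keeping the smali side down to a single static call is the point of the design: all the logic that is likely to change lives in Java, where it can be compiled and tested normally.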
Writing the code for this and compiling it into an APK was all Claude, with steering from me. You can read about my exact LLM workflow in my recent post, but it roughly consists of planning (using ticket to write… tickets), implementation, and review steps. Claude discovered that apktool 2.7.0 doesn’t like $-prefixed filenames in the resource table, and decided to use the original manifest, which was fine because we weren’t using custom resources. It also caught a timing bug in the smali patch, where one function needed to be called after another had run, or the BookData field would be stale. These issues did affect the final implementation, and I was relieved that Claude was smart enough to catch and fix them. Claude did a lot of heavy lifting here, and we ended up with ~550 lines of Java, and some smali magic with to jump to our Java code. The code review phase was all LLMs (Opus 4.6/GPT-5.5), and it’s a step I never skip, as I’ve found that it catches most of the bugs. In one case, Claude had written thirty lines of reflection code because it assumed a setter didn’t exist. The reviewer caught that the setter existed, and had Claude use it directly and remove the superfluous code. This is a pattern I see very frequently in LLM-assisted development, where one model will have big blind spots, leading to bugs or departures from the desired functionality. A second review pass with another model generally fixes this, though I’m not sure whether it’s because of different models spotting different things (like “you can’t spot your own typos” for LLMs) or because a second, focused review pass makes the model pay more attention. I suspect it’s a combination of the two. The reviewer also caught a mistaken compression of the resources file, which would have caused the APK to silently fail to install on my device, even though it looked fine.
There was also a race condition that was flagged and fixed in this step, and an instruction to clamp the end timestamp to the book’s length, though I would hope that this check happens on the server too. With the codey bits done, I had to decide how to handle server configuration and book matching. I needed to make a decision on two things: the hostname and API key of the ABS server, and the ID of each book on the server, so the app can submit progress to the specific book without having to rely on name matching. For the server, there were a few options, one of them being adding an “Audiobookshelf” section to the settings and entering the server’s hostname and API key there, but this was too much work, especially finding call sites to patch into existing screens. For the book matching, Claude recommended that we look up the book by name every time we loaded progress, but that was brittle and would break as soon as two books shared a name. I decided to use a config file in the book directory instead: a simple JSON file holding the server details and the book’s ID. This way, the app could load everything it needed with minimal fuss (the Java code could simply read this file at startup). There was something that Claude didn’t catch, and in fact it recommended the opposite: its advice was to only send the timestamp to the server if it was later than the server’s timestamp (i.e. if it was later in the book). I pointed out to Claude that this would create a significant problem: if you seeked to a later position for some reason, you’d never be able to come back from it. The app would keep syncing your position to the later one when loaded and never update the server’s timestamp, which would not only invalidate the syncing but also force you to remember your position manually, quite a big regression from current functionality. This bug would also cause other apps to get their position overwritten with the later one every time SABP loaded. Claude quickly agreed that this was an issue, and changed the code to sync all seeks. Testing it out, I realized that Claude never retrieved the book’s position from the server at all.
I pointed out here that this was necessary to avoid clobbering the position in other apps, because I might use Lissen (and progress there), go back to SABP, and have my (true) progress overwritten by the old position. This was a serious data loss issue that the LLMs completely missed, both in planning/implementation and in review, and one that human involvement solved. The code was now in good enough shape to actually try out, which led to another problem. Android, like basically any modern platform, requires apps to be signed by the developer before they can run. Unfortunately, I’m not the developer of SABP, which means I didn’t have access to the key used to sign the app. This isn’t a big obstacle, since apps can be signed by any key (though Google is trying to force us to show them ID to run our apps on our devices), so I just created my own key and signed the recompiled APK with it using . Unfortunately, this does have one downside: the re-signed app can’t be installed over the old one, so you need to uninstall the old app (and probably lose data) and install the new one again. I opened it up, started playing a book, and verified that the ABS server position got updated. I didn’t even lose any settings, because SABP keeps its settings in a file next to the audiobooks, which wasn’t deleted when uninstalling. Modifying the application to add the feature I wanted worked fine, and, with the increased skill the LLMs gave me, the lack of source access didn’t block me (it merely posed a sizable problem). However, there was still significant friction (what with the decompile dance, smali, figuring out call sites, etc.), and I got very lucky that the code wasn’t more obfuscated. Even with the functionality implemented, though, I can’t share the output, both because of potential legal issues and because it’s just a hassle and will break every release.
The journey was fun, and having an app that works how I want it is helpful, but there’s a wider point: Before LLMs, the code’s license didn’t matter much for end users wanting to modify their software. Whether the source was open or closed, the biggest reason people didn’t mod their software was just that they didn’t know how to. LLMs have expanded the candidate pool, and, now that many more people can write code that works, the availability of the source is the most important hurdle. The set of people who can now modify their software has increased by orders of magnitude, and includes people who always had good ideas, or good product sense, but didn’t have the skills to make them a reality. In this example, the feature I implemented will be used by me, and basically nobody else, because closed-source software has close to no mechanism for change ingestion. Open source software has always had concrete ways to accept contributions from others: you’d simply make the change you wanted and submit it to the maintainers for inclusion/rework/feedback. This contribution process is even more important now that code can be generated orders of magnitude more cheaply, and the fact that it exists is an important advantage that open-source software has over closed-source. When starting out, I thought this would be impossible, but each step turned out to be very doable. Where a few years ago only a handful of people could reverse engineer an app, now it’s within reach of the average developer with a free afternoon. I’m really happy about the way this feature turned out, but this adventure only made me realize that open source software just aligns with my interests so much more. I’m going to do what I joked I wouldn’t at the start of this article, and switch to Lissen as my audiobook player. I hadn’t used it in a while, but, while writing this post, I fired it up again, and it seems to have gained a few features, plus it’s always been very well-designed and looks great.
I guess I’m not going to need SABP any more, but, well, the journey is the destination.
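For readers who want the syncing rules from this story in code form, here is a minimal Java sketch of the two behaviors the post settles on: push every position change (including backward seeks), and on load, prefer whichever position was updated most recently rather than whichever is furthest into the book. The class, the `Position` record, and the assumption that both the local store and the server report a last-updated timestamp alongside the position are hypothetical; this is not the code from the post.

```java
/**
 * Hypothetical sketch of the two sync rules described in the post.
 * Assumes both the local store and the server expose a position plus the
 * wall-clock time (epoch millis) at which that position was last written.
 */
public final class ProgressSync {
    private ProgressSync() {}

    /** A playback position and when it was last written. */
    public record Position(double seconds, long updatedAtMs) {}

    /**
     * On load: the most recently updated position wins, even if it is earlier
     * in the book. This stops SABP from clobbering progress made in another app
     * (e.g. Lissen) while SABP was closed. Ties keep the local position.
     */
    public static Position onLoad(Position local, Position server) {
        if (server == null) return local;
        if (local == null) return server;
        return server.updatedAtMs() > local.updatedAtMs() ? server : local;
    }

    /**
     * On save: push every change, including backward seeks. The rejected rule
     * ("only push if the new position is later in the book") made a forward
     * seek impossible to undo.
     */
    public static boolean shouldPush(Position lastPushed, Position current) {
        return lastPushed == null || lastPushed.seconds() != current.seconds();
    }

    public static void main(String[] args) {
        Position local = new Position(5400.0, 1_000L);   // SABP: further along, but older
        Position server = new Position(4200.0, 2_000L);  // another app: earlier, but newer
        System.out.println(onLoad(local, server));       // the server position wins
        System.out.println(shouldPush(local, new Position(4000.0, 3_000L)));
    }
}
```

Comparing update times instead of positions is what makes both bugs from the story go away at once: a backward seek is just a newer update, and a position written by another app is honored as long as it is the most recent one.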


Model-Harness-Fit

Why mixing a frontier model with a foreign harness quietly tanks performance, and what the open source code tells us about why. I keep three coding agents alive on the same workstation. Claude Code in one terminal. Codex CLI in another. GitHub Copilot CLI in a third. Same files. Same git tree. Same bash. Three different harnesses that look indistinguishable. A few weeks ago I ran the same prompt through all three and the behavior was visibly different in ways that went well past the surface differences of style and speed that I had expected to see across vendors. The Codex run cited a memory entry I had taught it months ago, applied the rule, and kept going without asking. The Claude Code run flagged the same context but refused to assert it without first verifying that the file path was still valid. The Copilot CLI run produced a longer, more cautious plan and asked me to approve it before taking any side effect on disk. The hand wave answer is that "models behave differently because they are different models." But Copilot CLI was running Claude Opus, the same family that Claude Code runs by default. Same model family, same prompt, two harnesses, materially different output. The hand wave does not cover it. Models are post trained against the harness, not just the API. The tool names they expect, the input schemas they emit, the citation tags they wrap around remembered facts, the file structure of skills they invoke, the planning protocol they follow when the harness says "make a plan first" (none of these are generic capabilities of the model). They are byte level conventions baked into the post training of one specific model against one specific harness. Pull the model out of its harness and you give up performance you cannot get back without rewriting either side. This has a direct consequence that anyone who has tried to ship a "model agnostic" agent has run into. You cannot just swap a model. 
Supporting BYOK and multi model (which is the responsible posture, since relying on a single provider is risky) adds real engineering complexity, and that complexity is a cost worth paying. To swap a model cleanly, you have to swap the harness with it: the tool surface, the schema shapes, the skill bodies that name those tools, the citation contract, the memory ritual, the system prompt structure, sometimes the planning protocol. Everything above the model has to move when the model moves. That is why every agent vendor that supports multiple providers ends up either (a) running a degraded variant of every model they support, or (b) maintaining a separate full stack per model and exposing the choice to the user as "you are picking a product, not just a model." Option (b) is the path that wins on quality, and it is worth the engineering cost to avoid being locked into one lab. Swapping orchestrators is not a cosmetic change. It is a model swap in disguise. The frontier lab spent the last year shaping the model's instincts to a particular tool surface, a particular memory ritual, a particular skill format. When you mix and match, you throw that work away. I think this is the single most underrated constraint in agent design today, and it has a clean name. Call it model harness fit. I dug into three open implementations that ship today: Codex CLI (OpenAI, fully open source at , Rust workspace, ~80 crates), Claude Code (Anthropic, closed binary, but a Rust port called at tracks upstream behavior closely enough to read at ~48,600 LOC across 9 crates, and Claude Code's own runtime injects observable blocks on every turn that confirm or contradict claims from the port), and GitHub Copilot CLI, where the SDK is fully open source MIT licensed at with five language bindings (Node.js TypeScript at 5208 LOC across 8 files, plus Python, Go, .NET, Java), and the JSON RPC wire protocol is documented at (currently version 3).
The CLI binary that the SDK spawns as the agent runtime server is closed, but the client wrapper, the protocol, the session lifecycle, the system prompt section overrides, and every RPC method are all open source and readable. Here is what I will cover: Companion piece: I covered the memory layer in detail at Agent Memory Engineering . This article is about everything else, with memory revisited only where it intersects orchestration. If you want the bottom up tour of how MEMORY.md indexes, system reminder injection, age in days warnings, and signal gates work, read that one first. Before any argument about architecture, look at the leaderboard. Terminal-Bench 2.0 evaluates agents on bash heavy multi step tasks, and it ranks by harness plus model pair, not by model alone. From on April 30, 2026: Two things jump out. First, Claude Opus 4.6 paired with ForgeCode hits 79.8%, while the same model paired with Capy hits 75.3%. Same weights, different harness, and a 4.5 percentage point spread between them on a benchmark where every entry is fighting for a tenth of a point. Second, the upper rankings are not dominated by the labs that trained the models. ForgeCode is a third party harness that lands three of the top six entries by routing across model families. Stanford's IRIS Lab paired Opus 4.6 with an automated harness evolution system called Meta-Harness and pushed the same model to 76.4% on the same benchmark, well past the best baseline they started from. The harness is moving the score by more than the model upgrades are moving it. Cursor's research team makes the point even sharper. In their April 30 post on harness engineering, they note that they took their own coding agent from "Top 30 to Top 5 on Terminal Bench 2.0 by only changing the harness." Same model. Same benchmark. Different scaffolding. A 25-position jump on a public leaderboard, attributable to the harness alone. That is not a tuning artifact. That is the entire ranking. 
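Because the leaderboard ranks harness plus model pairs, the natural key for a result is the pair, and the harness effect for a fixed model is the spread across harnesses. The small Java sketch below uses only the three Claude Opus 4.6 figures quoted above (ForgeCode 79.8%, Meta-Harness 76.4%, Capy 75.3%); the record and method names are illustrative, not part of any leaderboard tooling.

```java
import java.util.Comparator;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Locale;

public final class HarnessSpread {
    private HarnessSpread() {}

    /** One leaderboard entry: the unit being ranked is the (harness, model) pair. */
    public record Entry(String harness, String model, double passRatePct) {}

    /** Max minus min pass rate across harnesses for one fixed model (0 if absent). */
    public static double spreadFor(List<Entry> entries, String model) {
        DoubleSummaryStatistics stats = entries.stream()
                .filter(e -> e.model().equals(model))
                .mapToDouble(Entry::passRatePct)
                .summaryStatistics();
        return stats.getCount() == 0 ? 0.0 : stats.getMax() - stats.getMin();
    }

    /** The best-scoring harness for a fixed model, or null if the model is absent. */
    public static String bestHarnessFor(List<Entry> entries, String model) {
        return entries.stream()
                .filter(e -> e.model().equals(model))
                .max(Comparator.comparingDouble(Entry::passRatePct))
                .map(Entry::harness)
                .orElse(null);
    }

    public static void main(String[] args) {
        List<Entry> entries = List.of(
                new Entry("ForgeCode", "Claude Opus 4.6", 79.8),
                new Entry("Meta-Harness", "Claude Opus 4.6", 76.4),
                new Entry("Capy", "Claude Opus 4.6", 75.3));
        System.out.printf(Locale.ROOT, "spread: %.1f points, best: %s%n",
                spreadFor(entries, "Claude Opus 4.6"), bestHarnessFor(entries, "Claude Opus 4.6"));
    }
}
```

With the weights held constant, the 4.5 point spread is attributable to the harness alone, which is the comparison the leaderboard makes possible.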
LangChain's Vivek Trivedy puts the same observation in one sentence: "Opus 4.6 in Claude Code scores far below Opus 4.6 in other harnesses." Anthropic's flagship model in Anthropic's flagship harness loses to the same weights in third party scaffolding. If you only saw the model name on the spec sheet, you would not predict that. This is the empirical case for model harness fit. Hold the model fixed and swap the harness, and the pass rate moves by enough to outweigh a model generation upgrade. Anyone shipping a coding agent in 2026 who picks the model first and the harness second is leaving most of the performance on the floor. The rest of this article is about why. What exactly does the harness do that lets two implementations of the same model produce different scores? Each harness picks a different orchestration protocol. The model was trained on that protocol's exact wire format. These are not three implementations of the same idea. They are three different contracts between model and runtime. Codex is a typed asynchronous protocol. The model emits a with an and gets back a stream of typed messages. The protocol is defined at with explicit enums. There is a second protocol layered on top: is 10,721 lines of JSON RPC for cross process clients (IDE plugin, desktop app), where v1 (245 lines) is frozen and all new RPCs go to v2. Methods are named with singular resource names, camelCase wire format. The two protocols stack: agent layer for in process, JSON RPC layer for cross process. The model was trained to emit submissions and consume events. Claude Code is a direct typed conversation loop. The runtime's consumes a per turn from . variants are , , , , and . There is no separate submission queue. The protocol is the Anthropic Messages API plus a tight in process tool dispatcher. The model was trained to emit tool calls inside an assistant message and respond to tool results in the next turn. GitHub Copilot CLI is a supervisor protocol. 
The host app does not run the agent loop. It spawns the bundled binary as a subprocess, opens a channel over stdio, and sends with the full configuration: model, system message, tools, MCP servers, custom agents, skill directories, hook flags. The agent loop runs inside the child process. The host gets notifications back. The model was trained to run inside this supervisor and emit JSON RPC events that the supervisor can route. You can see the architectural commitment harden in each design. Codex's literally polices crate growth: "Resist adding code to . The largest crate is explicitly off limits for new features." A 500 line soft cap, 800 line hard cap per Rust module. New features pay rent in the form of a new crate. This is a compiler toolchain attitude applied to an agent harness, and the model was trained to operate inside it. Claude Code's port enforces a different rule: "one agent loop, not a fan out of specialized agents," which is why subagents in Claude Code start with a fresh context and cannot recurse. Copilot CLI's supervisor model is what lets a single binary serve three surfaces (terminal, cloud agent, third party hosts). Each surface gets the same model behavior because the model is always running inside the same supervisor. Now imagine you swap models. Take a model trained to emit and feed it Claude Code's stream. The model has been taught one wire shape. The harness expects another. The mismatch shows up not as an outright failure but as a quiet degradation: missed tool calls, wrong reasoning effort levels, inconsistent compaction triggers, citation tags that the harness never parses. The wire format is part of the model. This is where post training is most visible. Every harness has a tool registry. The names look similar at the top: , , , , . But once you go past the first six, the surfaces diverge in ways that the model has been taught to exploit. 
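The supervisor arrangement described above for Copilot CLI, where the host spawns the agent binary and talks to it over stdio, ultimately comes down to writing JSON-RPC messages onto the child's standard input. Here is a minimal Java sketch of that framing. Two parts are assumptions rather than facts from the Copilot SDK: that messages use LSP-style `Content-Length` header framing, and the method name `session.example`, which is a placeholder. The SDK's real method names and framing are defined in its protocol documentation.

```java
import java.nio.charset.StandardCharsets;

public final class RpcFraming {
    private RpcFraming() {}

    /** A JSON-RPC 2.0 request; paramsJson must already be serialized JSON. */
    public static String request(int id, String method, String paramsJson) {
        return "{\"jsonrpc\":\"2.0\",\"id\":" + id
                + ",\"method\":\"" + method + "\",\"params\":" + paramsJson + "}";
    }

    /**
     * LSP-style framing (assumed here): a Content-Length header that counts
     * UTF-8 bytes, a blank line, then the body. Counting chars instead of bytes
     * corrupts the stream as soon as the body contains non-ASCII text.
     */
    public static byte[] frame(String body) {
        byte[] payload = body.getBytes(StandardCharsets.UTF_8);
        byte[] header = ("Content-Length: " + payload.length + "\r\n\r\n")
                .getBytes(StandardCharsets.US_ASCII);
        byte[] out = new byte[header.length + payload.length];
        System.arraycopy(header, 0, out, 0, header.length);
        System.arraycopy(payload, 0, out, header.length, payload.length);
        return out;
    }

    public static void main(String[] args) throws Exception {
        String body = request(1, "session.example", "{\"model\":\"placeholder\"}");
        // A real host would write these bytes to the child process instead, e.g.
        // new ProcessBuilder(...).start().getOutputStream().write(frame(body)).
        System.out.write(frame(body));
        System.out.flush();
    }
}
```

The host never runs the agent loop itself; it only writes framed requests and reads framed notifications, which is what lets one binary sit behind several surfaces.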
Codex's exposes a particular vocabulary: Claude Code's port enumerates 40 specs in : Copilot CLI bundles a different default, drawn from the public changelog: A model trained on Codex's eight verb subagent surface knows how to send a message to a running subagent. A model trained on Claude Code's tool does not have that verb in its instinct set. The harness can paper over this with a router, but the router cannot give the model an instinct it does not have. Cursor's harness team puts the underlying mechanic plainly. From their April 30 research post: "OpenAI's models are trained to edit files using a patch-based format, while Anthropic's models are trained on string replacement. Either model could use either tool, but giving it the unfamiliar one costs extra reasoning tokens and produces more mistakes. So in our harness, we provision each model with the tool format it had during training." This is the single cleanest description of model harness fit I have seen from any vendor, and it is not a hand wave about model preferences but a specific measurable cost in reasoning tokens paired with an observable increase in error rate, recorded at scale across millions of agent turns in production. This is where model harness fit shows up most visibly. The tool surface is the model's vocabulary for the world. Cross train on a different vocabulary and you lose precision in every interaction. Skills look interchangeable on the surface. All three harnesses use a file with YAML frontmatter ( , , optional metadata). Codex even baked in cross compat: parses Claude style markdown skills. Copilot CLI explicitly reads config. The format is so similar that the same body would parse in all three. But skills are not just markdown. A skill carries an implicit contract about which tools it expects to call. That contract is not in the frontmatter. 
It is embedded in the body, in the form of imperative instructions that name specific tools by name, with specific argument shapes, and with specific verbs the model must emit. Look at what each harness ships as a system skill. Codex's bootstrap skills, baked in via and extracted to on first launch, are five: , , , , . The body invokes and as scripts ( ). It assumes the model can call to run a Python script. It assumes the model knows that scripts in of a skill folder are invokable. It assumes a sparse checkout fallback for private repos. None of that is in the frontmatter. All of it is in the body. Claude Code's skills are different. The plugin ships , , , , , plus many more. The bodies invoke Claude's specific tools: to bootstrap into a workflow, to track steps, to dispatch parallel subagents, / for file changes, / for search. The skills also encode hard process rules: "Use this BEFORE any creative work," "Use when about to claim work is complete." These rules anchor on the harness's injection model, which Codex does not have in the same form. Copilot CLI's skills are part of the plugin marketplace ecosystem, and the changelog reveals a different posture. v1.0.5 added "Embedding based dynamic retrieval of MCP and skill instructions per turn" as experimental. The model was trained to consume skill instructions delivered as a per turn injection chosen by an embedding ranker, rather than as a description match. A skill body that assumes "you will see all skills in the system reminder" does not behave the same way when the harness ranks skills via embedding and only injects the top three. This is why "we both use SKILL.md" is misleading. The format is identical; the contract underneath is not. Skills carry tool specs implicitly, and the implicit specs are pinned to the harness that authored them. The same applies to plugin manifests. Copilot CLI's v1.0.22 explicitly added: "Plugins using or manifest directories now load their MCP and LSP servers correctly." 
That is GitHub treating Claude Code's plugin format as a substrate to interoperate with at the file level. But the skills inside those plugins still bring assumptions about Claude Code's tool surface. Loading the file does not give the model the right vocabulary. The lesson generalizes. A skills marketplace that claims to be cross harness is a routing problem, not just a parsing problem. Each skill needs to either declare its target harness explicitly, or get rewritten per harness, or run inside a router that translates tool calls between dialects. None of these are free. I covered memory in detail in Agent Memory Engineering , so I will keep this section to the parts that matter for harness fit. Three memory architectures, three different bets: The architectural choices already differ. But the harness fit story is sharper than that. Each model was trained to write memory using a specific tool with a specific schema, and to cite memory using a specific tag with a specific format. Codex's model writes a structured raw memory artifact via Phase 1 extraction with a strict JSON schema: The Phase 2 consolidation prompt is 841 lines. . Schema validation rejects malformed output at parse time. The model citations are wrapped in blocks. The harness has a parser at that increments in the SQLite state DB whenever a citation arrives. This is the model's memory ritual. Strip the citation tag and the harness loses its decay signal. Claude Code's model writes memory using the standard and tools, into one file per memory under . There is no separate memory tool. The model picks one of four types ( , , , ) by file name prefix. The body uses a convention for behavioral rules. The harness wraps every body read in a block with the dynamic age in days and a verification reminder. The model was trained to read memory through that wrapper, weight it accordingly, and skip stale claims. Copilot CLI's model invokes as a dedicated tool. The body of the memory goes to a remote backend. 
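The read-time wrapper described above for Claude Code, which stamps each memory body with its age in days and a verification reminder, can be sketched in a few lines of Java. The tag name `memory-note` and the exact layout are hypothetical; the reminder sentence is the one this article quotes further on, and computing the age from the memory file's last-modified time is an assumption about where the age comes from.

```java
import java.time.Duration;
import java.time.Instant;

public final class MemoryWrapper {
    private MemoryWrapper() {}

    static final String REMINDER =
            "Records can become stale over time. Verify before recommending.";

    /** Whole days between when the memory was last written and now (never negative). */
    public static long ageInDays(Instant lastModified, Instant now) {
        return Math.max(0, Duration.between(lastModified, now).toDays());
    }

    /** Wrap a memory body with its age and a verification reminder before the model reads it. */
    public static String wrap(String name, String body, Instant lastModified, Instant now) {
        long age = ageInDays(lastModified, now);
        return "<memory-note name=\"" + name + "\" age-days=\"" + age + "\">\n"
                + body.strip() + "\n"
                + REMINDER + "\n"
                + "</memory-note>";
    }

    public static void main(String[] args) {
        Instant written = Instant.parse("2026-01-01T00:00:00Z");
        Instant now = Instant.parse("2026-03-02T12:00:00Z");
        System.out.println(wrap("example.md", "Run the full test suite before committing.", written, now));
    }
}
```

The age is computed at read time rather than stored, so the same file reads as fresher or staler depending on when the model sees it; that is the signal the model is trained to weight.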
Cross session memory was added in v0.0.412 as experimental. The retrieval surface is a server side query, not a local grep. The model expects the backend to be there. Before a v1.0.23 fix, the agent would hang on the first turn when the backend was unavailable. That is a load bearing dependency. Now mix and match. Run a Codex trained model on Claude Code's harness. The model will look for a memory write tool, find , and write a file, but it will write that file in Codex's structured format, with headers and annotations, into a directory that Claude Code does not auto load on the next session. The harness does not know to inject the index. The next session does not see the memory. And critically, the model will emit blocks that Claude Code never parses. Memory effectively does not exist on the next turn. Run a Claude trained model on Codex's harness. The model will not emit citation tags. Codex's decay signal stops incrementing. Memories that were actually used now silently rank below memories that were not used, because the harness sees zero citations. Within a few weeks, the wrong memories are getting evicted. Run either on Copilot CLI's harness with the remote backend. The model's local file instincts do not transfer. The tool is the only path, the schema is different, and the cross session retrieval is keyword search against a server, not the always loaded index plus on demand body read pattern that the model was trained on. The first turns will look fine because the model has memory shaped instincts. The retention will be different. The memory layer is the densest collision surface for model harness fit. Tools, schemas, citation tags, decay signals, retrieval rituals: all of these are coupled, all of these were learned together during post training, and none of them transfer cleanly when you swap one side. The tag is a microcosm of the larger problem.
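The citation contract this section turns on fits in a short Java sketch: strip the citation block from an assistant message before the user sees it, credit each cited memory, and treat memories with no citations and no update for 30 days as eligible for decay. The tag name `mem-cite`, the comma-separated ID format, and the in-memory map standing in for the SQLite state table are all hypothetical; Codex's actual tag, parser, and schema are not reproduced here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class CitationLoop {
    // Hypothetical tag, assumed to contain comma-separated memory IDs.
    private static final Pattern CITE =
            Pattern.compile("<mem-cite>(.*?)</mem-cite>", Pattern.DOTALL);

    // Stand-in for the harness's persistent usage counters.
    private final Map<String, Integer> usageCount = new HashMap<>();

    /** Records which memories were cited and returns the message with citation blocks removed. */
    public String process(String assistantMessage) {
        Matcher m = CITE.matcher(assistantMessage);
        while (m.find()) {
            for (String id : m.group(1).split(",")) {
                String trimmed = id.trim();
                if (!trimmed.isEmpty()) usageCount.merge(trimmed, 1, Integer::sum);
            }
        }
        return CITE.matcher(assistantMessage).replaceAll("").strip();
    }

    public int usage(String memoryId) {
        return usageCount.getOrDefault(memoryId, 0);
    }

    /** Never cited and not updated for more than 30 days: eligible for eviction. */
    public boolean isStale(String memoryId, long daysSinceUpdate) {
        return usage(memoryId) == 0 && daysSinceUpdate > 30;
    }
}
```

Run a model that never emits the tag through this loop and `usage` stays at zero for every memory, so after 30 days every untouched-by-edit memory looks stale, which is exactly the silent decay described in the cross harness scenario above.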
Codex's model emits a small XML block at the end of an assistant message whenever it pulled in memory: The harness has a parser that strips the block before showing the assistant message to the user, and uses the parsed to bump and columns in . The parser is at . The SQL is in migration : This is the model's contract with the harness. Cite what you used. The harness will reward what you cited by keeping it alive. The Phase 2 consolidator ranks memories by and decays anything with no citations and no fresh after 30 days. Claude Code's model has no equivalent citation tag. The harness does not need one because memory is read via the standard tool, and the agent's verification grep is what doubles as the "I used this" signal. The reminder text in front of every body read explicitly tells the model: "Records can become stale over time. Verify before recommending." There is no decay loop because the harness assumes the user will prune or the verification will fail in place. Copilot CLI's model talks to a remote memory backend. The store, retrieve, and rank logic is server side. The model does not need a citation tag because the backend tracks reads on its own. Now look at what happens in a cross harness run. A six character XML tag becomes the difference between a memory system that improves with use and one that degrades silently. This is what I mean by "the wire format is part of the model." The citation tag is not a feature on a roadmap. It is a habit the model picked up during post training, and that habit only pays off inside the harness that taught it. The Copilot CLI SDK exposes its system prompt as a structured object with ten section IDs. Hosts can override each section, replace it, or take full control. From the open source TypeScript at : This is not just a documentation surface. It is the public contract of the model's training distribution . Each section has a specific role, and the model was trained to read each section as a particular kind of instruction. 
The section is harder than . The section is consulted when the model is mid tool call. The section is what the model reads right before emitting a turn. Codex has its own equivalent, less explicit. The developer prompt is assembled in this order: Memory comes after policy and identity, before behavioral overrides. The model was trained to read this exact order. Claude Code's static prefix: A different shape, a different ordering, and a different set of precedence claims about what the model should treat as binding. The Claude trained model knows that instructions "OVERRIDE any default behavior and you MUST follow them exactly as written." That phrase lives inside the harness rather than inside the model itself, but the model has been trained to recognize the heading and treat its contents as binding. A model trained against this prefix will hunt for and react accordingly, while a model trained against a different prefix simply will not see the heading the same way and will give it the weight of any other piece of context. This is the same lesson as the citation tag, scaled up. The system prompt is not generic. It is a structured artifact with section conventions that the model was taught to read in a specific way. Swap harnesses and you keep the model's reading habits but lose the structure they apply to. GitHub Copilot CLI is the most interesting harness in the comparison because it explicitly tries to route across model families. Sonnet is the default. The picker exposes Sonnet, Opus, Haiku, and the GPT 5.x family. v1.0.32 added an mode that selects per session. How does Copilot CLI handle the model harness fit problem? Looking at the changelog, the strategy has three legs. The tool is included only when the active model is from the Codex family . v0.0.366: "Codex specific patch toolchain." The harness knows which models were trained on and only exposes it to those models. Anthropic models get the and shape they were trained on. This is not a translation layer. 
It is a per model tool surface. The router does not pretend and are the same operation. It serves the right tool to the right model. v1.0.13: "Tool search for Claude models." The implication: Claude trained models expect a deferred tool loading pattern via . The harness only exposes the discovery loop to those models. OpenAI trained models do not get the same loop. They get the full tool list up front because that is what they were trained on. v1.0.18: "New Critic agent automatically reviews plans and complex implementations using a complementary model to catch errors early (available in experimental mode for Claude models)." The Critic runs a different model from the main agent. Plans get reviewed by the complementary model. This is multi model orchestration baked into the harness, and the routing is explicit. This is what a real router looks like. Not "translate everything to a common dialect," but "serve the right dialect to each model." It is more code, more state, more telemetry. It is also the only way to get top performance from each model. The cost of this approach is honesty. The harness has to admit that "Claude on Copilot CLI" and "GPT on Copilot CLI" are different products. The user picks one or the other and gets different behavior. There is no neutral common denominator. This is the right honest answer to model harness fit, and Copilot CLI is the only harness in the open or semi open set that actually ships it. The strategic logic is worth naming clearly. Multi model is the crucial bet for any serious agent platform in 2026, and at GitHub and Microsoft we made that bet deliberately and early. Most customers are running multi model workflows whether their vendor admits it or not, and the only way to give every model its best performance is to build the per model routing surface inside the harness itself.
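The per model tool surface described above (a patch tool only for Codex family models, a tool search loop for Claude models, and the full tool list up front for OpenAI trained models) amounts to building the tool list per model instead of translating one shared list. A minimal Java sketch follows; the family names, tool names, classification rule, and the string-replacement fallback for non-Codex OpenAI models are all hypothetical, since the changelog entries quoted above do not show Copilot CLI's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public final class ToolRouter {
    private ToolRouter() {}

    public enum Family { CODEX, OTHER_OPENAI, CLAUDE }

    // Hypothetical classification by model ID; a real router would use explicit metadata.
    public static Family classify(String modelId) {
        String id = modelId.toLowerCase(Locale.ROOT);
        if (id.contains("claude")) return Family.CLAUDE;
        if (id.contains("codex")) return Family.CODEX;
        return Family.OTHER_OPENAI;
    }

    /** Serve each model the tool dialect it was trained on, not one common dialect. */
    public static List<String> toolsFor(String modelId, List<String> extraTools) {
        Family family = classify(modelId);
        List<String> tools = new ArrayList<>(List.of("shell", "read_file"));
        // Edit dialect: the patch tool only for Codex-family models; everyone else
        // gets string replacement (the fallback for non-Codex OpenAI models is assumed).
        tools.add(family == Family.CODEX ? "patch_edit" : "string_replace_edit");
        if (family == Family.CLAUDE) {
            tools.add("tool_search");   // deferred loading: remaining tools are discovered on demand
        } else {
            tools.addAll(extraTools);   // OpenAI-trained models see the full list up front
        }
        return tools;
    }
}
```

The design choice mirrors the Cursor quote earlier: rather than asking one model to use another model's edit format, the router decides what each model sees, at the price of maintaining one tool surface per family.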
We committed to that answer up front, which is what positions Copilot CLI to keep pace with whatever the labs ship next without having to redo its core architecture each time the leaderboard reshuffles. The matched pair is the unit of analysis, but the matched harness across many models is the unit of platform, and that is the level we are operating at. The single sharpest concrete demonstration of model harness fit comes from what happens when a user switches models mid conversation. Cursor's research team describes this carefully in their April 30 post, and the failure surface is worth walking through because every assumption that breaks here is an assumption a single model harness pair quietly relies on. Three things break at the moment of a model switch. First, the conversation history itself is now out of distribution. The previous model produced tool calls in its native vocabulary: blocks, tags, six or eight verb subagent dispatches. The new model was trained against a different vocabulary and now has to reason about a transcript full of tool calls it would not have emitted. Cursor handles this by injecting a custom instruction explicitly telling the model "you are taking over mid chat from another model" plus steering it away from the prior model's tools. That mitigates but does not eliminate the cost. The model is still reading a transcript that does not match its instincts. Second, the prompt cache breaks. Caches are provider and model specific, which means a switch is a guaranteed cache miss. For a long session, this turns the first turn after the switch into a full-price re-entry of every byte of system prompt and conversation history. Cursor's mitigation is to summarize the conversation at switch time, which yields a shorter, clean transcript that costs less to re-cache, at the price of losing details that the summary did not preserve. Third, the tools themselves change shape. The new model's harness loads its native tool set.
If the user was deep into a subagent dispatch flow with one set of verbs, the next turn presents a different set. The model has to figure out whether the prior tools are still valid (they are not) and which of its own tools maps to the user's apparent intent. Cursor's recommendation, after building the mitigations, is honest: "we generally recommend staying with one model for the duration of a conversation, unless you have a reason to switch." The cleanest workaround they describe is to spawn a subagent with a different model rather than switch the main conversation. A subagent starts with a fresh context window, no transcript bias, no cache to break, and the new model's native tool surface from the first turn. Each of these failure modes maps directly back to the thesis. The transcript, the cache prefix, and the tool surface are all parts of the wire format the model was trained against. Change the model and you change the contract on all three sides at once. A model switch is not a model swap. It is a harness swap, a tool swap, and a cache invalidation, all at once. The model harness fit framing is no longer a subterranean observation. Two of the labs publishing the most interesting agent work in 2026 say it openly, and the AI infrastructure community has converged on a clean one line definition. Cursor's Stefan Heule and Jediah Katz describe their harness work as "obsessively stacking small optimizations" specifically because a step change is rare and the gains compound only inside a matched pair. Their team builds in custom prompting per provider and per model version, citing OpenAI's literal precision versus Claude's tolerance for imprecise instructions as concrete differentiators that flow back into prompt design. They report driving unexpected tool call errors down by an order of magnitude in one focused sprint. Tool call reliability is not a model property. It is a harness property, and one that compounds every turn the agent stays alive. 
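The three switch-time breakages Cursor describes (a foreign transcript, a broken prompt cache, a changed tool surface) and the subagent workaround can be made concrete with a small Python sketch. Everything here is hypothetical and is not Cursor's code. The `Session` type, the model-scoped cache-key scheme, and the `naive_summary` helper, which stands in for a real LLM summarization call, are illustrative assumptions. Only the first sentence of the takeover note is quoted from the post.

```python
import hashlib
from collections.abc import Callable
from dataclasses import dataclass, field

# Paraphrase of the steering instruction the post describes; only the
# first sentence is quoted from it.
TAKEOVER_NOTE = (
    "You are taking over mid chat from another model. "
    "Do not call tools that appear only in the earlier transcript."
)

# Hypothetical per model tool router: model id -> tool names.
ToolsFor = Callable[[str], tuple[str, ...]]


@dataclass
class Session:
    model: str
    tools: tuple[str, ...]
    messages: list[str] = field(default_factory=list)


def prefix_cache_key(model: str, messages: list[str]) -> str:
    """Hypothetical cache key scoped to the model, so a switch always misses."""
    h = hashlib.sha256(model.encode())
    for m in messages:
        h.update(b"\x00")  # separator so ["ab"] and ["a", "b"] hash differently
        h.update(m.encode())
    return h.hexdigest()


def naive_summary(messages: list[str], keep_last: int = 2) -> str:
    """Stand-in for an LLM summary that keeps only the last few turns.

    Anything earlier is lost, which is the price of summarizing at switch time.
    """
    return "Summary of earlier conversation: " + " | ".join(messages[-keep_last:])


def switch_model(session: Session, new_model: str, tools_for: ToolsFor) -> Session:
    """Switch the main conversation: new tools, steering note, short transcript."""
    return Session(
        model=new_model,
        tools=tools_for(new_model),  # the tool surface changes shape
        messages=[TAKEOVER_NOTE, naive_summary(session.messages)],
    )


def spawn_subagent(task: str, model: str, tools_for: ToolsFor) -> Session:
    """The workaround: a fresh context with no foreign transcript to misread."""
    return Session(model=model, tools=tools_for(model), messages=[task])
```

Because the cache key includes the model id, the same transcript produces a different key after a switch, which models the guaranteed miss. The subagent path avoids the transcript problem entirely because its context starts with nothing but the task.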
Anthropic's Prithvi Rajasekaran ran a related experiment in his March 24 post on long running application development. The architecture: a planner, a generator, and an evaluator agent, modeled on Generative Adversarial Networks. The evaluator uses Playwright MCP to actually click through the running application as a user would, then grades against a rubric. Out of the box, Rajasekaran reports, "Claude is a poor QA agent" — it identifies legitimate issues and then talks itself into approving the work anyway. Tuning the evaluator prompt over multiple rounds is what turns it into a reliable judge. The harness creates the judgment surface; the model alone does not. The deeper lesson from Rajasekaran's work is about how harnesses should evolve as models improve. He built one harness against Claude Sonnet 4.5, which exhibited "context anxiety" strongly enough that compaction alone was not sufficient. The harness needed full context resets between sessions, with structured handoff artifacts to carry state across the boundary. When Opus 4.6 shipped, that behavior was largely gone. Rajasekaran dropped the entire context reset machinery and ran one continuous session for over two hours. Every component in a harness encodes an assumption about what the model cannot do on its own. Those assumptions go stale. The matched pair is not static. It moves as the model matures, and the harness has to retire scaffolding that is no longer load bearing. LangChain's Vivek Trivedy has the cleanest framing I have seen: "Agent = Model + Harness. If you're not the model, you're the harness." The harness in this view is every piece of code, configuration, and execution logic that is not the weights themselves. System prompts, tool descriptions, bundled infrastructure, orchestration logic, hooks, middleware. Working backwards from the desired agent behavior, every harness primitive earns its place by patching a specific model gap. 
Filesystems for durable state, bash for arbitrary action, sandboxes for safe execution, memory for continual learning, planning and self verification for long horizons. Each primitive started life as a workaround for a specific deficiency the model had at training time. Some of those primitives will get absorbed back into the model over time. Others will compound. Trivedy also names the mechanism that makes model harness fit so durable: a co-evolution feedback loop. "Useful primitives are discovered, added to the harness, and then used when training the next generation of models. As this cycle repeats, models become more capable within the harness they were trained in." This is the pipeline that hardens the matched pair over generations. A new harness primitive ships in week one. By month three, it shows up in millions of agent traces. By month six, those traces are training data for the next model. By month twelve, the next model has the primitive baked into its instincts and the harness can lean on it. The loop is what makes "swap to a foreign harness" not just clumsy but compoundingly clumsy. The model's habits got shaped by the previous generation of its own harness, which itself was shaped by the generation before. Move sideways and you skip every cycle of that compounding. Trivedy is honest about the cost of this loop, and I want to flag the counterargument cleanly. Quoting him: "A truly intelligent model should have little trouble switching between patch methods, but training with a harness in the loop creates this overfitting." If the model's tool format preference is overfit to its training harness, you could argue that the right long term move is to train against a more diverse set of harnesses so the model generalizes. That argument has merit. The labs that ship one model and one harness as a pair are buying near term performance at the cost of the model's portability.
Whether that trade is the right one depends on whether portability is something the customer values, and right now the customer mostly values the leaderboard. Three independent posts, published within weeks of each other, all converge on a single thesis: the model is only half of the system, the harness is the other half, the matched pair is the proper unit of analysis, and the vendors that ship the matched pair as a single product are the ones currently sitting at the top of the leaderboards. The harness side of the contract has converged on a markdown file per concern, and the file names are now load bearing across the ecosystem. A model trained on one harness recognizes the file names and knows which one carries which kind of authority. The key observation: the file names are now part of the wire format. A model that has been trained to look for a block under a heading will hunt for that exact heading on a turn. A model trained against will look for and miss . A model trained against will load personality from and ignore the same content if you put it in . This is why the AGENTS.md feature request against Anthropic's repo matters. It is not a docs migration. It is a request for the model's training distribution to expand its file recognition vocabulary. Until Anthropic post trains Claude to read , that file is invisible to Claude Code even if it sits next to in the repo. The SOUL.md ecosystem is a stress test of this thesis. SOUL.md is not yet recognized by any major harness's default loader. So the SOUL.md repo's installation instructions are revealing: copy your directory into the project, then add a few lines to pointing the model at it. That is a manual bridge from a non-recognized convention to a recognized one. The SOUL.md authors understand that the bytes do not work unless the model knows where to look, and "where to look" is a habit fixed in post training. The same routing problem shows up in the open.
GitHub Copilot CLI v1.0.4 added: "Read .claude/settings.json and .claude/settings.local.json as additional repo config sources." v1.0.36 walked some of it back: "Custom agents, skills, and commands from ~/.claude/ are no longer loaded by the Copilot CLI." That is a router that tried to be permissive about file names, then narrowed when the user surface got confusing. The lesson sits underneath the changelog: even the harness that runs Claude models cannot treat files as authoritative without negotiating with the user about which conventions count. Pick the convention. Ship the post training to match. Or ship a router that explicitly maps each file to the model that recognizes it. The middle path of "be permissive and load anything that looks plausible" loses every time. After months of running these three harnesses side by side, reading the open source code, and tracking the Terminal-Bench leaderboard: The harness is no longer a wrapper around the model. The harness is part of the model's effective parameters. The post training process embeds the harness's tool surface, schema shapes, memory rituals, citation contracts, and system prompt structure into the model's instinct set. You can take the weights to a different harness, but you cannot take the instincts. The instincts only fire when the harness presents the world the way the post training presented it. This has three consequences worth naming. For agent platform builders: pick a harness, pick a model, ship them as a pair. Do not pretend the model is portable. Do not pretend the harness is neutral. The frontier labs are publishing model harness pairs whether they say so or not, and the per pair performance is the only number that matters. Copilot CLI's "different tools for different models" approach is the honest version of this. The dishonest versions ship a common denominator and underperform on every model they serve. For model labs: the harness is product strategy, not infrastructure. 
The harness is where the lab's post training investment compounds. Anthropic's injection model, the typed memory taxonomy, and the verification on every body read are not infrastructure choices. They are the surface the model was sculpted against, and they are the moat that makes the model less interchangeable than it would otherwise be. Same for Codex's two phase memory pipeline, the citation tag, the strict JSON schema. Same for Copilot CLI's ten section system prompt skeleton. The harness is where the model becomes irreplaceable. For users: the cost of switching is higher than it looks, and lower than vendors would like you to think. Higher because the model and the harness fused over months of training and you cannot pull them apart cleanly. Lower because the simple stack underneath is shared, and the conventions on top are documentable. An honest port (replicate the tool surface, replicate the citation contract, replicate the system prompt structure, replicate the memory ritual) would close most of the gap. It just costs as much to set up as the original post training did. The matched pair is not static. It shifts as the model matures. This is the most useful nuance from Rajasekaran's Anthropic post. A harness component that was load bearing for Sonnet 4.5 (context resets, sprint decomposition, aggressive compaction) became dead weight on Opus 4.6 because the model started doing that work natively. The right harness for a model in March is not the right harness for that model's successor in October. The discipline is to read the traces, identify which components are still earning their place, and retire the ones that are now patches over solved problems. Cursor's blog says the same thing in different words: "Every component in a harness encodes an assumption about what the model cannot do on its own, and those assumptions go stale." So back to the question I started with.
Why does the same prompt produce visibly different output across three harnesses running the same model? Because the model running on three harnesses is effectively three different models, even though the weights on disk are byte for byte identical. The instincts that fire at runtime are not stored only in the weights, they are conditioned by the harness the weights were trained against, and the instincts turn out to be most of what shows up in the assistant's output on any given turn. The interesting design move now is not a better model. It is not a better harness either. It is the matched pair, designed end to end, where the post training and the runtime reinforce each other turn after turn until the model becomes legibly better at the things this specific harness rewards. You can see the major builders converging on this idea from three different starting points. Anthropic shipped Claude Code as the canonical Claude harness, with the post training and the runtime co-designed as a single product. OpenAI shipped Codex CLI as the canonical Codex harness, with the same vertical integration on the OpenAI side of the house. At GitHub and Microsoft we shipped Copilot CLI with explicit per model routing because multi model is crucial: customers run every frontier model they can get their hands on, and our job is to make each one perform at its best inside a harness designed to serve all of them well. The result is the most pragmatically honest harness in the open or semi open set today, and the one positioned to compound across model generations rather than locking to any single lab. Three different theories of what to do about model harness fit, all three coherent, and all three paying a real engineering price for the choice they made. The frontier work in 2026 is not about new model architectures. It is about new harness primitives. 
Ralph Loops, where a hook intercepts the model's exit attempt and reinjects the original prompt in a clean context window, forcing the agent to keep grinding against the goal. Just-in-time harness assembly, where the tool surface and the system prompt get composed per task instead of pre-configured per session. Self-tracing agents that read their own logs to find harness-level failure modes and patch them without human intervention. Each one of these is a primitive that some model will eventually be post trained against, and that pairing will show up at the top of the next leaderboard. The Terminal-Bench leaderboard tells you who is paying the price right. Look at it again in six months.

The Evidence: Terminal-Bench 2.0: what the leaderboard actually shows about model harness pairs
Three Harnesses, Three Bets: SQ/EQ vs typed conversation loop vs JSON RPC supervisor
The Tool Surface: where post training is most visible
Skills Carry Tool Specs: why "same SKILL.md format" does not mean "interchangeable"
The Memory Layer: synchronous live writes vs deferred batch vs server side, and why the citation tag matters
The Citation Discipline: how the model talks back to the harness
The System Prompt Skeleton: ten section IDs is a contract
The Routing Reality: what GitHub Copilot CLI is actually doing about all this
Mid-Chat Model Switching: the cleanest concrete failure mode
What the Labs Are Saying: Cursor, Anthropic, and LangChain all converging on the same framing
The Identity File Convention: CLAUDE.md, AGENTS.md, SOUL.md, USER.md, and what each one is for
What This Means: the model is no longer the moat alone, and the matched pair shifts as the model matures

— Codex's custom diff format. Two flavors: a freeform Lark grammar at and a JSON variant. The model was trained to emit patches in this format. It is not interchangeable with Claude Code's (which takes / ). — the bash family.
Plus and for long lived processes that the model can drive with stdin writes after the fact. — the plan/todo tool. A model not trained on this tool will use a different convention to track work. — model can request expanded permissions mid turn. Codex is the only harness with this exact verb. — multi agent orchestration with , , , , , , , . Eight verbs. The model knows all eight. , — tools that find other tools. Codex's answer to deferred tool loading. — , , . Tied to migration . , , — lower case names internally, surfaced to the model as CamelCase ( , , ). The model was trained on the CamelCase variant. requires , , optional . Not the same shape as Codex's . has the deepest sandbox surface: , , , , , , , . The model knows when to set and pair it with the tool. and — the lazy load primitives. — single tool for subagent dispatch. Takes , , optional , optional . The post training has the model emit short imperative descriptions for these. / — both permission. Toggles a worktree local override. / — wrap for subagent isolation. — streams stdout from a background process. Pairs with . The model knows this pattern; Codex does not have it. — the workflow scaffolding tool. The model writes triplets in a particular pattern. , (bundled ripgrep), , — file reading with explicit range params. — built in (v0.0.374). Rejects URLs. , , — three verb interactive shell control. — subagent dispatch with depth and concurrency limits. , — multi turn subagent control. A different shape from Codex's six verb agent surface. — interactive clarification. — persistent memory tied to a remote backend. Memory is not local files here. — included specifically when serving Codex models. A different patch toolchain than Codex's own. , , , , .
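The Ralph Loop primitive named near the end of the post can be sketched as a small control loop. This is a minimal sketch under assumptions, not any harness's real hook API. `run_agent` is a hypothetical callable that runs one agent session on a fresh context and reports whether the goal was met; in practice that check might be a test suite. Any progress is assumed to persist outside the context window, for example in files on disk. The `max_iterations` bound is an addition here so the sketch always terminates.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class RunResult:
    output: str
    goal_met: bool


def ralph_loop(
    prompt: str,
    run_agent: Callable[[list[str]], RunResult],
    max_iterations: int = 5,
) -> tuple[RunResult, int]:
    """Re-run the agent on the original prompt until the goal is met.

    Each iteration starts from a clean context holding only the original
    prompt. Returns the last result and the number of iterations used.
    """
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    result = RunResult(output="", goal_met=False)
    for iteration in range(1, max_iterations + 1):
        context = [prompt]  # clean context window: nothing carried over in-context
        result = run_agent(context)
        if result.goal_met:
            return result, iteration
        # The agent tried to exit without meeting the goal. The "hook"
        # intercepts by looping and reinjecting the original prompt.
    return result, max_iterations
```

The only state that survives between iterations is whatever `run_agent` leaves behind externally, which is what makes each retry a clean context rather than a longer one.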

Takuya Matsuyama 1 month ago

10 years of indie dev: How I went global from Japan (talk w/ Hiroshi) - Part 1/2

I joined Hiroshi's podcast episode a few weeks ago. We shared our experience and knowledge on indie dev. I'd like to cross-post our talk in English here. I also tried to create an English dub using AI. The voice cloning quality is quite impressive, so I hope you enjoy it.

00:00 I joined Hiroshi's podcast
01:12 Intro & welcome Takuya from Inkdrop
02:31 Takuya's background: from Walknote to Inkdrop
06:07 Going indie: cautious vs. reckless paths
08:06 How indie dev became a freelance pipeline
10:36 Timeline to making a living from Inkdrop
13:28 Why target the English market from day one
15:30 Pre-AI struggles writing English copy
17:31 Thoughts on the AI vibe-coding era
18:37 Reviewing every line: AI usage philosophy
22:19 The Shinkansen analogy for AI
27:05 Why personal taste matters more than ever
28:19 Living better in the AI era (Ichiju-Issai)
30:18 Enjoying tech change like the seasons
32:11 Avoiding the herd & staying unique
36:27 Dealing with online critics
37:46 Wrap-up

Hiroshi: Hi, hello, good evening. I'm Hiroshi Creation, an indie app developer. Today we have another special guest with us: Takuya from Inkdrop. Thanks for joining us.
Takuya: Thanks for having me. I heard you quit your day job, Hiroshi, so I figured I had to come show some support.
Hiroshi: Thank you. Our connection goes back a while, right? I think the first time was about 5 years ago when I wrote a guest post on your blog.
Takuya: Way before that, actually. We'd been following each other and watching each other's work. Back then your app was called "Family TODO," and now it's "minto." You'd been building it for a long time, and when it hit 10,000 users, that's when I invited you to write something on the blog. That's how it started: me reaching out to you.
Hiroshi: Ah, that's right. Most indie devs probably already know you, but for now, Takuya, could you give a quick self-introduction?
Takuya: Sure. I'm Takuya.
I make a Markdown-focused note-taking app called Inkdrop, and I've been working on it for about 10 years now. Originally I joined Yahoo as a new grad and quit after a year and a half. While working there I was always doing indie dev on the side, and what I built then was a music app called Walknote, for iOS (well, iPhone OS at the time). It got picked up and went viral. On that momentum I quit my job, like "I'm gonna live off my dream apps," and just jumped without thinking. In the end, monetization for Walknote totally failed, but it gathered around 130,000 users.
Hiroshi: Wow.
Takuya: It made it into the top rankings and things were going well, but I hadn't thought through monetization at all, so it didn't pan out. I gave up on it, then made a bunch of other things that all failed, and finally I thought, "I want a better note-taking app, let me just build one." That became Inkdrop.
Hiroshi: Oh, I see.
Takuya: Until then I'd had tons of failures. It's not like I built Inkdrop out of nowhere and it just worked.
Hiroshi: Right, that's the thing. People only see the bright side, the apps that actually succeed, but you'd built quite a lot before that, huh?
Takuya: In terms of indie dev, it goes back to high school, even middle school. I've been doing personal projects basically since I started programming, so my programming history equals my indie dev history.
Hiroshi: Wow, you're a real veteran. In minto-years, how old is Inkdrop now? More than 10?
Takuya: Exactly 10. This year is the anniversary. Thanks to everyone, it's still going.
Hiroshi: I'd love to talk with a veteran like you about all kinds of things today. As for minto, in about a month it'll hit year 7, its 7th anniversary. Seven years is actually a lot, when I think about it.
Takuya: Yeah, 7 years is long.
Hiroshi: But in my case I went independent really late. For about 6 and a half years I was doing it as a side gig while working a day job. So my pace has been like a turtle's, honestly.
Takuya: Was that about balancing monetization and being able to cover living expenses? What was the deciding factor for taking the leap?
Hiroshi: Well, some people borrow money from places like the JFC and just dive in. Takuya, you might've been like that at first too. But in my case I was very cautious, like crossing a stone bridge only after tapping it first. I waited until revenue was solid before going independent. Because of that, it ended up taking 6 and a half years.
Takuya: I think that's totally fine. In my case, I really wasn't thinking; it was pure youthful recklessness. After I quit, friends literally called me asking, "Are you okay?" So I really wouldn't recommend it. But the reason I survived was that Walknote had become a track record, and that mattered a lot. Back then the iPhone market was super hot and Facebook was on fire. Everyone wanted to be the next Mark Zuckerberg; that was the vibe. So in that environment, word got around that "this guy can build high-quality iPhone apps." Through friends introducing me, I got work from startups I knew, and even bigger companies started giving me design work. So indie dev itself led directly to my freelance work. Even though it wasn't direct monetization, indie dev came through in a huge way for me as a means of making money.
Hiroshi: Right, back then there weren't many people who could build iPhone apps either. You probably had people asking, "How do you even do this?" That kind of thing.
Takuya: Yeah, exactly. If you build something, it leads to work, so you don't have to worry about starving. You don't have to obsess over making it work financially as just an app.
Hiroshi: Yeah, exactly. It's not 0 or 1. If your indie app doesn't sell, you can take freelance contracts, or honestly just go back to being employed and work as a salaried engineer. When you think about it that way, the risk isn't really that big. You've got the skills, and that's enough. minto already serves as a business card too.
Even in the worst case where I can't make a living off it, everyone already knows "this guy can build stuff like this," so work would just keep coming in.
Takuya: Yeah, and beyond just the technical side, having actually shipped a personal app means your marketing instincts aren't off compared to the average person. Plus, both you and I do our own design. Being a frontend engineer with some design sense is something you don't need to explain in a job interview. Just say "I made this" and they get it instantly. I've never explained it. I've never said "I have X years of PHP, X years of JS." I just say "I can build this app," show them, and the work comes in pretty easily. They go, "Oh really? Actually we've been thinking about something like this," and the conversation moves forward fast.
Hiroshi: When did you start making a living from Inkdrop? You wrote about it on the blog.
Takuya: For the first year after release I couldn't make a living off it at all. Around year 2, I wrote a blog post like "I can now cover half my rent, my full rent." So by year 3 I think I was fully able to support myself.
Hiroshi: From there it was just Inkdrop full-time?
Takuya: Pretty much. I basically don't take contracts anymore. Just one time, there was this Austrian React Native developer friend of mine, Marc Rousavy, who runs an agency. He needed design work and asked me. I told him, "Sure, as long as I can use it as content for my videos," and he said OK. So I turned it into YouTube content while doing the work. That was the last contract I took.
Hiroshi: Got it. So basically just Inkdrop now.
Takuya: My stance is, if something really interesting comes along, I'll do it without being too rigid. That one was great because it was my first time taking work from an Austrian company, my first overseas contract work, and the fact that I could turn it into video content was interesting too, plus it paid. All three things lined up.
Hiroshi: That reminds me, about your videos.
Your YouTube channel has crossed 200K subscribers now, right? That's pretty incredible.
Takuya: I have two channels. DevAsLife is the main one with 210K, and the other one is talk-focused. I started it for English speaking practice, and that one's at about 20K now.
Hiroshi: Whoa. So you have the Silver Play Button too?
Takuya: Yeah, I have it. Lately I've been posting more on the second channel (craftzdog), the talk-focused one, than on DevAsLife.
Hiroshi: I think that's because you're constantly publishing in English, and Inkdrop is also primarily in English. Rather than focusing on Japan, your whole activity has gone beyond Japan and into the global market. My recent theme has been about earning foreign currency from a cheap Japan. But you've been doing this from a really early stage, about 10 years ago. Why did you choose to market to overseas audiences?
Takuya: Because there was no reason to limit it to Japan from the start. It's a note-taking app, so why would I only sell it in Japan? That was the first question. Through development I'd been contributing to open source a lot, so "overseas" was already very close to my daily life.
Hiroshi: I see.
Takuya: I was filing issues, sending PRs, doing all that in English daily. Overseas developers were already in my circle. So when I conceived Inkdrop, since it's for developers, my target was naturally the developers around me, which means English-speaking people. If those were my target, not doing it in English wasn't an option. Of course, it's clear that Japan's economy will shrink over the next 10 to 20 years, but that wasn't really my starting point. When I think about what I want to sell, I think about what kind of thing I'd buy. And then I think about who I want to sell to. That's someone close to me: someone similar to me, someone I can easily understand. That's the English-speaking developers around me. Following that line of thought, building it in English just made sense.
Hiroshi: Got it, so it was natural. Rather than deliberately deciding to push hard into the English market, it was part of the open source flow, and going overseas was the natural path.
Takuya: Yeah. Of course my English was terrible at the time, so writing one blog post would take 2 weeks. To make a website I'd visit all kinds of sites, copy-paste phrases, and mash them together. Coming up with copy in English was insanely hard. I could only do literal translation, and translation tools didn't really give good results either. So I'd visit the homepages of all the apps I was using, pick out "this phrase works, this one was useful," collect them, and stitch them together. Nostalgic.
Hiroshi: Wow, that's amazing. Whereas now, if you want to make an overseas site or sell a service abroad, you just translate with AI and you're done. That's how easy it feels, putting quality aside.
Hiroshi: So here's my impression of you, Takuya: even before AI you were heavily into indie dev, and among indie devs you have really high technical skill. You've contributed to open source from way back, and you can do Electron performance tuning and app optimization. So I'd love to ask how you're feeling about this AI vibe-coding era right now.
Takuya: It's super fun. I've been writing about this in recent blog posts too: AI is so pervasive that avoiding it is impossible. Honestly, in terms of skill, the AI is already beyond a new grad. But in terms of how people use it, it's maybe polarized. There's the extreme type who just lets AI do everything and doesn't even read the code AI wrote, and on the other side the type who only uses AI as chat. I feel the graph kind of looks like that, and I'm sitting in the middle. I basically read the code AI writes line by line, review it, and only commit once I'm satisfied. That's how I use it. Because there are paying users already on the product, I can't just drop in irresponsible code.
To ship something I'm ultimately responsible for, I have to review every single line. So firing up 10 or 20 agents and producing tons of stuff at insane speed isn't how I'm using it right now. I always look at things one at a time.
Hiroshi: Yeah, I get it. I've been making something new lately, and since the new thing has no users, I can make breaking changes freely. But when I'm working on minto, there are already thousands of lines, something like 10,000 lines of code that a human wrote (that I wrote), and there are existing users. So I really can't cut corners on any of those lines. But for the new project, I'm letting AI do about 90% of it. I do review it, but it's pretty much half self-driving, like 70% autopilot.
Takuya: I totally get that feeling of "I can't fully trust AI yet" when you've already written an existing service by hand and it has users. When you're starting something completely from scratch with AI involved, the cost of writing is basically zero, so you can make breaking changes without hesitation. Like if you're making a new web page or a landing page, I'd definitely use AI for that, but in the trial-and-error process I'd be tossing out the code I just wrote. Just spinning the PDCA loop at insane speed, getting closer to the shape little by little. It's really similar to image generation. When you make a website with AI, you let it build the whole thing each time, then go "I don't like this part, regenerate, regenerate." That's the mental image.
Hiroshi: Before, only one-man-band CEOs with crazy clients could work that way. But now individuals can do it, and that's the change. Whether it's a good way of working, nobody really knows yet.
Takuya: In the end, if you actually want to pay attention to the fine details, that approach has limits. I wrote about this on the blog before: there's an analogy that AI is like the Shinkansen.
Basically, you can get from Osaka to Tokyo at incredible speed, but if you specifically want to go to Asakusa, or back to Hikarigaoka Park where you used to live, you have to switch to local trains, take a taxi, take a bus — these fine-grained mode switches become necessary. So if you want to go somewhere specific, do something specific, when filling in that level of detail, leaving it to AI has limits. When you ride from Osaka to Tokyo, the scenery flies by at incredible speed, so the entire process becomes invisible. I think that's why your head feels foggy when you use it. Hiroshi: Ah. Takuya: "Actually we've arrived in Tokyo, we're at Shinagawa." From there, when you start figuring out how to get further, suddenly the scenery becomes visible — "Oh, a new building has gone up here." You can notice changes like that. The way I'm using AI in Inkdrop right now is probably more like taking a bus, taking a taxi, or noticing which buildings have been rebuilt — that kind of usage where you can see the world. Hiroshi: So basically, first you ride the Shinkansen, fly around, try lots of places, going "ah, this design style doesn't fit," like German style, Austrian style, and so on. You travel around like that, and once something clicks, from there specifically — "this specific architectural style, this window feel, reproduce this" — you start giving instructions at that level of specificity. Takuya: Yeah, exactly. That foggy-head feeling, I really get it. You can't read it anymore, the speed is too fast for humans to keep up. You lose the will to read it, it's too fast. Hiroshi: So as instructions, like "change the shape of this clock here a little, change the color" — at that unit of instruction, the way you're probably doing it, Takuya — when you instruct AI like that, you can look at each line and say "this color is bad." But broadly, if you give vague instructions before you even have an image of the world you want to realize, it's really unbearable to watch. 
Takuya: Recently I tried this front-end set — for website design, someone analyzed landing pages of various famous companies and turned them into Markdown. It was called design.md — or maybe Awesome DESIGN.md, that's the one. I just sent the link, you can see it in the chat. I tried it once, testing different site styles one by one on my own website, but the quality wasn't good enough to use as-is, unfortunately. The frame structure — page structure, layout, color palette — that's the only level it replicates. It doesn't blend things nicely with my app's concept. I thought "yeah, this isn't quite it." Hiroshi: That sense of "something's off," your authorial voice, the worldview that's uniquely you, Takuya — it shows in Inkdrop, in your daily blog, in your illustrations. I feel like that kind of sensibility is really important in the AI era. Otherwise you can't make the call, because without a refined sensibility, you can't judge whether a design is good, or whether it's missing something. If you just hand yourself over to whatever AI outputs, you end up with similar designs and apps. So in the process of refining that sensibility, what have you been doing lately, outside of AI — outside of the computer? Anything you've been doing? Takuya: Just yesterday I posted a Vlog and a blog post on exactly this topic — discussing how to live better in the AI era. The inspiration came from Yoshiharu Doi's Ichiju-Issai — the one-soup-one-side concept. When you're constantly online, you get steeped in algorithms. Open Twitter, X, and you're flooded with drama and gossip, your attention gets pulled in. To use a food analogy, those things are additives — you can live without them. So you keep subtracting that kind of thing, leaving only what's necessary, and maintain the rhythm of your life. That's one. The second is treasuring organic connections and ideas. 
Random ones — like a barista at the cafe you frequent striking up a conversation, that kind of small warm moment — or chats with the moms you see every morning, or playing a bit with kids you often see at the park. That kind of connection that wasn't designed by anyone — treasuring that. As for ideas, instead of staring down at the screen in front of your computer, set it aside, go for a walk, go camping, drive somewhere — make time to step away — and then suddenly good ideas pop up. Hiroshi: Yeah, yeah. Takuya: And then — how do I put it — enjoy technological change like the changing seasons. Hiroshi: Ooh. Takuya: Every day there's some new AI thing, this AI, that-and-that-agent — keeping up with all of it is exhausting. Instead of living each day competing, racing against someone, you know — "spring is here, the cherry blossoms are beautiful," "apples are in season, let's eat some," "I love saury, looking forward to autumn" — that kind of thing. Not chasing, but appreciating what comes — the things each season brings — appreciating, enjoying, gratefully receiving — that attitude when engaging with technology. The tension drops, and you take in only what's needed, when needed. Take it in, internalize it, let it ferment — I think that's good for keeping your own pace. Not just chasing trends, but more naturally — when the chance to engage comes through those organic connections, then it's fine to take it in. If you live at that pace, I think unique ideas emerge on their own. Hiroshi: Yeah, exactly — it's like being in the herd, no uniqueness emerges, as long as you're chasing trends. You're just imitating what others are already doing, so there's no element for uniqueness to emerge. So you have to do something different from others — otherwise you won't think of new video formats, won't write articles with unique perspectives. 
Publishing in English about how to live in the AI era based on Ichiju-Issai — nobody else is doing that, so at minimum that's unique, I'd say. I read your blog yesterday, and I thought there's no way miso soup or anything could connect to AI, but it really did connect. And I learned for the first time that Doi-san's book was that deep. Takuya: Yeah, exactly — it's not just recipes or that kind of suggestion, it traces all the way back to deep Japanese roots. It's a really profound book. So miso isn't an additive — it's something originally aged or fermented. Things with that kind of depth, versus additives like trends or X posts — it's better to think about them separately. You don't have to completely eliminate them — no need to forcibly remove them — but having something inside you that you don't get tired of even eating every day, like miso or rice, that becomes your axis and stabilizes you. It could be playing guitar every day, or drumming — in my case, when I drum, somehow I can return to my original self. That kind of thing seems to have no connection to indie dev at all, but I think it's actually really important. Hiroshi: Hmm, yeah. Takuya: The tech world is pretty closed, right? There are a lot of similar-type people. When that happens I can never quite fit in, can't really get into that circle. Ever since school, I've had a personality where joining a fixed group makes me anxious for some reason. Same in tech circles, same in English-speaking circles — when I hang out with the same people too long, suddenly I come back to myself, "wait, is this okay for me?" — and I can never stay rooted. It's just my nature. Hiroshi: Right. Takuya: I think that's fine. The flip side is loneliness follows me forever, but not staying with the same homogeneous group — I think that's one of the elements that makes me unique. Hiroshi: That's so important. 
The previous guest, Ko-san, was saying the same thing — basically, doing the same thing as everyone else doesn't get you anywhere. Sometimes intentionally going to a different community — Ko-san was talking about that too. So interesting. Love that kind of person. Hiroshi: I saw your post recently — even overseas, when you post on DevAsLife or your sub-channel, you get comments like "Why aren't you using OpenCode?" Even though using Claude Code or any AI agent at all already puts you in the top 0.something percent. And within that, people compete and try to one-up each other over 0.000-something percent — flexing on each other. I found it interesting that this kind of thing happens overseas too. Takuya: Tons of it. The people who say that are mostly anonymous accounts, people without confidence in themselves. They use VSCode and want the validation of "VSCode is fine" — so they attack non-VSCode users to convince themselves. So you can ignore them all, it's fine. Just a bit annoying. Hiroshi: I see. I thought it's like a village — being stuck in a village forever, you can't let yourself get swayed by those words. Takuya: You don't need to deny others — to validate what you're doing, the question of "what to belong to" is beside the point. Those people should first build up their own self-affirmation. Hiroshi: Thank you. So let's pause here for now, and make the second half about the topics you, Takuya, want to talk about. Takuya: Sure, thank you. Let's wrap up here for now. Thanks so much. Hiroshi: See you. X: https://x.com/hirothings Podcast: https://open.spotify.com/show/19HqgO48GOmiFXUMp6YuWv Minto: https://mintotodo.app/

qouteall notes 1 month ago

Rust Async Traps

In Rust, if you call an async function, it returns a future. But the future is just data by default. If you don't await it or spawn it, its async code won't run. The word "future" has a very different meaning in Java: in Java, by the time you obtain a future, the task should already be running. An async runtime schedules async tasks on threads. When an async task suspends, the thread can run other async tasks. But this requires the async task to cooperatively suspend (at an await point). An async task can keep running for a long time without awaiting, and the async runtime cannot force-suspend it. The scheduler thread then stays occupied. This is called blocking the scheduler thread. When a scheduler thread is blocked, overall concurrency and performance drop, and it may cause deadlock. Normal sleep and normal locking block the thread using OS functionality. When a thread is blocked by the OS, the async runtime doesn't know about it. In Tokio, use Tokio's own async mutex and Tokio's sleep instead. They cooperatively pause and avoid that issue. The issue is not limited to locking and sleep. It also involves networking and all kinds of IO. So Tokio provides its own set of IO functionality, and you have to use it when using Tokio for max performance. Heavy computation without an await point is also blocking: the async runtime cannot force-suspend the computation if it doesn't cooperatively yield. Tokio also supports an "escape hatch": a task spawned through it runs in another thread pool and won't block the normal scheduler threads. Code that does non-async blocking or heavy compute work should be run there. See also: How to deadlock Tokio application in Rust with just a single mutex, Why do I get a deadlock when using Tokio with a std::sync::Mutex? In Rust, a future can be dropped. When it's dropped, its async code stops at the await point where it was suspended and never resumes. This is called cancellation. It's an implicit exit mechanism: its control flow is not obvious in the code. Note that it cancels the future, not the IO.
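Two points above can be made concrete: futures are inert data until polled, and a task can only be switched out at an await point. The following is a minimal sketch that uses only the standard library rather than Tokio. `NoopWake`, `YieldOnce`, `worker` and `run_round_robin` are hypothetical helpers written for this illustration. A real runtime is far more sophisticated, but the cooperative principle is the same.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// A waker that does nothing: the round-robin loop re-polls every
// unfinished task on each pass, so nobody needs to be woken.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Returns Pending exactly once, handing control back to the executor.
struct YieldOnce(bool);
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

type Log = Rc<RefCell<Vec<String>>>;

async fn worker(name: &'static str, steps: u32, cooperative: bool, log: Log) {
    for i in 0..steps {
        log.borrow_mut().push(format!("{name}{i}"));
        if cooperative {
            YieldOnce(false).await; // the only place this task can be suspended
        }
    }
}

// Toy single-threaded executor: poll each unfinished task once per pass.
fn run_round_robin(mut tasks: Vec<Pin<Box<dyn Future<Output = ()>>>>) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    while !tasks.is_empty() {
        tasks.retain_mut(|task| task.as_mut().poll(&mut cx).is_pending());
    }
}

fn run_demo(a_cooperates: bool) -> Vec<String> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    let mut tasks: Vec<Pin<Box<dyn Future<Output = ()>>>> = Vec::new();
    tasks.push(Box::pin(worker("a", 2, a_cooperates, log.clone())));
    tasks.push(Box::pin(worker("b", 2, true, log.clone())));
    // Nothing has run yet: creating a future only builds its state.
    assert!(log.borrow().is_empty());
    run_round_robin(tasks);
    let entries = log.borrow().clone();
    entries
}

fn main() {
    println!("{:?}", run_demo(true)); // ["a0", "b0", "a1", "b1"]
    println!("{:?}", run_demo(false)); // ["a0", "a1", "b0", "b1"]
}
```

With `a_cooperates = false`, task `a` never reaches an await point. It therefore runs to completion before `b` gets its first turn, which is the single-threaded version of blocking the scheduler thread.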
Cancelling a future just stops the async code from running (and drops the related data). IO operations that are already done won't be cancelled. (Written files won't be magically rolled back. Sent packets won't be magically withdrawn.) Cancellation is not the only implicit exit mechanism. Panic is another. And in languages that have exceptions (Java, JS, Python, etc.), exceptions are another implicit exit mechanism. However, exceptions and panics are often logged, but future cancellation is often not logged. Although a panic is implicit control flow, it's often explicit in logs, which makes it easy to debug. But a future cancellation logs nothing by default, so debugging cancellation issues is much harder than debugging panics. The cancellation "catch": normally, when the parent future is cancelled, its inner futures are also cancelled. Cancellation propagates from the outside in. Spawning can stop that propagation: although the spawned task's handle is itself a future, dropping the handle won't cancel the spawned task. So if you want to avoid cancellation, wrap the work in a spawned task (and don't explicitly cancel it). In Golang, there is panic, but there is no implicit cancellation. All cancellation needs to be explicit. (However, managing context cancellation in Golang still has traps, just different ones from async Rust.) Two examples of cancellation issues: Alan tries to cache requests, which doesn't always happen; Barbara gets burned by select. See also: Dealing with cancel safety in async Rust, Cancelling async Rust. There is another kind of "cancel": not dropping the future, but not polling it either. This is also dangerous, and is elaborated below. Tokio documentation about cancellation safety: 1, 2. Note again that "cancel" just drops the Rust future (and un-tracks it in the async runtime). It doesn't cancel the IO operation. With epoll, the buffer can be put directly inside the future, with no extra allocation. If the Rust future is dropped, it just doesn't do the IO after being notified.
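The following std-only sketch shows both kinds of "cancel" on a single task: dropping it after one poll, and keeping it alive without ever polling it again. `ToyLock`, `YieldOnce` and the other helpers are hypothetical stand-ins (not Tokio's `Mutex`). The `"sent"` log entry plays the role of IO that has already happened before the await point.

```rust
use std::cell::{Cell, RefCell};
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// No-op waker: this sketch polls by hand, so wake-ups are ignored.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Returns Pending once, then Ready: a single await point.
struct YieldOnce(bool);
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

// Hypothetical toy lock: released only when its guard is dropped.
struct ToyLock {
    held: Cell<bool>,
}
struct ToyGuard<'a>(&'a ToyLock);
impl ToyLock {
    fn try_lock(&self) -> Option<ToyGuard<'_>> {
        if self.held.get() {
            None
        } else {
            self.held.set(true);
            Some(ToyGuard(self))
        }
    }
}
impl Drop for ToyGuard<'_> {
    fn drop(&mut self) {
        self.0.held.set(false);
    }
}

async fn send_then_confirm(lock: &ToyLock, log: &RefCell<Vec<&'static str>>) {
    let _guard = lock.try_lock().expect("lock should be free");
    log.borrow_mut().push("sent"); // side effect that has already happened
    YieldOnce(false).await; // the task is suspended here
    log.borrow_mut().push("confirmed"); // only runs if polled again
}

// Polls the task once, then either drops it or just stops polling it.
// Returns (log entries, whether the lock is still held).
fn scenario(drop_it: bool) -> (Vec<&'static str>, bool) {
    let lock = ToyLock { held: Cell::new(false) };
    let log = RefCell::new(Vec::new());
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);

    let mut fut = Box::pin(send_then_confirm(&lock, &log));
    assert!(fut.as_mut().poll(&mut cx).is_pending()); // runs up to the await
    let _still_alive = if drop_it {
        drop(fut); // cancellation: no error, no panic, nothing logged
        None
    } else {
        Some(fut) // not dropped, but never polled again
    };
    let lock_held = lock.held.get();
    let entries = log.borrow().clone();
    (entries, lock_held)
}

fn main() {
    println!("{:?}", scenario(true)); // (["sent"], false)
    println!("{:?}", scenario(false)); // (["sent"], true)
}
```

In both scenarios the task never logs `"confirmed"`, and nothing reports that it stopped. The side effect recorded before the await stays in place. Only dropping the future releases the lock. The un-polled future keeps holding the lock, which is the ingredient behind futurelock.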
With io_uring, by contrast, dropping the future doesn't cancel the kernel's IO. So with io_uring, putting the buffer inside the future is not memory-safe under cancellation (the kernel may write into freed memory). There are two solutions, listed in the notes at the end. See also: Notes on io-uring. As previously mentioned, dropping a future cancels it. There is another kind of "cancellation": just not polling the future, without dropping it. It's also dangerous, and may cause deadlock or weird delays. In a select, you can pass ownership of a future, but you can also pass a borrow of a future. When a borrow is passed, one dangerous case can happen. If the select goes into one branch, the futures of the other branches are dropped. If you passed a borrow of a future, the borrow itself is dropped, but the borrowed future is not. However, the borrowed future is not polled again in the meantime (you can explicitly await it after the select, but it isn't polled while the selected branch runs). This creates a temporarily un-polled future. This is dangerous when an async lock is involved. After acquiring a lock, the future holds the lock. If the future holding the lock is dropped, it releases the lock. But if the future holds the lock and is neither dropped nor polled, a deadlock is likely. This is the mechanism behind futurelock. When using buffered streams, some futures in the buffer may be temporarily un-polled. This can cause weird delays or deadlock. https://tmandry.gitlab.io/blog/posts/for-await-buffered-streams/ https://without.boats/blog/poll-progress/ Rust currently has no in-place initialization. Heap-allocating something requires first creating it on the stack and then moving it to the heap. In release mode, this can be optimized into initializing directly on the heap, but in debug mode it still involves creating it on the stack. Some futures can be very large, and creating a large future on the stack can cause a stack overflow. Sometimes it overflows the stack in debug mode but not in release mode, because in release mode it is written directly to the heap.
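Here is a small std-only sketch of how large futures arise, and of the tip (in the notes at the end) of boxing a sub-future before awaiting it. A local that is live across an await point becomes part of the future's state, and so does a sub-future that is awaited directly. The 16 KiB buffer and the function names are illustrative assumptions. Exact sizes depend on the compiler version, so the sketch only prints them.

```rust
use std::future::Future;
use std::hint::black_box;
use std::mem::size_of_val;
use std::pin::Pin;
use std::task::{Context, Poll};

// Returns Pending once, then Ready: a single await point.
struct YieldOnce(bool);
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

// The buffer is still needed after the await, so it is stored in the future.
async fn big_inner() -> usize {
    let buf = black_box([1u8; 16 * 1024]);
    YieldOnce(false).await;
    buf.iter().map(|&b| b as usize).sum()
}

// Awaiting the sub-future directly embeds its whole state in the parent.
async fn parent_inline() -> usize {
    big_inner().await
}

// Boxing first: across the await, the parent only keeps a heap pointer.
// (Box::pin may still build the inner future on the stack before moving it.)
async fn parent_boxed() -> usize {
    let fut = Box::pin(big_inner());
    fut.await
}

fn main() {
    // The futures are only created and measured here, never polled.
    println!("big_inner:     {} bytes", size_of_val(&big_inner()));
    println!("parent_inline: {} bytes", size_of_val(&parent_inline()));
    println!("parent_boxed:  {} bytes", size_of_val(&parent_boxed()));
}
```

The first two futures should each be at least 16 KiB. The boxed parent stays small because only the `Pin<Box<_>>` is held across its await point.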
On Windows the default stack size is smaller, so a stack overflow is more likely. There is currently some inefficiency in future size; see Async Future Memory Optimisation. For ways to reduce future size, see the notes at the end. It will print All of them execute on the main thread. There is no parallelism. Parallelism can be enabled by using . But without it, there is no parallelism by default. This is different from Golang, where goroutines are parallel. The async-sync-async sandwich: an async function calls a sync function that blocks on another async function. That async-to-sync call blocks the scheduler thread, and it's very prone to deadlock. Tokio does multi-threaded work-stealing scheduling. Its purpose is very similar to OS scheduling, and an async task's purpose is very similar to an OS thread's. The duality of the two: as long as data is owned by a thread, it's data-race free. Correspondingly, as long as data is owned by an async task, it's data-race free. Tokio requires the future to be . This can create some troubles. Tokio requires it because it does work stealing: an async task running on one thread could later be scheduled onto another thread. However, if an async task is analogous to a thread, then ensuring that data is owned by an async task should also make it data-race free, even if the data is not . But Rust doesn't check any "async task boundary". An async task can pass data out, and then the data is no longer owned by that task. There is no language mechanism that ensures data stays tied to an async task. So you still have to satisfy even for data that's only used within one async task. The constraint can be avoided with thread-per-core async runtimes. Using multiple async runtimes together is possible but hard and error-prone, and there are many async-runtime-specific types. So async runtimes naturally exclude each other. That's why Tokio has a monopoly. In Golang you can only use the one official goroutine scheduler.
In Rust, although Tokio has a monopoly, you do have a choice of other async runtimes. This trap is not Rust-specific. A thread pool often has a thread count limit, which limits concurrency. But in async, there is no concurrency limit by default. This is good for a high-performance web server, but it has downsides (see the notes at the end). One solution is to add a semaphore to limit concurrency. Structured concurrency forces all concurrent tasks to be scoped, so the tasks form a tree. With structured concurrency, tasks can borrow data from their parent. There is no need to make the future . There is no need to wrap things in . The tree shape is free of cycles, so awaiting child tasks alone cannot deadlock (but it can deadlock if other kinds of waits are involved). But there are cases that structured concurrency cannot handle. One is background tasks. For example, a web server provides a RESTful API that launches a background task, and the background task keeps running after the request that launched it finishes. See also: The bane of my existence: Supporting both async and sync code in Rust; Why async Rust?; Async Rust can be a pleasure to work with (without ); Making Async Rust Reliable - Tyler Mandry; FuturesUnordered and the order of futures. Notes: The "fully owned" here means more than ownership in Rust's semantics. The has internal data structures, and "fully owned" applies to those too: one async task fully owning the means its internal data structure (which contains the reference count) is only accessible from that one task. ↩ Sources of cancellation: select: when one branch is selected, the futures of the other branches are cancelled. Explicit cancellation: explicitly cancel a task. Timeout: when the timeout is reached but the future hasn't finished, it's cancelled. In epoll, the OS notifies the app that an IO can be done, then the app makes another system call to do the IO. This involves context switching from kernel to app (receiving the notification), then to kernel (the IO syscall), then back to app (finishing the IO).
The app can choose not to do the IO after receiving the notification. This works well with Rust future cancellation. In io_uring, the OS finishes the IO directly (writes to the buffer) and then tells the app. That's just one context switch, from kernel to app, which is faster than epoll's kernel-to-app-to-kernel-to-app. The IO is done entirely by the kernel. The app cannot choose to "receive the notification but not do the IO": when the app receives the notification, the IO has already been done. This doesn't work well with Rust async cancellation. Two solutions for io_uring buffers: Make the future non-cancellable. Rust doesn't yet have linear types (must-move types), so the language cannot guarantee this. Or make the buffer heap-allocated. When the future is dropped, the buffer can still exist, and the kernel can write to it without violating memory safety. Reducing future size: Avoid creating an in-place buffer like . The buffer would be placed directly inside the future. When calling another async function, first box that future, then await it. If it is not boxed, the sub-future is placed directly inside the parent future. Calling between sync and async: making async code call sync code is easy, but it risks blocking the scheduler thread, as mentioned previously. Making sync code call async code is not easy, because it requires using the async runtime's API, but it's less risky. Downsides of unlimited concurrency: for a scraper, if concurrency is too high, it may use too much memory and then run out of memory (OOM). If it sends too many concurrent requests to a remote server, it may trigger a rate limit, and then most requests fail.
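The semaphore fix for unbounded concurrency, mentioned earlier, can be sketched with the standard library alone. `ToySemaphore` is a hypothetical single-threaded stand-in, not `tokio::sync::Semaphore`. The two `yield_now().await` calls stand in for waiting on a remote server. The sketch counts how many jobs are in flight at once.

```rust
use std::cell::Cell;
use std::future::{poll_fn, Future};
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// No-op waker: the toy executor re-polls every unfinished task each pass.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Yield to the executor exactly once.
async fn yield_now() {
    let mut yielded = false;
    poll_fn(|cx: &mut Context<'_>| {
        if yielded {
            Poll::Ready(())
        } else {
            yielded = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    })
    .await
}

// Hypothetical single-threaded semaphore for this sketch only.
struct ToySemaphore {
    permits: Cell<usize>,
}
struct Permit<'a>(&'a ToySemaphore);

impl ToySemaphore {
    async fn acquire(&self) -> Permit<'_> {
        // Returning Pending without registering the waker only works because
        // the toy executor below re-polls every task on every pass.
        poll_fn(|_cx: &mut Context<'_>| match self.permits.get() {
            0 => Poll::Pending,
            n => {
                self.permits.set(n - 1);
                Poll::Ready(())
            }
        })
        .await;
        Permit(self)
    }
}

impl Drop for Permit<'_> {
    fn drop(&mut self) {
        self.0.permits.set(self.0.permits.get() + 1); // return the permit
    }
}

async fn job(sem: Option<Rc<ToySemaphore>>, active: Rc<Cell<usize>>, peak: Rc<Cell<usize>>) {
    let _permit = match &sem {
        Some(s) => Some(s.acquire().await),
        None => None,
    };
    active.set(active.get() + 1);
    peak.set(peak.get().max(active.get()));
    yield_now().await; // stand-in for waiting on a remote server
    yield_now().await;
    active.set(active.get() - 1);
}

// Runs `jobs` jobs on a toy round-robin executor and returns peak concurrency.
fn peak_concurrency(jobs: usize, limit: Option<usize>) -> usize {
    let sem = limit.map(|n| Rc::new(ToySemaphore { permits: Cell::new(n) }));
    let active = Rc::new(Cell::new(0));
    let peak = Rc::new(Cell::new(0));
    let mut tasks: Vec<Pin<Box<dyn Future<Output = ()>>>> = Vec::new();
    for _ in 0..jobs {
        tasks.push(Box::pin(job(sem.clone(), active.clone(), peak.clone())));
    }
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    while !tasks.is_empty() {
        tasks.retain_mut(|task| task.as_mut().poll(&mut cx).is_pending());
    }
    peak.get()
}

fn main() {
    println!("no limit: {}", peak_concurrency(5, None)); // 5
    println!("limit 2:  {}", peak_concurrency(5, Some(2))); // 2
}
```

Without a limit, all five jobs start at once. With two permits, at most two are in flight, and the rest wait at `acquire` until a permit is dropped.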

Susam Pal 1 month ago

Touch Typing Number Keys

I learnt touch typing about two decades ago when I was still at university. Although I took some typewriter lessons as a child, those lessons did not stick with me. It was at university, when I found a Java applet-based touch typing tutor on the web, that I really learnt to touch type. Since then, touch typing has been an important part of my computing life. I've sometimes read arguments on the web downplaying touch typing as a skill, with claims like 'typing isn't the bottleneck, thinking is'. While that may be true, I still consider touch typing a useful skill, since it makes writing documents, code and email feel much more fluid and pleasant. It's like playing a musical instrument with the correct technique, rather than simply getting by without it. One feels smooth and expressive and the other feels raw and laboured. Later in life, I also wrote a tool named QuickQWERTY so that I could share the joy of touch typing with my friends. The tool teaches typing only with the QWERTY layout. I wrote it at a time when I did not know much about the computing world, so I was not even aware that other keyboard layouts existed. As a result, only QWERTY is supported. The tool is free and open source, so motivated individuals can modify the lessons to support other keyboard layouts. Some people have indeed done so over the years. Several of my friends used this tool. I know at least a few who benefitted from it and shared similar sentiments about how touch typing made their computing experience smoother. Back in my university days, I had learnt a method in which the digits 1 and 2 are typed with the left little finger, 3 with the left ring finger and so on. In this approach, the digits 1 to 6 are typed with the left hand and 7 to 0 with the right. There is an alternative method in which only 1 is typed with the left little finger, 2 with the left ring finger and so on. In this approach, the digits 1 to 5 are typed with the left hand and 6 to 0 with the right. 
Both methods require typing 1 with the left little finger. I have often felt that this may not be the most efficient way to type 1 . The little finger is shorter than the others and reaching 1 often requires shifting the whole hand slightly diagonally upwards. I have therefore felt that using the left ring finger for 1 might be more comfortable. Last month, I trained myself to use the left ring finger to type both 1 and 2 . This goes against almost every typing guide out there, but I decided to forgo established practices and explore on my own to find what feels right. At first, I was sceptical about whether I would be able to learn this method, since it meant overcoming 20 years of muscle memory that I have relied on almost every day. However, developing the new muscle memory has been surprisingly easy. In fact, both the old and the new muscle memories now coexist and I can switch between them at will without much trouble. It is remarkable how the brain can store conflicting muscle memories so effortlessly. So far, I am finding this new way of typing 1 and 2 more comfortable than either of the two popular methods I described above. I will continue typing this way for the rest of this month and see how it feels. Read on website | #miscellaneous

ava's blog 1 month ago

rose ▪ bud ▪ thorn - april 2026

Published 30 Apr, 2026 Went on vacation with friends :) Had a fun time in some museum exhibitions! I've been putting effort into getting ready for working from home the same way I get ready when I go to the office, and it has helped me feel more ready and energized for the day. I've noticed how good the skin on my face looks after spending time in the sun (with SPF 50 applied!); it clears up any bumps and redness, and makes my skin look very even and smooth, as if filtered. So whenever it is sunny outside in the mornings now, I spend at least 15 minutes on the balcony with my face in the sun, soaking it up. The new Hello Kitty Island Adventure DLC released, and I love it so much. I also bought an Usahana Build-A-Bear and a little Usahana for my bag! I love her, and she reminds me to be more colorful and whimsical. It made me happy to dress up more this month and be more silly and adventurous with my looks whenever I could. I cut myself some bangs and my wife helped trim my ends. I originally planned to leave it all untouched until October so it would be 2 years of uninterrupted growth since I cut off all my hair into a buzzcut, but it was necessary and I wanted a change. It looks good! I now have a similar haircut to Kiki from Kiki's Delivery Service, and my wife put the same kind of bow on me for fun. I started going back to the gym after feeling better again. I made my balcony spring/summer ready: Cut the plants, cleaned debris, wiped the furniture and brought out the seat cushions. Also planted new flowers! Bought three pairs of pants and two blazers. Big win, because clothes shopping feels awful to me and I hate trying stuff on. One of them has a really cute button; my wife thinks it's a seashell, I think it's a ginkgo leaf without the split. You'll see it at the bottom of this post. I bought cute stickers and a notepad from dinchenix that I really like. Received my Gold Member package from noyb :) Still working on cards to give out at events.
I am working on a bigger blog project that will probably still take weeks, maybe months. Progress is hard and slow. I'm putting a lot of time and effort into this, lots of notes, drafts, sketches... I am starting training on the Leg Curl and the Leg Press machines on Friday. I didn't move forward in the interview process with the company I mentioned last month. The beginning of the month was hard emotionally/hormonally; I reached one month on dienogest and I felt too hot at night, got angry very easily and wanted to cry a lot. It has gotten better in the meantime and now my mood is stable again. I found (but removed and treated) a source of mold in my home :/ I was storing the balcony seat cushions incorrectly. I was sadly harassed by an old guy while staying in the vacation home in Husum: he was staring into the windows and whistling at me out of nowhere. It made me feel unsafe in the home. I felt very lost at times. I wasn't sure whether I was strong enough, capable enough for the things I want to do. I fell pretty hard into a scarcity mindset, got myself down with some comparisons, and felt very fragile and hopeless about my illnesses. I also got disillusioned about blogging or having a blog in general; I needed to give myself the freedom of not thinking about it for a while. I no longer knew what I wanted it to be or why I wanted it to go on, and it all just felt like a fever dream I woke up from. Surely influenced by the above moods, but also, it's hard to change and grow with an existing thing sometimes. I suspect the ulcer at my ileocecal valve has come back. I went back on Prednisone for a short while for it. Currently back off of it and feeling okayish. I had some tough times and communication issues in friendships that were difficult for me to deal with (but are now resolved).

Neil Madden 1 month ago

Java sealed classes and exhaustive pattern matching

Java 17 introduced sealed classes, which allow you to explicitly list the allowed sub-types of an interface or base class. Here’s a toy example using a sealed interface and records (nested classes are implicitly added to the permitted sub-types if an explicit list is not given): If you are familiar with functional programming languages with algebraic datatypes, you can view this as similar to a datatype declaration in Haskell or ML: We can then use this in a simple Main class: OK, not so exciting. But one thing to note here is that we didn’t have to add a default clause to the switch expression in our main method. This is because sealed classes (and enums) enable exhaustiveness checking: the compiler knows exactly what the possible cases are, and so can check whether you have covered them all. If you have, then you don’t need a default clause. If you forget one (and don’t have a default clause), then you get a compile-time error. This is great when you want to ensure that all uses of some type cover all of the cases, but it does introduce a new type of breaking change: adding a new sub-type to a sealed class/interface may break consumers of that code. For example, adding a new case to our example will cause the main method to fail to compile due to the missing case. So if you export a sealed type in your API, then adding a new subtype is a breaking change that would require a major version bump (if you’re following SemVer). Although Java will produce a compile-time error for a non-exhaustive switch when you compile the consumer (main in this case), it cannot do so if the consumer is not recompiled when the sealed type changes. For example, suppose that we extend our SealedType with another case: If we recompile only SealedType.java and not Main, then we end up with a runtime exception when we trigger the new case: Here we have the new MatchException being thrown.
The Javadoc notes this potential issue with separate compilation, and also some corner cases with nulls in patterns. So even if you were hoping that sealed classes would statically ensure that you update all consumers when a new case is added, that is not the case unless you recompile everything. I think for me the conclusion is that sealed types are probably most useful within the implementation of a component, and are less useful when exposed in the public API that a component offers to other components (e.g. a library). For internal use, where you typically recompile everything together, you get the nice properties of exhaustiveness checking and stronger compile-time safety guarantees. But when used across module boundaries, you may just be introducing new ways to break code, often detectable only at runtime. (I discovered these subtleties when reviewing the preview support for PEM-encoded cryptographic objects, which makes exactly this mistake of baking a sealed interface into a public API and recommending that clients pattern match against that type. I predict a very high chance of breakage if they ever want to add a new case.)

Manuel Moreale 1 month ago

Nicolas Solerieu

This week on the People and Blogs series we have an interview with Nicolas Solerieu, whose blog can be found at slrncl.com/blog. Tired of RSS? Read this in your browser or sign up for the newsletter. People and Blogs is supported by the "One a Month" club members. If you enjoy P&B, consider becoming one for as little as 1 dollar a month. I’m a dad, designer, cyclist, designer, texture guy, currently living in San Luis Obispo, CA. My oldest kid just learned to blow his nose. The other one is in his prime baby time. These days I daydream about bikepacking and permaculture. Born and raised in France, I landed in California in 2016. An odd mix of work ethic and ego led me to define myself through the stuff I make: all sorts of combinations of rectangles and text boxes, mostly for screens, solely because I got good enough to get paid for it. While I'm filled with gratitude for my career, I spend a humorously uncomfortable amount of time torn between ascetic ideals and pragmatism. While I’m not a technologist, I’m not a monk either. I’m way too fidgety. Time outdoors, family life, movement, and occasional meditation keep me sane. I adopted this domain name in 2016 as I didn't like having my real name spelled out in the URL; it felt weird. I bought my initial domain back in 2012: nicolas-soleri.eu. I thought it was clever. SLRNCL is a concatenation of my last name and my first name without the vowels. It's hard to remember, which is great since I'm not trying to play the SEO game. I truly started to put effort into writing in 2022. The birth of my first child probably had a lot to do with it, and so did getting off Instagram. I couldn’t fathom the idea of being a dad with an Instagram account. But I’d love for my kids to one day read the blog of their silly dad. Self-awareness and an allergy to grandiosity create a tension between craft, skepticism, and my embodied experience, which I love to put into words. The blog-therapy is (still) working.
It’s eating up most of my creative ego and filling my feed.

Nowadays I use the default iOS Notes app. I write whenever. I edit little. I used to have a notes.txt file on my desktop where I put down all the interesting nuggets, like a wine cellar, hoping for them to mature. Instead, they mostly degenerated and created a bunch of anxiety from my doing nothing with them. I breed an uncomfortably large number of thoughts daily. Most of them are unexceptional. I cultivate poor writing hygiene because I do not want to truly get into writing. Yet there seems to be something that keeps bringing me back to words. To tame my ego and avoid creating a generational supply of passable notes, I use my blog as a graveyard. Typos are my own; I’m working on it. With AI, they now feel like a mark of authenticity. Sometimes I ask my wife to proofread, but that is rare because we end up arguing. Worth it.

Following the flow of life is what gets my creative juices flowing. I often write on the toilet or in public parks while keeping an eye on my kids. I thrive in “white-space” time: time in between things. So I jot down notes when I’m out and about. I’m not a coffee shop person and I hate my home office.

My website is home-cooked. It runs mostly on PHP. I still have jQuery installed, but I’m slowly removing all JavaScript dependencies. I'm not a great dev and prefer to stay 5 years behind trends. My website is constrained by my skills. This has kept me grounded and covered most of my needs and ambitions. I don't recommend inspecting my code; it's really not great, but it's decently light. Building stuff is a great way to keep myself grounded in the process. I use Inter as the only font because it's nice, plain, and open source. It falls back to the system font if Inter isn't available, because I don't want to import anything custom or use a CDN. I'm not better than Inter (and few out there are IMO). The site is hosted by OVH in France.
I’m considering self-hosting since my house produces excess solar power. I’d use Bear Blog if I were not a pretentious web designer and had to start over. I recommended it to my wife, and she likes it. The simplicity and authenticity of the project are lovely. That said, I do not regret the torturous process of having redesigned my website tirelessly over the last decade. The process taught me a lot about myself.

My domain name + hosting cost under 20 euros/year. I do not run ads or track anything, and I don’t plan to change this, ever. That means my website has had an incredible ROI considering the career opportunities it gave me. The many people who hired me all visited my website (and told me about it). I had some rewarding connections with internet strangers. My gratitude is larger than an HTML file can hold, and definitely orders of magnitude greater than what it cost me to run my website. Money is important, and I’m a lucky bastard. I don’t have anything against people monetizing their thoughts, though I’m rarely compelled by a paywall. Digital patronage and crowdfunding seem highly relevant to getting out of today's social media hell realm. They have pitfalls, the main one being that they require mass adoption, which seems highly delusional. But hope and compassion are contagious while big tech fights entropy. Social media always comes back in a different form; meanwhile, HTML is still there. It’s the cockroach business model.

There are so many goodies out there, one link away. Sharing is fun, side projects too. In my case it took me a decade to get my head out of my own butt and realize the cost of my own ventures. I believe a lot of us are like me, moving through life and accumulating stuff. Cleaning up, giving up, and passing along are necessary processes. So as a closing thought, I’d suggest you sit, close your eyes, and think of all your stuff. If you’re comfortable with it, great. Otherwise, spring is coming.
Now that you're done reading the interview, go check the blog and subscribe to the RSS feed. If you're looking for more content, go read one of the previous 138 interviews. People and Blogs is possible because kind people support it.

These two crack me up and make me think: Keenan and Taylor town (already seen on P&B).
Some wholesome Aussie stories: Beau Miles.
Maggie Appleton always gets me interested.
The only design blog I've ever read: Tobias Van Schneider.
I'm a fan of Faircompanies' stories and mission.
Not a blog, but worth checking out: the James Low audio archive.

マリウス 1 month ago

KTT x 80Retros GAME 1989 Orange

I picked up the KTT x 80Retros GAME 1989 Orange switches a while ago at Funkeys, a physical brick-and-mortar mechanical keyboard store in Yongsan-gu, Seoul, and it’s my first linear switch. Given its surprisingly cheap price, I really didn’t expect much from it, to be honest. KTT is a name people normally associate with budget options, like Peaches, Sea Salts, and Strawberries. They’re the kind of switches that show up in beginner build guides, and they are generally good stuff, but not really the kind of thing that made me stop and think about what I was typing on. However, the GAME 1989 Orange changed that perception for me, and it did so in a way I genuinely didn’t see coming. But before we get into the switch itself, we need to talk about the vibe, because the vibe is half the story here.

80Retros is a relatively young brand out of China that debuted on ZFrontier around December 2023 with an interest check for their GAME 1989 cherry-profile PBT keycap set, inspired by the original Game Boy. They describe themselves as lovers of all things vintage and retro, and unlike a lot of brands that slap “retro” on things as a marketing afterthought, they actually seem to mean it. What’s remarkable is how fast they’ve moved since then. Within a few years, they went from a single keycap IC to pushing out nearly a dozen different switches across two separate manufacturers (KTT and HMX), along with matching keycap sets in multiple colorways. The G.O.A.T. of switch reviews himself, ThereminGoat, covered this in detail in his HMX Volume 0-T review, and the GAME timeline is pretty interesting: the original HMX-manufactured GAME 1989 switches came first, followed by what he calls the “Film Trio” (the KD200, FJ400, and GAME 1989 Classic), all packaged in these absolutely gorgeous film-canister-inspired containers that look like oversized Kodak rolls.
The film canister thing started as a nod to the KD200 and FJ400 being camera-brand-inspired, but the community loved the packaging so much that 80Retros seemingly just kept using it for everything, even for switches that have nothing to do with photography. The KTT-manufactured GAME 1989 Orange and Red are the newer entries in this expanding catalogue, released as part of an “Expanded Film Series” in early 2025 alongside a Silent White variant and an HMX XMAS switch. So we’re looking at a brand that is absolutely not slowing down.

On paper, a PC top and PA66 bottom is a pretty classic material combo. KTT has used variations of this pairing for years. What makes this switch interesting is the KT2 stem, made out of KTT’s proprietary UPE blend. UPE (ultra-high molecular weight polyethylene) is a material that’s been showing up more and more in the switch world, but it’s one of those things where the specific manufacturer’s blend matters enormously. Keygeek’s U4, for example, sounds glassy and solid. KTT’s KT2 is drier, a bit foamy, and (this is the part I didn’t expect) it brings an audible character that I can only describe as “marble-y”. It’s not soft, but it’s not hard either. It sits in an interesting middle ground. At 4mm travel with a pole bottom-out, the switch is technically a long-pole linear, but the full travel distance means it doesn’t feel like one in the snappy, sharp way that most long-poles do. The pole bottom-out is there, but it’s mellowed out by the travel length and the stem material. More on that later.

Stock smoothness is good, and I mean genuinely good. Probably not HMX-tier buttery, and probably not the absolute smoothest thing I’ve tried in recent years, but there’s a quality to the travel that feels deliberate and controlled. The factory lube is present but light: a thin coating on the bottom housing rails, some on the stem legs and leaf, and the springs seem lightly lubed too.
There is a texture to the keystroke, and some people might call it scratchy. I’m not sure that would be fair, though it’s not entirely wrong either. UPE blends can be unpredictable when paired with other housing materials. Sometimes you get something silky, sometimes you get audible friction. The KT2 blend in this PC/PA66 housing produces a slight tactile grain in the travel that I genuinely enjoy. It’s subtle enough that you won’t notice it at normal typing speed, but if you slow-press a single key at ear level, it’s there.

Spring-wise, 40g actuation bottoming out at around 50g is on the lighter side, especially for me and my usual Frankenswitches. I wouldn’t call it featherweight, but if you tend to bottom out hard, you’ll definitely hit the end of the stroke with minimal effort. The springs are clean, with no noticeable ping in my set; the factory lube on the springs seems to do its job. One thing to note is that there’s reportedly about a 3g variance between individual switches. I couldn’t verify that precisely, but I did notice the occasional key that felt marginally different. Not a dealbreaker for me, but if you’re the kind of person who weighs every spring in a batch, keep it in mind.

As for wobble, it is present. There’s some slight vertical (north-south) wobble and maybe a touch of east-west if you go looking for it. This seems to be a known trade-off with KTT’s newer molds. Their older switches, like the Hyacinths, seemingly had incredibly tight tolerances, but those molds are from a different era. KTT has been retooling to accommodate new materials like their KT2 and KT3 blends, and the fit isn’t quite as snug as on the old stuff. Films probably do help to tighten up the housings, and I’ve read that filming the switches also compresses the sound profile slightly. Personally, the wobble doesn’t bother me too much.
The sound profile is where the GAME 1989 Orange gets genuinely interesting, because it is busy, and I mean that in a good way. The bottom-out is lower-pitched than you’d typically expect from a PC-topped switch. The PA66 bottom housing and the KT2 stem material seemingly pull the tone down into territory that’s thocky without being mushy. There’s a definite pop to the keystroke, and the bottom-out has weight to it. The top-out (the return stroke) is a touch brighter, creating a slight tonal contrast between the downstroke and upstroke that gives the switch a lot of auditory dimension. There’s a lot happening acoustically with every keystroke, and none of it sounds muddied or confused. The “marble-y” quality I mentioned earlier really comes through in the sound. It’s not a wet, lubed sound, but a relatively dry and more textured one, with a character that feels… natural, for lack of a better word. The slight scratch in the travel actually adds to the sound profile rather than detracting from it. The initial contact, the pole hitting bottom, the spring compression, and the return all remain distinct from each other and layered.

Volume-wise, it’s moderate: definitely not silent, but also not exactly loud. It’s slightly quieter than your average long-pole, which makes sense given the full 4mm travel and the way the KT2 material absorbs some of the impact energy. I haven’t yet tested it on any of my aluminium builds, but on the few keyboards Funkeys had these switches on, as well as on my Kunai, I find that the sound profile works beautifully. That said, these switches are definitely less ideal for quiet or public environments, like open-plan offices and cafes.

The switches come factory lubed, and they work just fine stock. I’d personally resist the urge to lube them further unless you specifically want to kill the audible scratch, which I think is part of the charm.
If you do lube, know that you’re trading character for smoothness, and these are already reasonably smooth to begin with. They accept films, and filming them does seem to tighten the sound slightly, with less resonance in the housing and a more compressed signature. Depending on your build and plate material, that might be exactly what you want or exactly what you don’t. Try a few with and without before committing.

As for the packaging, if you buy the 35-switch sets, they come in those aforementioned film canister containers. It’s genuinely lovely and a nice touch that makes the whole experience feel considered. Not something I’d pay extra for, but it’s a detail that matters for the overall product identity. One thing to note is that the canisters open very easily. I wouldn’t walk around holding them upside down unless I wanted to play “find 35 switches hidden underneath the furniture”.

The KTT x 80Retros GAME 1989 Orange surprised me. It’s a switch that trades ultra-polished, frictionless perfection for a dry, textured, slightly scratchy keystroke that somehow comes together into a sound profile that’s warm, full, and more complex than it has any right to be at this price point. It’s not perfect. The wobble is there, and the housing tolerances aren’t as tight as the best in the business. But it doesn’t feel like every other linear on the market, at least not like the ones I’ve had the chance to try over the past years. It has character, which, in a hobby that’s increasingly crowded with technically excellent but personality-free switches, has its charm. If you want the smoothest linear available, look elsewhere. If you want something that sounds interesting, feels engaging, and comes wrapped in an homage to a long-gone era, give the GAME 1989 Orange a shot. I’m genuinely glad I did.

Disclaimer: I’m not a switch scientist.
I don’t own a force curve rig, I can’t tell you the exact durometer of the KT2 blend, and my ears are probably not calibrated to the standards of someone like ThereminGoat. This review is based on my personal experience typing on these switches across a few different boards and, ultimately, actively using them on my primary keyboard. Your mileage may vary based on your plate material, case, keycaps, and other factors. Take everything here as one person’s experience and use it as a starting point for your own.
