Posts in Data-analysis (20 found)
DYNOMIGHT 5 days ago

Is “colorectal cancer” rising in “young people”?

(Yes, but.) Over the past few years, I’ve seen many articles about a mysterious rise in colorectal cancer (CRC) in young people. There are various stories for why this might be happening:
General health. Maybe modern people are unhealthy (obesity, low physical activity, diabetes, poor sleep), leading to insulin resistance and chronic inflammation, meaning faster epithelial cell proliferation and a miscalibrated immune system that fails to stop early cancers?
Ultra-processed food. Maybe people are eating more ultra-processed foods that contain additives (like emulsifiers) that degrade colon mucus, allowing bacteria to contact epithelial cells and drive inflammation? Or maybe ultra-processed food has low fiber and a high glycemic load, leading to insulin resistance and chronic inflammation, with the problems mentioned above?
Bad meat. Maybe people are eating more red and/or processed meats, which expose the colon to nitrites and secondary bile acids, which inflame the epithelium and promote chronic inflammation?
The microbiome. Maybe it’s the microbiome. For example, maybe people’s guts are getting colonized by strains of E. coli that produce genotoxic colibactin. Or maybe overuse of antibiotics in early life depletes protective bacteria in the gut, allowing harmful strains to expand, e.g. strains of B. fragilis that cause inflammation, or strains of F. nucleatum that can survive in the gut and drive tumor growth?
Environmental exposures. Maybe people are getting exposed to bad stuff in the environment (microplastics, forever chemicals, pesticides, endocrine disruptors, air pollution) that does bad stuff (damages the gut barrier, screws up the microbiome, disrupts hormonal signaling)?
Maternal health. Maybe poor maternal health (obesity, diabetes) exposes the fetus to elevated glucose / insulin / inflammation, and these in turn program the child for a lifetime of metabolic issues and inflammation?
Whatever.
Maybe alcohol / smoking / painkillers / calcium / vitamin D / inflammatory bowel disease / hereditary syndromes / screening bias? None of the experts seem to agree on which of these is the culprit, so I figured that I (person with blog) should help. If you poke at these stories, most of them are individually pretty weak. It can’t all be detection bias, since CRC deaths are also going up in younger people. And several proposed causes (air pollution, tobacco) have actually fallen in rich countries. Other explanations, like E. coli producing colibactin, seem biologically real, but there’s no evidence that they’re increasing over time. Still other suggested causes (microplastics, forever chemicals) are mostly mechanistic speculation at this point. Obesity, inactivity, and chronic inflammation also all seem biologically real, and they are likely increasing, but why should they specifically cause colorectal cancer in young people? A plausible answer to that last question is that they aren’t. They’re doing it, but not specifically. This will sound pedantic, but bear with me: If you say that CRC is increasing in younger people, what exactly does that mean? After all, the set of people who qualify as young changes over time. (Ever notice that you keep getting older?) Siegel et al. (2026) plot how often CRC was found in different age groups in 1995 and in 2022. They also provide this plot of how common different types of CRC are in different age groups. At a glance, this doesn’t look so bad. If you’re young, you might think, “OK, my current risk is higher than previous generations faced at the same age, but I can look forward to decreasing rates when I’m old.” You could easily think this is good news: While there’s a relative increase when you’re young, it’s tiny compared to the absolute decrease while you’re old. Unfortunately, that’s the wrong way to think about it. Downham et al. (2026) plot CRC rates in different age groups across the Anglosphere over time.
Everyone I’ve shown this plot to has said it’s confusing, so let me explain: The different lines track age-bands as people born in different years move in and out of those bands. For example, in the US plot in the bottom right, the “20-25” line starts with the left-most dot showing the CRC rate for people born between 1965 and 1970 when they were 20 to 24 years old (around 1990). The next dot shows the rate for people born between 1970 and 1975 when they were 20 to 24 years old (around 1995), and so on. That figure is weird, because the lines connect different groups of people. I wanted a plot where there are lines for different birth cohorts as they age. For unknown reasons, no one seems to make such plots, and the data isn’t trivial to access. So I used a plot digitizer to click on every damned point in that US figure above and then replotted it: Now the individual lines show specific groups of people tracked through time. For example, the “1932.5” line shows CRC rates for people born between 1930 and 1935, when those people were at different ages. If you look closely, you’ll notice that these rates are higher than those for people born between 1940 and 1945 at all ages (where we have data). That was the pattern for a long time: Between 1920 and 1950, later generations enjoyed lower CRC rates across all phases of their lives. But between 1950 and 1960, that pattern reversed, and since then later generations have had higher CRC rates at all ages. We don’t know for sure what will happen in the future. But I think it’s likely this trend will continue. Yes, if you are currently young, you face higher CRC risk than previous generations did when they were young. That’s the bad news. The other bad news is that when you are old, you may also face higher CRC risk than previous generations did when they were old. The other other bad news is that CRC isn’t the only type of cancer that’s rising in later generations. Sung et al.
(2019) give this plot: These are again the confusing graphs where individual lines show age bands as different people move in and out of them. But you get the point: Lots of cancers are going up in later generations, including uterine, gallbladder, kidney, liver, pancreas, and thyroid. (Their additional material contains plots for 18 other cancers.) Note that these plots have a logarithmic y-axis, meaning the changes are larger than they might appear. Since adjacent vertical ticks are a factor of 10 apart, moving up a quarter of the way between two of them corresponds to an increase by a factor of 10^(1/4) ≈ 1.78. If lots of cancers are becoming more common in later generations, then why is everyone talking about CRC? I think that’s because CRC is unique in that it is:
increasing in later generations,
treatable if caught early, and
detectable via screening.
For example, thyroid cancer diagnoses have skyrocketed in recent decades. But that’s partly because of more detection, and thyroid cancer is highly treatable, without clear benefits from early detection. Pancreatic cancer also seems to be increasing, but we don’t have good ways to screen for it, and even if we did, we don’t have good ways to treat it. CRC is really unique in that you can save lives by telling people, “Hey! CRC is going up! You should get screened!” If you’re interested in public health, that’s the most important thing. But if you’re interested in unraveling the mystery of CRC going up, it’s important to note that CRC isn’t really unique at all. Colorectal cancer is going up in young people. Various kinds of cancer are going up in later generations. (Definitely at younger ages, possibly at all ages.) This blog endorses colorectal cancer screening. We don’t yet know if colonoscopies are better than other methods of screening (sigmoidoscopy, stool tests), but we do know that screening is better than not screening. When caught early, CRC is highly treatable, often with only surgery (no chemotherapy or radiation) and a return to normal activities within a couple of weeks.
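The cohort replot described above amounts to re-indexing each digitized (age band, period) point by birth year. Here is a minimal sketch of that regrouping. It assumes 5-year age bands, records each point by the band's starting age and the approximate calendar year, and labels a cohort by its birth-year midpoint (so ages 20 to 24 around 1990 map to the 1967.5 cohort, matching the example in the text). The function name and input layout are hypothetical, and any rates you feed it would come from your own digitized data.

```python
from collections import defaultdict


def to_birth_cohorts(rates, band_width=5):
    """Regroup {(age_band_start, period_year): rate} into birth-cohort lines.

    Each point is assigned to the cohort whose birth-year midpoint is
    period_year - (age_band_start + band_width / 2).
    Returns {cohort_midpoint: [(age_midpoint, rate), ...]}, sorted by
    cohort and then by age.
    """
    cohorts = defaultdict(list)
    for (age_start, period), rate in rates.items():
        age_mid = age_start + band_width / 2
        cohorts[period - age_mid].append((age_mid, rate))
    return {cohort: sorted(points) for cohort, points in sorted(cohorts.items())}
```

Plotting each value of the returned dict as one line (rate against age) gives the "one line per generation" view, where comparing two lines at the same age compares two specific groups of people.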


90 % of the t distribution

William Sealy Gosset was great. He improved beer at Guinness by using the statistics that existed at the time. Not happy with that, he invented new statistics to brew even better beer. The things he invented are used all over the place now, but Guinness wanted to keep him a secret weapon, so they made him publish his results under the fake name Student. One thing Gosset realised is that it is wrong to compute 90 % confidence intervals for the mean by taking the standard deviation of the sample and assuming a normal distribution, like-a-so: \[\hat{\mu} \pm 1.645 \hat{\sigma}\] (Continue reading the full article on the web.)
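A small simulation can show why the normal multiplier misbehaves for small samples. This is a sketch, not the article's own code. It assumes σ̂ here plays the role of the standard error of the mean (sample standard deviation divided by √n), draws samples of n = 5 from a standard normal, and counts how often μ̂ ± k·σ̂ covers the true mean for k = 1.645 and for k = 2.132, the 95th percentile of Student's t with 4 degrees of freedom as listed in standard tables.

```python
import random
import statistics

# Two-sided 90 % multipliers.
Z_90 = statistics.NormalDist().inv_cdf(0.95)  # normal: about 1.645
T_90_DF4 = 2.132  # Student's t with 4 degrees of freedom (n = 5), from tables


def coverage(crit, n=5, trials=20_000, seed=1):
    """Fraction of intervals mean_hat +/- crit * SE that contain the true mean 0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mu_hat = statistics.fmean(sample)
        se = statistics.stdev(sample) / n ** 0.5  # sample SD over sqrt(n)
        if abs(mu_hat - 0.0) <= crit * se:
            hits += 1
    return hits / trials


print(f"normal multiplier: {coverage(Z_90):.3f}")
print(f"t(4) multiplier:   {coverage(T_90_DF4):.3f}")
```

With five observations, the normal multiplier should cover the true mean noticeably less than 90 % of the time, while the t multiplier should land close to 90 %. The gap shrinks as n grows, because the t distribution approaches the normal.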

Lalit Maganti 1 week ago

Which country voted the best at Eurovision?

Eurovision was on yesterday. I’ve never been much interested in the musical side, but the weird political dynamics of Eurovision voting have always fascinated me; I tune in each year just for them and for the somewhat snarky commentary of Graham Norton, the UK commentator. As I was watching the jury votes come in, a question popped into my head: Which country has voted the best in Eurovision? That is, which country was best at picking the eventual top 10, in the right order? Strangely enough, while there’s plenty of work on voting blocs and bilateral biases at Eurovision, most of it asks who votes for whom; I wanted to ask who votes accurately. I couldn’t find anyone asking the question that way, so I decided to do some data analysis myself. To begin to answer this question, I first needed to formalize what “best” even means: some mathematical notion of “good” and “bad”.
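The metric the post settles on is not shown in this excerpt, so what follows is only one possible way to formalize "best", chosen for illustration: a Spearman-footrule-style rank displacement. It sums, over every song on either list, how far the country's ranking is from the final ranking, treating a song missing from a list as ranked just below it. Lower is better, and 0 is a perfect ballot. The function names and the "just below" penalty are assumptions.

```python
def displacement_score(ballot, final_top10):
    """Sum of |ballot rank - final rank| over all songs on either list.

    Lower is better; 0 means the ballot matched the final top 10 exactly.
    A song missing from one list counts as ranked just below it (11th for a
    10-song list), which is an assumed penalty, not a standard rule.
    """
    final_rank = {song: i + 1 for i, song in enumerate(final_top10)}
    ballot_rank = {song: i + 1 for i, song in enumerate(ballot)}
    below = len(final_top10) + 1
    songs = set(final_rank) | set(ballot_rank)
    return sum(abs(ballot_rank.get(s, below) - final_rank.get(s, below)) for s in songs)


def rank_voters(ballots, final_top10):
    """Order countries from most to least accurate ballot."""
    return sorted(ballots, key=lambda country: displacement_score(ballots[country], final_top10))
```

A footrule-style score penalizes a song placed one slot off much less than one that is missing entirely, which matches the "right songs, right order" intuition. A weighted metric such as NDCG would instead care more about getting the winner right than 10th place.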


Regatta Starting Stations – Chi-squared Continued

In the Henley Royal Regatta, two teams at a time propel their boats up a river, competing to be first over a set distance. Teams get assigned to their starting stations (Berkshire or Buckinghamshire) at random. From there, it is a straight shot up the river, with the lane from each starting station being seemingly identical. I didn’t know any of this, but a reader reached out some time ago because they had noticed something odd about this, and they wanted to borrow me as a sounding board. Here’s the odd thing: the team that starts from the Berkshire station has won 53.5 % of the 7555 races in the historic data this reader looked at. This is highly unexpected. If teams are assigned at random, and the starting stations are practically equal, then the starting station of the winning team should be a coin flip. If we flip 7555 coins, we would practically never see as many as 53.5 % come up heads. (Continue reading the full article on the web.)
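A rough check of the coin-flip argument, as a sketch rather than the article's own analysis. It assumes the 53.5 % figure is rounded and corresponds to 4042 Berkshire wins out of 7555 (our reconstruction, not the reader's raw data). It computes the two-category chi-squared statistic against a fair 50/50 split, the equivalent z-score, and the exact one-sided binomial tail probability using Python's arbitrary-precision integers.

```python
import math

N = 7555                 # races in the reader's historic data
WINS = round(0.535 * N)  # about 4042 Berkshire wins; assumes 53.5 % is rounded


def upper_tail(n, k):
    """Exact P(X >= k) for X ~ Binomial(n, 1/2), using big-integer arithmetic."""
    term = math.comb(n, k)  # C(n, k)
    total = 0
    for j in range(k, n + 1):
        total += term
        term = term * (n - j) // (j + 1)  # C(n, j + 1) from C(n, j), exact
    return total / 2 ** n


expected = N / 2
# Two-category chi-squared statistic against a fair split (1 degree of freedom).
chi2 = ((WINS - expected) ** 2 + ((N - WINS) - expected) ** 2) / expected
# Equivalent z-score; with two categories, chi2 equals z squared.
z = (WINS - expected) / math.sqrt(N / 4)

print(f"wins={WINS} chi2={chi2:.1f} z={z:.2f} P(X >= wins)={upper_tail(N, WINS):.2e}")
```

The statistic comes out around 37 (z around 6), so the one-sided tail probability is on the order of one in a billion; doubling it gives the two-sided version. That is why "practically never" is the right summary for a fair coin.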

Armin Ronacher 3 weeks ago

Content for Content’s Sake

Language is constantly evolving, particularly in some communities. Not everybody is ready for it at all times. I, for instance, cannot stand that my community is now constantly “cooking” or “cooked”, that people in it are “locked in” or “cracked.” I don’t like it, because the use of the words primarily signals membership of a group rather than one’s individuality. But some of the changes to that language might now be coming from … machines? Or maybe not. I don’t know. I, like many others, noticed that some words keep showing up more than before, and the obvious assumption is that LLMs are at fault. What I did was take 90 days’ worth of my local coding sessions and look for medium-frequency words whose use is inflated compared to the frequency wordfreq would predict for them. Then I looked for the more common of these words and did a Google Trends search (filtered to the US). Note that some words like “capability” are more likely to show up in coding sessions just because of the nature of the problem, so the actual increase is much more pronounced than you would expect. You can click through it; this is what the change over time looks like. Note that these are all words from agent output in my coding sessions that are inflated compared to historical norms: Something is going on for sure. Google Trends, in theory, reflects words that people search for. In theory, maybe agents are doing some of the Googling, but it might just be humans Googling for stuff that is LLM-generated; I don’t know. This data set might be a complete fabrication, but for all the words I checked and selected, I also saw an increase on Google Trends. So how did I select the words to check in the first place? First, I looked for the highest-frequency words. They were, as you would expect, things like “add”, “commit”, “patch”, etc.
Then I had an LLM generate a list of words that it thought were engineering-related, and I excluded them entirely from the list. Then I also removed the most common words to begin with. In the end, I was left with the list above, plus some internal project names. For instance, habitat and absurd, as well as some other internal code names, were heavily over-represented, and I had to remove those. As you can see, not entirely scientific. But every word on the resulting list with a high divergence from wordfreq also showed a spike on Google Trends. There might also be explanations other than LLM generation for what is going on, but I at least found it interesting that my coding session spikes also show up as spikes on Google Trends. The choice of words is one thing; the way in which LLMs form sentences is another. It’s not hard to spot LLM-generated text, but I’m increasingly worried that I’m starting to write like an LLM because I just read so much more LLM text. The first time I became aware of this was when I used the word “substrate” in a talk I gave earlier this year. I am not sure where I picked it up, but I really liked it for what I wanted to express and I did not want to use the word “foundation”. Since then, however, I have been reading this word everywhere. This, in itself, might be a case of the Baader–Meinhof phenomenon, but you can also see from the selection above that my coding agent loves substrate more than it should, and that Google Trends shows an increase. We have all been exposed to LLM-generated text by now, but I feel like this has been getting worse recently. A lot of the tweet replies I get and some of the Hacker News comments I see read like they are LLM-generated, including ones from people I know are real humans.
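The word-selection procedure described above can be sketched as follows. This is a hypothetical reconstruction, not the author's script. It scores each word by the log ratio of its observed share of the session text to a reference share, skips an exclusion set (engineering words, internal project names, or the most common words) along with rare words, and sorts the rest. With the wordfreq package installed, `lambda w: word_frequency(w, "en")` could serve as the reference. The function name `inflated_words` and the `min_count` threshold are illustrative assumptions.

```python
import math
import re
from collections import Counter


def inflated_words(text, reference_freq, exclude=frozenset(), min_count=5, top=20):
    """Rank words by how much more often they appear in `text` than expected.

    reference_freq(word) returns the word's expected share of all words
    (0.0 if unknown). `exclude` holds words to drop up front.
    Returns [(word, log2(observed share / expected share)), ...], highest first.
    """
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    total = sum(counts.values())
    scores = {}
    for word, n in counts.items():
        expected = reference_freq(word)
        if word in exclude or n < min_count or expected <= 0:
            continue  # skip excluded, rare, or unknown words
        scores[word] = math.log2((n / total) / expected)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

A log ratio keeps "10x more common than expected" comparable between frequent and infrequent words. The manual steps (dropping jargon and project names) fit naturally into `exclude` rather than into the scoring itself.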
It’s really messing with my brain because, on the one hand, I really want to tell people off for talking and writing like LLMs; on the other hand, maybe we all are increasingly actually writing and speaking like LLMs? I was listening to a talk recording recently (which I intentionally will not link) where the speaker used the same sentence structure that is over-represented in LLM-generated text. Yes, the speaker might have used an LLM to help him generate the talk, but at the same time, the talk sounded natural. So either it was super well-rehearsed, or it was natural. At least on Twitter, LinkedIn, and elsewhere, there is a huge desire among people to write content and be read. Shutting up is no longer an option and, as a result, people try to get reach and build their profile by engaging with anything that is popular or trending. In the same way that everybody has gazillions of Open Source projects all of a sudden, everybody has takes on everything. My inbox is a disaster of companies sending me AI-generated nonsense and I now routinely see AI-generated blog posts (or at least ones that look like they are AI-generated) being discussed in earnest on Hacker News and elsewhere. Genuine human discourse was already under pressure from social media algorithms, but now it has become incredibly toxic. As more and more people discover that they can use LLMs to optimize their following, they are entering an arms race with the algorithms, and genuine human signal is losing out quickly. There are entire companies now that just exist to automate sending LLM-generated shit and people evidently pay money for it. If the highest-quality content reliably won out, then the speed element would not matter. If a human-generated comment comes in 15 minutes after a clanker-generated one, but outperforms it by being better, then this whole LLM nonsense would show up less. But I think that LLM-generated noise actually performs really well.
We see this plenty with Open Source now. Someone builds an interesting project, puts it on GitHub, and within hours there are “remixes” and “reimplementations” of that codebase. Not only that, many of those forks come with sloppy marketing websites, paid-for domains, and a whole story on socials about why this is the path to take. I have complained before that Open Source is quickly deteriorating because people now see the opportunity to build products on top of useful Open Source projects, but the underlying mechanics are the same ones that produce so much LLM slop. Someone forms an opinion (hopefully) at lunch, and then has a clanker-made post 3 minutes later. It just does not take that much time to build it. For the tweets, I think it’s worse because I suspect that some people have scripts running to mostly automate the engagement. And surely, we should hate all of this. These low-effort posts, tweets, and Open Source projects should not make it anywhere. But they do! Whatever they play into, whether in the algorithms or with human engagement, they are not punished enough for how little effort goes into them. That increases in speed and ease of access can turn into problems is a long-understood issue. ID cards are a very unpopular thing in the UK because the British are suspicious of misuse of a central database after what happened in Nazi Germany. Likewise, the US has the Firearm Owners Protection Act of 1986, which bars the federal government from creating a central database of gun owners. The gun-tracing methodologies that result from not having such a database look like something out of a Wes Anderson movie. We have known for a long time that certain things should not be easy, because of the misuse that happens. We know it in engineering; we know it when it comes to governmental overreach. Now we are probably going to learn the same lesson in many more situations because LLMs make almost anything that involves human text much easier.
This is hitting existing text-based systems quickly. Take, for instance, the EU complaints system, which is now buckling under the pressure of AI. Or take any AI-adjacent project’s issue tracker. Pi is routinely getting AI-generated issue requests, sometimes even without the knowledge of the author. I know that’s a lot of complaining for “I am getting too many emails, shitty Twitter mentions, and GitHub issues.” I really think, though, that now that we know that it’s happening, we have to change how we interact with people who are increasingly automating themselves. Not only do they produce a lot of shitty slop that we all have to sit through; they are also influencing the world in much more insidious ways, in that they are influencing our interactions with each other. The moment I start distrusting people I otherwise trust because they have started picking up LLM phrasing, trust erodes all over society. You also can’t completely ban people for bad behavior, because some of this increasingly happens accidentally. You sending Polsia spam to me? You’re dead to me. You sending me an AI-generated issue request and following up with an apology five minutes later? Well, I guess mistakes happen. Yet, in many ways, what is going on and will continue to go on is unsettling. I recently talked with my friend Ben who said he forced someone to call him to continue a conversation because he was no longer convinced he was talking to a human. Not all of us have been exposed to the extreme cases of this yet, but I had a handful of interactions in which I questioned reality due to the behavior of the person on the other side. I struggle with this, and I consider myself to be pretty open to new technologies and AI in particular. But how will my children react to stuff like this? My mother? I have strong doubts that technology is going to solve this for us.
The reason I don’t think technology is going to solve this for us is that while it can hide some spam and label some generated text, it won’t fix us humans. What is being damaged here are social interactions across the board: the assumption that when someone writes to you, there is a person on the other side who has put some care into the interaction. I would rather have someone ghost me or reject me than send me back some AI-generated slop. Change has to start with awareness, and an unfortunate development is that LLMs don’t just influence the text we read; they influence the text we write, even when we don’t use them. Given the resulting ambiguity, we need to become more aware of how easily we can turn into energy vampires when we use agents to back us up in interactions with others. Consider that every time someone reads text coming from you, they will increasingly have to make a judgement call about whether it was you, an LLM, or you and an LLM together that produced it. Transparency in either direction, when there is ambiguity, can go a long way. When someone sends us undeclared slop, we need to change how we engage with them. If we care about them, we should tell them. If we don’t care about them, we should not give them visibility and not engage. When it comes to creating platforms and interfaces where text can be submitted, we need to throw more wrenches in. The fact that it was cheap for you to produce does not make it cheap for someone else to receive, and we need to find more creative ways to increase the backpressure. GitHub, or whatever wants to replace it, will have a lot to improve here, and some of that might go against its core KPIs. More engagement is increasingly the wrong thing to look at if you want a long-term healthy platform.
Whatever we can do to rate-limit social interactions is something we should try: more in-person meetings, more platforms where trust has to be earned, and maybe more acceptance that sometimes the right response is no response at all. As for AI assistance on this blog, I have had an AI transparency disclaimer for a while. In this particular blog post, I used Pi as an agent to help me generate the dynamic visualization, and I used the agent to write the code to analyze and scrape Google Trends.

iDiallo 1 month ago

Have You Seen the New Excel?

Stop coding. Stop hiring. Stop building. While the tech world obsesses over large language models and neural networks, I discovered the real disruptor that has been hiding in plain sight. Mine was originally installed on my desktop in 1992. And now, it's about to change everything in the world. We are talking about Microsoft Excel, of course. If you haven't looked at a spreadsheet lately, you are missing the most significant leap in enterprise capability since the invention of the corporation itself. We are entering an era of No-Code where the code was never needed in the first place. My own job as a software engineer is not safe, and I'm looking forward to the future. Developers from every walk of life are afraid, and for good reasons. You hear the complaints constantly: "How can I ensure the code works? I can't possibly review a PR with a thousand files. It's unmaintainable." This is a crisis of confidence in the software engineering sector. This specific anxiety has never existed in the Excel ecosystem. Code is called code for a reason; it is meant for the machine to read, not people. In Excel, we don't worry about "reviewing pull requests." We worry about results. The spreadsheet handles the logic and you handle the business outcome. It abstracts away the complexity so you don't have to pretend to understand it. And let's talk about the intimidation factor. Have you ever opened a modern codebase? It's a labyrinth of directories, dependencies, and config files. Where do you even start? It's paralyzing. How do you get started with Excel? You double-click an icon. It opens. It is a file. It is a grid. You type. It works. The barrier to entry is non-existent, yet the ceiling is infinite. If you are getting paid a high salary and are watching how efficient Excel is, you will be terrified. Companies are realizing they don't need distinct software solutions for distinct problems. They just need a grid.
We are seeing enterprises replace entire departments with a single file. That is not an exaggeration. The HR department? Replaced by an org chart linked to a payroll calculator. The supply chain team? Replaced by a real-time inventory tracker. The marketing department? Replaced by a pie chart and a mailing list. Why pay for Salesforce? A well-formatted sheet with conditional formatting is a Customer Relationship Manager (CRM). Who even knows how to write SQL? SQL is legacy. A workbook with 1 million rows is a database. Jira is redundant when you have Gantt charts generated from cell dependencies. On top of it all, it has AI. It comes equipped with Microsoft Copilot for 365 apps, not to be confused with Windows Copilot, Microsoft Copilot, Copilot for Teams, Copilot+, Copilot Chat, or Copilot Web. This is the Copilot. It sits inside your grid, ready to extrapolate trends from column D and write your VLOOKUPs for you. While other AI startups are fighting for funding rounds, this integration is already live, embedded directly into the tool that runs the global economy. You aren't hearing much about Venture Capital funding or Series A rounds when it comes to Excel. Why? Because it is already profitable. It doesn't need a roadmap to profitability because it is the roadmap. While other platforms burn cash to acquire users, Excel is the default operating system of business. It requires no adoption curve. It requires no evangelists. It requires only that you open it and have a Microsoft 365 apps subscription.
Total Vertical Integration
Excel is versatile. It is a text editor; you can write your novel in cell B2. It is a design tool; pixel-perfect layouts can be achieved by merging cells and removing gridlines. It is an IDE; you can write and execute VBA code directly within the environment. It handles the visual and the logical simultaneously. You can present a quarterly report to the board while the underlying formulas are calculating the ROI of the lunch break.
It creates a seamless workflow where the input and the output exist in the same plane.
Privacy, Scalability, and The Cloud
For the enterprise client, Excel offers the ultimate flexibility. Are you concerned about data sovereignty? Run your entire global operation locally on a ThinkPad from 2012. The file sits on your hard drive, unbreachable by the cloud. Do you need to scale? Push it to the cloud. Collaborate in real-time. Ten thousand employees can edit the same cell, creating a hive mind of productivity that traditional management structures cannot compete with. Oh, if you want to add support for crypto, just add a new worksheet. Batteries are included.
The Future is a Cell
The economy is shifting. We are moving away from specialized labor and toward generalized grid management. If your job involves inputting data, processing data, or presenting data, Excel has already automated you. It doesn't sleep, it doesn't ask for a raise, and it doesn't make calculation errors unless you tell it to. Best of all, it doesn't hallucinate. The grid is absolute, it is infinite and the grid is the future. Learn Excel now, or get left behind. That’s what AI Hype sounds like to my ears. Yes, it’s a great tool. But I don’t think we are all gonna die and lose our jobs, the same way we didn’t die or lose our jobs to Excel. None of these things are jokes about Excel, by the way; you can run entire companies from it. I'm tempted to just start hyping it every day until everyone gets annoyed.

Michael Lynch 2 months ago

Refactoring English: Month 15

Hi, I’m Michael. I’m a software developer and founder of small, indie tech businesses. I’m currently working on a book called Refactoring English: Effective Writing for Software Developers . Every month, I publish a retrospective like this one to share how things are going with my book and my professional life overall. At the start of each month, I declare what I’d like to accomplish. Here’s how I did against those goals: Visits and orders are down, but mainly because January was such an outlier due to “The Most Popular Blogs of Hacker News in 2025.” I got another lucky bump from the HN moderators putting “My Eighth Year as a Bootstrapped Founder” on the front page. I mentioned in January that I added regional pricing for my book. I wasn’t tracking data carefully, but just based on order notifications, it seemed like most of my orders were coming from countries outside the US, so I took a closer look at the data. The first question was: is it really true that the majority of orders use regional pricing now? It’s true. The majority of Refactoring English customers are now outside of the US. The US accounts for only 28% of orders by volume and 40% by revenue. I was also surprised to see how many customers purchase from countries like India and Brazil, where English is not the primary language, so I checked English vs. non-English primary countries: Surprisingly, the majority of orders for Refactoring English come from countries where English is not the primary language, though English-speaking countries are a small majority revenue-wise. Next question: Do readers from certain countries purchase at a higher rate than others relative to total website visitors? Wow! One out of every six readers in Kazakhstan purchases the book! I need to start advertising in Kazakhstan. Okay, the extreme Kazakhstan result is based on a single customer, so that’s probably an outlier. And I bet my website analytics undercount visitors from Kazakhstan. 
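Computing the per-country rates in this section amounts to dividing orders and revenue by website visitors. Here is a minimal sketch. The function name, the dict-per-metric input layout, and the `min_orders` cutoff for flagging tiny samples are assumptions, not the author's actual scripts, and any numbers you feed it would come from your own analytics and order data.

```python
def country_stats(visitors, orders, revenue, min_orders=3):
    """Per-country conversion rate and revenue per visitor.

    visitors, orders, revenue: dicts keyed by country.
    Countries with fewer than `min_orders` orders are flagged as low-sample,
    since one sale from a handful of visitors produces an extreme but
    unreliable rate.
    """
    stats = {}
    for country, n_visitors in visitors.items():
        if n_visitors == 0:
            continue  # avoid dividing by zero for countries with no tracked visits
        n_orders = orders.get(country, 0)
        stats[country] = {
            "conversion": n_orders / n_visitors,
            "revenue_per_visitor": revenue.get(country, 0.0) / n_visitors,
            "low_sample": n_orders < min_orders,
        }
    return stats
```

Sorting by `conversion` while skipping entries marked `low_sample` would keep a single sale from a country with very few recorded visitors (the Kazakhstan case) from topping the list.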
What if I focus on the top countries based on website visitors? The US is my top country for website visitors, but a relatively low share (0.5%) purchase the book. Indian readers purchase at the highest rate, with 2.5% of website visitors purchasing the book. Canadian readers purchase the most by revenue, with every Canadian reader giving me about $0.47 in additional book sales. Clearly, I need to start pandering more to India and Canada in the book. I could change all the Docker examples to cricket examples and look for more opportunities to praise Shopify. After the US, most website visitors come from China (5.9% of total), but I’ve had zero sales in China. At first, I thought buying ebooks was not so popular in China, but I just checked what regional discount I was offering in China and was surprised to find it was zero. I wasn’t offering a regional discount in China at all. I made two mistakes in my price generation scripts that excluded a huge number of countries: The local currency thing is silly in retrospect because I can still offer a discount and just accept payment in USD. And I’m not sure how I ended up missing so many Stripe-supported countries. I even missed Kazakhstan, my new favorite country! I was only offering regional discounts in about 39 countries. After my fixes, the list grew to 156. And within 12 hours, I got a new order from Kazakhstan. With the majority of Refactoring English readers coming from countries where English is a second language, should I adjust the book to better serve non-native speakers? A few readers have asked about English tips for non-native speakers. I’d like to tackle the subject, but I have no experience writing as a non-native speaker. I want everything in the book to be techniques I personally use rather than things I’ve heard secondhand . My best idea is to find editing clients who are non-native speakers and look for patterns in their writing to include in the book. But right now, I’d like to get the v1 finished. 
The beauty of an ebook is that you can keep iterating on it and find ways to improve it even after official release. I’ve been using AI for software development for about a year and a half, but there have been two major inflection points: Since December, I’ve been spending more and more time doing AI-assisted coding. It’s become an ever-increasing part of my workday and non-work time. I used to have a bad habit of checking email and social media excessively. During the past month, I’ve repeatedly had the experience of noticing that it’s 4pm, but I haven’t checked email or social media. Except it’s because I’ve fallen into an AI vortex and forgot everything else. Every month, I think, “Is this a problem?” And in the past few weeks, I’ve had to face the fact that, yes, it’s a problem. I generally start each workday by writing a schedule on a little notepad on my desk. I break the day into 30-minute blocks and write down how I’ll spend that block. Historically, I stick to the schedule when I’m disciplined. When I have less will power, I let fun tasks exceed their budgets by a block or two. With AI-assisted coding, I was getting to the point where I’d make a schedule and then completely ignore it and play with AI all day. I wouldn’t say that I have an “addiction” to AI in the way people develop addictions to drugs or alcohol, but I am letting AI-assisted coding distract me from work that I recognize is more important, like finishing my book. There are a few factors that make AI especially compelling and easy for me to get sucked into: I feel like I can integrate any technology, write in any programming language, install any tool. There used to be an annoying level of friction in using any new software, but now I can mostly just hand it to AI and ask it to figure out how to install it or debug it, and it just works. In the 90s, Bill Gates published a book called Business @ the Speed of Thought . 
I’ve never read it, but I keep thinking back to that book title as I use AI. It’s not literally at the speed of thought, but it’s closer than anything I ever imagined. I can have an idea for a feature, give a brief explanation to an AI agent, and see the feature materialize in minutes. Even before AI, I’d often intend to spend an hour coding and instead spent three. But there were natural limits to how long I could code. A few hours of intense dev work fries my brain, and work becomes unpleasant, unproductive, or both. With AI, you can build for hours without doing any deep thought. And even when something does require thought, AI makes it easier than ever to take on tech debt. When I’m coding myself, I don’t want to do something the ugly way because then I’m the one who has to maintain that hack. But if I’m making AI do everything, I don’t feel the pain of hacky, ugly code. One of the things that makes gambling addictive is variable rewards . Our brains are more captivated by a system that gives you $10 at random intervals than one that delivers you money on a fixed, predictable schedule. Whether intentional or not, my experience with AI agents varies wildly. Sometimes, I point it at a 2,000 line log file and it diagnoses the issue before I’ve even asked a question. Other times, I give it a simple task, and it spends the next 20 minutes aimlessly roaming my codebase. Because I don’t know if the wait will be 5 seconds or 20 minutes, I sit there staring at the agent for a minute, then compulsively check it every few minutes, then start some other AI task while I’m waiting. And then I’m cycling between multiple agents and don’t even remember what they’re all doing. One of the most maddening experiences I have with AI is when I’ve set up the AI agent to complete a long task, and I come back hours later to find the AI paused its work a few minutes after I left and asked, “Okay, the next step is to try a full build, but that will take 30-60 minutes. 
Would you like me to continue?” Yes! That’s why I left the task to you! It’s hard to predict exactly what effect AI will have on the software industry, but I feel confident that it will completely upend the ecosystem. We’re in the early stages of a massive shake-up. Depending on how things turn out, there are paths forward for me as a software developer, but I also think there’s at least a 20% chance that we’re in the last year or two of “software developer” being a job that requires any special knowledge or skill. It could be like what happened to elevator operators . Right now, there are a few factors that make AI-assisted development especially attractive for developers in my position: The current situation with AI can’t last. The AI bubble could burst, and I’ll have to start paying the non-subsidized, metered rate. Or AI will continue to improve to the point where I have no advantage over junior engineers or even people with no software experience. I’ve found a few techniques for getting my AI usage back to a manageable place: It turns out that most of Refactoring English ’s readers come from outside the US. I’m using AI-assisted coding too much. Result : Published “Why Improve Your Writing?” and “Improve Your Grammar Incrementally” Result : Scheduled a discussion about design reviews I only included countries where Stripe supports the local currency. Even with this filter, I accidentally omitted a lot of countries where Stripe supports the local currency. In February 2025, I started using an integrated AI agent in my code editor In December 2025, I started running AI agents with full permissions (within isolated environments) AI is helpful for junior engineers, but senior engineers are the ones who can use it best There are multiple AI companies competing heavily on price and using VC money to subsidize costs. I use flat-rate plans, but I consume the equivalent of about $4k/month in API costs, and even those rates are probably VC-subsidized. 
Don’t start the day with an AI project. If I start with AI and then work on my book, then I’m switching from an exciting, easy task to a hard, unsexy task. If I instead start the day with an hour of writing, I’ve done my hard task for the day and don’t have to move uphill. This is challenging because I often set up long AI tasks overnight, and I’m always curious in the morning to see how they turned out.
Reduce parallel AI-driven projects. Parallel work feels appealing because I can cycle between agents. In practice, I find it sucks me in too much because there’s a spinning-plates mentality of some agent always needing attention.
Published two new book chapters
Published “Eversource EV Rebate Program Exposed Massachusetts Customer Data” and complained to the MA Department of Public Utilities
Don’t start the day with an AI coding project. It’s too distracting and too hard to switch to something harder but more important.
Finish Refactoring English. It won’t be fully polished and edited, but I want to complete all the chapters.
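The per-country comparison in this retrospective comes down to two ratios for each country: the share of visitors who buy and the revenue per visitor. Below is a minimal Python sketch that assumes a table of visitors, orders, and revenue per country. The `CountryStats` type, the `per_country_metrics` function, the `min_orders` threshold, and every number are hypothetical. None of them are the author's data or scripts.

```python
from dataclasses import dataclass


@dataclass
class CountryStats:
    country: str
    visitors: int
    orders: int
    revenue_usd: float


def per_country_metrics(rows, min_orders=5):
    """Return (country, conversion_rate, revenue_per_visitor, enough_orders)
    tuples, sorted by conversion rate, highest first."""
    results = []
    for row in rows:
        if row.visitors == 0:
            continue  # no visitor data, so both ratios are undefined
        conversion = row.orders / row.visitors
        revenue_per_visitor = row.revenue_usd / row.visitors
        enough_orders = row.orders >= min_orders  # guard against one-sale outliers
        results.append((row.country, conversion, revenue_per_visitor, enough_orders))
    return sorted(results, key=lambda r: r[1], reverse=True)


# Hypothetical numbers for made-up countries A, B, C (illustration only)
rows = [
    CountryStats("A", visitors=6, orders=1, revenue_usd=8.0),
    CountryStats("B", visitors=4000, orders=20, revenue_usd=600.0),
    CountryStats("C", visitors=1000, orders=25, revenue_usd=300.0),
]

for country, conv, rpv, enough in per_country_metrics(rows):
    note = "" if enough else "  (too few orders to trust)"
    print(f"{country}: {conv:.1%} of visitors buy, ${rpv:.2f} per visitor{note}")
```

The `min_orders` flag is one way to encode the caveat about the Kazakhstan result: a one-in-six conversion rate built on a single order says very little.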

0 views

Human Intuition, AI Formalization: A Real Analysis Case Study

Disclaimer - I wrote the core ideas; Claude helped flesh out and polish the article. See appendix for more on this. This is a follow-up to my previous post on leaning on Claude for Lean. I’ve now worked up to chapter 8.3 in Tao’s companion. The speed is great and Claude’s capabilities continue to impress (autoformalization is possible, but not my goal). I haven’t been stuck on anything so far. I’ve also upstreamed fixes for many typos to the companion repo.

Manuel Moreale 3 months ago

Step aside, phone: week 2

I’m halfway through this enjoyable life experiment, and overall, I’m very pleased with the results. As I mentioned last week, I was expecting week two usage to be a bit higher compared to week one, where I went full phone-rejection mode, but I’m still pleased with how low my usage was, even though it felt like I was using the phone a lot. There were no huge spikes this week, and I didn’t need to use Google Maps much, so the time distribution is a lot more even, as you can see. The first three days of the week were pretty similar to the previous week. I moved my chats back to the phone, and that’s most of the time spent on screen, since “social” is just the combination of Telegram, WhatsApp, and iMessage. Usage went up a bit in the second part of the week, but I consider that a “healthy” use of the phone. On Thursday, I spent 20 or so minutes setting up an app, one that I’d categorise as a life utility app, like banking or insurance apps. They do have a site, but you’re required to use the phone anyway to take pictures and other crap, so it was faster to do it on the phone. Then on Saturday, I had to use Maps as well as AllTrails to find a place out in the wild. I was trying to find a bunker that’s hidden somewhere in a forest not too far from where I live (this is a story for another time), and that’s why screen time was a bit higher than normal on that particular day. Overall, I’m very happy with how the week went. A thing I’m particularly pleased with is that I haven’t consumed a single piece of media on my phone since we started this experiment. So far, I have only opened the browser a couple of times, and it was always to look up something very specific, never to mindlessly scroll through news, videos, or anything like that. My content consumption on the phone is down to essentially zero. One fun side effect of this experiment is how infrequently I now charge my phone.
I took this screenshot this morning before plugging it in, and apparently, the last time it was fully charged was Wednesday afternoon. I’m now charging it once every 3 or 4 days, which is pretty neat.

Ankur Sethi 3 months ago

I used a local LLM to analyze my journal entries

In 2025, I wrote 162 journal entries totaling 193,761 words. In December, as the year came to a close and I found myself in a reflective mood, I wondered if I could use an LLM to comb through these entries and extract useful insights. I’d had good luck extracting structured data from web pages using Claude, so I knew this was a task LLMs were good at. But there was a problem: I write about sensitive topics in my journal entries, and I don’t want to share them with the big LLM providers. Most of them have at least a thirty-day data retention policy, even if you call their models using their APIs, and that makes me uncomfortable. Worse, all of them have safety and abuse detection systems that get triggered if you talk about certain mental health issues. This can lead to account bans or human review of your conversations. I didn’t want my account to get banned, and the very idea of a stranger across the world reading my journal mortifies me. So I decided to use a local LLM running on my MacBook for this experiment. Writing the code was surprisingly easy. It took me a few evenings of work (and a lot of yelling at Claude Code) to build a pipeline of Python scripts that would extract structured JSON from my journal entries. I then turned that data into boring-but-serviceable visualizations. This was a fun side project, but the data I extracted didn’t quite lead me to any new insights. That’s why I consider this a failed experiment. The output of my pipeline only confirmed what I already knew about my year. Besides, I didn’t have the hardware to run the larger models, so some of the more interesting analyses I wanted to run were plagued with hallucinations. Despite how it turned out, I’m writing about this experiment because I want to try it again in December 2026, and I’m hoping not to repeat my mistakes. Selfishly, I’m also hoping that somebody who knows how to use LLMs for data extraction tasks will find this article and suggest improvements to my workflow.
I’ve pushed my data extraction and visualization scripts to GitHub. It’s mostly LLM-generated slop, but it works. The most interesting and useful parts are probably the prompts. Now let’s look at some graphs. I ran 12 different analyses on my journal, but I’m only including the output from 6 of them here. Most of the others produced nonsensical results or were difficult to visualize. For privacy, I’m not using any real names in these graphs.
[Graph: how I divided time between my hobbies through the year]
[Graph: my most mentioned hobbies]
[Graph: media I engaged with; there isn’t a lot of data for this one]
[Graph: how many mental health issues I complained about each day across the year]
[Graph: how many physical health issues I complained about each day across the year]
[Graph: the big events of 2025]
[Graph: the communities I spent most of my time with]
[Graph: top mentioned people throughout the year]
I ran all these analyses on my MacBook Pro with an M4 Pro and 48GB RAM. This hardware can just barely manage to run some of the more useful open-weights models, as long as I don’t run anything else. For running the models, I used Apple’s package. Picking a model took me longer than putting together the data extraction scripts. People on /r/LocalLlama had a lot of strong opinions, but there was no clear “best” model when I ran this experiment. I just had to try out a bunch of them and evaluate their outputs myself. If I had more time and faster hardware, I might have looked into building a small-scale LLM eval for this task. But for this scenario, I picked a few popular models, ran them on a subset of my journal entries, and picked one based on vibes. This project finally gave me an excuse to learn all the technical terms around LLMs. What’s quantization? What does the number of parameters do? What does it mean when a model has , , , or in its name? What is a reasoning model? What’s MoE? What are active parameters? This was fun, even if my knowledge will be obsolete in six months.
In the beginning, I ran all my scripts with Qwen 2.5 Instruct 32b at 8-bit quantization as the model. This fit in my RAM with just enough room left over for a browser, text editor, and terminal. But Qwen 2.5 didn’t produce the best output and hallucinated quite a bit, so I ran my final analyses using Llama-3.3 70B Instruct at 3-bit quantization. This could just about fit in my RAM if I quit every other app and increased the amount of GPU RAM a process was allowed to use. While quickly iterating on my Python code, I used a tiny model: Qwen 3 4b Instruct quantized to 4 bits. A major reason this experiment didn’t yield useful insights was that I didn’t know what questions to ask the LLM. I couldn’t do a qualitative analysis of my writing (the kind of analysis a therapist might be able to do) because I’m not a trained psychologist. Even if I could figure out the right prompts, I wouldn’t want to do this kind of work with an LLM. The potential for harm is too great, and the cost of mistakes is too high. With a few exceptions, I limited myself to extracting quantitative data only. From each journal entry, I extracted the information listed at the end of this post. None of the models was as accurate as I had hoped at extracting this data. In many cases, I noticed hallucinations and examples from my system prompt leaking into the output, which I had to clean up afterwards. Qwen 2.5 was particularly susceptible to this. Some of the analyses (e.g. list of new people I met) produced nonsensical results, but that wasn’t really the fault of the models. They were all operating on a single journal entry at a time, so they had no sense of the larger context of my life. I couldn’t run all my journal entries through the LLM at once. I didn’t have that kind of RAM, and the models didn’t have that kind of context window. I had to run the analysis one journal entry at a time.
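Here is a rough Python sketch of a per-entry loop that handles the cleanup this section describes: resuming after failed runs, stripping the occasional Markdown fence, and dropping values that came from prompt examples. It is not the author's pipeline. `ask_model` is a hypothetical stand-in for whatever calls the local model, the example-value list is hypothetical, and each analysis is assumed to return a JSON list of strings.

```python
import json
import re
import tempfile
from pathlib import Path

# A reply wrapped in a Markdown code fence, optionally tagged "json"
FENCE_RE = re.compile(r"^\s*`{3}(?:json)?\s*\n(.*?)\n\s*`{3}\s*$", re.DOTALL | re.IGNORECASE)

# Hypothetical: deliberately unusual values that appear only in prompt examples
PROMPT_EXAMPLE_VALUES = {"dinner with sarah"}


def parse_reply(reply: str) -> list:
    """Parse a JSON list from a model reply, tolerating a surrounding fence."""
    match = FENCE_RE.match(reply)
    if match:
        reply = match.group(1)
    return json.loads(reply)


def drop_leaked_examples(items: list) -> list:
    """Remove values that can only have come from the prompt's examples."""
    return [i for i in items if i.strip().lower() not in PROMPT_EXAMPLE_VALUES]


def run_analysis(entries: dict, ask_model, out_dir) -> list:
    """Process entries one at a time, skipping any with a saved result,
    so a crashed or partial run can simply be restarted."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for entry_id, text in entries.items():
        out_path = out_dir / f"{entry_id}.json"
        if out_path.exists():
            continue  # already done in an earlier run
        try:
            items = drop_leaked_examples(parse_reply(ask_model(text)))
        except Exception as exc:  # e.g. bad JSON, or the model running out of memory
            failed.append((entry_id, repr(exc)))
            continue
        out_path.write_text(json.dumps(items))
    return failed


def fake_model(text):
    """Hypothetical stand-in for the local LLM call."""
    if "crash" in text:
        raise MemoryError("entry too large")
    fence = "`" * 3
    return fence + 'json\n["fatigue", "dinner with Sarah"]\n' + fence


work_dir = tempfile.mkdtemp()
failed = run_analysis({"day1": "tired", "day2": "crash"}, fake_model, work_dir)
print(failed)
```

Finished entries are skipped, so rerunning the script after a crash picks up where the last run stopped. Entries that failed have no output file, so the next run retries them.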
Even then, my computer choked on some of the larger entries, and I had to write my scripts in a way that I could run partial analyses or continue failed analyses. Trying to extract all of this information in one pass produced low-quality output. I had to split my analysis into multiple prompts and run them one at a time. Surprisingly, none of the models I tried had an issue with the instruction. Even the really tiny models had no problems following the instruction. Some of them occasionally threw in a Markdown fenced code block, but it was easy enough to strip using a regex. My prompts were divided into two parts (listed at the end of this post). The task-specific prompts included detailed instructions and examples that made the structure of the JSON output clear. Every model followed the JSON schema mentioned in the prompt, and I rarely ran into JSON parsing issues. But the one issue I never managed to fix was the examples from the prompts leaking into the extracted output. Every model insisted that I had “dinner with Sarah” several times last year, even though I don’t know anybody by that name. This name came from an example that formed part of one of my prompts. I just had to make sure the examples I used stood out (e.g., using names of people I didn’t know at all or movies I hadn’t watched) so I could filter them out using plain old Python code afterwards. Here’s what my prompt looked like: To this prompt, I appended task-specific prompts. Here’s the prompt for extracting health issues mentioned in an entry: You can find all the prompts in the GitHub repository. The collected output from all the entries looked something like this: Since my model could only look at one journal entry at a time, it would sometimes refer to the same health issue, gratitude item, location, or travel destination using different synonyms. For example, “exhaustion” and “fatigue” should refer to the same health issue, but they would appear in the output as two different issues.
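The synonym problem just described calls for a normalization step: gather every distinct term an analysis produced, ask for a mapping from each term to a canonical name, then rewrite the per-entry results. Below is a minimal Python sketch of the bookkeeping around that LLM call. It is not the author's code. The function names, the prompt wording, and the sample reply are hypothetical, and the model call itself is left out.

```python
import json
from collections import Counter


def unique_terms(per_entry_terms):
    """All distinct terms one analysis produced, across every entry."""
    return sorted({term for terms in per_entry_terms for term in terms})


def build_normalization_prompt(terms):
    """Hypothetical prompt text; the post's real prompt is not reproduced here."""
    return (
        "These terms were extracted from separate journal entries. Return a JSON "
        "object mapping each term to one canonical name, merging synonyms:\n"
        + "\n".join(terms)
    )


def apply_mapping(per_entry_terms, mapping):
    """Rewrite each entry's terms to canonical names; unmapped terms pass through."""
    return [sorted({mapping.get(term, term) for term in terms}) for terms in per_entry_terms]


entries = [["exhaustion", "headache"], ["fatigue"], ["tiredness", "headache"]]
prompt = build_normalization_prompt(unique_terms(entries))
# A reply of the kind the LLM pass might return (made up for illustration)
reply = '{"exhaustion": "fatigue", "tiredness": "fatigue", "fatigue": "fatigue", "headache": "headache"}'
mapping = json.loads(reply)
normalized = apply_mapping(entries, mapping)
counts = Counter(term for terms in normalized for term in terms)
print(counts.most_common())
```

Each entry's terms are collapsed into a set, so an entry that mentions both "exhaustion" and "fatigue" counts once. That is usually what a per-day tally wants.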
My first attempt at de-duplicating these synonyms was to keep a running tally of unique terms discovered during each analysis and append them to the end of the prompt for each subsequent entry. Something like this: But this quickly led to some really strange hallucinations. I still don’t understand why. This list of terms wasn’t even that long, maybe 15-20 unique terms for each analysis. My second attempt at solving this was a separate normalization pass for each analysis. After an analysis finished running, I extracted a unique list of terms from its output file and collected them into a prompt. Then I asked the LLM to produce a mapping to de-duplicate the terms. This is what the prompt looked like: There were better ways to do this than using an LLM. But you know what happens when all you have is a hammer? Yep, exactly. The normalization step was inefficient, but it did its job. This was the last piece of the puzzle. With all the extraction scripts and their normalization passes working correctly, I left my MacBook running the pipeline of scripts all day. I’ve never seen an M-series MacBook get this hot. I was worried that I’d damage my hardware somehow, but it all worked out fine. There was nothing special about this step. I just decided on a list of visualizations for the data I’d extracted, then asked Claude to write some code to generate them for me. Tweak, rinse, repeat until done. I’m underwhelmed by the results of this experiment. I didn’t quite learn anything new or interesting from the output, at least nothing I didn’t already know. This was only partly because of LLM limitations. I believe I didn’t quite know what questions to ask in the first place. What was I hoping to discover? What kinds of patterns was I looking for? What was the goal of the experiment besides producing pretty graphs?
I went into the project with a cool new piece of tech to try out, but skipped the important up-front human-powered thinking work required to extract good insights from data. I neglected to sit down and design a set of initial questions I wanted to answer and assumptions I wanted to test before writing the code. It just goes to show that no amount of generative AI magic will produce good results unless you can define what success looks like. Maybe this year I’ll learn more about data analysis and visualization and run this experiment again in December to see if I can go any further. I did learn one thing from all of this: if you have access to state-of-the-art language models and know the right set of questions to ask, you can process your unstructured data to find needles in some truly massive haystacks. This allows you to analyze datasets that would take human reviewers months to comb through. A great example is how the NYT monitors hundreds of podcasts every day using LLMs. For now, I’m putting a pin in this experiment. Let’s try again in December.
Information extracted from each journal entry:
List of things I was grateful for, if any
List of hobbies or side-projects mentioned
List of locations mentioned
List of media mentioned (including books, movies, games, or music)
A boolean answer to whether it was a good or bad day for my mental health
List of mental health issues mentioned, if any
A boolean answer to whether it was a good or bad day for my physical health
List of physical health issues mentioned, if any
List of things I was proud of, if any
List of social activities mentioned
Travel destinations mentioned, if any
List of friends, family members, or acquaintances mentioned
List of new people I met that day, if any
The two parts of each prompt:
A “core” prompt that was common across analyses
Task-specific prompts for each analysis

James Stanley 3 months ago

Evidence of absence

"Absence of evidence is not evidence of absence", they say. They're wrong. In this post we'll work through a scenario and show that absence of evidence is in fact evidence of absence. You can see Yudkowsky for more on this topic. You have a box with 100 bags in it. Each bag has 100 balls in it. 99 of the bags have 100 white balls; the final bag has 99 white balls and 1 black ball. (Assume all of the bags are indistinguishable from the outside, all of the balls are the same size and weight etc., and the black ball is at a random position within its bag.) What is the probability that a bag, selected uniformly at random, contains the black ball? You'll give your first probability estimate before the bag is opened, and then balls will be removed from the bag one by one and you can revise your estimate after seeing each ball. What is your strategy? Your lab assistant rummages around the box and selects a bag uniformly at random. You should agree that this bag contains the black ball with 1% probability, because you know that 1 in 100 bags contains the black ball. Your lab assistant takes out the first 99 balls. You observe that they're all white. This observation is merely an absence of evidence of the black ball, and, so they say, it is therefore not evidence of absence of the black ball, and your probability estimate should be unchanged, at a 1% chance that the bag contains the black ball. But while the 99 bags that only contain white balls would provide this observation every time, the 1 bag that contains the black ball would provide this observation only 1 time out of 100. So you ought to agree that the probability that the bag contains the black ball is now only about 0.01% (exactly 1 in 9,901), not 1%, and we see that the ongoing "absence of evidence" of the black ball was in fact evidence of its absence after all. Your lab assistant pulls out the final ball. It's white.

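The update in this post is Bayes' rule. Suppose k balls have come out and all were white. An all-white bag produces that observation with probability 1. The black-ball bag produces it with probability (100 − k)/100, the chance that the black ball is not among the first k. So the posterior is (100 − k)/(100 − k + 9900). Here is a minimal Python sketch using exact fractions; the function name is just illustrative.

```python
from fractions import Fraction

BAGS = 100
BALLS_PER_BAG = 100


def p_black_bag(white_seen: int) -> Fraction:
    """Posterior probability that the chosen bag holds the black ball,
    after `white_seen` balls have been drawn and all were white."""
    prior_black = Fraction(1, BAGS)
    prior_white_only = 1 - prior_black
    # The black ball sits at a uniformly random position, so the chance it is
    # not among the first k balls drawn is (100 - k) / 100.
    like_black = Fraction(BALLS_PER_BAG - white_seen, BALLS_PER_BAG)
    like_white_only = Fraction(1)  # an all-white bag always shows white
    evidence = prior_black * like_black + prior_white_only * like_white_only
    return prior_black * like_black / evidence


for k in (0, 50, 99, 100):
    p = p_black_bag(k)
    print(k, p, f"{float(p):.4%}")
```

At k = 99 this gives exactly 1/9901, the "about 0.01%" in the text. At k = 100 it drops to zero.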
Gabriel Weinberg 3 months ago

Simulating likely 2026 World Cup matchups (for all matches)

I’ve been using Cursor for coding for some time, but I finally gave Claude Code a try for this short side project: simulating the 2026 World Cup bracket to predict likely matchups for all matches, which is useful when considering which matches to potentially go to. Methodology:
Start with the official World Cup tournament schedule (including yet-to-be-played playoff matches)
Blend Elo rankings with FIFA rankings (50/50)
Use the Elo formulas to probabilistically predict winners (assuming no draws, even in group stage)
Run one million individual simulations of the full tournament (it reaches diminishing returns around 50K, but hey, why not!)
Run again with a home field advantage boost (+180 Elo) for the U.S., Canada, and Mexico based on prior World Cup outcomes
Count up who participated in each match
Some interesting findings (at least to me as a U.S. fan) are below, followed by a rundown for every match (in reverse order). Big Disclaimer 1: The above is of course a gross simplification of the actual tournament. For example, it doesn’t take into account team matchup histories, game models, etc. etc. I do think, however, it is useful enough for the intended purpose of generally predicting likely match participants. Big Disclaimer 2: I did a lot of output validation, so I think the results are largely accurate (to the extent they can be, given Big Disclaimer 1). However, I didn’t write or review every line of code, so it is likely there are still some bugs in there. If you think you see anything that seems off, let me know and I’ll try to track it down (and update anything if necessary). Aside on Claude Code: Like many others, I found this process both productive and frustrating. It was definitely faster than I could have done it alone, but Claude kept forgetting basic context and was way overconfident in the accuracy of the results. That is, many rounds of validation at every stage of output were absolutely necessary despite Claude saying things were good.
I couldn’t trust its word at all.
HA+ = with home field advantage (anytime this comes into play there is a + next to the team name)
HA- = without home field advantage
Here’s a visualization of the above made by a reader (thanks!)
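The core of the methodology is the standard Elo expected-score formula, 1 / (1 + 10^((R_b − R_a)/400)), read as a win probability with no draws. Below is a minimal Python sketch applied to a Monte Carlo run of a small knockout bracket. This is not the author's code, and several pieces are assumptions: the four teams and their ratings are made up, the 50/50 blend assumes Elo and FIFA numbers sit on a comparable scale, and only a hypothetical four-team knockout is simulated rather than the full 48-team tournament.

```python
import random
from collections import Counter


def blended_rating(elo: float, fifa_points: float) -> float:
    """50/50 blend; assumes both numbers are on a comparable scale."""
    return 0.5 * elo + 0.5 * fifa_points


def win_prob(r_a: float, r_b: float) -> float:
    """Elo expected score of A vs B, read as a win probability (no draws)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def play(team_a, team_b, ratings, hosts, rng, home_boost=180.0):
    """Simulate one match and return the winner."""
    r_a = ratings[team_a] + (home_boost if team_a in hosts else 0.0)
    r_b = ratings[team_b] + (home_boost if team_b in hosts else 0.0)
    return team_a if rng.random() < win_prob(r_a, r_b) else team_b


def simulate_finals(semis, ratings, n=50_000, hosts=frozenset(), seed=0):
    """Count which pairing reaches the final of a four-team knockout."""
    rng = random.Random(seed)
    finals = Counter()
    for _ in range(n):
        finalist_1 = play(*semis[0], ratings, hosts, rng)
        finalist_2 = play(*semis[1], ratings, hosts, rng)
        finals[(finalist_1, finalist_2)] += 1
    return finals


# Hypothetical teams and (Elo, FIFA points) ratings, for illustration only
ratings = {
    "A": blended_rating(2000, 1900),
    "B": blended_rating(1800, 1750),
    "C": blended_rating(1900, 1800),
    "D": blended_rating(1700, 1700),
}
semis = [("A", "B"), ("C", "D")]
n = 50_000
without_ha = simulate_finals(semis, ratings, n)
with_ha = simulate_finals(semis, ratings, n, hosts={"B"})
for pair, count in without_ha.most_common():
    print(pair, f"{count / n:.1%}")
```

Passing `hosts` reruns the same bracket with the +180 boost for the host teams, which mirrors the HA+ / HA- comparison above. Counting participants per match is just a `Counter` over simulated outcomes.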

Sean Goedecke 4 months ago

How does AI impact skill formation?

Two days ago, the Anthropic Fellows program released a paper called How AI Impacts Skill Formation. Like other papers on AI before it, this one is being treated as proof that AI makes you slower and dumber. Does it prove that? The structure of the paper is sort of similar to the 2025 MIT study Your Brain on ChatGPT. They got a group of people to perform a cognitive task that required learning a new skill: in this case, the Python Trio library. Half of those people were required to use AI and half were forbidden from using it. The researchers then quizzed those people to see how much information they retained about Trio. The banner result was that AI users did not complete the task faster, but performed much worse on the quiz. If you were so inclined, you could naturally conclude that any perceived AI speedup is illusory, and the people who are using AI tooling are cooking their brains. But I don’t think that conclusion is reasonable. To see why, let’s look at Figure 13 from the paper: The researchers noticed that half of the AI-using cohort spent most of their time literally retyping the AI-generated code into their solution, instead of copy-pasting or “manual coding” (writing their code from scratch with light AI guidance). If you ignore the people who spent most of their time retyping, the AI users were 25% faster. I confess that this kind of baffles me. What kind of person manually retypes AI-generated code? Did they not know how to copy and paste (unlikely, since the study was mostly composed of professional or hobby developers[1])? It certainly didn’t help them on the quiz score. The retypers got the same (low) scores as the pure copy-pasters. In any case, if you know how to copy-paste or use an AI agent, I wouldn’t use this paper as evidence that AI will not be able to speed you up. Even if AI use offers a 25% speedup, is that worth sacrificing the opportunity to learn new skills? What about the quiz scores?
Well, first we should note that the AI users who used the AI for general questions but wrote all their own code did fine on the quiz. If you look at Figure 13 above, you can see that those AI users averaged maybe a point lower on the quiz, which is not bad for people working 25% faster. So at least some kinds of AI use seem fine. But of course much current AI use is not like this: if you’re using Claude Code or Copilot agent mode, you’re getting the AI to do the code writing for you. Are you losing key skills by doing that? Well yes, of course you are. If you complete a task in ten minutes by throwing it at an LLM, you will learn much less about the codebase than if you’d spent an hour doing it by hand. I think it’s pretty silly to deny this: it’s intuitively right, and anybody who has used AI agents extensively at work can attest to it from their own experience. Still, I have two points to make about this. First, software engineers are not paid to learn about the codebase. We are paid to deliver business value (typically by delivering working code). If AI can speed that up dramatically, avoiding it makes you worse at your job, even if you’re learning more efficiently. That’s a bit unfortunate for us (it was very nice when we could get much better at the job simply by doing it more), but that doesn’t make it false. Other professions have been dealing with this forever. Doctors are expected to spend a lot of time in classes and professional development courses, learning how to do their job in other ways than just doing it. It may be that future software engineers will need to spend 20% of their time manually studying their codebases: not just in the course of doing some task (which could be far more quickly done by AI agents) but just to stay up-to-date enough that their skills don’t atrophy. The other point I wanted to make is that even if your learning rate is slower, moving faster means you may learn more overall.
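The volume argument can be written as a back-of-the-envelope formula. It assumes that learning adds up linearly across tasks, a simplification the post does not make explicitly. Let r be the learning per task relative to working by hand (so r_hand = 1), and let N be the number of tasks completed. Then total learning is rN, and with the post's illustrative 75% figure:

```latex
L = r \cdot N, \qquad
r_{\text{AI}}\,N_{\text{AI}} \ge r_{\text{hand}}\,N_{\text{hand}}
\iff \frac{N_{\text{AI}}}{N_{\text{hand}}} \ge \frac{r_{\text{hand}}}{r_{\text{AI}}} = \frac{1}{0.75} \approx 1.33
```

Under this model, an engineer learning 75% as much per task breaks even at about 1.33 times as many tasks. At twice as many tasks, they end up with 0.75 × 2 = 1.5 times the total. Real learning is unlikely to be this additive (breadth and depth are not interchangeable), so treat this as a framing rather than a result.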
Suppose using AI meant that you learned only 75% as much as non-AI programmers from any given task. Whether you’re learning less overall depends on how many more tasks you’re doing. If you’re working faster, the loss of learning efficiency may be balanced out by volume. I don’t know if this is true. I suspect there really is no substitute for painstakingly working through a codebase by hand. But the engineer who is shipping 2x as many changes is probably also learning things that the slower, manual engineer does not know. At minimum, they’ll be acquiring a greater breadth of knowledge of different subsystems, even if their depth suffers. Anyway, the point is simply that a lower learning rate does not by itself prove that less learning is happening overall. Finally, I will reluctantly point out that the model used for this task was GPT-4o (see section 4.1). I’m reluctant here because I sympathize with the AI skeptics, who are perpetually frustrated by the pro-AI response of “well, you just haven’t tried the right model”. In a world where new AI models are released every month or two, demanding that people always study the best model makes it functionally impossible to study AI use at all. Still, I’m just kind of confused about why GPT-4o was chosen. This study was funded by Anthropic, who have much better models. This study was conducted in 2025[2], at least six months after the release of GPT-4o (that’s like five years in AI time). I can’t help but wonder if the AI-users cohort would have run into fewer problems with a more powerful model. I don’t have any real problem with this paper. They set out to study how different patterns of AI use affect learning, and their main conclusion (that pure “just give the problem to the model” AI use means you learn a lot less) seems correct to me. I don’t like their conclusion that AI use doesn’t speed you up, since it relies on the fact that 50% of their participants spent their time literally retyping AI code.
I wish they'd been more explicit in the introduction that this was the case, but I don't really blame them for the result - I'm more inclined to blame the study participants themselves, who should have known better.

Overall, I don't think this paper provides much new ammunition to the AI skeptic. Like I said above, it doesn't support the point that AI speedup is a mirage. And the point it does support (that AI use means you learn less) is obvious. Nobody seriously believes that typing "build me a todo app" into Claude Code means you'll learn as much as if you built it by hand.

That said, I'd like to see more investigation into long-term patterns of AI use in tech companies. Is the slower learning rate per-task balanced out by the higher rate of task completion? Can it be replaced by carving out explicit time to study the codebase? It's probably too early to answer these questions - strong coding agents have only been around for a handful of months - but the answers may determine what it's like to be a software engineer for the next decade.

[1] See Figure 17.
[2] I suppose the study doesn't say that explicitly, but the Anthropic Fellows program was only launched in December 2024, and the paper was published in January 2026.

Manuel Moreale 4 months ago

How You Read My Content

A week ago, after chatting with Kev about his own findings, I created a similar survey (which is still open if you want to answer it) to collect a second set of data because why the heck not. Kev's data showed that 84.5% of responses picked RSS, Fediverse was second at 7.6%, direct visits to the site were third at 5.4%, and email was last at 2.4%. My survey has a slightly different set of options and allows for multiple choices (which is why the percentages don't add up to 100), but the results are very similar:

- 80.1% reads the content inside their RSS apps
- 23.8% uses RSS to get notified, but then reads in the browser
- 10.7% visits the site directly
- 4.9% reads in their inbox

This is the bulk of the data, but then there's a bunch of custom, random answers, some of which were very entertaining to read:

- 1 person said they follow on Mastodon, and I am not on Mastodon, so 🤷‍♂️
- 1 person left a very useful message in German, a language I don't speak, which was quite amusing
- 1 person lives in my house and looks over my shoulder when I write
- A couple of people mentioned that they read on RSS but check the site every now and again because they like the website

So the takeaway is: people still love and use RSS. Which makes sense, RSS is fucking awesome, and more people should use it.

Since we're talking data, I'm gonna share some more information about the numbers I have available, related to this blog and how people follow it. I don't have analytics, and these numbers are very rough, so my advice is not to give them too much weight. 31 people in the survey said they read content in their inbox, but there are currently 103 people who are subscribed to my blog-to-inbox automated newsletter. RSS is a black box for the most part, and finding out how many people are subscribed to a feed is basically impossible. That said, some services do expose the number of people who are subscribed, and so there are ways to get at least an estimate of how big that number is. I just grabbed the latest log from my server, cleaned the data as best I could in order to eliminate duplicates and also entries that feel like duplicates, for example: In this case, it's obvious that those two are the same service, and at some point, one more person has signed up for the RSS. But how about these: All those IDs are different, but what should I do here? Do I keep them all? Who knows. Anyway, after cleaning up everything, keeping only requests for the main RSS feed, I'm left with 1975 subscribers, whatever that means. Are these actual people? Who knows.

Running the exact same log file (it's the NGINX access log from Jan 10th to Jan 13th at ~10AM) through GoAccess, with all the RSS entries removed, tells me the server received ~50k requests from ~8000 unique IPs. 33% of those hits are from tools whose UA is marked as "Unknown" by GoAccess. Same story when it comes to reported OS: 35% is marked as "Unknown". Another 15% on both of those tables is "Crawlers", which to me suggests that at least half of the traffic hitting the website directly is bots.

In conclusion, is it still worth serving content via RSS? Yes. Is the web overrun by bots? Also yes. Is somebody watching me type these words? Maybe. If you have a site and are going to run a similar experiment, let me know about it, and I'll be happy to link it here. Also, if you want some more data from my logs, let me know. Thank you for keeping RSS alive. You're awesome.
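For anyone who wants to run a similar experiment, the RSS-counting step described above could be approximated along these lines. This is a rough sketch, not the script used for this post (none is described): it assumes NGINX's default "combined" log format, a hypothetical feed path `/feed/rss/`, and that some hosted feed readers advertise a count such as "42 subscribers" in their User-Agent. It treats each distinct User-Agent, with the number stripped out, as one service-and-feed entry and keeps its most recent count. It does not resolve the "different IDs, same service?" judgment call; it simply counts them separately.

```python
import re

# Hypothetical path of the main RSS feed; only requests for this path are counted.
FEED_PATH = "/feed/rss/"

# NGINX's default "combined" format:
# $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)
# Matches counts like "42 subscribers" or "1 subscriber" inside a User-Agent string.
SUBSCRIBERS = re.compile(r"(\d+)\s+subscribers?", re.IGNORECASE)


def estimate_feed_subscribers(lines):
    """Sum the most recent subscriber count reported by each distinct fetcher."""
    latest = {}
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # not a combined-format line
        if m["path"].split("?", 1)[0] != FEED_PATH:
            continue  # keep only requests for the main feed
        ua = m["ua"]
        found = SUBSCRIBERS.search(ua)
        if found is None:
            continue  # this fetcher doesn't report a count
        # The UA with the number removed identifies one service (plus any feed ID it
        # includes), so "41 subscribers" and later "42 subscribers" collapse together.
        key = SUBSCRIBERS.sub("", ua)
        # Assumes log lines are chronological, so the last count seen is the most recent.
        latest[key] = int(found.group(1))
    return sum(latest.values())
```

To use it, pass an open log file, e.g. `estimate_feed_subscribers(open("access.log", encoding="utf-8", errors="replace"))`. Keeping the latest count per fetcher, rather than summing every request, is what stops a reader that polls hourly from being counted dozens of times.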


Bayes theorem and how we talk about medical tests

We want medical tests to give us a yes or no answer: you have the disease, you're cured. We treat them this way, often. My labs came back saying I'm healthy. I have immunity. I'm sick. Absolutely concrete results. The reality is more complicated, and tests do not give you a yes or no. They give you a likelihood. And most of the time, what the results mean for me, the test taker, is not immediately obvious or intuitive. They can mean something quite the opposite of what they seem.

I ran into this recently on a page about celiac disease. The Celiac Disease Foundation has a page about testing for celiac disease. On this page, they give a lot of useful information about what different tests are available, and they point to some other good resources as well. In the section about one of the tests, it says (emphasis original):

The tTG-IgA test will be positive in about 93% of patients with celiac disease who are on a gluten-containing diet. This refers to the test's sensitivity, which measures how correctly it identifies those with the disease. The same test will come back negative in about 96% of healthy people without celiac disease. This is the test's specificity.

This is great information, and it tells you what you need to start figuring out what your chance of celiac disease is. The next paragraph says this, however:

There is also a slight risk of a false positive test result, especially for people with associated autoimmune disorders like type 1 diabetes, autoimmune liver disease, Hashimoto's thyroiditis, psoriatic or rheumatoid arthritis, and heart failure, who do not have celiac disease.

And this is where things are a little misleading. It says that there is a "slight risk" of a false positive test result. What do you think of as a slight risk? For me, it's maybe somewhere around 5%, maybe 10%. The truth is, the risk of a false positive is much higher (under many circumstances). When I take a test, I want to know a couple of things.
If I get a positive test result, how likely is it that I have the disease? If I get a negative test result, how likely is it that I do not have the disease? The rates of positive and negative results listed above, the sensitivity and specificity, do not tell us these directly. However, they let us calculate them with a little more information.

Bayes' theorem says that $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$. You can read $P(A \mid B)$ as "the probability of A conditioned on B", or the chance that A happens if we know that B happens. What this formula lets us do is figure out one conditional probability we don't yet know in terms of other ones that we do know. In our case, we would say that $A$ is having celiac disease, and $B$ is getting a positive test result. This leaves $P(A \mid B)$ as the chance that, if you get a positive test result, you do have celiac disease, which is exactly what we want to know.

To compute this, we need a few more pieces of information: $P(A)$, $P(B)$, and $P(B \mid A)$. We already know that $P(B \mid A)$ is 0.93, as we were told this above. And we can find $P(A)$ pretty easily. Let's say $P(A)$ is 0.01, since about 1 in 100 people in the US have celiac disease. Estimates vary from 1 in 200 to 1 in 50, but this will do fine. That leaves us with $P(B)$, the overall chance of a positive result. We have to compute it from both possibilities. If someone who has celiac disease takes the test, they have a 93% chance of it coming back positive, but they're only 1% of the population. On the other hand, someone without celiac disease has a 4% chance of it coming back positive (96% of the time it gives a true negative), and they're 99% of the population. We use these together to find that $P(B) = 0.93 \cdot 0.01 + 0.04 \cdot 0.99 = 0.0093 + 0.0396 = 0.0489$.

Now we plug it all in! $P(A \mid B) = \frac{0.93 \cdot 0.01}{0.0489} = \frac{0.0093}{0.0489} \approx 0.19$. Neat, 19%! So this says that, if you get a positive test result, you have a 19% chance of having celiac disease? Yes, exactly! It's less than 1 in 5! So if you get a positive test result, there's about an 81% chance of it being a false positive. This is quite a bit higher than the aforementioned "slight risk."
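The same arithmetic can be wrapped in a small function that answers both questions from the start of this section at once. This is an illustrative sketch (the function name is made up, not from any library): it takes the prior $P(A)$ plus the quoted sensitivity (93%) and specificity (96%), computes $P(B)$ by summing over people with and without the disease, and applies Bayes' theorem for a positive result and, symmetrically, for a negative one.

```python
def posterior_probabilities(prior: float, sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (P(disease | positive), P(no disease | negative)) via Bayes' theorem."""
    # P(B): total chance of a positive result, from people with and without the disease.
    p_positive = sensitivity * prior + (1 - specificity) * (1 - prior)
    # P(A | B) = P(B | A) * P(A) / P(B)
    ppv = sensitivity * prior / p_positive
    # Same idea for a negative result: true negatives divided by all negatives.
    p_negative = specificity * (1 - prior) + (1 - sensitivity) * prior
    npv = specificity * (1 - prior) / p_negative
    return ppv, npv


SENSITIVITY = 0.93  # tTG-IgA figures quoted by the Celiac Disease Foundation
SPECIFICITY = 0.96

ppv, npv = posterior_probabilities(0.01, SENSITIVITY, SPECIFICITY)
print(f"P(celiac | positive)    = {ppv:.3f}")  # 0.190
print(f"P(no celiac | negative) = {npv:.4f}")  # 0.9993
```

Only the first argument encodes what we believe before the test, so rerunning it with a prior of 0.10 instead of 0.01 gives a positive predictive value of about 0.72.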
In fact, it means that the test doesn't so much diagnose you with celiac disease as say "huh, something's going on here" and strongly suggest further testing.

Now let's look at the test the other way around, too. How likely is it you don't have celiac disease if you get a negative test result? Here we'd say that $A$ is "we don't have it" and $B$ is "we have a negative test". Doing some other calculations, pulled out of the oven fully baked in cooking show style, we can see that $P(A \mid B) = \frac{0.96 \cdot 0.99}{0.96 \cdot 0.99 + 0.07 \cdot 0.01} \approx 0.999$. So if you get a negative test result, you have a 99.9% chance of not having the disease. This can effectively rule it out! But... We know that 7% of people who take this test and do have celiac disease will get a negative result. How does this make sense?

The truth is, things are a little bit deeper. People don't actually present with exactly a 1% chance of having celiac disease. That would be true if you plucked a random person from the population and subjected them to a blood test. But it's not true if you go to your doctor with GI symptoms which are consistent with celiac disease! If you're being tested for celiac disease, you probably are symptomatic. So that prior probability, $P(A)$? It should really be something else, but how we set it is a good question. Let's say you present with symptoms highly consistent with celiac disease, and that this gives you a 10% chance of having celiac disease and a 90% chance of it being something else, given these symptoms. This changes the probability a lot. If you get a positive test in this case, then $P(A \mid B) = \frac{0.93 \cdot 0.10}{0.93 \cdot 0.10 + 0.04 \cdot 0.90} = \frac{0.093}{0.129} \approx 0.72$. So now a positive test means a 72% chance of having celiac disease, instead of just 19%. And a negative test here still leaves about a 0.8% chance that you have celiac disease, roughly ten times the 0.1% chance before. The real question is how we go from symptoms to that prior probability accurately.

I spent a lot of 2024 being poked with needles and tested for various diseases while we tried to figure out what was wrong with me.
Ultimately it was Lyme disease, and the diagnosis took a while because of a false negative. That false negative happened because the test was calibrated for broad population sampling, not for testing individuals presenting with symptoms already. The whole story is a lot longer, and it's for another post. But maybe, just maybe, it would've been a shorter story if we'd learned to reason about probabilities and medical tests better. Things are not intuitive, but Bayes is your friend, and Bayes' theorem can show us the information we really need to know. Or, we can keep going with things how they are. I mean, I did enjoy getting to know Barbara, my phlebotomist, from all my appointments.

Terms used above:
- $P(A)$, the probability in general that the person taking the test has celiac disease. This is also called the prior probability, as it's what we would say the probability is if we did not know anything from this computation and test.
- $P(B)$, the probability that for any given test taken, it comes back positive.
- $P(B \mid A)$, the probability that if one has celiac disease, the test will come back positive.
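If the algebra still feels slippery, a brute-force check can help. The sketch below is an illustration, not part of the original calculation: it simulates individual test takers using the tTG-IgA sensitivity (93%) and specificity (96%) quoted above, then counts what fraction of positive results come from people who actually have celiac disease. It runs over the 1-in-200 to 1-in-50 range of population estimates and the hypothetical 10% symptomatic prior. The function name, seed, and population size are arbitrary choices.

```python
import random


def simulate_ppv(n_people: int, prior: float, sensitivity: float, specificity: float, seed: int = 0) -> float:
    """Estimate P(disease | positive test) by simulating individual test takers."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    true_positives = 0
    false_positives = 0
    for _ in range(n_people):
        has_disease = rng.random() < prior
        # People with the disease test positive with probability = sensitivity;
        # people without it test positive with probability = 1 - specificity.
        p_positive = sensitivity if has_disease else 1 - specificity
        if rng.random() < p_positive:
            if has_disease:
                true_positives += 1
            else:
                false_positives += 1
    # At these population sizes there are thousands of positives, so the division is safe.
    return true_positives / (true_positives + false_positives)


# 1 in 200, 1 in 100, 1 in 50 (the population range quoted above), and the 10% symptomatic guess.
for prior in (0.005, 0.01, 0.02, 0.10):
    estimate = simulate_ppv(200_000, prior, sensitivity=0.93, specificity=0.96)
    print(f"prior {prior:.3f}: about {estimate:.0%} of positive results are true positives")
```

The simulated fractions should land close to the Bayes results (roughly 19% at a 1% prior and 72% at 10%), which is a nice sanity check that nothing magical is happening in the formula: it is just counting who tests positive.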
