Posts in Data-analysis (20 found)
Manuel Moreale 1 week ago

Step aside, phone: week 2

Halfway through this enjoyable life experiment, and overall, I’m very pleased with the results. As I mentioned last week, I was expecting week two usage to be a bit higher compared to week one, where I went full phone-rejection mode, but I’m still pleased with how low my usage was, even though it felt like I was using the phone a lot. No huge spikes this week, and I didn’t need to use Google Maps a lot, so the time distribution is a lot more even, as you can see.

The first three days of the week were pretty similar to the previous week. I moved my chats back onto the phone, and that’s most of the time spent on screen, since “social” is just the combination of Telegram, WhatsApp, and iMessage. Usage went up a bit in the second part of the week, but I consider that a “healthy” use of the phone. On Thursday, I spent 20 or so minutes setting up an app, one that I’d categorise as a life utility app, like banking or insurance apps. They do have a site, but you’re required to use the phone anyway to take pictures and other crap, so it was faster to do it on the phone. Then on Saturday, I had to use Maps as well as AllTrails to find a place out in the wild. I was trying to find a bunker that’s hidden somewhere in a forest not too far from where I live (this is a story for another time), and that’s why screen time was a bit higher than normal on that particular day.

Overall, I’m very happy with how the week went. A thing I’m particularly pleased with is the fact that I have yet to consume a single piece of media on my phone since we started this experiment. So far, I have only opened the browser a couple of times, and it was always to look up something very specific, and never to mindlessly scroll through news, videos or anything like that. My content consumption on the phone is down to essentially zero. One fun side effect of this experiment is how infrequently I now charge my phone.
I took this screenshot this morning before plugging it in, and apparently, the last time it was fully charged was Wednesday afternoon. I’m now charging it once every 3 or 4 days, which is pretty neat. Thank you for keeping RSS alive. You're awesome. Email me :: Sign my guestbook :: Support for 1$/month :: See my generous supporters :: Subscribe to People and Blogs

Ankur Sethi 2 weeks ago

I used a local LLM to analyze my journal entries

In 2025, I wrote 162 journal entries totaling 193,761 words. In December, as the year came to a close and I found myself in a reflective mood, I wondered if I could use an LLM to comb through these entries and extract useful insights. I’d had good luck extracting structured data from web pages using Claude, so I knew this was a task LLMs were good at. But there was a problem: I write about sensitive topics in my journal entries, and I don’t want to share them with the big LLM providers. Most of them have at least a thirty-day data retention policy, even if you call their models using their APIs, and that makes me uncomfortable. Worse, all of them have safety and abuse detection systems that get triggered if you talk about certain mental health issues. This can lead to account bans or human review of your conversations. I didn’t want my account to get banned, and the very idea of a stranger across the world reading my journal mortifies me. So I decided to use a local LLM running on my MacBook for this experiment. Writing the code was surprisingly easy. It took me a few evenings of work—and a lot of yelling at Claude Code—to build a pipeline of Python scripts that would extract structured JSON from my journal entries. I then turned that data into boring-but-serviceable visualizations. This was a fun side-project, but the data I extracted didn’t quite lead me to any new insights. That’s why I consider this a failed experiment. The output of my pipeline only confirmed what I already knew about my year. Besides, I didn’t have the hardware to run the larger models, so some of the more interesting analyses I wanted to run were plagued with hallucinations. Despite how it turned out, I’m writing about this experiment because I want to try it again in December 2026. I’m hoping I won’t repeat my mistakes again. Selfishly, I’m also hoping that somebody who knows how to use LLMs for data extraction tasks will find this article and suggest improvements to my workflow. 
I’ve pushed my data extraction and visualization scripts to GitHub. It’s mostly LLM-generated slop, but it works. The most interesting and useful parts are probably the prompts. Now let’s look at some graphs.

I ran 12 different analyses on my journal, but I’m only including the output from 6 of them here. Most of the others produced nonsensical results or were difficult to visualize. For privacy, I’m not using any real names in these graphs. The graphs cover:
- How I divided time between my hobbies through the year
- My most mentioned hobbies
- Media I engaged with (there isn’t a lot of data for this one)
- How many mental health issues I complained about each day across the year
- How many physical health issues I complained about each day across the year
- The big events of 2025
- The communities I spent most of my time with
- Top mentioned people throughout the year

I ran all these analyses on my MacBook Pro with an M4 Pro and 48GB RAM. This hardware can just barely manage to run some of the more useful open-weights models, as long as I don’t run anything else. For running the models, I used Apple’s package. Picking a model took me longer than putting together the data extraction scripts. People on /r/LocalLlama had a lot of strong opinions, but there was no clear “best” model when I ran this experiment. I just had to try out a bunch of them and evaluate their outputs myself. If I had more time and faster hardware, I might have looked into building a small-scale LLM eval for this task. But for this scenario, I picked a few popular models, ran them on a subset of my journal entries, and picked one based on vibes.

This project finally gave me an excuse to learn all the technical terms around LLMs. What’s quantization? What does the number of parameters do? What do the various suffixes in a model’s name mean? What is a reasoning model? What’s MoE? What are active parameters? This was fun, even if my knowledge will be obsolete in six months.
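Of those terms, quantization determines memory footprint almost directly. A back-of-the-envelope sketch (the formula is the standard params × bits / 8 estimate for weights only; KV cache and activations add overhead on top):

```python
# Rough rule of thumb for whether a quantized model's weights fit in RAM:
# weight memory ≈ parameter_count × bits_per_weight / 8 bytes.

def weight_gb(params_billions: float, bits: int) -> float:
    """Approximate weight memory in GB for a quantized model."""
    return params_billions * 1e9 * bits / 8 / 1e9

print(weight_gb(32, 8))  # 32.0  -- a 32B model at 8-bit
print(weight_gb(70, 3))  # 26.25 -- a 70B model at 3-bit
print(weight_gb(4, 4))   # 2.0   -- a 4B model at 4-bit
```

This is why a 70B model at 3-bit quantization squeezes into 48GB of RAM while the same model at 8-bit never could.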
In the beginning, I ran all my scripts with Qwen 2.5 Instruct 32B at 8-bit quantization as the model. This fit in my RAM with just enough room left over for a browser, text editor, and terminal. But Qwen 2.5 didn’t produce the best output and hallucinated quite a bit, so I ran my final analyses using Llama 3.3 70B Instruct at 3-bit quantization. This could just about fit in my RAM if I quit every other app and increased the amount of GPU RAM a process was allowed to use. While quickly iterating on my Python code, I used a tiny model: Qwen 3 4B Instruct quantized to 4 bits.

A major reason this experiment didn’t yield useful insights was that I didn’t know what questions to ask the LLM. I couldn’t do a qualitative analysis of my writing—the kind of analysis a therapist might be able to do—because I’m not a trained psychologist. Even if I could figure out the right prompts, I wouldn’t want to do this kind of work with an LLM. The potential for harm is too great, and the cost of mistakes is too high. With a few exceptions, I limited myself to extracting quantitative data only. From each journal entry, I extracted the information listed at the end of this post.

None of the models was as accurate as I had hoped at extracting this data. In many cases, I noticed hallucinations and examples from my system prompt leaking into the output, which I had to clean up afterwards. Qwen 2.5 was particularly susceptible to this. Some of the analyses (e.g. the list of new people I met) produced nonsensical results, but that wasn’t really the fault of the models. They were all operating on a single journal entry at a time, so they had no sense of the larger context of my life. I couldn’t run all my journal entries through the LLM at once. I didn’t have that kind of RAM, and the models didn’t have that kind of context window. I had to run the analysis one journal entry at a time.
Even then, my computer choked on some of the larger entries, and I had to write my scripts in a way that let me run partial analyses or continue failed ones. Trying to extract all the information listed above in one pass produced low-quality output. I had to split my analysis into multiple prompts and run them one at a time.

Surprisingly, none of the models I tried had an issue with the instruction. Even the really tiny models had no problems following it. Some of them occasionally threw in a Markdown fenced code block, but it was easy enough to strip using a regex.

My prompts were divided into two parts (listed at the end of this post). The task-specific prompts included detailed instructions and examples that made the structure of the JSON output clear. Every model followed the JSON schema mentioned in the prompt, and I rarely ran into JSON parsing issues. But the one issue I never managed to fix was examples from the prompts leaking into the extracted output. Every model insisted that I had “dinner with Sarah” several times last year, even though I don’t know anybody by that name. The name came from an example that formed part of one of my prompts. I just had to make sure the examples I used stood out—e.g., using names of people I didn’t know at all or movies I hadn’t watched—so I could filter them out using plain old Python code afterwards. Here’s what my prompt looked like: To this prompt, I appended task-specific prompts. Here’s the prompt for extracting health issues mentioned in an entry: You can find all the prompts in the GitHub repository.

The collected output from all the entries looked something like this: Since my model could only look at one journal entry at a time, it would sometimes refer to the same health issue, gratitude item, location, or travel destination using different synonyms. For example, “exhaustion” and “fatigue” should refer to the same health issue, but they would appear in the output as two different issues.
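The fence-stripping step mentioned above is a one-regex job. A minimal sketch (a hypothetical helper, not the exact code from the repo):

```python
import json
import re


def parse_llm_json(raw: str):
    """Parse JSON from an LLM response, tolerating an optional
    Markdown code fence (```json ... ```) wrapped around the payload."""
    text = raw.strip()
    # Strip a ```json ... ``` (or bare ``` ... ```) wrapper if present.
    match = re.match(r"^```(?:json)?\s*(.*?)\s*```$", text, re.DOTALL)
    if match:
        text = match.group(1)
    return json.loads(text)


print(parse_llm_json('```json\n{"issues": ["fatigue"]}\n```'))
# {'issues': ['fatigue']}
```

Responses without a fence pass straight through to `json.loads`, so the helper works on well-behaved models too.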
My first attempt at de-duplicating these synonyms was to keep a running tally of unique terms discovered during each analysis and append them to the end of the prompt for each subsequent entry. Something like this: But this quickly led to some really strange hallucinations. I still don’t understand why. The list of terms wasn’t even that long, maybe 15-20 unique terms for each analysis.

My second attempt at solving this was a separate normalization pass for each analysis. After an analysis finished running, I extracted a unique list of terms from its output file and collected them into a prompt. Then I asked the LLM to produce a mapping to de-duplicate the terms. This is what the prompt looked like: There were better ways to do this than using an LLM. But you know what happens when all you have is a hammer? Yep, exactly. The normalization step was inefficient, but it did its job.

This was the last piece of the puzzle. With all the extraction scripts and their normalization passes working correctly, I left my MacBook running the pipeline of scripts all day. I’ve never seen an M-series MacBook get this hot. I was worried that I’d damage my hardware somehow, but it all worked out fine.

There was nothing special about the visualization step. I just decided on a list of visualizations for the data I’d extracted, then asked Claude to write some code to generate them for me. Tweak, rinse, repeat until done.

I’m underwhelmed by the results of this experiment. I didn’t learn anything new or interesting from the output, at least nothing I didn’t already know. This was only partly because of LLM limitations. I believe I didn’t quite know what questions to ask in the first place. What was I hoping to discover? What kinds of patterns was I looking for? What was the goal of the experiment besides producing pretty graphs?
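Once the LLM has produced a synonym mapping, applying it is plain Python. A minimal sketch (the terms and counts here are made up for illustration):

```python
from collections import Counter

# A hypothetical mapping produced by the normalization pass:
# variant term -> canonical term.
mapping = {
    "fatigue": "exhaustion",
    "tiredness": "exhaustion",
}

# Hypothetical raw counts from one analysis's output file.
raw_counts = Counter({"exhaustion": 4, "fatigue": 3, "headache": 2})

# Fold every variant into its canonical term.
normalized = Counter()
for term, count in raw_counts.items():
    normalized[mapping.get(term, term)] += count

print(dict(normalized))  # {'exhaustion': 7, 'headache': 2}
```

Terms absent from the mapping pass through unchanged, so an incomplete mapping degrades gracefully instead of dropping data.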
I went into the project with a cool new piece of tech to try out, but skipped the important up-front, human-powered thinking required to extract good insights from data. I neglected to sit down and design a set of initial questions I wanted to answer and assumptions I wanted to test before writing the code. Just goes to show that no amount of generative AI magic will produce good results unless you can define what success looks like. Maybe this year I’ll learn more about data analysis and visualization and run this experiment again in December to see if I can go any further.

I did learn one thing from all of this: if you have access to state-of-the-art language models and know the right set of questions to ask, you can process your unstructured data to find needles in some truly massive haystacks. This allows you to analyze datasets that would take human reviewers months to comb through. A great example is how the NYT monitors hundreds of podcasts every day using LLMs. For now, I’m putting a pin in this experiment. Let’s try again in December.

The fields I extracted from each journal entry:
- List of things I was grateful for, if any
- List of hobbies or side-projects mentioned
- List of locations mentioned
- List of media mentioned (including books, movies, games, or music)
- A boolean answer to whether it was a good or bad day for my mental health
- List of mental health issues mentioned, if any
- A boolean answer to whether it was a good or bad day for my physical health
- List of physical health issues mentioned, if any
- List of things I was proud of, if any
- List of social activities mentioned
- Travel destinations mentioned, if any
- List of friends, family members, or acquaintances mentioned
- List of new people I met that day, if any

The two parts of my prompts:
- A “core” prompt that was common across analyses
- Task-specific prompts for each analysis

James Stanley 2 weeks ago

Evidence of absence

"Absence of evidence is not evidence of absence", they say. They're wrong. In this post we'll work through a scenario and show that absence of evidence is in fact evidence of absence. See Yudkowsky for more on this topic.

You have a box with 100 bags in it. Each bag has 100 balls in it. 99 of the bags contain 100 white balls; the final bag has 99 white balls and 1 black ball. (Assume all of the bags are indistinguishable from the outside, all of the balls are the same size and weight etc., and the black ball is at a random position within its bag.) What is the probability that a bag, selected uniformly at random, contains the black ball? You'll give your first probability estimate before the bag is opened, and then balls will be removed from the bag one by one and you can revise your estimate after seeing each ball. What is your strategy?

Your lab assistant rummages around the box and selects a bag uniformly at random. You should agree that this bag contains the black ball with 1% probability, because you know that 1 in 100 bags contains the black ball. Your lab assistant takes out the first 99 balls. You observe that they're all white. This observation is merely an absence of evidence of the black ball, and, so they say, it is therefore not evidence of absence of the black ball, and your probability estimate should be unchanged, at a 1% chance that the bag contains the black ball. But while the 99 bags that only contain white balls would produce this observation every time, the 1 bag that contains the black ball would produce it only 1 time out of 100. So you ought to agree that the probability that the bag contains the black ball is now only about 0.01%, not 1%, and we see that the ongoing "absence of evidence" of the black ball was in fact evidence of its absence after all. Your lab assistant pulls out the final ball. It's white.
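The update described above is easy to verify numerically. A sketch using only the numbers from the scenario:

```python
# Bayes update for the bag-and-balls scenario: after seeing k white
# balls drawn from the chosen bag, what is the probability that this
# is the bag containing the black ball?

def p_black_bag(k_white_seen: int, prior: float = 0.01) -> float:
    # The black-ball bag yields k white draws iff the black ball sits
    # in one of the remaining 100 - k positions: probability (100-k)/100.
    # An all-white bag yields k white draws with probability 1.
    likelihood_black = (100 - k_white_seen) / 100
    evidence = likelihood_black * prior + 1.0 * (1 - prior)
    return likelihood_black * prior / evidence

print(p_black_bag(0))   # 0.01    -- before any balls are drawn
print(p_black_bag(99))  # ~0.0001 -- "absence of evidence" at work
```

Each all-white draw shaves a little probability off the black-ball hypothesis, which is exactly the post's point: the observation is weak evidence, but it is evidence.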

Gabriel Weinberg 4 weeks ago

Simulating likely 2026 World Cup matchups (for all matches)

I’ve been using Cursor for coding for some time, but I finally gave Claude Code a try for this short side project: simulating the 2026 World Cup bracket to predict likely matchups for all matches, which is useful when considering which matches to potentially go to.

Methodology:
- Start with the official World Cup tournament schedule (including yet-to-be-played playoff matches)
- Blend Elo rankings with FIFA rankings (50/50)
- Use the Elo formulas to probabilistically predict winners (assuming no draws, even in the group stage)
- Run one million individual simulations of the full tournament (it reaches diminishing returns around 50K, but hey, why not!)
- Run again with a home-field advantage boost (+180 Elo) for the U.S., Canada, and Mexico based on prior World Cup outcomes
- Count up who participated in each match

Some interesting findings (at least to me as a U.S. fan) are below, followed by a rundown for every match (in reverse order).

Big Disclaimer 1: The above is of course a gross simplification of the actual tournament. For example, it doesn’t take into account team matchup histories, game models, etc. etc. I do think, however, it is useful enough for the designed purpose of generally predicting likely match participants.

Big Disclaimer 2: I did a lot of output validation, so I think the results are largely accurate (to the extent they can be, given Big Disclaimer 1). However, I didn’t write or review every line of code, so it is likely there are still some bugs in there. If you think you see anything that seems off, let me know and I’ll try to track it down (and update anything if necessary).

Aside on Claude Code: Like many others, I found this process both productive and frustrating. It was definitely faster than I could have done it alone, but Claude kept forgetting basic context and was way overconfident in the accuracy of the results. That is, many rounds of validation at every stage of output were absolutely necessary despite Claude saying things were good.
I couldn’t trust its word at all.

HA+ = with home-field advantage (anytime this comes into play there is a + next to the team name)
HA- = without home-field advantage

Here’s a visualization of the above made by a reader (thanks!)
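The core of such a simulation is small. A sketch using the standard Elo expected-score formula, with made-up ratings and the +180 bump standing in for the home-field boost (this is an illustration of the approach, not the project's actual code):

```python
import random


def win_prob(elo_a: float, elo_b: float) -> float:
    """Standard Elo expected score for team A against team B."""
    return 1 / (1 + 10 ** ((elo_b - elo_a) / 400))


def simulate_match(elo_a: float, elo_b: float, rng: random.Random) -> str:
    """Pick a winner probabilistically -- no draws, as in the methodology."""
    return "A" if rng.random() < win_prob(elo_a, elo_b) else "B"


rng = random.Random(42)
# Hypothetical ratings; +180 models home-field advantage for the host.
host, opponent = 1800 + 180, 1850
wins = sum(simulate_match(host, opponent, rng) == "A" for _ in range(100_000))

print(win_prob(host, opponent))  # ~0.68
print(wins / 100_000)            # Monte Carlo estimate of the same number
```

Repeating the per-match draw over the whole bracket a million times, then counting who appears in each fixture, gives the matchup frequencies reported in the rundown.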

Sean Goedecke 1 month ago

How does AI impact skill formation?

Two days ago, the Anthropic Fellows program released a paper called How AI Impacts Skill Formation. Like other papers on AI before it, this one is being treated as proof that AI makes you slower and dumber. Does it prove that? The structure of the paper is broadly similar to the 2025 MIT study Your Brain on ChatGPT. They got a group of people to perform a cognitive task that required learning a new skill: in this case, the Python Trio library. Half of those people were required to use AI and half were forbidden from using it. The researchers then quizzed those people to see how much information they retained about Trio. The banner result was that AI users did not complete the task faster, but performed much worse on the quiz. If you were so inclined, you could naturally conclude that any perceived AI speedup is illusory, and that the people who are using AI tooling are cooking their brains.

But I don’t think that conclusion is reasonable. To see why, let’s look at Figure 13 from the paper: The researchers noticed that half of the AI-using cohort spent most of their time literally retyping the AI-generated code into their solution, instead of copy-pasting or “manual coding” (writing their code from scratch with light AI guidance). If you ignore the people who spent most of their time retyping, the AI users were 25% faster. I confess that this kind of baffles me. What kind of person manually retypes AI-generated code? Did they not know how to copy and paste (unlikely, since the study was mostly composed of professional or hobby developers¹)? It certainly didn’t help them on the quiz score: the retypers got the same (low) scores as the pure copy-pasters. In any case, if you know how to copy-paste or use an AI agent, I wouldn’t use this paper as evidence that AI will not be able to speed you up.

Even if AI use offers a 25% speedup, is that worth sacrificing the opportunity to learn new skills? What about the quiz scores?
Well, first we should note that the AI users who used the AI for general questions but wrote all their own code did fine on the quiz. If you look at Figure 13 above, you can see that those AI users averaged maybe a point lower on the quiz - not bad, for people working 25% faster. So at least some kinds of AI use seem fine. But of course much current AI use is not like this: if you’re using Claude Code or Copilot agent mode, you’re getting the AI to do the code writing for you. Are you losing key skills by doing that? Well yes, of course you are. If you complete a task in ten minutes by throwing it at an LLM, you will learn much less about the codebase than if you’d spent an hour doing it by hand. I think it’s pretty silly to deny this: it’s intuitively right, and anybody who has used AI agents extensively at work can attest to it from their own experience. Still, I have two points to make about this.

First, software engineers are not paid to learn about the codebase. We are paid to deliver business value (typically by delivering working code). If AI can speed that up dramatically, avoiding it makes you worse at your job, even if you’re learning more efficiently. That’s a bit unfortunate for us - it was very nice when we could get much better at the job simply by doing it more - but that doesn’t make it false. Other professions have been dealing with this forever. Doctors are expected to spend a lot of time in classes and professional development courses, learning how to do their job in ways other than just doing it. It may be that future software engineers will need to spend 20% of their time manually studying their codebases: not just in the course of doing some task (which could be done far more quickly by AI agents) but just to stay up-to-date enough that their skills don’t atrophy.

The other point I wanted to make is that even if your learning rate is slower, moving faster means you may learn more overall.
Suppose using AI meant that you learned only 75% as much as non-AI programmers from any given task. Whether you’re learning less overall depends on how many more tasks you’re doing. If you’re working faster, the loss of learning efficiency may be balanced out by volume. I don’t know if this is true. I suspect there really is no substitute for painstakingly working through a codebase by hand. But the engineer who is shipping 2x as many changes is probably also learning things that the slower, manual engineer does not know. At minimum, they’ll be acquiring a greater breadth of knowledge of different subsystems, even if their depth suffers. Anyway, the point is simply that a lower learning rate does not by itself prove that less learning is happening overall.

Finally, I will reluctantly point out that the model used for this task was GPT-4o (see section 4.1). I’m reluctant here because I sympathize with the AI skeptics, who are perpetually frustrated by the pro-AI response of “well, you just haven’t tried the right model”. In a world where new AI models are released every month or two, demanding that people always study the best model makes it functionally impossible to study AI use at all. Still, I’m just kind of confused about why GPT-4o was chosen. This study was funded by Anthropic, who have much better models. And this study was conducted in 2025², at least six months after the release of GPT-4o (that’s like five years in AI time). I can’t help but wonder if the AI-users cohort would have run into fewer problems with a more powerful model.

I don’t have any real problem with this paper. They set out to study how different patterns of AI use affect learning, and their main conclusion - that pure “just give the problem to the model” AI use means you learn a lot less - seems correct to me. I don’t like their conclusion that AI use doesn’t speed you up, since it relies on the fact that 50% of their participants spent their time literally retyping AI code.
I wish they’d been more explicit in the introduction that this was the case, but I don’t really blame them for the result - I’m more inclined to blame the study participants themselves, who should have known better. Overall, I don’t think this paper provides much new ammunition to the AI skeptic. Like I said above, it doesn’t support the point that AI speedup is a mirage. And the point it does support (that AI use means you learn less) is obvious. Nobody seriously believes that typing “build me a todo app” into Claude Code means you’ll learn as much as if you built it by hand. That said, I’d like to see more investigation into long-term patterns of AI use in tech companies. Is the slower learning rate per-task balanced out by the higher rate of task completion? Can it be replaced by carving out explicit time to study the codebase? It’s probably too early to answer these questions - strong coding agents have only been around for a handful of months - but the answers may determine what it’s like to be a software engineer for the next decade.

1. See Figure 17.
2. I suppose the study doesn’t say that explicitly, but the Anthropic Fellows program was only launched in December 2024, and the paper was published in January 2026.

Manuel Moreale 1 month ago

How You Read My Content

A week ago, after chatting with Kev about his own findings, I created a similar survey (which is still open if you want to answer it) to collect a second set of data, because why the heck not. Kev’s data showed that 84.5% of responses picked RSS, Fediverse was second at 7.6%, direct visits to the site were third at 5.4%, and email was last at 2.4%. My survey has a slightly different set of options and allows for multiple choices—which is why the percentages don’t add up to 100—but the results are very similar (the full numbers are at the end of this post). This is the bulk of the data, but then there’s a bunch of custom, random answers, some of which were very entertaining to read. So the takeaway is: people still love and use RSS. Which makes sense: RSS is fucking awesome, and more people should use it.

Since we’re talking data, I’m gonna share some more information about the numbers I have available, related to this blog and how people follow it. I don’t have analytics, and these numbers are very rough, so my advice is not to give them too much weight. 31 people in the survey said they read content in their inbox, but there are currently 103 people who are subscribed to my blog-to-inbox automated newsletter. RSS is a black box for the most part, and finding out how many people are subscribed to a feed is basically impossible. That said, some services do expose the number of people who are subscribed, and so there are ways to get at least an estimate of how big that number is. I just grabbed the latest log from my server and cleaned the data as best as I could in order to eliminate duplicates, and also entries that feel like duplicates. In some cases, it’s obvious that two entries are the same service, and at some point one more person has signed up for the RSS. In other cases, all the IDs are different, and what should I do? Do I keep them all? Who knows. Anyway, after cleaning up everything, keeping only requests for the main RSS feed, I’m left with 1975 subscribers, whatever that means.
Are these actual people? Who knows. Running the exact same log file (it’s the NGINX access log from Jan 10th to Jan 13th at ~10AM) through GoAccess, with all the RSS entries removed, tells me the server received ~50k requests from ~8000 unique IPs. 33% of those hits are from tools whose UA is marked as “Unknown” by GoAccess. Same story when it comes to reported OS: 35% is marked as “Unknown”. Another 15% on both of those tables is “Crawlers”, which to me suggests that at least half of the traffic hitting the website directly is bots.

In conclusion, is it still worth serving content via RSS? Yes. Is the web overrun by bots? Also yes. Is somebody watching me type these words? Maybe. If you have a site and are going to run a similar experiment, let me know about it, and I’ll be happy to link it here. Also, if you want some more data from my logs, let me know.

The survey numbers:
- 80.1% read the content inside their RSS apps
- 23.8% use RSS to get notified, but then read in the browser
- 10.7% visit the site directly
- 4.9% read in their inbox
- 1 person said they follow on Mastodon, and I am not on Mastodon, so 🤷‍♂️
- 1 person left a very useful message in German, a language I don’t speak, which was quite amusing
- 1 person lives in my house and looks over my shoulder when I write
- A couple of people mentioned that they read via RSS but check the site every now and again because they like the website
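The subscriber estimate above relies on some feed readers reporting their subscriber counts in their user-agent strings. A rough sketch of that counting step (the UA formats and numbers here are invented, and real log cleanup is messier):

```python
import re

# Hypothetical user-agent strings of the kind feed readers send.
user_agents = [
    "Feedbin feed-id:1234 - 25 subscribers",
    "Feedbin feed-id:1234 - 26 subscribers",  # same service, count went up
    "Inoreader/1.0; 12 subscribers",
]

totals = {}
for ua in user_agents:
    # Take the service name: everything before the first "/" or space.
    service = ua.split("/")[0].split(" ")[0]
    match = re.search(r"(\d+) subscribers", ua)
    if match:
        # Keep the highest reported count per service, collapsing
        # near-duplicate log entries like the two Feedbin lines above.
        totals[service] = max(totals.get(service, 0), int(match.group(1)))

print(totals)                # {'Feedbin': 26, 'Inoreader': 12}
print(sum(totals.values()))  # 38
```

The hard part, as noted, is deciding which entries are really the same service; the max-per-service rule here is just one defensible choice.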


Bayes theorem and how we talk about medical tests

We want medical tests to give us a yes or no answer: you have the disease, you're cured. We treat them this way, often. My labs came back saying I'm healthy. I have immunity. I'm sick. Absolutely concrete results. The reality is more complicated, and tests do not give you a yes or no. They give you a likelihood. And most of the time, what the results mean for me, the test taker, is not immediately obvious or intuitive. They can mean something quite the opposite of what they seem.

I ran into this recently on a page about celiac disease. The Celiac Disease Foundation has a page about testing for celiac disease. On this page, they give a lot of useful information about what different tests are available, and they point to some other good resources as well. In the section about one of the tests, it says (emphasis original):

"The tTG-IgA test will be positive in about 93% of patients with celiac disease who are on a gluten-containing diet."

This refers to the test's sensitivity, which measures how correctly it identifies those with the disease. The same test will come back negative in about 96% of healthy people without celiac disease. This is the test's specificity. This is great information, and it tells you what you need to start figuring out what your chance of celiac disease is. The next paragraph says this, however:

"There is also a slight risk of a false positive test result, especially for people with associated autoimmune disorders like type 1 diabetes, autoimmune liver disease, Hashimoto's thyroiditis, psoriatic or rheumatoid arthritis, and heart failure, who do not have celiac disease."

And this is where things are a little misleading. It says that there is a "slight risk" of a false positive test result. What do you think of as a slight risk? For me, it's maybe somewhere around 5%, maybe 10%. The truth is, the risk of a false positive is much higher (under many circumstances). When I take a test, I want to know a couple of things.
If I get a positive test result, how likely is it that I have the disease? If I get a negative test result, how likely is it that I do not have the disease? The rates of positive and negative results listed above, the sensitivity and specificity, do not tell us these directly. However, they let us calculate them with a little more information.

Bayes' theorem says that P(A|B) = P(B|A) × P(A) / P(B). You can read P(A|B) as "the probability of A conditioned on B", or the chance that A happens if we know that B happens. What this formula lets us do is figure out one conditional probability we don't yet know in terms of other ones that we do know. In our case, we would say that A is having celiac disease, and B is getting a positive test result. This leaves P(A|B) as the chance that, if you get a positive test result, you do have celiac disease, which is exactly what we want to know.

To compute this, we need a few more pieces of information. We already know that P(B|A) is 0.93, as we were told this above. And we can find P(A) pretty easily. Let's say P(A) is 0.01, since about 1 in 100 people in the US have celiac disease. Estimates vary from 1 in 200 to 1 in 50, but this will do fine. That leaves us with P(B). We have to compute it from both possibilities. If someone who has celiac disease takes the test, they have a 93% chance of it coming back positive, but they're only 1% of the population. On the other hand, someone without celiac disease has a 4% chance of it coming back positive (96% of the time it gives a true negative), and they're 99% of the population. We use these together to find that P(B) = 0.93 × 0.01 + 0.04 × 0.99 = 0.0489.

Now we plug it all in! P(A|B) = (0.93 × 0.01) / 0.0489 ≈ 0.19. Neat, 19%! So this says that, if you get a positive test result, you have a 19% chance of having celiac disease? Yes, exactly! It's less than 1 in 5! So if you get a positive test result, you have about an 80% chance of it being a false positive. This is quite a bit higher than the aforementioned "slight risk."
In fact, it means that the test doesn't so much diagnose you with celiac disease as say "huh, something's going on here" and strongly suggest further testing. Now let's look at the test the other way around, too. How likely is it you don't have celiac disease if you get a negative test result? Here we'd say that A is "we don't have it" and B is "we have a negative test". Doing some other calculations, pulled out of the oven fully baked in cooking show style, we can see that P(A|B) = 0.96 × 0.99 / (0.96 × 0.99 + 0.07 × 0.01) ≈ 0.999. So if you get a negative test result, you have a 99.9% chance of not having the disease. This can effectively rule it out! But... We know that 7% of people who take this test and do have celiac disease will get a negative result. How does this make sense? The truth is, things are a little bit deeper. People don't actually present with exactly a 1% chance of having celiac disease. That would be true if you plucked a random person from the population and subjected them to a blood test. But it's not true if you go to your doctor with GI symptoms which are consistent with celiac disease! If you're being tested for celiac disease, you probably are symptomatic. So that prior probability, P(A) = 0.01? It's better as something else, but how we set it is a good question. Let's say you present with symptoms highly consistent with celiac disease, and that this gives you a 10% chance of having celiac disease and a 90% chance of it being something else, given these symptoms. This changes the probability a lot. If you get a positive test in this case, then P(A|B) = 0.93 × 0.10 / (0.93 × 0.10 + 0.04 × 0.90) ≈ 0.72. So now a positive test is a 72% chance of having celiac disease, instead of just 20%. And a negative test here gives you about a 0.8% chance of a false negative (0.07 × 0.10 / (0.07 × 0.10 + 0.96 × 0.90)), up from the 0.1% chance before. The real question is how we go from symptoms to that prior probability accurately. I spent a lot of 2024 being poked with needles and tested for various diseases while we tried to figure out what was wrong with me.
Ultimately it was Lyme disease, and the diagnosis took a while because of a false negative. That false negative happened because the test was calibrated for broad population sampling, not for testing individuals already presenting with symptoms. The whole story is a lot longer, and it's for another post. But maybe, just maybe, it would've been a shorter story if we'd learned to reason about probabilities and medical tests better. Things are not intuitive, but Bayes is your friend, and Bayes' theorem can show us the information we really need to know. Or, we can keep going with things how they are. I mean, I did enjoy getting to know Barbara, my phlebotomist, from all my appointments.
P(A), the probability in general that the person taking the test has celiac disease. This is also called the prior probability, as it's what we would say the probability is if we did not know anything from this computation and test.
P(B), the probability that for any given test taken, it comes back positive.
P(B|A), the probability that if one has celiac disease, the test will come back positive.
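The negative-test and symptomatic-prior numbers above can be checked the same way; this sketch reuses the post's figures, with 10% as the assumed pre-test probability for a symptomatic patient:

```python
def posterior(p_data_given_h, p_h, p_data_given_not_h):
    """Generic two-hypothesis Bayes update: P(H | data)."""
    p_data = p_data_given_h * p_h + p_data_given_not_h * (1 - p_h)
    return p_data_given_h * p_h / p_data

sens, spec = 0.93, 0.96  # tTG-IgA sensitivity and specificity

# P(no disease | negative test) with the 1% population prior
print(round(posterior(spec, 0.99, 1 - sens), 4))  # 0.9993

# P(disease | positive test) with a 10% symptomatic prior
print(round(posterior(sens, 0.10, 1 - spec), 2))  # 0.72

# P(disease | negative test) with a 10% symptomatic prior (false negative)
print(round(posterior(1 - sens, 0.10, spec), 3))  # 0.008
```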

Kev Quirk 1 month ago

How You Read My Content (The Answers)

Two days ago I published a simple survey asking how you read the content I put out on this site. Here are the results of that survey. Originally I was going to leave the survey running for at least a week, but after less than 48 hours, I received an email from Zoho telling me I’d hit the monthly limit of 500 responses. If I wanted more responses, I’d have to pay. Nah. 500 responses is enough to give me a good indication of how people consume my content, so I was good with that. Also, 500 responses in less than 48 hours is bloody brilliant. Assuming only a small proportion of readers actually responded (as that’s usually the case with these things), that means there’s a healthy number of you reading my waffle, so thank you! The survey simply asked “how do you read the content I put out on this site?” and there were a handful of options for responses. If someone selected the last option, a text field would appear asking for more info. There were a few people who used this option, but all were covered by the other options. People just wanted to add some nuance, or leave a nice message. ❤ So I updated all the “something else” responses to be one of the other 4 options, and here’s the results:
A highly accurate pie chart
Well, quite a lot, actually. It tells me that there’s loads of you fine people reading the content on this site, which is very heart-warming. It also tells me that RSS is by far the main way people consume my content. Which is also fantastic, as I think RSS is very important and should always be a first class citizen when it comes to delivering content to people. I was surprised at how small the number was for Mastodon, too. I have a fair number of followers over there (around 13,000 according to Fosstodon) so I was expecting that number to be a bigger slice of the pie. Clearly people follow me there more for the hot takes than my waffle. 🙃 This was a fun little experiment, even if it did end more quickly than I would have liked.
Thanks to all ~500 of you who responded, really appreciate it. See, you don’t need analytics to get an idea of who’s reading your stuff and how. Thanks for reading this post via RSS. RSS is great, and you're great for using it. ❤️ You can reply to this post by email, or leave a comment.

Karan Sharma 2 months ago

Logchef v1.0: The Journey to a Real Log Viewer

About eight months ago I wrote about Logchef – a log viewer I’d been building to scratch my own itch with log exploration at work. Back then it was basically a nicer way to query ClickHouse without writing raw SQL every time. Today I’m shipping v1.0, and it’s evolved into something I didn’t quite expect. Let me walk through the major features that made it to 1.0 and some of the engineering decisions behind them. In that first post, I mentioned alerting as a “roadmap” item. It always felt like the obvious next step – you find a pattern in your logs, you want to know when it happens again. But building it took longer than expected. My first attempt was a “rooms” system – a home-grown notification router with its own email, Slack, and webhook channels. I got it working, then stared at the code for notification deduplication, grouping, silencing, and escalation. All problems that Alertmanager has already solved and battle-tested in production for years. So I ripped out rooms and integrated Alertmanager instead. Now Logchef just fires alerts to Alertmanager, and you get all the routing logic – Slack, PagerDuty, email, webhooks, silencing, grouping, inhibition – without me reinventing it poorly. The workflow is simple: write a LogchefQL or SQL query, set a threshold (e.g., “fire if count > 100”), pick a frequency, configure severity and labels. Logchef runs your query on schedule, evaluates the threshold, and if it triggers, fires an alert. Alert history is stored with execution logs so you can debug why something fired (or didn’t). The query language I wrote about originally was pretty basic – just filters that compiled to SQL on the frontend. Over the months it grew into something more capable, but more importantly, I rewrote the entire parser in Go and moved it to the backend. This also opens the door for a CLI tool later – same parser, same query language, different interface. 
Here’s what LogchefQL looks like now: The pipe operator ( ) selects specific columns instead of : Dot notation handles nested JSON fields. If your logs have a Map column with nested data: For keys that contain dots (common in OTEL-style logs), use quoted field syntax: The original frontend parser was TypeScript. It worked, but had problems:

- Inconsistency: The frontend generated SQL, but the backend had no idea what that SQL meant. Validation happened in two places.
- Type-awareness: ClickHouse has , , , and various string types. The frontend didn’t know the schema, so it couldn’t generate optimal SQL for each column type. For a column, you want or access. For , you want . For regular , it’s a simple comparison.
- Debugging hell: When a query failed, was it the parser? The SQL generator? ClickHouse syntax? Everything happened client-side, invisible to server logs.

The new architecture is cleaner. The backend exposes three endpoints: (returns the SQL for “View as SQL”), (real-time validation with debouncing), and (parse, validate, execute, return results). Moving parsing to the backend also made the field sidebar implementation cleaner – the same schema-aware code that generates WHERE clauses can filter field values based on your current query. If you’ve used Kibana, you know the interaction: click a field, see its top values, click a value to add it as a filter. It’s the fastest way to explore logs when you don’t know exactly what you’re looking for. Building this for ClickHouse required solving a few problems: you can’t just run on a table with billions of rows. String fields like would take forever and return millions of values. The solution is a hybrid loading strategy based on column types:

- LowCardinality and Enum fields: Auto-load values when the sidebar opens. These are designed for fields with limited distinct values.
- String fields: Require an explicit click. A badge shows the count is unknown until you ask.
- Complex types (Map, Array, Tuple, JSON): Excluded. You can’t have meaningful “distinct values” for a JSON blob.

Each field loads in parallel (max 4 concurrent) with a 15-second timeout. One slow or failed field doesn’t block others – you get a retry button for that specific field. The sidebar respects your current query. If you’ve filtered to , the field values update to show only values from error logs. This happens through the backend – the field values endpoint accepts the current LogchefQL query and applies it as a WHERE clause filter. Same parser, same SQL generator, consistent results. Hit Esc and it cancels the query in ClickHouse. Without this, pressing “Cancel” would just hide the spinner – the query kept running on the server, burning resources. The implementation uses ClickHouse’s query ID feature: when you hit Esc, the frontend calls a cancellation endpoint that runs: The original query returns an error, the UI clears, ClickHouse frees resources. Simple, but requires plumbing the query ID through every execution path. “Write a query that finds slowest endpoints by p99” actually works. The AI generates LogchefQL or SQL based on natural language and your table schema. Under the hood it uses go-openai, so any OpenAI-compatible endpoint works – OpenAI, Ollama, vLLM, whatever you prefer. The system prompt includes your table schema so the model knows what fields exist. There’s also an MCP server that exposes Logchef to AI assistants like Claude Desktop, Cursor, or any MCP-compatible client. Instead of context-switching between your AI chat and the log viewer, you can ask directly:

- “What log sources do I have access to?”
- “Find all 500 errors in the last hour from the web service”
- “Show me a histogram of log volume over the past day”
- “What are the most common error messages in the database logs?”

The MCP server handles discovery (teams, sources, schemas), querying (full ClickHouse SQL), analysis (histograms, saved queries), and even admin operations. It’s a separate binary that runs alongside Logchef – configure it once, and your AI assistant can query your logs through natural conversation. Not everyone wants a table. The compact view is a terminal-style display that shows logs as formatted text with syntax highlighting. Denser and faster to scan for certain debugging workflows. Use in your query, and an input field appears automatically. Great for saved queries that teams want to reuse with different parameters. This was a community contribution from @songxuanqing. The implementation detects patterns in the query text and renders input fields dynamically. Logchef supports multi-tenancy with role-based access. Teams can have multiple data sources, and users can be members of multiple teams with different roles:

- Admin: Full access, can manage team members and sources
- Editor: Can create/edit saved queries and collections
- Viewer: Read-only access to query and explore logs

This integrates with OIDC for SSO, so you can use your existing identity provider. Configure stuff without touching config files. The admin settings panel lets you change AI configuration, Alertmanager connection, authentication settings, and query timeouts. This was a migration from config files to database-backed settings. On first boot, Logchef seeds the database from . After that, the UI takes over and changes are stored in SQLite. Backward compatible – existing config files still work, the UI just overrides them at runtime. No more SSH-ing into production to bump a timeout. A endpoint exposes query execution times, error rates, active queries, and other operational data. There’s a pre-built Grafana dashboard for monitoring Logchef itself. Some things didn’t make the cut:

- Live tail: Streaming logs in real-time. Still on the roadmap.
- Dashboarding: Multiple visualizations on one page. Logchef is query-focused; for dashboards, you probably want Grafana with ClickHouse as a datasource.

Calling something “1.0” is weird. There’s no clear line where software becomes “ready.” But I’ve been using Logchef daily at work for months now, and it’s at the point where I trust it. The rough edges are mostly smoothed out. The architecture feels right. Building tools you use yourself is different. You’re the first to hit the rough edges, so you fix them. Slower than building for imaginary users, but the result is something you actually want to use. Thanks again to Kailash for the early direction (schema-agnostic was his idea), and to everyone at Zerodha who’s been using this and giving feedback. Thanks to @songxuanqing for query variables and other contributors for docs and bug fixes. Demo | Docs | GitHub | v1.0.0 Release
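The alerting loop described earlier (run a query on a schedule, compare the count against a threshold, hand the alert to Alertmanager) can be sketched roughly like this. The rule shape and function names are hypothetical, not Logchef’s actual code; only the POST /api/v2/alerts payload shape follows Alertmanager’s real API:

```python
import json
from datetime import datetime, timezone

# Hypothetical alert rule mirroring the workflow described in the post.
rule = {
    "name": "too_many_errors",
    "query": 'severity="error"',  # LogchefQL; compiled to SQL server-side
    "op": ">",
    "threshold": 100,             # "fire if count > 100"
    "labels": {"severity": "critical", "team": "platform"},
}

def should_fire(count, op, threshold):
    """Evaluate the rule's threshold against the query's result count."""
    ops = {">": count > threshold, ">=": count >= threshold,
           "<": count < threshold, "<=": count <= threshold}
    return ops[op]

def alertmanager_payload(rule, count):
    """JSON body for POST /api/v2/alerts (Alertmanager expects a list)."""
    return json.dumps([{
        "labels": {"alertname": rule["name"], **rule["labels"]},
        "annotations": {"description": f"query matched {count} log lines"},
        "startsAt": datetime.now(timezone.utc).isoformat(),
    }])

count = 250  # pretend this came back from the scheduled ClickHouse query
if should_fire(count, rule["op"], rule["threshold"]):
    payload = alertmanager_payload(rule, count)  # would be POSTed to Alertmanager
```

Routing, grouping, silencing, and escalation all stay on Alertmanager’s side, which is the whole point of the integration.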

A Room of My Own 2 months ago

I Journaled My TV and Movie Watching for a Year

At the beginning of this year, I started tracking how much TV and how many movies I actually watch. Not because I wanted to optimise it, cut it down, or feel bad about it - I mean, I watch what I watch. I always have. It’s often my outlet, my decompression time, and we’re also a family that watches a show together with dinner in the evening (even though I’ve spent years trying to make “sit down at the table” our family thing). I try not to track everything (I can be/have been/am a compulsive tracker of many things). But a few things feel worth paying attention to. I already journal in Day One and keep a reading journal there for books (in addition to Goodreads), a habit I picked up after reading this blog post by Robert Breen. Related: Keep a Book Journal with Day One and Apple Shortcuts 10 Reasons to Use Goodreads Tracking film and television felt like a natural extension of that practice, just another way of noticing how I spend my time. And somehow, I managed to stick with it for a full year. Whenever I watched something, I logged it. For TV shows, I noted the season, number of episodes, and average episode length. For movies, I recorded the basics. At the end of the year, I dropped everything into ChatGPT to get averages and totals. The result came to about sixteen days. At first, that number felt confronting. Sixteen full days of a year spent watching TV. But here is the actual excerpt from that exercise:
Average runtime: 1.8 hours
≈ 430–450 episodes total
Average episode length (weighted): ~42 minutes
≈ 305–315 hours
That’s roughly:
15–16 full days
7–7.5 hours per week
About 1 hour per day, averaged across the year
And a kind ChatGPT comment I didn’t ask for: This isn’t actually excessive — especially considering how many long-form, narrative-heavy shows you watched (The Expanse, Parenthood, Silo). That kind of viewing is closer to reading novels than mindless scrolling. It’s also very seasonal: big immersion months, then quieter gaps.
Not constant, not compulsive — more intentional than it might look on paper. ONE hour a day! That’s way below average. I don’t spend much/any time on social media. I don’t scroll endlessly or fall into algorithmic rabbit holes (I am so so mindful about that). I use Reddit occasionally when I’m researching something specific, but otherwise I’m careful about where my attention goes. Most of what I watched was long-form, narrative content: films, series, documentaries; chosen more or less deliberately, not consumed by default. That distinction matters. Tracking didn’t make me watch less; it made me watch more consciously. My system isn’t particularly elegant. I don’t use templates or ratings. I usually note what I watched, who I watched it with, a few words about whether I liked it or not, and basic details pulled from Wikipedia: the year, cast, director. If something sparks my interest, like an interview, a review, or a long-form article, I add that too. After seeing Nuremberg at the cinema, for example, I saved a Smithsonian piece that added depth to the experience. Writing things down shifted the experience from mostly consumption to something closer to engagement. Instead of shows blurring together and disappearing, they became moments with shape and memory. This type of journaling practice is a way of being present with my experiences rather than letting them slip by unnoticed. Everything lives in Day One, dated and accompanied by a film poster (it just looks nicer like that in Day One if I want to view it in “Media” mode). What surprised me most wasn’t the number of hours, but how reassuring the practice felt. In a digital world designed to pull our attention in every direction, simply knowing how you spend your time is grounding. Mindful consumption doesn’t require perfection or abstinence, just awareness. I’ll probably keep tracking in the coming year, maybe with a few tweaks, maybe without. In the end, this isn’t about watching less. It’s about watching well.
Everything I watched in 2025 (minus whatever I watch in the next few days of 2025)
Movies: Tara Road, Gladiator, Red Sparrow, Burlesque, The Whole Truth, Promising Young Woman, I, Robot
TV & Series: La Palma (limited series)
Movies: The Last Witch Hunter, The Day After Tomorrow, The Mountain Between Us, I Feel Pretty, The Man from Earth: Holocene, Kinda Pregnant, Bridget Jones: Mad About the Boy, The Endless, Supernova
TV & Series: New Amsterdam (Season 5), The Resident (Season 6), Obsession (miniseries), Apple Cider Vinegar (miniseries), The Search for Instagram’s Worst Con Artist (docuseries), Missing You (miniseries)
Movies & Documentaries: American Murder: Gabby Petito, Gifted Hands: The Ben Carson Story, Black Bag
TV & Series: Fire Country (Season 1)
Movies: Time Cut, The Life List, The Amateur, Lonely Planet
TV & Series: Zero Day (miniseries), Adolescence (miniseries), Matlock (Season 1), Fire Country (Seasons 2 & 3), The Swarm, The Expanse (Seasons 1–6), The Big Door Prize (Season 2)
Movies & Documentaries: Seen
TV & Series: Silo (Seasons 1 & 2), Cobra Kai (Season 6), Disclaimer (limited series), Locke & Key (Seasons 1–3), The Witcher: Blood Origin (limited series), The Four Seasons (Season 1)
Movies & Documentaries: Ocean with David Attenborough, Sweethearts, Juror #2, A Perfect Murder, Trap
TV & Series: Loot (Season 2), Running Point (Season 1), Bob Hearts Abishola
Movies: Godrich, Garfield, St. Vincent, A Man Called Otto, Forgetting Sarah Marshall
TV & Series: Sirens (limited series), No Good Deed (limited series), Too Much (Season 1), Untamed (Season 1), The Signal (limited series), The Diplomat (Season 2), Pulse (Season 1), Little Disasters (limited series)
Movies: The Old Guard, The Twits, Dinner for Schmucks
TV & Series: Ghosts (Seasons 1–4), Dark Winds (Seasons 1–2), Elsbeth (Season 2), Mayfair Witches (Seasons 1–2)
Movies: The Woman in Cabin 10, Good Boys, A Merry Little Christmas, The House of Dynamite
TV & Series: The Diplomat (Season 3), Nobody Wants This (Season 2), Parenthood (Seasons 1–6), All Her Fault (miniseries)
Movies: Nuremberg
TV & Series: The Beast in Me (miniseries), Boots (Season 1)

DYNOMIGHT 2 months ago

Good if make prior after data instead of before

They say you’re supposed to choose your prior in advance. That’s why it’s called a “prior”. First, you’re supposed to say how plausible different things are, and then you update your beliefs based on what you see in the world. For example, currently you are—I assume—trying to decide if you should stop reading this post and do something else with your life. If you’ve read this blog before, then lurking somewhere in your mind is some prior for how often my posts are good. For the sake of argument, let’s say you think 25% of my posts are funny and insightful and 75% are boring and worthless. OK. But now here you are reading these words. If they seem bad/good, then that raises the odds that this particular post is worthless/non-worthless. For the sake of argument again, say you find these words mildly promising, meaning that a good post is 1.5× more likely than a worthless post to contain words with this level of quality. If you combine those two assumptions, that implies that the probability that this particular post is good is 33.3%. That’s true because the red rectangle below has half the area of the blue one, and thus the probability that this post is good should be half the probability that it’s bad (33.3% vs. 66.6%). It’s easiest to calculate the ratio of the odds that the post is good versus bad, namely P(good | words) / P(bad | words) = (P(words | good) / P(words | bad)) × (P(good) / P(bad)) = 1.5 × (0.25 / 0.75) = 0.5. It follows that P(good | words) = 0.5 × P(bad | words), and thus that P(good | words) = 0.5 / (1 + 0.5) = 33.3%. Alternatively, if you insist on using Bayes’ equation: P(good | words) = P(words | good) P(good) / P(words) = (1.5 × 0.25) / (1.5 × 0.25 + 1 × 0.75) = 33.3%. Theoretically, when you chose your prior that 25% of dynomight posts are good, that was supposed to reflect all the information you encountered in life before reading this post. Changing that number based on information contained in this post wouldn’t make any sense, because that information is supposed to be reflected in the second step when you choose your likelihood. Changing your prior based on this post would amount to “double-counting”. In theory, that’s right.
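Those odds manipulations are easy to sanity-check in code; a minimal sketch using the assumed 25% prior and 1.5× likelihood ratio:

```python
def p_good_given_words(prior_good, likelihood_ratio):
    """P(good post | words), from prior odds times the likelihood ratio."""
    posterior_odds = likelihood_ratio * prior_good / (1 - prior_good)
    return posterior_odds / (1 + posterior_odds)

# 25% of posts are good; these words are 1.5x likelier in a good post
print(round(p_good_given_words(0.25, 1.5), 3))  # 0.333
```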
It’s also right in practice for the above example, and for the similar cute little examples you find in textbooks. But for real problems, I’ve come to believe that refusing to change your prior after you see the data often leads to tragedy. The reason is that in real problems, things are rarely just “good” or “bad”, “true” or “false”. Instead, truth comes in an infinite number of varieties. And you often can’t predict which of these varieties matter until after you’ve seen the data. Let me show you what I mean. Say you’re wondering if there are aliens on Earth. As far as we know, there’s no reason aliens shouldn’t have emerged out of the random swirling of molecules on some other planet, developed a technological civilization, built spaceships, and shown up here. So it seems reasonable to choose a prior under which it’s equally plausible that there are aliens or that there are not, i.e. that P(aliens) = P(no aliens) = 0.5. Meanwhile, here on our actual world, we have lots of weird alien-esque evidence, like the Gimbal video, the Go Fast video, the FLIR1 video, the Wow! signal, government reports on unidentified aerial phenomena, and lots of pilots that report seeing “tic-tacs” fly around in physically impossible ways. Call all that stuff D. If aliens weren’t here, then it seems hard to explain all that stuff. So it seems like P(D | no aliens) should be some low number. On the other hand, if aliens were here, then why don’t we ever get a good image? Why are there endless confusing reports and rumors and grainy videos, but never a single clear close-up high-resolution video, and never any alien debris found by some random person on the ground? That also seems hard to explain if aliens were here. So I think P(D | aliens) should also be some low number. For the sake of simplicity, let’s call it a wash and assume that P(D | aliens) = P(D | no aliens). Since neither the prior nor the data see any difference between aliens and no-aliens, the posterior probability is P(aliens | D) = 0.5. See the problem? Observe that P(aliens | D) = P(D | aliens) P(aliens) / (P(D | aliens) P(aliens) + P(D | no aliens) P(no aliens)) = 0.5, where the last equality follows from the fact that P(D | aliens) = P(D | no aliens) and P(aliens) = P(no aliens).
Thus we have a 50% posterior probability of aliens. We’re friends. We respect each other. So let’s not argue about if my starting assumptions are good. They’re my assumptions. I like them. And yet the final conclusion seems insane to me. What went wrong? Assuming I didn’t screw up the math (I didn’t), the obvious explanation is that I’m experiencing cognitive dissonance as a result of a poor decision on my part to adopt a set of mutually contradictory beliefs. Say you claim that Alice is taller than Bob and Bob is taller than Carlos, but you deny that Alice is taller than Carlos. If so, that would mean that you’re confused, not that you’ve discovered some interesting paradox. Perhaps if I believe that P(aliens) = 0.5 and that P(D | aliens) = P(D | no aliens), then I must accept that P(aliens | D) = 0.5. Maybe rejecting that conclusion just means I have some personal issues I need to work on. I deny that explanation. I deny it! Or, at least, I deny that it’s the most helpful way to think about this situation. To see why, let’s build a second model. Here’s a trivial observation that turns out to be important: “There are aliens” isn’t a single thing. There could be furry aliens, slimy aliens, aliens that like synthwave music, etc. When I stated my prior, I could have given different probabilities to each of those cases. But if I had, it wouldn’t have changed anything, because there’s no reason to think that furry vs. slimy aliens would have any difference in their eagerness to travel to ape-planets and fly around in physically impossible tic-tacs. But suppose I had divided up the state of the world into these four possibilities: no aliens and nobody hallucinates tic-tacs, no aliens but people hallucinate tic-tacs, aliens that hide, and aliens that show themselves. If I had broken things down that way, I might have chosen a prior that splits the 50% for aliens and the 50% for no-aliens unevenly across those sub-cases. Now, let’s think about the empirical evidence again. It’s incompatible with “no aliens and nobody hallucinates”, since if there were no aliens, then normal people wouldn’t hallucinate flying tic-tacs. The evidence is also incompatible with “aliens that show themselves”, since if those kinds of aliens were around they would make their existence obvious. However, the evidence fits pretty well with “no aliens but people hallucinate” and also with “aliens that hide”.
So, a reasonable model would give essentially all of the data likelihood to those two surviving cases. If we combine those assumptions, now we only get a 10% posterior probability of aliens. Now the results seem non-insane. To see why, first note that the two incompatible cases drop out of the posterior, since both “no aliens and nobody hallucinates” and “aliens that show themselves” have near-zero probability of producing the observed data. That leaves P(aliens | D) = P(aliens that hide) / (P(aliens that hide) + P(no aliens but people hallucinate)), where the likelihoods cancel because the data is assumed to be equally likely under “aliens that hide” and “no aliens but people hallucinate”. It follows that the posterior probability of aliens is just the prior weight of “aliens that hide” relative to “no aliens but people hallucinate”. I hope you are now confused. If not, let me lay out what’s strange: The priors for the two above models both say that there’s a 50% chance of aliens. The first prior wasn’t wrong, it was just less detailed than the second one. That’s weird, because the second prior seemed to lead to completely different predictions. If a prior is non-wrong and the math is non-wrong, shouldn’t your answers be non-wrong? What the hell? The simple explanation is that I’ve been lying to you a little bit. Take any situation where you’re trying to determine the truth of anything. Then there’s some space of things that could be true. In some cases, this space is finite. If you’ve got a single tritium atom and you wait a year, either the atom decays or it doesn’t. But in most cases, there’s a large or infinite space of possibilities. Instead of you just being “sick” or “not sick”, you could be “high temperature but in good spirits” or “seems fine except won’t stop eating onions”. (Usually the space of things that could be true isn’t easy to map to a small 1-D interval. I’m drawing it like that for the sake of visualization, but really you should think of it as some high-dimensional space, or even an infinite dimensional space.) In the case of aliens, the space of things that could be true might include, “There are lots of slimy aliens and a small number of furry aliens and the slimy aliens are really shy and the furry aliens are afraid of squirrels.” So, in principle, what you should do is divide up the space of things that might be true into tons of extremely detailed things and give a probability to each.
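To make the second model concrete, here’s a sketch in Python. The post doesn’t state its exact prior, so the numbers below are illustrative guesses, chosen only so that the posterior comes out at the 10% mentioned above:

```python
# Illustrative four-hypothesis model (the priors here are invented, not the
# post's actual numbers; each 50% side is split unevenly across sub-cases).
prior = {
    "no aliens, nobody hallucinates tic-tacs": 0.05,
    "no aliens, people hallucinate tic-tacs":  0.45,
    "aliens that hide":                        0.05,
    "aliens that show themselves":             0.45,
}
# Relative likelihood of the actual evidence (grainy videos, no clear photos)
likelihood = {
    "no aliens, nobody hallucinates tic-tacs": 0.0,  # then where do reports come from?
    "no aliens, people hallucinate tic-tacs":  1.0,
    "aliens that hide":                        1.0,
    "aliens that show themselves":             0.0,  # we'd have crisp footage
}
joint = {h: prior[h] * likelihood[h] for h in prior}
total = sum(joint.values())
posterior = {h: joint[h] / total for h in joint}
p_aliens = sum(p for h, p in posterior.items() if h.startswith("aliens"))
print(round(p_aliens, 2))  # 0.1
```

Both models put 50% prior mass on aliens; only the finer split of that mass changes the answer.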
Often, the space of things that could be true is infinite. So theoretically, if you really want to do things by the book, what you should really do is specify how plausible each of those (infinite) possibilities is. After you’ve done that, you can look at the data. For each thing that could be true, you need to think about the probability of the data. Since there’s an infinite number of things that could be true, that’s an infinite number of probabilities you need to specify. You could picture it as some curve like this: (That’s a generic curve, not one for aliens.) To me, this is the most underrated problem with applying Bayesian reasoning to complex real-world situations: In practice, there are an infinite number of things that can be true. It’s a lot of work to specify prior probabilities for an infinite number of things. And it’s also a lot of work to specify the likelihood of your data given an infinite number of things. So what do we do in practice? We simplify, usually by grouping the space of things that could be true into some small number of discrete categories. For the above curve, you might break things down into four equally-plausible possibilities. Then you might estimate the data probabilities for each of those possibilities. Then you could put those together to get a posterior. That’s not bad. But it is just an approximation. Your “real” posterior probabilities correspond to the areas under the curve. That approximation was pretty good. But the reason it was good is that we started out with a good discretization of the space of things that might be true: one where the likelihood of the data didn’t vary too much for the different possibilities inside each category. Imagine the likelihood of the data—if you were able to think about all the infinite possibilities one by one—looked like this: This is dangerous. The problem is that you can’t actually think about all those infinite possibilities.
When you think about four discrete possibilities, you might estimate some likelihood that looks like this: If you did that, that would lead to you underestimating the probability of three of the categories, and overestimating the probability of the fourth. This is where my first model of aliens went wrong. My prior was not wrong. (Not to me.) The mistake was in assigning the same value to P(D | aliens) and P(D | no aliens). Sure, I think the probability of all our alien-esque data is equally likely given aliens and given no-aliens. But that’s only true for certain kinds of aliens, and certain kinds of no-aliens. And my prior for those kinds of aliens is much lower than for those kinds of non-aliens. Technically, the fix to the first model is simple: Make P(D | aliens) lower. But the reason it’s lower is that I have additional prior information that I forgot to include in my original prior. If I just assert that P(D | aliens) is much lower than P(D | no aliens), then the whole formal Bayesian thing isn’t actually doing very much—I might as well just state that I think P(aliens | D) is low. If I want to formally justify why P(D | aliens) should be lower, that requires a messy recursive procedure where I sort of add that missing prior information and then integrate it out when computing the data likelihood. Mathematically, P(D | aliens) = Σ over kinds of aliens of P(D | that kind) × P(that kind | aliens). But now I have to give a detailed prior anyway. So what was the point of starting with a simple one? I don’t think that technical fix is very good. While it’s technically correct (har-har) it’s very unintuitive. The better solution is what I did in the second model: To create a finer categorization of the space of things that might be true, such that the probability of the data is constant-ish for each term. The thing is: Such a categorization depends on the data. Without seeing the actual data in our world, I would never have predicted that we would have so many pilots that report seeing tic-tacs. So I would never have predicted that I should have categories that are based on how much people might hallucinate evidence or how much aliens like to mess with us.
So the only practical way to get good results is to first look at the data to figure out what categories are important, and then to ask yourself how likely you would have said those categories were, if you hadn’t yet seen any of the evidence.
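To make the failure mode concrete, here is a small numeric sketch (all numbers are my own toy choices, not the post's): a finely gridded hypothesis space stands in for the "infinite" one, and a four-bin discretization that judges the likelihood only at each bin's midpoint goes wrong exactly when the likelihood varies inside a bin.

```python
import numpy as np

# Toy setup (numbers invented for illustration): hypotheses on [0, 1],
# gridded finely enough to stand in for "infinite".
theta = np.linspace(0.0, 1.0, 10_001)
prior = np.full_like(theta, 1.0 / theta.size)      # uniform prior
lik = np.exp(-80.0 * (theta - 0.35) ** 2)          # likelihood that varies sharply

# "Real" posterior on the fine grid.
post = prior * lik
post /= post.sum()

# Coarse version: four equally plausible bins, likelihood judged only
# at each bin's midpoint -- the shortcut the post describes.
edges = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
mids = (edges[:-1] + edges[1:]) / 2.0
coarse = 0.25 * np.exp(-80.0 * (mids - 0.35) ** 2)  # prior of 1/4 per bin
coarse /= coarse.sum()

# Exact posterior mass of each bin, for comparison.
which = np.digitize(theta, edges[1:-1])
exact = np.array([post[which == k].sum() for k in range(4)])
```

Because the likelihood swings hard inside the second bin, the midpoint shortcut inflates that bin's posterior (about 0.98 versus roughly 0.87 for the true mass) and starves its neighbors, the same shape of error as assigning one likelihood to all "aliens" at once.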

Raph Koster 3 months ago

Looking back at a pandemic simulator

It’s been six years now since the early days of the Covid pandemic. People who were paying super close attention started hearing rumors about something going on in China towards the end of 2019 — my earliest posts about it on Facebook were from November that year. Even at the time, people were utterly clueless about the mathematics of how a highly infectious virus spread. I remember spending hours writing posts on various different social media sites explaining that the Infection Fatality Rates and the R value were showing that we could be looking at millions dead. People didn’t tend to believe me: “SEVERAL MILLION DEAD! Okay, I’m done. No one is predicting that. But you made me laugh. Thanks.” You can do the math yourself. Use a low average death estimate of 0.4%. Assume 60% of the population catches it and then we reach herd immunity (which is generous): But that’s with low assumptions… It was like typing to a wall. In fact, it’s pretty likely that it still is, since these days, the discourse is all about how bad the economic and educational impact of lockdowns was — and not about the fact that if the world had acted in concert and forcefully, we could have had a much better outcome than we did. The health response was too soft , the lockdown too lenient, and as a result, we took all the hits. Of course, these days people also forget just how deadly it was and how many died, and so on. We now know that the overall IFR was probably higher than 0.4%, but very strongly tilted towards older people and those with comorbidities. We also now know that herd immunity was a pipe dream — instead we managed to get vaccines out in record time and the ordinary course of viral evolution ended up reducing the death rate until now we behave as if Covid is just a deadlier flu (it isn’t, that thinking ignores long-term impact of the disease). 
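The back-of-the-envelope math he describes is short enough to reproduce directly, using the figures quoted later in the post (about 328 million people, a 60% attack rate, a 0.4% infection fatality rate):

```python
# Figures as quoted in the post (2020-era, US): ~328M population,
# 60% infected before herd immunity, 0.4% infection fatality rate.
population = 328_000_000
attack_rate = 0.60
ifr = 0.004

infected = population * attack_rate   # ~196.8 million infected
deaths = infected * ifr               # ~787,000 dead in the US alone
print(f"{infected:,.0f} infected -> {deaths:,.0f} dead")
```

Run the same rates over the world's population instead of the US's and you land in the millions, which is the "SEVERAL MILLION DEAD" figure people scoffed at.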
The upshot: my math was not that far off — the estimated toll in the US ended up being 1.2 to 1.4 million souls, and worldwide it’s estimated as between 15 and 28.5 million dead. Plenty of denial of this, these days, and plenty of folks blaming the vaccines for what are most likely issues caused by the disease in the first place. Anyway, in the midst of it all, tired of running math in my spreadsheets (yeah, I was tracking it all in spreadsheets, what can I say?), I started thinking about why only a few sorts of people were wrapping their heads around the implications. The thing they all had in common was that they lived with exponential curves. Epidemiologists, Wall Street quants, statisticians… and game designers. Could we get more people to feel the challenges in their bones? So… I posted this to Facebook on March 24th, 2020: Three weeks ago I was idly thinking of how someone ought to make a little game that shows how the coronavirus spreads, how testing changes things, and how social distancing works. The sheer number of people who don’t get it — numerate people, who ought to be able to do math — is kind of shocking. I couldn’t help worrying at it, and have just about a whole design in my head. But I have to admit, I kinda figured someone would have made it by now. But they haven’t. It’s not even a hard game to make. Little circles on a plain field. Each circle simply bounces around. They are generated each with an age, a statistically real chance of having a co-morbid condition (diabetes, hypertension, immunosuppressed, pulmonary issues…), and crucially, a name out of a baby book. They can be in one of these states: In addition, there’s a diagnosed flag. We render asymptomatic the same as healthy. We render each of the other states differently, depending on whether the diagnosed flag is set. They show as healthy until dead, if not diagnosed. If diagnosed, you can see what stage they are in (icon or color change). The circles move and bounce. 
If an asymptomatic one touches a healthy one, they have a statistically valid chance of infecting. Circles progress through these states using simple stats. We track current counts on all of these, and show a bar graph. Yes, that means players can see that people are getting sick, but don’t know where. The player has the following buttons. The game ticks through days at an accelerated pace. It runs for 18 months worth of days. At the end of it, you have a vaccine, and the epidemic is over. Then we tell you what percentage of your little world died. Maybe with a splash screen listing every name and age of everyone who died. And we show how much money you spent. Remember, you can go negative, and it’s OK. That’s it. Ideally, it runs in a webpage. Itch.io maybe. Or maybe I have a friend with unlimited web hosting. Luxury features would be a little ini file or options screen that lets you input real world data for your town or country: percent hypertensive, age demographics, that sort of thing. Or maybe you could crowdsource it, so it’s a pulldown… Each weekend I think about building this. So far, I haven’t, and instead I try to focus on family and mental health and work. But maybe someone else has the energy. I suspect it might persuade and save lives. Some things about this that I want to point out in hindsight. Per the American Heart Association, among adults age 20 and older in the United States, the following have high blood pressure: Per the American Diabetes Association, Per studies in JAMA, Next, realize that because the disease spreads mostly inside households (where proximity means one case tends to infect others), this means that protecting the above extremely large slices of the population means either isolating them away from their families, or isolating the entire family and other regular contacts. People tend to think the at-risk population is small. It’s not. The response, for Facebook, was pretty surprising. 
The post was re-shared a lot, and designers from across the industry jumped in with tweaks to the rules. Some folks re-posted it to large groups about public initiatives, etc. There was also, of course, plenty of skepticism that something like this would make any difference at all. The first to take up the challenge was John Albano, who had his game Covid Ops up and running on itch.io a mere six days later . You can still play it there! Stuck in the house and looking for things to do. Soooo, when a fellow game dev suggested a game idea and basic ruleset along with “I wish someone would make a game like this,” I took that as a challenge to try. Tonight (this morning?), the first release of COVID OPS has been published. John’s game was pretty faithful to the sketch. You can see the comorbidities over on the left, and the way the player has clicked on 72 year old Rowan — who probably isn’t going to make it. As he updated it, he added in more detailed comorbidity data, and (unfortunately, as it turns out) made it so that people were immune after recovering from infection. And of course, like the next one I’ll talk about, John made a point of including real world resource links so that people could take action. By April 6th, another team led by Khail Santia had participated in Jamdemic 2020 and developed the first version of In the Time of Pandemia. He wrote, The compound I stay at is about to be cordoned. We’ve been contact-traced by the police, swabbed by medical personnel covered in protective gear. One of our housemates works at a government hospital and tested positive for antibodies against SARS-CoV-2. The pandemic closes in from all sides. What can a game-maker do in a time like this? I’ve been asking myself this question since the beginning of community quarantine. I’m based in Cebu City, now the top hotspot for COVID-19 in the Philippines in terms of incidence proportion. 
This game would go on to be completed by a fuller team including a mathematical epidemiologist, and In the Time of Pandemia eventually ended up topping charts on Newgrounds when it launched there in July of 2020. This game went viral and got a ton of press across the Pacific Rim . The team worked closely with universities and doctors in the Philippines and validated all the numbers. They added local flavor to their levels representing cities and neighborhoods that their local players would know. Gregg Victor Gabison, dean of the University of San Jose-Recoletos College of Information, Computer & Communications Technology, whose students play-tested the game, said, “This is the kind of game that mindful individuals would want to check out. It has substance and a storyline that connects with reality, especially during this time of pandemic.” Not only does the game have to work on a technical basis, it has to communicate how real a crisis the pandemic is in a simple, digestible manner. Dr. Mariane Faye Acma, resident physician at Medidas Medical Clinic in Valencia, Bukidnon, was consulted to assess the game’s medical plausibility. She enumerated critical thinking, analysis, and multitasking as skills developed through this game. “You decide who are the high risks, who needs to be tested and isolated, where to focus, [and] how much funds to allocate….The game will make players realize how challenging the work of the health sector is in this crisis.” “Ultimately, the game’s purpose is to give players a visceral understanding of what it takes to flatten the curve,” Santia said. I think most people have no idea that any of this happened or that I was associated with it. I only posted the design sketch on Facebook; it got reshared across a few thousand people. It wasn’t on social media, I didn’t talk about it elsewhere, and for whatever reason, I didn’t blog about it. I have had both these games listed on my CV for a while. 
Oh, I didn’t do any of the heavy lifting… all credit goes to the developers for that. There’s no question that way more than 95% of the work comes after the high-level design spec. But both games do credit me, and I count them as games I worked on. A while back, someone on Reddit said it was pathetic that I listed these. I never quite know what to make of comments like that (troll much?!?). No offense, but I’m proud of what a little design sketch turned into, and proud of the work that these teams did, and proud that one of the games got written up in the press so much; ended up being used in college classrooms; was vetted and validated by multiple experts in the field; and made a difference however slight. Peak Covid was a horrendous time. Horrendous enough that we have kind of blocked it from our memories. But I lost friends and colleagues. I still remember. Back then I wrote, This is the largest event in your lifetime. It is our World War, our Great Depression. We need to rise to the occasion, and think about how we change. There is no retreat to how it used to be. There is only through. A year later, the vaccine gave us that path through, and here we are now. But as I write this, we have the first human case of H5N5 bird flu; it was only a matter of time. Maybe these games helped a few people get through it all. They were played by tens of thousands, after all. Maybe they will help next time. I know that the fact that they were made helped me get through, that making them helped John get through, helped Khail get through — in his own words: In the end, the attempt to articulate a game-maker’s perspective on COVID-19 has enabled me to somehow transcend the chaos outside and the turmoil within. It’s become a welcome respite from isolation, a thread connecting me to a diversity of talents who’ve been truly generous with their expertise and encouragement.
As incidences continue to rise here and in many parts of the world, our hope is that the game will be of some use in showing what it takes to flatten the curve and in advocating for communities most in need. So… at minimum, they made a real difference to at least three people. And that’s not a bad thing for a game to aspire to.

The math: 328 million people in the US. 60% of that means 196 million catch it. 0.4% of that is roughly 780,000 dead.

The states, and how circles progress through them: healthy, asymptomatic but contagious, symptomatic, severe, critical, dead, recovered. 70% of asymptomatic cases turn symptomatic after 1d10+5 days. The others stay sick for the full 21 days. Percent chance of moving from symptomatic to severe is based on comorbid conditions, but the base chance is 1 in 5 after some amount of days. Percent chance of moving from severe to critical is 1 in 4, modified by age and comorbidities, if in hospital. Otherwise, it’s double. Percent chance of moving from critical to dead is something like 1 in 5, modified by age and comorbidities, if in hospital. Otherwise, it’s double. Symptomatic, severe, and critical circles that do not progress to dead move to ‘recovered’ after 21 days since reaching symptomatic. Severe and critical circles stop moving. Hover on a circle, and you see the circle’s name and age and any comorbidities (“Alison, 64, hypertension.”)

The player’s buttons:

Test. This lets them click on a circle. If the circle is asymptomatic or worse, it gets the diagnosed flag. But it costs you one test.

Isolate. This lets them click on a circle, and freezes them in place. Some visual indicator shows they are isolated. Note that isolated cases still progress.

Hospitalize. This moves the circle to hospital. Hospital only has so many beds. Clicking on a circle already in hospital drops the circle back out in the world. Circles in hospital have half the chance of progressing to the next stage.

Buy test. You only have so many tests. You have to click this button to buy more.

Buy bed. You only have so many beds. You have to click this button to buy more.
Money goes up when circles move. But you are allowed to go negative for money.

Lockdown. Lastly, there is a global button that, when pressed, freezes 80% of all circles. But it gradually ticks down and circles individually start to move again, and the button must be pressed again from time to time. While lockdown is running, it costs money as well as not generating it. If pressed again, it lifts the lockdown and all circles can move again.

At the time that I posted, I could tell that people were desperately unwilling to enter lockdown for any extended period of time; but “The Hammer and the Dance” strategy of pulsed lockdown periods was still very much in our future. I wanted a mechanic that showed population non-compliance. There was also quite a lot of obsessing over case counts at the time, and one of the things that I really wanted to get across was that our testing was so incredibly inadequate that we really had little idea of how many cases we were dealing with and therefore what the IFR (infection fatality rate) actually was. That’s why tests are limited in the design sketch. I was also trying to get across that money was not a problem in dealing with this. You could take the money value negative because governments can choose to do that. I often pointed out in those days that if the government chose, it could send a few thousand dollars to every household every few weeks for the duration of lockdown. It would likely have had less impact on the GDP and the debt than what we actually did. I wanted names. I wanted players to understand the human cost, not just the statistics. Today, I might even suggest that an LLM generate a little biography for every fatality. Another thing that was constantly missed was the impact of comorbidities. To this day, I hear people say “ah, it only affected the old and the ill, so why not have stayed open?” To which I would reply with: For non-Hispanic whites, 33.4 percent of men and 30.7 percent of women.
For non-Hispanic Blacks, 42.6 percent of men and 47.0 percent of women. For Mexican Americans, 30.1 percent of men and 28.8 percent of women. 34.2 million Americans, or 10.5% of the population, have diabetes. Nearly 1.6 million Americans have type 1 diabetes, including about 187,000 children and adolescents. 4.2% of the population of the USA has been diagnosed as immunocompromised by their doctor.
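The design sketch's progression odds are easy to sanity-check with a minimal Monte Carlo run. This is a simplification of mine, not part of the original sketch: age and comorbidity modifiers are dropped, and a circle is assumed either hospitalized for its whole illness or not at all.

```python
import random

def simulate_case(in_hospital: bool, rng: random.Random) -> str:
    """Run one infected circle to a terminal state, using the sketch's odds:
    70% of asymptomatic cases turn symptomatic; base 1-in-5 go severe;
    1-in-4 severe go critical and 1-in-5 critical die if in hospital,
    both doubled outside it."""
    if rng.random() >= 0.70:
        return "recovered"                # stayed asymptomatic
    mult = 1.0 if in_hospital else 2.0    # hospital halves progression odds
    if rng.random() >= 0.20:              # symptomatic, never worsens
        return "recovered"
    if rng.random() >= 0.25 * mult:       # severe, but recovers
        return "recovered"
    if rng.random() >= 0.20 * mult:       # critical, but recovers
        return "recovered"
    return "dead"

rng = random.Random(0)
n = 100_000
dead_home = sum(simulate_case(False, rng) == "dead" for _ in range(n)) / n
dead_hosp = sum(simulate_case(True, rng) == "dead" for _ in range(n)) / n
print(f"fatality if hospitalized: {dead_hosp:.3f}, if not: {dead_home:.3f}")
```

Even this crude version makes the design's point about beds: in the toy model, a hospitalized case is roughly four times less likely to die than one left outside (about 0.7% versus about 2.8%).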

Gabriel Weinberg 3 months ago

China has a major working-age population advantage through at least 2075

In response to “ A U.S.-China tech tie is a big win for China because of its population advantage ,” I received feedback along the lines of: shouldn’t we be looking at China’s working-age population and not its overall population? I was trying to keep it simple in that post, but yes, we should, and when we do, we find, unfortunately, that China’s population advantage still persists. Here’s the data (source: Our World in Data): China’s working-age population today is 983 million to the U.S.’s 223 million, or 4.4x. The projections put China’s 2050 working-age population at 745 million to the U.S.’s 232 million, or 3.2x. The projections put China’s 2075 working-age population at 468 million to the U.S.’s 235 million, or 2.0x. Noah Smith recently delved into this rather deeply in his post “ China’s demographics will be fine through mid-century ,” noting: China’s economic might is not going to go “poof” and disappear from population aging; in fact, as I’ll explain, it probably won’t suffer significant problems from aging until the second half of this century. And even in the second half, you can’t count on their demographic decline then either, both because even by 2075 their working-age population is still projected to be double the U.S.’s under current conditions, but also because those conditions are unlikely to hold. As Noah also notes: Meanwhile, there’s an even greater danger that China’s leaders will panic over the country’s demographics and do something very rash…All in all, the narrative that demographics will tip the balance of economic and geopolitical power away from China in the next few decades seems overblown and unrealistic. Check out my earlier article for details, but here’s a summary. [A] U.S.-China tech tie is a big win for China because of its population advantage .
China doesn’t need to surpass us technologically; it just needs to implement what already exists across its massive workforce. Matching us is enough for its economy to dwarf ours. If per-person output were equal today, China’s economy would be over 4× America’s because China’s population is over 4× the U.S.’s. That exact 4× outcome is unlikely given China’s declining population and the time it takes to diffuse technology, but 2 to 3× is not out of the question. China doesn’t even need to match our per-person output: their population will be over 3× ours for decades, so reaching ⅔ would still give them an economy twice our size since 3 × ⅔ = 2. …With an economy a multiple of the U.S.’s, it’s much easier to outspend us on defense and R&D, since budgets are typically set as a share of GDP. …What if China then starts vastly outspending us on science and technology and becomes many years ahead of us in future critical technologies, such as artificial superintelligence, energy, quantum computing, humanoid robots, and space technology? That’s what the U.S. was to China just a few decades ago, and China runs five-year plans that prioritize science and technology. …Our current per-person output advantage is not sustainable unless we regain technological dominance. …[W]e should materially increase effective research funding and focus on our own technology diffusion plans to upgrade our jobs and raise our living standards. Thanks for reading! Subscribe for free to receive new posts or get the audio version.
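The ratios driving the argument are simple enough to verify in a few lines (working-age figures in millions, as quoted above from Our World in Data; the ⅔ scenario is the post's):

```python
# Working-age populations in millions, as quoted from Our World in Data.
us = {"now": 223, "2050": 232, "2075": 235}
cn = {"now": 983, "2050": 745, "2075": 468}

ratios = {year: cn[year] / us[year] for year in us}
# roughly 4.4x now, 3.2x in 2050, 2.0x in 2075

# The post's scenario: 3x the workforce at 2/3 the per-person output
# still yields an economy about twice the size.
relative_gdp = 3 * (2 / 3)
print({year: round(r, 1) for year, r in ratios.items()}, relative_gdp)
```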

pabloecortez 3 months ago

You can read the web seasonally

What if you read things around the web the way you watch movies or listen to music? A couple of days ago I made a post on Mastodon introducing lettrss.com, a project I made that takes a book in the public domain and sends one chapter a day to your RSS reader. Xinit replied with a great point about RSS feed management: This is fascinating, but I know how it would go based on the thousands of unread RSS feeds I've had, and the thousands of unheard podcasts I subscribed to. I'd end up with an RSS of unread chapters, representing a whole book in short order. Regardless of my inability to deal, it remains a great idea, and I will absolutely recommend while hiding my shame of a non-zero inbox. When I first started using RSS, I thought I'd found this great tool for keeping tabs on news, current events, and stuff I should and do care about. After adding newspapers, blogs, magazines, publications, YouTube channels and release notes from software I use, I felt a false sense of accomplishment, like I'd finally been able to wrangle the craziness of the internet into a single app, like I had rebelled against the algorithm™️. But it didn't take long to accumulate hundreds of posts, most of which I had no true desire to read, and soon after I abandoned my RSS reader. I came back to check on it from time to time, but its dreadful little indicator of unread posts felt like a personal failure, so eventually I deleted it entirely. Will Hopkins wrote a great post on this exact feeling, I don't actually like to read later: I used Instapaper back in the day, quite heavily. I built up a massive backlog of items that I'd read occasionally on my OG iPod Touch. At some point, I fell off the wagon, and Instapaper fell by the wayside. [...] The same thing has happened with todo apps over the years, and feed readers. They become graveyards of good intentions and self-imposed obligations.
Each item is a snapshot in time of my aspirations for myself, but they don't comport to the reality of who I am. I couldn't have said it better myself. This only happens to me with long-form writing: whenever I come across an essay or blog post, I know it will either require my full attention or a bit more time than I'm willing to give in the moment. I've never had that issue with music. Music is more discrete. It's got a timestamp. I listen to music through moods and seasons, so much so that I make a playlist for every month of the year like a musical scrapbook. What if we took this approach to RSS feeds? Here's what I replied to Xinit: This is something I find myself struggling with too. I think I'm okay knowing some RSS feeds are seasonal, same as music genres throughout the year. Some days I want rock, others I want jazz. Similarly with RSS feeds, I've become comfortable archiving and resurfacing feeds. For reference, I follow around 10 feeds at any given time, and the feeds I follow on my phone are different from the ones on my desktop. You shouldn't feel guilty about removing feeds from your RSS readers. It's not a personal failure, it's an allocation of resources like time and attention.

annie's blog 4 months ago

Duck duck duck dichotomy

Have you ever played Duck Duck Goose 1 and the person who’s it keeps walking and walking and walking and walking around and never picks the goose? It’s really boring. There are very few actual dichotomies. Most choices are not binary. Most choices are more like: “Here is an array of options you can recognize (the subset of a potentially infinite array of options you can’t even see because you’re only able to recognize what’s familiar). Pick one!” No wonder making decisions is so exhausting. I can spend a lot of time musing over the array of options, but eventually I narrow it down to one option and then it’s time to make the real choice which is a dichotomy: Yes, do it, action, go, forward. Choosing an option and then saying No to the option I selected for myself is wild! Why would I do that? Because choice is dangerous. Exerting the force of my will upon the world, or at least attempting to do so, is a risk. Risk of pain, risk of failure, risk of being wrong (whatever that means), risk of ending up in a worse situation, risk of being misunderstood, risky risky risky! Sometimes it feels safer to just hang out, not move, wait and see. It isn’t safer, usually, but it feels safer. Passivity is a way to live but it’s not the way I like to live. I like to happen. I like to be the thing that’s happening in my own life. I like to be the main character in my own story. And I only get to happen by choosing; otherwise nothing happens and/or things happen to me but I never happen. I make choices all day long but most of those are inconsequential, like: what time will I get up, what food will I eat, will I be impatient or kind with my child, will I be impatient or kind with myself, will I make that phone call, will I go to the gym, will I worry, will I be grateful, will I floss today, will I finish this blog post, will I actually put away the clean laundry? The answer to that last one is No. It’s going to sit in the basket for a few days.
These choices all seem inconsequential but maybe they aren’t. Tiny choices become a trend, the trend creates a groove, the groove becomes a rut and I walk the rut because it’s easier to stick with what’s familiar than to enact change, so here I am: that’s my life. I can change it by making different tiny choices, one after another. It’s not about the right choice or wrong choice or the accurate choice or idiotic choice or worst choice or best choice. It’s about exerting your will. Choosing something. Selecting an option and then acting on it. Saying Yes. Duck duck duck duck duck goose. It’s about the goose. It doesn’t matter who the goose is. It matters that you pick a goose. Otherwise there’s no game, just a bunch of kids sitting in a circle being bored and sad. Everyone sits in a circle. One person walks around the circle, tapping others and saying duck until choosing a goose. The chosen goose tries to tag them before they sit down in the goose’s spot.

emiruz 4 months ago

Modelling beliefs about sets

Here is an interesting scheme I encountered in the wild, generalised and made abstract for you, my intrepid reader. Let \(X\) be a set of binary variables. We are given information about subsets of \(X\), where each update is a probability ranging over a concrete set, the state of which is described by an arbitrary quantified logic formula. For example, \[P\bigg\{A \subset X \mid \exists_{x_i, x_j \in A} \big(x_i \ne x_j\big)\bigg\} = p\] The above assigns a probability \(p\) to some concrete subset \(A\), with the additional information that at least one pair of its members do not have the same value.
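One way such updates could be represented computationally is sketched below. The uniform starting distribution and the Jeffrey-style reweighting are my own assumptions, not given in the post: keep a distribution over all joint assignments of \(X\), and rescale it so the event defined by the formula carries probability \(p\).

```python
from itertools import product

# Sketch (my assumptions: uniform starting belief, Jeffrey-style update).
# X is a set of binary variables; belief is a distribution over all
# 2^|X| joint assignments.
X = ["x1", "x2", "x3"]
states = [dict(zip(X, bits)) for bits in product([0, 1], repeat=len(X))]
belief = [1.0 / len(states)] * len(states)

def phi(state, A):
    """The example formula: some pair of members of A differ in value."""
    return any(state[a] != state[b] for a in A for b in A if a < b)

def update(belief, A, p):
    """Rescale so the event {phi holds on A} carries total mass p."""
    hit = [phi(s, A) for s in states]
    mass = sum(w for w, h in zip(belief, hit) if h)
    return [w * (p / mass if h else (1 - p) / (1 - mass))
            for w, h in zip(belief, hit)]

belief = update(belief, A=["x1", "x2"], p=0.9)
```

Starting from a uniform belief, the event "x1 and x2 differ" holds in half the assignments, so the update moves its mass from 0.5 to the asserted 0.9 while preserving relative weights inside and outside the event.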
