Posts in Statistics (20 found)
Manuel Moreale 2 weeks ago

Step aside, phone: week 1

First weekly recap for this fun life experiment. To remind you what this is all about: in order to help Kevin get back to a more sane use of his time in front of his phone, we decided to publicly share 4 weeks of screen time statistics from our phones and write roundups every Sunday. Yes, we’re essentially trying to shame ourselves into being more mindful about our phone usage. Let me tell you, it definitely works. Every time I do one of these experiments, I use the first week to prove to myself that this whole phone usage situation is mostly a matter of being mindful about it, and that if I decide that I don’t want to use the phone, well, I will not use it. And it’s not very hard. Monday to Wednesday, I basically almost never picked up my phone from my desk. It was fully charged on Sunday afternoon, and I didn’t plug it in again till Thursday. I did use it when I was outside for a couple of minor things, but as you can see from the image below, screen time is reporting 9 minutes of total usage for the first 3 days of the week. Thursday and Friday, I logged a bit more screen time (had to do a few things that required the use of apps), but also because I started listening to a few podcasts while I was driving. I said I started because one thing I did this week was delete any app that’s related to content consumption from the phone. I think my personal goal for this month-long experiment is going to be to get back to a use of my phone that’s utility-driven and not consumption-focused. The phone should be a tool to do things and not a passive consumption device. Friday usage spiked, and that’s because I was out on a date, so most of the time spent with the screen on was Google Maps being open while I was in the car. I still tried to be mindful of that, though. I drove about 5 hours back and forth, but I only used Google Maps for a bit more than 1 hour. 
I also used the browser for the first time this week to purchase a couple of tickets for a museum, and I took a few pictures. So this is how the first week went. Not included here is last Sunday (I told Kevin we were going to start this experiment on Monday), but I clocked 11 minutes on that day. Not bad. Now, one consideration about this first week: in order to push my phone usage this low, I had to move some of my normal phone usage over to my Mac, which is how I managed to basically never touch chat apps on my phone. I know this is pretty much cheating, but it was intentional and something I was planning to do only in this first week, and I will move that screen time back to my phone starting next week. The goal is to find the right balance after all, and I like the process of pushing it all the way down to the extreme and then bringing it back up to some more sane levels. If you have decided to take part in this experiment, email me a link to your post, and I’ll include it below.
Read Kevin's week one recap
Read Thomas' week one recap
Read Steve's week one recap
Read John's week one and two recaps

James Stanley 2 weeks ago

Evidence of absence

"Absence of evidence is not evidence of absence", they say. They're wrong. In this post we'll work through a scenario and show that absence of evidence is in fact evidence of absence. You can see Yudkowsky for more on this topic. You have a box with 100 bags in it. Each bag has 100 balls in it. 99 of the bags have 100 white balls, the final bag has 99 white balls and 1 black ball. (Assume all of the bags are indistinguishable from the outside, and all of the balls are the same size and weight etc., and the black ball is at a random position within its bag). What is the probability that a bag, selected uniformly at random, contains the black ball? You'll give your first probability estimate before the bag is opened, and then balls will be removed from the bag one by one and you can revise your estimate after seeing each ball. What is your strategy? Your lab assistant rummages around the box and selects a bag uniformly at random. You should agree that this bag contains the black ball with 1% probability, because you know that 1 in 100 bags contain the black ball. Your lab assistant takes out the first 99 balls. You observe that they're all white. This observation is merely an absence of evidence of the black ball, and, so they say, it is therefore not evidence of absence of the black ball, and your probability estimate should be unchanged, at a 1% chance that the bag contains the black ball. But while the 99 bags that only contain white balls would provide this observation every time, the 1 bag that contains the black ball would provide this observation only 1 time out of 100. So now you ought to agree that the probability that the bag contains the black ball is now only about 0.01%, not 1%, and we see that the ongoing "absence of evidence" of the black ball was in fact evidence of its absence after all. Your lab assistant pulls out the final ball. It's white .

DYNOMIGHT 3 weeks ago

Heritability of intrinsic human life span is about 50% when heritability is redefined to be something completely different

How heritable is hair color? Well, if you’re a redhead and you have an identical twin, they will definitely also be a redhead. But the age at which twins go gray seems to vary a bit based on lifestyle. And there’s some randomness in where melanocytes end up on your skull when you’re an embryo. And your twin might dye their hair! So the correct answer is some large number, but less than 100%. OK, but check this out: Say I redefine “hair color” to mean “hair color except ignoring epigenetic and embryonic stuff and pretending that no one ever goes gray or dyes their hair et cetera”. Now, hair color is 100% heritable. Amazing, right? Or: how heritable is IQ? The wise man answers, “Some number between 0% and 100%, it’s not that important, please don’t yell at me.” But whatever the number is, it depends on society. In our branch of the multiverse, some kids get private tutors and organic food and $20,000 summer camps, while other kids get dysfunctional schools and lead paint and summers spent drinking Pepsi and staring at glowing rectangles. These things surely have at least some impact on IQ. But again, watch this: Say I redefine “IQ” to be “IQ in some hypothetical world where every kid got exactly the same school, nutrition, and parenting, so none of those non-genetic factors matter anymore.” Suddenly, the heritability of IQ is higher. Thrilling, right? So much science. If you want to redefine stuff like this… that’s not wrong. I mean, heritability is a pretty arbitrary concept to start with. So if you prefer to talk about heritability in some other world instead of our actual world, who am I to judge? Incidentally, here’s a recent paper: I stress that this is a perfectly OK paper. I’m picking on it mostly because it was published in Science, meaning (like all Science papers) it makes grand claims but is woefully vague about what those claims mean or what was actually done. Also, publishing in Science is morally wrong and/or makes me envious. 
So I thought I’d try to explain what’s happening. It’s actually pretty simple. At least, now that I’ve spent several hours reading the paper and its appendix over and over again, I’ve convinced myself that it’s pretty simple. So, as a little pedagogical experiment, I’m going to try to explain the paper three times, with varying levels of detail.

The normal way to estimate the heritability of lifespan is using twin data. Depending on what dataset you use, this will give 23-35%. This paper built a mathematical model that tries to simulate how long people would live in a hypothetical world in which no one dies from any non-aging-related cause, meaning no car accidents, no drug overdoses, no suicides, no murders, and no (non-age-related) infectious disease. On that simulated data, for simulated people in a hypothetical world, heritability was 46-57%. Everyone seems to be interpreting this paper as follows: Aha! We thought the heritability of lifespan was 23-35%. But it turns out that it’s around 50%. Now we know! I understand this. Clearly, when the editors at Science chose the title for this paper, their goal was to lead you to that conclusion. But this is not what the paper says. What it says is this: We built a mathematical model of an alternate universe in which nobody died from accidents, murder, drug overdoses, or infectious disease. In that model, heritability was about 50%.

Let’s start over. Here’s figure 2 from the paper. Normally, heritability is estimated from twin studies. The idea is that identical twins share 100% of their DNA, while fraternal twins share only 50%. So if some trait is more correlated among identical twins than among fraternal twins, that suggests DNA influences that trait. There are statistics that formalize this intuition. Given a dataset that records how long various identical and fraternal twins lived, these produce a heritability number. Two such traditional estimates appear as black circles in the above figures. 
For the Danish twin cohort, lifespan is estimated to be 23% heritable. For the Swedish cohort, it’s 35%. This paper makes a “twin simulator”. Given historical data, they fit a mathematical model to simulate the lifespans of “new” twins. Then they compute heritability on this simulated data. Why calculate heritability on simulated data instead of real data? Well, their mathematical model contains an “extrinsic mortality” parameter, which is supposed to reflect the chance of death due to all non-aging-related factors like accidents, murder, or infectious disease. They assume that the chance someone dies from any of this stuff is constant over people, constant over time, and that it accounts for almost all deaths for people aged between 15 and 40. The point of building the simulator is that it’s possible to change extrinsic mortality. That’s what’s happening in the purple curves in the above figure. For a range of different extrinsic mortality parameters, they simulate datasets of twins. For each simulated dataset, they estimate heritability just like with a real dataset. Note that the purple curves above nearly hit the black circles. This means that if they run their simulator with extrinsic mortality set to match reality, they get heritability numbers that line up with what we get from real data. That suggests their mathematical model isn’t totally insane. If you decrease extrinsic mortality, then you decrease the non-genetic randomness in how long people live. So heritability goes up. Hence, the purple curves go up as you go to the left. My explanation of this paper relies on some amount of guesswork. For whatever reason, Science has decided that papers should contain almost no math, even when the paper in question is about math. So I’m mostly working from an English description. But even that description isn’t systematic. There’s no place in the paper where they clearly lay out all the things they did, in order. 
Instead, you get little hints, sort of randomly distributed throughout the paper. There’s an appendix, which the paper confidently cites over and over. But if you actually read the appendix, it’s just more disconnected explanations of random things, except now with equations set in glorious Microsoft Word format. Now, in most journals, authors write everything. But Science has professional editors. Given that every single statistics-focused paper in Science seems to be like this, we probably shouldn’t blame the authors of this one. (Other than for their decision to publish in Science in the first place.) I do wonder what those editors are doing, though. I mean, let me show you something. Here’s the first paragraph where they start to actually explain what they actually did, from the first page: See that h(t,θ) at the end? What the hell is that, you ask? That’s a good question, because it was never introduced before this and is never mentioned again. I guess it’s just supposed to be f(t,θ), which is fine. (I yield to none in my production of typos.) But if paying journals ungodly amounts of money brought us to this, of what use are those journals?

Probably most people don’t need this much detail and should skip this section. For everyone else, let’s start over one last time. The “normal” way to estimate heritability is by looking at correlations between different kinds of twins. Intuitively, if the lifespans of identical twins are more correlated than the lifespans of fraternal twins, that suggests lifespan is heritable. And it turns out that one estimator for heritability is “twice the difference between the correlation among identical twins and the correlation among fraternal twins, all raised together.” There are other similar estimators for other kinds of twins. These normally say lifespan is somewhere between 20% and 35% heritable. This paper created an equation to model the probability a given person will die at a given age. 
The parameters of the equation vary from person to person, reflecting that some of us have DNA that predisposes us to live longer than others. But the idea is that the chance of dying is fairly constant between the ages of 15 and 40, after which it starts increasing. This equation contains an “extrinsic mortality” parameter, which is meant to reflect the chance of death due to all non-aging-related factors like accidents or murder, etc. They assume this is constant. (Constant with respect to people and constant over time.) Note that they don’t actually look at any data on causes of death. They just add to the equation a constant risk of death that’s shared by all people at all ages, and then they call this “extrinsic mortality”.

Now remember, different people are supposed to have different parameters in their probability-of-death equations. To reflect this, they fit a Gaussian distribution (bell curve) over the parameters, with the goal of making it fit historical data. The idea is that if the distribution over parameters were too broad, you might get lots of people dying at 15 or living until 120, which would be wrong. If the distribution were too concentrated, then you might get everyone dying at 43, which would also be wrong. So they find a good distribution, one that makes the ages at which people die in simulation look like the ages at which people actually died in historical data.

Right! So now they have: (1) an equation that’s supposed to reflect the probability a given person dies at a given age, and (2) a distribution over the parameters of that equation that’s supposed to produce population-wide death ages that look like those in real historical data.

Before moving on, I remind you of two things: they assume their death equation entirely determines the probability someone will die in a given year, and they assume that the shape of someone’s death equation is entirely determined by genetics. The event of a person dying at a given age is random. But the probability that this happens is assumed to be fixed and determined by genes and genes alone.

Now they simulate different kinds of twins. To simulate identical twins, they just draw parameters from their parameter distribution, assign those parameters to two different people, and then let them randomly die according to their death equation. (Is this getting morbid?) To simulate fraternal twins, they do the same thing, except instead of giving the two twins identical parameters, they give them correlated parameters, to reflect that they share 50% of their DNA. How exactly do they create those correlated parameters? They don’t explain this in the paper, and they’re quite vague in the supplement. As far as I can tell, they sample two sets of parameters from their parameter distribution such that the parameters are correlated at a level of 0.5. Now they have simulated twins. They can simulate them with different extrinsic mortality values. If they lower extrinsic mortality, heritability of lifespan goes up. If they lower it to zero, heritability goes up to around 50%.

Almost all human traits are partly genetic and partly due to the environment and/or chance. If you could change the world and reduce the amount of randomness, then of course heritability would go up. That’s true for life expectancy just like for anything else. So what’s the point of this paper? There is a point! Sure, obviously heritability would be higher in a world without accidents or murder. We don’t need a paper to know that. But how much higher? It’s impossible to say without modeling and simulating that other world. Our twin datasets are really old. It’s likely that non-aging-related deaths are lower now than in the past, because we have better healthcare and so on. This means that the heritability of lifespan for people alive today may be larger than it was for the people in our twin datasets, some of whom were born in 1870. We won’t know for sure until we’re all dead, but this paper gives us a way to guess. Have I mentioned that heritability depends on society? And that heritability changes when society changes? And that heritability is just a ratio and you should stop trying to make it be a non-ratio because only-ratio things cannot be non-ratios? This is a nice reminder.

Honestly, I think the model the paper built is quite clever. Nothing is perfect, but I think this is a pretty good run at the question of “how high would the heritability of lifespan be if extrinsic mortality were lower?” I only have two objections. The first is to the Science writing style. This is a paper describing a statistical model. So shouldn’t there be somewhere in the paper where they explain exactly what they did, in order, from start to finish? Ostensibly, I think this is done in the left-hand column on the second page, just with little detail because Science is written for a general audience. But personally I think that description is the worst of all worlds. Instead of giving the high-level story in a coherent way, it throws random technical details at you without enough information to actually make sense of them. Couldn’t the full story with the full details at least be in the appendix? I feel like this wasted hours of my time, and that if someone wanted to reproduce this work, they would have almost no chance of doing so from the description given. How have we as a society decided that we should take our “best” papers and do this to them?

But my main objection is this: At first, I thought this was absurd. The fact that people die in car accidents is not a “confounding factor”. And pretending that no one dies in car accidents does not “address” some kind of bias. That’s just computing heritability in some other world. Remember, heritability is not some kind of Platonic form. It is an observational statistic. There is no such thing as “true” heritability, independent of the contingent facts of our world. But upon reflection, I think they’re trying to say something like this: Heritability of intrinsic human lifespan is about 50% when extrinsic mortality is adjusted to be closer to modern levels. The problem is: I think this is… not true? Here are the actual heritability estimates in the paper, varying by dataset (different plots), the cutoff year (colors), and extrinsic mortality (x-axis).

When extrinsic mortality goes down, heritability goes up. So the obvious question is: What is extrinsic mortality in modern people? This is a tricky question, because “extrinsic mortality” isn’t some simple observational statistic. It is a parameter in their model. (Remember, they never looked at causes of death.) So it’s hard to say, but they seem to suggest that extrinsic mortality in modern people is 0.001 per year, or perhaps a bit less. The above figures have the base-10 logarithm of extrinsic mortality on the x-axis. And the base-10 logarithm of 0.001 is -3. But if you look at the curves when the x-axis is -3, the heritability estimates are not 50%. They’re more like 35-45%, depending on the particular model and age cutoff. So here’s my suggested title: “Heritability of intrinsic human lifespan is about 40% when extrinsic mortality is adjusted to modern levels, according to a simulation we built.” There might be a reason I don’t work at Science.
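To make the mechanism concrete, here is a toy twin simulator in Python with NumPy. It is a sketch under our own assumptions, not the paper’s model. Each person’s hazard is a Gompertz-Makeham curve m + a·exp(b·t). The m term plays the role of a constant “extrinsic mortality” shared by everyone at every age, and log(a) is the “genetic” parameter, drawn from a normal distribution. The values of `MU`, `SIGMA` and `B` are illustrative guesses, not fitted to any twin cohort. Identical twins share log(a), and fraternal twins get values correlated at 0.5. Heritability is then estimated with Falconer’s formula, 2(r_MZ - r_DZ), which is the “twice the difference” estimator mentioned earlier.

```python
import numpy as np

# Toy Gompertz-Makeham twin simulator (a sketch, NOT the paper's model).
# Our assumptions: hazard h(t) = m + a * exp(B * t), where
#   m = "extrinsic mortality", identical for everyone at every age,
#   a = person-specific baseline with log(a) ~ Normal(MU, SIGMA) ("genetic"),
#   B = fixed ageing rate shared by everyone. All values are illustrative.
MU, SIGMA, B = np.log(5e-5), 1.0, 0.085


def sample_lifespans(log_a, m, rng):
    """Draw one death age per person under the hazard m + a * exp(B * t)."""
    a = np.exp(log_a)
    u = 1.0 - rng.random(log_a.shape)  # uniform on (0, 1]
    # Inverse-CDF draw from the Gompertz part, S(t) = exp(-(a/B) * (e^(B t) - 1)).
    t_intrinsic = np.log1p(-(B / a) * np.log(u)) / B
    # Independent constant-hazard (exponential) extrinsic death time.
    t_extrinsic = rng.exponential(1.0 / m, log_a.shape) if m > 0 else np.inf
    # Hazards add, so the death age is the earlier of the two competing times.
    return np.minimum(t_intrinsic, t_extrinsic)


def twin_correlation(rho, m, n_pairs, rng):
    """Lifespan correlation for twin pairs whose log(a) values correlate at rho."""
    z1 = rng.standard_normal(n_pairs)
    z2 = rho * z1 + np.sqrt(1 - rho**2) * rng.standard_normal(n_pairs)
    life1 = sample_lifespans(MU + SIGMA * z1, m, rng)
    life2 = sample_lifespans(MU + SIGMA * z2, m, rng)
    return np.corrcoef(life1, life2)[0, 1]


def simulated_heritability(m, n_pairs=100_000, seed=0):
    """Falconer estimate 2 * (r_MZ - r_DZ) on simulated identical/fraternal twins."""
    rng = np.random.default_rng(seed)
    r_mz = twin_correlation(1.0, m, n_pairs, rng)  # identical: same parameters
    r_dz = twin_correlation(0.5, m, n_pairs, rng)  # fraternal: correlated at 0.5
    return 2 * (r_mz - r_dz)


for log10_m in (-2, -3, -4):
    print(f"log10(m) = {log10_m}: h2 ~ {simulated_heritability(10.0**log10_m):.2f}")
print(f"m = 0: h2 ~ {simulated_heritability(0.0):.2f}")
```

Running it should show the estimate climbing as m falls toward zero. A log10(m) of -3 corresponds to the 0.001 per year the paper seems to suggest for modern extrinsic mortality. The absolute numbers mean nothing outside this toy. Only the direction of the effect carries over: less extrinsic randomness means higher heritability.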

Ruslan Osipov 1 month ago

The illusory truth effect

I’m a bit late with this, but here’s an interesting headline: “Liberal arts students have lower unemployment rates than computer science students according to the NY Fed”. It’s a headline I saw early last year. I made a note to read further, and I only rediscovered it when cleaning up my notes. Here’s an article from The College Fix from June 20, 2025: Computer engineering grads face double the unemployment rate of art history majors. In the article, the author claims: The stats show art history majors have a 3 percent unemployment rate while computer engineering grads have a 7.5 percent unemployment rate. Computer science grads are in a similar boat, with a 6.1 percent rate. Ok, let’s see if this lines up with what the NY Fed says: Oh, what’s that number next to “unemployment”? Uh-oh. Underemployment accounts for people working in a job which does not require a bachelor’s degree. For example, a computer engineering graduate working a tech job counts as fully employed, while an art history major working in a fast food restaurant counts as underemployed. And all of a sudden, the picture shifts. 17% of computer engineering majors were underemployed, while a whopping 46.9% of art history graduates weren’t utilizing their degree. This article is one of many that cherry-picked data from the NY Fed and made outrageous claims. Further, the data is from 2023, which the article above mentions only in passing, near the end. That’s a pretty relevant bit for an article written in 2025, isn’t it? For me this raised a question of digital hygiene and how the headlines we see affect us. I saw this headline many times throughout the year. I never read the content, but over time the headline stayed in my memory. The illusory truth effect is the cognitive bias where repeated exposure to a statement makes it seem more truthful, even if it’s known to be false. I really did believe that CS graduates had lower employment than art history majors. 
Don’t get me wrong, the job market for new grads is oh-so-brutal, and the future prospects are murky. Which probably made it easier to believe such an outrageous claim. Yes, disproving the headline took all of 10 seconds, but how many headlines do you see a day? What other misinformation cements itself in your head? And ultimately, is it better to limit access to such information, or (however impractical) try to verify everything you see?


Bayes theorem and how we talk about medical tests

We want medical tests to give us a yes or no answer: you have the disease, or you're cured. We treat them this way, often. My labs came back saying I'm healthy. I have immunity. I'm sick. Absolutely concrete results. The reality is more complicated, and tests do not give you a yes or no. They give you a likelihood. And most of the time, what the results mean for me, the test taker, is not immediately obvious or intuitive. They can mean something quite the opposite of what they seem. I ran into this recently on a page about celiac disease. The Celiac Disease Foundation has a page about testing for celiac disease. On this page, they give a lot of useful information about what different tests are available, and they point to some other good resources as well. In the section about one of the tests, it says (emphasis original): The tTG-IgA test will be positive in about 93% of patients with celiac disease who are on a gluten-containing diet. This refers to the test's sensitivity, which measures how correctly it identifies those with the disease. The same test will come back negative in about 96% of healthy people without celiac disease. This is the test's specificity. This is great information, and it tells you what you need to start figuring out what your chance of celiac disease is. The next paragraph says this, however: There is also a slight risk of a false positive test result, especially for people with associated autoimmune disorders like type 1 diabetes, autoimmune liver disease, Hashimoto's thyroiditis, psoriatic or rheumatoid arthritis, and heart failure, who do not have celiac disease. And this is where things are a little misleading. It says that there is a "slight risk" of a false positive test result. What do you think of as a slight risk? For me, it's maybe somewhere around 5%, maybe 10%. The truth is, the risk of a false positive is much higher (under many circumstances). When I take a test, I want to know a couple of things. 
If I get a positive test result, how likely is it that I have the disease? If I get a negative test result, how likely is it that I do not have the disease? The rates of positive and negative results listed above, the sensitivity and specificity, do not tell us these directly. However, they let us calculate them with a little more information.

Bayes' theorem says that $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$. You can read $P(A \mid B)$ as "the probability of A conditioned on B", or the chance that A happens if we know that B happens. What this formula lets us do is figure out one conditional probability we don't yet know in terms of other ones that we do know. In our case, we would say that $A$ is having celiac disease, and $B$ is getting a positive test result. This leaves $P(A \mid B)$ as the chance that, if you get a positive test result, you do have celiac disease, which is exactly what we want to know.

To compute this, we need a few more pieces of information. We already know that $P(B \mid A)$ is 0.93, as we were told this above. And we can find $P(A)$ pretty easily. Let's say $P(A)$ is 0.01, since about 1 in 100 people in the US have celiac disease. Estimates vary from 1 in 200 to 1 in 50, but this will do fine. That leaves us with $P(B)$. We have to compute it from both possibilities. If someone who has celiac disease takes the test, they have a 93% chance of it coming back positive, but they're only 1% of the population. On the other hand, someone without celiac disease has a 4% chance of it coming back positive (96% of the time it gives a true negative), and they're 99% of the population. We use these together to find that $P(B) = 0.93 \cdot 0.01 + 0.04 \cdot 0.99 = 0.0489$. Now we plug it all in! $P(A \mid B) = \frac{0.93 \cdot 0.01}{0.0489} \approx 0.19$. Neat, 19%! So this says that, if you get a positive test result, you have a 19% chance of having celiac disease? Yes, exactly! It's less than 1 in 5! So if you get a positive test result, you have roughly an 80% chance of it being a false positive. This is quite a bit higher than the aforementioned "slight risk." 
In fact, it means that the test doesn't so much diagnose you with celiac disease as say "huh, something's going on here" and strongly suggest further testing. Now let's look at the test the other way around, too. How likely is it that you don't have celiac disease if you get a negative test result? Here we'd say that A is "we don't have it" and B is "we have a negative test". Doing some other calculations, pulled out of the oven fully baked in cooking show style, we can see that $P(A \mid B) = \frac{0.96 \cdot 0.99}{0.96 \cdot 0.99 + 0.07 \cdot 0.01} \approx 0.999$. So if you get a negative test result, you have a 99.9% chance of not having the disease. This can effectively rule it out! But... We know that 7% of people who take this test and do have celiac disease will get a negative result. How does this make sense?

The truth is, things are a little bit deeper. People don't actually present with exactly a 1% chance of having celiac disease. That would be true if you plucked a random person from the population and subjected them to a blood test. But it's not true if you go to your doctor with GI symptoms which are consistent with celiac disease! If you're being tested for celiac disease, you probably are symptomatic. So that prior probability of 0.01? It should be something else, but how we set it is a good question. Let's say you present with symptoms highly consistent with celiac disease, and that this gives you a 10% chance of having celiac disease and a 90% chance of it being something else, given these symptoms. This changes the probability a lot. If you get a positive test in this case, then (with A once again meaning you have celiac disease) $P(A \mid B) = \frac{0.93 \cdot 0.1}{0.93 \cdot 0.1 + 0.04 \cdot 0.9} \approx 0.72$. So now a positive test is a 72% chance of having celiac disease, instead of just 19%. And a negative test here gives you about a 0.8% chance of a false negative, several times higher than the roughly 0.1% chance before. The real question is how we go from symptoms to that prior probability accurately. I spent a lot of 2024 being poked with needles and tested for various diseases while we tried to figure out what was wrong with me. 
Ultimately it was Lyme disease, and the diagnosis took a while because of a false negative. That false negative happened because the test was calibrated for broad population sampling, not for testing individuals presenting with symptoms already. The whole story is a lot longer, and it's for another post. But maybe, just maybe, it would've been a shorter story if we'd learned to reason better about probabilities and medical tests. Things are not intuitive, but Bayes is your friend, and Bayes' theorem can show us the information we really need to know. Or, we can keep going with things how they are. I mean, I did enjoy getting to know Barbara, my phlebotomist, from all my appointments.

$P(A)$, the probability in general that the person taking the test has celiac disease. This is also called the prior probability, as it's what we would say the probability is if we did not know anything from this computation and test.
$P(B)$, the probability that for any given test taken, it comes back positive.
$P(B \mid A)$, the probability that if one has celiac disease, the test will come back positive.
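The positive-test and negative-test calculations can be packaged as a small Python sketch. The helper names are hypothetical. The inputs are the sensitivity (0.93) and specificity (0.96) quoted from the Celiac Disease Foundation page, plus the two priors used in the post: 1% for a random person and 10% for the assumed symptomatic patient.

```python
def posterior_positive(prior, sensitivity, specificity):
    """P(disease | positive test) via Bayes' theorem."""
    # P(positive) = true positives + false positives
    p_pos = sensitivity * prior + (1 - specificity) * (1 - prior)
    return sensitivity * prior / p_pos


def posterior_negative(prior, sensitivity, specificity):
    """P(no disease | negative test) via Bayes' theorem."""
    # P(negative) = true negatives + false negatives
    p_neg = specificity * (1 - prior) + (1 - sensitivity) * prior
    return specificity * (1 - prior) / p_neg


SENS, SPEC = 0.93, 0.96
for prior in (0.01, 0.10):
    ppv = posterior_positive(prior, SENS, SPEC)
    npv = posterior_negative(prior, SENS, SPEC)
    print(f"prior {prior:.0%}: P(celiac | +) = {ppv:.1%}, P(celiac | -) = {1 - npv:.2%}")
# prior 1%: P(celiac | +) = 19.0%, P(celiac | -) = 0.07%
# prior 10%: P(celiac | +) = 72.1%, P(celiac | -) = 0.80%
```

Changing only the prior moves a positive result from "probably a false positive" to "probably celiac disease". It also raises the false-negative risk roughly tenfold.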

fLaMEd fury 2 months ago

Drinking Less, Enjoying More: Beer in 2025

What’s going on, Internet? This was a quieter year for beer. I checked in less, tried fewer new things, and enjoyed what I drank more. I leaned hard into IPAs, especially West Coast IPAs, spent more time with Bright IPAs, and reaffirmed my appreciation for the humble APA. 2025 wasn’t about chasing novelty. Beer isn’t getting cheaper, and it made more sense to stick with what I know I enjoy.

Stats shown here are based on Untappd check-ins up to 23 December 2025. There’s still a week to go between Christmas and New Year, which is usually when I try a bunch of new beers, so treat this as a snapshot rather than a final tally. I might update this in the new year. I still use Untappd to keep track of my beers. Here are my 2025 stats:

Total Check-ins ↓ 37 from 2024
Unique Beers ↓ 34 from 2024
↓ 4 from 2024
Average Rating ↑ 0.11 from 2024

The continued drop in volume alongside a higher average rating summed up the year well. Fewer beers, better choices, and more consistent enjoyment. Friday remained my most popular day for check-ins, with beers logged from four countries across the year, led overwhelmingly by New Zealand with only a small handful from overseas. American IPA was my most checked-in style, with Garage Project as my top brewery. Garage Project remained the most explored brewery in 2025. I still picked up a run of their seasonal releases early in the year, but stepping away from the Fresh Hop Subscription allowed other breweries to feature more prominently. I wouldn’t be surprised to see a different brewery take the top spot in 2026 as I spend more time exploring Auckland breweries.

One thing these stats don’t fully capture is the beers I reached for most often. The Friday after-work beers and the ones that lived in my golf bag rarely made it into Untappd, but they were easily my favourites across the year. Parrotdog’s Raptor APA, Thunderbird Bright APA, Urbanaut’s Detroit Bright IPA, and Liberty’s Yakima Monster APA were constants. 
They aren’t seasonal or rare, but they are delicious, reliable core-range beers I kept coming back to. Untappd reflects moments. These beers reflect habits. Cheers to drinking less and enjoying more! 🍺

DYNOMIGHT 2 months ago

Good if make prior after data instead of before

They say you’re supposed to choose your prior in advance. That’s why it’s called a “prior”. First, you’re supposed to say how plausible different things are, and then you update your beliefs based on what you see in the world. For example, currently you are—I assume—trying to decide if you should stop reading this post and do something else with your life. If you’ve read this blog before, then lurking somewhere in your mind is some prior for how often my posts are good. For the sake of argument, let’s say you think 25% of my posts are funny and insightful and 75% are boring and worthless. OK. But now here you are reading these words. If they seem bad/good, then that raises the odds that this particular post is worthless/non-worthless. For the sake of argument again, say you find these words mildly promising, meaning that a good post is 1.5× more likely than a worthless post to contain words with this level of quality. If you combine those two assumptions, that implies that the probability that this particular post is good is 33.3%. That’s true because the red rectangle below has half the area of the blue one, and thus the probability that this post is good should be half the probability that it’s bad (33.3% vs. 66.7%). It’s easiest to calculate the ratio of the odds that the post is good versus bad, namely
\[ \frac{P(\text{good} \mid \text{words})}{P(\text{bad} \mid \text{words})} = \frac{P(\text{good})}{P(\text{bad})} \cdot \frac{P(\text{words} \mid \text{good})}{P(\text{words} \mid \text{bad})} = \frac{0.25}{0.75} \cdot 1.5 = 0.5. \]
It follows that \(P(\text{good} \mid \text{words}) = 0.5 \, P(\text{bad} \mid \text{words})\), and thus that \(P(\text{good} \mid \text{words}) = \frac{0.5}{1 + 0.5} = \frac{1}{3} \approx 33.3\%\). Alternatively, if you insist on using Bayes’ equation:
\[ P(\text{good} \mid \text{words}) = \frac{P(\text{words} \mid \text{good}) \, P(\text{good})}{P(\text{words} \mid \text{good}) \, P(\text{good}) + P(\text{words} \mid \text{bad}) \, P(\text{bad})} = \frac{1.5 \cdot 0.25}{1.5 \cdot 0.25 + 0.75} = \frac{1}{3}, \]
where the second step divides the numerator and denominator by \(P(\text{words} \mid \text{bad})\). Theoretically, when you chose your prior that 25% of dynomight posts are good, that was supposed to reflect all the information you encountered in life before reading this post. Changing that number based on information contained in this post wouldn’t make any sense, because that information is supposed to be reflected in the second step when you choose your likelihood. Changing your prior based on this post would amount to “double-counting”. In theory, that’s right.
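The 25% prior and 1.5× likelihood ratio can be checked with exact fractions. This is a minimal sketch: the odds version needs only the ratio, while the Bayes version takes explicit likelihoods, and the values 3/20 and 1/10 used there are hypothetical, picked only because their ratio is 1.5.

```python
from fractions import Fraction


def posterior_good(prior_good: Fraction, likelihood_ratio: Fraction) -> Fraction:
    """P(good | words) from P(good) and P(words | good) / P(words | bad)."""
    prior_odds = prior_good / (1 - prior_good)       # 0.25 / 0.75 = 1/3
    posterior_odds = prior_odds * likelihood_ratio   # (1/3) * 1.5 = 1/2
    return posterior_odds / (1 + posterior_odds)     # odds 1:2 -> probability 1/3


def posterior_good_bayes(prior_good: Fraction, p_words_good: Fraction,
                         p_words_bad: Fraction) -> Fraction:
    """The same posterior via Bayes' equation with explicit likelihoods."""
    numerator = p_words_good * prior_good
    evidence = numerator + p_words_bad * (1 - prior_good)
    return numerator / evidence


print(posterior_good(Fraction(1, 4), Fraction(3, 2)))                          # 1/3
print(posterior_good_bayes(Fraction(1, 4), Fraction(3, 20), Fraction(1, 10)))  # 1/3
```

Both routes give 1/3. Scaling both likelihoods by the same factor leaves the answer unchanged, which is why the odds form only needs the ratio.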
It’s also right in practice for the above example, and for the similar cute little examples you find in textbooks. But for real problems, I’ve come to believe that refusing to change your prior after you see the data often leads to tragedy. The reason is that in real problems, things are rarely just “good” or “bad”, “true” or “false”. Instead, truth comes in an infinite number of varieties. And you often can’t predict which of these varieties matter until after you’ve seen the data. Let me show you what I mean. Say you’re wondering if there are aliens on Earth. As far as we know, there’s no reason aliens shouldn’t have emerged out of the random swirling of molecules on some other planet, developed a technological civilization, built spaceships, and shown up here. So it seems reasonable to choose a prior saying it’s equally plausible that there are aliens or that there are not, i.e. that \(P(\text{aliens}) = P(\text{no aliens}) = \tfrac{1}{2}\). Meanwhile, here on our actual world, we have lots of weird alien-esque evidence, like the Gimbal video, the Go Fast video, the FLIR1 video, the Wow! signal, government reports on unidentified aerial phenomena, and lots of pilots that report seeing “tic-tacs” fly around in physically impossible ways. Call all that stuff “stuff”. If aliens weren’t here, then it seems hard to explain all that stuff. So it seems like \(P(\text{stuff} \mid \text{no aliens})\) should be some low number. On the other hand, if aliens were here, then why don’t we ever get a good image? Why are there endless confusing reports and rumors and grainy videos, but never a single clear close-up high-resolution video, and never any alien debris found by some random person on the ground? That also seems hard to explain if aliens were here. So I think \(P(\text{stuff} \mid \text{aliens})\) should also be some low number. For the sake of simplicity, let’s call it a wash and assume that \(P(\text{stuff} \mid \text{aliens}) = P(\text{stuff} \mid \text{no aliens})\). Since neither the prior nor the data see any difference between aliens and no-aliens, the posterior probability is \(P(\text{aliens} \mid \text{stuff}) = \tfrac{1}{2}\). See the problem? Observe that
\[ P(\text{aliens} \mid \text{stuff}) = \frac{P(\text{aliens}) \, P(\text{stuff} \mid \text{aliens})}{P(\text{aliens}) \, P(\text{stuff} \mid \text{aliens}) + P(\text{no aliens}) \, P(\text{stuff} \mid \text{no aliens})} = \frac{P(\text{aliens})}{P(\text{aliens}) + P(\text{no aliens})} = \frac{1}{2}, \]
where the last two equalities follow from the facts that \(P(\text{stuff} \mid \text{aliens}) = P(\text{stuff} \mid \text{no aliens})\) and \(P(\text{aliens}) = P(\text{no aliens}) = \tfrac{1}{2}\).
Thus we have that \(P(\text{aliens} \mid \text{stuff}) = \tfrac{1}{2}\). We’re friends. We respect each other. So let’s not argue about whether my starting assumptions are good. They’re my assumptions. I like them. And yet the final conclusion seems insane to me. What went wrong? Assuming I didn’t screw up the math (I didn’t), the obvious explanation is that I’m experiencing cognitive dissonance as a result of a poor decision on my part to adopt a set of mutually contradictory beliefs. Say you claim that Alice is taller than Bob and Bob is taller than Carlos, but you deny that Alice is taller than Carlos. If so, that would mean that you’re confused, not that you’ve discovered some interesting paradox. Perhaps if I believe that \(P(\text{aliens}) = \tfrac{1}{2}\) and that \(P(\text{stuff} \mid \text{aliens}) = P(\text{stuff} \mid \text{no aliens})\), then I must accept that \(P(\text{aliens} \mid \text{stuff}) = \tfrac{1}{2}\). Maybe rejecting that conclusion just means I have some personal issues I need to work on. I deny that explanation. I deny it! Or, at least, I deny that it’s the most helpful way to think about this situation. To see why, let’s build a second model. Here’s a trivial observation that turns out to be important: “There are aliens” isn’t a single thing. There could be furry aliens, slimy aliens, aliens that like synthwave music, etc. When I stated my prior, I could have given different probabilities to each of those cases. But if I had, it wouldn’t have changed anything, because there’s no reason to think that furry vs. slimy aliens would have any difference in their eagerness to travel to ape-planets and fly around in physically impossible tic-tacs. But suppose I had divided up the state of the world into these four possibilities:
no aliens, normal people: there are no aliens, and people don’t hallucinate evidence;
no aliens, hallucinating people: there are no aliens, but people sometimes hallucinate or misreport evidence;
obvious aliens: there are aliens, and they would make their existence obvious;
meddling aliens: there are aliens, and they like to mess with us.
If I had broken things down that way, I might have chosen this prior: Now, let’s think about the empirical evidence again. It’s incompatible with “no aliens, normal people”, since if there were no aliens, then normal people wouldn’t hallucinate flying tic-tacs. The evidence is also incompatible with “obvious aliens”, since if those kinds of aliens were around they would make their existence obvious. However, the evidence fits pretty well with “no aliens, hallucinating people” and also with “meddling aliens”.
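So, a reasonable model would be
\[ P(\text{stuff} \mid \text{no aliens, normal people}) \approx 0, \qquad P(\text{stuff} \mid \text{obvious aliens}) \approx 0, \qquad P(\text{stuff} \mid \text{no aliens, hallucinating people}) = P(\text{stuff} \mid \text{meddling aliens}). \]
If we combine those assumptions, now we only get a 10% posterior probability of aliens. Now the results seem non-insane. To see why, first note that
\[ \frac{P(\text{aliens} \mid \text{stuff})}{P(\text{no aliens} \mid \text{stuff})} \approx \frac{P(\text{meddling aliens} \mid \text{stuff})}{P(\text{no aliens, hallucinating people} \mid \text{stuff})}, \]
since both “no aliens, normal people” and “obvious aliens” have near-zero probability of producing the observed data. Next,
\[ \frac{P(\text{meddling aliens} \mid \text{stuff})}{P(\text{no aliens, hallucinating people} \mid \text{stuff})} = \frac{P(\text{meddling aliens}) \, P(\text{stuff} \mid \text{meddling aliens})}{P(\text{no aliens, hallucinating people}) \, P(\text{stuff} \mid \text{no aliens, hallucinating people})} = \frac{P(\text{meddling aliens})}{P(\text{no aliens, hallucinating people})}, \]
where the second equality follows from the fact that the data is assumed to be equally likely under “no aliens, hallucinating people” and “meddling aliens”. It follows that
\[ P(\text{aliens} \mid \text{stuff}) \approx \frac{P(\text{meddling aliens})}{P(\text{meddling aliens}) + P(\text{no aliens, hallucinating people})}. \]
I hope you are now confused. If not, let me lay out what’s strange: The priors for the two above models both say that there’s a 50% chance of aliens. The first prior wasn’t wrong; it was just less detailed than the second one. That’s weird, because the second prior seemed to lead to completely different predictions. If a prior is non-wrong and the math is non-wrong, shouldn’t your answers be non-wrong? What the hell? The simple explanation is that I’ve been lying to you a little bit. Take any situation where you’re trying to determine the truth of anything. Then there’s some space of things that could be true. In some cases, this space is finite. If you’ve got a single tritium atom and you wait a year, either the atom decays or it doesn’t. But in most cases, there’s a large or infinite space of possibilities. Instead of you just being “sick” or “not sick”, you could be “high temperature but in good spirits” or “seems fine except won’t stop eating onions”. (Usually the space of things that could be true isn’t easy to map to a small 1-D interval. I’m drawing it like that for the sake of visualization, but really you should think of it as some high-dimensional space, or even an infinite-dimensional space.) In the case of aliens, the space of things that could be true might include, “There are lots of slimy aliens and a small number of furry aliens and the slimy aliens are really shy and the furry aliens are afraid of squirrels.” So, in principle, what you should do is divide up the space of things that might be true into tons of extremely detailed things and give a probability to each.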
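Both alien models can be run through the same finite-hypothesis Bayes update. This is a sketch with hypothetical numbers, not the author’s: the four priors below were picked so that aliens still total 50% and the posterior lands on the 10% quoted above, and the “near-zero” likelihoods are set to exactly 0 for simplicity.

```python
from fractions import Fraction as F


def posterior(prior: dict, likelihood: dict) -> dict:
    """Bayes' rule over a finite set of mutually exclusive hypotheses."""
    joint = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(joint.values())
    return {h: joint[h] / evidence for h in joint}


# First model: two hypotheses, and the evidence is equally likely under both.
coarse = posterior(
    prior={"aliens": F(1, 2), "no aliens": F(1, 2)},
    likelihood={"aliens": F(1, 1000), "no aliens": F(1, 1000)},
)

# Second model. HYPOTHETICAL priors; the two alien cases still sum to 1/2.
fine_prior = {
    "no aliens, normal people": F(41, 100),
    "no aliens, hallucinating people": F(9, 100),
    "obvious aliens": F(49, 100),
    "meddling aliens": F(1, 100),
}
shared = F(1, 1000)  # any common value works; only the ratio matters
fine_likelihood = {
    "no aliens, normal people": F(0),  # "near-zero", rounded to 0 here
    "no aliens, hallucinating people": shared,
    "obvious aliens": F(0),
    "meddling aliens": shared,
}
fine = posterior(fine_prior, fine_likelihood)
p_aliens_fine = fine["obvious aliens"] + fine["meddling aliens"]

print(coarse["aliens"], p_aliens_fine)  # 1/2 1/10
```

Both priors give aliens 1/2, yet the posteriors differ, because the finer model lets the likelihood differ between kinds of aliens and kinds of no-aliens.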
Often, the space of things that could be true is infinite. So theoretically, if you really want to do things by the book, what you should really do is specify how plausible each of those (infinite) possibilities is. After you’ve done that, you can look at the data. For each thing that could be true, you need to think about the probability of the data. Since there’s an infinite number of things that could be true, that’s an infinite number of probabilities you need to specify. You could picture it as some curve like this: (That’s a generic curve, not one for aliens.) To me, this is the most underrated problem with applying Bayesian reasoning to complex real-world situations: In practice, there are an infinite number of things that can be true. It’s a lot of work to specify prior probabilities for an infinite number of things. And it’s also a lot of work to specify the likelihood of your data given an infinite number of things. So what do we do in practice? We simplify, usually by grouping the space of things that could be true into some small number of discrete categories. For the above curve, you might break things down into these four equally-plausible possibilities. Then you might estimate these data probabilities for each of those possibilities. Then you could put those together to get this posterior: That’s not bad. But it is just an approximation. Your “real” posterior probabilities correspond to these areas: That approximation was pretty good. But the reason it was good is that we started out with a good discretization of the space of things that might be true: One where the likelihood of the data didn’t vary too much for the different possibilities inside each of the four categories. Imagine the likelihood of the data—if you were able to think about all the infinite possibilities one by one—looked like this: This is dangerous. The problem is that you can’t actually think about all those infinite possibilities.
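The danger of coarse categories shows up even in a toy one-dimensional space. This sketch assumes \(\theta \in [0, 1]\), a uniform prior, and four equal-width categories; both likelihood curves are hypothetical. A fine grid stands in for the “real” posterior, while the coarse version uses a single likelihood value per category, taken at its midpoint.

```python
import math


def midpoints(n: int) -> list[float]:
    """Midpoints of n equal-width cells covering [0, 1]."""
    return [(i + 0.5) / n for i in range(n)]


def smooth_likelihood(theta: float) -> float:
    return 0.2 + 0.6 * theta  # varies gently across [0, 1]


def spiky_likelihood(theta: float) -> float:
    # Hypothetical narrow bump sitting exactly on the first category's midpoint.
    return 0.05 + math.exp(-((theta - 0.125) / 0.01) ** 2)


def category_posteriors_fine(likelihood, n_cats: int = 4, n_fine: int = 40_000) -> list[float]:
    """Posterior mass per category under a uniform prior, from a fine grid."""
    masses = [0.0] * n_cats
    for theta in midpoints(n_fine):
        masses[min(int(theta * n_cats), n_cats - 1)] += likelihood(theta)
    total = sum(masses)
    return [m / total for m in masses]


def category_posteriors_coarse(likelihood, n_cats: int = 4) -> list[float]:
    """Approximation: one likelihood value per category, at its midpoint."""
    values = [likelihood(theta) for theta in midpoints(n_cats)]
    total = sum(values)
    return [v / total for v in values]


for name, lik in [("smooth", smooth_likelihood), ("spiky", spiky_likelihood)]:
    fine = category_posteriors_fine(lik)
    coarse = category_posteriors_coarse(lik)
    print(name, [round(p, 3) for p in fine], [round(p, 3) for p in coarse])
```

With the gentle curve the two answers agree (a midpoint value is exact for a linear likelihood). With the spike, the coarse version puts most of the posterior in the first category and shortchanges the other three, because one point estimate per category cannot see how much the likelihood varies inside it.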
When you think about four discrete possibilities, you might estimate some likelihood that looks like this: If you did that, that would lead to you underestimating the probability of three of the categories and overestimating the probability of the remaining one. This is where my first model of aliens went wrong. My prior was not wrong. (Not to me.) The mistake was in assigning the same value to \(P(\text{stuff} \mid \text{aliens})\) and \(P(\text{stuff} \mid \text{no aliens})\). Sure, I think all our alien-esque data is equally likely given aliens and given no-aliens. But that’s only true for certain kinds of aliens, and certain kinds of no-aliens. And my prior for those kinds of aliens is much lower than for those kinds of non-aliens. Technically, the fix to the first model is simple: Make \(P(\text{stuff} \mid \text{aliens})\) lower. But the reason it’s lower is that I have additional prior information that I forgot to include in my original prior. If I just assert that \(P(\text{stuff} \mid \text{aliens})\) is much lower than \(P(\text{stuff} \mid \text{no aliens})\), then the whole formal Bayesian thing isn’t actually doing very much—I might as well just state that I think \(P(\text{aliens} \mid \text{stuff})\) is low. If I want to formally justify why \(P(\text{stuff} \mid \text{aliens})\) should be lower, that requires a messy recursive procedure where I sort of add that missing prior information and then integrate it out when computing the data likelihood. Mathematically,
\[ P(\text{stuff} \mid \text{aliens}) = \int P(\text{stuff} \mid \theta) \, p(\theta \mid \text{aliens}) \, d\theta, \]
where \(\theta\) runs over all the detailed kinds of aliens. But now I have to give a detailed prior anyway. So what was the point of starting with a simple one? I don’t think that technical fix is very good. While it’s technically correct (har-har), it’s very unintuitive. The better solution is what I did in the second model: To create a finer categorization of the space of things that might be true, such that the probability of the data is constant-ish for each term. The thing is: Such a categorization depends on the data. Without seeing the actual data in our world, I would never have predicted that we would have so many pilots that report seeing tic-tacs. So I would never have predicted that I should have categories that are based on how much people might hallucinate evidence or how much aliens like to mess with us.
So the only practical way to get good results is to first look at the data to figure out what categories are important, and then to ask yourself how likely you would have said those categories were, if you hadn’t yet seen any of the evidence.
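The “technical fix” of integrating finer detail into the coarse likelihood can be checked with hypothetical numbers (not the author’s). Each coarse hypothesis is split into types, \(P(\text{stuff} \mid H) = \sum_k P(\text{stuff} \mid H, k) \, P(k \mid H)\) is computed, and the result goes back into the 50/50 coarse model.

```python
from fractions import Fraction as F

# HYPOTHETICAL split of each coarse hypothesis into finer types:
# each entry is (P(type | H), P(stuff | type)).
shared = F(1, 1000)
types = {
    "aliens": {
        "obvious aliens": (F(98, 100), F(0)),
        "meddling aliens": (F(2, 100), shared),
    },
    "no aliens": {
        "normal people": (F(82, 100), F(0)),
        "hallucinating people": (F(18, 100), shared),
    },
}


def marginal_likelihood(subtypes: dict) -> F:
    """P(stuff | H) = sum over k of P(stuff | H, k) * P(k | H)."""
    return sum(p_type * p_stuff for p_type, p_stuff in subtypes.values())


prior = {"aliens": F(1, 2), "no aliens": F(1, 2)}
likelihood = {h: marginal_likelihood(types[h]) for h in prior}
evidence = sum(prior[h] * likelihood[h] for h in prior)
p_aliens = prior["aliens"] * likelihood["aliens"] / evidence

print(likelihood["aliens"], likelihood["no aliens"], p_aliens)  # 1/50000 9/50000 1/10
```

The coarse model now gives 10%, but only because the detailed split had to be written down first, which is why the finer categorization is the more natural place to put that information.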

Rik Huijzer 2 months ago

China's CO₂ Emissions Per Capita Has Already Surpassed the EU

According to Our World in Data, China's CO₂ emissions per capita have already passed those of people in the European Union and the UK, and will surpass those of the US and Canada around 2028: ![co-emissions-per-capita.svg](/files/b60da0f2ccbfb9fb)

Raph Koster 3 months ago

Looking back at a pandemic simulator

It’s been six years now since the early days of the Covid pandemic. People who were paying super close attention started hearing rumors about something going on in China towards the end of 2019 — my earliest posts about it on Facebook were from November that year. Even at the time, people were utterly clueless about the mathematics of how a highly infectious virus spread. I remember spending hours writing posts on various different social media sites explaining that the Infection Fatality Rates and the R value were showing that we could be looking at millions dead. People didn’t tend to believe me: “SEVERAL MILLION DEAD! Okay, I’m done. No one is predicting that. But you made me laugh. Thanks.” You can do the math yourself. Use a low average death estimate of 0.4%. Assume 60% of the population catches it and then we reach herd immunity (which is generous): But that’s with low assumptions… It was like typing to a wall. In fact, it’s pretty likely that it still is, since these days, the discourse is all about how bad the economic and educational impact of lockdowns was — and not about the fact that if the world had acted in concert and forcefully, we could have had a much better outcome than we did. The health response was too soft, the lockdown too lenient, and as a result, we took all the hits. Of course, these days people also forget just how deadly it was and how many died, and so on. We now know that the overall IFR was probably higher than 0.4%, but very strongly tilted towards older people and those with comorbidities. We also now know that herd immunity was a pipe dream — instead we managed to get vaccines out in record time and the ordinary course of viral evolution ended up reducing the death rate until now we behave as if Covid is just a deadlier flu (it isn’t, that thinking ignores long-term impact of the disease).
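The back-of-the-envelope estimate from that Facebook exchange takes three inputs: the 0.4% death rate and 60% infection share named in the comment, plus the 328 million US population used in the original calculation. A minimal sketch:

```python
# Inputs from the post: 328 million people, 60% infected before herd
# immunity, and a low-end 0.4% average death rate.
US_POPULATION = 328_000_000
SHARE_INFECTED = 0.60
DEATH_RATE = 0.004

infected = US_POPULATION * SHARE_INFECTED
deaths = infected * DEATH_RATE
print(f"{infected:,.0f} infected, {deaths:,.0f} dead")  # 196,800,000 infected, 787,200 dead
```

Unrounded, the same inputs give about 787,000 deaths; the original calculation rounds the intermediate figures to 196 million and 780,000.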
The upshot: my math was not that far off — the estimated toll in the US ended up being 1.2 to 1.4 million souls, and worldwide it’s estimated as between 15 and 28.5 million dead. Plenty of denial of this, these days, and plenty of folks blaming the vaccines for what are most likely issues caused by the disease in the first place. Anyway, in the midst of it all, tired of running math in my spreadsheets (yeah, I was tracking it all in spreadsheets, what can I say?), I started thinking about why only a few sorts of people were wrapping their heads around the implications. The thing they all had in common was that they lived with exponential curves. Epidemiologists, Wall Street quants, statisticians… and game designers. Could we get more people to feel the challenges in their bones? So… I posted this to Facebook on March 24th, 2020: Three weeks ago I was idly thinking of how someone ought to make a little game that shows how the coronavirus spreads, how testing changes things, and how social distancing works. The sheer number of people who don’t get it — numerate people, who ought to be able to do math — is kind of shocking. I couldn’t help worrying at it, and have just about a whole design in my head. But I have to admit, I kinda figured someone would have made it by now. But they haven’t. It’s not even a hard game to make. Little circles on a plain field. Each circle simply bounces around. They are generated each with an age, a statistically real chance of having a co-morbid condition (diabetes, hypertension, immunosuppressed, pulmonary issues…), and crucially, a name out of a baby book. They can be in one of these states: In addition, there’s a diagnosed flag. We render asymptomatic the same as healthy. We render each of the other states differently, depending on whether the diagnosed flag is set. They show as healthy until dead, if not diagnosed. If diagnosed, you can see what stage they are in (icon or color change). The circles move and bounce. 
If an asymptomatic one touches a healthy one, they have a statistically valid chance of infecting. Circles progress through these states using simple stats. We track current counts on all of these, and show a bar graph. Yes, that means players can see that people are getting sick, but don’t know where. The player has the following buttons. The game ticks through days at an accelerated pace. It runs for 18 months worth of days. At the end of it, you have a vaccine, and the epidemic is over. Then we tell you what percentage of your little world died. Maybe with a splash screen listing every name and age of everyone who died. And we show how much money you spent. Remember, you can go negative, and it’s OK. That’s it. Ideally, it runs in a webpage. Itch.io maybe. Or maybe I have a friend with unlimited web hosting. Luxury features would be a little ini file or options screen that lets you input real world data for your town or country: percent hypertensive, age demographics, that sort of thing. Or maybe you could crowdsource it, so it’s a pulldown… Each weekend I think about building this. So far, I haven’t, and instead I try to focus on family and mental health and work. But maybe someone else has the energy. I suspect it might persuade and save lives. Some things about this that I want to point out in hindsight. Per the American Heart Association, among adults age 20 and older in the United States, the following have high blood pressure: Per the American Diabetes Association, Per studies in JAMA, Next, realize that because the disease spreads mostly inside households (where proximity means one case tends to infect others), this means that protecting the above extremely large slices of the population means either isolating them away from their families, or isolating the entire family and other regular contacts. People tend to think the at-risk population is small. It’s not. The response, for Facebook, was pretty surprising. 
The post was re-shared a lot, and designers from across the industry jumped in with tweaks to the rules. Some folks re-posted it to large groups about public initiatives, etc. There was also, of course, plenty of skepticism that something like this would make any difference at all. The first to take up the challenge was John Albano, who had his game Covid Ops up and running on itch.io a mere six days later. You can still play it there! Stuck in the house and looking for things to do. Soooo, when a fellow game dev suggested a game idea and basic ruleset along with “I wish someone would make a game like this,” I took that as a challenge to try. Tonight (this morning?), the first release of COVID OPS has been published. John’s game was pretty faithful to the sketch. You can see the comorbidities over on the left, and the way the player has clicked on 72-year-old Rowan — who probably isn’t going to make it. As he updated it, he added in more detailed comorbidity data, and (unfortunately, as it turns out) made it so that people were immune after recovering from infection. And of course, like the next one I’ll talk about, John made a point of including real-world resource links so that people could take action. By April 6th, another team led by Khail Santia had participated in Jamdemic 2020 and developed the first version of In the Time of Pandemia. He wrote, The compound I stay at is about to be cordoned. We’ve been contact-traced by the police, swabbed by medical personnel covered in protective gear. One of our housemates works at a government hospital and tested positive for antibodies against SARS-CoV-2. The pandemic closes in from all sides. What can a game-maker do in a time like this? I’ve been asking myself this question since the beginning of community quarantine. I’m based in Cebu City, now the top hotspot for COVID-19 in the Philippines in terms of incidence proportion.
This game would go on to be completed by a fuller team including a mathematical epidemiologist, and In the Time of Pandemia eventually ended up topping charts on Newgrounds when it launched there in July of 2020. This game went viral and got a ton of press across the Pacific Rim. The team worked closely with universities and doctors in the Philippines and validated all the numbers. They added local flavor to their levels representing cities and neighborhoods that their local players would know. Gregg Victor Gabison, dean of the University of San Jose-Recoletos College of Information, Computer & Communications Technology, whose students play-tested the game, said, “This is the kind of game that mindful individuals would want to check out. It has substance and a storyline that connects with reality, especially during this time of pandemic.” Not only does the game have to work on a technical basis, it has to communicate how real a crisis the pandemic is in a simple, digestible manner. Dr. Mariane Faye Acma, resident physician at Medidas Medical Clinic in Valencia, Bukidnon, was consulted to assess the game’s medical plausibility. She enumerated critical thinking, analysis, and multitasking as skills developed through this game. “You decide who are the high risks, who needs to be tested and isolated, where to focus, [and] how much funds to allocate….The game will make players realize how challenging the work of the health sector is in this crisis.” “Ultimately, the game’s purpose is to give players a visceral understanding of what it takes to flatten the curve,” Santia said. I think most people have no idea that any of this happened or that I was associated with it. I only posted the design sketch on Facebook; it got reshared across a few thousand people. It wasn’t on social media, I didn’t talk about it elsewhere, and for whatever reason, I didn’t blog about it. I have had both these games listed on my CV for a while.
Oh, I didn’t do any of the heavy lifting… all credit goes to the developers for that. There’s no question that way more than 95% of the work comes after the high-level design spec. But both games do credit me, and I count them as games I worked on. A while back, someone on Reddit said it was pathetic that I listed these. I never quite know what to make of comments like that (troll much?!?). No offense, but I’m proud of what a little design sketch turned into, and proud of the work that these teams did, and proud that one of the games got written up in the press so much; ended up being used in college classrooms; was vetted and validated by multiple experts in the field; and made a difference however slight. Peak Covid was a horrendous time. Horrendous enough that we have kind of blocked it from our memories. But I lost friends and colleagues. I still remember. Back then I wrote, This is the largest event in your lifetime. It is our World War, our Great Depression. We need to rise to the occasion, and think about how we change. There is no retreat to how it used to be. There is only through. A year later, the vaccine gave us that path through, and here we are now. But as I write this, we have the first human case of H5N5 bird flu; it was only a matter of time. Maybe these games helped a few people get through it all. They were played by tens of thousands, after all. Maybe they will help next time. I know that the fact that they were made helped me get through, that making them helped John get through, helped Khail get through — in his own words: In the end, the attempt to articulate a game-maker’s perspective on COVID-19 has enabled me to somehow transcend the chaos outside and the turmoil within. It’s become a welcome respite from isolation, a thread connecting me to a diversity of talents who’ve been truly generous with their expertise and encouragement.
As incidences continue to rise here and in many parts of the world, our hope is that the game will be of some use in showing what it takes to flatten the curve and in advocating for communities most in need. So… at minimum, they made a real difference to at least three people. And that’s not a bad thing for a game to aspire to.

328 million people in the US. 60% of that is 196 million catch it. 0.4% of that is 780,000 dead.
asymptomatic but contagious
symptomatic
70% of asymptomatic cases turn symptomatic after 1d10+5 days. The others stay sick for the full 21 days.
Percent chance of moving from symptomatic to severe is based on comorbid conditions, but the base chance is 1 in 5 after some amount of days.
Percent chance of moving from severe to critical is 1 in 4, modified by age and comorbidities, if in hospital. Otherwise, it’s double.
Percent chance of moving from critical to dead is something like 1 in 5, modified by age and comorbidities, if in hospital. Otherwise, it’s double.
Symptomatic, severe, and critical circles that do not progress to dead move to ‘recovered’ after 21 days since reaching symptomatic.
Severe and critical circles stop moving.
Hover on a circle, and you see the circle’s name and age and any comorbidities (“Alison, 64, hypertension.”)
Test. This lets them click on a circle. If the circle is asymptomatic or worse, it gets the diagnosed flag. But it costs you one test.
Isolate. This lets them click on a circle, and freezes them in place. Some visual indicator shows they are isolated. Note that isolated cases still progress.
Hospitalize. This moves the circle to hospital. Hospital only has so many beds. Clicking on a circle already in hospital drops the circle back out in the world. Circles in hospital have half the chance of progressing to the next stage.
Buy test. You only have so many tests. You have to click this button to buy more.
Buy bed. You only have this many beds. You have to click this button to buy more.
Money goes up when circles move. But you are allowed to go negative for money.
Lockdown. Lastly, there is a global button that, when pressed, freezes 80% of all circles. But it gradually ticks down and circles individually start to move again, and the button must be pressed again from time to time. While lockdown is running, it costs money as well as not generating it. If pressed again, it lifts the lockdown and all circles can move again.
At the time that I posted, I could tell that people were desperately unwilling to enter lockdown for any extended period of time; but “The Hammer and the Dance” strategy of pulsed lockdown periods was still very much in our future. I wanted a mechanic that showed population non-compliance.
There was also quite a lot of obsessing over case counts at the time, and one of the things that I really wanted to get across was that our testing was so incredibly inadequate that we really had little idea of how many cases we were dealing with and therefore what the IFR (infection fatality rate) actually was. That’s why tests are limited in the design sketch.
I was also trying to get across that money was not a problem in dealing with this. You could take the money value negative because governments can choose to do that. I often pointed out in those days that if the government chose, it could send a few thousand dollars to every household every few weeks for the duration of lockdown. It would likely have had less of an impact on the GDP and the debt than what we actually did.
I wanted names. I wanted players to understand the human cost, not just the statistics. Today, I might even suggest that an LLM generate a little biography for every fatality.
Another thing that was constantly missed was the impact of comorbidities. To this day, I hear people say “ah, it only affected the old and the ill, so why not have stayed open?” To which I would reply with:
For non-Hispanic whites, 33.4 percent of men and 30.7 percent of women.
For non-Hispanic Blacks, 42.6 percent of men and 47.0 percent of women.
For Mexican Americans, 30.1 percent of men and 28.8 percent of women.
34.2 million Americans, or 10.5% of the population, have diabetes.
Nearly 1.6 million Americans have type 1 diabetes, including about 187,000 children and adolescents.
4.2% of the population of the USA has been diagnosed as immunocompromised by their doctor.
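The progression rules in the design sketch (70% of asymptomatic cases turning symptomatic, a base 1-in-5 chance of turning severe, 1 in 4 from severe to critical and 1 in 5 from critical to dead in hospital, doubled otherwise) map onto a small per-circle state machine. This is a hypothetical sketch, not code from either game: the age and comorbidity modifiers are omitted because the sketch gives no numbers for them, the hospital flag is held fixed for the whole illness, the 1-in-5 symptomatic-to-severe roll is assumed not to depend on hospitalization, and the day counts are left out since they don’t change the outcome.

```python
import random

# Rates from the design sketch; age/comorbidity modifiers are omitted.
P_TURN_SYMPTOMATIC = 0.70    # asymptomatic -> symptomatic
P_SEVERE = 1 / 5             # symptomatic -> severe (base chance)
P_CRITICAL_HOSPITAL = 1 / 4  # severe -> critical, if in hospital
P_DEAD_HOSPITAL = 1 / 5      # critical -> dead, if in hospital


def escalation_chance(base: float, in_hospital: bool) -> float:
    """Outside hospital the sketch says the chance is double."""
    return base if in_hospital else min(1.0, 2 * base)


def run_course(in_hospital: bool, rng: random.Random) -> str:
    """Play out one infected circle's illness; return 'dead' or 'recovered'."""
    if rng.random() >= P_TURN_SYMPTOMATIC:
        return "recovered"  # stays asymptomatic for the full 21 days
    if rng.random() >= P_SEVERE:
        return "recovered"
    if rng.random() >= escalation_chance(P_CRITICAL_HOSPITAL, in_hospital):
        return "recovered"
    if rng.random() >= escalation_chance(P_DEAD_HOSPITAL, in_hospital):
        return "recovered"
    return "dead"


def expected_death_rate(in_hospital: bool) -> float:
    """Closed-form chance that an infected circle dies under the same rules."""
    return (P_TURN_SYMPTOMATIC * P_SEVERE
            * escalation_chance(P_CRITICAL_HOSPITAL, in_hospital)
            * escalation_chance(P_DEAD_HOSPITAL, in_hospital))


rng = random.Random(2020)
TRIALS = 100_000
for in_hospital in (True, False):
    deaths = sum(run_course(in_hospital, rng) == "dead" for _ in range(TRIALS))
    print(f"in_hospital={in_hospital}: simulated {deaths / TRIALS:.4f}, "
          f"expected {expected_death_rate(in_hospital):.4f}")
```

Under these assumptions an infected circle kept out of hospital is four times as likely to die (2.8% vs 0.7%), since both late-stage chances double. That gap is what makes the bed limit and the “Buy bed” button matter in the game.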

emiruz 4 months ago

Modelling beliefs about sets

Here is an interesting scheme I encountered in the wild, generalised and made abstract for you, my intrepid reader. Let \(X\) be a set of binary variables. We are given information about subsets of \(X\), where each update is a probability ranging over a concrete set, the state of which is described by an arbitrary quantified logic formula. For example, \[P\bigg\{A \subset X \mid \exists_{x_i, x_j \in A} \big(x_i \ne x_j\big)\bigg\} = p\] The above assigns a probability \(p\) to some concrete subset \(A\), with the additional information that at least one pair of its members does not have the same value.
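To make the example formula concrete: for a subset \(A\) of \(n\) binary variables, “some pair \(x_i, x_j \in A\) has \(x_i \ne x_j\)” holds exactly when the values in \(A\) are not all equal, which leaves \(2^n - 2\) of the \(2^n\) possible assignments. The enumeration below only illustrates the formula’s meaning; the excerpt doesn’t show how such updates are combined, so nothing here should be read as the author’s method.

```python
from itertools import product


def not_all_equal(values) -> bool:
    """True iff some pair x_i, x_j in the subset has x_i != x_j."""
    return len(set(values)) > 1


def satisfying_assignments(n: int) -> list[tuple[int, ...]]:
    """All 0/1 assignments of an n-element subset A that satisfy the formula."""
    return [a for a in product((0, 1), repeat=n) if not_all_equal(a)]


for n in range(1, 5):
    print(n, len(satisfying_assignments(n)), 2 ** n - 2)
```

Each row prints the same count twice, confirming that only the two constant assignments (all 0s and all 1s) are ruled out.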

Alex Molas 4 months ago

Bayesian A/B testing is not immune to peeking

Introduction
Over the last few months at RevenueCat I’ve been building a statistical framework to flag when an A/B test has reached statistical significance. I went through the usual literature, including Evan Miller’s posts. In his well-known “How Not to Run an A/B Test” there’s a claim that with Bayesian experiment design you can stop at any time and still make valid inferences, and that you don’t need a fixed sample size to get a valid result. I’ve read this claim in other posts. The impression I got is that you can peek as often as you want, stop the moment the posterior clears a threshold (e.g. $P(A>B) > 0.95$), and you won’t inflate false positives. And this is not correct. If you’re an expert in Bayesian statistics this is probably obvious, but it wasn’t for me. So I decided to run some simulations to see what really happens, and I’m sharing the results here in case they’re useful for others.
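A simulation of this kind fits in a few dozen lines. The version below is an illustrative sketch, not the RevenueCat framework, and every parameter is an assumption: it runs A/A tests in which both arms share a 10% conversion rate (so any declared winner is a false positive), looks after every batch of 50 users per arm for 20 looks, computes $P(A>B)$ under Beta(1, 1) priors using a normal approximation to each posterior, and counts how often that probability clears 0.95 at any look versus only at the final look.

```python
import math
import random


def prob_a_beats_b(conv_a: int, conv_b: int, n: int) -> float:
    """Approximate P(rate_A > rate_B) with n users per arm and Beta(1, 1) priors.

    Each Beta posterior is replaced by a normal with the same mean and
    variance, an approximation chosen here for speed.
    """
    def mean_var(conv: int) -> tuple[float, float]:
        a, b = conv + 1, n - conv + 1
        return a / (a + b), a * b / ((a + b) ** 2 * (a + b + 1))

    mean_a, var_a = mean_var(conv_a)
    mean_b, var_b = mean_var(conv_b)
    z = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def run_aa_test(rng: random.Random, rate: float = 0.1, batch: int = 50,
                looks: int = 20, threshold: float = 0.95) -> tuple[bool, bool]:
    """One A/A test. Returns (crossed threshold at any look, crossed at final look).

    A peeking analyst would have stopped at the first crossing.
    """
    conv_a = conv_b = n = 0
    crossed_any = crossed_final = False
    for _ in range(looks):
        conv_a += sum(rng.random() < rate for _ in range(batch))
        conv_b += sum(rng.random() < rate for _ in range(batch))
        n += batch
        crossed_final = prob_a_beats_b(conv_a, conv_b, n) > threshold
        crossed_any = crossed_any or crossed_final
    return crossed_any, crossed_final


def false_positive_rates(n_sims: int = 1000, seed: int = 1) -> tuple[float, float]:
    rng = random.Random(seed)
    peeking = fixed = 0
    for _ in range(n_sims):
        crossed_any, crossed_final = run_aa_test(rng)
        peeking += crossed_any
        fixed += crossed_final
    return peeking / n_sims, fixed / n_sims


peek_rate, fixed_rate = false_positive_rates()
print(f"peek after every batch: {peek_rate:.1%}, single look at the end: {fixed_rate:.1%}")
```

With one look at the end, the 0.95 threshold is crossed roughly 5% of the time; stopping at the first crossing across 20 looks declares a false winner several times as often. The posterior at each look is still a valid summary of the data, but the stopping rule picks whichever look happens to be most extreme.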

DYNOMIGHT 5 months ago

Dear PendingKetchup

PendingKetchup comments on my recent post on what it means for something to be heritable: The article seems pretty good at math and thinking through unusual implications, but my armchair Substack eugenics alarm that I keep in the back of my brain is beeping. Saying that variance was “invented for the purpose of defining heritability” is technically correct, but that might not be the best kind of correct in this case, because it was invented by the founder of the University of Cambridge Eugenics Society who had decided, presumably to support that project, that he wanted to define something called “heritability”. His particular formula for heritability is presented in the article as if it has odd traits but is obviously basically a sound thing to want to calculate, despite the purpose it was designed for. The vigorous “educational attainment is 40% heritable, well OK maybe not but it’s a lot heritable, stop quibbling” hand waving sounds like a person who wants to show but can’t support a large figure. And that framing of education, as something “attained” by people, rather than something afforded to or invested in them, is almost completely backwards at least through college. The various examples about evil despots and unstoppable crabs highlight how heritability can look large or small independent of more straightforward biologically-mechanistic effects of DNA. But they still give the impression that those are the unusual or exceptional cases. In reality, there are in fact a lot of evil crabs, doing things like systematically carting away resources from Black children’s* schools, and then throwing them in jail. We should expect evil-crab-based explanations of differences between people to be the predominant ones. *Not to say that being Black “is genetic”. Things from accent to how you style your hair to how you dress to what country you happen to be standing in all contribute to racial judgements used for racism.
But “heritability” may not be the right tool to disentangle those effects.
Dear PendingKetchup, Thanks for complimenting my math (♡), for reading all the way to the evil crabs, and for not explicitly calling me a racist or eugenicist. I also appreciate that you chose sincerity over boring sarcasm and that you painted such a vibrant picture of what you were thinking while reading my post. I hope you won’t mind if I respond in the same spirit. To start, I’d like to admit something. When I wrote that post, I suspected some people might have reactions similar to yours. I don’t like that. I prefer positive feedback! But I’ve basically decided to just let reactions like yours happen, because I don’t know how to avoid them without compromising on other core goals. It sounds like my post gave you a weird feeling. Would it be fair to describe it as a feeling that I’m not being totally upfront about what I really think about race / history / intelligence / biological determinism / the ideal organization of society? Because if so, you’re right. It’s not supposed to be a secret, but it’s true. Why? Well, you may doubt this, but when I wrote that post, my goal was that people who read it would come away with a better understanding of the meaning of heritability and how weird it is. That’s it. Do I have some deeper and darker motivations? Probably. If I probe my subconscious, I find traces of various embarrassing things like “draw attention to myself” or “make people think I am smart” or “after I die, live forever in the world of ideas through my amazing invention of blue-eye-seeking / human-growth-hormone-injecting crabs.” What I don’t find are any goals related to eugenics, Ronald Fisher, the heritability of educational attainment, if “educational attainment” is good terminology, racism, oppression, schools, the justice system, or how society should be organized. These were all non-goals for basically two reasons: (1) My views on those issues aren’t very interesting or notable. I didn’t think anyone would (or should) care about them. (2) Surely, there is some place in the world for things that just try to explain what heritability really means? If that’s what’s promised, then it seems weird to drop in a surprise morality / politics lecture.
At the same time, let me concede something else. The weird feeling you got as you read my post might be grounded in statistical truth. That is, it might be true that many people who blog about things like heritability have social views you wouldn’t like. And it might be true that some of them pretend at truth-seeking but are mostly just charlatans out to promote those unliked-by-you social views. You’re dead wrong to think that’s what I’m doing. All your theories of things I’m trying to suggest or imply are unequivocally false. But given the statistical realities, I guess I can’t blame you too much for having your suspicions. So you might ask—if my goal is just to explain heritability, why not make that explicit? Why not have a disclaimer that says, “OK I understand that heritability is fraught and blah blah blah, but I just want to focus on the technical meaning because…”? One reason is that I think that’s boring and condescending. I don’t think people need me to tell them that heritability is fraught. You clearly did not need me to tell you that. Also, I don’t think such disclaimers make you look neutral. Everyone knows that people with certain social views (likely similar to yours) are more likely to give such disclaimers. And they apply the same style of statistical reasoning you used to conclude I might be a eugenicist. I don’t want people who disagree with those social views to think they can’t trust me. Paradoxically, such disclaimers often seem to invite more objections from people who share the views they’re correlated with, too. Perhaps that’s because the more signals we get that someone is on “our” side, the more we tend to notice ideological violations.
(I’d refer here to the narcissism of small differences, though I worry you may find that reference objectionable.) If you want to focus on the facts, the best strategy seems to be serene and spiky: to demonstrate by your actions that you are on no one’s side, that you don’t care about being on anyone’s side, and that your only loyalty is to readers who want to understand the facts and make up their own damned minds about everything else. I’m not offended by your comment. I do think it’s a little strange that you’d publicly suggest someone might be a eugenicist on the basis of such limited evidence. But no one is forcing me to write things and put them on the internet. The reason I’m writing to you is that you were polite and civil and seem well-intentioned. So I wanted you to know that your world model is inaccurate. You seem to think that because my post did not explicitly support your social views, it must have been written with the goal of undermining those views. And that is wrong. The truth is, I wrote that post without supporting your (or any) social views because I think mixing up facts and social views is bad. Partly, that’s just an aesthetic preference. But if I’m being fully upfront, I also think it’s bad in the consequentialist sense that it makes the world a worse place. Why do I think this? Well, recall that I pointed out that if there were crabs that injected blue-eyed babies with human growth hormone, that would increase the heritability of height. You suggest I had sinister motives for giving this example, as if I was trying to conceal the corollary that if the environment provided more resources to people with certain genes (e.g. skin color) that could increase the heritability of other things (e.g. educational attainment). Do you really think you’re the only reader to notice that corollary? The degree to which things are “heritable” depends on the nature of society. This is a fact. It’s a fact that many people are not aware of.
It’s also a fact that—I guess—fits pretty well with your social views. I wanted people to understand that. Not out of loyalty to your social views, but because it is true. It seems that you’re annoyed that I didn’t phrase all my examples in terms of culture war. I could have done that. But I didn’t, because I think my examples are easier to understand, and because the degree to which changing society might change the heritability of some trait is a contentious empirical question. But OK. Imagine I had done that. And imagine all the examples were perfectly aligned with your social views. Do you think that would have made the post more or less effective in convincing people that the fact we’re talking about is true? I think the answer is: Far less effective. I’ll leave you with two questions: Question 1: Do you care about the facts? Do you believe the facts are on your side? Question 2: Did you really think I wrote that post with the goal of promoting eugenics? If you really did think that, then great! I imagine you’ll be interested to learn that you were incorrect. But just as you had an alarm beeping in your head as you read my post, I had one beeping in my head as I read your comment. My alarm was that you were playing a bit of a game. It’s not that you really think I wanted to promote eugenics, but rather that you’re trying to enforce a norm that everyone must give constant screaming support to your social views and anyone who’s even slightly ambiguous should be ostracized. Of course, this might be a false alarm! But if that is what you’re doing, I have to tell you: I think that’s a dirty trick, and a perfect example of why mixing facts and social views is bad. You may disagree with all my motivations. That’s fine. (I won’t assume that means you are a eugenicist.) All I ask is that you disapprove accurately. xox dynomight

emiruz 6 months ago

A short statistical reasoning test

Here are a few practical questions of my own invention which are easy to comprehend but very difficult to solve without statistical reasoning competence. They are provided in order of difficulty. The answers are at the end. If you find errors or have elegant alternative solutions, please email me (address in bio)!
QUESTIONS
1. Sorting fractions under uncertainty
You are given the number of trials and successes for a set of items, and you are asked to sort them by the fraction #successes / #trials.
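To make the difficulty concrete, here is a small Python sketch. It is my illustration, not the author's solution (that is given at the end of the original post). It sorts three hypothetical items by the raw fraction and by the posterior mean under a uniform Beta(1, 1) prior, one common way of accounting for how few trials an item has. The item data and the choice of prior are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    successes: int
    trials: int


def raw_fraction(item: Item) -> float:
    """The naive sort key: #successes / #trials."""
    return item.successes / item.trials


def posterior_mean(item: Item, a: float = 1.0, b: float = 1.0) -> float:
    """Mean of the Beta(a + successes, b + failures) posterior.

    With a = b = 1 (a uniform prior) this is Laplace's rule of succession,
    which pulls small samples toward 1/2.
    """
    return (item.successes + a) / (item.trials + a + b)


# Hypothetical items, chosen so the two sort keys disagree.
items = [
    Item("one lucky trial", 1, 1),
    Item("long track record", 95, 100),
    Item("small sample", 4, 5),
]

by_raw = [i.name for i in sorted(items, key=raw_fraction, reverse=True)]
by_posterior = [i.name for i in sorted(items, key=posterior_mean, reverse=True)]
print(by_raw)
print(by_posterior)
```

The raw fraction ranks a single success above 95 out of 100; the posterior mean puts it last. Whether a posterior mean, a lower confidence bound, or something else is the right key depends on what the sort is for.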

DYNOMIGHT 6 months ago

Futarchy’s fundamental flaw — the market — the blog post

Here’s our story so far: Markets are a good way to know what people really think. When India and Pakistan started firing missiles at each other on May 7, I was concerned, what with them both having nuclear weapons. But then I looked at world market prices: See how it crashes on May 7? Me neither. I found that reassuring. But we care about lots of stuff that isn’t always reflected in stock prices, e.g. the outcomes of elections or drug trials. So why not create markets for those, too? If you create contracts that pay out $1 only if some drug trial succeeds, then the prices will reflect what people “really” think. In fact, why don’t we use markets to make decisions? Say you’ve invented two new drugs, but only have enough money to run one trial. Why don’t you create markets for both drugs, then run the trial on the drug that gets a higher price? Contracts for the “winning” drug are resolved based on the trial, while contracts in the other market are cancelled so everyone gets their money back. That’s the idea of Futarchy, which Robin Hanson proposed in 2007. Why don’t we? Well, maybe it won’t work. In 2022, I wrote a post arguing that when you cancel one of the markets, you screw up the incentives for how people should bid, meaning prices won’t reflect the causal impact of different choices. I suggested prices reflect “correlation” rather than causation, for basically the same reason this happens with observational statistics. This post, it was magnificent. It didn’t convince anyone. Years went by. I spent a lot of time reading Bourdieu and worrying about why I buy certain kinds of beer. Gradually I discovered that essentially the same point about futarchy had been made earlier by, e.g., Anders_H in 2015, abramdemski in 2017, and Luzka in 2021. In early 2025, I went to a conference and got into a bunch of (friendly) debates about this. I was astonished to find that verbally repeating the arguments from my post did not convince anyone.
I even immodestly asked one person to read my post on the spot. (Bloggers: Do not do that.) That sort of worked. So, I decided to try again. I wrote another post called “Futarky’s Futarchy’s fundamental flaw”. It made the same argument with more aggression, with clearer examples, and with a new impossibility theorem that showed there doesn’t even exist any alternate payout function that would incentivize people to bid according to their causal beliefs. That post… also didn’t convince anyone. In the discussion on LessWrong, many of my comments are upvoted for quality but downvoted for accuracy, which I think means, “nice try champ; have a head pat; nah.” Robin Hanson wrote a response, albeit without outward evidence of reading beyond the first paragraph. Even the people who agreed with me often seemed to interpret me as arguing that futarchy satisfies evidential decision theory rather than causal decision theory. Which was weird, given that I never mentioned either of those, don’t accept the premise that futarchy satisfies either of them, and don’t find the distinction helpful in this context. In my darkest moments, I started to wonder if I might fail to achieve worldwide consensus that futarchy doesn’t estimate causal effects. I figured I’d wait a few years and then launch another salvo. But then, legendary human Bolton Bailey decided to stop theorizing and take one of my thought experiments and turn it into an actual experiment. Thus, Futarchy’s fundamental flaw — the market was born. (You are now reading a blog post about that market.) I gave a thought experiment where there are two coins and the market is trying to pick the one that’s more likely to land heads. For one coin, the bias is known, while for the other coin there’s uncertainty. I claimed futarchy would select the worse / wrong coin, due to this extra uncertainty. Bolton formalized this as follows: There are two markets, one for coin A and one for coin B.
Coin A is a normal coin that lands heads 60% of the time. Coin B is a trick coin that either always lands heads or always lands tails; we just don’t know which. There’s a 59% chance it’s an always-heads coin. Twenty-four hours before markets close, the true nature of coin B is revealed. After the markets close, whichever coin has a higher price is flipped and contracts pay out $1 for heads and $0 for tails. The other market is cancelled so everyone gets their money back. Get that? Everyone knows that there’s a 60% chance coin A will land heads and a 59% chance coin B will land heads. But for coin A, that represents true “aleatoric” uncertainty, while for coin B that represents “epistemic” uncertainty due to a lack of knowledge. (See “Bayes is not a phase” for more on “aleatoric” vs. “epistemic” uncertainty.) Bolton created that market independently. At the time, we’d never communicated about this or anything else. To this day, I have no idea what he thinks about my argument or what he expected to happen. In the forum for the market, there was a lot of debate about “whalebait”. Here’s the concern: Say you’ve bought a lot of contracts for coin B, but it emerges that coin B is always-tails. If you have a lot of money, then you might go in at the last second and buy a ton of contracts on coin A to try to force the market price above coin B, so the coin B market is cancelled and you get your money back. The conversation seemed to converge towards the idea that this was whalebait. Though notice that if you’re buying contracts for coin A at any price above $0.60, you’re basically giving away free money. It could still work, but it’s dangerous and everyone else has an incentive to stop you. If I were betting in this market, I’d think that this was at least unlikely. Bolton posted about the market. When I first saw the rules, I thought it wasn’t a valid test of my theory and wasted a huge amount of Bolton’s time trying to propose other experiments that would “fix” it.
Bolton was very patient, but I eventually realized that it was completely fine and there was nothing to fix. At the time, this is what the prices looked like: That is, at the time, both coins were priced at $0.60, which is not what I had predicted. Nevertheless, I publicly agreed that this was a valid test of my claims: “I think this is a great test and look forward to seeing the results.” Let me reiterate why I thought the markets were wrong and coin B deserved a higher price. There’s a 59% chance coin B would turn out to be all-heads. If that happened, then (absent whales being baited) I thought the coin B market would activate, so contracts are worth $1. So that’s 59% × $1 = $0.59 of value. But if coin B turns out to be all-tails, I thought there is a good chance prices for coin B would drop below coin A, so the market is cancelled and you get your money back. So I thought a contract had to be worth more than $0.59. If you buy a contract for coin B for $0.70, then I think that’s worth
\[ 0.59 \times \$1 + 0.41 \times P(\text{cancelled} \mid \text{all-tails}) \times \$0.70 = \$0.59 + \$0.287 \times P(\text{cancelled} \mid \text{all-tails}). \]
Surely \(P(\text{cancelled} \mid \text{all-tails})\) isn’t that low. So surely this is worth more than $0.59. More generally, say you buy a YES contract for coin B for $M. Then that contract would be worth
\[ 0.59 \times \$1 + 0.41 \times P(\text{cancelled} \mid \text{all-tails}) \times \$M. \]
It’s not hard to show that the breakeven price is
\[ M = \frac{0.59}{1 - 0.41 \times P(\text{cancelled} \mid \text{all-tails})}. \]
Even if you thought \(P(\text{cancelled} \mid \text{all-tails})\) was only 50%, then the breakeven price would still be $0.7421. Within a few hours, a few people bought contracts on coin B, driving up the price. Then, Quroe proposed creating derivative markets. In theory, if there was a market asking if coin A was going to resolve YES, NO, or N/A, supposedly people could arbitrage their bets accordingly and make this market calibrated. Same for a similar market on coin B. Thus, Futarchy’s Fundamental Fix - Coin A and Futarchy’s Fundamental Fix - Coin B came to be. These were markets in which people could bid on the probability that each coin would resolve YES, meaning the coin was flipped and landed heads, NO, meaning the coin was flipped and landed tails, or N/A, meaning the market was cancelled.
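The breakeven arithmetic from the math box is easy to check numerically. In this Python sketch, `q` is my shorthand (not the post's) for the probability that the coin-B market is cancelled given that coin B is all-tails. It reproduces the $0.7421 breakeven price for q = 50% and also includes the inverse formula, which the post applies below at M = $0.90. Like the math box, it ignores fees and whale games.

```python
P_ALL_HEADS = 0.59              # chance coin B is the always-heads coin (market rules)
P_ALL_TAILS = 1 - P_ALL_HEADS


def expected_payout(price: float, q: float) -> float:
    """Expected payout of one coin-B YES contract bought at `price`.

    All-heads: the coin-B market activates and the contract pays $1.
    All-tails: with probability q the market is cancelled and you get
    `price` back; otherwise it activates and the contract pays $0.
    """
    return P_ALL_HEADS * 1.0 + P_ALL_TAILS * q * price


def breakeven_price(q: float) -> float:
    # Solve M = 0.59 + 0.41 * q * M for M.
    return P_ALL_HEADS / (1 - P_ALL_TAILS * q)


def implied_cancel_prob(price: float) -> float:
    # Invert breakeven_price: q = (M - 0.59) / (0.41 * M).
    return (price - P_ALL_HEADS) / (P_ALL_TAILS * price)


print(round(expected_payout(0.70, 0.5), 4))
print(round(breakeven_price(0.5), 4))
print(round(implied_cancel_prob(0.90), 2))
```

At the breakeven price, `expected_payout(M, q)` equals `M` by construction, which is a quick way to confirm the algebra.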
Honestly, I didn’t understand this. I saw no reason that these derivative markets would make people bid their true beliefs. If they did, then my whole theory that markets reflect correlation rather than causation would be invalidated. Prices for coin B went up and down, but mostly up. Eventually, a few people created large limit orders, which caused things to stabilize. Here was the derivative market for coin A. And here was the market for coin B. During this period, not a whole hell of a lot happened. This brings us up to the moment of truth, when the true nature of coin B was to be revealed. At this point, coin B was at $0.90, even though everyone knew it only had a 59% chance of being heads. The nature of the coin was revealed. To show this was fair, Bolton asked a bot to publicly generate a random number. Thus, coin B was determined to be always-heads. There were still 24 hours left to bid. At this point, a contract for coin B was guaranteed to pay out $1. The market quickly jumped to $1. I was right. Everyone knew coin A had a higher chance of being heads than coin B, but everyone bid the price of coin B way above coin A anyway. In the previous math box, we saw that the breakeven price should satisfy
\[ M = \frac{0.59}{1 - 0.41 \times P(\text{cancelled} \mid \text{all-tails})}. \]
If you invert this and plug in M = $0.90, then you get
\[ P(\text{cancelled} \mid \text{all-tails}) = \frac{M - 0.59}{0.41 \times M} = \frac{0.90 - 0.59}{0.41 \times 0.90} \approx 0.84. \]
I’ll now open the floor for questions. Isn’t this market unrealistic? Yes, but that’s kind of the point. I created the thought experiment because I wanted to make the problem maximally obvious, because it’s subtle and everyone is determined to deny that it exists. Isn’t this just a weird probability thing? Why does this show futarchy is flawed? The fact that this is possible is concerning. If this can happen, then futarchy does not work in general. If you want to claim that futarchy works, then you need to spell out exactly what extra assumptions you’re adding to guarantee that this kind of thing won’t happen. But prices did reflect causality when the market closed!
Doesn’t that mean this isn’t a valid test? No. That’s just a quirk of the implementation. You can easily create situations that would have the same issue all the way through market close. Here’s one way you could do that:
Let coin A be heads with probability 60%. This is public information.
Let coin B be an ALWAYS HEADS coin with probability 59% and ALWAYS TAILS coin with probability 41%. This is a secret.
Every day, generate a random integer between 1 and 30. If it’s 1, immediately resolve the markets. If it’s 2, reveal the nature of coin B. If it’s between 3 and 30, do nothing.
On average, this market will run for 30 days. (The length follows a geometric distribution.) Half the time, the market will close without the nature of coin B being revealed. Even when that happens, I claim the price for coin B will still be above coin A. If futarchy is flawed, shouldn’t you be able to show that without this weird step of “revealing” coin B? Yes. You should be able to do that, and I think you can. Here’s one way:
Let coin A be heads with probability 60%. This is public information.
Sample 20 random bits. Let coin B be heads with probability (49+N)% where N is the number of 1 bits. Do not reveal these bits publicly.
Secretly send these bits to the first 20 people who ask.
First, have users generate public keys by running this command: Second, they should post the contents of their public key when asking for their bit. For example: Third, whoever is running the market should save that key to a file, pick a bit, and encrypt it like this: Users can then decrypt like this: Or you could use email… I think this market captures a dynamic that’s present in basically any use of futarchy: You have some information, but you know other information is out there. I claim that this market will be weird. Say it just opened. If you didn’t get a bit, then as far as you know, the bias for coin B could be anywhere between 49% and 69%, with a mean of 59%. If you did get a bit, then it turns out that the posterior mean is 58.5% if you got a 0 and 59.5% if you got a 1. So either way, your best guess is very close to 59%. However, the information for the true bias of coin B is out there! Surely coin B is more likely to end up with a higher price in situations where there are lots of 1 bits. This means you should bid at least a little higher than your true belief, for the same reason as the main experiment: the market activating is correlated with the true bias of coin B. Of course, after the markets open, people will see each other’s bids and… something will happen. Initially, I think prices will be strongly biased for the above reasons. But as you get closer to market close, there’s less time for information to spread. If you are the last person to trade, and you know you’re the last person to trade, then you should do so based on your true beliefs. Except, everyone knows that there’s less time for information to spread. So while you are waiting till the last minute to reveal your true beliefs, everyone else will do the same thing. So maybe people sort of rush in at the last second? (It would be easier to think about this if implemented with batched auctions rather than a real-time market.) Anyway, while the game theory is vexing, I think there’s a mix of (1) people bidding higher than their true beliefs due to correlations between the final price and the true bias of coin B and (2) people “racing” to make the final bid before the markets close. Both of these seem in conflict with the idea of prediction markets making people share information and measuring collective beliefs. Why do you hate futarchy? I like futarchy. I think society doesn’t make decisions very well, and I think we should give much more attention to new ideas like futarchy that might help us do better. I just think we should be aware of its imperfections and consider variants (e.g. committing to randomization) that would resolve them. If I claim futarchy does reflect causal effects, and I reject this experiment as invalid, should I specify what restrictions I want to place on “valid” experiments (and thus make explicit the assumptions under which I claim futarchy works) since otherwise my claims are unfalsifiable?
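The numbers quoted for the two variant markets above can be sanity-checked with a short Python sketch: the random-length market should run 30 days on average and close with coin B unrevealed about half the time, and a single secret bit should only move the expected bias of coin B to 58.5% or 59.5%. The function names, the seed, and the number of simulated runs are arbitrary choices for illustration.

```python
import random
import statistics


def simulate_random_length(rng: random.Random) -> tuple:
    """One run of the random-length market: returns (days, coin_b_revealed)."""
    revealed = False
    day = 0
    while True:
        day += 1
        roll = rng.randint(1, 30)  # uniform over 1..30, both ends included
        if roll == 1:
            return day, revealed   # markets resolve today
        if roll == 2:
            revealed = True        # nature of coin B becomes public
        # rolls 3..30: nothing happens


def posterior_mean_bias(my_bit=None, n_bits: int = 20) -> float:
    """Expected bias of coin B, in percent, given at most one of the secret bits.

    Bias is (49 + N)% where N is the number of 1s among n_bits fair bits.
    Each bit you have not seen contributes 0.5 to the expected value of N.
    """
    if my_bit is None:
        return 49 + 0.5 * n_bits
    return 49 + my_bit + 0.5 * (n_bits - 1)


rng = random.Random(0)
runs = [simulate_random_length(rng) for _ in range(20_000)]
mean_days = statistics.fmean(days for days, _ in runs)
share_unrevealed = statistics.fmean(0.0 if revealed else 1.0 for _, revealed in runs)
print(round(mean_days, 1), round(share_unrevealed, 3))  # roughly 30 and 0.5
print(posterior_mean_bias(), posterior_mean_bias(0), posterior_mean_bias(1))  # 59.0 58.5 59.5
```

The 50% figure follows because each day a 1 and a 2 are equally likely, so whichever comes first is a coin flip; the simulation just confirms it.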

DYNOMIGHT 6 months ago

Heritability puzzlers

The heritability wars have been a-raging. Watching these, I couldn’t help but notice that there’s near-universal confusion about what “heritable” means. Partly, that’s because it’s a subtle concept. But it also seems relevant that almost all explanations of heritability are very, very confusing. For example, here’s Wikipedia’s definition:
Any particular phenotype can be modeled as the sum of genetic and environmental effects:
\[ \text{Phenotype}(P) = \text{Genotype}(G) + \text{Environment}(E). \]
Likewise the phenotypic variance in the trait, \(\operatorname{Var}(P)\), is the sum of effects as follows:
\[ \operatorname{Var}(P) = \operatorname{Var}(G) + \operatorname{Var}(E) + 2\operatorname{Cov}(G, E). \]
In a planned experiment \(\operatorname{Cov}(G, E)\) can be controlled and held at 0. In this case, heritability, \(H^2\), is defined as
\[ H^2 = \frac{\operatorname{Var}(G)}{\operatorname{Var}(P)}. \]
\(H^2\) is the broad-sense heritability.
Do you find that helpful? I hope not, because it’s a mishmash of undefined terminology, unnecessary equations, and borderline-false statements. If you’re in the mood for a mini-polemic: Reading this almost does more harm than good. While the final definition is correct, it never even attempts to explain what G and P are, it gives an incorrect condition for when the definition applies, and instead mostly devotes itself to an unnecessary digression about environmental effects. The rest of the page doesn’t get much better. Despite the page being 6700 words long, I think it would be impossible to understand heritability simply by reading it. Meanwhile, some people argue that heritability is meaningless for human traits like intelligence or income or personality. They claim that those traits are the product of complex interactions between genes and the environment and it’s impossible to disentangle the two. These arguments have always struck me as “suspiciously convenient”. I figured that the people making them couldn’t cope with the hard reality that genes are very important and have an enormous influence on what we are. But I increasingly feel that the skeptics have a point.
While I think it’s a fact that most human traits are substantially heritable, it’s also true that the technical definition of heritability is really weird, and simply does not mean what most people think it means. In this post, I will explain exactly what heritability is, while assuming no background. I will skip everything that can be skipped but—unlike most explanations—I will not skip things that can’t be skipped. Then I’ll go through a series of puzzles demonstrating just how strange heritability is. How tall you are depends on your genes, but also on what you eat, what diseases you got as a child, and how much gravity there is on your home planet. And all those things interact. How do you take all that complexity and reduce it to a single number, like “80% heritable”? The short answer is: Statistical brute force. The long answer is: Read the rest of this post. It turns out that the hard part of heritability isn’t heritability. Lurking in the background is a slippery concept known as a genotypic value. Discussions of heritability often skim past these. Quite possibly, just looking at the words “genotypic value”, you are thinking about skimming ahead right now. Resist that urge! Genotypic values are the core concept, and without them you cannot possibly understand heritability. For any trait, your genotypic value is the “typical” outcome if someone with your DNA were raised in many different random environments. In principle, if you wanted to know your genotypic height, you’d need to do this: Since you can’t / shouldn’t do that, you’ll never know your genotypic height. But that’s how it’s defined in principle—the average height someone with your DNA would grow to in a random environment. If you got lots of food and medical care as a child, your actual height is probably above your genotypic height. If you suffered from rickets, your actual height is probably lower than your genotypic height. Comfortable with genotypic values? OK.
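The “raise someone with your DNA in many random environments” definition translates directly into a Monte Carlo average. This Python sketch uses a made-up height model: the `height_cm` formula, its coefficients, and the uniform `nutrition` variable are all hypothetical. It deliberately includes a gene-by-environment interaction, since the definition of a genotypic value does not require genes and environment to add up separately.

```python
import random
import statistics


def height_cm(dna_effect: float, nutrition: float, rng: random.Random) -> float:
    """Hypothetical toy model of adult height.

    `nutrition` runs from 0 (starved) to 1 (well fed). The product term is a
    gene-by-environment interaction: genes matter more when nutrition is good.
    """
    return 160 + dna_effect * (0.5 + nutrition) + 8 * nutrition + rng.gauss(0, 2)


def genotypic_height(dna_effect: float, n_environments: int = 50_000, seed: int = 0) -> float:
    """Average height of someone with this DNA over many random environments."""
    rng = random.Random(seed)
    return statistics.fmean(
        height_cm(dna_effect, rng.random(), rng) for _ in range(n_environments)
    )


geno = genotypic_height(5.0)  # close to 169 in this toy model

# The same DNA, always raised with perfect nutrition.
rng = random.Random(1)
well_fed = statistics.fmean(height_cm(5.0, 1.0, rng) for _ in range(10_000))  # close to 175.5

print(round(geno, 1), round(well_fed, 1))
```

In this toy model the well-fed version of the same DNA ends up above its genotypic value, like the child with lots of food and medical care in the text.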
Then (broad-sense) heritability is easy. It’s the ratio
\[ \text{heritability} = \frac{\operatorname{Var}(\text{genotypic height})}{\operatorname{Var}(\text{height})}. \]
Here, \(\operatorname{Var}(\text{height})\) is the variance, basically just how much things vary in the population. Among all adults worldwide, \(\operatorname{Var}(\text{height})\) is around 50 cm². (Incidentally, did you know that variance was invented for the purpose of defining heritability?) Meanwhile, \(\operatorname{Var}(\text{genotypic height})\) is how much genotypic height varies in the population. That might seem hopeless to estimate, given that we don’t know anyone’s genotypic height. But it turns out that we can still estimate the variance using, e.g., pairs of adopted twins, and it’s thought to be around 40 cm². If we use those numbers, the heritability of height would be
\[ \frac{\operatorname{Var}(\text{genotypic height})}{\operatorname{Var}(\text{height})} \approx \frac{40\ \text{cm}^2}{50\ \text{cm}^2} = 0.8. \]
People often convert this to a percentage and say “height is 80% heritable”. I’m not sure I like that, since it masks heritability’s true nature as a ratio. But everyone does it, so I’ll do it too. People who really want to be intimidating might also say, “genes explain 80% of the variance in height”. Of course, basically the same definition works for any trait, like weight or income or fondness for pseudonymous existential angst science blogs. But instead of replacing “height” with “trait”, biologists have invented the ultra-fancy word “phenotype” and write
\[ \text{heritability} = \frac{\operatorname{Var}(\text{genotype})}{\operatorname{Var}(\text{phenotype})}. \]
The word “phenotype” suggests some magical concept that would take years of study to understand. But don’t be intimidated. It just means the actual observed value of some trait(s). You can measure your phenotypic height with a tape measure. Let me make two points before moving on. First, this definition of heritability assumes nothing. We are not assuming that genes are independent of the environment or that “genotypic effects” combine linearly with “environmental effects”. We are not assuming that genes are in Hardy-Weinberg equilibrium, whatever that is. No. I didn’t talk about that stuff because I don’t need to. There are no hidden assumptions. The above definition always works.
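As a sketch of the ratio in Python: the first line is just the post's 40 cm² / 50 cm² arithmetic. The second part builds a hypothetical toy population whose genotypic heights have variance 40 cm², adds independent environmental deviations with variance 10 cm², and measures the ratio directly. Independence and additivity are conveniences of this toy example, not requirements of the definition.

```python
import random
import statistics


def broad_sense_heritability(genotypic, phenotypic):
    """Var(genotypic values) / Var(observed values) over one population."""
    return statistics.pvariance(genotypic) / statistics.pvariance(phenotypic)


# The post's rough numbers for adult height worldwide (cm^2).
h_simple = 40 / 50
print(h_simple)

# Toy population (hypothetical): genotypic heights with variance 40 cm^2,
# plus independent environmental deviations with variance 10 cm^2.
rng = random.Random(42)
genotypic = [rng.gauss(170, 40 ** 0.5) for _ in range(100_000)]
observed = [g + rng.gauss(0, 10 ** 0.5) for g in genotypic]
h_toy = broad_sense_heritability(genotypic, observed)
print(h_toy)  # close to 0.8, up to sampling noise
```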
Second, many normal English words have parallel technical meanings, such as “field”, “insulator”, “phase”, “measure”, “tree”, or “stack”. Those are all nice, because they’re evocative and it’s almost always clear from context which meaning is intended. But sometimes, scientists redefine existing words to mean something technical that overlaps but also contradicts the normal meaning, as in “salt”, “glass”, “normal”, “berry”, or “nut”. These all cause confusion, but “heritability” must be the most egregious case in all of science. Before you ever heard the technical definition of heritability, you surely had some fuzzy concept in your mind. Personally, I thought of heritability as meaning how many “points” you get from genes versus the environment. If charisma was 60% heritable, I pictured each person as having 10 total “charisma points”, 6 of which come from genes, and 4 from the environment. If you take nothing else from this post, please remember that the technical definition of heritability does not work like that. You might hope that if we add some plausible assumptions, the above ratio-based definition would simplify into something nice and natural, that aligns with what “heritability” means in normal English. But that does not happen. If that’s confusing, well, it’s not my fault. Not sure what’s happening here, but it seems relevant. So “heritability” is just the ratio of genotypic and phenotypic variance. Is that so bad? I think… maybe? How heritable is eye color? Close to 100%. This seems obvious, but let’s justify it using our definition that \(\text{heritability} = \operatorname{Var}(\text{genotype}) / \operatorname{Var}(\text{phenotype})\). Well, people have the same eye color, no matter what environment they are raised in. That means that genotypic eye color and phenotypic eye color are the same thing. So they have the same variance, and the ratio is 1. Nothing tricky here. How heritable is speaking Turkish? Close to 0%. Your native language is determined by your environment.
If you grow up in a family that speaks Turkish, you speak Turkish. Genes don’t matter. Of course, there are lots of genes that are correlated with speaking Turkish, since Turks are not, genetically speaking, a random sample of the global population. But that doesn’t matter, because if you put Turkish babies in Korean households, they speak Korean. Genotypic values are defined by what happens in a random environment, which breaks the correlation between speaking Turkish and having Turkish genes. Since 1.1% of humans speak Turkish, the genotypic value for speaking Turkish is around 0.011 for everyone, no matter their DNA. Since that’s basically constant, the genotypic variance is near zero, and heritability is near zero. How heritable is speaking English? Perhaps 30%. Probably somewhere between 10% and 50%. Definitely more than zero. That’s right. Turkish isn’t heritable but English is. Yes it is. If you ask an LLM, it will tell you that the heritability of English is zero. But the LLM is wrong and I am right. Why? Let me first acknowledge that Turkish is a little bit heritable. For one thing, some people have genes that make them non-verbal. And there’s surely some genetic basis for being a crazy polyglot who learns many languages for fun. But speaking Turkish as a second language is quite rare, meaning that the genotypic value of speaking Turkish is close to 0.011 for almost everyone. English is different. While only 1 in 20 people in the world speak English as a first language, 1 in 7 learn it as a second language. And who does that? Educated people. And educational attainment is substantially heritable, with broad-sense estimates around 40%. Some argue the heritability of educational attainment is much lower. I’d like to avoid debating the exact numbers, but note that these lower numbers are usually estimates of “narrow-sense” heritability rather than the “broad-sense” heritability we’re talking about. So they should be lower. (I’ll explain the difference later.)
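To make the Turkish case concrete, here is a small Python sketch that treats speaking Turkish as a 0/1 trait. It idealizes the text’s “around 0.011 for everyone” as exactly 0.011, ignoring the small genetic contributions acknowledged above; the helper name `variance` is hypothetical.

```python
P_TURKISH = 0.011  # share of humans who speak Turkish, as quoted in the text


def variance(values, weights):
    """Population variance of a discrete distribution (values with probabilities)."""
    mean = sum(v * w for v, w in zip(values, weights))
    return sum(w * (v - mean) ** 2 for v, w in zip(values, weights))


# Phenotype: 1 if a person speaks Turkish, 0 otherwise.
var_phenotypic = variance([1.0, 0.0], [P_TURKISH, 1 - P_TURKISH])

# Genotype (idealized): raised in a random environment, everyone has the same
# chance of ending up a Turkish speaker, whatever their DNA.
var_genotypic = variance([P_TURKISH], [1.0])

print(var_phenotypic)                  # equals p * (1 - p), roughly 0.0109
print(var_genotypic)                   # 0.0
print(var_genotypic / var_phenotypic)  # 0.0
```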
It’s entirely possible that broad-sense heritability is lower than 40%, but everyone agrees it’s much larger than zero. So the heritability of English is surely much larger than zero, too. Say there’s an island where genes have no impact on height. How heritable is height among people on this island? 0%. There’s nothing tricky here. Say there’s an island where genes entirely determine height. How heritable is height? 100%. Again, nothing tricky. Say there’s an island where neither genes nor the environment influence height and everyone is exactly 165 cm tall. How heritable is height? It’s undefined. In this case, everyone has exactly the same phenotypic and genotypic height, namely 165 cm. Since those are both constant, their variance is zero and heritability is zero divided by zero. That’s meaningless. Say there’s an island where some people have genes that predispose them to be taller than others. But the island is ruled by a cruel despot who denies food to children with taller genes, so that on average, everyone is 165 ± 5 cm tall. How heritable is height? 0%. On this island, everyone has a genotypic height of 165 cm. So genotypic variance is zero, but phenotypic variance is positive, due to the ± 5 cm random variation. So heritability is zero divided by some positive number, which is zero. Say there’s an island where some people have genes that predispose them to be tall and some have genes that predispose them to be short. But the same genes that make you tall also make you semi-starve your children, so in practice everyone is exactly 165 cm tall. How heritable is height? ∞%. Not 100%, mind you, infinitely heritable. To see why, note that if babies with short/tall genes are adopted by parents with short/tall genes, there are four possible cases: short-gene babies with short-gene parents are fed normally; short-gene babies with tall-gene parents are semi-starved, so they end up shorter; tall-gene babies with short-gene parents are fed normally, so they end up taller; and tall-gene babies with tall-gene parents are semi-starved. If a baby with short genes is adopted into a random family, they will be shorter on average than a baby with tall genes would be. So genotypic height varies. However, in reality, everyone is the same height, so phenotypic height is constant.
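Here is the infinitely heritable island as a Python sketch. The specific numbers (tall genes add 10 cm, being semi-starved removes 10 cm, half the population carries each kind of gene) are hypothetical, and so is the assumption that in reality everyone is raised by parents with the same genes as their own.

```python
import math

# Hypothetical numbers, chosen only to illustrate the puzzle.
BASE_CM = 165.0       # adult height with short genes and normal feeding
TALL_BONUS_CM = 10.0  # extra height from tall genes
STARVED_CM = -10.0    # height lost when raised by tall-gene (semi-starving) parents


def height(baby_genes: str, parent_genes: str) -> float:
    """Adult height for one combination of baby genes and (adoptive) parent genes."""
    h = BASE_CM
    if baby_genes == "tall":
        h += TALL_BONUS_CM
    if parent_genes == "tall":
        h += STARVED_CM
    return h


def pvariance(xs):
    """Population variance (divide by n)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


types = ["short", "tall"]  # assume half the population carries each

# The four adoption cases from the text.
for baby in types:
    for parents in types:
        print(f"baby {baby:5} / parents {parents:5}: {height(baby, parents)} cm")

# Genotypic height: average over adoption into a random family.
genotypic = [sum(height(b, p) for p in types) / len(types) for b in types]
# Reality: everyone is raised by parents carrying the same genes as their own.
phenotypic = [height(b, b) for b in types]

var_g, var_p = pvariance(genotypic), pvariance(phenotypic)
print(genotypic, phenotypic)  # [160.0, 170.0] [165.0, 165.0]
print(var_g, var_p)           # 25.0 0.0
print("heritability:", math.inf if var_p == 0 and var_g > 0 else var_g / var_p)
```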
So genotypic variance is positive while phenotypic variance is zero. Thus, heritability is some positive number divided by zero, i.e. infinity. (Are you worried that humans are “diploid”, with two genes (alleles) at each locus, one from each biological parent? Or that when there are multiple parents, they all tend to have thoughts on the merits of semi-starvation? If so, please pretend people on this island reproduce asexually. Or, if you like, pretend that there’s strong assortative mating so that everyone either has all-short or all-tall genes and only breeds with similar people. Also, don’t fight the hypothetical.) Say there are two islands. People on both islands live the same way and have the same gene pool, except that people on island A have some gene that makes them grow to be 150 ± 5 cm tall, while on island B they have a gene that makes them grow to be 160 ± 5 cm tall. How heritable is height? It’s 0% for island A and 0% for island B, and 50% for the two islands together. Why? Well on island A, everyone has the same genotypic height, namely 150 cm. Since that’s constant, genotypic variance is zero. Meanwhile, phenotypic height varies a bit, so phenotypic variance is positive. Thus, heritability is zero. For similar reasons, heritability is zero on island B. But if you put the two islands together, half of people have a genotypic height of 150 cm and half have a genotypic height of 160 cm, so suddenly (via math) genotypic variance is 25 cm². There’s some extra random variation so (via more math) phenotypic variance turns out to be 50 cm². So heritability is 25 / 50 = 50%. (The math: if you combine the populations, the mean genotypic height is 155 cm, so genotypic variance is $\tfrac{1}{2}(150 - 155)^2 + \tfrac{1}{2}(160 - 155)^2 = 25$ cm². Meanwhile phenotypic variance is the within-island variance, $5^2 = 25$ cm² if we read “± 5 cm” as a standard deviation, plus the 25 cm² of variance between the islands, for a total of $25 + 25 = 50$ cm².) Say there’s an island where neither genes nor the environment influence height. Except, some people have a gene that makes them inject their babies with human growth hormone, which makes the babies 5 cm taller. How heritable is height? 0%. True, people with that gene will tend to be taller.
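The two-island arithmetic can be checked in a few lines of Python. This sketch assumes the islands are the same size and reads “± 5 cm” as a standard deviation of 5 cm on both islands; under those assumptions, the combined phenotypic variance is the within-island variance plus the between-island variance (the law of total variance).

```python
def pvariance(xs):
    """Population variance (divide by n)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


ISLAND_MEANS_CM = {"A": 150.0, "B": 160.0}  # genotypic height on each island
WITHIN_SD_CM = 5.0                          # "± 5 cm" read as a standard deviation (assumption)
within_var = WITHIN_SD_CM ** 2

# Each island alone: everyone shares one genotypic height.
for name, mean in ISLAND_MEANS_CM.items():
    var_g = pvariance([mean])
    var_p = var_g + within_var
    print(name, var_g / var_p)  # A 0.0, then B 0.0

# Both islands together (equal sizes): within-island plus between-island variance.
var_g = pvariance(list(ISLAND_MEANS_CM.values()))
var_p = within_var + var_g
print(var_g, var_p, var_g / var_p)  # 25.0 50.0 0.5
```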
And the gene is causing them to be taller. But if babies are adopted into random families, it’s the genes of the parents that determine if they get injected or not. So everyone has the same genotypic height, genotypic variance is zero, and heritability is zero. Suppose there’s an island where neither genes nor the environment influence height. Except, some people have a gene that makes them, as babies, talk their parents into injecting them with human growth hormone. The babies are very persuasive. How heritable is height? We’re back to 100%. The difference with the previous scenario is that now babies with that gene get injected with human growth hormone no matter who their parents are. Since nothing else influences height, genotype and phenotype are the same, have the same variance, and heritability is 100%. Suppose there’s an island where neither genes nor the environment influence height. Except, there are crabs that seek out blue-eyed babies and inject them with human growth hormone. The crabs, they are unstoppable. How heritable is height? Again, 100%. Babies with DNA for blue eyes get injected. Babies without DNA for blue eyes don’t. Since nothing else influences height, genotype and phenotype are the same and heritability is 100%. Note that if the crabs were seeking out parents with blue eyes and then injecting their babies, then height would be 0% heritable. It doesn’t matter that human growth hormone is a weird thing that’s coming from outside the baby. It doesn’t matter if we think crabs should be semantically classified as part of “the environment”. It doesn’t matter that heritability would drop to zero if you killed all the crabs, or that the direct causal effect of the relevant genes has nothing to do with height. Heritability is a ratio and doesn’t care. So heritability can be high even when genes have no direct causal effect on the trait in question. It can be low even when there is a strong direct effect. It changes when the environment changes.
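The growth-hormone scenarios differ only in whose gene triggers the injection, and a small Python sketch makes that difference explicit. The 5 cm boost comes from the text; the 165 cm baseline and the 30% carrier share are hypothetical, as is the simplification that babies of carriers are carriers themselves. Exact fractions keep floating-point noise out of the ratios.

```python
from fractions import Fraction

# Hypothetical numbers; only the 5 cm boost comes from the text.
BASE_CM = Fraction(165)          # height without growth hormone
BOOST_CM = Fraction(5)           # extra height from the injections
CARRIER_SHARE = Fraction(3, 10)  # share of people (and of parents) carrying the gene
WEIGHTS = [CARRIER_SHARE, 1 - CARRIER_SHARE]  # groups: [carriers, non-carriers]


def pvariance(values):
    """Variance over the groups [carriers, non-carriers], weighted by group size."""
    mean = sum(v * w for v, w in zip(values, WEIGHTS))
    return sum(w * (v - mean) ** 2 for v, w in zip(values, WEIGHTS))


def heritability(genotypic, phenotypic):
    return pvariance(genotypic) / pvariance(phenotypic)


# Reality in both scenarios: carriers (whose parents are carriers too) get injected.
phenotypic = [BASE_CM + BOOST_CM, BASE_CM]

# Scenario 1: the parents' gene triggers the injection. Adopted into a random
# family, any baby is injected with probability CARRIER_SHARE, whatever its DNA.
genotypic_parent_driven = [BASE_CM + BOOST_CM * CARRIER_SHARE] * 2

# Scenario 2: the baby's own gene triggers it (persuasive babies, or crabs that
# hunt blue-eyed babies), so the baby gets injected in any family.
genotypic_baby_driven = [BASE_CM + BOOST_CM, BASE_CM]

print(float(heritability(genotypic_parent_driven, phenotypic)))  # 0.0
print(float(heritability(genotypic_baby_driven, phenotypic)))    # 1.0
```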
It even changes based on how you group people together. It can be larger than 100% or even undefined. Even so, I’m worried people might interpret this post as a long way of saying heritability is dumb and bad, trolololol. So I thought I’d mention that this is not my view. Say a bunch of companies create different LLMs and train them on different datasets. Some of the resulting LLMs are better at writing fiction than others. Now I ask you, “What percentage of the difference in fiction writing performance is due to the base model code, rather than the datasets or the GPUs or the learning rate schedules?” That’s a natural question. But if you put it to an AI expert, I bet you’ll get a funny look. You need code and data and GPUs to make an LLM. None of those things can write fiction by themselves. Experts would prefer to think about one change at a time: Given this model, changing the dataset in this way changes fiction writing performance this much. Similarly, for humans, I think what we really care about is interventions. If we changed this gene, could we eliminate a disease? If we educate children differently, can we make them healthier and happier? No single number can possibly contain all that information. But heritability is something. I think of it as saying how much hope we have to find an intervention by looking at changes in current genes or current environments. If heritability is high, then given current typical genes, you can’t influence the trait much through current typical environmental changes. If you only knew that eye color was 100% heritable, that would tell you that you won’t change your kid’s eye color by reading to them, or putting them on a vegetarian diet, or moving to higher altitude. But it’s conceivable you could do it by putting electromagnets under their bed or forcing them to communicate in interpretive dance. If heritability is high, that also means that given current typical environments, you can influence the trait through current typical genes.
If the world was ruled by an evil despot who forced red-haired people to take pancreatic cancer pills, then pancreatic cancer would be highly heritable. And you could change the odds someone gets pancreatic cancer by swapping in existing genes for black hair. If heritability is low, that means that given current typical environments, you can’t cause much difference through current typical genetic changes. If we only knew that speaking Turkish was ~0% heritable, that would tell us that doing embryo selection won’t much change the odds that your kid speaks Turkish. If heritability is low, that also means that given current typical genes, you might be able to change the trait through current typical environmental changes. If we only knew that speaking Turkish was 0% heritable, then that would mean there might be something you could do to change the odds your kid speaks Turkish, e.g. moving to Turkey. Or, it’s conceivable that it’s just random and moving to Turkey wouldn’t do anything. But be careful. Just because heritability is high doesn’t mean that changing genes is easy. And just because heritability is low doesn’t mean that changing the environment is easy. And heritability doesn’t say anything about non-typical environments or non-typical genes. If an evil despot is giving all the red-haired people cancer pills, perhaps we could solve that by intervening on the despot. And if you want your kid to speak Turkish, it’s possible that there are some crazy genetic modifications that would turn them into an unstoppable Turkish-learning machine. Heritability has no idea about any of that, because it’s just an observational statistic based on the world as it exists today. Heritability: Five Battles by Steven Byrnes. Covers similar issues in a way that’s more connected to the world and less shy about making empirical claims. A molecular genetics perspective on the heritability of human behavior and group differences by Alexander Gusev.
I find the quantitative genetics literature to be incredibly sloppy about notation and definitions and math. (Is this why LLMs are so bad at it?) This is the only source I’ve found that didn’t drive me completely insane. This post focused on “broad-sense” heritability. But there is a second heritability out there, called “narrow-sense”. Like broad-sense heritability, we can define the narrow-sense heritability of height as a ratio: $\text{narrow-sense heritability} = \mathrm{Var}(\text{additive height}) / \mathrm{Var}(\text{height})$. The difference is that rather than having genotypic height in the numerator, we now have “additive height”. To define that, imagine taking each of your genes, one at a time, inserting it into lots of random embryos, and measuring how far the resulting people’s average adult height ends up from the overall average. For example, say overall average human height is 150 cm, but when you insert gene #4023 from yourself into random embryos, their average height is 149.8 cm. Then the additive effect of your gene #4023 is -0.2 cm. Your “additive height” is average human height plus the sum of additive effects for each of your genes. If the average human height is 150 cm, you have one gene with a -0.2 cm additive effect, another gene with a +0.3 cm additive effect and the rest of your genes have no additive effect, then your “additive height” is 150 cm - 0.2 cm + 0.3 cm = 150.1 cm. Note: This terminology of “additive height” is non-standard. People usually define narrow-sense heritability using “additive effects”, which are the same thing but without including the mean. This doesn’t change anything since adding a constant doesn’t change the variance. But it’s easier to say “your additive height is 150.1 cm” than “the additive effect of your genes on height is +0.1 cm” so I’ll do that. Honestly, I don’t think the distinction between “broad-sense” and “narrow-sense” heritability is that important. We’ve already seen that broad-sense heritability is weird, and narrow-sense heritability is similar but different. So it won’t surprise you to learn that narrow-sense heritability is differently weird.
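The additive-height bookkeeping is simple enough to sketch in Python. The function names are hypothetical, and the numbers are the ones from the example above (a 150 cm average, one gene at -0.2 cm, another at +0.3 cm, every other gene at 0 cm).

```python
MEAN_HEIGHT_CM = 150.0  # overall average human height in the text's example


def additive_effect(avg_height_with_gene_cm: float, mean_cm: float = MEAN_HEIGHT_CM) -> float:
    """How far random embryos given this gene end up from the overall mean."""
    return avg_height_with_gene_cm - mean_cm


def additive_height(effects_cm, mean_cm: float = MEAN_HEIGHT_CM) -> float:
    """Overall mean plus the sum of the additive effects of all your genes."""
    return mean_cm + sum(effects_cm)


effect_4023 = additive_effect(149.8)
print(round(effect_4023, 10))                   # -0.2
# Genes with zero additive effect can be left out of the sum.
print(round(additive_height([-0.2, 0.3]), 10))  # 150.1
```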
But if you really want to understand the difference, I can offer you some more puzzles. Say there’s an island where people have two genes, each of which is equally likely to be A or B. People are 100 cm tall if they have an AA genotype, 150 cm tall if they have an AB or BA genotype, and 200 cm tall if they have a BB genotype. How heritable is height? Both broad and narrow-sense heritability are 100%. The explanation for broad-sense heritability is like many we’ve seen already. Genes entirely determine someone’s height, and so genotypic and phenotypic height are the same. For narrow-sense heritability, we need to calculate some additive heights. The overall mean is 150 cm, each A gene has an additive effect of -25 cm, and each B gene has an additive effect of +25 cm. Let’s work out the additive height for all four cases: AA gives 150 - 25 - 25 = 100 cm, AB and BA give 150 - 25 + 25 = 150 cm, and BB gives 150 + 25 + 25 = 200 cm. Since additive height is the same as phenotypic height in every case, narrow-sense heritability is also 100%. In this case, the two heritabilities were the same. At a high level, that’s because the genes act independently. When there are “gene-gene” interactions, you tend to get different numbers. Say there’s an island where people have two genes, each of which is equally likely to be A or B. People with AA or BB genomes are 100 cm, while people with AB or BA genomes are 200 cm. How heritable is height? Broad-sense heritability is 100%, while narrow-sense heritability is 0%. You know the story for broad-sense heritability by now. For narrow-sense heritability, we need to do a little math. It turns out that A and B genes both have an additive effect of 0 cm, so everyone has an additive height of 150 cm, no matter their genes. That’s constant, so narrow-sense heritability is zero. Why have two kinds of heritability at all? I think basically for two reasons: First, for some types of data (twin studies) it’s much easier to estimate broad-sense heritability. For other types of data (GWAS) it’s much easier to estimate narrow-sense heritability. So we take what we can get. Second, they’re useful for different things.
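Both two-gene islands can be worked through mechanically. This Python sketch (the function name is hypothetical) follows the definitions in the text: each gene is independently A or B with probability 1/2, genes fully determine height so genotypic and phenotypic height coincide, and an allele’s additive effect is found by putting it into one slot of a random embryo and averaging exactly over the other slot, instead of using a million embryos. Both islands are symmetric (AB and BA are the same height), so which slot gets replaced doesn’t matter.

```python
from itertools import product


def pvariance(xs):
    """Population variance (divide by n); all genotypes are equally likely."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def two_gene_heritabilities(height_cm):
    """Return (additive effects, additive heights, broad-sense, narrow-sense).

    height_cm maps "AA", "AB", "BA", "BB" to adult height in cm.
    """
    genotypes = ["".join(g) for g in product("AB", repeat=2)]  # AA, AB, BA, BB
    phenotypic = [height_cm[g] for g in genotypes]
    mean = sum(phenotypic) / len(phenotypic)

    # Put the allele in the first slot, average over the equally likely second
    # slot, and compare with the overall mean.
    effect = {
        allele: sum(height_cm[allele + other] for other in "AB") / 2 - mean
        for allele in "AB"
    }
    additive = [mean + effect[g[0]] + effect[g[1]] for g in genotypes]

    var_p = pvariance(phenotypic)
    var_g = var_p  # genes fully determine height on these islands
    return effect, additive, var_g / var_p, pvariance(additive) / var_p


additive_island = {"AA": 100, "AB": 150, "BA": 150, "BB": 200}
interaction_island = {"AA": 100, "AB": 200, "BA": 200, "BB": 100}

for name, island in [("additive", additive_island), ("interaction", interaction_island)]:
    effect, additive, broad, narrow = two_gene_heritabilities(island)
    print(name, effect, additive, broad, narrow)
# additive {'A': -25.0, 'B': 25.0} [100.0, 150.0, 150.0, 200.0] 1.0 1.0
# interaction {'A': 0.0, 'B': 0.0} [150.0, 150.0, 150.0, 150.0] 1.0 0.0
```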
Broad-sense heritability is defined by looking at what all your genes do together. That’s nice, since you are the product of all your genes working together. But combinations of genes are not well-preserved by reproduction. If you have a kid, then they breed with someone, their kids breed with other people, and so on. Generations later, any special combination of genes you might have is gone. So if you’re interested in the long-term impact of you having another kid, narrow-sense heritability might be the way to go. (Sexual reproduction doesn’t really allow for preserving the genetics that make you uniquely “you”. Remember, almost all your genes are shared by lots of other people. If you have any unique genes, that’s almost certainly because they have deleterious de novo mutations. From the perspective of evolution, your life just amounts to a tiny increase or decrease in the per-locus population frequencies of your individual genes. The participants in the game of evolution are genes. Living creatures like you are part of the playing field. Food for thought.) Phenotype ($P$) is never defined. This is a minor issue, since it just means “trait”. Genotype ($G$) is never defined. This is a huge issue, since it’s very tricky and heritability makes no sense without it. Environment ($E$) is never defined. This is worse than it seems, since in heritability, different people use “environment” and $E$ to refer to different things. When we write $P = G + E$, are we assuming some kind of linear interaction? The text implies not, but why? What does this equation mean? If this equation is always true, then why do people often add other stuff like $G \times E$ on the right? The text states that if you do a planned experiment (how?) and make $\mathrm{Cov}(G, E) = 0$, then heritability is $\mathrm{Var}(G) / \mathrm{Var}(P)$. But in fact, heritability is always defined that way. You don’t need a planned experiment and it’s fine if $\mathrm{Cov}(G, E) \neq 0$.
And, wait a second, that definition doesn’t refer to environmental effects at all. So what was the point of introducing them? What was the point of writing $P = G + E$? What are we doing?
How to find your genotypic height: Create a million embryonic clones of yourself. Implant them in the wombs of randomly chosen women around the world who were about to get pregnant on their own. Convince them to raise those babies exactly like a baby of their own. Wait 25 years, find all your clones and take their average height.
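The clone procedure above can be mimicked with a toy simulation. Everything about the height model here is hypothetical: adult height depends on a made-up genetic score and on childhood nutrition drawn uniformly at random, and the two interact (genes matter more for well-fed children), which the definition of genotypic height is happy to accept. Averaging over many simulated clones approximates the genotypic height.

```python
import random
import statistics


def adult_height_cm(genetic_score: float, nutrition: float) -> float:
    """Hypothetical toy model with a gene-environment interaction."""
    return 150.0 + genetic_score * nutrition + 5.0 * (nutrition - 1.0)


def genotypic_height_cm(genetic_score: float, n_clones: int = 100_000, seed: int = 0) -> float:
    """Average height of clones, each raised in a randomly drawn environment."""
    rng = random.Random(seed)
    heights = [
        adult_height_cm(genetic_score, rng.uniform(0.5, 1.5))  # random environment
        for _ in range(n_clones)
    ]
    return statistics.fmean(heights)


# Average nutrition is 1 in this toy model, so this should come out close to 160 cm.
print(genotypic_height_cm(10.0))
```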
How to find the additive effect of one of your genes: Find a million random women in the world who just became pregnant. For each of them, take your gene and insert it into the embryo, replacing whatever was already at that gene’s locus. Convince everyone to raise those babies exactly like a baby of their own. Wait 25 years, find all the resulting people, and take the difference of their average height from overall average height.
The math for the island where AA and BB people are 100 cm tall and AB and BA people are 200 cm tall: The overall mean height is 150 cm. If you take a random embryo and replace one gene with A, then there’s a 50% chance the other gene is A, so they’re 100 cm, and there’s a 50% chance the other gene is B, so they’re 200 cm, for an average of 150 cm. Since that’s the same as the overall mean, the additive effect of an A gene is +0 cm. By similar logic, the additive effect of a B gene is also +0 cm.

DHH 7 months ago

Linux crosses magic market share threshold in US

According to Statcounter, Linux has claimed 5% market share of desktop computing in the US. That's double what it was just three years ago. Really impressive. Windows is still dominant at 63%, and Apple sits at 26%. But for the latter, that's quite a drop from its peak of 33% in June 2023.
