
Notes on Fourier series

The trigonometric Fourier series is a beautiful mathematical theory that shows how to decompose a periodic function into an infinite sum of sinusoids. These are my notes on the subject, with some examples and the connection to linear algebra in Hilbert space.

Coefficients of Fourier series

Let’s assume that \(f(x)\) is a well-behaved \(2L\)-periodic [1] function and that we can find coefficients \(a_n\) and \(b_n\) such that:

\[f(x)=\sum_{n=0}^{\infty}\left(a_n \cos\frac{n\pi x}{L}+b_n \sin\frac{n\pi x}{L}\right)\]

Then we say that the Fourier series on the right-hand side converges to \(f(x)\). We’ll talk more about the assumptions mentioned above and convergence in the next section.

Note that when \(n=0\), the term in the sum becomes just \(a_0\); therefore it’s customary to write the series starting with \(n=1\), with a separate constant component (which is the function’s average over one period). To make computations nicer, this constant is typically called \(a_0/2\), so:

\[f(x)=\frac{a_0}{2}+\sum_{n=1}^{\infty}\left(a_n \cos\frac{n\pi x}{L}+b_n \sin\frac{n\pi x}{L}\right)\]

Our goal is to find the coefficients \(a_n\) and \(b_n\) that satisfy this equation. We’ll do this in three steps.

Step 1: Integrate both sides of the equation between \(-L\) and \(L\) [2].

\[\int_{-L}^{L}f(x)dx=\int_{-L}^{L}\frac{a_0}{2}dx+\sum_{n=1}^{\infty}\bigg (\int_{-L}^{L}a_n \cos\frac{n\pi x}{L}dx+\int_{-L}^{L}b_n \sin\frac{n\pi x}{L}dx\bigg )\]

Per Appendix A, all integrals within the sum are zero, so we’re left with:

\[\int_{-L}^{L}f(x)dx=\int_{-L}^{L}\frac{a_0}{2}dx=\bigg[\frac{x\cdot a_0}{2}\bigg]_{-L}^{L}=a_0\cdot L\]

And thus we find \(a_0\):

\[a_0=\frac{1}{L}\int_{-L}^{L}f(x)dx\]

Step 2: Multiply both sides by \(\cos\frac{m\pi x}{L}\) (\(m\) is a positive integer constant) and integrate between \(-L\) and \(L\).

\[\begin{aligned} \int_{-L}^{L}f(x)\cos\frac{m\pi x}{L}dx&=\int_{-L}^{L}\frac{a_0}{2}\cos\frac{m\pi x}{L}dx\\ &+\sum_{n=1}^{\infty}\bigg (\int_{-L}^{L}a_n \cos\frac{n\pi x}{L}\cos\frac{m\pi x}{L}dx+\int_{-L}^{L}b_n \sin\frac{n\pi x}{L}\cos\frac{m\pi x}{L}dx\bigg ) \end{aligned}\]

Looking at the right-hand side, the first integral is zero per Appendix A, and the last integral is zero per Appendix B. We’re left with:

\[\int_{-L}^{L}f(x)\cos\frac{m\pi x}{L}dx=\sum_{n=1}^{\infty}\int_{-L}^{L}a_n \cos\frac{n\pi x}{L}\cos\frac{m\pi x}{L}dx\]

Per Appendix B, the integral on the right is zero for all \(n\neq m\), and \(L\) for \(n=m\). Therefore, we can write:

\[\int_{-L}^{L}f(x)\cos\frac{m\pi x}{L}dx=a_m\cdot L\]

Recall that \(m\) is an arbitrary integer, just like \(n\); for consistency, we’ll replace \(m\) by \(n\) and isolate \(a_n\):

\[a_n=\frac{1}{L}\int_{-L}^{L}f(x)\cos\frac{n\pi x}{L}dx\]

Step 3: Hopefully it’s clear where this is going now; multiply both sides by \(\sin\frac{m\pi x}{L}\) and integrate between \(-L\) and \(L\). Using very similar reasoning to step 2, we’ll end up with:

\[b_n=\frac{1}{L}\int_{-L}^{L}f(x)\sin\frac{n\pi x}{L}dx\]

We’ve just found a way to calculate all the coefficients of our Fourier series for \(f(x)\):

\[f(x)=\frac{a_0}{2}+\sum_{n=1}^{\infty}\left(a_n \cos\frac{n\pi x}{L}+b_n \sin\frac{n\pi x}{L}\right)\]

Where:

\[\begin{aligned} a_0&=\frac{1}{L}\int_{-L}^{L}f(x)dx\\ a_n&=\frac{1}{L}\int_{-L}^{L}f(x)\cos\frac{n\pi x}{L}dx\\ b_n&=\frac{1}{L}\int_{-L}^{L}f(x)\sin\frac{n\pi x}{L}dx \end{aligned}\]

Conditions on f and convergence of Fourier series

The previous section discusses Fourier series for a function that is well-behaved - but what does that mean? The full answer would lead us deep into analysis, which I’d like to avoid here. So I’ll keep it brief.

We typically assume that \(f(x)\) is square integrable, which is denoted as \(f\in L^2\). Moreover, we assume that the function is piecewise smooth: each segment of the function has continuous derivatives. A very simple example of a piecewise smooth function is \(f(x)=|x|\). Another is the triangular wave function used in the example below. These conditions hold for pretty much any reasonable function we want to approximate using Fourier series, so they aren’t a serious burden.

A function that satisfies these conditions is guaranteed to have a Fourier series that converges to it pointwise. This means that at every point where \(f(x)\) is continuous, the Fourier series converges to it exactly; at every jump point, the Fourier series converges to the mid-point of the jump.

Cosine and Sine series

Sometimes, additional properties of the function can help us simplify the Fourier series for it. If \(f_e(x)\) is an even function, then we know that:

\[b_n=\frac{1}{L}\int_{-L}^{L}f_e(x)\sin\frac{n\pi x}{L}dx=0\]

This is because the function inside the integral is odd, and integrating an odd function over a symmetric interval results in 0. Therefore, the Fourier series for such \(f_e(x)\) is a cosine series:

\[f_e(x)=\frac{a_0}{2}+\sum_{n=1}^{\infty}a_n \cos\frac{n\pi x}{L}\]

With coefficients \(a_0\) and \(a_n\) given as before. Similarly, if \(f_o(x)\) is an odd function, then its \(a_0\) and \(a_n\) are 0, and its Fourier series is a sine series:

\[f_o(x)=\sum_{n=1}^{\infty}b_n \sin\frac{n\pi x}{L}\]

Fourier series for a non-periodic function defined on an interval

So far we’ve been talking about \(2L\)-periodic functions that can be faithfully represented by Fourier series. But what if we have a non-periodic function defined on a finite interval? E.g. suppose we have \(f(x)=x\) on the interval \([0,L]\). Can we approximate it with a Fourier series?

Yes! First, we have to make a choice of how to extend the function to the negative interval \([-L,0]\). Then, we simply repeat the function every \(2L\) - this is called a periodic extension. Note that the Fourier series calculation only cares about the range \([-L,L]\). The resulting series will approximate the generated periodic function in its entirety, and in particular will also converge to it in the \([0,L]\) interval (except maybe the endpoints, depending on the mode of extension).

There are several natural ways to extend a function defined on \([0,L]\) into the interval \([-L,0]\) [3]:

- Direct periodic repetition: we simply repeat \(f(x)\) every \(L\): \(f(x+L)=f(x)\ \forall x\).
- Even extension: \(f(|x|)\).
- Odd extension: \(f(x)\) when \(x\ge 0\) and \(-f(-x)\) when \(x<0\).

Here’s an example of extending our sample function \(f(x)=x\) onto the full interval \([-L,L]\) and then repeating it periodically every \(2L\):

[Figure: the extensions of \(f(x)=x\), repeated periodically every \(2L\)]

Note that the Fourier series for these extended functions will be different. However, they will all converge to \(f(x)\) in the interval \([0,L]\). Typically, even and odd extensions have the benefit of producing either cosine or sine series, correspondingly (as discussed in the previous section).

We’ve seen that Fourier series work well for periodic functions and also non-periodic functions defined on a finite domain (because we can extend these periodically). But what about aperiodic functions defined on the entire real line? This is where we’ll have to leave Fourier series behind and move on to their generalization - the Fourier transform; this will be a topic for a separate post.

Example

Let’s take the following triangular function \(t(x)\) [4]:

[Figure: the triangular function \(t(x)\)]

\(t(x)\) is periodic with period 4. We can define it by starting with a formula on the interval \([0,2]\):

\[t(x)=\begin{cases} x & 0\le x\le 1\\ 2-x & 1<x\le 2 \end{cases}\]

Then making an odd extension into \([-2,0]\) and repeating it periodically.

Now we can go ahead and calculate its Fourier coefficients. Since this function is odd, we know that we’ll get a sine series, as \(a_n\) are going to be 0 for all \(n\). Let’s calculate \(b_n\); in our case \(L=2\) (half the period). Since \(t(x)\) is odd and so is the sine, we’re integrating an even function over a symmetric interval. Therefore, we only have to integrate on the positive half of the range and multiply the result by two:

\[b_n=\frac{1}{2}\int_{-2}^{2}t(x)\sin\frac{n\pi x}{2}dx=\int_{0}^{2}t(x)\sin\frac{n\pi x}{2}dx\]

Let’s set \(k=\frac{n\pi}{2}\):

\[b_n=\int_{0}^{2}t(x)\sin(kx)dx\]

And split up the integral for the different segments of \(t(x)\):

\[b_n=\int_{0}^{1}x\sin(kx)dx+\int_{1}^{2}(2-x)\sin(kx)dx=I_1+I_2\]

The first integral, by the method described in Appendix C:

\[I_1=\bigg[\frac{\sin(kx)}{k^2}-\frac{x\cos(kx)}{k}\bigg]_{0}^{1}=\frac{\sin(k)}{k^2}-\frac{\cos(k)}{k}\]

The second integral can also be split into two:

\[I_2=2\int_{1}^{2}\sin(kx)dx-\int_{1}^{2}x\sin(kx)dx\]

The first of these is trivial to calculate; the second can once again use Appendix C. After some tedious but straightforward calculations [5] we’ll get:

\[I_2=\frac{\sin(k)}{k^2}+\frac{\cos(k)}{k}-\frac{\sin(2k)}{k^2}\]

Adding \(I_1+I_2\), we get:

\[b_n=\frac{2\sin(k)-\sin(2k)}{k^2}\]

Now let’s substitute \(k=\frac{n\pi}{2}\) back. This makes \(\sin(2k)\) zero because the sine of an integer multiple of \(\pi\) is always zero:

\[b_n=\frac{8}{n^2\pi^2}\sin\frac{n\pi}{2}\]

We have \(b_n\), so the Fourier series for our \(t(x)\) is:

\[t(x)=\sum_{n=1}^{\infty}\frac{8}{n^2\pi^2}\sin\frac{n\pi}{2}\sin\frac{n\pi x}{2}\]

Note that for even values of \(n\), \(\sin\frac{n\pi}{2}\) is zero, so only the odd terms remain:

\[t(x)=\frac{8}{\pi^2}\left(\sin\frac{\pi x}{2}-\frac{1}{9}\sin\frac{3\pi x}{2}+\frac{1}{25}\sin\frac{5\pi x}{2}-\cdots\right)\]

[Interactive chart: partial sums of the Fourier series of \(t(x)\) for a selectable number of terms, drawn as a red line over the triangular function]

Note that all even coefficients are zero, so the partial sum with \(n\) terms looks the same as the one with \(n-1\) terms whenever \(n\) is even.

Compact form

We’ve written the Fourier series for \(f(x)\) as follows so far:

\[f(x)=\frac{a_0}{2}+\sum_{n=1}^{\infty}\left(a_n \cos\frac{n\pi x}{L}+b_n \sin\frac{n\pi x}{L}\right)\]

We can rewrite this in a somewhat more compact form, using a single sinusoid with a configurable phase at each \(n\):

\[f(x)=\frac{a_0}{2}+\sum_{n=1}^{\infty}q_n\cos\left(\frac{n\pi x}{L}+\theta_n\right)\]

Based on Appendix D, \(q_n\) and \(\theta_n\) can be computed as follows:

\[\begin{aligned} q_n&=\sqrt{a_n^2+b_n^2}\\ \theta_n&=\operatorname{atan2}(-b_n,a_n) \end{aligned}\]

When Fourier series are used in the context of signal processing, this formulation is easier to reason about because it represents the magnitude and phase shift of each harmonic of \(f(x)\) in the frequency domain [6].

Complex form

It should not come as a surprise that the Fourier series, being a combination of trigonometric functions, can also be represented with complex exponential functions. Specifically, we’ll show that our \(f(x)\) can be represented as follows:

\[f(x)=\sum_{n=-\infty}^{\infty}C_n e^{in\pi x/L}\]

Let’s calculate \(C_n\). We proceed in a manner similar to before, by multiplying both sides of the equation by \(e^{-im\pi x/L}\) and taking an integral in the range \([-L,L]\):

\[\int_{-L}^{L}f(x)e^{-im\pi x/L}dx=\sum_{n=-\infty}^{\infty}C_n\int_{-L}^{L}e^{i(n-m)\pi x/L}dx\]

By Appendix A, the sum elements are all zero when \(n\neq m\). When \(n=m\), we get:

\[\int_{-L}^{L}f(x)e^{-im\pi x/L}dx=C_m\int_{-L}^{L}dx=2L\cdot C_m\]

Therefore, renaming \(m\) to \(n\) (since it’s just an arbitrary integer constant):

\[C_n=\frac{1}{2L}\int_{-L}^{L}f(x)e^{-in\pi x/L}dx\]

We’ve found an alternative formulation of Fourier series, using complex exponentials instead of trigonometric functions. While this was a direct derivation, another way to achieve the same result is to use the Euler formula to derive:

\[\begin{aligned} \cos x&=\frac{e^{ix}+e^{-ix}}{2}\\ \sin x&=\frac{e^{ix}-e^{-ix}}{2i} \end{aligned}\]

And substitute these into the original Fourier series formula. I’ll leave this as an exercise for the diligent reader; eventually, the result will be the same. Moreover, it’s possible to show a direct correspondence between \(a_n\), \(b_n\) and \(C_n\), for \(n>0\):

\[\begin{aligned} C_0&=\frac{a_0}{2}\\ C_n&=\frac{a_n-ib_n}{2}\\ C_{-n}&=\frac{a_n+ib_n}{2} \end{aligned}\]

Note that \(C_{-n}=C_n^*\) when both \(a_n\) and \(b_n\) are real (which is the case for a real-valued \(f(x)\)). This helps explain why the complex formulation has negative frequencies in the sum; when the function is actually real, each negative frequency is paired up with a positive frequency and the result is real [7]:

\[C_n e^{in\pi x/L}+C_{-n}e^{-in\pi x/L}=C_n e^{in\pi x/L}+\left(C_n e^{in\pi x/L}\right)^*=2\,\mathrm{Re}\left(C_n e^{in\pi x/L}\right)\]

So, for a real function we only need to account for positive frequencies:

\[f(x)=C_0+2\sum_{n=1}^{\infty}\mathrm{Re}\left(C_n e^{in\pi x/L}\right)\]

We can take it further. \(C_n\) is a complex number, so let’s represent it in polar form as \(C_n=\frac{q_n}{2}e^{i\theta_n}\) (the factor of half will make sense soon). Then:

\[2\,\mathrm{Re}\left(C_n e^{in\pi x/L}\right)=\mathrm{Re}\left(q_n e^{i(n\pi x/L+\theta_n)}\right)=q_n\cos\left(\frac{n\pi x}{L}+\theta_n\right)\]

And substituting back into the sum:

\[f(x)=C_0+\sum_{n=1}^{\infty}q_n\cos\left(\frac{n\pi x}{L}+\theta_n\right)\]

This is precisely the compact formulation from the previous section!

Fourier series and linear algebra

The most beautiful aspect of Fourier theory is that it doesn’t just happen to work by chance, and is deeply connected to linear algebra. Please read my post on Hilbert space before proceeding.

The space of real-valued square integrable functions \(L^2\) forms a Hilbert space, in which we can define the inner product (assuming real functions):

\[\langle f,g\rangle=\int_{-L}^{L}f(x)g(x)dx\]

We’ve demonstrated that the family of functions:

\[1,\ \cos\frac{n\pi x}{L},\ \sin\frac{n\pi x}{L}\qquad n=1,2,3,\dots\]

Are all mutually orthogonal, because their pairwise inner products are zero! We’ve also shown that any function in \(L^2\) can be represented as a weighted sum of these functions:

\[f(x)=\frac{a_0}{2}+\sum_{n=1}^{\infty}\left(a_n \cos\frac{n\pi x}{L}+b_n \sin\frac{n\pi x}{L}\right)\]

So these functions form a basis for \(L^2\). When we think of these functions as vectors (in an infinite-dimensional Hilbert space), much of what we did in this post starts feeling like "normal" linear algebra.

For example, when we have a set of basis vectors and we want to know how to represent some vector in this basis, we usually find the coefficients by projecting it onto the basis. E.g. with a basis vector \(e_1\), the coefficient of a vector \(v\) along it is:

\[c_1=\frac{\langle v,e_1\rangle}{\langle e_1,e_1\rangle}\]

Similarly, when we calculate the coefficient \(b_n\) for some function \(f(x)\), we project \(f(x)\) onto the basis vector \(\sin\frac{n\pi x}{L}\) by calculating:

\[b_n=\frac{\langle f,\sin\frac{n\pi x}{L}\rangle}{\langle \sin\frac{n\pi x}{L},\sin\frac{n\pi x}{L}\rangle}\]

From Appendix B, we know that the denominator is \(L\), so:

\[b_n=\frac{1}{L}\int_{-L}^{L}f(x)\sin\frac{n\pi x}{L}dx\]

Which should look familiar!

This is the core linear-algebra idea behind Fourier series: the functions \(1\), \(\cos\frac{n\pi x}{L}\), and \(\sin\frac{n\pi x}{L}\) play the role of orthogonal basis vectors, while the Fourier coefficients are coordinates of \(f(x)\) in this basis. The integral formulas for \(a_n\) and \(b_n\) are not mysterious tricks; they are projections, just like dot products with basis vectors in ordinary Euclidean space. Fourier series therefore let us decompose a function into independent orthogonal directions, much like decomposing a vector into its \(x\), \(y\), and \(z\) components.

Appendix A

For any integer \(n\neq 0\) and an arbitrary constant \(L\), we have:

\[\begin{aligned} \int_{-L}^{L}\sin\frac{n\pi x}{L}dx&=\bigg[-\frac{L}{n\pi}\cos\frac{n\pi x}{L}\bigg]_{-L}^{L}=0\\ \int_{-L}^{L}\cos\frac{n\pi x}{L}dx&=\bigg[\frac{L}{n\pi}\sin\frac{n\pi x}{L}\bigg]_{-L}^{L}=0 \end{aligned}\]

Using these, we can calculate the integral of a complex exponential function for an integer \(n\neq 0\):

\[\int_{-L}^{L}e^{in\pi x/L}dx=\int_{-L}^{L}\cos\frac{n\pi x}{L}dx+i\int_{-L}^{L}\sin\frac{n\pi x}{L}dx=0\]

Appendix B

We’ll start with the product of two sines, for any positive integers \(m\) and \(n\):

\[ss=\int_{-L}^{L}\sin\frac{m\pi x}{L}\sin\frac{n\pi x}{L}dx\]

Using the trigonometric identity for a product of sines, we can write:

\[ss=\frac{1}{2}\int_{-L}^{L}\cos\frac{(m-n)\pi x}{L}dx-\frac{1}{2}\int_{-L}^{L}\cos\frac{(m+n)\pi x}{L}dx\]

Now let’s focus on two different scenarios, \(m\neq n\) and \(m=n\). If \(m\neq n\), then each of the integrals constituting \(ss\) is 0 (see Appendix A), so \(ss=0\). If \(m=n\), then the second integral is still 0, but the first one isn’t:

\[ss=\frac{1}{2}\int_{-L}^{L}\cos(0)dx=\frac{1}{2}\cdot 2L=L\]

We can use exactly the same approach to show that:

\[\int_{-L}^{L}\cos\frac{m\pi x}{L}\cos\frac{n\pi x}{L}dx=\begin{cases} 0 & m\neq n\\ L & m=n \end{cases}\]

One more variant to cover:

\[\int_{-L}^{L}\sin\frac{m\pi x}{L}\cos\frac{n\pi x}{L}dx=0\]

Since sine is an odd function and cosine is an even function, their product is an odd function. And the integral of an odd function over a symmetric interval is 0 (see this post for more details).

Appendix C

Let’s calculate the indefinite integral:

\[\int x\sin(kx)dx\]

For some constant \(k\). We’ll use integration by parts:

\[\int u\,dv=uv-\int v\,du\]

Here \(u=x\), so \(du=dx\). Also \(dv=\sin(kx)dx\), so \(v=-\frac{\cos(kx)}{k}\). Putting it together:

\[\int x\sin(kx)dx=-\frac{x\cos(kx)}{k}+\frac{1}{k}\int\cos(kx)dx=-\frac{x\cos(kx)}{k}+\frac{\sin(kx)}{k^2}+C\]

Appendix D

Let’s take a general sinusoid with magnitude \(q\), frequency \(\omega\) and phase \(\theta\):

\[s(x)=q\cos(\omega x+\theta)\]

We’re going to show that \(s(x)\) can be represented as a sum of a sine and a cosine with no phase. This is related to my earlier post on the sum of same-frequency sinusoids. Let’s start by expanding \(s(x)\) using a trigonometric identity:

\[s(x)=q\cos(\theta)\cos(\omega x)-q\sin(\theta)\sin(\omega x)\]

Now we’ll denote \(a=q\cdot\cos(\theta)\) and \(b=-q\cdot\sin(\theta)\), so:

\[s(x)=a\cos(\omega x)+b\sin(\omega x)\]

We have \(a\) and \(b\) in terms of \(q\) and \(\theta\), but what about the other way around? Let’s take the equations:

\[\begin{aligned} a&=q\cos(\theta)\\ b&=-q\sin(\theta) \end{aligned}\]

Square both of them and add together:

\[a^2+b^2=q^2\left(\cos^2(\theta)+\sin^2(\theta)\right)=q^2\quad\Rightarrow\quad q=\sqrt{a^2+b^2}\]

Now we’ll take the equations for \(b\) and \(a\) and divide one by the other:

\[\frac{b}{a}=-\tan(\theta)\quad\Rightarrow\quad\theta=\operatorname{atan2}(-b,a)\]

Where the atan2 function is careful to take into account the sign of both numerator and denominator. It’s also worth mentioning that \(\theta\) is determined up to additions of \(2\pi\).

To conclude, for any \(q\), \(\omega\) and \(\theta\):

\[q\cos(\omega x+\theta)=a\cos(\omega x)+b\sin(\omega x)\]

With the aforementioned conversion formulas for \(a\), \(b\).
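The coefficient formulas and the even/odd extensions discussed in these notes can be sanity-checked numerically. What follows is a minimal sketch, not part of the original derivation: it assumes a simple midpoint-rule integrator, and the helper names (`integrate`, `fourier_coefficients`, `partial_sum`) are made up for illustration. It extends \(f(x)=x\) from \([0,L]\) to \([-L,L]\) in the even and odd ways and checks that the even extension has vanishing sine coefficients while the odd one has vanishing cosine coefficients.

```python
import math

def integrate(g, lo, hi, steps=4000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (i + 0.5) * h) for i in range(steps))

def fourier_coefficients(f, L, n_max):
    """Approximate a_0..a_n_max and b_0..b_n_max using
    a_n = (1/L) * int_{-L}^{L} f(x) cos(n pi x / L) dx, and likewise b_n with sin."""
    a = [integrate(lambda x: f(x) * math.cos(n * math.pi * x / L), -L, L) / L
         for n in range(n_max + 1)]
    b = [integrate(lambda x: f(x) * math.sin(n * math.pi * x / L), -L, L) / L
         for n in range(n_max + 1)]
    return a, b

def partial_sum(a, b, L, x):
    """Evaluate a_0/2 + sum_{n>=1} (a_n cos + b_n sin) at the point x."""
    total = a[0] / 2
    for n in range(1, len(a)):
        arg = n * math.pi * x / L
        total += a[n] * math.cos(arg) + b[n] * math.sin(arg)
    return total

L = 1.0

def even_ext(x):
    return abs(x)  # even extension of f(x) = x from [0, L] to [-L, L]

def odd_ext(x):
    return x       # odd extension of f(x) = x from [0, L] to [-L, L]

a_even, b_even = fourier_coefficients(even_ext, L, 25)
a_odd, b_odd = fourier_coefficients(odd_ext, L, 25)

print("largest |b_n| for the even extension:", max(abs(v) for v in b_even))
print("largest |a_n| for the odd extension: ", max(abs(v) for v in a_odd))
print("even-extension partial sum at x=0.5: ", partial_sum(a_even, b_even, L, 0.5))
print("odd-extension partial sum at x=0.5:  ", partial_sum(a_odd, b_odd, L, 0.5))
```

Both partial sums should land near 0.5 at \(x=0.5\), but the cosine series gets there faster. The even extension is continuous, so its coefficients decay like \(1/n^2\). The odd extension jumps at \(x=\pm L\) once repeated periodically, so its coefficients decay only like \(1/n\).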

Stratechery 4 days ago

The SpaceX IPO and Data Centers in Space

It’s hardly the biggest problem in the world (or perhaps the height of privilege to consider it a problem at all), but one of the most annoying consumer experiences is booking an Uber Black and realizing you got assigned a Tesla Model Y (Uber finally stopped allowing new Model Y’s onto Black last year). Buckle up for an uncomfortable back seat, basic plastic finishes, and, all too often, potential car sickness from a driver who hasn’t completely mastered the Tesla’s aggressive regenerative braking.

Still, the fact that the Model Y ever made it to the Black level is a testament to the brand Elon Musk built. Back in 2016, when 300,000 people dropped $1,000 each in a matter of hours to reserve an as-yet-unreleased Model 3, I explained that the phenomenon was because It’s a Tesla:

The real payoff of Musk’s “Master Plan” is the fact that Tesla means something: yes, it stands for sustainability and caring for the environment, but more important is that Tesla also means amazing performance and Silicon Valley cool. To be sure, Tesla’s focus on the high end has helped them move down the cost curve, but it was Musk’s insistence on making “An electric car without compromises” that ultimately led to 276,000 people reserving a Model 3, many without even seeing the car: after all, it’s a Tesla.

This is the same brand halo that landed what is, if we’re honest, a pretty basic car on the Uber Black list. What actually makes these cars compelling is the extent to which they are computers on wheels: I know plenty of very rich people who drive a Tesla not for the finishes but rather the Full Self-Driving (Supervised); there is nothing like it on the market, at least when it comes to cars you can own.
Tesla appears to be doubling down on this point of differentiation: the company stopped production of the Models S and X earlier this year, focusing production resources on the CyberCab and robots; if you want your car to drive itself, you’ll get the same model as everyone else. It reminds me of Andy Warhol’s famous quote : What’s great about this country is that America started the tradition where the richest consumers buy essentially the same things as the poorest. You can be watching TV and see Coca-Cola, and you know that the President drinks Coke, Liz Taylor drinks Coke, and just think, you can drink Coke, too. A Coke is a Coke and no amount of money can get you a better Coke than the one the bum on the corner is drinking. All the Cokes are the same and all the Cokes are good. Liz Taylor knows it, the President knows it, the bum knows it, and you know it. That “tradition” is scale, and America is indeed better at it than any other country in the world; and, amongst Americans, no one pursues and seeks to leverage scale quite like Musk. From a press release from American Airlines: American Airlines today announced a sweeping modernization of its narrowbody inflight customer experience with the installation of Starlink, the fastest Wi-Fi in the sky, on more than 500 narrowbody aircraft beginning in Q1 2027. Starlink is widely regarded as the world’s most advanced satellite constellation using a low Earth orbit to deliver broadband Internet capable of supporting inflight streaming, online gaming, collaborative meeting tools and more. With thousands of satellites in low Earth orbit, Starlink can deliver multigigabit connectivity to aircraft using its Aero Terminal, which can support up to 1 Gbps per antenna. “As a premium global airline, we are continuously seeking out world-class partners like Starlink to deliver what our customers need and want,” said American Airlines Chief Customer Officer Heather Garboden. 
“The addition of Starlink solidifies American as a leading airline in keeping passengers connected in flight.” As part of American’s commitment to an elevated onboard experience, Starlink will enable seamless streaming, browsing and real-time communication capabilities across American’s domestic and short-haul international routes. I linked to the press release just for the amusement of American Airlines, which has in recent years built its strategy around offering anything-but-premium on routes you need, billing their Starlink deal as a commitment to “an elevated onboard experience.” That may have been the argument for United’s Starlink deal when it was announced in 2024 , but by this point it’s tablestakes , which is surely exactly how Musk wants it. Starlink is the consumer-facing business of SpaceX, generating $8.7 billion in revenue last year and $4.4 billion in profit; while it’s not totally clear exactly how SpaceX accounts for launch costs, obviously Starlink benefits greatly from the fact that it has access to SpaceX’s launch capacity. That launch capacity has resulted in over ten thousand active satellites in low Earth orbit, delivering low latency high speed Internet anywhere in the world — including in the air. That’s the carrot for airlines; the stick is the prospect of everyone else having the same service, and customers making flight decisions based on the quality of Internet access available. There is a similarity to Tesla in this way. Musk companies at their best don’t win the game; they change the rules through scale, such that billionaires buy economy cars because they actually drive themselves (with supervision), and airlines transform the consumer experience on their own dime. Musk makes all-in bets — whether that be in terms of launch capacity or in autonomous driving — not by making rational short-term business decisions, but by starting with the desired end state and working backwards. 
Tech has a long history of silly charts — there is an entire category known as Bezos charts — and the SpaceX S-1 has one that made me laugh. It came in the discussion of SpaceX’s total addressable market: We believe we have identified the largest actionable total addressable market (“TAM”) in human history. We estimate that our quantifiable TAM is $28.5 trillion, consisting of $370 billion in Space from space-enabled solutions; $1.6 trillion in Connectivity across $870 billion in Starlink Broadband and $740 billion in Starlink Mobile as well as additional opportunities in enterprise and government; $26.5 trillion in AI across $2.4 trillion in AI infrastructure, $760 billion in consumer subscriptions, $600 billion in digital advertising, and $22.7 trillion in enterprise applications. For illustrative purposes of sizing our addressable market opportunity, we exclude China and Russia from our global estimates. This image is approximately to scale vertically, but certainly not horizontally: I could use the help in really wrapping my mind around the $26.5 trillion AI opportunity, given it’s more than 13 times the space and connectivity opportunity combined! In all seriousness, the numbers are obviously absurd, but then again, everything about this IPO is absurd. SpaceX is seeking a $2 trillion valuation on a mere $18.67 billion in revenue with $4.9 billion in losses last year, and growth actually slowed from 35% to 33%. That slowdown happened despite the addition of xAI (and thus also X), which tipped the company from a small profit to that massive loss, thanks to $5.1 billion in AI R&D expense. That R&D, keep in mind, went towards building a model that is in 5th place, and whose entire founding team recently left the company. But sure, $26.5 trillion AI opportunity! This is not to say that SpaceX won’t get its desired valuation. 
Tesla’s valuation never made any sense right up until the Models 3 and Y actually worked out, causing Tesla’s share price to soar (and even then it was hard to ever build a financial model that justified the new share price). Musk’s ability to make his own reality starts with investors; from 2021’s Mistakes and Memes and comparing Apple and Tesla: This comparison works as far as it goes, but it doesn’t tell the entire story: after all, Apple’s brand was derived from decades building products, which had made it the most profitable company in the world. Tesla, meanwhile, always seemed to be weeks from going bankrupt, at least until it issued ever more stock, strengthening the conviction of Tesla skeptics and shorts. That, though, was the crazy thing: you would think that issuing stock would lead to Tesla’s stock price slumping; after all, existing shares were being diluted. Time after time, though, Tesla announcements about stock issuances would lead to the stock going up. It didn’t make any sense, at least if you thought about the stock as representing a company. It turned out, though, that TSLA was itself a meme, one about a car company, but also sustainability, and most of all, about Elon Musk himself. Issuing more stock was not diluting existing shareholders; it was extending the opportunity to propagate the TSLA meme to that many more people, and while Musk’s haters multiplied, so did his fans. The Internet, after all, is about abundance, not scarcity. The end result is that instead of infrastructure leading to a movement, a movement, via the stock market, funded the building out of infrastructure. I explained in that Article why I generally did not cover Tesla’s financial results, and the reasoning extends to why I don’t expect to cover SpaceX’s: Musk is the master of memes, and is himself a meme. 
He offers a dream — Mars, fully autonomous vehicles, an addressable market of $28.5 trillion — and positions his companies and their stock as access to that dream, and through the alchemy of capital markets, transforms shared delusion into mass market reality. Musk’s track record matters in this regard. Building an electric car company was possible, as was full self-driving (supervised); at the same time there were ever increasing government mandates and programs around decreasing emissions that acted as the stick to Tesla’s carrot. Similarly, landing rockets was possible, and the new market creation downstream from correspondingly lower launch costs was comprehensible. That Musk succeeded in both instances gives him the benefit of the doubt. The question that matters, then, is not if the numbers make sense right now (they absolutely do not); what matters is if the dream is even possible, and if there are actual reasons to think it might happen. I think that data centers in space meet these conditions. The first question about data centers in space is if they are even possible, and I think the answer is clearly yes. The key thing to consider is that there is no requirement that these data centers look anything like data centers on earth. On earth we build massive buildings full of GPUs with massive infrastructure for cooling those GPUs and massive power plants (or a connection to a grid which connects to massive power plants) to power those GPUs. The idea of transporting these massive structures to space sounds implausible, and it is! However, there is no reason that space data centers would look like data centers on earth. What makes far more sense is to think about an individual satellite as something akin to a rack. 
Right now the largest Starlink satellite in orbit is the V2 Mini Direct-to-Cell, which measures 7.4 meters by 2.7 meters by 0.3 meters (estimated); an NVL72 rack from Nvidia, meanwhile, measures 2.2 meters by 1.1 meters by 0.6 meters, so we’re already in the right size range. The V2 Mini Direct-to-Cell consumes (and dissipates) up to an estimated 25kW of energy; the NVL72 up to 135kW, and it can fit a 1 trillion parameter model quantized to FP4. The big shortcoming for a rack-satellite is power and its dissipation, but going from 25kW to 135kW is certainly within the realm of possibility — and given that you don’t need much of the cooling and power distribution usage on earth, something closer to 100kW might deliver similar performance. There are other issues to address, including the problem of radiation screwing with calculations, reliability, etc., although those two concerns could be addressed in part by using larger chips (which are less efficient, but also use less power); these rack-satellites will also be disposable, like Starlink satellites, ameliorating reliability issues. The key factor, however, is that a fleet of racks, interconnected with lasers (as Starlink’s already are), each with their own solar panels and radiator arrays for cooling (deploying 200+ square meters of radiators per rack will be a huge challenge), is possible . The next question about data centers in space is if there is a use case for them — the carrot — and I already made the argument that there is in The Inference Shift . Specifically, there are three types of workloads developing around LLMs: training, answer inference, and agentic inference. From the section making the case for “agentic inference”: Critically, this articulation of an agentic-specific memory hierarchy implies a necessary trade-off of speed for capacity. Here’s the thing, though: lower speed isn’t nearly as important a consideration if there isn’t a human in the loop. 
If an agent is waiting around for a job that is being run overnight, the agent doesn’t know or care about the user experience impact; what is most important is being able to accomplish a task, and if entirely new approaches to memory make that possible, then delays are fine. If delays are fine, then all of the focus on pure compute power and high-bandwidth memory seems out of place: if latency isn’t the top priority, then slower and cheaper memory — like traditional DRAM, for example — makes a lot more sense. And if the entire system is mostly waiting on memory, then chips don’t need to be as fast as the cutting edge either. This represents a profound shift in future architectures, but it also doesn’t mean that current architectures are going away: At the same time, these categories won’t be equal in size or importance. Specifically, agentic inference will be the largest market by far, because that is the market that won’t be limited by humans or time. Today’s agents are fancy answer inference; in the future true agentic inference will be work done by computers according to dictates given by other computers, and the market size scales not with humans but with compute. It’s agentic inference that makes the most sense for racks in space, and conveniently enough, that is also the market that is likely to be the largest in the long run. The third question about data centers in space is if there is a stick. Specifically, while I think that racks-in-space are both a lot more viable than people think, and a lot more relevant to agentic inference than current modes of compute, it is at the end of the day cheaper and easier to build on earth, all things being equal. All things are not equal, however: right now we are at the very beginning of the AI buildout and already one of the biggest constraints is not just power (expected), but zoning (unexpected). 
I wrote in an Update last week : That leads to an interesting contrast to globalization: when companies were closing down American factories and laying off workers and moving operations to China, none of the affected towns or workers had a say. They just suddenly no longer had a job, and a huge number of cities across the Rust Belt no longer had a reason to exist. People simply had to move, or worse, retreat to things like alcohol or drugs. AI, however, is the opposite: building data centers requires permission, which is to say that people actually have a say. Again, I am not at all saying that these people are well informed about data centers, or about the economic impact on their communities, much less the economic impact of AI generally; what I am noting is that people who didn’t have a say in globalization are suddenly finding they do have a say about AI, and it’s not a surprise they are expressing their disapproval by blocking data centers. In that Update I made the case that data center builders — and by extension the companies that use them — should straight up pay people for permission to build data centers in their communities. At a minimum, however, that increases the costs of terrestrial data centers. What seems very plausible in the long run is that the demand for compute ends up being so large that there eventually is nowhere left to build, making the vast expanses of space not just an alternative but in fact the only choice. If all of this happens — and there are a lot of “if”s here! — then suddenly that $2 trillion valuation starts looking reasonable. SpaceX is already monetizing xAI’s first data center, Colossus 1, to the tune of $15 billion/year for 300MW of capacity; that’s 3,000 racks-in-space. Anthropic, meanwhile, will probably make 3x the revenue on that capacity; it remains to be seen if xAI can get back in the state-of-the-art game, but if so then the amount of revenue it can generate per rack-in-space will be commensurately higher. 
Even without xAI, however, SpaceX has the potential to be a monopoly provider of marginal compute capacity. There are, needless to say, a massive number of assumptions baked into this argument, including assuming a huge number of engineering challenges are solved, Starship actually works, SpaceX gets sufficient supply of the right kinds of chips, compute demand is massively larger, agentic inference unbundles current architectures, and data center opponents are successful. The risk attached to all of these assumptions should discount the valuation you put on this business, which is to say I still think this IPO is nuts. At the same time, I’m glad it exists, for multiple reasons. The first one is the most obvious one: Musk, for all of his faults, has already pushed humanity forward on multiple vectors, including electric cars, self-driving, reusable rockets, satellite Internet, etc., and I’m excited to see him try and do more. The second is that I am in fact concerned about our ability to muster enough compute to fully realize the gains from AI, and am very worried about a replay of nuclear power, where our failure to build denied us the opportunity to even imagine what could be invented in a world of unlimited energy; the fact Musk is proposing an alternative path to unlimited compute is a relief. The third is that I appreciate the extent to which this IPO is a return to what an IPO should be: the opportunity for people to contribute capital to actually build the business, and to benefit if it works out. As I noted, I can’t make a financial model that necessarily justifies this valuation, particularly based on current financials, but neither can a VC investing in the Series A of a company. SpaceX has already invented a lot, and its early investors are going to make a lot of money with this IPO; at the same time, there is still so much more to invent that there remains a lot of upside — and, to be very clear, a lot of risk. 
It’s a testament to SpaceX’s ambitions that retail investors get to play VC. And hey, you get Mars upside for free!

Training will continue to matter, and Nvidia’s current architecture, including high-speed compute, large amounts of high-bandwidth memory, and high-speed networking, will likely continue to dominate.
Answer inference will be a meaningful market, albeit a relatively small one, and speed from chips like Cerebras or Groq (I explained how Nvidia is deploying Groq’s LPUs here) will be very useful.
Agentic inference will gradually unbundle the GPU, which alternates between stranding high-bandwidth memory (during the prefill process) and stranding compute (during the decode process), in favor of increasingly sophisticated memory hierarchies dominated by high capacity and relatively lower cost memory types, with “good enough” compute; indeed, if anything it will be the speed of CPUs for things like tool use that will matter more than the speed of GPUs.
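The rack arithmetic in the post above (300MW of capacity being "3,000 racks-in-space") can be checked with a few lines of Python. This is only a sketch: the 100kW per-rack figure is the post's own "closer to 100kW" guess, and the function and constant names are made up for illustration.

```python
# Figures quoted in the post; nothing here is measured data.
COLOSSUS_POWER_W = 300e6      # 300MW of capacity
ANNUAL_REVENUE_USD = 15e9     # $15 billion/year
RACK_POWER_W = 100e3          # assumed power budget per rack-satellite (the post's guess)
NVL72_POWER_W = 135e3         # quoted peak draw of an NVL72 rack on earth


def racks_for(capacity_w: float, rack_w: float) -> int:
    """Number of racks needed to add up to a given power capacity."""
    return round(capacity_w / rack_w)


racks = racks_for(COLOSSUS_POWER_W, RACK_POWER_W)
revenue_per_rack = ANNUAL_REVENUE_USD / racks
racks_at_full_nvl72 = racks_for(COLOSSUS_POWER_W, NVL72_POWER_W)

print(f"racks at 100kW each: {racks}")
print(f"revenue per rack per year: ${revenue_per_rack:,.0f}")
print(f"racks at 135kW each: {racks_at_full_nvl72}")
```

Under those assumptions each rack-satellite would need to carry roughly $5 million of annual revenue, which is the number any launch and hardware cost would have to be weighed against.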

Unsung 5 days ago

“But obviously, that’s just silly stuff.”

This 22-minute video by Karl Jobst describes a pretty wild discovery of a glitch called Crystal Storage Glitch, which allows skipping a certain level for much faster completion times in Mega Man X2. I won’t spoil the glitch because it’s a fascinating combination of a corner case, a race condition, and even a dose of dumb luck. Its discovery unfolds almost like a scientific one over many years – first a theoretical possibility, then a first sighting done in a modified emulator, then confirmation made by a machine via a tool-assisted speedrun, and eventually actual performance by someone by hand. And a lot of this was achieved by relative newcomers to the community, too. There is a certain poetry here in having to go slow to go fast – you’ll see what I mean. #bugs #games #speedrunning #youtube

DYNOMIGHT 5 days ago

Is “colorectal cancer” rising in “young people”?

(Yes, but.) Over the past few years, I’ve seen many articles about a mysterious rise in colorectal cancer (CRC) in young people. There are various stories for why this might be happening:

General health. Maybe modern people are unhealthy (obesity, low physical activity, diabetes, poor sleep), leading to insulin resistance and chronic inflammation, meaning faster epithelial cell proliferation and a miscalibrated immune system that fails to stop early cancers?
Ultra-processed food. Maybe people are eating more ultra-processed foods that contain additives (like emulsifiers) that degrade colon mucus, allowing bacteria to contact epithelial cells and drive inflammation? Or maybe ultra-processed food has low fiber and high glycemic load, leading to insulin resistance and chronic inflammation, with the problems mentioned above?
Bad meat. Maybe people are eating more red and/or processed meats, which expose the colon to nitrites and secondary bile acids, which inflame the epithelium and promote chronic inflammation?
The microbiome. Maybe it’s the microbiome. For example, maybe people’s guts are getting colonized by strains of E. coli that produce genotoxic colibactin. Or maybe overuse of antibiotics in early life depletes protective bacteria in the gut, allowing harmful strains to expand, e.g. strains of B. fragilis that cause inflammation, or strains of F. nucleatum that can survive in the gut and drive tumor growth?
Environmental exposures. Maybe people are getting exposed to bad stuff in the environment (microplastics, forever chemicals, pesticides, endocrine disruptors, air pollution) that does bad stuff (damages gut barrier, screws up the microbiome, disrupts hormonal signaling)?
Maternal health. Maybe poor maternal health (obesity, diabetes) exposes the fetus to elevated glucose / insulin / inflammation, and these in turn program the child for a lifetime of metabolic issues and inflammation?
Whatever. Maybe alcohol / smoking / painkillers / calcium / vitamin D / inflammatory bowel disease / hereditary syndromes / screening bias?

None of the experts seem to agree on which of these is the culprit, so I figured that I (person with blog) should help. If you poke at these stories, most of them are individually pretty weak. It can’t all be detection bias since CRC deaths are also going up in younger people. And several proposed causes (air pollution, tobacco) have actually fallen in rich countries. Other explanations, like E. coli producing colibactin, seem biologically real, but there’s no evidence that they’re increasing over time. Still other suggested causes (microplastics, forever chemicals) are mostly mechanistic speculation at this point. Obesity, inactivity, and chronic inflammation also all seem biologically real, and they are likely increasing, but why should they specifically cause colorectal cancer in young people? A plausible answer to that last question is that they aren’t. They’re doing it, but not specifically. This will sound pedantic, but bear with me: If you say that CRC is increasing in younger people, what exactly does that mean? After all, the set of people who qualify as young changes over time. (Ever notice that you keep getting older?) Siegel et al. (2026) plot how often CRC was found in different age groups in 1995 and in 2022. They also provide this plot of how common different types of CRC are in different age groups. At a glance, this doesn’t look so bad. If you’re young, you might think, “OK, my current risk is higher than previous generations faced at the same age, but I can look forward to decreasing rates when I’m old.” You could easily think this is good news: While there’s a relative increase when you’re young, it’s tiny compared to the absolute decrease while you’re old. Unfortunately that’s the wrong way to think about it. Downham et al. (2026) plot CRC rates in different age groups across the Anglosphere over time.
Everyone I’ve shown this plot to has said it’s confusing, so let me explain: The different lines track age-bands as people born in different years move in and out of those bands. For example, in the US plot in the bottom right, the “20-25” line starts with the left-most dot showing the CRC rate for people born between 1965 and 1970 when they were 20 to 24 years old (around 1990). The next dot shows the rate for people born between 1970 and 1975 when they were 20 to 24 years old (around 1995), and so on. That figure is weird, because the lines connect different groups of people. I wanted a plot where there are lines for different birth cohorts as they age. For unknown reasons, no one seems to make such plots, and the data isn’t trivial to access. So I used a plot digitizer to click on every damned point in that US figure above and then replotted it: Now the individual lines show specific groups of people tracked through time. For example, the “1932.5” line shows CRC rates for people born between 1930 and 1935, when those people were at different ages. If you look closely, you’ll notice that these rates are higher than those for people born between 1940 and 1945 for all ages (where we have data). That was the pattern for a long time: Between 1920 and 1950, later generations enjoyed lower CRC rates across all phases of their lives. But between 1950 and 1960, that pattern reversed and since then later generations have had higher CRC rates at all ages. We don’t know for sure what will happen in the future. But I think it’s likely this trend will continue. Yes, if you are currently young, you face higher CRC risk than previous generations did when they were young. That’s the bad news. The other bad news is that when you are old, you may also face higher CRC risk than previous generations did when they were old. The other other bad news is that CRC isn’t the only type of cancer that’s rising in later generations. Sung et al.
(2019) give this plot: These are again the confusing graphs where individual lines show age bands as different people move in and out of them. But you get the point: Lots of cancers are going up in later generations, including uterine, gallbladder, kidney, liver, pancreas, and thyroid. (Their additional material contains plots for 18 other cancers.) Note that these plots have a logarithmic y-axis, meaning the changes are larger than they might appear. Moving up a quarter of the way between two vertical ticks corresponds to an increase of a factor of ≈ 1.78. If lots of cancers are becoming more common in later generations, then why is everyone talking about CRC? I think that’s because CRC is unique in that it is:

increasing in later generations
treatable if caught early
detectable via screening

For example, thyroid cancer diagnoses have skyrocketed in recent decades. But that’s partly because of more detection, and thyroid cancer is highly treatable, without clear benefits from early detection. Pancreatic cancer also seems to be increasing, but we don’t have good ways to screen for it and even if we did, we don’t have good ways to treat it. CRC is really unique in that you can save lives by telling people, “Hey! CRC is going up! You should get screened!” If you’re interested in public health, that’s the most important thing. But if you’re interested in unraveling the mystery of CRC going up, it’s important to note that CRC isn’t really unique at all. Colorectal cancer is going up in young people. Various kinds of cancer are going up in later generations. (Definitely at younger ages, possibly at all ages.) This blog endorses colorectal cancer screening. We don’t yet know if colonoscopies are better than other methods of screening (sigmoidoscopy, stool tests), but we do know that screening is better than not screening. When caught early, CRC is highly treatable, often with only surgery (no chemotherapy or radiation) and a return to normal activities within a couple weeks.
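The remark in the post above about the logarithmic y-axis (a quarter of the way between two ticks being a factor of ≈ 1.78) is easy to verify. A minimal sketch, assuming adjacent ticks are a factor of 10 apart, which is what the 1.78 figure implies; the function name is just for illustration.

```python
def factor_for_fraction(fraction: float, tick_ratio: float = 10.0) -> float:
    """Multiplicative change for moving `fraction` of the way between two
    adjacent ticks on a log axis whose ticks differ by `tick_ratio`."""
    return tick_ratio ** fraction


for fraction in (0.25, 0.5, 0.75, 1.0):
    print(f"{fraction:.2f} of a tick gap -> x{factor_for_fraction(fraction):.2f}")
```

So a line that rises by what looks like a modest quarter of a grid step has nearly doubled, and half a step is more than a tripling.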

Simon Willison 5 days ago

Notes on Pope Leo XIV's encyclical on AI

Dropped this morning by the Vatican: Magnifica Humanitas of His Holiness Pope Leo XIV on Safeguarding the Human Person in the Time of Artificial Intelligence . This is a very interesting document. It's some of the clearest writing I've seen on the ethics of integrating AI into modern society. Pope Leo XIV chose the name Leo in honor of Pope Leo XIII, who is known for his 1891 Rerum novarum encyclical on "Rights and Duties of Capital and Labor". This story on Vatican News further clarifies the significance of that decision: Meeting with the College of Cardinals for their first formal encounter after his election, Pope Leo XIV explained part of the reason for the choice of his papal name. "There are different reasons for this," he said, before going on to explain that he chose the name Leo "mainly because Pope Leo XIII, in his historic encyclical Rerum novarum addressed the social question in the context of the first great industrial revolution." "In our own day," he continued, "the Church offers to everyone the treasury of her social teaching in response to another industrial revolution and to developments in the field of artificial intelligence that pose new challenges for the defence of human dignity, justice, and labour." And now we get Pope Leo XIV's own encyclical on the AI revolution. There's a lot in here, but the writing style is very approachable, including to non-Catholics. (I listened to most of the encyclical on a walk with our dog, my first time trying the ElevenReader iPhone app . It worked very well: I pasted in a URL to the document and it read it to me in a very high quality voice, highlighting each paragraph as it went.) Here are some of my highlights. In each case below emphasis is mine. Here's a useful description of the interpretability problem for LLMs in section 98: First, any statement regarding AI risks becoming quickly outdated, given the remarkable pace at which these systems are developing. 
Second, all of us, including those who design them, possess only a limited understanding of their actual functioning. Indeed, current AI systems are more “cultivated” than “built,” for developers do not directly design every detail, but instead create a framework within which the intelligence “grows.” As a result, fundamental scientific aspects — such as the internal representations and computational processes of these systems — remain, at present, unknown. I liked section 83's description of the relationship between development and dignity: For individuals as well as for nations, development is both a duty and a right. Minimum conditions are required for enabling every person and people to flourish in accord with their dignity, without being kept in a state of dependence or excluded from access to necessary goods. Development is truly human when it places people at the center instead of the accumulation of wealth, and when it concerns peoples as well as individuals. Justice demands the recognition of the rights of society and the rights of peoples, and includes a responsibility toward future generations. Development is not truly human if it increases consumption for some while shifting costs and burdens onto others, or relegates entire regions to subordinate roles, preventing them from realizing their full potential . Baked in cultural biases and sycophancy get a mention in section 100: In personal use, three aspects in particular deserve careful consideration: the ease with which results are obtained, the impression of objectivity and the simulation of human communication. The speed and simplicity with which information, complex analyses, media content and practical assistance can be accessed undoubtedly makes life easier. Yet they can also encourage excessive reliance and the search for ready-made answers, and weaken personal creativity and judgment. 
The apparent objectivity of the responses and suggestions these systems provide can lead us to overlook the fact that they reflect the cultural assumptions of those who designed and trained them, with all their strengths and limitations . The artificial imitation of positive human communication — words of advice, empathy, friendship and even love — can be engaging and at times genuinely helpful. However, for less discerning users, it can also be misleading, creating the illusion of a relationship with a real personal subject . When words are simulated, they do not build genuine relationships, but only their appearance. The artificial imitation of care or support can become particularly risky when it enters contexts where real relationships and emotional bonds are lacking. 101 touches on the environmental impact: Current AI systems require enormous amounts of energy and water, significantly influencing carbon dioxide emissions, and place heavy demands on natural resources. As their complexity increases, especially in the case of large language models, the need for computing power and storage capacity grows too, which requires an extensive network of machines, cables, data centers and energy-intensive infrastructure . For this reason, it is essential to develop more sustainable technological solutions that reduce environmental impact and help protect our common home. 102 covers the risks of algorithmic systems making decisions that impact people's lives without "compassion, mercy, forgiveness": The use of AI is never a purely technical matter: when it enters processes that affect people’s lives, it touches on rights, opportunities, status and freedom . 
Important and sensitive decisions — concerning employment, credit, access to public services or even a person’s reputation — risk being fully delegated to automated systems that do not know “compassion, mercy, forgiveness, and above all, the hope that people are able to change,” and can therefore give rise to new forms of exclusion. 105 emphasizes the need for human accountability in how these systems are applied: For AI to respect human dignity and truly serve the common good, responsibility must be clearly defined at every stage: from those who design and develop these systems to those who use them and rely on them for concrete decisions . In many cases, however, the internal processes leading to a result remain opaque, making it harder to assign responsibility and correct errors. This is where accountability becomes crucial: the possibility of identifying who must “account” for decisions, justify them, monitor them, and, when necessary, challenge them and remedy any harm caused . And 108 touches on the way AI amplifies the power of those with resources: In fact, as with every major technological shift, AI tends to amplify the power of those who already possess economic resources, expertise and access to data . In light of the common good and the universal destination of goods, this raises serious concerns, since small but highly influential groups can shape information and consumption patterns, influence democratic processes and steer economic dynamics to their own advantage, undermining social justice and solidarity among peoples. For this reason, it is essential that the use of AI, especially when it touches on public goods and fundamental rights, be guided by clear criteria and effective oversight, grounded in participation and subsidiarity. That same section explicitly calls out data as something that should be thought of more as a public good: [...] Moreover, ownership of data cannot be left solely in private hands but must be appropriately regulated. 
Data is the product of many contributors and should not be treated as something to be sold off or entrusted to a select few . It is necessary to think creatively in order to manage data as a common or shared good, in a spirit of participation, as Saint John Paul II already suggested regarding collective goods. Given that Palantir is named after a Lord of the Rings reference, I can't help but wonder if the J.R.R. Tolkien quote from The Return of the King (section 213) was the Pope throwing a little shade at Peter Thiel. The twentieth-century Catholic author J.R.R. Tolkien, in the words of a protagonist in one of his novels, described our responsibility in this way: “It is not our part to master all the tides of the world, but to do what is in us for the succour of those years wherein we are set, uprooting the evil in the fields that we know, so that those who live after may have clean earth to till.” The civilization of love will not arise from a single or spectacular gesture, but from the sum total of small and steadfast acts of fidelity that serve as a bulwark against dehumanization. For this reason, it is worthwhile pausing to reflect on some aspects of how we, each in our own way, can cooperate in building the civilization of love. On 6th January this year I joined the Oxide and Friends 2026 predictions podcast episode to talk about predictions for 2026, 2029 and 2032. I wrote mine up here , with hindsight they weren't nearly ambitious enough - it's already undeniable that LLMs write good code, we've made huge advances in sandboxing and New Zealand kākāpō have indeed had a truly excellent breeding season . There's one segment from the episode that I didn't bother to include in my write-up, but that I can't resist providing as a lightly-edited transcript here: Bryan Cantrill: 37:13 I think that AI has created some real public perception problems for itself. 
And I think that you are gonna have one of the frontier model companies, this year, have a white paper explaining how the proliferation of AI will mean prosperity for everybody. They will be trying to make some economic argument - because this is gonna be a 2026 election issue, how we think of these things and how they are regulated and it's a big mess. There's more heat than light in this debate. Simon Willison: 38:05 I'd like to tag something on to that one: I think that only works if they can sort of wash that through existing trusted experts. Sam Altman and Dario are constantly publishing essays about this stuff and nobody believes a word they say. Get Barack Obama's signature on one of these position papers and maybe you've got something people might start to trust a little bit. Adam Leventhal: 38:27 Otherwise, it's just like "leaded gas is good for you", says Exxon. Bryan Cantrill: 38:31 I mean, yeah. God. Obama... let's go with that, that's a great one because if it's like Bill Clinton everyone's gonna kind of roll their eyes, so it's gotta be someone who's got real credibility saying that this is gonna be broad-based... I'd say if they get that person to do it, it's gonna be revealed that that's also a bit crooked. Simon Willison: 38:57 How about the Pope? Bryan Cantrill: 39:01 The Pope is very into this stuff! That's a great prediction. We've hit pay dirt. The Pope weighing in on LLMs and their economic impact on the world. Simon, I'm giving you full credit if the Pope weighs in believing that this is gonna be economic devastation. My prediction here looks a whole lot less insightful given the Leo XIV/Leo XIII relationship, which I was unaware of when we recorded the episode!


90 % of the t distribution

William Sealy Gosset was great. He improved beer at Guinness by using the statistics that existed at the time. Not happy with that, he invented new statistics to brew even better beer. The things he invented are used all over the place now, but Guinness wanted to keep him a secret weapon, so they made him publish his results under the fake name Student. One thing Gosset realised is that it is wrong to compute 90 % confidence intervals for the mean by taking the standard deviation of the sample and assuming a normal distribution, like-a-so: \[\hat{\mu} \pm 1.645 \hat{\sigma}\] (Continue reading the full article on the web.)
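Gosset's point about the 1.645 multiplier can be seen in a small simulation. This is a sketch under stated assumptions, not the article's own method: it reads \(\hat{\sigma}\) as the estimated standard error of the mean (sample standard deviation divided by \(\sqrt{n}\)), draws samples from a standard normal, and counts how often the nominal 90 % interval actually contains the true mean. The function name, sample sizes and seed are arbitrary choices.

```python
import random


def coverage(n: int, z: float = 1.645, trials: int = 20000, seed: int = 0) -> float:
    """Fraction of trials in which mean +/- z * s / sqrt(n) contains the true mean (0)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(sample) / n
        # Unbiased sample variance (divides by n - 1), then standard error of the mean.
        variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
        std_err = (variance / n) ** 0.5
        if abs(mean) <= z * std_err:
            hits += 1
    return hits / trials


small_sample = coverage(5)
large_sample = coverage(200, trials=5000)
print(f"n=5:   nominal 90% interval covers the mean in {small_sample:.1%} of trials")
print(f"n=200: nominal 90% interval covers the mean in {large_sample:.1%} of trials")
```

With only five observations the interval should cover the true mean well short of 90 % of the time, while with a couple of hundred it comes close to the nominal level. That gap is what the wider quantiles of the t distribution correct for.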

Gabriel Weinberg 1 week ago

More data supports science funding literally pays for itself

Previously I put out a post explaining “how science funding literally pays for itself” that takes you through the math and some data that backs it up. Now two new data points further bolster this claim. First, the Congressional Budget Office (CBO), the nonpartisan federal agency that provides budget and economic information to Congress, published a report entitled “Estimating the Economic Effects of Federal Investment in Research and Development.” Usually the CBO only projects out 10 years per their mandate, but because the effects of science funding can take longer to fully manifest, they projected out 30 years. The relevant headline takeaway is highlighted below in their primary table (Table 1), showing that over this period the effects of a $30B increase in science funding for 10 years ($300B in total and about a 33% increase from today) would result in decreasing the overall deficit over 30 years (see green arrows). The change is about -2% on average if the “R&D funding increase [is] financed by reducing noninvestment spending” and about -1% on average if the “R&D funding increase [is] financed by borrowing.” This means that the increased science funding would grow the economy so much that the tax revenues received from this growth alone would outweigh the spending increase, leading to an overall decrease in the budget deficit. In other words, increasing science funding (at least by this amount) is a complete no-brainer, so let’s do it already! A few years ago the CBO did a similar report for infrastructure spending and compared the two in this report, finding the ROI effects of science funding to be about seven times greater than infrastructure spending. Again, so let’s do it already!
The effect on the present value of GDP over the next 30 years (discounted using Treasury rates) that a dollar increase in deficit-financed R&D spending would have is about seven times larger than the effect that CBO, in its August 2021 report, estimated the same increase in infrastructure spending would have. Second, the Clark Center regularly polls a panel of economists, and recently they asked about this specific topic. The panel essentially universally agreed that historically U.S. science funding has paid for itself. In particular, 82% agreed “historical federal support for scientific research has paid for itself through a substantial positive effect on long-run U.S. productivity growth.” 0% disagreed, with the rest either not answering, or declaring either “no opinion” or “uncertain”. They also ask respondents about the confidence in their answer, and when weighted the results are even more striking with a whopping 97% in the agree category. Are you sold yet? Government science funding, the bulk of which goes to medical research, extends our lifespans and healthspans by inventing new medicines and other technologies that grow our economy so much it literally pays for itself. I get that this is not the most flashy policy area, but it is the most obviously good for our long-term future. Finally, and also new this year, the Pew Research Center put out a survey on Americans’ views of science and science funding, and among other things found broad bipartisan support for government science funding. 84% of U.S. adults say “government investments in scientific research aimed at advancing knowledge are usually worthwhile investments for society over time.” That breaks down by party as 76% of Republicans and 93% of Democrats (including independents who lean one way or the other).

Jeff Geerling 1 week ago

Wi-Wi Is Wireless Time Sync at 1 nanosecond

At NAB, I found a demo of Wi-Wi STAMP , a wireless time synchronization protocol that came out of Japan's NICT . Wi-Wi stands for Wireless 2Way interferometry, and it uses the 900 MHz band for picosecond-level time sync, and mm-level distance accuracy, in a tiny box, currently the size of a smartphone. The system is still in development, but existing prototypes have 20ps of phase synchronization jitter, and time synchronization down to 30ns. The next generation will have time down to 5ns in real-world use.
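To get a feel for why picosecond-level timing and mm-level distance go together, it helps to convert the quoted timing figures into the distance light travels in that time. This is only a back-of-the-envelope sketch using the speed of light in vacuum; it makes no claim about how Wi-Wi itself computes range, and the function name is made up for illustration.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre


def light_travel_distance_m(seconds: float) -> float:
    """Distance light covers in vacuum during the given time interval."""
    return SPEED_OF_LIGHT_M_PER_S * seconds


# Timing figures quoted in the post.
for label, seconds in [("20 ps phase jitter", 20e-12),
                       ("1 ns", 1e-9),
                       ("5 ns", 5e-9),
                       ("30 ns", 30e-9)]:
    print(f"{label}: {light_travel_distance_m(seconds) * 1000:.1f} mm")
```

Twenty picoseconds corresponds to about 6 mm of light travel, while a nanosecond is already roughly 30 cm, so the mm-level distance claim lines up with the phase-jitter figure rather than with the nanosecond-scale time sync numbers.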


The Applicability of Spaced Repetition

Spaced repetition has a natural domain of applicability: information that is systematically organized as an unambiguous key-value mapping with short keys and values. The “Hello, world!” of flashcards is the NATO phonetic alphabet : A → alpha, B → bravo, etc. Similarly, the periodic table can be thought of as defining a collection of mappings: element name ↔ symbol, element name ↔ atomic number, etc. You can just drill these cards and memorize the facts without a prior step of understanding, or building a conceptual model. Applying spaced repetition is trivial for this kind of information. That’s why most people who use spaced repetition are either language learners or medical students. In biology the main intuition you need is for “3D shapes bumping around in Brownian motion”, which comes free with your human brain, and afterwards it’s mostly just a lot of facts you have to memorize. Analogously with language: you already have a language center , you just need to drill vocabulary and grammar. And the further you go from this domain, the harder it is to apply spaced repetition. Highly conceptual knowledge, like math, is hard to encode. You have to spend a lot of time just understanding the information, and building a conceptual model in your head, and then you start writing flashcards to solidify that model, like taking tomographic cuts of some complex object. And coming up with questions that make good flashcards (short, unambiguous, etc.) out of this highly abstract knowledge is very hard. Often you have some deceptively simple fact, a simple assertion, but there’s no good way to encode it as a flashcard, so you have to encode “around it” by asking questions that assume or require that knowledge (e.g. asking why X is true), and hoping that in drilling those, your brain will remember the actual target. In general, relational facts are easier to encode, since a binary predicate like $\text{Property}(\text{Object}, \text{Value})$ readily becomes a question. 
“Caffeine is metabolized by cytochrome 1A2 ”, in Prolog , is $\text{Metabolism}(\text{Caffeine}, \text{CYP1A2})$, and becomes “Q: What is the cytochrome that metabolizes caffeine? A: 1A2”. But how do you encode stand-alone assertions like “all unitary matrices are invertible ”? You could encode that as a yes-or-no question, but that’s useless, because rationally you can expect such questions to be biased towards yes. Both “what is a property of unitary matrices?” and “what kinds of matrices are invertible?” are useless because they have hundreds of possible equally-valid answers, so they’re ambiguous. You have to be creative and find all kinds of tricks and stratagems to encode around the knowledge. Tangentially: this, I think, is why using AI to write flashcards is often misguided. In highly systematized domains, you don’t need AI in the first place, because there’s nothing for the AI to do except import a CSV into Anki. In domains that are highly conceptual and abstract, you’re not memorizing a set of objectively-knowable facts, you’re trying to solidify a private, internal mental model that you build by reading and thinking and solving problems. You can give the AI all kinds of general rules on how to write good flashcards, but the AI can’t look into your mind and know which facts are salient for you , what you already know, which micro-volumes of knowledge can be encoded lightly with just a few flashcards, and which things need more shoring up and consequently more coverage. Can this situation be improved, or is this just an intrinsic limitation of spaced repetition? I don’t know. But it seems reasonable to think some limited gains are possible. I think not a lot of people are using spaced repetition on these more “conceptual” domains, and (by the rule that most people in a community are lurkers ) even fewer of those people are writing, in detail, to share their knowledge. 
Plenty of people have written about how to write good flashcards in general; what I want to read is closer to case studies where someone sits down with a text (or, even better, a textbook) and describes the process by which they turned that text into flashcards, like this one from Michael Nielsen. From a corpus of similar case studies we might derive general rules for, not how to write effective flashcards, but how to encode complex, conceptual knowledge into question-answer form.
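To make the point about relational facts concrete, here is a minimal sketch of how a binary predicate can be turned mechanically into a question-answer card. The `Fact` class, the `TEMPLATES` table and the `to_card` function are hypothetical names invented for this illustration; they are not part of Anki or any other tool.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Fact:
    """A relational fact of the form Predicate(Subject, Value)."""
    predicate: str
    subject: str
    value: str


# Hypothetical question templates, one per predicate (an assumption for
# illustration). A fact can only become a card if a template exists for it.
TEMPLATES = {
    "Metabolism": "What is the cytochrome that metabolizes {subject}?",
    "Symbol": "What is the chemical symbol for {subject}?",
}


def to_card(fact: Fact) -> Optional[Tuple[str, str]]:
    """Return a (question, answer) pair, or None if no template fits."""
    template = TEMPLATES.get(fact.predicate)
    if template is None:
        return None
    return template.format(subject=fact.subject.lower()), fact.value


if __name__ == "__main__":
    print(to_card(Fact("Metabolism", "Caffeine", "CYP1A2")))
    print(to_card(Fact("Symbol", "Sodium", "Na")))
    # A stand-alone assertion has no (subject, value) slot to ask about:
    print(to_card(Fact("Invertible", "unitary matrices", "")))
```

The last call returns `None`, which mirrors the difficulty described above: a one-place assertion such as "all unitary matrices are invertible" has no natural slot to blank out, so no template produces an unambiguous question.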

Gabriel Garrido 2 weeks ago

Technology with a Human Face

Revisiting Schumacher’s “Technology with a Human Face”, prescient and relevant since its writing 53 years ago. We may say, therefore, that modern technology has deprived man of the kind of work that he enjoys most, creative, useful work with hands and brains, and given him plenty of work of a fragmented kind, most of which he does not enjoy at all. It has multiplied the number of people who are exceedingly busy doing kinds of work which, if it is productive at all, is so only in an indirect or “roundabout” way, and much of which would not be necessary at all if technology were rather less modern. (…) All this confirms our suspicion that modern technology, the way it has developed, is developing, and promises further to develop, is showing an increasingly inhuman face, and that we might do well to take stock and reconsider our goals. The system of mass production, based on sophisticated, highly capital-intensive, high energy-input dependent, and human labour-saving technology, presupposes that you are already rich, for a great deal of capital investment is needed to establish one single workplace. The system of production by the masses mobilises the priceless resources which are possessed by all human beings, their clever brains and skillful hands, and supports them with first-class tools. The technology of mass production is inherently violent, ecologically damaging, self-defeating in terms of non-renewable resources, and stultifying for the human person. The technology of production by the masses, making use of the best of modern knowledge and experience, is conducive to decentralisation, compatible with the laws of ecology, gentle in its use of scarce resources, and designed to serve the human person instead of making him the servant of machines. Let us admit that the people of the forward stampede, like the devil, have all the best tunes or at least the most popular and familiar tunes. 
You cannot stand still, they say; standing still means going down; you must go forward; there is nothing wrong with modern technology except that it is as yet incomplete; let us complete it. (…) “More, further, quicker, richer,” he says, “are the watchwords of the present-day society”. And he thinks we must help people to adapt, “For there is no alternative.” This is the authentic voice of the forward stampede, which talks in much the same tone as Dostoyevsky’s Grand Inquisitor: “Why have you come to hinder us?” Schumacher wrote about high-technology machinations of his time, most of which persist today. Just as much can be said, however, about the high-technology of today. The tools rushing “inevitably” towards our grasp, “or else”. Computers, the internet, and most digital artifacts enmeshed in our daily life are far from being low-technology: they pose a vast resource footprint, are and can be produced only by a few, and are dependent on capital-intensive and highly centralizing infrastructure. What to make of this? Where’s the technology with a human face? This is a question that continues to inspire me despite the degenerate conditions which personal computing has provided a means to: mass surveillance, the pillaging of attention, and bondage to corporations. I have no doubt that it is possible to give a new direction to technological development, a direction that shall lead it back to the real needs of man, and that also means: to the actual size of man. Man is small, and, therefore, small is beautiful. To go for giantism is to go for self-destruction. And what is the cost of a reorientation? We might remind ourselves that to calculate the cost of survival is perverse. No doubt, a price has to be paid for anything worth while: to redirect technology so that it serves man instead of destroying him requires primarily an effort of the imagination and an abandonment of fear. It is, in fact, an effort of the imagination. A few themes come to mind. 
First, exercise self-limitation. Not every problem is inherently solved by technology, especially if it impedes or hinders autonomy and the enrichment of interpersonal relationships. Furthermore, develop a keen awareness for what’s essential and what’s superfluous. If a problem can be solved by technology, consider its appropriateness in the context in which it is deployed, both in time and across time. Can a person or community maintain it? Can it be afforded? Can its fit be adapted as needed? Can it be wound down? What’s the simplest version of it that can satisfy these constraints? Can it leverage existing hardware and networks? Can it be done through technology steered by community and democratic structures? There’s a lot of work to be done here in the light of our current predicaments. If you’re disappointed by the direction of things, heed the call and do not budge: For it takes a good deal of courage to say “no” to the fashions and fascinations of the age and to question the presuppositions of a civilisation which appears destined to conquer the whole world; the requisite strength can be derived only from deep convictions. If it were derived from nothing more than fear of the future, it would likely disappear at the decisive moment. Read in full here: https://cooperative-individualism.org/schumacher-e-f_technology-with-a-human-face-1973.htm

Susam Pal 2 weeks ago

The Problem of Pedagogy in Advanced Mathematics

It is a commonly held opinion that educational institutions could do more to improve the pedagogy of mathematics. This is especially true in school, when students are first exposed to new subjects. Poor exposition can turn students away from mathematics for a lifetime. Only the highly motivated ones continue to engage with the subject. This is very unfortunate because mathematics is a beautiful subject and it is filled with wonder. It also teaches rigour in reasoning, clarity of thought and the discipline of constructing arguments from first principles to obtain intricate and often beautiful results. What is perhaps less known is that pedagogy is a problem even for graduate-level mathematics students and professional mathematicians. The proofs in many graduate-level mathematics textbooks are, in my humble opinion, not really proofs at all. They are closer to high-level outlines of proofs. The authors simply do not show their work. The student then has to put in an extraordinary amount of effort to understand and justify each line. Sometimes a 10-line argument in a textbook might expand into a 10-page proof if the student really wants to convince themselves that the argument works. I am not a mathematician, but out of personal interest, I have worked with professional mathematicians in the past to help refine notes that explain certain intermediate steps in textbooks (for example, Galois Theory by Stewart, in a specific case). I was surprised to find that it was not just me who found the intermediate steps of certain proofs obscure. Even professional mathematicians who had studied the subject for much of their lives found them obscure. It took us two days of working together to untangle a complicated argument and present it in a way that satisfied three properties: correctness, completeness and accessibility to a reasonably motivated student. 
I don't mean that the books merely omit basic results from elementary topics like group theory or field theory, which students typically learn in their undergraduate courses. Even if we take all the elementary results from undergraduate courses for granted, the proofs presented in graduate-level textbooks are often nowhere near a complete explanation of why the arguments work. They are high-level outlines at best. I find this hugely problematic, especially because students often have to learn a topic under difficult deadlines. If the exposition does not include sufficient detail, some students might never learn exactly why the proof works, because not everyone has the time to work out a 10-page proof for every 10 lines in the book. Many good universities provide accompanying notes that expand the difficult arguments by giving rigorous proofs and adding commentary to aid intuition. I think that is a great practice. I have studied several graduate-level textbooks in the last few years, and while these textbooks are a boon to the world because textbooks that expose the subject are better than no textbooks at all, I am also disappointed by how inaccessible such material often is. If I had unlimited time, I would write accompaniments to those textbooks that provide a detailed exposition of all the arguments. But of course, I don't have unlimited time. Even so, I am thinking of at least making a start by writing accompaniment notes for some topics whose exposition quality I feel strongly about, such as s-arc transitivity of graphs, field extensions and related topics.

ava's blog 3 weeks ago

#LiegendDemo - protesting for ME/CFS treatment & visibility

Today, I attended a protest for the visibility of ME/CFS sufferers. ME/CFS is short for myalgic encephalomyelitis/chronic fatigue syndrome; it is a chronic illness characterized by extreme fatigue that doesn't improve with rest, along with sleep issues, dizziness, muscle and joint pain, cognitive difficulties, extreme sensitivity to stimuli, and more. There is significant overlap with what is often referred to as "Long Covid" or "Post Covid", leading to speculation that they're one and the same. It is estimated that 1.5 million people are affected in Germany alone, with around 40 million estimated worldwide. One day, it could be you. The exact cause is still being investigated, but it is most often associated with a viral infection (Covid, Epstein-Barr, etc.), and while symptoms can sometimes be managed, a full recovery is very rare. There is currently no known treatment or cure, and diagnostic criteria are still being developed after all this time (50 years since the WHO acknowledged it!), which makes getting a diagnosis hard. There is stigma around the illness, with doctors dismissing symptoms entirely or blaming it on mental illness or laziness, inappropriately trying to force sufferers to overexert themselves, worsening their symptoms. This is aided by the fact that ME/CFS is often not taught in medical degrees. This group of patients is especially vulnerable, because advocating for themselves takes so much energy they don't have. Many of them cannot even get out of bed or do any strenuous mental tasks, or they have to spend the little energy they have on the bare minimum to survive and then have none left for their free time. They are frequently very isolated and lacking the support they need. Any exertion can cause weeks of increased symptoms (post-exertional malaise, PEM). 
Years of their life are just gone, spent existing in bed in a dark room, unable to think clearly or to really move, having difficulty speaking, having difficulty processing and enduring sounds, touch, or light. The fatigue can become so bad that they are unable to even talk. Their education, finances and careers suffer, they can no longer take care of themselves and their families or pets, they struggle with doctor's appointments or the paperwork required to receive assistance, disability benefits, etc. and often start to have other chronic illnesses like fibromyalgia, irritable bowel syndrome, postural orthostatic tachycardia syndrome (POTS) and more. It can affect both children and adults. It's easy to forget they exist because they are not visible out in public and left behind in public discourse. To make them visible, people all across the country meet up to lie on the floor - this happened for the first time in 2023, and is still going strong in 2026. I don't have ME/CFS, but after a Covid infection, I struggled with orthostatic issues, post-viral tachycardia, and my chronic illnesses (Crohn's and Bechterew's disease) sometimes cause me intense fatigue as well; so I can relate a little to some parts of the illness, but I am lucky that my issues have treatments that helped (and some I could recover from). It was important to me to show up when they can't.
1. Dedicated funding for ME/CFS: Research funds must be specifically allocated to ME/CFS with PEM, rather than absorbed into broader post-infectious research categories. Otherwise, the disease risks being underfunded while still being treated politically as adequately addressed.
2. Priority for drugs and effective treatments: Prioritize the development of medications and clinically effective therapies, not only basic research or administrative structures.
3. Mandatory involvement of patient organizations: ME/CFS patient representatives with PEM expertise should be directly involved in planning and implementation. Past programs included people unfamiliar with the disease, resulting in research that overlooked core symptoms and legitimized unsuitable therapies.
4. Immediate funding for biomedical research: Concrete biomedical projects should receive funding without delay, not spending years building structures before supporting treatment-oriented research, despite already existing promising drug approaches.
5. Clear disease definitions, rigorous research standards and exclusion of unsuitable research approaches: Studies should use strict diagnostic criteria and focus on PEM as the defining symptom. Many previous studies examined general fatigue rather than properly diagnosed ME/CFS, producing weak or misleading results. Research that ignores the biological, multisystem nature of ME/CFS should not be funded.
6. Legal and political safeguards: The so-called research decade should be backed by binding commitments rather than remaining a non-binding political initiative. Otherwise, funding and programs could be reduced or canceled after political changes.
7. Healthcare access, diagnostics, social support, and patient care, sustainable research infrastructure: Long-term structures such as specialized centers, biobanks, patient registries, and clinical trial networks should be established to support ongoing nationwide ME/CFS research.
8. Use existing research and strengthen international cooperation: Future work should build on existing ME/CFS findings and coordinate internationally to avoid redundant studies and accelerate progress.
The International ME/CFS Awareness Day is on May 12th.
Donate to the ME/CFS Research Foundation
Read what people affected by ME/CFS say
18 Minute Short Documentary on YouTube, English Subtitles
🇩🇪 Doku: ME/CFS: Keine Kraft mehr
🇩🇪 ME/CFS sufferer in Austria making use of assisted suicide program
🇩🇪 Liegenddemo Germany
🇩🇪 MECFS.de
🇩🇪 MECFS-Info.de
🇩🇪 ME-Hilfe.de
🇩🇪 Fatigatio.de
🇩🇪 Nicht Genesen Kids
Published 09 May, 2026


Notes on the Hantavirus Outbreak

Right now there’s a cruise ship parked outside Cabo Verde because of an outbreak of Andes virus . Yep, another cruise ship. I don’t get the appeal. It’s like a big open-air serial passage experiment: you get a bunch of old people with failing immune systems in close contact and race a pathogen through them. How much should I worry about this? Is this early January of 2020? I tried asking Claude but the biosecurity filter kept blocking my queries. The WHO says : Although uncommon, limited human‑to‑human transmission of HPS due to Andes virus has been reported in community settings involving close and prolonged contact. Secondary infections among healthcare workers have been previously documented in healthcare facilities, though remain rare. WHO currently assesses the risk to the global population from this event as low […] WHO advises against the application of any travel or trade restrictions based on the current information available on this event. So, hantavirus is the family. They are carried by rodents and spread by aerosols. In humans they can cause hantavirus pulmonary syndrome (HPS), which has a case fatality rate (CFR) of between 30 and 60%. Not great! Used to be these infections were mouse-to-human dead-ends. But Andes virus (ANDV), first identified in 1995, is known to spread from human to human. The last time there was an outbreak was 2018–2019 in Epuyén , Chubut , a town of 1,500 on the lee side of the Andes ( quite beautiful ). Described in this paper . 34 known infections and 11 deaths for a CFR of 32%. The $R_0$ was 2.12, reduced to 0.96 after control measures were implemented. Given the small number of cases, there should be some uncertainty about the $R_0$. But $R_0 > 1$ is the threshold for sustainable transmission. In this outbreak, the index case , while symptomatic, attended a birthday party with 100 other people, and infected five guests in 90 minutes, who went on to infect more people. 
The authors write: The super-spreading capability of the ANDV Epuyén/18−19 strain shows a facility ($R>2$) for sustaining continuous chains of transmission if no control measures are enforced. The appendix has some interesting stuff on how patients were infected at the birthday party. A further concern here is the incubation period: Wikipedia says the incubation period is between one and eight weeks . In the Chubut outbreak, the distribution was: Which is not good. I don’t have more data to draw a nice-looking CDF . Now this all sounds quite bad. Are there reasons to be optimistic? First, Argentina has had 710 cases of HPS in the period 1995–2008 ( Martínez 2010 ) and a further 533 cases in the period 2009–2017 ( Alonso 2019 ), and we are all still alive. In the latter period, most of these cases are from occupational/recreational exposure to rodent feces and only 1.8% of cases are from suspected human-to-human transmission. So, over 1,200 cases and every one of them fizzled out, but for one outbreak which was limited after successful contact tracing and quarantine. Second, the virus has left Argentina before: once to Switzerland in 2016, and once to the United States in 2018. In the second case the patient “while ill, [traveled] on two commercial domestic flights”. And neither export led to a general outbreak. Thirdly, in a small outbreak like the Chubut one, the $R_0$ can vary wildly from social factors unconnected to the virus, e.g. if the birthday party had not happened. You need a large $n$ to get the $R_0$ as a property of the virus itself. It’s possible the Chubut outbreak just had anomalously high transmission. What does this add up to? I don’t know. On the balance of evidence, I think this outbreak is more likely than not to fizzle out. In the interest of accountability, and putting my beliefs on record (which is the only objective way to judge the accuracy of your mental model) I’m gonna say: 70% probability the outbreak ends with fewer than 300 deaths. 90% probability the outbreak ends with fewer than 1,000 deaths.
And yet. And yet it feels so much like early COVID, particularly with public health authorities making very complacent remarks that “it’s not that transmissible, contact tracing will work, quarantine will work”. Complacency at the start, and severity at the end, is exactly why COVID was such a fuckup.
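As a small check on the Epuyén numbers quoted above (34 known infections, 11 deaths), the sketch below recomputes the case fatality rate and attaches a rough uncertainty range to it. The Wilson score interval is my own choice for illustration, not something taken from the paper, and the function names are made up for this example.

```python
import math


def case_fatality_rate(deaths: int, cases: int) -> float:
    """Fraction of known cases that ended in death."""
    return deaths / cases


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to an approximate 95% interval. This is an
    assumption made for the sketch; the source reports only a point value.
    """
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


cfr = case_fatality_rate(deaths=11, cases=34)
low, high = wilson_interval(11, 34)

if __name__ == "__main__":
    print(f"CFR = {cfr:.0%}, approximate 95% interval {low:.0%} to {high:.0%}")
```

With only 34 cases the interval is wide, which is the same small-sample caveat raised above for the $R_0$ estimate.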


Scaling, stretching and shifting sinusoids

This is a brief and simple [1] explanation of how to adjust the standard sinusoid sin(x) to change its amplitude, frequency and phase shift. More precisely, given the general function s(x)=A sin(wx+\theta), we’ll see how adjusting the parameters A, w and \theta affects the shape of s(x). Each section below covers one of these aspects mathematically, and you can use the demo at the bottom to experiment with the topic visually. Scaling is conceptually the simplest change; we adjust A to increase or decrease the amplitude (maximal height) of s(x). Setting A=2 will make the value twice as large (in both the positive and negative direction) as the original function. Stretching changes the frequency of sin(x), which is inversely proportional to its period. The baseline function sin(x) has a period of 2\pi, meaning it repeats every 2\pi. In other words, sin(x)=sin(x+2\pi) for any x. If we set w=2, we get sin(2x). This function repeats itself twice as fast as sin(x), because x is multiplied by 2 before being fed into the sinusoid. If x changes by \pi, the sinusoid’s input changes by 2\pi. Therefore, the period of sin(2x) is \pi, the period of sin(4x) is \frac{\pi}{2} and so on. [2] More generally, the period of sin(wx) is \frac{2\pi}{w}. Play with the demo below to see this in action, by changing w and observing how the waveform changes. If we know the period p we want, we can easily calculate the w that gives us this period: w=\frac{2\pi}{p}. The final parameter we discuss is \theta; it’s called the phase of the sinusoid. In the baseline sin(x), \theta=0. The sinusoid is 0 at x=0, achieves its positive peak at x=\frac{\pi}{2}, crosses 0 again at x=\pi, reaches its negative peak at x=\frac{3\pi}{2} and returns to its original position at x=2\pi, where the repetition begins. By adding a non-zero \theta, we don’t affect the sinusoid’s amplitude or frequency, but we do shift it right or left along the x axis. For example, suppose we use the function sin(x+\theta) with \theta=\frac{\pi}{2}. 
Then when x=0, we have sin(\frac{\pi}{2}), so the sinusoid is already at its positive peak; at x=\frac{\pi}{2}, the sinusoid crosses 0 into the negatives, etc. Everything happens earlier (by exactly the value of \theta=\frac{\pi}{2}) than in the baseline sinusoid. In other words, we’ve shifted the function left by \frac{\pi}{2}. Similarly, when \theta is negative, everything happens later, and the function is shifted right. We’ve now gone over all the parameters for the function s(x)=A sin(wx+\theta). Use the demo below to adjust these parameters and observe their effect on the sinusoid: A controls the scaling factor (amplitude). w is the frequency and controls the repetition period. \theta controls the phase - how much the sinusoid is shifted left or right.
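For readers without access to the interactive demo, the three adjustments can also be checked numerically. The following is a minimal sketch; the function names `s` and `w_for_period` are chosen here for illustration, with the parameters `A`, `w` and `theta` standing for the amplitude, frequency and phase discussed above.

```python
import math


def s(x: float, A: float = 1.0, w: float = 1.0, theta: float = 0.0) -> float:
    """General sinusoid: amplitude A, frequency w, phase theta."""
    return A * math.sin(w * x + theta)


def w_for_period(p: float) -> float:
    """Frequency w that makes sin(w*x) repeat every p units of x."""
    return 2 * math.pi / p


if __name__ == "__main__":
    x = 0.7  # arbitrary sample point

    # Scaling: A=2 doubles the value everywhere.
    print(s(x, A=2), 2 * s(x))

    # Stretching: sin(2x) has period pi, so shifting x by pi changes nothing.
    print(s(x, w=2), s(x + math.pi, w=2))

    # Shifting: theta=pi/2 puts the positive peak at x=0 (a shift to the left).
    print(s(0.0, theta=math.pi / 2))

    # Choosing w for a desired period p.
    print(w_for_period(math.pi))
```

Each pair of printed values should agree up to floating-point rounding, matching the three sections above.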

DYNOMIGHT 1 month ago

You’re probably taking the wrong painkiller

This is an essay that recently appeared in Asterisk . Consider the rest of the risk issue for all your risk needs. Lots of people die after overdosing on acetaminophen (paracetamol, Tylenol, Panadol). In the U.S., it’s estimated to cause 56,000 emergency department visits, 2,600 hospitalizations, and 500 deaths per year. Acetaminophen has a scarily narrow therapeutic window. The instructions on the package say it’s okay to take up to four grams per day. If you take eight grams, your liver could fail and you could die. Meanwhile, it seems to be really hard to kill yourself by overdosing on ibuprofen (Advil, Nurofen, Motrin, Brufen). In 2006, Wood et al. searched the medical literature and found 10 documented cases in history. Nine of those cases involved complicating factors, and in the 10th, a woman took the equivalent of more than 500 standard (200mg) pills. So, for many years, if I needed a painkiller, I’d try to take ibuprofen rather than acetaminophen. My logic was that if eight grams of acetaminophen could kill my liver, then one gram was probably still hard on it. I’m fond of my liver and didn’t want to cause it any unnecessary inconvenience. But guess what? My logic was wrong and what I was doing was stupid. I’m now convinced that for most people in most circumstances, acetaminophen is safer than ibuprofen, provided you use it as directed. I think most doctors agree with this. In fact, I think many doctors think it’s obvious. (Source: I asked some doctors; they said it was obvious.) Should this have been obvious to me ? I figured it out by obsessively researching how those drugs work and making up a story about metabolic pathways and blood flow, and amino acid reserves. It’s a good story, one that revealed that my logic stemmed from an egregious lack of respect for biology and that I’m a big dummy (always a favorite subject). 
But if the clearest road to some piece of knowledge runs through metabolic pathways, then I don’t think that knowledge counts as obvious. So how is a normal person meant to figure it out? Why doesn’t the fact that acetaminophen is typically safer than ibuprofen appear on drug labels or government websites or WebMD? Are normal people supposed to figure it out, or has society decided that this is the kind of thing best left illegible? Note: You should not switch medications based on the uninformed ramblings of non-trustworthy pseudonymous internet people. Ibuprofen inhibits the cyclooxygenase (COX) enzyme. This in turn inhibits the formation of messenger molecules involved in inflammation, which leads to less physical inflammation and thus less pain. The same story is true for almost all over-the-counter painkillers, which is why they’re almost all considered “non-steroidal anti-inflammatory drugs,” or NSAIDs. This includes ibuprofen, aspirin, naproxen (Aleve), and a long list of related drugs. But it does not include acetaminophen. Nobody knows! Like ibuprofen, acetaminophen inhibits some COX enzymes. But it does so in a weird way that barely affects inflammation or messenger molecules, so it’s unclear if this matters for pain reduction. In the brain, acetaminophen is metabolized into a mysterious chemical called AM404. This activates the cannabinoid receptors and increases endocannabinoid signaling, which seems to reduce the subjective experience of pain. AM404 also activates the capsaicin receptor, which is associated with burning sensations that you’d normally expect to increase pain, but maybe some desensitization thing happens downstream? And maybe acetaminophen also interacts with serotonin or nitric oxide or does other stuff? How this all comes together to reduce pain is still somewhat of a scientific mystery. Aside: When trying to understand painkillers, it’s natural to focus on chemistry and molecular biology. 
But the unknown physical origins of consciousness are always nearby, looming ominously. In an ideal world, the only thing ibuprofen would do is reduce inflammation in the part of your body that hurts. But that is not our world. When ibuprofen inhibits the COX enzymes, it does so throughout the body. And mostly, that is bad. For one, ibuprofen reduces production of mucus in the stomach. That might sound okay or even good. But stomach mucus is important. You need it to shield the lining of your stomach from your extremely acidic gastric juice 1 . Having less mucus can lead to gastrointestinal problems or even ulcers. Ibuprofen also affects the heart. When ibuprofen inhibits the COX enzymes there, this in turn inhibits one chemical that prevents clotting and another that causes clotting. In balance, this seems to lead to more clotting, and an increased statistical risk of heart attacks 2 . If you’re healthy, the risk of a heart attack from an occasional low dose of ibuprofen is probably zero. But if you have heart issues and take medium to large doses regularly for as little as a few days, this might  be a serious concern. Ibuprofen also affects the kidneys. If you’re stressed, or cold, or dehydrated, or take stimulants, your body will constrict your blood vessels. That squeezes your kidneys’ intake tube, depriving them of blood. Your kidneys don’t like that, so they release signaling molecules to locally re-dilate the blood vessels. Trouble is, when ibuprofen inhibits COX enzymes in the kidneys, it inhibits those signaling molecules. If everything is normal, that’s okay, because the kidneys wouldn’t try to use those molecules anyway. But if your body has clamped down on the blood vessels, then the kidneys don’t have the tool they use to keep blood flowing, meaning they don’t get as much blood as they want. This is bad 3 . 
There are many other less common side effects, including allergies, respiratory reactions in asthmatics, induced meningitis , and suppressed ovulation. If you take a lot of ibuprofen, this could hurt your liver. But the major concerns seem to be the stomach, the heart, and the kidneys. Acetaminophen also inhibits some COX enzymes. But unlike ibuprofen, the effect is minimal outside the central nervous system. Thus, acetaminophen has little effect on stomach mucus, blood clots, or blood flow, and so presents almost none of the risks that ibuprofen does. Even so, if you take too much acetaminophen at once, you could easily die. How does this happen? Well, when acetaminophen is metabolized by the liver, it’s mostly broken down into harmless stuff. But a small fraction (5-15%) is broken down by the P450 system into an extremely toxic chemical called NAPQI . Ordinarily this is fine; your body creates and neutralizes toxic stuff all the time. For example, if you drank 20 grams of formaldehyde, you’d likely die. But did you know that your body itself makes and processes ~50 grams of formaldehyde every day? When liver cells sense NAPQI, they immediately release glutathione, which binds to NAPQI and renders it harmless. But there’s a problem. If you take too much acetaminophen at once, the pathways that break it down into harmless stuff get saturated, but the P450 system doesn’t get saturated. This means that not only is there more acetaminophen, but also that a much larger fraction of it is broken down into NAPQI. Soon your liver cells will run out of glutathione to neutralize it. Then, NAPQI will build up and bind to various proteins in the liver cells (especially in mitochondria) causing them to malfunction and/or commit suicide . This can cause total liver failure. So you should never take more than the recommended dose of  acetaminophen 4 . If you do take too much, you should go to a hospital immediately. 
They will give you NAC, which will replenish your glutathione and neutralize the NAPQI. Your prospects are good as long as you get to the hospital within a few hours 5 6 . Acetaminophen has lots of other possible side effects, like skin issues and blood disorders. But these all seem to be quite rare. The primary concern with acetaminophen  is liver damage. So if you have liver disease, then surely you’d want to avoid acetaminophen and take ibuprofen instead, right? Nope. It’s the opposite. Liver disease shifts the balance of risk in favor of acetaminophen. With liver disease, it’s hard for blood to flow into the liver, meaning that blood tends to pool in the abdomen. To counter this, blood vessels elsewhere in the body contract. This includes blood vessels around the kidneys. Remember the kidneys? Again, when blood vessels are constricted, the kidneys send out signaling molecules to locally re-dilate the blood vessels. But those signaling molecules are blocked by ibuprofen. So if you have liver disease, taking ibuprofen risks starving your kidneys of blood just like if you were dehydrated. Meanwhile, people with moderate liver disease are usually still able to process acetaminophen without issue, as long as it’s in smaller amounts. So doctors usually tell patients with liver disease to avoid ibuprofen and take  acetaminophen instead, just with a maximum of two grams per day instead of four. (Obviously, if you have liver disease, then you should talk to a doctor, I beg you, for the love of god.) The main takeaway from all this is that the risks of both drugs emerge from the madhouse of complexity that is your body. Surely there are some situations where acetaminophen is more dangerous than ibuprofen? I tried to capture the most common situations in this table: It’s actually fairly hard to find situations where ibuprofen is safer than acetaminophen. 
Possibly this is true if you’re hungover, but I would be very careful, because you tend to be dehydrated when hungover, raising the risk of kidney damage. (It’s probably optimal, from a health perspective, to avoid taking recreational drugs at doses that leave you physically ill the next day.) Aside from hangovers, the only situations I could find where ibuprofen might be safer than acetaminophen are if you’re taking certain anti-seizure or tuberculosis drugs or maybe if you have a certain enzyme deficiency (G6PDD). What have we learned so far? The body is really complicated! The main risk of acetaminophen is liver damage by creating too much NAPQI. Taking too much at once can easily kill you. However, as long as you don’t take too much at once and your liver isn’t depleted, then your liver will maintain NAPQI levels at zero and it will be completely fine. And there are very few other risks. Meanwhile, ibuprofen poses a risk of gastrointestinal issues, heart attacks, or kidney damage. The risk varies based on lots of factors like whether you’ve eaten food, whether you’re dehydrated, your blood pressure, and your heart health 7 . Therefore, acetaminophen is probably safer, provided you never take too much 8 . I don’t want to be alarmist. If you’re healthy, the risk from taking an occasional dose of ibuprofen as directed is extremely low. Given that so many people find that ibuprofen is more effective for many kinds of pain, it’s totally reasonable to use it. I do so myself. Still, it seems to be the case that in the vast majority of situations, acetaminophen is safer. Personally, if I have pain, I first take acetaminophen, and then add ibuprofen if necessary. I’m pretty sure many experts think this is somewhere between “sensible” and “obvious.” But if acetaminophen is safer, then why don’t official sources tell you that 9 ? I can get doctors to admit this off-the-record.
I can find random comment threads with support from people who seem to know what they’re talking about. But why does this fact never appear on government websites or drug labels? In the U.S., the Food and Drug Administration (FDA) creates 10 a “drug facts” label for over-the-counter drugs. Here’s what that looks like for ibuprofen: And here’s what it looks like for acetaminophen (paracetamol): I feel dumb saying this, but when I saw those labels in the past, I thought of them as a bunch of random information thrown together for legal reasons. But after spending a lot of time trying to understand these drugs myself, I now realize that these labels are… really good? Imagine you work at the FDA and it’s your job to write a safety label. You need to synthesize a vast and murky scientific landscape. Your label will be read by people with minimal scientific background who are likely currently in pain, and who could die if they take the drug in the wrong situation. If I were in that situation, I’d think about all the different situations in which taking one of these drugs could literally kill someone, and then — after a quick panic attack — I’d write a label that screamed, HEY, IF YOU ARE IN ANY OF THESE SITUATIONS, TAKING THIS DRUG COULD LITERALLY KILL YOU. Then I’d think about all the other situations where taking the drug might be okay depending on a set of complex science stuff and tell people in those situations to PLEASE TALK TO A DOCTOR FOR THE LOVE OF GOD because I DON’T KNOW IF YOU’VE HEARD BUT SCIENCE IS COMPLICATED. Everything else would be a minor concern. From that perspective, these labels are a triumph. This isn’t random information — every word is a synthesis of a mountain of research, carefully optimized to save lives. How did those drug labels come to be? 
If you want a taste for the FDA’s process, I encourage you to skim the 2002 Federal Register document in which the FDA proposed to update ibuprofen’s safety label and to formally classify it as Generally Recognized as Safe . It’s more than 21,000 words long and — I think — astonishingly good. It not only summarizes the entire medical literature on ibuprofen, it summarizes it well. Here is one representative bit: Bradley et al. (Ref. 42) conducted a 4-week, double-blind, randomized trial in 184 subjects comparing the effectiveness and safety of the maximum approved OTC daily dose of 1,200 mg of ibuprofen (number of subjects (n) = 62) to that of a prescription dose of 2,400 mg/day (n = 61), and to 4,000 mg/day of acetaminophen (n = 59) for the treatment of osteoarthritis. While there were no significant differences in the number of side effects reported during this study, the study demonstrated a trend towards a dose dependent increase in minor GI adverse events (nausea and dyspepsia) associated with higher doses of ibuprofen (1,200 mg/day: 7/62 or 11.3 percent; versus 2,400 mg/day: 14/61 or 23 percent). In addition, two subjects treated with 2,400 mg/day of ibuprofen became positive for occult blood while participating in the study. I spend a lot of time complaining about bad statistical writing. A lot . Probably too much. But I’m here to tell you, that paragraph is gorgeous . The writing is clear and penetrating. It contains all the important details, but no other details. Compared to the abstract of the original paper , the above is shorter and easier to understand yet simultaneously more informative. Five stars. The rest of the document is equally good, with clear and sensible explanations for various recommendations. 
For example, they discuss a proposal from the National Kidney Foundation for additional warning about risks to kidneys, explain why they think that proposal has merit, and then recommend a shorter version, which appears on every package of ibuprofen sold today. As far as I can tell, this level of quality is typical. For example, the FDA’s 2019 proposed rule on sunscreens is similarly masterful. This leaves us with this constellation of facts: Acetaminophen is, in general, safer than ibuprofen. The FDA doesn’t tell you that. Neither do other respectable authorities. The FDA is highly competent. So what’s happening here? Have the experts conspired to keep this knowledge secret? I don’t think so. Mostly, I think this is down to two factors. First, the FDA doesn’t really have a mission of determining “in what circumstances is drug A safer than drug B?” Their goal is to take individual drugs and determine how people can use them safely. They seem to be quite good at this. Second, everyone is mortally afraid of giving “medical advice.” It varies by jurisdiction, but in general, giving “wellness advice” is OK, but if you give personalized advice, you risk going to prison. The more credible you are, the higher that risk is 11 . Stepping back, how should we think about this situation? The body is complicated. When experts give the public advice on drugs, they are trying to insulate us from that complexity. But there is no way to do that without making trade-offs. Society has implicitly chosen tradeoffs that mean certain “less important” facts are de-prioritized. It’s not obvious that this is the wrong choice. I feel foolish for not having more respect for the body’s complexity and for the difficulty of the task all the experts are trying to accomplish. This is not medical advice. For some reason, humans have gastric acid that is more acidic than most other animals, and is only matched by animals that specialize in eating carrion.  
↩ At least two NSAIDs ( rofecoxib and valdecoxib ) have been withdrawn from the market due to an increased risk of heart attacks. For the same reason, the US refuses to approve etoricoxib .  ↩ Nephrologists hate ibuprofen. (Source: nephrologists.) If it was up to them, maybe ibuprofen would come with a “HAVE YOU CONSIDERED TAKING ACETAMINOPHEN INSTEAD?” warning. It confuses me that the safety label for ibuprofen doesn’t warn you about the danger of taking it while dehydrated and quietly damaging your kidneys. My best guess is that this is because other doctors don’t hate ibuprofen as much as nephrologists.  ↩ Watch out for combination medicines (like cold or flu medicines or opiate painkillers) that include acetaminophen. Arguably, acetaminophen is a victim of its own success here. It’s included in these things because it is better tolerated than NSAIDs. But it’s easy to miss.  ↩ Oddly, NAC is considered a nutritional supplement, meaning basically anyone can buy it. But there’s also almost no regulation, so who knows if the thing you bought actually has NAC in it? Do not screw around trying to self-medicate an acetaminophen overdose. Go to a hospital.  ↩ At one point while researching all this I had what I thought was a good idea: Why not sell acetaminophen in pills bundled together with NAC? The NAC would replenish glutathione stores in the liver, seemingly reducing the risk of overdose. Later on, I developed more humility and felt very stupid for fantasizing that such an obvious idea could be novel or useful. I think that this is indeed a bad idea because NAC itself has side effects, though I can’t find much formal discussion. In fact, I found a 2010 editorial called “Why Not Formulate an Acetaminophen Tablet Containing N -Acetylcysteine to Prevent Poisoning?”   In another study, Nakhaee et al. (2021) actually tried giving NAC together with acetaminophen to rats and found that this seemed to make it better at reducing pain. 
So maybe this isn’t a completely stupid idea. That last paper also led me to discover that “rat hot plate test” is a standard phrase, and one that drives home what humanity’s dominion over nature means in practice.  ↩ Above, we mentioned that acetaminophen overdose is estimated to cause around 500 deaths per year in the U.S. It’s much harder to give direct numbers for how many people die from taking ibuprofen, because NSAIDs don’t really directly “kill” people, but rather increase the risk of dying in various ways. The best estimates seem to be that NSAIDs cause 5,000-16,500 deaths each year in the US via gastrointestinal complications, and something similar via heart attacks. These numbers are not a good way of quantifying the relative risk of drugs, because they represent different people taking different amounts for different reasons. But they do show that ibuprofen is not without risk.  ↩ There are probably some people who are too disordered to track how much acetaminophen they’ve taken. For such people, ibuprofen might be the safer choice. Though I’m skeptical that many such people are found among the readers of Asterisk.  ↩ There are two cases where official sources are clear that acetaminophen is safer than ibuprofen: for use by pregnant women and small children. This doesn’t appear on the safety label, but if you’re pregnant and go to a doctor, they will probably tell you to take acetaminophen but not ibuprofen or other NSAIDs. And if you have a newborn baby, their doctor will probably tell you that you can give them acetaminophen but not ibuprofen or other NSAIDs.  ↩ Technically, for many drugs today, it is the drug manufacturer that “creates” the label, which is why they can be slightly different. However, the FDA strongly regulates what is on it, including most of the language and even details about the font and so on.
The federal register contains a template the FDA published for ibuprofen which is almost identical to what appears on the side of drugs today   ↩ Unlike in most places, in the United Kingdom it seems to be perfectly legal for people to give each other medical advice, provided they don’t misrepresent themselves as licensed doctors. This is not legal advice.  ↩

David Bushell 1 months ago

RSS Club #007: Running

Today Sabastian Sawe ran an historic sub-two-hour marathon in a competitive race. A marathon is around 42 kilometres, aka 26 miles in freedom units (we use miles in the UK too but not for running distances). I feel the record is a little unfair on Kipchoge who achieved the milestone first under non-competitive conditions. Even more unfair on Kejelcha who finished second today in 1:59:41. Two unbelievable athletes in the same day. If my maths is correct that’s not far off a 14 min 5km pace. That’s simply outrageous! My personal best for a 5km is 22 mins. With the caveat of questionable GPS cutting a park corner. My fastest half-marathon is 1:51:42. Basically half as fast as an elite marathon runner. Of course, they run half-marathons even faster. I do not believe my knees could withstand double that distance. Anyone who can drag their body 42km deserves applause. I doubt I could even sprint 100 metres as fast as these guys maintain a marathon pace. More napkin maths suggest that is 100 metres in around 17 seconds? Usain Bolt did it once in 9.58s. Maintaining a pace of 17s/100m for 42,000 metres is incredible. If the maths ain’t exciting you, see this video of runners attempting to match Kipchoge’s pace. Elite sprinting is anaerobic. Long distance running is aerobic (aka “cardio”). I won’t feign expertise on the exact science. All I know is the 200 metre sprint is notoriously difficult. It pushes the human body beyond what it can maintain for anaerobic sprinting. You gotta just start sucking in oxygen and try to ignore the fact that it feels like you’re dying. When it comes to superiority over other animals, top of the list is our brain and our dexterity. But more impressive I think is our endurance. Our ancestors started walking upright and evolved as persistence hunters. Prey cramps up and physically cannot move to save its life. Brutal way to go! Long-distance running is more about breathing, a steady pace, and good form to avoid injury.
The perfect running shoe is less important than people want to think. A good fit matters. Pheidippides didn’t run the first marathon in Nikes (fashion sneakers fall apart instantly). He probably wore sandals or was barefoot. The most important attire is short shorts, underwear of synthetic material to keep your bits in place, and plenty of lube on the thighs. Never wear a cotton T-shirt unless you want bloody nipples. Chafing is like the boiling frog parable. You don’t realise until it’s too late and you’re walking like a cowboy for a week. Unless you’re running competitively, never compare yourself to others. It does not help you in the slightest. There is no “good” time to run any particular distance. Thanks for reading! Follow me on Mastodon and Bluesky . Subscribe to my Blog and Notes or Combined feeds.

Gregory Gundersen 1 months ago

Brownian Motion

Imagine that a pollen particle is suspended in a glass of water. If we were to observe and record the vertical position of the particle over time, we would find that its movements were random. And if we were to plot this position, we’d get a jagged path through time (Figure 1, left). This path would be just one of many possible paths, and if we were to repeat this observational experiment many times, we would not expect to see the same path again. Given this randomness, how can we reason about this phenomenon? Can we say anything interesting or useful about the particle? For most of human history, this was a seemingly impossible task. A key insight, a conceptual pillar in probability theory, is to separate what actually happened (Figure 1, left) from other possible outcomes (Figure 1, right). This approach allows us to reason about the world through counterfactuals: what are all the possible paths the pollen could have taken? How likely is each path? What can we say about the distribution of outcomes? This understanding of the pollen particle as a random process is a deep idea, and it took many decades and scientists to understand. The phenomenon was described in 1827 by the Scottish botanist Robert Brown. Brown used a microscope to observe pollen particles suspended in water, and to his surprise, he saw the particles moving! At first, he thought this meant that the pollen particles were alive, but he tested and then rejected this hypothesis by observing the same effect with particles that he was convinced were inanimate, such as glass powder, minerals, and even pulverized fragments of the Egyptian Sphinx (Góra, 2006)! For roughly half a century, the phenomenon remained a mystery, although it became known as Brownian motion.
Then starting in 1905, Albert Einstein published a series of papers in which he hypothesized that the pollen particles were moving because they were being bombarded by invisible molecules in the liquid (Einstein, 1905). In the following year, the Polish physicist Marian Smoluchowski independently published essentially the same theory (Von Smoluchowski, 1906). At the time, this theory was controversial, because the idea of molecules was not yet widely accepted. However, using statistical mechanics, Einstein and Smoluchowski were able to make testable predictions about the behavior of the particles, and another scientist, Jean Baptiste Perrin, verified the model a few years later (Perrin, 1909). And since Einstein’s breakthrough work, Brownian motion has been widely studied and more deeply understood. In the mathematical community, Brownian motion was formalized by Norbert Wiener (Wiener, 1923), and thus Brownian motion is often referred to as a Wiener process, particularly by mathematicians. The goal of this post is to better understand Brownian motion. Brownian motion is an important concept because it can be used to model many phenomena, from particles suspended in liquids to the prices of stocks. Ultimately, we’ll reconstruct the marginal distribution of our pollen particle at any given point in time. As we will see, this is the normal distribution. This deep connection means that we can make mathematically precise probabilistic statements about a completely random process. Let’s begin with a simplified model of our pollen particle in discrete time. This is a stochastic process called a random walk. In the next section, we’ll extend this to continuous time, which is Brownian motion. Imagine we can discretize time and then observe a single discrete “tick” on the clock. What happens to the pollen particle during this one tick?
In our simple model of the world, we’re going to imagine that we flip a coin, not necessarily fair, and that the pollen particle moves up or down the same amount based on the outcome of that coin toss. The coin toss models the fact that the pollen particle is being randomly bombarded by water molecules and thus its position at the next time point is random. So the pollen particle cannot stay in place; after one tick of the clock, it moves up or down. Formally, let $S_0$ be the particle’s initial position (non-random), and let $S_1$ be a univariate random variable denoting the vertical position of the pollen particle after one tick. We assume that the initial position is zero ($S_0 = 0$), since this makes our calculations and notation easier and since it is simply a vertical shift in the final path. So we flip a coin with bias $p$, where $p$ is the probability of heads ($H$) and $q := 1-p$ is the probability of tails ($T$). If the coin is heads, then the pollen particle moves up $u$, and if the coin is tails, then the pollen particle moves down $u$ (to $-u$). Let’s denote the outcome of each coin flip as a random variable $Z_i$, taking values in $\{-1, 1\}$. Then the position after a single coin flip is $u Z_1$ (Figure 2). Now consider the position $S_n$ after $n$ time steps. At each time point $i$, we flip a coin, which we assume is independent of all other coin tosses, to discover whether the pollen particle is displaced up or down from its current position. Then $S_n$ is simply the sum

$$
S_n = u Z_1 + u Z_2 + \dots + u Z_n. \tag{1}
$$

Since each $Z_i$ is random, $S_n$ is also random. Clearly, as we repeatedly flip our coin, the set of possible locations of the pollen particle expands linearly with $n$.
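To make Equation 1 concrete, here is a minimal sketch that simulates one path of the random walk. The function name `random_walk` and its parameters are my own choices for illustration, not something from the original derivation; it assumes $Z_i = 1$ for heads (probability $p$) and $Z_i = -1$ for tails, and uses only Python’s standard library.

```python
import random


def random_walk(n, u=1.0, p=0.5, rng=None):
    """Return the path [S_0, S_1, ..., S_n] of the walk in Equation 1.

    Hypothetical helper: n is the number of ticks, u the step size,
    p the probability of heads (an up move).
    """
    if rng is None:
        rng = random.Random()
    path = [0.0]  # S_0 = 0 by assumption
    for _ in range(n):
        z = 1 if rng.random() < p else -1  # Z_i takes values in {-1, 1}
        path.append(path[-1] + u * z)      # S_i = S_{i-1} + u * Z_i
    return path


if __name__ == "__main__":
    # A fixed seed makes the sampled path reproducible.
    print(random_walk(n=10, u=1.0, p=0.5, rng=random.Random(0)))
```

Each call produces one of the many possible paths; rerunning with a different seed gives a different jagged path, which is the counterfactual picture described above.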
We can visualize all these possible locations as a directed graph or tree, sometimes called a binomial tree (Figure 3, left); we’ll explain the name in a moment. The tree layers (vertical slices) are zero-indexed, and so the root node occurs at time $n = 0$. Each node is a possible location, and the $n$-th layer is all possible locations by time $n$. The directed edges (left to right) are valid moves of the pollen particle. A path in this binomial tree is a sequence of steps which starts at the tree’s root (left-most node) and continues right at each time step until it reaches a leaf node (right-most node). A valid path is one that always moves left-to-right, from root to leaf. A valid path cannot, for example, move straight down at the same time point or move backwards. To help us identify nodes, let’s introduce the counting number $k$, which indexes the leaf nodes, taking values in $k \in \{0, 1, \dots, n\}$. Like the time index $n$, the number $k$ is a zero-based index. Let’s denote the bottom leaf node with $k = 0$ and the top leaf node with $k = n$. To illustrate this, I’ve visualized the tree with the nodes labeled with tuples $(n, k)$ (Figure 3, right). Now that we understand this simple, discrete-time model for our pollen particle, let’s tackle our motivating question: which outcomes (leaf nodes) are most likely? Any given path is random, but can we say something about the distribution of outcomes? To start, let’s compute the probability of arriving at the highlighted leaf node in Figure 4. This is really the probability of arriving at a given node $(n, k)$, which in turn is really the probability of flipping $k$ heads in $n$ coin tosses. Let’s use $K_n$ for this random variable. Arriving at this node requires that we flip two heads and one tails.
The probability of any one such sequence is

$$
\mathbb{P}\left(\{ \text{two heads and one tails} \}\right) = p^2 q. \tag{2}
$$

However, there are three ways to flip two heads in three coin tosses,

$$
\{ HHT, HTH, THH \}, \tag{3}
$$

which is another way of saying that there are three paths to the highlighted node. Since each path is a mutually exclusive outcome, we compute our desired probability by summing the probability in Equation 2 over all three paths:

$$
\mathbb{P}\left(\{ \text{arriving at node $(3, 2)$} \}\right) = \mathbb{P}(K_3 = 2) = 3 p^2 q. \tag{4}
$$

For example, if $p = 1/2$, then this probability would be $3/8$. To compute this probability in general, we just need a way to compute the number of ways to get $k$ successes or heads in $n$ trials. If order matters, the number of ways to pick $k$ heads from $n$ coin tosses is

$$
n (n-1) (n-2) \dots (n-k+1) = \frac{n!}{(n-k)!}. \tag{5}
$$

First, we can choose any of $n$ coin tosses to be a heads. Then we can pick any of the $n-1$ remaining coin tosses to be heads. And so on, until we have picked all $k$ heads; the last pick has $n-k+1$ tosses to choose from. However, this overcounts the possible paths. For example, it counts $H_1 H_2$ and $H_2 H_1$ separately, where the subscript $i$ denotes the $i$-th coin toss, even though both describe the same pair of heads. So we need to divide the permutation in Equation 5 by the number of ways we can order elements in a $k$-sized set. This is $k$ factorial.
Putting this together, we see that the number of ways to get to each node in the binomial tree is

$$
\frac{\text{ordered ways to pick $k$ heads from $n$ tosses}}{\text{permutations of $k$ heads}} \;\; = \;\; \frac{n(n-1)(n-2) \dots (n-k+1)}{k(k-1)(k-2) \dots 1}. \tag{6}
$$

This number in Equation 6 is often called the binomial coefficient, pronounced “$n$ choose $k$”, and is denoted as

$$
{n \choose k} \triangleq \frac{n!}{k! (n-k)!}. \tag{7}
$$

Putting it all together, the probability of arriving at the $k$-th node in the $n$-th layer of a binomial tree is

$$
\mathbb{P}(\{K_n = k\}) = {n \choose k} p^k q^{n-k}. \tag{8}
$$

The fact that these probabilities sum to one is just a trivial application of the binomial theorem. See A1. Computing the mean and variance of $K_n$ is relatively straightforward. We can view $K_n$ as the sum of independent Bernoulli random variables, so

$$
K_n = \frac{Z_1 + 1}{2} + \frac{Z_2 + 1}{2} + \dots + \frac{Z_n + 1}{2}. \tag{9}
$$

The mean of each $Z_i$ is $2p - 1$, and the first two moments are easy to compute:

$$
\begin{aligned}
\mathbb{E}[K_n] &= \sum_{i=1}^n \mathbb{E}\!\left[\frac{Z_i + 1}{2}\right] = np, \\
\mathbb{V}[K_n] &= \sum_{i=1}^{n} \mathbb{V}\!\left[\frac{Z_i + 1}{2}\right] = npq.
\end{aligned} \tag{10}
$$

For the variance calculation, we use Bienaymé’s identity and the fact that the $(Z_i+1)/2$ are independent. The distribution of $K_n$ was first discovered by Jakob Bernoulli and is called the binomial distribution after the binomial terms implicit in $n$ choose $k$ (again, see A1), and hence the “binomial tree.” Of course, $K_n$ is the distribution on the number of heads in $n$ coin tosses, while we’re more interested in the location of the pollen particle. But there’s a simple relationship between the two. The location $S_n$ is simply the number of up moves ($K_n$), minus the number of down moves ($n - K_n$), scaled by the size of the move $u$. In other words, it is:

$$
\begin{aligned}
S_n &= u \left( Z_1 + Z_2 + \dots + Z_n \right) \\
&= u \left(K_n - (n - K_n)\right) \\
&= u \left(2 K_n - n\right).
\end{aligned} \tag{11}
$$

Since $n$ and $u$ are non-random, the events $\{S_n = u(2k-n)\}$ and $\{K_n = k\}$ are identical. So we can say

$$
\mathbb{P}(\{S_n = u(2k-n)\}) = \mathbb{P}(\{K_n = k\}) = {n \choose k} p^k q^{n-k}. \tag{12}
$$

So the location of our pollen particle by a given layer $n$ is determined by the distribution of a random variable $K_n$ with the probability mass function (PMF) in Equation 8. While $K_n$ and $S_n$ have the same probabilities, they clearly have different moments.
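As a sanity check on Equations 8 and 12, the sketch below computes the PMF directly and compares it against a brute-force enumeration of all $2^n$ coin-flip sequences. The function names (`binomial_pmf`, `pmf_by_enumeration`, `location_pmf`) are hypothetical helpers introduced here for illustration; only `math.comb` and `itertools.product` from the standard library are used.

```python
from itertools import product
from math import comb


def binomial_pmf(n, k, p):
    """P(K_n = k) as in Equation 8."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def pmf_by_enumeration(n, k, p):
    """Sum the probability of every length-n sequence with exactly k heads.

    Each such sequence is one path to node (n, k) and has probability
    p^k q^(n-k); this is only feasible for small n.
    """
    total = 0.0
    for flips in product("HT", repeat=n):
        if flips.count("H") == k:
            total += p**k * (1 - p) ** (n - k)
    return total


def location_pmf(n, u, p):
    """Map each reachable location u(2k - n) to its probability (Equation 12)."""
    return {u * (2 * k - n): binomial_pmf(n, k, p) for k in range(n + 1)}


if __name__ == "__main__":
    print(binomial_pmf(3, 2, 0.5))                          # 0.375, i.e. 3/8
    print(pmf_by_enumeration(3, 2, 0.5))                    # same value via the three paths
    print(sum(binomial_pmf(10, k, 0.3) for k in range(11)))  # approximately 1
    print(location_pmf(3, 1.0, 0.5))
```

The enumeration makes the path-counting argument explicit, while `location_pmf` shows that $S_n$ reuses the same probabilities on the shifted and scaled support $\{-nu, \dots, nu\}$.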
The first two moments of $S_n$ are:

$$
\begin{aligned}
\mathbb{E}[S_n] &= \mathbb{E}[u(2K_n - n)] = 2nu(p-1/2), \\
\mathbb{V}[S_n] &= \mathbb{V}[u(2K_n - n)] = \mathbb{V}[2 u K_n] = 4 u^2 npq.
\end{aligned} \tag{13}
$$

In the special case in which $p = 1/2$, then

$$
\begin{aligned}
\mathbb{E}[S_n] &= 0, \\
\mathbb{V}[S_n] &= n u^2.
\end{aligned} \tag{14}
$$

We can explore this distribution by plotting the function for various parameterizations (Figure 5). Note that while $K_n$ is binomially distributed and while $S_n$ and $K_n$ have the same probability function, $S_n$ is not binomially distributed. That’s because the binomial distribution only has support over the non-negative integers. It’s a distribution over repeated coin flips. But $S_n$ has support over the negative numbers. I don’t think the distribution of $S_n$ has a name, but speaking loosely, it is essentially a binomial distribution mean-centered at zero. And another way to visualize this is to imagine larger and larger binomial trees (Figure 6). The distributions of the locations $S_n$ for $n \in \{6, 20, 80\}$ are the distributions in Figure 5.

To summarize so far, we have done something remarkable. We have modeled the motion of a completely random particle, and yet we can say something concrete and precise about its distribution of locations over time. To do this, however, we had to assume that time was discrete. So the natural next question is: what’s the distribution of our process in the continuous-time limit?
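As a quick sanity check on Equations 12 and 13, here is a minimal Python sketch that builds the PMF of $S_n$ directly from the binomial PMF and compares its mean and variance with the closed forms. The function names `location_pmf` and `moments`, and the parameter values, are my own choices for illustration and not part of the original derivation.

```python
from math import comb

def location_pmf(n, p, u=1.0):
    """Map each reachable location u*(2k - n) to its probability (Equation 12)."""
    q = 1.0 - p
    return {u * (2 * k - n): comb(n, k) * p**k * q**(n - k) for k in range(n + 1)}

def moments(pmf):
    """Mean and variance of a discrete distribution given as {value: probability}."""
    mean = sum(s * w for s, w in pmf.items())
    var = sum((s - mean) ** 2 * w for s, w in pmf.items())
    return mean, var

# Hypothetical parameters: 20 tosses, biased coin, half-unit moves.
n, p, u = 20, 0.7, 0.5
pmf = location_pmf(n, p, u)
mean, var = moments(pmf)

print("total probability:", sum(pmf.values()))
print("mean:", mean, "closed form:", 2 * n * u * (p - 0.5))
print("variance:", var, "closed form:", 4 * u**2 * n * p * (1 - p))
```

Note that the support of `pmf` contains negative locations, which is exactly why $S_n$ is not itself binomially distributed even though it shares its probabilities with $K_n$.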
At this point in our story, it is not far-fetched to guess that it’s the normal distribution. De Moivre proved the De Moivre–Laplace theorem, the earliest version of a central limit theorem (CLT), in 1738, so roughly a hundred years before Robert Brown observed Brownian motion. So scientists and mathematicians already knew that a suitably normalized sum of independent and identically distributed random variables converges to a Gaussian. The key insight in the development of Brownian motion was to realize that the bombardment of a pollen particle could be modeled as such a sum.

So now let’s imagine what happens when the molecular bombardments on our pollen particle increase in number but decrease proportionally in impact. So we have more bombardments but they move the pollen particle less per bombardment. This rescaling is critical, or else the variance of our process would explode. Put in physical terms, if we increased the number of bombardments of our pollen particle but did not scale down the size of the move, the pollen particle’s moves would grow implausibly large.

To formalize this, let’s first fix $p = 1/2$ so that $\mathbb{E}[Z_i] = 0$. We’ll handle the asymmetric case later. Now suppose that there are $n$ bombardments per unit of physical time $t$, so for any fixed amount of physical time $t > 0$, we can model the location of our pollen particle as

$$
B_t^{(n)} = u Z_1 + u Z_2 + \dots + u Z_{\lfloor tn \rfloor}, \quad u := \frac{1}{\sqrt{n}}. \tag{15}
$$

The notation $\lfloor tn \rfloor$ just indicates flooring to an integer since $t$ is a positive real number. And we need $u$ to scale with $n$, and so we set $u = 1 / \sqrt{n}$.
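To see why the rescaling matters, the following small sketch uses the variance formula from Equation 14 (fair coin, $\lfloor tn \rfloor$ steps of size $u$) and compares a fixed step size with $u = 1/\sqrt{n}$. The helper name `walk_variance` and the value $t = 2.5$ are assumptions made for illustration.

```python
from math import floor

def walk_variance(n, t, u):
    """Variance of u*(Z_1 + ... + Z_floor(t*n)) for fair +/-1 steps (Equation 14)."""
    return floor(t * n) * u**2

t = 2.5  # hypothetical amount of physical time
for n in (10, 1_000, 100_000):
    fixed = walk_variance(n, t, u=1.0)           # step size not rescaled
    rescaled = walk_variance(n, t, u=n ** -0.5)  # u = 1 / sqrt(n), as in Equation 15
    print(n, fixed, rescaled)
```

With a fixed step the variance grows linearly in $n$, while with $u = 1/\sqrt{n}$ it equals $\lfloor tn \rfloor / n$ and stays close to $t$.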
If we take $n \rightarrow \infty$, then we get a continuous-time limit of a random walk:

$$
B_t = \lim_{n \rightarrow \infty} B_t^{(n)}. \tag{16}
$$

So again, we hold physical time $t$ fixed, and we make our binomial tree finer and finer (larger $n$ for fixed $t$). If we remove the grid of the binomial tree which clutters the visualization, and just visualize paths for finer and finer $n$, we can create visualizations similar to Figure 6 but for much larger $n$ (Figure 7).

Now we can ask the same question we asked in the discrete-time case: after physical time $t$, what is the distribution of our pollen particle’s position? As we observed above, it must be a normal distribution! Here, the insight is not that the binomial distribution converges to the normal distribution; again, this was known a hundred years before Robert Brown’s observations. The insight is that by modeling the continuous-time limit of a random walk as in Equation 15, this rescaled random walk $B_t^{(n)}$ converges to a normal distribution $\mathcal{N}(0, t)$.

Let’s see this a bit more formally. The De Moivre–Laplace theorem states that a properly standardized binomial random variable converges to the normal distribution. In our notation, $K_n \sim \text{binom}(n, p)$ with $p=1/2$, and the theorem states:

$$
\frac{K_n - \mathbb{E}[K_n]}{\sqrt{\mathbb{V}[K_n]}} = \frac{K_n - np}{\sqrt{npq}} = \frac{K_n - n/2}{\sqrt{n/4}} \;\stackrel{d}{\rightarrow}\; \mathcal{N}(0, 1). \tag{17}
$$

Now observe that $B_t^{(n)}$ is essentially this standardized quantity up to rescaling:

$$
\begin{aligned}
B_t^{(n)} &= u Z_1 + u Z_2 + \dots + u Z_{\lfloor tn \rfloor} \\
&= \frac{1}{\sqrt{n}} \left( Z_1 + Z_2 + \dots + Z_{\lfloor tn \rfloor} \right) \\
&= \frac{1}{\sqrt{n}} \left(2 K_{\lfloor tn \rfloor} - {\lfloor tn \rfloor} \right) \\
&= \frac{1}{\sqrt{n}} \frac{\sqrt{\lfloor tn \rfloor}}{\sqrt{\lfloor tn \rfloor}} \; 2 \left(K_{\lfloor tn \rfloor} - {\lfloor tn \rfloor}/2 \right) \\
&= \sqrt{\frac{\lfloor tn \rfloor}{n}} \frac{K_{\lfloor tn \rfloor} - {\lfloor tn \rfloor}/2}{\sqrt{\lfloor tn \rfloor / 4}}.
\end{aligned} \tag{18}
$$

By De Moivre–Laplace, we can say:

$$
\frac{K_{\lfloor tn \rfloor} - {\lfloor tn \rfloor}/2}{\sqrt{\lfloor tn \rfloor / 4}} \;\stackrel{d}{\rightarrow}\; \mathcal{N}(0, 1). \tag{19}
$$

And as $n \rightarrow \infty$, we can see that the prefactor converges to $\sqrt{t}$:

$$
\sqrt{\frac{\lfloor tn \rfloor}{n}} \;\rightarrow\; \sqrt{t}. \tag{20}
$$

Since the standardized binomial converges in distribution to $\mathcal{N}(0, 1)$ and the prefactor converges to the constant $\sqrt{t}$, we can see that

$$
B_t^{(n)} \;\stackrel{d}{\rightarrow}\; \sqrt{t} \; \mathcal{N}(0, 1) = \mathcal{N}(0, t). \tag{21}
$$

That’s it! As an aside, I think that in a modern treatment, we would invoke Slutsky’s theorem to arrive at Equation 21. Slutsky’s theorem states that if a sequence of random variables converges in distribution and is multiplied by a sequence converging to a constant, then the product converges in distribution to the constant times the limit. The geometric interpretation of this is that the marginal distribution after time $t$ is simply the normal distribution $\mathcal{N}(0, t)$ (Figure 8).

Now that we see the simplest version of the derivation in its entirety, we can make two important adjustments. First, notice that our bombardment factor $u = 1/\sqrt{n}$ has no physical meaning. The denominator just ensures convergence, and so this bombardment has unit scale. But we can introduce a parameter $\sigma$ which captures the physical scale of the bombardment. Concretely, let

$$
u = \frac{\sigma}{\sqrt{n}}. \tag{22}
$$

It’s easy to see that this will flow through the derivation in Equation 18 and give us

$$
\sigma B_t^{(n)} \;\stackrel{d}{\rightarrow}\; \sigma \sqrt{t} \; \mathcal{N}(0, 1) = \mathcal{N}(0, \sigma^2 t). \tag{23}
$$

But I think the more interesting adjustment is adding a drift parameter $\mu$. Of course, we could just shift our Brownian motion directly:

$$
\mu + \sigma B_t^{(n)} \;\stackrel{d}{\rightarrow}\; \mathcal{N}(\mu, \sigma^2 t). \tag{24}
$$

But this has no physical meaning for our process. It’s just an arbitrary shift, not a drift.
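The convergence in Equation 21 can also be checked numerically without any sampling. The sketch below computes the exact CDF of $B_t^{(n)}$ from the fair binomial PMF and measures its largest distance from the $\mathcal{N}(0, t)$ CDF on a small grid. The names `walk_cdf`, `normal_cdf` and `max_gap`, the grid, and $t = 2$ are all illustrative assumptions.

```python
from math import comb, erf, floor, sqrt

def walk_cdf(x, n, t):
    """Exact P(B_t^(n) <= x) for the fair rescaled walk of Equation 15."""
    m = floor(t * n)
    # B = (2K - m) / sqrt(n) <= x  is equivalent to  K <= (x * sqrt(n) + m) / 2.
    k_max = floor((x * sqrt(n) + m) / 2)
    if k_max < 0:
        return 0.0
    k_max = min(k_max, m)
    # Integer sum divided by 2**m keeps the computation exact until the last step.
    return sum(comb(m, k) for k in range(k_max + 1)) / 2**m

def normal_cdf(x, t):
    """CDF of N(0, t) evaluated at x."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0 * t)))

def max_gap(n, t, grid):
    """Largest absolute CDF difference over the grid points."""
    return max(abs(walk_cdf(x, n, t) - normal_cdf(x, t)) for x in grid)

t = 2.0
grid = [i / 4 for i in range(-12, 13)]  # x from -3 to 3 in steps of 0.25
for n in (10, 100, 1000):
    print(n, max_gap(n, t, grid))
```

The gap shrinks as $n$ grows, which is the De Moivre–Laplace statement seen through the rescaling of Equation 18. It never reaches zero for finite $n$ because the walk still lives on a lattice.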
A richer way to approach this is to encode it directly into the bias of our coin flip. Intuitively, if we flip a biased coin (so $p \neq 1/2$), then the position of our pollen particle will drift over time (Figure 9). However, there’s a problem with this approach: since $p$ is constrained to $[0, 1]$, then $\mathbb{E}[Z_i]$ is constrained to $[-1, 1]$, and thus the mean of our Brownian motion is constrained to $[-t, t]$:

$$
\begin{aligned}
\mathbb{E}[Z_i] &= 2p - 1, \\
\mathbb{E}\!\left[B_t^{(n)}\right] &= \frac{1}{\sqrt{n}} \lfloor tn \rfloor \mathbb{E}[Z_1].
\end{aligned} \tag{25}
$$

A more elegant approach is to make $p$ a function of $\mu$. However, we cannot naively do this, since our drift could explode as $n \rightarrow \infty$. So we need to normalize $\mu$ by a function of $n$. Consider this definition for our bias parameter, now $p_n$:

$$
p_n = \frac{1}{2} + \frac{\mu}{2 \sigma \sqrt{n}}. \tag{26}
$$

Intuitively, the factor $\mu /(2 \sigma \sqrt{n})$ is the precise rate at which the bias of our coin has to vanish as we increase the number of bombardments $n$ per unit of physical time $t$. So the mean of each $Z_i$ is

$$
\mathbb{E}[Z_i] = 2 p_n - 1 = \frac{\mu}{\sigma \sqrt{n}}, \tag{27}
$$

and so the mean of our process (let’s denote it as $X_t^{(n)}$ since it is no longer standardized) converges to $\mu t$ as $n \rightarrow \infty$:

$$
\mathbb{E}\!\left[X_t^{(n)}\right] = \frac{\sigma}{\sqrt{n}} \lfloor tn \rfloor \mathbb{E}[Z_1] = \frac{\sigma}{\sqrt{n}} \lfloor tn \rfloor \frac{\mu}{\sigma \sqrt{n}} \;\rightarrow\; \mu t. \tag{28}
$$

Putting these two adjustments together, one for the drift and one for the size of the bombardment, we can see that the general result is non-standard Brownian motion:

$$
X_t^{(n)} \;\stackrel{d}{\rightarrow}\; \mathcal{N}(\mu t, \sigma^2 t). \tag{29}
$$

Alternatively, we could simply rewrite the main derivation (Equation 18) using $u$ and $p_n$ as defined in Equations 22 and 26 respectively. This is arguably the more elegant derivation, since we construct the marginal distribution from the ground up. See A2 for this derivation.

Note that this isn’t a proof that the rescaled random walk converges to Brownian motion as a process. That requires more advanced mathematics such as Donsker’s theorem. Rather, it’s a claim about its marginal distribution at any fixed time $t$. But I think this provides amazing intuition for what Brownian motion really is without requiring much beyond elementary probability. I still remember sitting in class for a course on probability and random processes and watching the professor churn through the algebra to produce the insight in Equation 19. It felt surprising and then obvious. The normal distribution is everywhere precisely because it is the limiting distribution for sums of independent and identically distributed random variables. We can shift or scale our random walk. We can make it asymmetric. It doesn’t really matter. We’ll still converge to a normal. And in my mind, this derivation builds good intuition for other properties of Brownian motion.
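To make the role of $p_n$ concrete, here is a short sketch that evaluates the exact mean and variance of the drifted walk for finite $n$, using $u$ from Equation 22 and $p_n$ from Equation 26, and shows them approaching $\mu t$ and $\sigma^2 t$. The function name `drifted_walk_moments` and the parameter values are hypothetical.

```python
from math import floor, sqrt

def drifted_walk_moments(n, t, mu, sigma):
    """Exact mean and variance of u*(Z_1 + ... + Z_floor(t*n)) with biased steps."""
    m = floor(t * n)
    u = sigma / sqrt(n)                      # Equation 22
    p_n = 0.5 + mu / (2 * sigma * sqrt(n))   # Equation 26
    if not 0.0 <= p_n <= 1.0:
        raise ValueError("n is too small for p_n to be a valid probability")
    mean = u * m * (2 * p_n - 1)             # each Z_i has mean 2*p_n - 1
    var = u**2 * m * 4 * p_n * (1 - p_n)     # each Z_i has variance 4*p_n*(1 - p_n)
    return mean, var

mu, sigma, t = 0.8, 1.5, 2.0  # illustrative values only
for n in (10, 1_000, 1_000_000):
    print(n, drifted_walk_moments(n, t, mu, sigma))
print("limits:", mu * t, sigma**2 * t)
```

The guard on `p_n` reflects the constraint discussed above: for small $n$ and a large ratio $\mu / \sigma$, the bias would leave $[0, 1]$, so the construction only makes sense once $n$ is large enough.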
For example, we can say that Brownian motion is a martingale or that it has stationary Gaussian increments. The mathematics needed to make these claims precise might require some work, but the basic intuition is encoded in the derivations and visualizations above.

The binomial theorem is the following identity, which holds for any non-negative integer power $n$:

$$
(x + y)^n = \sum_{k=0}^n {n \choose k} x^k y^{n-k}. \tag{A1.1}
$$

This is easy to prove by induction. One can trivially check that the base case holds. And the inductive step is as follows:

$$
\begin{aligned}
(x + y)^n (x + y) &= \sum_{k=0}^n {n \choose k} x^k y^{n-k} (x + y) \\
&= \sum_{k=0}^n {n \choose k} x^{k+1} y^{n-k} + \sum_{k=0}^n {n \choose k} x^k y^{n-k+1} \\
&:= A + B.
\end{aligned} \tag{A1.2}
$$

If we write each sum $A$ and $B$ explicitly, it’s clear that we have $n$ overlapping terms, namely $x^1 y^n$ through $x^n y^1$:

$$
\begin{aligned}
A &= {n \choose 0} x^1 y^n + {n \choose 1} x^2 y^{n-1} + \dots + {n \choose n-1} x^n y^1 + {n \choose n} x^{n+1} y^0, \\
\\
B &= {n \choose 0} x^0 y^{n+1} + {n \choose 1} x^1 y^n + \dots + {n \choose n-1} x^{n-1} y^2 + {n \choose n} x^n y^1.
\end{aligned} \tag{A1.3}
$$

Collecting the $n$ like terms, we get:

$$
\begin{aligned}
A+B &= \left[{n \choose 0} + {n \choose 1}\right] x^1 y^n + \dots + \left[{n \choose n-1} + {n \choose n}\right] x^n y^1 \\
&\quad + {n \choose 0} x^0 y^{n+1} + {n \choose n} x^{n+1} y^0.
\end{aligned} \tag{A1.4}
$$

Finally, we can use the following identity (applied with $n+1$ in place of $n$) to collapse bracketed binomial coefficients:

$$
{n \choose k} = {n-1 \choose k-1} + {n-1 \choose k}. \tag{A1.5}
$$

And we can rewrite the non-overlapping terms in terms of $n+1$ since

$$
{n \choose 0} = {n+1 \choose 0} = {n \choose n} = {n+1 \choose n+1} = 1. \tag{A1.6}
$$

This completes the inductive step:

$$
\begin{aligned}
&(x+y)^{n+1} \\
&= {n+1 \choose 1} x^1 y^n + \dots + {n+1 \choose n} x^n y^1 + {n+1 \choose 0} x^0 y^{n+1} + {n+1 \choose n+1} x^{n+1} y^0 \\
&= \sum_{k=0}^{n+1} {n+1 \choose k} x^k y^{n+1-k}.
\end{aligned} \tag{A1.7}
$$

Finally, the fact that the binomial distribution normalizes, discussed around Equation 8, is simply a direct application of the binomial theorem for $x = p$ and $y=1-p$.
Let $p_n$ be defined as

$$
p_n = \frac{1}{2} + \frac{\mu}{2 \sigma \sqrt{n}}. \tag{A3.1}
$$

Then clearly

$$
\begin{aligned}
\mathbb{E}[Z_i] &= 2 p_n - 1 = \frac{\mu}{\sigma \sqrt{n}}, \\\\
\mu_{K} := \mathbb{E}[K_{\lfloor tn \rfloor}] &= {\lfloor tn \rfloor} p_n \\
&= {\lfloor tn \rfloor} \left( \frac{1}{2} + \frac{\mu}{2 \sigma \sqrt{n}}\right), \\\\
\sigma_{K}^2 := \mathbb{V}[K_{\lfloor tn \rfloor}] &= {\lfloor tn \rfloor} p_n (1 - p_n) \\
&= {\lfloor tn \rfloor} \left( \frac{1}{2} + \frac{\mu}{2 \sigma \sqrt{n}}\right) \left( \frac{1}{2} - \frac{\mu}{2 \sigma \sqrt{n}}\right) \\
&= {\lfloor tn \rfloor} \left( \frac{1}{4} - \frac{\mu^2}{4 \sigma^2 n}\right).
\end{aligned} \tag{A3.2}
$$

Let’s redefine $X_t^{(n)}$ as the following sequence:

$$
X_t^{(n)} = u Z_1 + u Z_2 + \dots + u Z_{\lfloor tn \rfloor}, \quad u := \frac{\sigma}{\sqrt{n}}. \tag{A3.3}
$$

We can write this as:

$$
\begin{aligned}
X_t^{(n)} &= \frac{\sigma}{\sqrt{n}} \left[ Z_1 + Z_2 + \dots + Z_{\lfloor tn \rfloor} \right] \\
&= \frac{\sigma}{\sqrt{n}} \left[ 2 K_{\lfloor tn \rfloor} - {\lfloor tn \rfloor} \right] \\
&= \frac{\sigma}{\sqrt{n}} \; 2 \left[ K_{\lfloor tn \rfloor} - \lfloor tn \rfloor \frac{1}{2} \right] \\
&= \frac{\sigma}{\sqrt{n}} \; 2 \left[ K_{\lfloor tn \rfloor} - \lfloor tn \rfloor \frac{1}{2} - {\lfloor tn \rfloor} \left( \frac{\mu}{2 \sigma \sqrt{n}}\right) + {\lfloor tn \rfloor} \left( \frac{\mu}{2 \sigma \sqrt{n}}\right) \right] \\
&= \frac{\sigma}{\sqrt{n}} \; 2 \left[ K_{\lfloor tn \rfloor} - \lfloor tn \rfloor \left( \frac{1}{2} + \frac{\mu}{2 \sigma \sqrt{n}} \right) \right] + {\lfloor tn \rfloor} \frac{\mu}{n} \\
&= \frac{\sigma}{\sqrt{n}} \; 2 \left[ K_{\lfloor tn \rfloor} - \mu_K \right] + {\lfloor tn \rfloor} \frac{\mu}{n} \\
&= \frac{\sigma}{\sqrt{n}} \sqrt{4 \frac{\lfloor tn \rfloor}{\lfloor tn \rfloor} \frac{1 - \frac{\mu^2}{\sigma^2 n}}{1 - \frac{\mu^2}{\sigma^2 n}}} \left[ K_{\lfloor tn \rfloor} - \mu_K \right] + {\lfloor tn \rfloor} \frac{\mu}{n} \\
&= \frac{\sigma}{\sqrt{n}} \sqrt{\lfloor tn \rfloor \left( 1 - \frac{\mu^2}{\sigma^2 n} \right)} \left[ \frac{K_{\lfloor tn \rfloor} - \mu_K}{\sqrt{\frac{\lfloor tn \rfloor \left(1 - \frac{\mu^2}{\sigma^2 n}\right)}{4}}} \right] + {\lfloor tn \rfloor} \frac{\mu}{n} \\
&= \frac{\sigma}{\sqrt{n}} \sqrt{\lfloor tn \rfloor \left( 1 - \frac{\mu^2}{\sigma^2 n} \right)} \left[ \frac{K_{\lfloor tn \rfloor} - \mu_K}{\sigma_K} \right] + {\lfloor tn \rfloor} \frac{\mu}{n}
\end{aligned} \tag{A3.4}
$$

Finally, it’s clear that the prefactor converges to $\sigma \sqrt{t}$ as $n \rightarrow \infty$:

$$
\frac{\sigma}{\sqrt{n}} \sqrt{\lfloor tn \rfloor \left( 1 - \frac{\mu^2}{\sigma^2 n} \right)} \;\rightarrow\; \sigma \sqrt{t} \tag{A3.5}
$$

while the last term converges to $\mu t$ as $n \rightarrow \infty$:

$$
{\lfloor tn \rfloor} \frac{\mu}{n} \;\rightarrow\; \mu t. \tag{A3.6}
$$

Since

$$
\frac{K_{\lfloor tn \rfloor} - \mu_K}{\sigma_K} \;\stackrel{d}{\rightarrow}\; \mathcal{N}(0, 1), \tag{A3.7}
$$

then again by Slutsky’s theorem, we know

$$
X_t^{(n)} \;\stackrel{d}{\rightarrow}\; \mathcal{N}(\mu t, \sigma^2 t). \tag{A3.8}
$$

So $X_t^{(n)}$ converges to a normal distribution with drift $\mu t$ and volatility $\sigma \sqrt{t}$.
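The algebra in Equation A3.4 is easy to get wrong by a sign or a square, so a numerical spot check is worthwhile. The sketch below evaluates, for every possible head count $k$, both the direct value $u(2k - \lfloor tn \rfloor)$ and the final line of Equation A3.4, with $\mu_K$ and $\sigma_K^2$ taken from Equation A3.2. The function name `decomposition` and the parameter values are assumptions for illustration.

```python
from math import floor, sqrt

def decomposition(k, n, t, mu, sigma):
    """Return (direct, rebuilt) values of X_t^(n) when K_floor(t*n) equals k."""
    m = floor(t * n)
    u = sigma / sqrt(n)                              # Equation A3.3
    p_n = 0.5 + mu / (2 * sigma * sqrt(n))           # Equation A3.1
    mu_K = m * p_n                                   # Equation A3.2
    sigma_K = sqrt(m * p_n * (1 - p_n))              # Equation A3.2
    direct = u * (2 * k - m)                         # second line of Equation A3.4
    prefactor = u * sqrt(m * (1 - mu**2 / (sigma**2 * n)))
    rebuilt = prefactor * (k - mu_K) / sigma_K + m * mu / n  # last line of A3.4
    return direct, rebuilt

n, t, mu, sigma = 50, 1.5, 0.8, 1.5  # hypothetical values
m = floor(t * n)
worst = max(abs(d - r) for d, r in (decomposition(k, n, t, mu, sigma) for k in range(m + 1)))
print("largest mismatch over all k:", worst)
```

The two expressions agree up to floating-point rounding for every $k$, which confirms that the decomposition into a standardized binomial term plus a drift term is an exact identity and not only a limiting statement.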

0 views
A Working Library 1 months ago

Orbital

Six people, four astronauts and two cosmonauts, circle the Earth. They may be among the last to do so, as the space station they live in is due to be dismantled. While they circle and observe, watching sunrise after sunset, seeing typhoons and dust storms wash across the surface below, another crew of astronauts takes off for the moon, passing them by. But their gaze remains stubbornly down, not out; down into the water and land and lights, into their own memories and histories, the deaths and lives that keep them tethered as certainly as gravity prevents them from falling away. A moving love letter to our one and only planet.

0 views
Sean Goedecke 1 months ago

Many anti-AI arguments are conservative arguments

Most anti-AI rhetoric is left-wing coded. Popular criticisms of AI describe it as a tool of techno-fascism, or appeal to predominantly left-wing concerns like carbon emissions, democracy, or police brutality. Anti-AI sentiment is surprisingly bipartisan, but the big anti-AI institutions are labor unions and the progressive wing of the Democrats. This has always seemed weird to me, because the contents of most anti-AI arguments are actually right-wing coded. They’re not necessarily intrinsically right-wing, but they’re the kind of arguments that historically have been made by conservatives, not liberals or leftists. Here are some examples:

- Many AI critics complain that AI steals copyrighted content, but prior to 2023, leftists were largely anti-intellectual-property on principle (either because they’re anti-property, or because they characterize copyright as benefiting huge media corporations and patent trolls).
- A popular anti-AI-art sentiment is that it’s corrosive to the human spirit to consume AI slop: in other words, art just inherently ought to be generated by humans, and using AI thus damages some part of our intangible human soul. Whether you like this argument or not, it’s structurally similar to a whole slate of classic arguments-from-intuition for conservative positions like anti-abortion or anti-homosexuality.
- Weird new technological art has traditionally been championed by the left-wing and dismissed by the right-wing (as inhuman, cheap, or degenerate). But when it comes to AI art, it’s the left-wing making these arguments, and others (not necessarily right-wingers) arguing that AI art can also be a medium of human artistic expression.
- One main worry about AI is that it’s going to take over a lot of jobs. This is a compelling argument! But the left-wing has recently been famously unsympathetic to this same argument around fossil-fuel energy jobs like coal mining, to the point where Biden infamously advised a group of miners in New Hampshire to learn to code [1]. Halting technological progress to preserve jobs is quite literally a “conservative” position.

On top of all that [2], frontier AI models themselves are quite left-wing. Notwithstanding some real cases of data bias (most infamously Google’s image model miscategorizing dark-skinned humans as “gorillas”), the models reliably espouse left-wing positions. Even Elon Musk’s deliberate attempt to create a right-wing AI in Grok has had mixed success. In 2006, Stephen Colbert coined the phrase “reality has a left-wing bias”. If the left-wing were more sympathetic to AI, I think they would be using this as a pro-left argument [3].

So what happened? A year ago I wrote Is using AI wrong? A review of six popular anti-AI arguments. In that post I blame the hard right-wing turn many big tech CEOs made in 2024. That was around the same time that LLMs were emerging in the public consciousness with ChatGPT, so it made sense that AI got tagged as right-wing: after all, the billionaires on TV and Twitter talking about how AI was going to change the world were all the same people who’d just gone all-in on Donald Trump. I still think this is a pretty good explanation (just unfortunate timing), but there are definitely other factors at play.

One obvious factor is the hangover from the pro-crypto mania of 2021 and 2022, where many of the same tech-obsessed folks also posted ugly art and talked about how their technology would change the world forever. Few of these predictions came true (though cryptocurrency has indeed changed the world forever), and it’s understandable that many people viewed AI as a natural continuation of this movement. On top of that, Donald Trump himself has come out strongly pro-AI, both in terms of policy and in terms of actually posting AI art himself. This naturally creates a backlash where anti-Trump people are primed to be even more anti-AI [4]. Here are some more reasons:

- AI has real environmental impact (though this is often wildly overstated, as I say here), and the right-wing is politically committed to downplaying or denying anthropogenic environmental impacts in general.
- When times are tough, it’s easy to blame the hot new thing that everyone is talking about. Because the right-wing is currently ascendant in the US, left-wingers are more inclined to talk about how tough times are.
- The left-wing is over-represented in the kind of “computer jobs” that are under direct threat from AI.
- Being pro-Europe has always been left-wing coded, and Europe has been noticeably slower and more sceptical about AI than the USA.

Let me finally put my cards on the table. I would describe myself as on the left wing, and I’m broadly agnostic about the impact of AI. Like the boring fence-sitter I am, I think it will have a mix of positive and negative effects. In general, I’m unconvinced by the pro-copyright and human-soul-related anti-AI arguments, or by the idea that AI is inherently right-wing, but I’m troubled by the environmental impact and the impact on jobs (which in my view are more classically left-wing positions).

Still, I’m curious what will happen when the left-wing flavor of anti-AI rhetoric disappears, which I think it will (as I said at the start, anti-AI sentiment is actually pretty bipartisan). When people start making explicitly right-wing anti-AI arguments, will that cause the left-wing to move a little bit towards supporting AI? Or will right-wing institutions continue to explicitly support AI, allowing anti-AI sentiment to become a wedge issue that the left-wing can exploit to pry away voters? In any case, I don’t think the current state of affairs is particularly stable. In many ways, the dominant anti-AI arguments would fit better in a conservative worldview than in the worldview of their liberal proponents.

[1] I don’t think any did, which is probably for the best: they would have only had a couple of years to break into the industry before hiring collapsed in 2023.

[2] Another point that isn’t quite mainstream enough but that I still want to mention: AI critics often argue that cavalier deployment of AI means that people might take dangerous medical advice instead of simply trusting their doctor. But anyone who’s been close to a person with chronic illness knows that “just trust your doctor” is kind of right-wing-coded itself, and that the left-wing position is very sympathetic to patients who don’t or can’t. In a parallel universe, I can imagine the left-wing arguing that patients need AI to avoid the mistakes of their doctors, not the other way around.

[3] Is it a good argument? I don’t know, actually. The easy counter is that the LLMs are just mirroring the biases in their training data. But you could argue in response that superintelligence is also latent in the training data, and that hill-climbing towards superintelligence also picks up the associated political positions (which just so happen to be left-wing).

[4] I am no fan of Donald Trump, but it doesn’t follow that everything he supports is bad (e.g. the First Step Act).

The Coder Cafe 1 month ago

How an SSD Works

☕ Welcome to The Coder Cafe! Today, we explore quantum physics. Not the abstract kind, but the kind that runs inside the device you are reading this on. Indeed, every time you save a file to an SSD, electrons exploit quantum physics to cross a physical barrier they classically have no business crossing. I’m not a physicist, but I’ve been in love with quantum physics for years, and over the last few months I’ve gone deep into these concepts. Get cozy, grab a coffee, and let’s begin!

An Introduction to Matter

To start, what is matter? Matter is made up of molecules, and molecules are assemblages of atoms, the building blocks of matter. For example, water is an H₂O molecule: 2 hydrogen atoms and 1 oxygen atom.

An atom is itself composed of a nucleus and electrons, which carry a negative charge and orbit around it. The nucleus contains two types of particles:

Protons, which carry a positive charge and naturally repel each other.
Neutrons, which carry no electric charge and act as a kind of “glue,” helping to keep the nucleus stable.

The attraction between electrons (−) and protons (+) keeps the atom in a stable state. On the other hand, with too few or too many neutrons relative to the protons, the nucleus becomes unstable. It will eventually decay by emitting energy. This is the principle of radioactivity. Carbon-14, for example, is slightly unstable. It decays slowly and predictably. This predictability allows it to be used as a clock to date ancient organic remains.

One might think that when we touch a solid object, like a table, what gives the table its solidity is that it is “filled” with matter, preventing our finger from passing through. Yet, if we enlarged the nucleus to the size of a marble and placed that marble at the center of a football pitch, the electrons would be orbiting at the level of the stands.
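As a rough check of the marble image, the sketch below scales a hydrogen atom so that its nucleus becomes marble-sized. The two physical constants are standard approximate values (the Bohr radius for the atom, the proton charge radius for the nucleus); the 16 mm marble is an assumption made for illustration, not a figure from the article.

```python
# Rough order-of-magnitude check of the marble analogy, using hydrogen.
BOHR_RADIUS_M = 5.29e-11    # typical electron distance in a hydrogen atom
PROTON_RADIUS_M = 8.4e-16   # approximate proton charge radius
MARBLE_RADIUS_M = 0.008     # assumed 16 mm marble (illustrative value)


def scaled_electron_distance(marble_radius_m: float) -> float:
    """Electron distance if the nucleus were blown up to marble size."""
    scale = marble_radius_m / PROTON_RADIUS_M
    return BOHR_RADIUS_M * scale


ratio = BOHR_RADIUS_M / PROTON_RADIUS_M
distance_m = scaled_electron_distance(MARBLE_RADIUS_M)
print(f"atom/nucleus size ratio: about {ratio:,.0f}")
print(f"electron distance for a marble-sized nucleus: about {distance_m:.0f} m")
```

With these assumed numbers the electron ends up hundreds of metres from the marble, so the football-pitch picture has the right order of magnitude and, if anything, understates how empty the atom is.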
An atom is therefore almost entirely empty. Solid matter is almost nothing, and what gives this impression of solidity are forces between atoms called electromagnetic forces.

The Fundamental Forces in the Universe

The universe is governed by four known fundamental forces:

Gravity: Attracts everything with mass toward everything else with mass.
The strong nuclear force: Glues protons and neutrons together inside the nucleus.
The weak nuclear force: Responsible for certain radioactive decays. It is what allows a neutron to transform into a proton (or vice versa).
The electromagnetic force.

If we focus on this last one, it is the one that:

Attracts opposite charges.
Repels identical charges.

Unlike the two nuclear forces, which only act inside the nucleus, the electromagnetic force has an infinite range. That is why it is the one that governs interactions between atoms at our scale. It is therefore the electromagnetic force that creates the illusion of solidity. When we touch a table, it is the electrons in our hand and those in the table that repel each other. We never truly touch anything.

Let’s now talk about light. So, what is light? It is an electromagnetic wave, a disturbance of the electric and magnetic fields that propagates through space. Light is a spectrum. Indeed, so-called visible light, the light our eyes can perceive, is only a tiny portion of what exists. The full spectrum is called the electromagnetic spectrum:

Radio waves → Microwaves → Infrared → Visible light → UV → X-rays → Gamma rays

When a radio picks up radio waves, it is therefore picking up light, invisible to us because of its frequency. Indeed, what varies across the electromagnetic spectrum is the frequency of the wave, and therefore its energy.

But light hides a surprise: it is also a particle.
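The link between frequency and energy mentioned above can be made concrete with the Planck relation E = h·f, which gives the energy of a single quantum of light at frequency f. This is a minimal sketch; the sample frequencies are illustrative round values chosen for this example, not figures from the article.

```python
# Energy of one quantum of light at frequency f, using E = h * f.
PLANCK_J_S = 6.62607015e-34       # Planck constant (exact SI value)
JOULES_PER_EV = 1.602176634e-19   # one electronvolt in joules (exact SI value)


def photon_energy_ev(frequency_hz: float) -> float:
    """Return the energy of one quantum of light, in electronvolts."""
    return PLANCK_J_S * frequency_hz / JOULES_PER_EV


# Illustrative round frequencies, one per band (assumed examples).
SAMPLES_HZ = {
    "FM radio": 1.0e8,
    "microwave oven": 2.45e9,
    "green visible light": 5.5e14,
    "ultraviolet": 1.5e15,
    "X-ray": 3.0e18,
}

for name, freq in SAMPLES_HZ.items():
    print(f"{name:>20}: {photon_energy_ev(freq):.3e} eV")
```

The energies climb in step with frequency, from well under a millionth of an electronvolt for radio to thousands of electronvolts for X-rays.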
NOTE: A particle can be summarized as follows: an indivisible packet of energy.

We know light is a particle thanks to Einstein’s 1905 work on the photoelectric effect (for which he received his only Nobel Prize, not for relativity). When a light bulb emits light, it emits particles called photons. When we vary the intensity of that light bulb, one might assume it is the energy of each photon that varies, but that is not the case. The energy of each photon is fixed by its frequency: the higher the frequency, the more energetic the photon. That is why, for example, UV rays burn the skin. What makes a light bulb emit more light is an increase in electric voltage, which makes it produce more photons. It is the quantity of photons that makes a light bulb shine more or less.

In flight, the photon behaves like a wave: it propagates, it oscillates, and it can interfere with other photons. But when it comes into contact with matter, it behaves like a particle: it interacts in one single hit, in one single place.

When a photon collides with matter, it can be:

Absorbed: The photon ceases to exist. Its energy is transferred to an atom, which moves to a higher energy level. This is what an eye does: it absorbs the photon and converts it into an electrical signal.
Reflected: Technically, this is not a true reflection because it is not the same photon that leaves. The atom absorbs the photon and then re-emits a photon of the same energy in a different direction.

NOTE: Whether a photon is absorbed or reflected depends on the energy levels of the electrons in the atoms of the surface. If the photon’s frequency matches an available energy level, the atom absorbs it. Otherwise, the photon is re-emitted. That is why glass is transparent, why the retina absorbs light, and why a mirror reflects almost everything.

We have seen that light is a wave. But how do we know this?
This is where Young’s double-slit experiment comes in, and it is this very experiment that will lay the foundations of quantum physics. Young’s experiment, first carried out in 1801 and described here in its modern form, is as follows:

A laser projects photons (light).
A wall has two small slits, A and B.
A screen behind the wall detects where the photons land.

If light were a “packet” of something, that is, if it behaved purely as a particle, firing it through two slits would simply produce two bright bands on the screen, one for each slit.

Yet the result of Young’s double-slit experiment is different: instead of two bands, light produces multiple alternating stripes on the screen, proof that it behaves as a wave, interfering with itself after passing through both slits simultaneously.

We obtain what is called an interference pattern. The wave passes through both slits simultaneously, splits into two, and these two waves meet on the other side. When two waves meet, they add up or cancel out depending on their respective phase:

Two crests meeting → they add up → bright zone
A crest meeting a trough → they cancel out → dark zone

The result is an alternating pattern of bright and dark bands on the screen: that is an interference pattern.

In the 20th century, researchers then had an idea: run Young’s experiment by projecting not photons (light) but electrons (matter). The setup is similar, but instead of a laser, an electron gun is used, and the screen records where the matter lands. Obviously, with this experiment, we are going to get two bands of matter, right?

Well, still no! An interference pattern is observed as well. This result was not a complete surprise to everyone.
In 1924, physicist Louis de Broglie had already proposed theoretically that matter, like light, could have a wave-like nature. But this time, it is not a wave like light; it is a probability wave. This is one of the greatest discoveries in quantum physics: at the atomic level, the particle has no defined position. The position of a particle is described by a function called the wave function, ψ, whose squared magnitude gives the probability of finding that particle at a given location and time.

A small clarification on this concept of undefined position, because this is the moment where our rational brain can start to “let go.” Let’s take a coin toss. We throw the coin in the air and hide the result. We are in a state of uncertainty, but this uncertainty is called epistemic. We do not know the result (heads or tails) because we have not looked yet, but that result already exists.

For a particle in the quantum world, the uncertainty is called ontological. It is not that we lack information about the position of the particle; it is that this position simply does not exist yet. This is what is called quantum superposition: an unmeasured particle exists in multiple states simultaneously.

However, measurement changes everything. When we measure the position of a particle, we will find it in one of the possible positions described by the wave function. We then say that the wave function “collapses” because the spread of possibilities is reduced to a single real state, pinpointing the particle at one location.

As an analogy, it is a bit like Minecraft. A default Minecraft map is 60 million x 60 million blocks.
For the initial loading, the server does not generate the entire map. It only generates the world around the observer, i.e., the player. However, when the player moves, they force the server to generate more of the world. Where this analogy reaches its limits is that the generation of the Minecraft world, even if it looks random, is still deterministic because each world has its own seed. The quantum world, on the other hand, appears to be purely random, meaning without hidden information.

Let’s return to Young’s experiment. What would happen if, when a particle passes through a slit, we placed a detector there to observe which slit the particle goes through? We recall that a wave passes through both slits at once.

This is the moment where the brain completely lets go: observing the particle changes the result of the experiment. Indeed, observing that particle “forces” it to have a defined position, and it then behaves like a classical marble. The act of measuring disrupts its wave-like behavior, and the result, therefore, gives us two bands of matter.

To summarize what we have seen so far: an unobserved particle exists as a probability wave, in multiple positions simultaneously. As soon as we measure it, this wave collapses, and the particle ends up at a precise location. But then, why do we never see this in everyday life? The answer is decoherence.

Decoherence

Quantum superposition is only possible as long as a particle remains isolated from its environment. As soon as it interacts with anything (another atom, a photon, an electric field), that interaction acts as a measurement in the quantum sense. The wave function collapses, and the particle ends up in a precise state.

An isolated electron in a vacuum can remain in superposition.
But a macroscopic object like a table is made up of billions upon billions of atoms that constantly interact with the surrounding air, light photons, and electromagnetic fields. These interactions occur billions of times per second. The superposition collapses almost instantaneously, before we can even observe it. That is why quantum effects are normally only observable at the atomic scale. And that is also why a single electron in a transistor behaves very differently from an object we can hold in our hand.

The Key is Information

OK, so the original Young’s experiment with light produces an interference pattern because light is a wave. The variation with electrons (and subsequently with larger particles such as atoms) also produces an interference pattern, which shows that matter behaves as a wave, but this time a probability wave. When we measure the result, we change the result of the experiment because we force the particle to “choose” its position.

But incidentally, how does this measurement work in the experiment? It works thanks to photons. Indeed, when the electron passes through one of the slits, we project a photon that interacts with the electron and is re-emitted in a direction that allows us to deduce which slit the electron went through.

Researchers wanted to know what would happen if they performed the exact same experiment, measuring which slit the particle went through, but this time, instead of reading the information encoded in the direction of the photon, they destroyed that information.

And here, another surprise: if we destroy the information, we return to an interference pattern. It was as if, since we were not using that information, there was nothing forcing the particle to choose which slit to go through, and so it could remain in the form of a probability wave.
This new experiment, therefore, demonstrates something fundamental in quantum physics: technically, it is not the act of measuring that influences the experiment, but whether or not this information exists somewhere in the universe. If this information is destroyed, the interference pattern returns. The key is therefore information.

NOTE: How does the destruction of this information work? One might think it would simply be a matter of having the photon absorbed by an absorbing surface before reading it, but this does not work, and we are left with two bands. Indeed, in that case the information still exists in principle, because the absorbing surface could have recorded the position of the particle through the direction of the photon. The destruction relies on another incredible principle of quantum physics that I will not detail in this article: entanglement. The photon is sent into a special crystal, which splits it into two entangled twin photons. One of the twins is then destroyed, making the information unrecoverable, because reading the information requires both twins. To simplify, the two twins are not copies; they form a single system whose properties are not individually defined.

The Tunnel Effect

We are slowly getting closer to SSDs. But before that, there is one last quantum concept we need to talk about: the tunnel effect.

We said that an unobserved electron does not exist like a marble at a precise location. It exists as a probability wave spread out in space. This wave function gives a probability of finding the electron at each point in space.

Now let’s imagine a physical barrier. We send an electron toward this barrier. Classically, if the electron does not have enough energy to pass over it, it is blocked. Full stop.

Yet quantum mechanically, the wave function of the electron does not stop abruptly at the barrier. Because it is a wave, it extends into the barrier and gradually decays through it. It does not fall to zero.
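The decay inside the barrier can be sketched with the standard textbook estimate for a rectangular barrier, T ≈ exp(−2κd) with κ = √(2m(V − E))/ħ, which holds when the barrier is thick enough that T is small. The 3 eV energy gap and the thicknesses below are assumed illustrative values, and a real flash cell under an applied voltage has a differently shaped barrier, so this shows the trend rather than device numbers.

```python
import math

HBAR_J_S = 1.054571817e-34           # reduced Planck constant
ELECTRON_MASS_KG = 9.1093837015e-31  # electron rest mass
JOULES_PER_EV = 1.602176634e-19      # one electronvolt in joules


def tunneling_probability(barrier_minus_energy_ev: float, thickness_nm: float) -> float:
    """Approximate transmission through a rectangular barrier: exp(-2 * kappa * d)."""
    energy_gap_j = barrier_minus_energy_ev * JOULES_PER_EV
    kappa = math.sqrt(2 * ELECTRON_MASS_KG * energy_gap_j) / HBAR_J_S  # decay constant, 1/m
    return math.exp(-2 * kappa * thickness_nm * 1e-9)


# Assumed example: an electron 3 eV below the top of the barrier.
for d_nm in (1, 2, 4, 8):
    print(f"{d_nm} nm barrier: T ~ {tunneling_probability(3.0, d_nm):.2e}")
```

The result is tiny but never exactly zero, and it falls off exponentially as the barrier gets thicker.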
On the other side, there therefore remains a non-zero probability of finding the electron. This is the tunnel effect: a real chance for the electron to end up on the other side, without having had the classical energy needed to cross.

This probability is not fixed. It depends directly on the thickness of the barrier: the thinner the barrier, the more the wave function survives on the other side, and the higher the tunneling probability. At our scale, the barriers are far too thick for this effect to be observable. But at the scale of a few nanometers, the probability becomes significant.

How SSDs Use Quantum Physics

In an SSD, we want to store data. SSDs work with bits, and it is precisely in the management of these bits that the principles of quantum physics come into play. In an SSD, each bit is encoded in a cell called a floating gate: a small zone isolated on all sides by an insulating layer. This box can contain electrons or not:

Box with electrons: Bit = 0
Box without electrons: Bit = 1

If we need:

To write, we need to make electrons enter this isolated box. We apply an electric voltage that deforms the wave function of the electrons and increases their probability of ending up on the other side. The electrons, therefore, cross the barrier via the tunnel effect.
To erase, we apply a reverse voltage, which also affects the wave function, and the electrons cross in the other direction.
To read, it is a classical, non-quantum measurement: we measure the electric current passing through the transistor. Electrons present: weak current: 0. No electrons: strong current: 1.

We saw, however, that the wave function gives a probability, not a certainty. If we apply a voltage to write or erase, we therefore only have a probability that the electron will cross the barrier. How can an SSD be reliable then?
An individual electron is unpredictable, but we never send just one electron. We send millions simultaneously. Statistically, enough of them cross the barrier to charge the floating gate reliably. And after each write, the controller immediately re-reads the cell to verify. If not enough electrons have crossed, it tries again.

That is also why SSDs embed error correction mechanisms, ECC (Error-Correcting Code), precisely because the process is probabilistic by nature. When a cell exceeds a certain error threshold over time, it is marked as defective and taken out of service. The data it held is moved to a healthy cell. That is why SSDs always have an over-provisioning capacity: a reserve of cells invisible to the user, planned from the manufacturing stage to replace defective cells over time. And that is also why an SSD does not fail all at once; it degrades progressively, cell by cell, until the reserve is exhausted.

And this is where quantum physics imposes its limits. The more transistors shrink, the thinner the insulating barriers become, and the more the tunnel effect becomes uncontrollable: electrons escape spontaneously, errors increase, and cells age faster. Moore’s Law, which predicts a doubling of transistor density every two years, is today running up against these fundamental physical limits. This is not an engineering problem: it is quantum physics that sets the boundary.

Matter is made up of atoms, themselves composed of a nucleus (protons and neutrons) and electrons. An atom is almost entirely empty: what we perceive as “solid” is an illusion created by the electromagnetic forces between atoms.
Light is both an electromagnetic wave and a particle called a photon. In flight, it behaves like a wave, but it is emitted and absorbed like a particle, in one single hit, in one single place.
Young’s double-slit experiment proves that light is a wave: it produces an interference pattern, impossible to obtain with classical particles.
Matter behaves in the same way. But unlike light, its wave is not physical: it is a probability wave that describes the possible positions of a particle. This is quantum superposition: an unmeasured particle exists in multiple states simultaneously.
It is not the act of measuring that collapses the superposition: it is the existence of the information somewhere in the universe. If the information is destroyed, the superposition is restored.
Decoherence explains why we never see superposition at our scale: any macroscopic object permanently interacts with its environment, which instantaneously collapses its wave function.
The tunnel effect is a direct consequence of the wave-like nature of particles: the wave function of an electron does not stop abruptly at a physical barrier. There exists a non-zero probability of finding it on the other side, without having had the classical energy to cross.
SSDs exploit the tunnel effect to write and erase data: an electric voltage deforms the wave function of electrons and increases their probability of crossing the insulating barrier of a floating gate. Reliability rests on the large number of electrons sent and on ECC.
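The write-then-verify loop described in the SSD section can be sketched as a small simulation. Everything here is a toy model: the function name, the per-electron tunneling probability, and the electron counts are hypothetical values chosen so the script runs quickly, not figures from a real SSD controller.

```python
import random


def program_cell(target_electrons, attempts_per_pulse, p_tunnel, rng, max_pulses=50):
    """Apply write pulses until the floating gate holds enough electrons.

    Returns (pulses_used, electrons_stored). All parameters are illustrative.
    """
    stored = 0
    for pulse in range(1, max_pulses + 1):
        # Each electron tunnels independently with probability p_tunnel.
        crossed = sum(rng.random() < p_tunnel for _ in range(attempts_per_pulse))
        stored += crossed
        # Verify step: re-read the cell and stop once the threshold is reached.
        if stored >= target_electrons:
            return pulse, stored
    # In this toy model, a cell that never reaches the threshold is given up on.
    raise RuntimeError("cell failed to program within max_pulses")


rng = random.Random(42)  # fixed seed so repeated runs behave the same
pulses, stored = program_cell(
    target_electrons=1000,
    attempts_per_pulse=5000,
    p_tunnel=0.05,
    rng=rng,
)
print(f"programmed after {pulses} pulses with {stored} electrons stored")
```

No single electron is predictable here, yet the number crossing per pulse clusters tightly around its average, which is why repeating the pulse and re-reading the cell converges reliably.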
