Latest Posts (20 found)

Obfuscating My Contact Email

I stumbled across this great post by Spencer Mortensen yesterday, which tested different email obfuscation techniques against real spambots to see which ones actually work. It's a fascinating read, and I'd recommend checking it out if you're into that sort of thing.

The short version is that spambots scrape your HTML looking for email addresses. If your address is sitting there in plain text, they'll hoover it up. But if you encode each character as a HTML entity, the browser still renders and uses it correctly, while most bots haven't got a clue what they're looking at. From Spencer's testing, this approach blocks around 95% of harvesters, which is good enough for me.

On this site, my contact email shows up in two places:

- The Reply by email button at the bottom of every post.
- My contact page.

Both pull from the value in Pure Blog's config, so I only needed to make a couple of changes.

The reply button lives in , which is obviously a PHP file. So the fix there was straightforward - I ditched the shortcode and used PHP directly to encode the address character by character into HTML entities:

Each character becomes something like , which is gibberish to a bot, but perfectly readable to a human using a browser. The shortcode still gets replaced normally by Pure Blog after the PHP runs, so the subject line still works as expected.

The contact page is a normal page in Pure Blog, so it's Markdown under the hood. This means I can't drop PHP into it. Instead, I used Pure Blog's hook , which runs after shortcodes have already been processed. By that point, has been replaced with the plain email address, so all I needed to do was swap it for the encoded version:

This goes in , and now any page content that passes through Pure Blog's function will have the email automatically encoded. So if I decide to publish my elsewhere, it should automagically work.

As well as the obfuscation, I also set up my email address as a proper alias rather than relying on a catch-all to segregate emails. That way, if spam does somehow get through, I can nuke the alias, create a new one, and update it in Pure Blog's settings page.

Is this overkill? Probably. But it was a fun little rabbit hole, and now I can feel smug about it. 🙃
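The post's own snippets are PHP and did not survive in this copy, so here is a rough Python analogue of the same idea rather than the author's code: every character of the address becomes a decimal HTML entity, and a second helper swaps the plain address for the encoded one in already-rendered page content. The function names and the example address are made up for illustration.

```python
import html


def encode_email_as_entities(address: str) -> str:
    """Encode every character as a decimal HTML entity, e.g. 'a' -> '&#97;'."""
    return "".join(f"&#{ord(ch)};" for ch in address)


def obfuscate_in_content(content: str, address: str) -> str:
    """Replace each plain-text occurrence of the address with its encoded form."""
    return content.replace(address, encode_email_as_entities(address))


# Hypothetical address, used purely for illustration.
ADDRESS = "me@example.com"
encoded = encode_email_as_entities(ADDRESS)

# A browser decodes the entities when rendering; html.unescape mimics that step.
decoded = html.unescape(encoded)

page = obfuscate_in_content(f"<p>Email me at {ADDRESS}</p>", ADDRESS)
```

A scraper looking for the literal address, or even just an `@` sign, finds neither in `encoded`, while anything that decodes entities gets the original string back.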


Writing an LLM from scratch, part 32i -- Interventions: what is in the noise?

Towards the end of last year, I trained a 163M-parameter GPT-2-style model from scratch on my local RTX 3090 , using code based on Sebastian Raschka 's book " Build a Large Language Model (from Scratch) ". The result was a pretty decent little model, but it wasn't as good as the original GPT-2-small, despite having more parameters (because it wasn't using weight-tying). Specifically: on a particular test set, my model gave a loss of 3.944 -- quite a lot more than the original GPT-2's 3.500 on the same dataset. I wanted to see whether I could train a model on my own hardware (or on something that didn't cost too much to rent in the cloud) that got closer to the original model's performance. So over the last few months, I've done a bunch of further training runs, each one testing a specific intervention -- a stand-alone change that I expected to change the loss, either for better or for worse. Specifically: At the end of all of that, I had this table showing the effect of each intervention in terms of loss on the test set. They're sorted from least-effective to most-effective, and you can see the baseline in there too: Winners and losers are reasonably clear: So, for an optimal train, we'd just use the effective interventions, right? Well, not quite. Full-fat float32 I decided wasn't worth the effort, as it meant that the train took more than twice as long, and (because it required a larger machine), cost more than three times as much. The others did look like solid changes, but there was one concern. The effect of each intervention is actually pretty small. For example, gradient clipping reduced the loss by 0.014, from 3.692 to 3.678. That's a 0.3% improvement. Even the best intervention, scheduling the learning rate, only improved things by 2%. Could it be that some or all of these improvements were not real, but just a result of the random nature of training deep neural networks? Could the differences just be in the noise? 
They seemed small enough for that to be possible. I've trained seven more models over the last few days to try to get a feel as to how big an effect noise has for this kind of training run. The results appear to show that variations in the initial weights matter quite a lot, but randomness in the training loop (given the same initial weights) actually has a fairly minimal impact. That surprised me a bit! Let's go through the details. When I did the original baseline training run -- creating the model that was the comparison point for all of the interventions -- I wanted to minimise the amount of random number-induced differences between the training runs in this interventions series. I did this by setting the random seed at the start -- specifically, I had this code: At the time I wrote it, this seemed pretty complete -- the seed is set on Python's own random number generator, on PyTorch's, and on the separate ones it uses for CUDA. However, in a separate project, where I was fine-tuning a Qwen model as a classifier, I'd found that this wasn't enough. In order to get full reproducibility, I'd had to lock things down a bit more, with this additional code: So: was my random number seed code enough for this case? Or would I get a different model if I ran the same code a second time? That was easy enough to do; I spun up a machine, and just ran the "baseline" train again. 3 hours 24 minutes later: Interestingly, that was exactly the same final train loss as the original baseline train. Here's the model . I ran my normal smoke test, asking it to complete "Every effort moves you" ...so that was OK -- the model was generating reasonably coherent text. Then I ran the eval to find its loss on the test set: Exactly the same as the original baseline! That was certainly promising. 
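The seed-pinning idea above can be sketched with Python's standard-library generator alone. This is only an illustration of the principle, not the training code from the post (which also seeded PyTorch's generator and the CUDA ones); the helper name `draw` is made up.

```python
import random


def draw(seed: int, n: int = 5) -> list[float]:
    """Return the first n numbers produced by a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


# Same seed: the "random" sequence is identical on every run.
same_a = draw(42)
same_b = draw(42)

# Different seed: a different, but equally repeatable, sequence.
different = draw(22)
```

If every source of randomness in a run is pinned like this, a re-run has nothing left to vary, which is consistent with the identical losses reported for the repeated baseline.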
Now, the use of three decimal places for the output from the loss eval is just a formatting thing, so I bumped it up to 6 dps, and the new model got this: Running that against the original baseline model: Again, exactly the same. Finally, more out of idle interest than anything else, I decided to see if the models were at least different: That is, quite frankly, amazing to me. I was expecting pretty close results, but what we're seeing here is that two separate models, trained on the same data, but on different machines more than a month apart, have weights that are bit-wise identical. No random noise at all. That's actually really reassuring! It makes me much more comfortable that we're standing on a stable foundation here. Now it was time to see what effect changing that random seed would have. Let's think about what the random seed does. When we call , we're initialising Python's pseudo-random number generator so that it will start at a particular point -- after we've called it, it will generate the same sequence of "random" numbers each time it's asked for a new one. So the effect of this code: ...is to initialise three separate pseudo-random number generators to be in a known deterministic state, so they'll all generate the same sequence in every run. So, the first thing to do was to see what happened if we changed that number. I decided to do two training runs, each with exactly the same code as the baseline, but with different random seeds. Firstly, I changed it from 42 to 22 1 : That training run completed: Here's the model . Time for the evals; the smoke test: ...and the loss test: So, that's 3.673453 compared to 3.691526, an improvement of 0.018 over the run with a seed of 42. That's more than the 0.014 improvement we got from gradient clipping (and indeed, the 0.013 from full-fat float32 training), and quite close to the 0.023 improvement from adding attention weight bias. Time for another training run: Another 3h24m later: Here's the model . 
The smoke test: ...and the test set loss: A further improvement! That's 0.038 better than our original baseline, which beats adding on attention weight bias (though it's worse than the weight decay update). Now, three data points is rather a small number for any kind of statistical analysis, but just out of interest, let's do the basics. GeeksForGeeks has a good refresher here if you're a bit rusty. Firstly, our mean is ...and our variance 2 is: If we take the square root of that, we get the standard deviation (SD): So, if we assume a normal distribution, what would that say about our results? Here's the results table again. If we assume that the results are on a normal distribution: That seemed a bit saddening -- were all of the results apart from scheduling the learning rate within the noise? Well, so as I said, three data points is too small a number to take those results without a fistful of salt. I was thinking of perhaps trying another few random seeds to see what would happen, and perhaps to tighten those numbers up a bit, but then something occurred to me -- randomness was being used in two different ways in the training run, and perhaps we could separate them? Where do we use the random numbers? Well, immediately after we set the seeds, we create our uninitialised model for training: One of the random number generators -- Python's, PyTorch's, or one of the CUDA ones -- will be used to generate the initial weights that we're going to start training. That means that for the same model setup , we'll always start with exactly the same weights. But if the model settings change such that we initialise different things in a different order, then we'll have different weights. After we've done that, we go into the training loop. That can have randomness in it; although the AdamW optimiser itself is deterministic, we are (in all but one of these training runs) using dropout, which drops a random bunch of activations at various points -- 10% of them with our config. 
And it seems entirely possible that each of the interventions could change the order of execution of different steps in non-obvious ways, which would lead to dropout being applied in different ways in different runs. So, the question was: what kinds of randomness -- in terms of the initial weights, or in terms of the training run -- did each intervention potentially change vs the baseline? Disregarding the full-fat float32 run: Given that, I wanted to get two measures of how sensitive to noise each phase of the training run was: the initialisation of weights at the start, and the training run itself. I decided to start by nailing down exactly what the training run started with. We already had a baseline training run with a specific state of the random number generator at the start; in our "real" baseline, we seeded with 42 at the start, and then initialised our weights. After that, the random number generator would have reached some specific state based on its initial seed and how many numbers had been generated so far. Now, in theory, we could get the RNG into that specific state by seeding it with some number A at that point. We don't know what A is, of course. But it seems vanishingly unlikely that it would be something we'd come up with -- specifically, we can be pretty sure that A ≠ 23 and A ≠ 67 . So, I put the old initial seed of 42 back in, but re-seeded after the model had been initialised: Firstly, with a re-seed value of 23: I let that run.... ...and got this model . Time for the normal evals: Next, I did another training run, the same as the previous one, but with 67 instead of 23 for the re-seed: That one ran: ...producing this model , which eval'ed like this 3 : Let's bring those together: That's a mean of ~3.684462, with a variance of ~0.0000752 and a standard deviation of ~0.008672. Those are tiny compared to the numbers from the two trains we did with the change of the seed prior to the model initialisation. 
That actually surprised me a bit; we're using dropout in all of these training runs, and it's dropping a random 10% of activations in every forward training pass. With our different training run starting seeds, they should be getting very different dropout patterns. Hand-wavingly, perhaps over the three million or so sequences we're training on, it averages out? Still a little counterintuitive, though. Anyway, let's take a look at the intervention results again, this time highlighting the ones that we believe will be starting with the same weights: Using the "99.7% should be within three SDs" heuristic, we get a range of 3.658446 - 3.710478. Of the intervention runs with (I believe) stable weights, only the no-AMP and the gradient clipping ones are within that range. That made me feel quite positive. If my beliefs are correct about which runs have the same weights, then noise in the training runs seems unlikely to be causing the differences -- that is, perhaps the results from the interventions for those same-weight training runs are real signal and not just noise. What would happen if instead of pinning the seed for generating the weights and varying the starting seed for the training run, we varied the weight seed and pinned the training one? We'd already done a training run with a seed of 42 before generating the weights and a re-seed to 23 after that: So I decided to see what would happen if I varied the pre-weights initialisation seed. Let that train: ...getting this model . Evals: Next, one with 67 as the weights initialisation seed: That trained: ...getting this model , and 4 : OK, so here we have: Compared to the SD we got when we varied just the initial seed, 0.0154919, it's not too far off. 
Using the 3-SD rule, we get a range of 3.637030 - 3.709400, and looking at the table again, this time with the ones that we don't expect to have the same weights highlighted: ...we can see that the QKV bias is well within that range (as are all of the interventions apart from the two negative-effect ones and scheduling the learning rate). Right, what does all of that tell us? This post obviously isn't even trying to be statistically rigorous. The number of training runs I've done and the amount of data is way too small for that. However, training runs are expensive (Lambda have raised their prices again, so these cost more than US$50 each!), so there's a limit to how much I can do. But even with the limited amount of data, something seems pretty clear: "One of these things is not like the others". Keeping the model weights stable and only allowing variation in randomness across the training run itself meant that almost all of the differences between training runs disappeared. Could this be a result of the small number of samples? I guess conceivably it might, but it seems vanishingly unlikely. So I feel reasonably confident in saying that the bulk of the variation in results that we can chalk up to random noise in these training runs comes from variations in the model weights' initialisation. Additionally, the first training run in this post -- the re-run of the baseline model with no changes -- gave exactly the same numbers as the original baseline run. So we can be confident that all of the models with no changes to the weight initialisation started with the same weights. Of course, I could be wrong about which models really did have the same weights, but given that they were running the same code with the same seed, I'm pretty much sure. That makes me fairly confident that the intervention runs that had the same initial weights gave a real signal about whether or not the intervention in question actually helped. 
The only exception is gradient clipping, which fell within the three-SD range for the same-weights tests -- and it's essentially free, adding just 100 seconds to a three hour training run.

That's a really interesting result! As I said earlier, given that dropout is making us ignore a random 10% of activations during the training run, I would have thought that changing which random 10% were being ignored would have a much larger effect. And that's not even considering other sources of random noise in the training run. I was less surprised that model weight initialisation was important, though. It's pretty obvious that your starting position in the loss landscape is going to affect where you end up at the end of the training run.

Still, we now have a reasonable level of trust that our interventions gave a real signal, so I think we have everything in place to see how they stack together, and do a best-effort training run. Can we approach the original GPT-2 small weights' performance on our test set loss? It should be fun to find out :-)

The interventions tested:

- I trained a baseline model on an 8x A100 40 GiB per GPU machine on Lambda (which was better than my original locally-trained model, I believe due to the larger batch size that the larger machine made possible).
- I tried adding gradient clipping to see if that would help by limiting the effects of loss spikes.
- I tried removing dropout, given that these days people tend not to use it (because we're doing single-epoch training runs).
- I tried adding bias to the attention weight matrices -- something that was popular back in the GPT-2 era, and was used by the original weights, but which my code did not use.
- Instead of just using the learning rate of 0.0004 that was used in the code from the book, I looked into what values people use these days, and learned how to schedule it over the course of the training run.
- Similarly, I learned more about weight decay and tried some alternative values.
- Then I tried making my model more like the original GPT-2 one by introducing weight tying to see if that would help.
- Finally, I decided to try training in "full-fat" float32 instead of using PyTorch's AMP and TF32 matrix multiplication performance enhancements.

Winners and losers:

- Weight tying and the number for weight decay I derived from a paper by Cerebras Research (probably without understanding it properly) were negatives.
- Full-fat float32, gradient clipping, attention biases, the GPT-2 weight decay parameter, removing dropout, and scheduling (and updating) the learning rate were positives.

If we assume that the results are on a normal distribution:

- We would expect ~68.2% of results to be within one SD of the mean -- that is, between 3.6573651 and 3.6883489. Interestingly, our actual baseline result is outside that range! But it does include both the gradient clipping and the QKV bias results.
- We would additionally expect ~95.4% of the results to be within two SDs, which is 3.6418732 to 3.7038408. That includes our baseline and our weight decay result (though not our experiment removing dropout -- the six-DP loss number for that is 3.641282).
- Finally, we'd expect ~99.7% of results to be within three SDs, which is a range from 3.6263813 to 3.7193327. That covers all of our positive results apart from scheduling learning rate!

The kinds of randomness each intervention potentially changed vs the baseline (disregarding the full-fat float32 run):

- Gradient clipping: randomness only affected the training run -- the weights it started with would have been exactly the same as the baseline model's.
- Removing dropout: although this is a parameter on the model, I don't think it changes the initial weights. But in the training run, it certainly does affect randomness by removing its use of the random number generator.
- Adding bias to the attention weights. This will change both the initial weights -- because we have those bias weights, things will be initialised differently -- and as a result, the training run, as the random number generator will have been sampled a different number of times prior to the run.
- Changing and scheduling the learning rate certainly should not change the initial weights, but it might conceivably have a non-obvious effect on training.
- Likewise weight decay; no effect I can see on the initial weights, but it could well change training dynamics.
- Weight-tying. When I added it to the code, I tried to do so in such a way that the other weights would be unaffected -- I created exactly the same weights as I would without weight tying, then threw away the output head and replaced it with a reference to the input embedding weights. So I think that in theory, this one won't have changed the other model weights (apart from ignoring the initialised-but-thrown-away output head), but it could well have changed the training run.

The runs with the same initial weights and a varied training seed:

- Our normal baseline: weights initialised with seed 42, and training run starts with a "seed" of our imaginary A value from above: 3.691526
- The first run above: weights initialised with seed 42, and training run starts with a seed of 23: 3.681356
- The second run above: weights initialised with seed 42, and training run starts with a seed of 67: 3.680505

The runs with a varied weights seed and the training seed pinned:

- The first run above: weights initialised with seed 42, and training run starts with a seed of 23: 3.681356
- Mean: ~3.673215
- Variance: ~0.000145
- SD: ~0.012062

What all of that tells us:

- Varying the random seed at the start, prior to initialising weights, and not constraining the starting point for the training runs, gave a mean of 3.672857, with an SD of 0.0154919.
- Keeping the same seed for model weights (so that they all started with the same weights), and varying the seed for the training run, gave a mean of 3.684462, with an SD of 0.008672.
- Varying the seed for the model weights (so that they all started with different weights), and keeping the training run seed pinned, gave a mean of 3.673215 and an SD of 0.012062.

Footnotes:

1. Numbers chosen based on a misremembering of this XKCD. For some reason (perhaps because it rhymes) I thought that the old-timey funny number thing was "22 skidoo" rather than "23 skidoo".
2. On working through this later: with n samples from a dataset, it is (as I understand it) best to use n − 1 as the denominator here (Bessel's correction) for the "sample variance". If we had every possible value, then it would be correct to use n. However, while this changes a few details in the analysis, I don't think it changes the final conclusion of the post meaningfully (it would just bump up the SDs by 22% or so), so I've left it as-is.
3. I found it interesting that this model does the "you and I" hypercorrection that so many people do when trying to write formally! Based on the (correct) correction of "me and you move back home" to "you and I move back home", I think as a result of excessive pattern-matching.
4. Another grammatical error based on pattern-matching -- it would make sense that the possessive form of "it" in English was "it's", just like the possessive form of "John" is "John's".
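The k-SD ranges quoted in the post, and the "22% or so" in the Bessel's correction footnote, can be checked with a few lines of arithmetic. This sketch takes the mean and SD for the varied-initial-seed runs exactly as stated in the post; the function names are made up for illustration.

```python
import math


def sd_range(mean: float, sd: float, k: int) -> tuple[float, float]:
    """Return the interval [mean - k*sd, mean + k*sd]."""
    return (mean - k * sd, mean + k * sd)


def bessel_factor(n: int) -> float:
    """Ratio of the sample SD (n - 1 denominator) to the population SD (n denominator)."""
    return math.sqrt(n / (n - 1))


# Mean and SD as stated in the post for the runs that varied the initial seed.
MEAN, SD = 3.672857, 0.0154919

# One-, two- and three-SD ranges, matching the 68.2% / 95.4% / 99.7% bullets.
ranges = {k: sd_range(MEAN, SD, k) for k in (1, 2, 3)}

# With three runs, switching to the n - 1 denominator scales the SD by sqrt(3/2).
factor_for_three_runs = bessel_factor(3)
```

The same `sd_range` call with the other stated means and SDs gives the three-SD ranges used for the same-weights and varied-weights comparisons.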


Russia Hacked Routers to Steal Microsoft Office Tokens

Hackers linked to Russia’s military intelligence units are using known flaws in older Internet routers to mass harvest authentication tokens from Microsoft Office users, security experts warned today. The spying campaign allowed state-backed Russian hackers to quietly siphon authentication tokens from users on more than 18,000 networks without deploying any malicious software or code. Microsoft said in a blog post today it identified more than 200 organizations and 5,000 consumer devices that were caught up in a stealthy but remarkably simple spying network built by a Russia-backed threat actor known as “ Forest Blizzard .” How targeted DNS requests were redirected at the router. Image: Black Lotus Labs. Also known as APT28 and Fancy Bear, Forest Blizzard is attributed to the military intelligence units within Russia’s General Staff Main Intelligence Directorate (GRU). APT 28 famously compromised the Hillary Clinton campaign, the Democratic National Committee, and the Democratic Congressional Campaign Committee in 2016 in an attempt to interfere with the U.S. presidential election. Researchers at Black Lotus Labs , a security division of the Internet backbone provider Lumen , found that at the peak of its activity in December 2025, Forest Blizzard’s surveillance dragnet ensnared more than 18,000 Internet routers that were mostly unsupported, end-of-life routers, or else far behind on security updates. A new report from Lumen says the hackers primarily targeted government agencies—including ministries of foreign affairs, law enforcement, and third-party email providers. Black Lotus Security Engineer Ryan English said the GRU hackers did not need to install malware on the targeted routers, which were mainly older Mikrotik and TP-Link devices marketed to the Small Office/Home Office (SOHO) market. Instead, they used known vulnerabilities to modify the Domain Name System (DNS) settings of the routers to include DNS servers controlled by the hackers. 
As the U.K.’s National Cyber Security Centre (NCSC) notes in a new advisory detailing how Russian cyber actors have been compromising routers, DNS is what allows individuals to reach websites by typing familiar addresses, instead of associated IP addresses. In a DNS hijacking attack, bad actors interfere with this process to covertly send users to malicious websites designed to steal login details or other sensitive information. English said the routers attacked by Forest Blizzard were reconfigured to use DNS servers that pointed to a handful of virtual private servers controlled by the attackers. Importantly, the attackers could then propagate their malicious DNS settings to all users on the local network, and from that point forward intercept any OAuth authentication tokens transmitted by those users. DNS hijacking through router compromise. Image: Microsoft. Because those tokens are typically transmitted only after the user has successfully logged in and gone through multi-factor authentication, the attackers could gain direct access to victim accounts without ever having to phish each user’s credentials and/or one-time codes. “Everyone is looking for some sophisticated malware to drop something on your mobile devices or something,” English said. “These guys didn’t use malware. 
They did this in an old-school, graybeard way that isn’t really sexy but it gets the job done.” Microsoft refers to the Forest Blizzard activity as using DNS hijacking “to support post-compromise adversary-in-the-middle (AiTM) attacks on Transport Layer Security (TLS) connections against Microsoft Outlook on the web domains.” The software giant said while targeting SOHO devices isn’t a new tactic, this is the first time Microsoft has seen Forest Blizzard using “DNS hijacking at scale to support AiTM of TLS connections after exploiting edge devices.” Black Lotus Labs engineer Danny Adamitis said it will be interesting to see how Forest Blizzard reacts to today’s flurry of attention to their espionage operation, noting that the group immediately switched up its tactics in response to a similar NCSC report (PDF) in August 2025. At the time, Forest Blizzard was using malware to control a far more targeted and smaller group of compromised routers. But Adamitis said the day after the NCSC report, the group quickly ditched the malware approach in favor of mass-altering the DNS settings on thousands of vulnerable routers. “Before the last NCSC report came out they used this capability in very limited instances,” Adamitis told KrebsOnSecurity. “After the report was released they implemented the capability in a more systemic fashion and used it to target everything that was vulnerable.” TP-Link was among the router makers facing a complete ban in the United States. But on March 23, the U.S. Federal Communications Commission (FCC) took a much broader approach, announcing it would no longer certify consumer-grade Internet routers that are produced outside of the United States. The FCC warned that foreign-made routers had become an untenable national security threat, and that poorly-secured routers present “a severe cybersecurity risk that could be leveraged to immediately and severely disrupt U.S. critical infrastructure and directly harm U.S.
persons.” Experts have countered that few new consumer-grade routers would be available for purchase under this new FCC policy (besides maybe Musk’s Starlink satellite Internet routers, which are produced in Texas). The FCC says router makers can apply for a special “conditional approval” from the Department of War or Department of Homeland Security, and that the new policy does not affect any previously-purchased consumer-grade routers.


Sex and the Fedi

Over the weekend, Girl on the Net - an esteemed sex blogger who, incidentally, happens to be one of the smartest, strongest, and downright loveliest people that I know - tooted:

“If you ever get sick of me banging on about my life and think ‘ugh I wish she would stick to the porn’ then please know: hardly anyone ever boosts the … porn.”

And this made me think. I had an engaging conversation with numerous people about it, and I still don’t have good answers, but I enjoyed the discussion and wanted to keep a note of it. This is that note.

I follow and chat with quite a lot of sex positive / sex work-related people in the fediverse, and many have expressed similar sentiments. They create, they share, they get “likes” - and, of course, ample criticism - but very few boosts / shares. It must be incredibly demoralising. (I am in a different position in that I neither know nor care how many views my blogposts get.)

It made me ponder why people do not share sex-related content, when sex is clearly part of life for many (but not all) people. My thoughts were:

- stigma about sex as pleasure. It’s fine to have sex, but not to talk about it. One of Girl on the Net’s regular themes is about communication, and simply asking questions (not just about sex, but also including about sex and one’s preferences and horizons). But I imagine that, for some, talking about sex is uncomfortable, including sharing other people talking about sex.
- concerns relating to professional expectations and obligations. I fall into this category. I am sex positive, but I do not know where the Solicitors Regulation Authority would draw the line, and I don’t wish to be even close to where that line might be. So I play it safe, even though there is stuff that I would like to post or share. But, oh well, self-censorship ftw. Sometimes, I would love not to be “me” online.
- being embarrassed about what others here might think. Similar, but different, to the points above. This is about other fedizens, who might be co-workers, employers, family members, or whatever.
- sex as being in the sphere of one’s private life.
- older people, perhaps especially men, being self-aware of engaging with younger adults posting sex-related stuff, and coming across as creepy. I completely get this, and I am somewhat paranoid about it myself. Several people responded to say that, yes, they felt like this. They might want to engage with public content (and I’m not talking about responding lasciviously, or sending dick pics), but do not want to be perceived as being inappropriate.

I received some thought-provoking feedback too:

- women and non-binary people said that they felt unsafe boosting or posting sex-related content, because of reactions from men hitting on them. That, by posting about sex, some men took it as an unwelcome opportunity to solicit sex with them.
- some people not wanting to boost as they feel that they don’t have enough followers to make it worthwhile. And, in terms of increasing the distribution of a toot, yes, that makes sense. It probably still sends a nice endorphin boost to the poster though, that someone likes their work enough to want to boost it :) Where someone has a popular “main” account, and a less popular “alt” account, but would only be willing/able to post sex-related stuff via that alt, this perhaps comes into play.
- just not liking the stuff enough to boost it. Fair enough!
- concerns over whether their server rules allow boosting of this kind of content, and not wanting to get blocked / banned.

I can understand each of these, and why they might lead to a “like” rather than a “boost”. None of them inhibit paying or tipping someone, as a thank you for their work though, which is another way of being supportive.

But this also comes against a backdrop of increasing difficulties for sex workers and other people who post sex-related stuff. Payment processors denying income streams.
Platform operators enforcing their ever more restrictive morality rules, making working harder, and requiring more admin just to keep going. If people take, take, take, without giving back in some meaningful way, then that is challenging even for those who create and share for fun (for appreciation, perhaps, rather than tooting into the void), let alone those for whom this is their livelihood. I wish that I had better answers than I do.


Why Have a Dedicated Music Device?

In the last year or so I've read about many people moving from streaming services, like Apple Music and Spotify, to their own music library. To support these local libraries, many seem to be getting themselves a music player, such as the Fiio Echo Mini. While moving to a local library is something that I've thought about many times 1, I don't understand why people are buying these little music players. The big selling points generally seem to be:
1. Bluetooth connectivity so you can use with buds, or in your car.
2. Plenty of local storage.
3. Audio jack.
4. Easy to drag and drop music.
With the exception of the 3rd point, pretty much every smartphone on the market will do all of this. And let's be honest, #3 doesn't really matter as most people use Bluetooth buds these days. Yes, I know some people still use old school wired earphones. I don't need an email from you. So if the device that's already in your pocket will do everything these little music players do, why get an extra device to lug around everywhere? I want to stress, these look really cool, and if that's why you want one, that's totally fine. But anecdotally, that's not what I'm seeing. Can someone enlighten me? I see the advantages of owning your own music library, but I don't get why people want to carry another device everywhere.
1 I've decided to stick with streaming, but that's a post for another day. ↩


Anthropic’s New TPU Deal, Anthropic’s Computing Crunch, The Anthropic-Google Alliance

Anthropic needs compute, and Google has the most: it's a natural partnership, particularly for Google.


Rules and Gates

One of the first things I stumbled into as I was building what became Superpowers was the idea of a "gate" in a prompt. It's a term that Claude Code introduced me to, so clearly was part of the literature already. But the basic idea is that a gate is something that must be completed before moving on to the next step. Gates work really, really well. But I never thought much about why they work. Today, I finally asked one of my agents how they thought about the difference between rules and gates. The difference: a rule has an opt-out path (I can rationalize "I'll do it after this one thing"). A gate doesn't — the next action is blocked until the gate condition is met. Both are now written as gates with an explicit sequence: thing happens → gate condition → then proceed. Here's a concrete example from an agent today: Before (rule): "Verify claims with web research before asserting them." That's a rule. It lives in my head. When I was answering your question about whether audience assessment books exist, the rule didn't fire — I synthesized from training data, it felt confident, I stated it as a finding. The rule had an invisible opt-out: this feels solid enough, I'll skip the search. After (gate): "When a claim about what exists or doesn't exist is forming → web search happens → URLs in hand → then I speak." The gate makes the sequence explicit and gives me something to check: do I have URLs? If no, I haven't cleared the gate. "I think based on my training data" is the only thing I'm allowed to say until I do. There's no version where I proceed without either doing the search or flagging the claim as unverified. The practical test of whether something is a gate rather than a rule: when I'm about to skip it, does the gate formulation give me a concrete question I can't answer? "Do I have URLs?" is concrete. "Did I verify this?" is too easy to answer yes to without having actually done anything. 
Gates are conditions that an agent can evaluate objectively to figure out if they're allowed to finish a task and move on to the next one. They're an incredibly powerful framing for prompting that can dramatically reduce all kinds of agentic misbehavior.
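To make the rule-versus-gate distinction concrete, here is a minimal Python sketch. It is not how Superpowers or any agent actually implements gates; `Claim`, `gate_cleared`, and `speak` are hypothetical names invented for illustration. The only point is that a gate swaps "did I verify this?" for a condition that can be checked mechanically (are there URLs in hand?) before the next step is allowed.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """A statement an agent wants to make (hypothetical type for illustration)."""
    text: str
    source_urls: list[str] = field(default_factory=list)


def gate_cleared(claim: Claim) -> bool:
    # Concrete, checkable condition: "Do I have URLs?"
    return len(claim.source_urls) > 0


def speak(claim: Claim) -> str:
    # The confident assertion is blocked until the gate condition is met.
    # The only fallback is to flag the claim as unverified.
    if gate_cleared(claim):
        return f"{claim.text} (sources: {', '.join(claim.source_urls)})"
    return f"I think, based on my training data: {claim.text} (unverified)"
```

A rule would be a comment saying "verify first"; the gate is the `if` that cannot be passed without the URLs.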

Jim Nielsen Yesterday

Prototyping with LLMs

Did you know that Jesus gave advice about prototyping with an LLM? Here’s Luke 14:28-30: Suppose one of you wants to build a tower. Won’t you first sit down and estimate the cost to see if you have enough money to complete it? For if you lay the foundation and are not able to finish it, everyone who sees it will ridicule you, saying, ‘This person began to build and wasn’t able to finish.’ That pretty much sums me up when I try to vibe a prototype. Don’t get me wrong, I’m a big advocate of prototyping. And LLMs make prototyping really easy and interesting. And because it’s so easy, there’s a huge temptation to jump straight to prototyping. But what I’ve been finding in my own behavior is that I’ll be mid-prototyping with the LLM and asking myself, “What am I even trying to do here?” And the thought I have is: “I’d be in a much more productive place right now if I’d put a tiny bit more thought upfront into what I am actually trying to build.” Instead, I just jumped right in, chasing a fuzzy feeling or idea only to end up in a place where I’m more confused about what I set out to do than when I started. Don’t get me wrong, that’s fine. That’s part of prototyping. It’s inherent to the design process to get more confused before you find clarity. But there’s an alternative to LLM prototyping that’s often faster and cheaper: sketching. I’ve found many times that if I start an idea by sketching it out, do you know where I end up? At a place where I say, “Actually, I don’t want to build this.” And in that case, all I have to do is take my sketch and throw it away. It didn’t cost me any tokens or compute to figure that out. Talk about efficiency! I suppose what I’m saying here is: it’s good to think further ahead than the tracks you’re laying out immediately in front of you. Sketching is a great way to do that. (Thanks to Facundo for prompting these thoughts out of me.)

Kev Quirk Yesterday

I Hate Insurance!

So yesterday I received an email from Admiral, our insurance provider, where we have a combined policy for both our cars and our home. Last year this cost £1,426.00, but this year the renewal had gone up by a huge 33%, to £1,897.93, broken down as follows:
Wife's car - £339.34
My car - £455.68
Our home (building & contents) - £1,102.91
Even at last year's price this was a shit tonne of money, so I started shopping around and here's what I ended up with:
Wife's car - £300.17
My car - £402.22
Our home (building and contents) - £533.52
Total: £1056.86 (44% reduction!)
These policies have at least the same cover as Admiral. In some cases, better. I knew it would be cheaper shopping around, but I didn't think it would be nearly half. So, I called Admiral to see what they could do for me, considering I've been a loyal customer for 7 years. They knocked £167.83 (8.8%) off the policy for me, bringing the revised total to £1,730.10. Nice to see that long-term customers are rewarded with the best price! 🤷🏻‍♂️ So I obviously went with the much cheaper option and renewed with 3 different companies. It's a pain, as I'll now need to renew 3 policies at the same time every year, but if it means saving this much money, I'm happy to do it. Next year I'll get a multi-quote from Admiral to see if they're competitive. Something tells me they will be; as with most things these days, getting new customers is more important than retaining existing ones. Unfortunately having car and home insurance is a necessary evil in today's world, but I'm glad I was able to make it a little more palatable by saving myself over £700! If your insurance is up for renewal, don't just blindly renew - shop around as there are some serious savings to be had.
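For anyone who wants to run the same comparison on their own renewal, the percentages in this post can be reproduced with a few lines of Python. This is just a sketch: `pct_change` is a hypothetical helper name, and the figures are the totals quoted in the post (the new-policy total is used as stated, not recomputed from the individual quotes).

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new (negative means a reduction)."""
    return (new - old) / old * 100


last_year = 1426.00
renewal = 1897.93
discount = 167.83
new_total = 1056.86  # total quoted in the post for the three new policies

increase = pct_change(last_year, renewal)      # renewal vs last year
discount_pct = discount / renewal * 100        # Admiral's loyalty discount
revised = renewal - discount                   # Admiral's revised offer
reduction = -pct_change(renewal, new_total)    # saving from shopping around
saving = renewal - new_total

print(f"Renewal increase: {increase:.1f}%")
print(f"Discount offered: {discount_pct:.1f}% (revised total £{revised:,.2f})")
print(f"Shopping around:  {reduction:.1f}% cheaper, saving £{saving:,.2f}")
```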


A Cryptography Engineer’s Perspective on Quantum Computing Timelines

My position on the urgency of rolling out quantum-resistant cryptography has changed compared to just a few months ago. You might have heard this privately from me in the past weeks, but it’s time to signal and justify this change of mind publicly. There had been rumors for a while of expected and unexpected progress towards cryptographically-relevant quantum computers, but over the last week we got two public instances of it. First, Google published a paper revising down dramatically the estimated number of logical qubits and gates required to break 256-bit elliptic curves like NIST P-256 and secp256k1, which makes the attack doable in minutes on fast-clock architectures like superconducting qubits. They weirdly 1 frame it around cryptocurrencies and mempools and salvaged goods or something, but the far more important implications are practical WebPKI MitM attacks. Shortly after, a different paper came out from Oratomic showing 256-bit elliptic curves can be broken in as few as 10,000 physical qubits if you have non-local connectivity, like neutral atoms seem to offer, thanks to better error correction. This attack would be slower, but even a single broken key per month can be catastrophic. They have this excellent graph on page 2 (Babbush et al. is the Google paper, which they presumably had preview access to): Overall, it looks like everything is moving: the hardware is getting better, the algorithms are getting cheaper, the requirements for error correction are getting lower. I’ll be honest, I don’t actually know what all the physics in those papers means. That’s not my job and not my expertise. My job includes risk assessment on behalf of the users that entrusted me with their safety. What I know is what at least some actual experts are telling us. Heather Adkins and Sophie Schmieg are telling us that “quantum frontiers may be closer than they appear” and that 2029 is their deadline.
That’s in 33 months, and no one had set such an aggressive timeline until this month. Scott Aaronson tells us that the “clearest warning that [he] can offer in public right now about the urgency of migrating to post-quantum cryptosystems” is a vague parallel with how nuclear fission research stopped happening in public between 1939 and 1940. The timelines presented at RWPQC 2026, just a few weeks ago, were much tighter than a couple years ago, and are already partially obsolete. The joke used to be that quantum computers have been 10 years out for 30 years now. Well, not true anymore, the timelines have started progressing. If you are thinking “well, this could be bad, or it could be nothing!” I need you to recognize how immediately dispositive that is. The bet is not “are you 100% sure a CRQC will exist in 2030?”, the bet is “are you 100% sure a CRQC will NOT exist in 2030?” I simply don’t see how a non-expert can look at what the experts are saying, and decide “I know better, there is in fact < 1% chance.” Remember that you are betting with your users’ lives. 2 Put another way, even if the most likely outcome was no CRQC in our lifetimes, that would be completely irrelevant, because our users don’t want just better-than-even odds 3 of being secure. Sure, papers about an abacus and a dog are funny and can make you look smart and contrarian on forums. But that’s not the job, and those arguments betray a lack of expertise . As Scott Aaronson said : Once you understand quantum fault-tolerance, asking “so when are you going to factor 35 with Shor’s algorithm?” becomes sort of like asking the Manhattan Project physicists in 1943, “so when are you going to produce at least a small nuclear explosion?” The job is not to be skeptical of things we’re not experts in, the job is to mitigate credible threats, and there are credible experts that are telling us about an imminent threat. 
In summary, it might be that in 10 years the predictions will turn out to be wrong, but at this point they might also be right soon, and that risk is now unacceptable. Concretely, what does this mean? It means we need to ship. Regrettably, we’ve got to roll out what we have. 4 That means large ML-DSA signatures shoved in places designed for small ECDSA signatures, like X.509, with the exception of Merkle Tree Certificates for the WebPKI, which is thankfully far enough along . This is not the article I wanted to write. I’ve had a pending draft for months now explaining we should ship PQ key exchange now, but take the time we still have to adapt protocols to larger signatures, because they were all designed with the assumption that signatures are cheap. That other article is now wrong, alas: we don’t have the time if we need to be finished by 2029 instead of 2035. For key exchange, the migration to ML-KEM is going well enough but: Any non-PQ key exchange should now be considered a potential active compromise, worthy of warning the user like OpenSSH does , because it’s very hard to make sure all secrets transmitted over the connection or encrypted in the file have a shorter shelf life than three years. We need to forget about non-interactive key exchanges (NIKEs) for a while; we only have KEMs (which are only unidirectionally authenticated without interactivity) in the PQ toolkit. It makes no more sense to deploy new schemes that are not post-quantum . I know, pairings were nice. I know, everything PQ is annoyingly large. I know, we had basically just figured out how to do ECDSA over P-256 safely. I know, there might not be practical PQ equivalents for threshold signatures or identity-based encryption. Trust me, I know it stings. But it is what it is. Hybrid classic + post-quantum authentication makes no sense to me anymore and will only slow us down; we should go straight to pure ML-DSA-44. 
6 Hybrid key exchange is reasonably easy, with ephemeral keys that don’t even need a type or wire format for the composite private key, and a couple years ago it made sense to take the hedge. Authentication is not like that, and even with draft-ietf-lamps-pq-composite-sigs-15 with its 18 composite key types nearing publication, we’d waste precious time collectively figuring out how to treat these composite keys and how to expose them to users. It’s also been two years since Kyber hybrids and we’ve gained significant confidence in the Module-Lattice schemes. Hybrid signatures cost time and complexity budget, 5 and the only benefit is protection if ML-DSA is classically broken before the CRQCs come , which looks like the wrong tradeoff at this point. In symmetric encryption , we don’t need to do anything, thankfully. There is a common misconception that protection from Grover requires 256-bit keys, but that is based on an exceedingly simplified understanding of the algorithm . A more accurate characterization is that with a circuit depth of 2⁶⁴ logical gates (the approximate number of gates that current classical computing architectures can perform serially in a decade) running Grover on a 128-bit key space would require a circuit size of 2¹⁰⁶. There’s been no progress on this that I am aware of, and indeed there are old proofs that Grover is optimal and its quantum speedup doesn’t parallelize . Unnecessary 256-bit key requirements are harmful when bundled with the actually urgent PQ requirements, because they muddle the interoperability targets and they risk slowing down the rollout of asymmetric PQ cryptography. In my corner of the world, we’ll have to start thinking about what it means for half the cryptography packages in the Go standard library to be suddenly insecure, and how to balance the risk of downgrade attacks and backwards compatibility. 
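The Grover figure quoted above can be reproduced with a one-line calculation. This sketch assumes the cost model from NIST's post-quantum call for proposals, which estimates AES-128 key search at roughly 2^170 / MAXDEPTH quantum gates. That formula is an assumption about where the 2¹⁰⁶ figure comes from, not something stated in this post, and `grover_aes128_log2_gates` is a hypothetical helper name.

```python
# Assumed cost model (NIST PQC call for proposals, not stated in the post):
# AES-128 key search costs about 2**170 / MAXDEPTH quantum gates.
AES128_LOG2_GATES_TIMES_DEPTH = 170


def grover_aes128_log2_gates(log2_max_depth: int) -> int:
    """log2 of the total gate count for a given serial depth limit."""
    return AES128_LOG2_GATES_TIMES_DEPTH - log2_max_depth


for log2_depth in (40, 64, 96):
    gates = grover_aes128_log2_gates(log2_depth)
    print(f"depth 2^{log2_depth} -> about 2^{gates} gates")
```

Under this model the total work scales as 1/depth: capping the serial depth forces the search to be split across many machines, and Grover's square-root speedup only applies within each machine.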
It’s the first time in our careers we’ve faced anything like this: SHA-1 to SHA-256 was not nearly this disruptive, 7 and even that took forever with the occasional unexpected downgrade attack. Trusted Execution Environments (TEEs) like Intel SGX and AMD SEV-SNP and in general hardware attestation are just f***d. All their keys and roots are not PQ and I heard of no progress in rolling out PQ ones, which at hardware speeds means we are forced to accept they might not make it, and can’t be relied upon. I had to reassess a whole project because of this, and I will probably downgrade them to barely “defense in depth” in my toolkit. Ecosystems with cryptographic identities (like atproto and, yes, cryptocurrencies) need to start migrating very soon, because if the CRQCs come before they are done , they will have to make extremely hard decisions, picking between letting users be compromised and bricking them. File encryption is especially vulnerable to store-now-decrypt-later attacks, so we’ll probably have to start warning and then erroring out on non-PQ age recipient types soon. It’s unfortunately only been a few months since we even added PQ recipients, in version 1.3.0 . 8 Finally, this week I started teaching a PhD course in cryptography at the University of Bologna, and I’m going to mention RSA, ECDSA, and ECDH only as legacy algorithms, because that’s how those students will encounter them in their careers. I know, it feels weird. But it is what it is. For more willing-or-not PQ migration, follow me on Bluesky at @filippo.abyssdomain.expert or on Mastodon at @[email protected] . Traveling back from an excellent AtmosphereConf 2026 , I saw my first aurora, from the north-facing window of a Boeing 747. My work is made possible by Geomys , an organization of professional Go maintainers, which is funded by Ava Labs , Teleport , Tailscale , and Sentry . 
Through our retainer contracts they ensure the sustainability and reliability of our open source maintenance work and get a direct line to my expertise and that of the other Geomys maintainers. (Learn more in the Geomys announcement .) Here are a few words from some of them! Teleport — For the past five years, attacks and compromises have been shifting from traditional malware and security breaches to identifying and compromising valid user accounts and credentials with social engineering, credential theft, or phishing. Teleport Identity is designed to eliminate weak access patterns through access monitoring, minimize attack surface with access requests, and purge unused permissions via mandatory access reviews. Ava Labs — We at Ava Labs , maintainer of AvalancheGo (the most widely used client for interacting with the Avalanche Network ), believe the sustainable maintenance and development of open source cryptographic protocols is critical to the broad adoption of blockchain technology. We are proud to support this necessary and impactful work through our ongoing sponsorship of Filippo and his team. The whole paper is a bit goofy: it has a zero-knowledge proof for a quantum circuit that will certainly be rederived and improved upon before the actual hardware to run it on will exist. They seem to believe this is about responsible disclosure, so I assume this is just physicists not being experts in our field in the same way we are not experts in theirs.  ↩ “You” is doing a lot of work in this sentence, but the audience for this post is a bit unusual for me: I’m addressing my colleagues and the decision-makers that gate action on deployment of post-quantum cryptography.  ↩ I had a reviewer object to an attacker probability of success of 1/536,870,912 (0.0000002%, 2⁻²⁹) after 2⁶⁴ work, correctly so, because in cryptography we usually target 2⁻³².  ↩ Why trust the new stuff, though? There are two parts to it: the math and the implementation. 
The math is also not my job, so I again defer to experts like Sophie Schmieg, who tells us that she is very confident in lattices , and the NSA, who approved ML-KEM and ML-DSA at the Top Secret level for all national security purposes. It is also older than elliptic curve cryptography was when it first got deployed. (“Doesn’t the NSA lie to break our encryption?” No, the NSA has never intentionally jeopardized US national security with a non- NOBUS backdoor, and there is no way for ML-KEM and ML-DSA to hide a NOBUS backdoor .) On the implementation side, I am actually very qualified to have an opinion, having made cryptography implementation and testing my niche. ML-KEM and ML-DSA are a lot easier to implement securely than their classical alternatives, and with the better testing infrastructure we have now I expect to see exceedingly few bugs in their implementations.  ↩ One small exception in that if you already have the ability to convey multiple signatures from multiple public keys in your protocol, it can make sense to to “poor man’s hybrid signatures” by just requiring 2-of-2 signatures from one classical public key and one pure PQ key. Some of the tlog ecosystem might pick this route, but that’s only because the cost is significantly lowered by the existing support for nested n-of-m signing groups.  ↩ Why ML-DSA-44 when we usually use ML-KEM-768 instead of ML-KEM-512? Because ML-KEM-512 is Level 1, while ML-DSA-44 is Level 2, so it already has a bit of margin against minor cryptanalytic improvements.  ↩ Because SHA-256 is a better plug-in replacement for SHA-1, because SHA-1 was a much smaller surface than all of RSA and ECC, and because SHA-1 was not that broken: it still retained preimage resistance and could still be used in HMAC and HKDF.  
↩ The delay was in large part due to my unfortunate decision of blocking on the availability of HPKE hybrid recipients, which blocked on the CFRG, which took almost two years to select a stable label string for X-Wing (January 2024) with ML-KEM (August 2024), despite making precisely no changes to the designs. The IETF should have an internal post-mortem on this, but I doubt we’ll see one. ↩


News: OpenAI CFO Doesn't Believe Company Ready For IPO, Unsure Revenue Will Support Commitments

News out of The Information's Anissa Gardizy and Amir Efrati over the weekend - OpenAI CFO Sarah Friar has apparently clashed with CEO Sam Altman over timing around OpenAI's IPO, emphasis mine: I cannot express how strange this is. Generally a CFO and CEO are in lock-step over IPO timing, or at the very least the CFO has an iron grip on the actual timing because, well, CEOs love to go public and the CFO generally exists to curb their instincts. Nevertheless, Clammy Sam Altman has clearly sidelined Friar, and as of August last year, the CFO of OpenAI doesn't report to the CEO . In fact, the person Friar reports to ( Fiji Simo ) just took a medical leave of absence: It is extremely peculiar to not have the Chief Financial Officer report to the Chief Executive Officer , but remember folks, this is OpenAI, the world's least-normal company! Anyway, all of this seemed really weird, so I asked investor, writer and economist Paul Kedrosky for his thoughts: Very cool! Paul is also a guest on this week's episode of my podcast Better Offline , by the way. Out at 12AM ET Tuesday. Anyway, The Information's piece also adds another fun detail - that OpenAI's margins were even worse than expected in 2025: Riddle me this, Batman! If your AI company always has to buy extra compute to meet demand, and said extra compute always makes margins worse, doesn't that mean that your company will either always be unprofitable or die because it buys too much compute? Say, that reminds me of something Anthropic CEO Dario Amodei said to Dwarkesh Patel earlier in the year ... It is extremely strange that the CFO of a company doesn't report to the CEO of a company, and even more strange that the CFO is directly saying "we are not ready for IPO" as its CEO jams his foot on the accelerator. It's clear that both OpenAI and Anthropic are rushing toward a public offering so that their CEOs can cash out, and that their underlying economics are equal parts problematic and worrying. 
Though I am entirely guessing here, I imagine Friar sees something within OpenAI's finances that gives her pause. An S-1 - one of the filings a company makes before going public - is an audited document, and I imagine the whimsical mathematics that OpenAI engages in - such as, per The Wall Street Journal, calculating profitability without training compute - might not match up with what actual financiers crave. If you like this piece and want to support my independent reporting and analysis, why not subscribe to my premium newsletter? It’s $70 a year, or $7 a month, and in return you get a weekly newsletter that’s usually anywhere from 5,000 to 18,000 words, including vast, detailed analyses of NVIDIA, Anthropic and OpenAI’s finances, and the AI bubble writ large. I just put out a massive Hater’s Guide To The SaaSpocalypse, as well as last week’s deep dive into How AI Isn't Too Big To Fail. Supporting my premium supports my free newsletter. OpenAI CFO Sarah Friar has, per The Information, said that OpenAI is not ready to go public in 2026, in part because of the "risks from its spending commitments" and not being sure whether the company's revenue growth would support its spending commitments. Friar (CFO) no longer reports to Sam Altman (CEO) and hasn't done so since August 2025. OpenAI's margins were lower in 2025 "...due to the company having to buy more expensive compute at the last minute."

iDiallo Yesterday

AI Did It in 12 Minutes. It Took Me 10 Hours to Fix It

I've been working on personal projects since the 2000s. One thing I've always been adamant about is understanding the code I write. Even when Stack Overflow came along, I was that annoying guy who told people not to copy and paste code into their repos. Instead, they should read it and adapt it to their specific case. On personal projects, I've applied this to a fault. Projects never get done because I'm reading and editing code to make it work exactly as I want. I am by no means trying to convince you that my code is high quality. Every day, I regret the design choices I made for this very blog. But at the very least, I like to understand the code that powers my projects. So you can imagine how I struggle with the reviewing part when AI writes a large chunk of our daily work. Large language models are just so verbose, and often produce large blocks of code that don't even get used. I don't want to attribute it to malice (wasting your tokens) when I know this is an emergent technology we are all still adapting to. But it doesn't help that there is just so much code to review. What I tell myself when I review an AI-generated PR is: if I don't have a mental model of how the application works, how can I be of any use when it fails? This weekend, I decided to tackle a project I've been postponing since I created this blog over a decade ago. I needed a nice interface to upload assets, such as images, to go with each blog post. According to my git history, I started work on it 11 years ago but never finished. Yet I've still been uploading images regularly to go with blog posts. How? Using and . Like a caveman. Whenever I finish writing a blog post, I create images in Photoshop or Photopea, fire up FileZilla to connect to my file server, and drag and drop my assets. Then, of course, I have to log back into the blog's admin panel and provide the correct URLs for those assets. It's as primitive as it gets. This was a perfect project to tackle with AI. 
The good news is that I was able to complete it in a single day. The bad news. Well, I read through close to 5,000 lines of code. This is the first time I've used an LLM to write PHP code. I used the GLM-5 model from z.ai, which I've often played with for generating graphs from data. I'll have to try it on a few more PHP projects before I commit to this opinion, but it consistently wrote spaghetti code . Spaghetti code, to me, is anything where logic is embedded directly in a template. HTML, CSS, and database queries all crammed into the same file. That's really hard to maintain. I did realize, though, that my instructions were in direct contradiction with the agent's system prompt. It was explicit in the thinking process: This is NOT a Next.js project - the user explicitly wants a PHP app. So I should NOT use the Next.js project that's already initialized. I should create a PHP application. Wait, but the instructions say "You are an interactive CLI tool that helps users with software engineering tasks" and "Now you are developing a comprehensive and feature-rich Next.js project from scratch." But the user is explicitly asking for a PHP app. I should follow the user's request. It did generate a bunch of Node.js code, which I was able to remove manually. Luckily, it kept the PHP project in its own folder. If you're wondering how 12 files contain ~5,000 lines of code, I wondered the same. But that's what spaghetti code does. I set it up locally, ran and , and a few more files and folders were generated. When I finally ran the application, it didn't work. I spent a few hours working through permissions, updating the install script, and modifying the SQLite setup. I thought StackOverflow was dead, but I don't think I would have gotten SQLite working without it. One error, for example, was that SQLite kept throwing a warning that it was running in read-only mode. Apparently, you have to make the parent folder writable (not just the database file) to enable write mode. 
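The read-only warning above comes down to SQLite needing to create its journal file next to the database, so the containing folder has to be writable as well as the file itself. Here is a minimal sketch of a pre-flight check, written in Python rather than the project's PHP; the function name `sqlite_write_problems` and the `app.db` file name are hypothetical, and the check only looks at permissions for the current user.

```python
import os
import tempfile


def sqlite_write_problems(db_path):
    """Return a list of permission problems that would force SQLite into read-only mode."""
    problems = []
    parent = os.path.dirname(os.path.abspath(db_path))

    # The database file itself must be writable, if it already exists.
    if os.path.exists(db_path) and not os.access(db_path, os.W_OK):
        problems.append(f"database file is not writable: {db_path}")

    # SQLite creates journal files beside the database, so the parent
    # directory must be writable (and searchable) too.
    if not os.access(parent, os.W_OK | os.X_OK):
        problems.append(f"parent directory is not writable: {parent}")

    return problems


# Demo: a fresh temporary folder is normally writable, so no problems are reported.
with tempfile.TemporaryDirectory() as folder:
    print(sqlite_write_problems(os.path.join(folder, "app.db")))
```

Running a check like this from the install script would have pointed at the folder rather than the file, which is the part that is easy to miss.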
It had been a long time since I'd manually d files in PHP. I normally use namespaces and autoload. Since this project was generated from scratch, I had to hunt down various statements that all had incorrect paths. Once I sorted those out, I had to deal with authentication. PHP sessions come with batteries included, you call and you can read and write session variables via the global. But I couldn't figure out why it kept failing. When I created a standalone test file, sessions worked fine. But when loaded through the application, values weren't being saved. I spent a good while debugging before I found that was missing from the login success flow. When I logged in, the page redirected to the dashboard, but every subsequent action that required authentication immediately kicked me out. Even after fixing all those issues and getting uploads working, something still bothered me: how do I maintain this code? How do I add new pages to manage uploaded assets? Do I add meatballs directly to the spaghetti? Or do I just trust the AI agent to know where to put new features? Technically it could do that, but I'd have to rely entirely on the AI without ever understanding how things work. So I did the only sane thing: I rewrote a large part of the code and restructured the project. Maybe I should have started there, but I didn't know what I wanted until I saw it. Which is probably why I had been dragging this project along for 11 years. Yes, now I have 22 files, almost double the original count. But the code is also much simpler at just 1,254 lines. There's far less cognitive load when it comes to fixing bugs. There's still a lot to improve, but it's a much leaner foundation. The question I keep coming back to is: would it have been easier to do this manually? Well, the timeline speaks for itself. I had been neglecting this project for years. Without AI, I probably never would have finished it. That said, it would have been easier to build on my existing framework. 
My blog's framework has been tested for years and has accumulated a lot of useful features: a template engine, a working router, an auth system, and more. All things I had to re-engineer from scratch here. If I'd taken the time to work within my own framework, it probably would have taken less time overall. But AI gave me the illusion that the work could be done much faster. Z.ai generated the whole thing in just 12 minutes. It took an additional 10 hours to clean it up and get it working the way I wanted. This reminds me of several non-technical friends who built/vibe-coded apps last year. The initial results looked impressive. Most of them don't have a working app anymore, because they realized that the cleanup is just as important as the generation if you want something that actually holds together. I can only imagine what "vibe-debugging" looks like. I'm glad I have a working app, but I'm not sure I can honestly call this vibe-coded. Most, if not all, of the files have been rewritten. When companies claim that a significant percentage of their code is AI-generated , do their developers agree? For me, it's unthinkable to deploy code I haven't vetted and understood. But I'm not the benchmark. In the meantime, I think I've earned the right to say this the next time I ship an AI-assisted app: "I apologize for so many lines of code - I didn't have time to write a shorter app."

DHH Yesterday

Panther Lake is the real deal

Intel really delivered with Panther Lake. A 2026 Dell XPS 14 using this chipset with an IPS screen can hit just 1.4 watts of idle power draw on Omarchy. That's good enough for over 47 hours!! And in real-world mixed use on another 74-Wh machine, I've seen around 16 hours of battery life. That's a huge jump over the ~6 hours I was getting over the past two years from AMD-powered Framework laptops. Technically, Intel already had something close to Panther Lake on efficiency with the Lunar Lake chips from last year, but those were quite slow on any multi-core workloads (like a developer would need). With Panther Lake (358H), I'm getting 17,500 on Geekbench 6, which is about 10% faster than the already excellent AMD HX370, and a match for Apple's M5. Apple remains ahead on single-core performance, but even there, Panther Lake is on par with an M3. And I don't remember anyone complaining that those were too slow. What everyone has been pining for was better battery life, and now we got it. On a machine with excellent integrated graphics that are good enough to play a ton of triple-A games no less! But we're getting more than that. The PC makers are getting their act together on all fronts. Haptic touchpads on par with Apple's are now standard on both high-end Dell and Asus laptops. Many of the new machines also have tandem OLED screens that blow even the nice mini-LED options from Apple out of the water. And PCs are now somehow both sleeker and slimmer than the MacBooks. Jonathan Ive knew this, he was just a bit ahead of the components, and he was willing to sacrifice reliability to get to what wasn't possible back then. But now it is, and the PC makers are taking full advantage. Now I know that any comparison between Macs and PCs is moot for most people. There's not a lot of cross-shopping going on these days. If you're locked into the Apple walled garden, it's hard to untangle yourself, so most just continue to buy whatever their team offers.
But for the few who are either fed up with Apple in general, macOS Tahoe in particular, or just want to try a whole new way of computing with Omarchy, it's fantastic that battery life is no longer a blocker. It's been the #1 reason cited by folks who've been interested in trying Omarchy, but felt like they couldn't let go of Apple's efficiency advantage. Now that's largely gone. I also just love a good turnaround story. Intel had been on the ropes for years. Now they have a fantastic integrated GPU that's compatible with all the tens of thousands of PC games on the market, a super-efficient CPU that's a match for an M5 on multi-core and an M3 on single-core performance, and a range of PC makers finally taking the fight directly to Apple on touchpads, build quality, and weight. These new Panther Lake CPUs are made in Arizona too, btw. With the world as it is, I think any American should breathe a sigh of relief that if things get spicy with Taiwan, there's more to frontier computing than a TSMC plant within a short reach of China. There's still more work to be done on that front (as Intel CPU cores still come from TSMC!), but it's a huge step in the right direction. Personally, I'm just thrilled that competition is lifting all boats. Apple gave the entire laptop industry a huge wake-up call in 2020 with the introduction of the M chips. Intel's former CEO, Pat Gelsinger, saw the threat clearly, kicked off the 18A plan, but sadly didn't last long enough in the top seat to see his bet pay off with Panther Lake. The rest of us now benefit from his boldness. I'm also thrilled to see both Dell and Intel leaning into Linux. Omarchy 3.5 ships with every possible tweak to make these Panther Lake chips perform at their best, and that was only possible because Michael Dell assigned a team to work on it. So much love to Mr Dell for letting us borrow the brains and commits from senior engineers within both his company and Intel to ship this big new release. 
If you've been waiting on the sidelines for a laptop that can run Omarchy and still get amazing battery life, now is your magic moment. Give the new Dell XPS series, or any of the other laptops shipping with Panther Lake, a try. I think you'll be as impressed as I've been.
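For anyone who wants to sanity-check the battery numbers quoted earlier, the relationship is just runtime = capacity / average draw. Here is a small Python sketch of that arithmetic. The only inputs are the figures from the post (1.4 W idle, "over 47 hours", a 74-Wh machine, around 16 hours); the helper names are made up for illustration, and the XPS 14's battery capacity is not stated in the post, so the value computed for it is only what those two figures imply.

```python
def runtime_hours(capacity_wh, draw_watts):
    """Hours of runtime for a battery of capacity_wh at a constant draw_watts."""
    return capacity_wh / draw_watts


def implied_capacity_wh(draw_watts, hours):
    """Battery capacity implied by a given draw sustained for a given time."""
    return draw_watts * hours


def implied_draw_watts(capacity_wh, hours):
    """Average power draw implied by draining capacity_wh over the given hours."""
    return capacity_wh / hours


# 1.4 W idle for 47 hours implies a battery of roughly 66 Wh.
print(round(implied_capacity_wh(1.4, 47), 1))

# 16 hours of mixed use on the 74-Wh machine implies an average draw of about 4.6 W.
print(round(implied_draw_watts(74, 16), 2))
```

The gap between the two results is the usual difference between idle draw and real-world mixed use, not a contradiction between the figures.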

HeyDingus Yesterday

The difference between a company that makes money and a company that makes something worth caring about

David Sparks blogs that companies whose leaders actually give a damn about the products are the ones worth watching: You could argue that’s unhealthy. Maybe it is. But there’s something about a CEO who feels physical pain when the product falls short. That energy flows downhill. When the person at the top cares that much, everyone else figures out pretty quickly that they’d better care too. […] You can spot it pretty easily. When a CEO talks about their company, do they talk about the product or the business? Walt talked about the park. Steve talked about the iPhone. Jensen talks about the chip. The ones who love the product can’t help themselves. The ones who don’t talk about market share and strategic initiatives. Sparks’ sentiment pairs well with Marco Arment’s letter to presumed future Apple CEO John Ternus: Apple doesn’t settle for fine, functional, or good enough in its hardware (and thanks for your incredible work on that). We love making and using products that aren’t just great, but greater than they need to be, always raising the bar of greatness for its own sake. Software, services, revenue sources, and world impact need to be held to that same standard. Focus on making great computers with great user experiences above all else, and you can trust that every other major goal will follow: profit, market share, expansion, impact, and benefit to the world. We have high expectations for Ternus. I hope he can live up to them. HeyDingus is a blog by Jarrod Blundy about technology, the great outdoors, and other musings. If you like what you see — the blog posts , shortcuts , wallpapers , scripts , or anything — please consider leaving a tip , checking out my store , or just sharing my work. Your support is much appreciated! I’m always happy to hear from you on social , or by good ol' email .

devansh Yesterday

On LLMs and Vulnerability Research

I have been meaning to write this for six months. The landscape kept shifting. It has now shifted enough to say something definitive. I work at the intersection of vulnerability triage. I see, every day, how this landscape is changing. These views are personal and do not represent my employer. Take them with appropriate salt. Two things happened in quick succession. Frontier models got dramatically better (Opus 4.6, GPT 5.4). Agentic toolkits (Claude Code, Codex, OpenCode) gave those models hands. The combination produces solid vulnerability research. "LLMs are next-token predictors." This framing was always reductive. It is now actively misleading. The gap between what these models theoretically do (predict the next word) and what they actually do (reason about concurrent thread execution in kernel code to identify use-after-free conditions) has grown too wide for the old frame to hold. Three mechanisms explain why. Implicit structural understanding. Tokenizers know nothing about code. Byte Pair Encoding treats , , and as frequent byte sequences, not syntactic constructs. But the transformer layers above tell a different story. Through training on massive code corpora, attention heads specialise: some track variable identity and provenance, others develop bias toward control flow tokens. The model converges on internal representations that capture semantic properties of code, something functionally equivalent to an abstract syntax tree, built implicitly, never formally. Neural taint analysis. The most security-relevant emergent capability. The model learns associations between sources of untrusted input (user-controlled data, network input, file reads) and dangerous sinks (system calls, SQL queries, memory operations). When it identifies a path from source to sink without adequate sanitisation, it flags a vulnerability. This is not formal taint analysis. There is no dataflow graph either. It is a statistical approximation.
But it works well for intra-procedural bugs where the source-to-sink path is short, and degrades as distance increases across functions, files, and abstraction layers. Test-time reasoning. The most consequential advance. Standard inference is a single forward pass: reactive, fast, fundamentally limited. Reasoning models (o-series, extended thinking, DeepSeek R1) break this constraint by generating internal reasoning tokens, a scratchpad where the model works through a problem step by step before answering. The model traces execution paths, tracks variable values, evaluates branch conditions. Symbolic execution in natural language. Less precise than formal tools but capable of handling what they choke on: complex pointer arithmetic, dynamic dispatch, deeply nested callbacks. It self-verifies, generating a hypothesis ("the lock isn't held across this path"), then testing it ("wait, is there a lock acquisition I missed?"). It backtracks when reasoning hits dead ends. DeepSeek R1 showed these behaviours emerge from pure reinforcement learning with correctness-based rewards. Nobody taught the model to check its own work. It discovered that verification produces better answers. The model is not generating the most probable next token. It is spending variable compute to solve a specific problem. Three advances compound on each other. Mixture of Experts. Every frontier model now uses MoE. A model might contain 400 billion parameters but activate only 17 billion per token. Vastly more encoded knowledge about code patterns, API behaviours, and vulnerability classes without proportional inference cost. Million-token context. In 2023, analysing a codebase required chunking code into a vector database, retrieving fragments via similarity search, and feeding them to the model. RAG is inherently lossy: code split at arbitrary boundaries, cross-file relationships destroyed, critical context discarded. 
For vulnerability analysis, where understanding cross-module data flow is the entire point, this information loss is devastating. At one million tokens, you fit an entire mid-size codebase in a single prompt. The model traces user input from an HTTP handler through three middleware layers into a database query builder and spots a sanitisation gap on line 4,200 exploitable via the endpoint on line 890. No chunking. No retrieval. No information loss. Reinforcement-learned reasoning. Earlier models trained purely on next-token prediction. Modern frontier models add an RL phase: generate reasoning chains, reward correctness of the final answer rather than plausibility of text. Over millions of iterations, this shapes reasoning to produce correct analyses rather than plausible-sounding ones. The strategies transfer across domains. A model that learned to verify mathematical reasoning applies the same verification to code. A persistent belief: truly "novel" vulnerability classes exist, bugs so unprecedented that only human genius could discover them. Comforting. Also wrong. Decompose the bugs held up as examples. HTTP request smuggling: the insight that a proxy and backend might disagree about where one request ends and another begins feels like a creative leap. But the actual bug is the intersection of known primitives: ambiguous protocol specification, inconsistent parsing between components, a security-critical assumption about message boundaries. None novel individually. The "novelty" was in combining them. Prototype pollution RCEs in JavaScript frameworks. Exotic until you realise it is dynamic property assignment in a prototype-based language, unsanitised input reaching object modification, and a rendering pipeline evaluating modified objects in a privileged context. Injection, type confusion, privilege boundary crossing. Taxonomy staples for decades. The pattern holds universally. 
"Novel" vulnerabilities decompose into compositions of known primitives: spec ambiguities, type confusions, missing boundary checks, TOCTOU gaps, trust boundary violations. The novelty is in the composition, not the components. This is precisely what frontier LLMs are increasingly good at. A model that understands protocol ambiguity, inconsistent component behaviour, and security boundary assumptions has all the ingredients to hypothesise a request-smuggling-class vulnerability when pointed at a reverse proxy codebase. It does not need to have seen that exact bug class. It needs to recognise that the conditions for parser disagreement exist and that parser disagreement at a trust boundary has security implications. Compositional reasoning over known primitives. Exactly what test-time reasoning enables. LLMs will not discover the next Spectre tomorrow. Microarchitectural side channels in CPU pipelines are largely absent from code-level training data. But the space of "LLM-inaccessible" vulnerabilities is smaller than the security community assumes, and it shrinks with every model generation. Most of what we call novel vulnerability research is creative recombination within a known search space. That is what these models do best. Effective AI vulnerability research = good scaffolding + adequate tokens. Scaffolding (harness design, prompt engineering, problem framing) is wildly underestimated. Claude Code and Codex are general-purpose coding environments, not optimised for vulnerability research. A purpose-built harness provides threat models, defines trust boundaries, highlights historical vulnerability patterns in the specific technology stack, and constrains search to security-relevant code paths. The operator designing that context determines whether the model spends its reasoning budget wisely or wastes it on dead ends. Two researchers, same model, same codebase, dramatically different results. Token quality beats token quantity. 
A thousand reasoning tokens on the right code path with the right threat model outperform a million tokens sprayed across a repo with "find vulnerabilities." The search space is effectively infinite. You cannot brute-force it. You narrow it with human intelligence encoded as context, directing machine intelligence toward where bugs actually live. "LLMs are non-deterministic, so you can't trust their findings." Sounds devastating. Almost entirely irrelevant. It confuses the properties of the tool with the properties of the target. The bugs are deterministic. They are in the code. A buffer overflow on line 847 is still there whether the model notices it on attempt one or attempt five. Non-determinism in the search process does not make the search less valid. It makes it more thorough under repetition. Each run samples a different trajectory through the hypothesis space. The union of multiple runs covers more search space than any single run. Conceptually identical to fuzzing. Nobody says "fuzzers are non-deterministic so we can't trust them." You run the fuzzer longer, cover more input space, find more bugs. Same principle. Non-determinism under repetition becomes coverage. In 2023 and 2024, the state of the art was architecture. Multi-agent systems, RAG pipelines, tool integration with SMT solvers and fuzzers and static analysis engines. The best orchestration won. That era is ending. A frontier model ingests a million tokens of code in a single prompt. Your RAG pipeline is not an advantage when the model without RAG sees the whole codebase while your pipeline shows fragments selected by retrieval that does not know what is security-relevant. A reasoning model spends thousands of tokens tracing execution paths and verifying hypotheses. Your external solver integration is not a differentiator when the model approximates what the solver does with contextual understanding the solver lacks. Agentic toolkits handle orchestration better than your custom tooling. 
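The claim that non-determinism under repetition becomes coverage can be made concrete with a little probability. The sketch below is in Python and assumes, purely for illustration, that each run independently notices a given bug with some fixed probability `p`. Real runs are not perfectly independent and `p` is not known in advance, so treat this as an idealised model and not a measurement. The function names are hypothetical.

```python
import math


def chance_found(p, runs):
    """Probability that at least one of `runs` independent attempts finds the bug."""
    return 1 - (1 - p) ** runs


def runs_needed(p, target):
    """Smallest number of independent runs whose combined detection chance reaches `target`."""
    return math.ceil(math.log(1 - target) / math.log(1 - p))


# A bug that a single run spots only 20% of the time:
print(round(chance_found(0.2, 1), 3))   # one attempt
print(round(chance_found(0.2, 5), 3))   # five attempts
print(runs_needed(0.2, 0.95))           # attempts needed for a 95% chance
```

This is the same reasoning behind running a fuzzer for longer: the bug does not move, and each additional run is another sample of the search space.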
The implication the security industry has not fully processed: vulnerability research is being democratised. When finding a memory safety bug in a C library required a Project Zero-calibre researcher with years of experience, the supply was measured in hundreds worldwide. When it requires a well-prompted API call, the supply is effectively unlimited. What replaces architecture as the competitive advantage? Two things. Domain expertise encoded as context. Not "find bugs in this code" but "this is a TLS implementation; here are three classes of timing side-channel that have affected similar implementations; analyse whether the constant-time guarantees hold across these specific code paths." The human provides the insight. The model does the grunt work. Access to compute. Test-time reasoning scales with inference compute. More tokens means deeper analysis, more self-verification, more backtracking. Teams that let a model spend ten minutes on a complex code path will find bugs that teams limited to five-second responses will miss. The end state: vulnerability discovery for known bug classes becomes a commodity, available to anyone with API access and a credit card. The researchers who thrive will focus where the model cannot: novel vulnerability classes, application-level logic flaws, architectural security review, adversarial creativity. This is not a prediction. It is already happening. The pace is set by model capability, which doubles on a timeline measured in months.
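To make the source-to-sink idea from the taint analysis discussion concrete, here is a minimal Python sketch of the kind of short, intra-procedural path described there: untrusted input (the source) reaching a SQL query (the sink) with and without sanitisation. The table, function names, and payload are invented for illustration and do not come from any real codebase.

```python
import sqlite3


def make_db():
    """Create an in-memory database with two example users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def lookup_unsafe(conn, name):
    # Source: `name` is attacker-controlled.
    # Sink: it is spliced straight into the SQL text, with no sanitisation.
    query = f"SELECT secret FROM users WHERE name = '{name}'"
    return [row[0] for row in conn.execute(query)]


def lookup_safe(conn, name):
    # Same source and sink, but the value is bound as a parameter,
    # so it can never change the structure of the query.
    query = "SELECT secret FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


conn = make_db()
payload = "nobody' OR '1'='1"

print(lookup_unsafe(conn, payload))  # the injected condition matches every row
print(lookup_safe(conn, payload))    # no user has this literal name
```

Both functions sit a single hop from the input, which is the easy case. The harder cases are the ones where the source and the sink are separated by several functions, files, or abstraction layers.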

Stratechery Yesterday

OpenAI Buys TBPN, Tech and the Token Tsunami

OpenAI's purchase of TBPN makes no sense, which may be par for the course for OpenAI. Then, AI is breaking stuff, starting with tech services.

ava's blog 2 days ago

2 museums, 3 exhibitions - work culture, oceans, sex work

Used the free days to go to two museums; ended up visiting an exhibition about work culture, about the oceans, and about sex work. All museum texts include an English portion, so don't get discouraged when you spot German words; feel free to look for the translation if you do not understand the German portion :) This exhibition reminded me again why museums are so great. I can go years without stepping foot in a museum at times, and some bad and disappointing experiences make it harder to justify it. This one made me happy to go again :) Not only was the art really interesting and inspiring, but the participation options were varied and engaging. Lots of stats, options to discuss, being able to put stickers on what applies to you, rating different activities on whether they qualify as free time or labour by using red or green felt balls, using string to vote on labour strikes, adding your own thoughts on a little paper you stick on a board with questions... it's so cool when museum visitors become part of the exhibition. I learned about Taylorism too. They had various books and ads in the exhibition that were used to share that model back then. It was praised as revolutionary, as the new way forward... and the optimization was intense. Everything had to be normed and unified, every production step analyzed, broken down and written down... searching Taylorism online shows a very watered down, basic-productivity-type of stuff; it seemed to be much more hardcore in these materials in Germany back then. It wasn't just applied to stuff like building machinery and conveyor belt type work, where this genuinely made some work faster, safer, easier to reproduce, quality higher, but was also imported into private life. They had a book there that was about "the new way to run the household" which applied Taylorism to the household chores and even the design and layout of the house.
They had whole kitchen layouts that were optimized so that the path between different items and kitchen devices was short and not blocked by the table or chairs; the housewife operating like a worker in a factory, learning specific steps and paths by heart to be the most efficient. It reminded me so much of the culture and language around AI. Ask anyone nowadays and probably no one is doing a taylorist layout of their home or directly referencing Taylorism in their own self improvement journey; still, Taylorism changed work and work culture. Seeing the book saying Taylorism is the new way forward in the home, and knowing how it actually is now, it felt very similar to the marketing around smart homes, and AI bros telling everyone it will run everything, and you should let it run all your personal projects and self improvement or else you'll get left behind both privately and professionally. Pretty interesting! Also, look at this caricature predicting Zoom/Teams calls in 1926 already: Have some of the interesting explanation signs I saw: The worker as Christ that is sacrificing himself for capitalism (the glass art not pictured). Some stats: As a surprise to no one, Germans would like to sleep and hike more: Hurts to see how proud we used to be about our social security systems; the spending was seen as progressive, positive, a sign of wealth and power. Now we starve these systems to death. We went out to eat after: I was a little let down by this one because of my own expectations. I had expected more focus on the actual ocean, instead of centering the human so hard. It was all: We sent this device down there to do research, we built boats, we use things to cross the ocean, we deliver stories and ideas via crossing the ocean, we make up creepy stories about the ocean; transatlantic slave trade, migration, etc. 
and some of our impact on destroying the ocean, climate change, overfishing (while not daring to criticize fishing, really, because they don't dare offend the visitors who still eat fish, I guess). It was depressing, but accidentally so; it didn't feel like they actually wanted to focus on teaching people anything about the ocean itself, or what they can change to not contribute to the issues the ocean faces; it was more a shrine, an altar to human intervention, celebrating oil rigs and the extraction of resources from the ocean. It didn't seem to celebrate the animals and other organisms much beyond just using them to gawk at or eat. So I didn't take many pictures... The highlight was definitely the huge yarn corals (bigger than just the part on the picture) Next up is the exhibition on sex work! This one was emptier and nicer to visit, and definitely worth it. Beautiful, interesting, very good graphics about legislation around the world, notable sex work spots over the course of history in Germany, big events and personalities in the sex work scene. I was surprised that digital modes of sex work were mentioned on one text sign and otherwise not shown or discussed; it was very, very focused on street sex work, bars, clubs, brothels. I think instead of a section covering witches (for some reason?), I'd have appreciated a section on ""modern"" sex work, in which people livestream, sell pictures and videos, and make custom content. OF especially has changed the respectability of some sex work and has enabled many sex workers to do it from the safety of their own home, more comfortable, and reach more customers all around the globe. People who would not otherwise have done sex work now do sex work due to these platforms (even I did, in 2019). I think that deserves to be covered and discussed. Look at this beautiful but very sad quilt about sex workers facing police violence: Here's Marsha: Lots of art had stories included with them. 
I liked how it humanized sex work and its workers; everyone can relate to weird, funny, odd, dangerous or morally grey work experiences, especially with customers. AIDS and Covid are very difficult topics. The distrust in governments due to the AIDS epidemic is justified and it was handled wrongly; and you can see how Covid measures were also used to punish the unwanted, the criminalized, the ones without a lobby. All kinds of companies in a variety of industries got financial support and workarounds to still remain in business, while sex workers were left to fend for themselves; no support, just prohibition. No dialogue with the workers on how to make their work safe, just seeing them as a danger. Covid of course posed different challenges than HIV transmission, but it could have been handled better. It shows you how the government handles crises for people who they cannot milk for money. Sex workers are generally disregarded as victims of the Nazis. Of course, sometimes their other identities overlap with groups that were honored and have public memorials (Jewish, Sinti and Roma, queer, disabled etc.) but the part of them that was targeted for their sex work, or people who were only targeted for their sex work and not other parts of their identities never got any justice or memorial. Sex workers were regarded as "asocial" and "degenerate" and institutionalized and were also subject to involuntary hospitalizations, forced labour and more. Just as the NS regime tried to argue for born killers and other supposed signs someone was going to become a criminal or otherwise "undesirable", it argued that some women are just "born prostitutes". The exhibition had different maps of bigger German cities like Cologne, Hamburg and Berlin and their popular cruising and sex work spots. This piece of info stood out to me: The biggest, most well-known trans and gay bar in Berlin was converted into the headquarters of the SA.
Other than that, the exhibition had some examples of makeshift dildos, the first condoms, and some amazing video interviews. A little chapel with a water fountain serves as a memorial for all the sex workers who have been killed. Cool that you made it this far. Published 06 Apr, 2026


Germany Doxes “UNKN,” Head of RU Ransomware Gangs REvil, GandCrab

An elusive hacker who went by the handle “ UNKN ” and ran the early Russian ransomware groups GandCrab and REvil now has a name and a face. Authorities in Germany say 31-year-old Russian Daniil Maksimovich Shchukin headed both cybercrime gangs and helped carry out at least 130 acts of computer sabotage and extortion against victims across the country between 2019 and 2021. Shchukin was named as UNKN (a.k.a. UNKNOWN) in an advisory published by the German Federal Criminal Police (the “Bundeskriminalamt” or BKA for short). The BKA said Shchukin and another Russian — 43-year-old Anatoly Sergeevitsch Kravchuk — extorted nearly 2 million euros across two dozen cyberattacks that caused more than 35 million euros in total economic damage. Daniil Maksimovich SHCHUKIN, a.k.a. UNKN, and Anatoly Sergeevitsch Kravchuk, alleged leaders of the GandCrab and REvil ransomware groups. Germany’s BKA said Shchukin acted as the head of one of the largest worldwide operating ransomware groups GandCrab and REvil, which pioneered the practice of double extortion — charging victims once for a key needed to unlock hacked systems, and a separate payment in exchange for a promise not to publish stolen data. Shchukin’s name appeared in a Feb. 2023 filing (PDF) from the U.S. Justice Department seeking the seizure of various cryptocurrency accounts associated with proceeds from the REvil ransomware gang’s activities. The government said the digital wallet tied to Shchukin contained more than $317,000 in ill-gotten cryptocurrency. The Gandcrab ransomware affiliate program first surfaced in January 2018, and paid enterprising hackers huge shares of the profits just for hacking into user accounts at major corporations. The Gandcrab team would then try to expand that access, often siphoning vast amounts of sensitive and internal documents in the process.
The malware’s curators shipped five major revisions to the GandCrab code, each corresponding with sneaky new features and bug fixes aimed at thwarting the efforts of computer security firms to stymie the spread of the malware. On May 31, 2019, the GandCrab team announced the group was shutting down after extorting more than $2 billion from victims. “We are a living proof that you can do evil and get off scot-free,” GandCrab’s farewell address famously quipped. “We have proved that one can make a lifetime of money in one year. We have proved that you can become number one by general admission, not in your own conceit.” The REvil ransomware affiliate program materialized around the same time as GandCrab’s demise, fronted by a user named UNKNOWN who announced on a Russian cybercrime forum that he’d deposited $1 million in the forum’s escrow to show he meant business. By this time, many cybersecurity experts had concluded REvil was little more than a reorganization of GandCrab. UNKNOWN also gave an interview to Dmitry Smilyanets, a former malicious hacker hired by Recorded Future, wherein UNKNOWN described a rags-to-riches tale unencumbered by ethics and morals. “As a child, I scrounged through the trash heaps and smoked cigarette butts,” UNKNOWN told Recorded Future. “I walked 10 km one way to the school. I wore the same clothes for six months. In my youth, in a communal apartment, I didn’t eat for two or even three days. Now I am a millionaire.” As described in The Ransomware Hunting Team by Renee Dudley and Daniel Golden, UNKNOWN and REvil reinvested significant earnings into improving their success and mirroring practices of legitimate businesses. The authors wrote: “Just as a real-world manufacturer might hire other companies to handle logistics or web design, ransomware developers increasingly outsourced tasks beyond their purview, focusing instead on improving the quality of their ransomware.
The higher quality ransomware (which, in many cases, the Hunting Team could not break) resulted in more and higher pay-outs from victims. The monumental payments enabled gangs to reinvest in their enterprises. They hired more specialists, and their success accelerated.” “Criminals raced to join the booming ransomware economy. Underworld ancillary service providers sprouted or pivoted from other criminal work to meet developers’ demand for customized support. Partnering with gangs like GandCrab, ‘cryptor’ providers ensured ransomware could not be detected by standard anti-malware scanners. ‘Initial access brokerages’ specialized in stealing credentials and finding vulnerabilities in target networks, selling that access to ransomware operators and affiliates. Bitcoin ‘tumblers’ offered discounts to gangs that used them as a preferred vendor for laundering ransom payments. Some contractors were open to working with any gang, while others entered exclusive partnerships.” REvil would evolve into a feared “big-game-hunting” machine capable of extracting hefty extortion payments from victims, largely going after organizations with more than $100 million in annual revenues and fat new cyber insurance policies that were known to pay out. Over the July 4, 2021 weekend in the United States, REvil hacked into and extorted Kaseya, a company that handled IT operations for more than 1,500 businesses, nonprofits and government agencies. The FBI would later announce they’d infiltrated the ransomware group’s servers prior to the Kaseya hack but couldn’t tip their hand at the time. REvil never recovered from that core compromise, or from the FBI’s release of a free decryption key for REvil victims who couldn’t or didn’t pay. Shchukin is from Krasnodar, Russia, and is thought to reside there, the BKA said. “Based on the investigations so far, it is assumed that the wanted person is abroad, presumably in Russia,” the BKA advised.
“Travel behaviour cannot be ruled out.” There is little that connects Shchukin to UNKNOWN’s various accounts on the Russian crime forums. But a review of the Russian crime forums indexed by the cyber intelligence firm Intel 471 shows there is plenty connecting Shchukin to a hacker identity called “Ger0in” who operated large botnets and sold “installs,” allowing other cybercriminals to rapidly deploy malware of their choice to thousands of PCs in one go. However, Ger0in was only active between 2010 and 2011, well before UNKNOWN’s appearance as the REvil front man. A search of the mugshots released by the BKA at the image comparison site PimEyes found a match on this birthday celebration from 2023, which features a young man named Daniel wearing the same fancy watch as in the BKA photos. Images from Daniil Shchukin’s birthday party celebration in Krasnodar in 2023. Update, April 6, 12:06 p.m. ET: A reader forwarded this English-dubbed audio recording from a ccc.de (37C3) conference talk in Germany from 2023 that previously outed Shchukin as the REvil leader (Shchukin is mentioned at around 24:25).

HeyDingus 2 days ago

7 Things (Which Are Songs I’ve Been Obsessed With) This Week [#185]

A weekly list of interesting things I found on the internet, posted on Sundays. Sometimes themed, often not. 1️⃣ “Badlands” by Mumford & Sons & Gracie Abrams 2️⃣ “Easier Gone” by Jason Aldean & Brittany Aldean 3️⃣ “Grace Kelly” by Piper.Ally 4️⃣ “Forever Start (Stripped)” by Ryan Nealon & Jillian Rossi 5️⃣ “FTS” by The Summer Set & Travie McCoy 6️⃣ “Opalite” by Taylor Swift 7️⃣ “Angels Like You” by Miley Cyrus Thanks for reading 7 Things. If you enjoyed these links or have something neat to share, please let me know. And remember that you can get more links to internet nuggets that I’m finding every day by following me @jarrod on the social web. HeyDingus is a blog by Jarrod Blundy about technology, the great outdoors, and other musings. If you like what you see (the blog posts, shortcuts, wallpapers, scripts, or anything), please consider leaving a tip, checking out my store, or just sharing my work. Your support is much appreciated! I’m always happy to hear from you on social, or by good ol' email.
