Posts in Devops (20 found)

Abstraction, not syntax

Alternative configuration formats solve superficial problems. Configuration languages solve the deeper problem: the need for abstraction.

0 views

Co-Pilots, Not Competitors: PM/EM Alignment Done Right

In commercial aviation, most people know there's a "Pilot" and a "Co-Pilot" up front (also known as the "Captain" and "First Officer"). The ranks of Captain and First Officer denote seniority, but both pilots, in practice, take turns filling one of two roles: "pilot flying" and "pilot monitoring". The pilot flying is at the controls actually operating the plane. The pilot monitoring is usually the one talking to Air Traffic Control, running checklists, and watching over the various aircraft systems to make sure they're healthy throughout the flight. Obviously, both pilots share the goal of getting the passengers to the destination safely and on time, while managing costs like fuel as a secondary concern. They have the same engines, fuel tank, and aircraft systems, and no way to alter that once airborne. They succeed or fail together.

It would be a hilariously bad idea to give one pilot the goal of getting to the destination as quickly as possible while giving the other pilot the goal of limiting fuel use as much as possible and making sure the wings stay on. Or, to take an even stupider example, to give one pilot the goal of landing in Los Angeles and the other the goal of landing in San Francisco. Obviously that wouldn't work at all because those goals are in opposition to each other. You can get there fast if you don't care about fuel or the long-term health of the aircraft, or you can optimize for fuel use and aircraft stress if you don't need to worry about the travel time as much. It sounds ridiculous! And yet, this is how a lot of organizations structure EM and PM goals.

An EM and PM have to use the same pool of available capacity to achieve their objectives. They are assigned a team of developers, and what that team of developers is able to do in a given period is a zero-sum game. In a modern structure, following the kinds of practices that lead to good DORA metrics, their ownership is probably going to be a mix of building new features and care and feeding of what they've shipped before. Like a plane's finite fuel tank, the capacity of the team is fixed and exhaustible. Boiled down to its essence, the job of leaders is managing that capacity to produce the best outcomes for the business. Problems arise, however, in determining what those outcomes are, because they're going to be filtered through the lens of each individual's incentive structure. Once fuel is spent, it's spent. Add more destinations without adjusting the plan, and you're guaranteeing failure.

It would not be controversial to say that a typical Product Manager is accountable and incentivized to define and deliver new features that increase user adoption and drive revenue growth. Nor would it be controversial to say that a typical Engineering Manager is accountable and incentivized to make sure that the code that is shipped is relatively error-free, doesn't crash, treats customer data appropriately, scales to meet demand, and so on. You know the drill. I think this is fine and reflects the reality that EMs and PMs aren't like pilots in that their roles are not fully interchangeable and there is some specialization in those areas they're accountable for. But if you just stop there, you have a problem. You've created the scenario where one pilot is trying to get to the destination as quickly as possible and one is trying to save fuel and keep the airplane from falling apart. You need to go a step further and coalesce all of those things into shared goals for the team.
An interview question I like to ask both prospective EM and PM candidates is "How do you balance the need to ship new stuff with the need to take care of the stuff you've already shipped?" The most common answer is something like "We reserve X% of our story points for 'Engineering Work' and the rest is for 'Product Work'." This way of doing things is a coping strategy disguised as a solution. It's the wrong framing entirely because everything is product work. Performance is a feature, reliability is a feature, scalability is a feature. A product that doesn't work isn't a product, and so a product manager needs to care about that sort of thing too. Pretending there's a clean line between "Product Work" and "Engineering Work" is how teams can quietly drift off-course.

On the flip side, I'm not letting engineering managers off the hook either. New features and product improvements are usually key to maintaining business growth and keeping existing customers happy. All the scalability and performance in the world don't matter if you don't have users. Over-rotating on trying to get to "five nines" of uptime when you're not making any revenue is a waste of your precious capacity that could be spent on things that grow the business, and that growth will bring more opportunities for solving interesting technical challenges and more personal growth opportunities for everyone who is there. EMs shouldn't ignore the health of the business any more than PMs should ignore the health of the systems their teams own. Different hats are fine; different destinations are not.

If you use OKRs or some other cascading system of goals where the C-Suite sets some broad goals for the company and those cascade into ever more specific versions as you walk down the org chart hierarchy, what happens when you get to the team level? Do the EM and PM have separate OKRs they're trying to achieve? Put them side-by-side and ask yourself the following questions:

If we had infinite capacity, are these all achievable? Or are some mutually exclusive? Example: Launch a new feature with expensive storage requirements (PM) vs. cut infra spend in half (EM).
Does achieving any of these objectives make another one harder? Example: Increase the number of shipped experiments and MVPs (PM) vs. cut the inbound rate of customer support tickets escalated to the team (EM).
Do we actually have the capacity to do all of this? Example: Ship a major feature in time for the annual customer conference (PM) vs. upgrade our key framework which is 2 major versions behind and is going out of support (EM).

If you answer 'no' to any of those, congratulations, you've just planned your team's failure to achieve their goals. To put it another way, your planning is not finished yet! You need to keep going and narrow down the objectives to something that both leaders can get behind and do their best to commit to. You've already got your goals side-by-side. After you've identified the ones that are in conflict, have a conversation about business value. Don't bring in the whole team yet; they're engineers and the PM might feel outnumbered. Just have the conversation between the two of you and see if you can come to some kind of consensus about which of the items in conflict are more valuable. Prioritize that one and deprioritize the one that it's in opposition to. Use "what's best for the business" as your tie-breaker. That's not always going to be perfectly quantifiable, so there's a lot of subjectivity and bias that's going to enter into it. This is where alignment up the org chart is very beneficial. These concepts don't just apply at the team level. If you find you can't agree and aren't making progress, try to bring in a neutral third party to facilitate the discussion. If you have scrum masters, agile coaches, program managers or the like, they can often fill this role nicely. You could also consult a partner from Customer Support, who will have a strong incentive to advocate for the customer above all, to add more context to the discussion. Don't outsource the decision though; both the EM and PM need to ultimately agree (even reluctantly) on what's right for the team.

If your org, like many, has parallel product and engineering hierarchies, alignment around goals at EACH level is critical. The VP Eng and VP Product should aim to have shared goals for the entire team. Same thing at the Director level, and every other level. That way if each team succeeds, each leader up the org chart succeeds, and ultimately the customer and the business are the beneficiaries. If you don't have that, doing it at the team level alone is going to approach impossible and you should send this post to your bosses. 😉 But in seriousness, you should spend some energy trying to get alignment up and down the org chart. If your VP Eng and VP Product don't agree on direction, your team is flying two different flight plans. No amount of team-level cleverness can fully fix that. Your job is to surface it, not absorb it. Things you can do:

If your original set of goals conflicted, the goals at the next level up probably do too. Call that out and ask for it to be reconciled.
Escalate when structural problems get in the way of the team achieving their goals. Only the most dysfunctional organizations would create systems that guarantee failure on purpose.

You likely set goals on a schedule. Every n months you're expected to produce a new set and report on the outcomes from the last set. That's all well and good, but I think most people who've done it a few times recognize that the act of doing it is more valuable than the artifacts the exercise produces are. (Often described as "Plans are useless, planning is critical.") The world changes constantly, the knowledge you and your team have about what you're doing increases constantly, and all of these things can impact your EM/PM alignment, so it's also important to stay close, keep the lines of communication very open, and make adjustments whenever needed. The greatest predictor of a team's success, health, and happiness is the quality of the relationship between the EM and PM. Keeping that relationship healthy and fixing it if it starts to break down will keep the team on the right flight path with plenty of spare fuel for surprise diversions if they're needed. Regardless of what framework you're using, or even if you're not using one at all, the team should have one set of goals they're working toward, and the EM and PM should succeed or fail together on achieving those goals. Shared goals don't guarantee a smooth landing. Misaligned ones guarantee a crash.

"A380 Cockpit" by Naddsy is licensed under CC BY 2.0.

0 views
マリウス 6 days ago

Alpine Linux on a Bare Metal Server

When I began work on 📨🚕 ( MSG.TAXI ), I kept things deliberately low-key, since I didn't want it turning into a playground for architecture astronauts. For the web console's tech stack, I went with the most boring yet easy-to-master CRUD stack I could find, one that doesn't depend on JavaScript. And while deploying Rails in a sane way (without resorting to cOnTaInErS) is a massive PITA, thanks to the Rails author's cargo-cult mentality and his followers latching onto every half-baked wannabe-revolutionary idea, like Kamal and more recently Omarchy, as if it were a philosopher's stone, from a development perspective it's still the most effective getting-shit-done framework I've used to date. Best of all, it doesn't rely on JavaScript (aside from actions, which can be avoided with a little extra effort).

Similarly, on the infrastructure side, I wanted a foundation that was as lightweight as possible and wouldn't get in my way. And while I'm absolutely the type of person who would run a Gentoo server, I ultimately went with Alpine Linux due to its easier installation, relatively sane defaults (with a few exceptions, more on that later), and its preference for straightforward, no-nonsense tooling that doesn't try to hide magic behind the scenes. "Why not NixOS?" you might ask. Since I'm deploying a lightweight, home-brewed Ruby/Rails setup alongside a few other components, I didn't see the point of wrapping everything as Nix packages just to gain the theoretical benefits of NixOS. In particular, the CI would have taken significantly longer, while the actual payoff in my case would have been negligible.

Since I'm paying for 📨🚕 out of my own pocket, I wanted infrastructure that's cheap yet reliable. With plenty of people on the internet praising Hetzner, I ended up renting AMD hardware in one of their Finnish datacenters. Hetzner doesn't offer as many Linux deployment options as cloud providers like Vultr, so I had to set up Alpine myself, which was pretty straightforward. To kickstart an Alpine installation on a Hetzner system, you just need to access the server's iPXE console, either by renting a Hetzner KVM for an hour or by using their free vKVM feature. From there, you can launch the Alpine Linux installer by initializing the network interface and chain-loading its boot file. From that point on, setup should be easy thanks to Alpine's installer routine.

If you're using Hetzner's vKVM feature to install Alpine, this chapter is for you. Otherwise, feel free to skip ahead. vKVM is a somewhat hacky yet ingenious trick Hetzner came up with, and it deserves a round of applause. If you're curious about how it works under the hood, rent a real KVM once and reboot your server into vKVM mode. What you'll see is that after enabling vKVM in Hetzner's Robot, iPXE loads a network image, which boots a custom Linux OS. Within that OS, Hetzner launches a QEMU VM that uses your server's drives to boot whatever you have installed. It's basically Inception at the OS level. As long as vKVM is active (meaning the iPXE image stays loaded), your server is actually running inside this virtualized environment, with display output forwarded to your browser. Inspect the hardware while in vKVM mode and you'll see, for example, your NIC showing up as a VirtIO device. Here's the catch: When you install Alpine through this virtualized KVM environment, it won't generate the initramfs your physical server actually needs. For instance, if your server uses an NVMe drive, you may discover that it doesn't include the NVMe module, causing the OS to fail on boot.
Hetzner's documentation doesn't mention this, and it can easily bite you later. Tl;dr: If you installed your system via vKVM, make sure your initramfs includes all necessary modules. After updating the configuration, regenerate the initramfs; there are several ways to do this. Always double-check that the regenerated initramfs really contains everything you need. Unfortunately Alpine doesn't provide tools for this, so here's a .tar.gz with the relevant Debian tools. Extract it, and note that you may need a little extra work for them to run properly, due to Alpine's somewhat annoying defaults (more on that later). Finally, after rebooting, make sure you've actually left the vKVM session. If the session is still active (default: 1h), your system may have just booted back into the VM, which you can identify by its Virt devices.

As soon as your Alpine Linux system is up and running, there are a couple of things that I found important to change right off the bat.

Alpine's default boot timeout is just 1 second, set in the bootloader configuration. If you ever need to debug a boot-related issue over a high-latency KVM connection, you will dread that 1-second window. I recommend increasing it to 5 seconds and regenerating the bootloader configuration to apply the change. In practice, you hopefully won't be rebooting the server that often, so the extra four seconds won't matter day-to-day.

Alpine uses the classic /etc/network/interfaces file to configure network settings. On Hetzner's dedicated servers, you can either continue using DHCP for IPv4 or set the assigned IP address statically. For IPv6, you'll be given a subnet from which you can choose your own address. Keep in mind that the first usable IPv6 address on Hetzner's dedicated servers may not be the one you expect.

Among the first things you do should be disabling root login and password authentication via SSH. Apart from that, you might want to limit the type of key exchange methods and algorithms that your SSH server allows, depending on the type of keys that you're using. Security by obscurity: move your SSH server from its default port (22) to something higher up and more random to make it harder for port-scanners to hit it. Finicky but more secure: implement port knocking and use a handy client to open the SSH port for you only, for a limited time only. Secure: set up a small cloud instance to act as a WireGuard peer and configure your server's SSH port to only accept connections from the cloud instance using a firewall rule. Use Tailscale if a dedicated WireGuard instance is beyond your expertise.

You will likely want to have proper (GNU) tools around, over the defaults that Alpine comes with (see below), plus a handful of convenience tools.

Monitoring is a tricky part because everyone's setup looks different. However, there are a few things that make sense in general. Regardless of what you do with your logs, it's generally a good idea to switch from BusyBox to something that allows for more advanced configurations, like syslog-ng. You probably should have an overview of how your hardware is doing; depending on what type of hard drives your server has, you might want to install the appropriate drive monitoring packages. UFW is generally considered an uncomplicated way to implement firewalling without having to complete a CCNP Security certification beforehand. Depending on your SSH setup and whether you are running any other services that could benefit from it, installing Fail2Ban might make sense. The configuration files are located under /etc/fail2ban and you should normally only create/edit the .local files.
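To make a few of the steps above concrete, here is a minimal sketch; the custom SSH port (2222) and the Fail2Ban log path are assumptions, so adapt them to your own setup.

    # Ensure the initramfs knows about your disk controller (vKVM installs):
    # add the missing feature (e.g. nvme) to the features line, then regenerate.
    vi /etc/mkinitfs/mkinitfs.conf   # e.g. features="base ext4 keymap kms mmc nvme raid scsi usb virtio"
    mkinitfs

    # SSH hardening in /etc/ssh/sshd_config:
    #   Port 2222
    #   PermitRootLogin no
    #   PasswordAuthentication no
    rc-service sshd restart

    # Basic firewalling with UFW:
    apk add ufw
    ufw default deny incoming
    ufw allow 2222/tcp
    ufw enable

    # Fail2Ban: create /etc/fail2ban/jail.local with something like
    #   [sshd]
    #   enabled = true
    #   port    = 2222
    #   logpath = /var/log/messages   # assumes syslog-ng writes auth logs here
    apk add fail2ban
    rc-update add fail2ban default && rc-service fail2ban start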
The easiest way to back up all the changes that you've made to the general configuration is by using lbu, the integrated Alpine local backup tool that was originally intended to manage diskless-mode installations. I would, however, recommend manually backing up the list of installed packages (/etc/apk/world) and using Restic for the rest of the system, including configuration files and important data. However, backups depend on the data that your system produces and your desired backup target. If you're looking for an easy-to-use, hosted, but not-too-expensive one-off option, then Tarsnap might be for you.

You should also look into topics like local mail delivery, system integrity checks (e.g. AIDE) and intrusion detection/prevention (e.g. CrowdSec). Also, if you would like to get notified for various server events, check 📨🚕 ( MSG.TAXI )! :-)

One of the biggest annoyances with Alpine is BusyBox: You need SSH? That's BusyBox. The logs? Yeah, BusyBox. Mail? That's BusyBox, too. You want to untar an archive? BusyBox. What? It's gzipped? Guess what, you son of a gun, gzip is also BusyBox. I understand why Alpine chose BusyBox for pretty much everything, given the context that Alpine is most often used in ( cOnTaInErS ). Unfortunately, most BusyBox implementations are incomplete or incompatible with their full GNU counterparts, leaving you wondering why something that worked flawlessly on your desktop Linux fails on the Alpine box. By the time I finished setting up the server, there was barely any BusyBox tooling left. However, I occasionally had to resort to some odd trickery to get things working.

You now have a good basis to set up whatever it is that you're planning to use the machine for. Have fun!

Footnote: The artwork was generated using AI and further botched by me using the greatest image manipulation program.

0 views
Chris Coyier 1 week ago

Fixing the `opendiff` command line tool

On my Mac, you can use the opendiff command to compare two files, and it'll open some built-in GUI app called FileMerge to show you the diff. It wasn't working for me. I wish I had copied the exact error, but it was something about the path being wrong or the executable not being available. The solution that worked for me was to open Xcode and go to Settings > Locations. The Command Line Tools section didn't have anything selected, and I had to select Xcode from the list. Then opendiff started working again. 🤷‍♀️
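If you'd rather do the same thing from the terminal, this sketch shows the equivalent xcode-select steps; the Xcode path assumes a default install location.

    # Show which developer directory the command line tools currently point at
    xcode-select -p

    # Point them at the full Xcode install (adjust the path if Xcode lives elsewhere)
    sudo xcode-select -s /Applications/Xcode.app/Contents/Developer

    # FileMerge should launch again
    opendiff old_file.txt new_file.txt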

0 views
Karboosx 1 week ago

Become Unbannable from Your Email/Gmail

You don't own your email; you rent it. This vulnerability leaves your entire digital life at risk. Read how to seize back control of your most critical online asset by switching providers and setting up your own unbannable email server.

0 views
James O'Claire 1 week ago

Self hosting 10TB in S3 on a framework laptop + disks

About 5 months ago I made the decision to start self-hosting my own S3. I was working on AppGoblin's SDK tracking of the top 100k Android and iOS apps, so I wanted a lot of space, but for cheap. I got really lucky with getting a second-hand Framework laptop. The laptop was missing its screen, and was one of the older ones, so it was perfect for a home server. In addition I bought a "just a bunch of disks" (JBOD) enclosure. The Framework laptop is running ZFS + Garage S3. I've been away, I've been working, I've been busy, and I've definitely been using my S3. But I hadn't thought about the laptop in 4 months. When I finally logged in, I saw I'd used 10TB of space and it was patiently waiting for a restart for some upgrades. I nervously restarted, and was so relieved to see everything come right back up. I also saw a pending upgrade for Garage v1 to v2. This went along without a hitch too. Feels like it's been a good weekend. Just so you know, I understand my use case for ZFS is possibly a bit non-standard as I'm using USB to connect the laptop and the JBOD. This initially caused me issues with ZFS when Garage was heavily reading and writing (the initial setup had the SQLite metadata also stored on the JBOD/ZFS). I moved my metadata to the laptop, which has so far resolved the ZFS issues.
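For anyone curious what the metadata move looks like in a Garage config, here's a rough sketch with metadata on the laptop's internal drive and object data on the ZFS pool; the paths are made up and the keys are from the Garage docs as I recall them, so check the current reference before copying.

    # /etc/garage.toml (illustrative)
    metadata_dir = "/var/lib/garage/meta"   # SQLite metadata on the laptop's internal SSD
    data_dir     = "/tank/garage/data"      # object data on the ZFS pool backed by the USB JBOD
    db_engine    = "sqlite"

    replication_factor = 1                  # v2 setting; v1 used replication_mode

    rpc_bind_addr = "[::]:3901"
    rpc_secret    = "<generate with: openssl rand -hex 32>"

    [s3_api]
    s3_region     = "garage"
    api_bind_addr = "[::]:3900"
    root_domain   = ".s3.example.internal"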

0 views
Brain Baking 1 week ago

I'm Sorry RSS Subscribers, Ooh I Am For Real

Never meant to make your reader cry, I apologize a trillion times. My baby a drama Hugo don’t like me; she be doin’ things like duplicatin’ them RSS entries. Come from her release page to my server tryna fight me, bringing her breaking changes along; messing up quite wrong. That’s as far as I can take that Outkast song. I upgraded to Hugo and thought I fixed all the breaking changes but I was wrong and noticed the biggest one too late. Some dangling atom file once used to generate Gemini news feeds suddenly went back online causing all items to be duplicated. Aren’t new releases with shady notes documenting the changes great. Apologies dear RSS readers. The issue has been fixed for a few days now but it seems that most readers like to cache results so if you’re still seeing double please press that Purge Cache Now button. And thanks for reading Brain Baking ! By Wouter Groeneveld on 4 October 2025.  Reply via email .

0 views

Am I solving the problem or just coping with it?

Solving problems and putting in processes that eliminate them is a core part of the job of a manager. Knowing ahead of time whether or not your solution is going to work can be tricky, and time pressures and the urgency that pervades startups can make quick solutions seem really attractive. However, those are often a trap that papers over the real issue enough to look like it's solved, only to have it come roaring back worse later.

If you've been around the world of incident response and the subsequent activity of retrospectives/post-mortems, you've probably heard of the " 5 whys " method of root-cause analysis. In the world of complex systems, attributing failures to a single root cause has fallen out of favour and 5 whys has gone with it. Nevertheless, there is value in digging deeper than what initially presents itself as the challenge to uncover deeper, more systemic issues that might be contributing factors. Rather than "why," I like to ask myself if I'm really solving this dysfunction or just coping with it.

There's a lot of temptation to cope. Coping strategies have some features that make them attractive. Imagine your problem is that your team is "always behind schedule." That's a problem I'm sure most people are familiar with. There are plenty of easy-to-grab band-aid strategies for that pattern that are very attractive:

They can be contained within your team; you don't need to influence people outside your sphere. (We'll just work through the weekend and get the project back on track.)
They're highly visible. (Look at Jason's team putting in the extra effort! What a good manager!)
They can move vanity metrics in the short term, which is often something rewarded. (We increased our velocity and did 20% more story points!)

All of these can make you feel like you're solving the problem but:

They come with a cost. (The team will burn out and quit if they have to work through too many weekends.)
If you stop doing them, the situation comes right back. (The next project wasn't different and we're back behind schedule again.)
They only address the symptoms, not the causes. (Improving your culture around estimations and deadlines would be a better fix.)

This dichotomy is probably more familiar and easy to recognize in the technical realm. If your code has a memory leak you can either proactively restart it every now and then (coping) or you can find the leak and fix it (actual solution). The reasons you should prefer the actual solution are the same in both scenarios. The 'strategy' of restarting the service will work for a while, but you know in the back of your mind that eventually it's going to manage to crash itself between restarts. Same goes for your people processes.

At one of my jobs we had a fairly chaotic release process. This was a while ago and "Continuous Delivery" was still a fairly newfangled idea that wasn't widely adopted, so we did what most companies did and released on a schedule. It was a SaaS product, so releasing was really just pushing a batch of changes to production. The changes had ostensibly been tested (we still had human QA people then) and everything looked good, but nevertheless nearly every release had some catastrophic issues that needed immediate hotfixes. We didn't have set working hours, except on release day. We expected all of the developers to be in the office when the code went out in case they were needed for fixes. We released around 9am and firefighting often continued throughout the morning and into the afternoon. When that happened, the office manager would usually order some pizzas so people could have something to eat while fixing prod. Eventually this happened so often that it just became routine. Release day meant lunch was proactively ordered in for the dev team, then chaos ensued and was fixed. Of course, we did eventually tackle the real pain points by increasing the frequency of releases (when something hurts, do it more, it'll give you incentives to fix the real problems), adding more automated testing, and generally just getting better at it all. The lunches continued well past the point where the majority of the people on the team remembered or ever knew why they were there.
Some other practical questions you can ask yourself to try to recognize stopgap strategies that aren’t addressing the real problems. As mentioned before, anything that stops working when you stop doing it is likely not a good solution. If you’ve got a system where someone has to push the “don’t crash production” button every day or else production crashes, eventually someone’s going to fail to push the button and production is going to crash. The button is not a solution; it’s a coping strategy. If you’re having a lot of off-hours incidents and your incident response team is complaining about the burden, you could train more people in incident response to reduce their burden (and maybe you should), but that’s not solving the problem, which is really that you’re having too many incidents. A true solution would be understanding why and addressing that. Code not sufficiently tested? Production infra not sized appropriately? Who knows, but the answer is not simply throwing more bodies at it. That will reduce the acute problem of burnout in your responders (symptom) but not the chronic issues that are causing the incidents in the first place. If you’re not changing the underlying conditions you’re probably not fixing the problem. If all your meetings are running over time you could appoint someone to act as the timing police and give warnings and scold people who talk too long, but that’s just adding work for someone and not touching the real reasons, which could be unclear agendas, poor facilitation, or the fact that the VP can’t manage to stay on topic for any amount of time. The solution is to fix your meeting culture. Require agendas, limit the number of topics, train people on facilitation, give some candid feedback to the VP. These solutions actually remove the failure mode that causes meetings to run long. The timekeeper “strategy” doesn’t do anything about that. This is similar to the first question, but it’s something you can think of like a metric, even though it’s probably more a heuristic than something directly measurable. Treating your releases like pre-scheduled incidents like we did would be a good warning sign that you’re coping, but if each one is less dramatic than the last and wraps up sooner, those are signs that you’re on the right track. Getting to the point where releases are unremarkable and you don’t need the whole team on standby would be a good indicator that you’ve got solid solutions in place. When production is down, anything you can do to get it back up again is the right move. If you’re bleeding and you have to make a tourniquet with your own shirt, you do it. There are lots of scenarios where a short-term fix is the best thing for the situation. But keep in mind, this is effectively debt . Like technical debt, it should be repaid — ideally sooner rather than later. A good incident process will recover production by any means necessary. Once, during an incident related to a global DNS provider’s outage, we were literally uploading /etc/hosts files to our infrastructure to keep the part of our product that relied on some 3rd parties working. Needless to say, once the DNS incident was resolved, we went back and cleaned those up. You can do the same with processes. When there’s immediate pain that can be relieved with a quick patch-up, you should do it, and use the fact that the pain has abated to fix the problem permanently. You also can’t fix them all. 
You might not have enough influence or political capital to make the kinds of changes that real solutions require. In that case, your job is to advocate for the right things to happen, point out the ways that coping is hurting the organization, and exert influence over the parts that are in your control. I've tried to give you some strategies for recognizing when you're just coping with a situation rather than fixing it. The differences can be subtle, but once you start to spot them it gets easier and you'll find things start to get better in more permanent, sustainable ways.

"Temporary Fix" by reader of the pack is licensed under CC BY-ND 2.0.

0 views
André Arko 2 weeks ago

jj part 1: what is it

I've been working on a blog post about migrating to jj for two months now. Rather than finish my ultimate opus and smother all of you in eight thousand words, I finally realized I could ship incrementally and post as I finish each section. Here's part 1: what is jj and how do I start using it?

Sure, you can do that. Convert an existing git repo or clone a fresh one. Work in the repo like usual, but with no git add needed; changes are staged automatically. Commit, mark what you want to push, and then push it. If you make any additional changes to that branch, update the branch tip again before each push. Get changes from the remote with a fetch. Set up a local copy of a remote branch, check it out, and then loop back up to the start of the previous paragraph for commit and push. That's probably all you need to get started, so good luck and have fun!

Still here? Cool, let's talk about how jj is different from git. There's a list of differences from git in the jj docs, but more than specific differences, I found it helpful to think of jj as like git, but every change in the repo creates a commit. Edit a file? There's a commit before the edit and after the edit. Run a jj command? There's a commit before the command and after the command. Some really interesting effects fall out of storing every action as a commit, like no more staging, trivial undo, committed conflicts, and change IDs.

When edits are always immediately committed, you don't need a staging area, or to manually move files into the staging area. It's just a commit, and you can edit it by editing the files on disk directly. Any jj command you run can be fully rewound, because any command creates a new operation commit in the op log. No matter how many commits you just revised in that rebase, you can perfectly restore their previous state with a single command. Any merge conflict is stored in the commit itself. A rebase conflict doesn't stop the rebase—your rebase is already done, and now has some commits with conflicts inside them. Conflicts are simply commits with conflict markers, and you can fix them whenever you want. You can even rebase a branch full of conflicts without resolving them! They're just commits. (Albeit with conflict markers inside them.)

Ironically, every action being a commit also leads away from commits: how do you talk about a commit both before and after you amended it? You add change IDs. Changes give you a single identifier for your intention, even as you need many commits to track how you amended, rebased, and then merged those changes. Once you've internalized a model where every state is a commit, and change IDs stick around through amending commits, you can do some wild shenanigans that used to be quite hard with git. Five separate PRs open but you want to work with all of them at once? Easy. Have one commit that needs to be split into five different new commits across five branches? Also easy.

One other genius concept jj offers is revsets. In essence, revsets are a query language for selecting changes, based on name, message, metadata, parents, children, or several other options. Being able to select lists of changes easily is a huge improvement, especially for commands like log or rebase. For more about jj's design, concepts, and why they are interesting, check out the blog posts jj strategy , What I've Learned From JJ , jj init , and jj is great for the wrong reason .
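The exact commands didn't survive into this excerpt, so here's a hedged sketch of the basic workflow as I understand it; the bookmark and remote names are placeholders, and flags vary a little between jj versions.

    # Get a repo
    jj git clone https://example.com/some/repo.git
    # or, inside an existing git repo:
    jj git init --colocate

    # Hack away; the working copy is already a commit, there is no staging area
    jj commit -m "describe the change"

    # Point a bookmark (jj's branch) at what you want to push, then push it
    jj bookmark create my-feature -r @-
    jj git push --bookmark my-feature

    # After more commits, move the bookmark forward and push again
    jj bookmark set my-feature -r @-
    jj git push --bookmark my-feature

    # Sync with the remote and work on an existing branch
    jj git fetch
    jj bookmark track main@origin
    jj new main

    # Roll back the last operation (rebase, abandon, whatever) if it went wrong
    jj undo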
For a quick reference you can refer to later, there’s a single page summary in the jj cheat sheet PDF . Keep an eye out for the next part of this series in the next few days. We’ll talk about commands in jj, and exactly how they are both different and better than git commands.

2 views
neilzone 2 weeks ago

I am getting better at turning off unneeded self-hosted services

One of the things at which I've become better over the years is turning off self-hosted services. I love self-hosting things, and trying out new things, but I am also mindful that each new thing carries its own set of risks, particularly in terms of security. Similarly, while I do my best to secure servers and services to a reasonable standard, and to isolate things in a sensible way on our network, there is still residual risk and, if I am not using something enough, there is no point me tolerating that risk. So, once a month, I take a look at what I am self-hosting, and decide (a) if I really still need each thing, and (b) if I do, if I am doing the thing in the most sensible (for me) way. Occasionally, I am on the fence, particularly if it is something that I use infrequently but do still use it. Those are prime candidates for considering if I can achieve the same thing but in a better way. This is particularly true of services where upgrading them is a bit of a pain - I automate as much as I can, but if I have to interfere with something manually, then it is a greater risk, and I am less likely to keep it around. Overall, though, I've become a bit more ruthless in taking things down. I have backups, and, if I take down something which I later decide that I want to use, I can always reinstall it and restore the backup. I also go through and remove the DNS entries, and any firewall configuration, and again I have backups of those. I get the benefit of trying new things, but in a more managed manner.

1 views
Martin Fowler 2 weeks ago

Anchoring AI to a reference application

Service templates are a typical building block in the "golden paths" organisations build for their engineering teams, to make it easy to do the right thing. The templates are supposed to be the role models for all the services in the organisation, always representing the most up-to-date coding patterns and standards. One of the challenges with service templates, though, is that once a team has instantiated a service with one, it's tedious to feed template updates back to those services. Birgitta Böckeler considers whether GenAI can help with that.

0 views
Gabriel Garrido 3 weeks ago

WireGuard topologies for self-hosting at home

I recently migrated my self-hosted services from a VPS (virtual private server) at a remote data center to a physical server at home. This change was motivated by wanting to be in control of the hardware and network where said services run, while trying to keep things as simple as possible. What follows is a walk-through of how I reasoned through different WireGuard topologies for the VPN (virtual private network) in which my devices and services reside.

Before starting, it's worth emphasizing that using WireGuard (or a VPN altogether) is not strictly required for self-hosting. WireGuard implies one more moving part in your system, the cost of which is justified only by what it affords you to do. The constraints that I outline below should provide clarity as to why using WireGuard is appropriate for my needs. It goes without saying that not everyone has the same needs, resources, and threat model, all of which a design should account for. That said, there isn't anything particularly special about what I'm doing. There is likely enough overlap here for this to be useful to individuals or small to medium-sized organizations looking to host their services. I hope that this review helps others build a better mental model of WireGuard, and the sorts of networks that you can build up to per practical considerations. Going through this exercise proved to be an excellent learning experience, and that is worthwhile on its own. This post assumes some familiarity with networking. This is a subject in which acronyms are frequently employed, so I've made sure to spell these out wherever introduced.

The constraints behind the design of my network can be categorized into first-order and second-order constraints. Deploying WireGuard responds to the first-order constraints, whereas the specifics of how WireGuard is deployed responds to the second-order constraints.

There should be no dependencies to services or hardware outside of the physical network. I should be able to connect to my self-hosted services while I'm at home as long as there's electricity in the house and the hardware involved is operating without problems.
Borrow elements of the Zero Trust Architecture where appropriate. Right now that means treating all of my services and devices as resources, securing all communications (i.e. not trusting the underlying network), and enforcing least-privileged access.
Provisions made to connect to a device from outside the home network should be secondary and optional. While I do wish to connect to my services while I'm away, satisfying this should not compromise the fundamental design of my setup. For example, I shouldn't rely on tunneling services provided by third-parties.

Choosing to deploy WireGuard is motivated by constraints two and three. Constraint one is not sufficient on its own to necessitate using WireGuard because everything can run on the local area network (LAN). Once deployed, I should be able to connect all of my devices using hardware, software, and keys that I control within the boundaries of my home office. These devices all exist in the same physical network, but may reside in separate virtual LANs (VLANs) or subnets. Regardless, WireGuard is used to establish secure communications within and across these boundaries, while working in tandem with network and device firewalls for access control. I cannot connect to my home network directly from the wide area network (WAN, e.g. the Internet) because it is behind Carrier-Grade Network Address Translation (CGNAT).
A remote host is added to the WireGuard network to establish connections from outside. This host runs on hardware that I do not control, which goes against the spirit of the first constraint. However, an allowance is made considering that the role of this peer is not load-bearing in the overarching design, and it can be removed from the network as needed. Assuming WireGuard is now inherent in this design, its use should adhere to the following constraints:

Use WireGuard natively as opposed to software that builds on top of WireGuard. I choose to favor simplicity and ease of understanding rather than convenience or added features, ergo, complexity.
Use of a control plane should not be required. All endpoints are first-class citizens and managed individually, regardless of using a network topology that confers routing responsibilities to a given peer.

Satisfying these constraints precludes the use of solutions such as Tailscale, Headscale, or Netbird. Using WireGuard natively has the added benefit that I can rely on a vetted and stable version as packaged by my Linux distribution of choice, Debian. Lastly, it is worth stating requirements or features that are often found in designs such as these, but that are not currently relevant to me:

Mesh networking and direct peer-to-peer connections. It's ok to have peers act as gateways if connections need to be established across different physical or logical networks.
The size, throughput, and bandwidth of the network is small enough that prioritizing performance is not strictly necessary.
Automatic discovery or key distribution. It's ok for nodes in the network to be added or reconfigured manually.

Let's look at the resources in the network, and how these connect with each other. Consider the following matrix. Each row denotes whether the resource in the first column connects to the resources in the remaining columns, either to consume a service or perform a task. For example, we can tell that the server does not connect to any device, but all devices connect to the server. Said specifically:

The desktop computer connects to the server to access a calendar, git repositories, etc.
The tablet connects to the server to download RSS feeds.
The laptop and desktop connect with each other to sync files.

The purpose of this matrix is to determine which connections between devices ought to be supported, regardless of the network topology. This informs how WireGuard peers are configured, and what sort of firewall rules need to be established. Before proceeding, let's define the networks and device IP addresses that will be used. A dedicated WireGuard network interface name will be used where applicable 1 . For purposes of this explanation, the same port will be used on all of the devices whenever a port needs to be defined.

I'll explore different topologies as I build up to the design that I currently employ. By starting with the simplest topology, we can appreciate the benefits and trade-offs involved in each step, while strengthening our conceptual model of WireGuard. Each topology below is accompanied by a simple diagram of the network. In it, the orange arrow denotes a device connecting to another device. Where two devices connect to each other, a bidirectional arrow is employed. Later on, green arrows denote a device forwarding packets to and from other resources.

The basic scenario, and perhaps the most familiar to someone looking to start using WireGuard to self-host at home, is hosting in the network that is established by the router provided by an Internet service provider (ISP). Let's assume its configuration has not been modified other than changing the Wi-Fi and admin passwords.
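The concrete addresses didn't survive into this excerpt, so the sketches that follow use an entirely made-up addressing plan; substitute your own subnets, interface name, and port.

    # Hypothetical addressing used in the sketches below
    # LAN (ISP router):    192.168.1.0/24
    # WireGuard network:   10.10.10.0/24 on interface wg0, UDP port 51820
    #
    #   server   192.168.1.10          ->  10.10.10.1
    #   desktop  192.168.1.11          ->  10.10.10.2
    #   laptop   192.168.1.12          ->  10.10.10.3
    #   phone    192.168.1.13          ->  10.10.10.4
    #   tablet   192.168.1.14          ->  10.10.10.5
    #   VPS      203.0.113.7 (public)  ->  10.10.10.6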
A topology that can be used here is point-to-point, where each device lists every other device it connects to as its peer. In WireGuard terminology, "peers" are endpoints configured to connect with each other to establish an encrypted "tunnel" through which packets are sent and received. According to the connections matrix, the desktop computer and the server are peers, but the desktop computer and tablet aren't. The desktop computer's WireGuard configuration lists each of its peers. Note that the LAN IP address of each peer is specified under Endpoint; this is used to find the peer's device and establish the WireGuard tunnel. AllowedIPs specifies the IP addresses used within the WireGuard network. In other words, the phone is the desktop's peer, and its LAN address is where the tunnel is established. Let's assume each of these devices has a firewall that allows UDP traffic through the WireGuard port and all subsequent traffic through the WireGuard interface. Once the WireGuard configurations of the server, laptop, and phone include the corresponding peers, secure communication is established through the WireGuard network interface.

Let's try sending a packet from the desktop computer to the phone. The packet is routed directly to the phone and echoed back. At this point access control is enforced in each device's firewall. Allowing everything that comes through the interface is convenient, but it should be limited to the relevant ports and protocols for least-privileged access.

An obvious problem with this scenario is that the Dynamic Host Configuration Protocol (DHCP) server in the router likely allocates IP addresses dynamically when devices connect to the LAN network. The IP address for a device may thus change over time, and WireGuard will be unable to find a peer to establish a connection to it. For example, I'm at home and my phone dies. The LAN IP address is freed and assigned to another device that comes online. WireGuard will attempt to connect to the stale address (per the configured Endpoint) and fail for any of the following reasons:

Said device is not running WireGuard.
Said device is using a different address or port, in which case the peer's Endpoint does not match.
Said device is using a different key, in which case the peer's public key does not match.

Fortunately, most routers support configuring static IP addresses for a given device in the network. Doing so for all devices in our WireGuard network fixes this problem, as the IP address used in Endpoint will be reserved accordingly.

Suppose I want to work at a coffee shop, but still need access to something that's hosted on my home server. As mentioned in the constraints, my home network is behind CGNAT. This means that I cannot connect directly to it using whatever WAN IP address my router is using at the moment. What I can do instead is use a device that has a publicly routable IP address and make that a WireGuard peer of our server. In this case that'll be a VPS in some data center.

How is the packet ultimately relayed to and from the server at home? Both the server and laptop establish direct encrypted tunnels with the VPS. WireGuard on the VPS will receive the encrypted packets from the laptop, decrypt them, and notice that they're meant for the server. It will then encrypt these packets with the server's key and send them through the server's tunnel. It'll do the same thing with the server's response, except towards the laptop using the laptop's tunnel. A device that forwards packets between peers needs to be configured for IPv4 packet forwarding. I will not cover the specifics of this configuration because it depends on what operating system and firewall are used 2 . The VPS has a public IP address, and it is assigned its own WireGuard IP address.
The laptop and server are listed as peers in its WireGuard configuration. Note that Endpoint is omitted for each peer: the publicly routable IP addresses of the laptop and the home router are not known to us. Even if they were, they cannot be reached by the VPS. However, they will be known to the VPS when these connect to it. Now, the server at home adds the VPS as its peer, using the VPS public IP address as its Endpoint. We also make use of PersistentKeepalive to send an empty packet every 25 seconds. This is done to establish the tunnel ahead of time, and to keep it open. This is necessary because otherwise the tunnel may not exist when I'm at the coffee shop trying to access the server at home. Remember, the VPS doesn't know how to reach the server unless the server is connected to it.

Let's take a careful look at the laptop's configuration, and what we're looking to achieve. When the laptop is at home, it connects to the server using an endpoint address that is routable within the home LAN network. This endpoint address is not routable when I'm outside, in which case I want the connection to go through the VPS. To achieve this, the laptop maintains two mutually exclusive WireGuard interfaces: one for home and one for the road. The former is active only while I'm in the home network, and the latter while I'm on the go. Unlike the server, the VPS does not need to be added as a peer to the laptop's home interface because the laptop doesn't need to connect to the VPS while at home. Instead, the VPS is added to the on-the-go configuration.

The Interface section of both configurations is mostly the same. The laptop should have the same address and key, regardless of where it is. Only ListenPort is omitted in the on-the-go configuration, because no other device will look to connect to it, in which case we can have WireGuard set a port dynamically. What differs is the peer configuration. In the on-the-go configuration the VPS is set as the only peer. However, the home server's IP address is added to the VPS's list of AllowedIPs. WireGuard uses this information to route any packets for the VPS or the server through the VPS. Unlike the server's peer configuration for the VPS, PersistentKeepalive is not needed because the laptop is always the one initiating the tunnel when it reaches out to the server. We can verify that packets are being routed appropriately to the server through the VPS.

We solved for outside connectivity using a network topology called hub-and-spoke. The laptop and home server are not connecting point-to-point. The VPS acts as a hub or gateway through which connections among members of the network (i.e. the spokes) are routed. If we scope down our network to just the laptop and the home server, we see how this hub is not only a peer of every spoke, but also just its only peer.

Yet, how exactly is the packet routed back to the laptop? Mind you, at home the laptop is a peer of the server. When the server responds to the laptop, it will attempt to route the response directly to the laptop's peer endpoint. This fails because the laptop is not actually reachable via that direct connection when I'm on the go. This makes the laptop a "roaming client"; it connects to the network from different locations, and its address may change. This all works because the hub has been configured to do Network Address Translation (NAT); it is replacing the source address of each packet with its own as it is being forwarded. The spokes at the end of each tunnel accept the packets because they appear to originate from their peer. In other words, when the laptop is reaching out to the home server, the server sees traffic coming from the VPS and returns it there.
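Pulling the pieces above together, here is a hedged sketch of what the three configurations might look like with the made-up addresses from earlier; key material is elided, and the original article's exact files may differ.

    # VPS hub: /etc/wireguard/wg0.conf
    [Interface]
    Address    = 10.10.10.6/24
    ListenPort = 51820
    PrivateKey = <vps private key>

    [Peer]   # home server; no Endpoint, it dials in from behind CGNAT
    PublicKey  = <server public key>
    AllowedIPs = 10.10.10.1/32

    [Peer]   # roaming laptop; no Endpoint either
    PublicKey  = <laptop public key>
    AllowedIPs = 10.10.10.3/32

    # Home server: peer entry for the VPS, tunnel kept open from inside CGNAT
    [Peer]
    PublicKey           = <vps public key>
    Endpoint            = 203.0.113.7:51820
    AllowedIPs          = 10.10.10.6/32
    PersistentKeepalive = 25

    # Laptop, on-the-go interface: reach the home server via the VPS
    [Interface]
    Address    = 10.10.10.3/24
    PrivateKey = <laptop private key>

    [Peer]
    PublicKey  = <vps public key>
    Endpoint   = 203.0.113.7:51820
    AllowedIPs = 10.10.10.6/32, 10.10.10.1/32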
The hub is forwarding packets among the spokes without regard to access control. Thus, its firewall should be configured for least-privilege access. For example, if the laptop is only accessing git repositories in the home server over SSH, then the hub firewall should only allow forwarding from the laptop's peer connection to the home server's IP address and SSH port.

Let's reiterate. If I now wish to sync my laptop with my desktop computer from outside the network, I would be adding yet another spoke to this hub. The desktop computer and the VPS configure each other as peers, while the desktop's IP address is included in the VPS peer's AllowedIPs in the laptop's configuration.

Our topology within the home network is still point-to-point. As soon as I return home, my laptop will connect directly to the server when I toggle the two interfaces off and on. But now that we know about hub-and-spoke, it might make sense to consider using it at home as well. According to the connection matrix, the server can assume the role of a hub because all other devices already connect to it. Likewise, the server runs 24/7, so it will always be online to route packets. This topology simplifies the WireGuard configurations for all of the spokes. The desktop computer, phone, laptop, and tablet can now list the server as their only peer. This is convenient because now only one static address in the LAN network needs to be allocated by the DHCP server – the server's.

Consider the changes to the WireGuard configuration of the desktop computer. All peers are removed except the server, and the IPs of the phone and laptop are added to the server peer's AllowedIPs. WireGuard will route packets for these other hosts through the server. We could also use Classless Inter-Domain Routing (CIDR) notation to state that packets for all hosts in the WireGuard network go through the server peer. The server, in turn, keeps listing every device at home as its peer but no longer needs an Endpoint for each. The peers will initiate the connection to the server. Once again, let's test sending a packet from the desktop computer to the phone. The packet hops once through the server, is received by the phone, and is echoed back.

The downside to this topology is that the server is now a single point of failure. If the server dies then the spokes won't be able to connect with each other through WireGuard. There's also an added cost to having every packet flow through the hub. As for access control, much like we saw in the VPS, the hub now concentrates firewalling responsibilities. It knows which peer is looking to connect to which peer, thus it should establish rules for which packets can be forwarded. This is not mutually exclusive with input firewall rules on each device; those should exist as well.

We've seen that the home hub will route packets between the spokes. Furthermore, because it is a peer of the VPS, the server can be used to route connections coming from outside the LAN network. Effectively, these are two hubs that connect to each other so that packets can flow across separate physical networks. If the laptop wants to sync with the desktop while it is outside the LAN network, then the packets make two hops: once through the VPS, and another through the server. If the laptop is within the LAN network, the packets hop only once through the server.

Yet, there's a subtle caveat to this design. The laptop can initiate a sync with the desktop from outside the LAN network and receive the response that it expects.
However, the desktop can only initiate a sync with the laptop while the latter is within the LAN network. Why? Similar to our previous example of the laptop communicating with the server, the laptop is configured as a peer of the home hub. When the desktop initiates a sync, the server will attempt to route the packet to the laptop. Per our last change, the laptop doesn't have a fixed Endpoint and there is no established tunnel because the laptop is outside the network. Additionally, the home hub is not configured to route packets destined for the laptop through the VPS peer. The packet is thus dropped by the hub. One could look into making the routing dynamic such that the packets are delivered through other means, perhaps through mesh networking. But herein lies a compromise that I've made. In this design, a spoke in the home hub cannot initiate connections to a roaming client. It can only receive connections from them, because the roaming client uses NAT through the remote hub. I'm perfectly fine with this compromise as I don't actually need this bidirectionality, and I don't want the additional complexity from solving this issue. The remote hub facilitates tunneling into the home hub, not out of it. My needs call for allowing my mobile devices (e.g. laptop, phone, tablet) to communicate with the non-mobile devices at home (e.g. server, desktop), and this has been solved.

At this point we're done insofar as the overarching topology of our WireGuard network is concerned, but there is an improvement that can be made to make our home hub less brittle. Consider the case where I'm using a router that can run WireGuard. Making the router the hub of our home network poses some benefits over the previous setup. First, the router is already a single point of failure by way of being responsible for the underlying LAN network. Making the router the hub isn't as costly as it is with some other device in the network. Second, all devices in the network are already connected to the router. This simplifies the overall configuration because it is no longer necessary to configure static IP addresses in the DHCP server. Instead, each spoke can use the network gateway address to reach the hub. Let's assume that the spokes can reach the router at the LAN gateway address, and that the router has its own WireGuard IP address. Each spoke replaces the server peer with the router's, and uses the gateway address as its Endpoint; the desktop computer's configuration changes accordingly. The server is demoted to a spoke and is configured like all other spokes. In turn, the router lists all peers like the server previously did. Again, the firewall in the router is now responsible for enforcing access control between spokes.

For the sake of illustrating how much further the underlying networks can evolve without interfering with the WireGuard network, consider the final design. I've broken apart the LAN network into separate VLANs to isolate network traffic. The server resides in its own VLAN, and client devices in another. The router keeps on forwarding packets in the WireGuard network regardless of where these devices are. The only change that is necessary to keep things working is to update the Endpoint address for the router peer in each spoke. The spoke now uses the corresponding VLAN gateway address, rather than that of the LAN network.

I've been using this setup for some months now and it's been working without issues. A couple of thoughts come to mind having gone through this exercise and written about it. Running WireGuard on the router simplifies things considerably.
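Before moving on from the router-as-hub design, here is roughly what a spoke's configuration might look like in that final shape; the addresses follow the made-up plan above, and the VLAN gateway address is an assumption.

    # Desktop spoke: /etc/wireguard/wg0.conf
    [Interface]
    Address    = 10.10.10.2/24
    PrivateKey = <desktop private key>

    [Peer]   # router acting as hub
    PublicKey  = <router public key>
    Endpoint   = 192.168.20.1:51820   # hypothetical VLAN gateway address
    AllowedIPs = 10.10.10.0/24        # route the whole WireGuard network via the hub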
I’ve been using this setup for some months now and it’s been working without issues. A couple of thoughts come to mind having gone through this exercise and written about it. Running WireGuard on the router simplifies things considerably. If the home network were not behind CGNAT, then I could do away with the VPS hub altogether. I would still need separate WireGuard interfaces for when I’m on the go, but that’s not a big deal. Nonetheless, within the LAN network, configuration is simpler using a hub-and-spoke topology with the router as the hub. Centralizing access control in the router’s firewall is also appreciated. WireGuard is simple to deploy and it just works. Still, some knowledge of networking is required to think through how to deploy WireGuard appropriately for a given context. Being comfortable with configuring interfaces and firewalls is also necessary to troubleshoot the inevitable connectivity issues. One can appreciate why solutions that abstract over WireGuard exist. I used Tailscale extensively before this and did not have to think through things as much as I did here. This was all solved for me. I just had to install the agent on each device, authorize it, and suddenly packets moved securely and efficiently across networks. And yet, WireGuard was there all along and I knew that I could unearth the abstraction. Now I appreciate its simplicity even more, and relish having a stronger understanding of what I previously took for granted. Lastly, I purposefully omitted other aspects of my WireGuard setup for self-hosting, particularly around DNS. That will be the subject of another article, rather similar to the one I wrote on using Tailscale with custom domains . Furthermore, a closer look at access control in this topology might be of interest to others, considering that there are multiple firewalls that come into play. Each interface has its own configuration file, and can be thought of as a “connection profile” in the context of a VPN. This profile is managed by in Linux, or through the WireGuard app for macOS, Android, etc.  ↩︎ In my case, that’s Debian and nftables . This article from Pro Custodibus explains how to configure for the hub.  ↩︎

0 views
ENOSUCHBLOG 3 weeks ago

Dear GitHub: no YAML anchors, please

TL;DR : for a very long time, GitHub Actions lacked support for YAML anchors. This was a good thing . YAML anchors in GitHub Actions are (1) redundant with existing functionality, (2) introduce a complication to the data model that makes CI/CD human and machine comprehension harder, and (3) are not even uniquely useful because GitHub has chosen not to support the one feature (merge keys) that lacks a semantic equivalent in GitHub Actions. For these reasons, YAML anchors are a step backwards that reinforces GitHub Actions’ status as an insecure by default CI/CD platform. GitHub should immediately remove support for YAML anchors, before adoption becomes widespread. GitHub recently announced that YAML anchors are now supported in GitHub Actions. That means that users can write things like this: On face value, this seems like a reasonable feature: the job and step abstractions in GitHub Actions lend themselves to duplication, and YAML anchors are one way to reduce that duplication. Unfortunately, YAML anchors are a terrible tool for this job. Furthermore (as we’ll see) GitHub’s implementation of YAML anchors is incomplete , precluding the actual small subset of use cases where YAML anchors are uniquely useful (but still not a good idea). We’ll see why below. Pictured: the author’s understanding of the GitHub Actions product roadmap. The simplest reason why YAML anchors are a bad idea is because they’re redundant with other more explicit mechanisms for reducing duplication in GitHub Actions. GitHub’s own example above could be rewritten without YAML anchors as: This version is significantly clearer, but has slightly different semantics: all jobs inherit the workflow-level . But this, in my opinion, is a good thing : the need to template environment variables across a subset of jobs suggests an architectural error in the workflow design. In other words: if you find yourself wanting to use YAML anchors to share “global” configuration between jobs or steps, you probably actually want separate workflows, or at least separate jobs with job-level blocks. In summary: YAML anchors further muddy the abstractions of workflows, jobs, and steps, by introducing a cross-cutting form of global state that doesn’t play by the rules of the rest of the system. This, to me, suggests that the current Actions team lacks a strong set of opinions about how GitHub Actions should be used, leading to a “kitchen sink” approach that serves all users equally poorly. As noted above: YAML anchors introduce a new form of non-locality into GitHub Actions. Furthermore, this form of non-locality is fully general : any YAML node can be anchored and referenced. This is a bad idea for humans and machines alike: For humans: a new form of non-locality makes it harder to preserve local understanding of what a workflow, job, or step does: a unit of work may now depend on any other unit of work in the same file, including one hundreds or thousands of lines away. This makes it harder to reason about the behavior of one’s GitHub Actions without context switching. It would only be fair to note that GitHub Actions already has some forms of non-locality: global contexts, scoping rules for blocks, dependencies, step and job outputs, and so on. These can be difficult to debug! But what sets them apart is their lack of generality : each has precise semantics and scoping rules, meaning that a user who understands those rules can comprehend what a unit of work does without referencing the source of an environment variable, output, &c. 
For machines: non-locality makes it significantly harder to write tools that analyze (or transform) GitHub Actions workflows. The pain here boils down to the fact that YAML anchors diverge from the one-to-one object model 1 that GitHub Actions otherwise maps onto. With anchors, that mapping becomes one-to-many: the same element may appear once in the source, but multiple times in the loaded object representation. In effect, this breaks a critical assumption that many tools make about YAML in GitHub Actions: that an entity in the deserialized object can be mapped back to a single concrete location in the source YAML. This is needed to present reasonable source locations in error messages, but it doesn’t hold if the object model doesn’t represent anchors and references explicitly. Furthermore, this is the reality for every YAML parser in wide use: all widespread YAML parsers choose (reasonably) to copy anchored values into each location where they’re referenced, meaning that the analyzing tool cannot “see” the original element for source location purposes. I feel these pains directly: I maintain zizmor as a static analysis tool for GitHub Actions, and it makes both of these assumptions. Moreover, ’s dependencies make these assumptions: (like most other YAML parsers) chooses to deserialize YAML anchors by copying the anchored value into each location where it’s referenced 2 . One of the few things that make YAML anchors uniquely useful is merge keys : a merge key allows a user to compose multiple referenced mappings together into a single mapping. An example from the YAML spec, which I think tidily demonstrates both their use case and how incredibly confusing merge keys are:
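(Something along these lines; the block below is a reconstruction in the spirit of the spec’s merge-key example, not a verbatim quote of it. The anchored flow mappings define reusable fragments, and `<<` merges them into later mappings.)

```yaml
- &CENTER { x: 1, y: 2 }
- &BIG { r: 10 }
- &SMALL { r: 1 }

# The following three maps are all equivalent:
- # Explicit keys
  x: 1
  y: 2
  r: 10
- # Merge one map, then add a key
  <<: *CENTER
  r: 10
- # Merge multiple maps; earlier entries in the list take precedence
  <<: [ *CENTER, *BIG, *SMALL ]
```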
I personally find this syntax incredibly hard to read, but at least it has a unique use case that could be useful in GitHub Actions: composing multiple sets of environment variables together with clear precedence rules is manifestly useful. Except: GitHub Actions doesn’t support merge keys ! They appear to be using their own internal YAML parser that already had some degree of support for anchors and references, but not for merge keys. To me, this takes the situation from a set of bad technical decisions (and lack of strong opinions around how GitHub Actions should be used) to farce : the one thing that makes YAML anchors uniquely useful in the context of GitHub Actions is the one thing that GitHub Actions doesn’t support. To summarize, I think YAML anchors in GitHub Actions are (1) redundant with existing functionality, (2) introduce a complication to the data model that makes CI/CD human and machine comprehension harder, and (3) are not even uniquely useful because GitHub has chosen not to support the one feature (merge keys) that lacks a semantic equivalent in GitHub Actions. Of these reasons, I think (2) is the most important: GitHub Actions security has been in the news a great deal recently , with the overwhelming consensus being that it’s too easy to introduce vulnerabilities in (or expose otherwise latent vulnerabilities through ) GitHub Actions workflows. For this reason, we need GitHub Actions to be easy to analyze for humans and machines alike. In effect, this means that GitHub should be decreasing the complexity of GitHub Actions, not increasing it. YAML anchors are a step in the wrong direction for all of the reasons aforementioned. Of course, I’m not without self-interest here: I maintain a static analysis tool for GitHub Actions, and supporting YAML anchors is going to be an absolute royal pain in my ass 3 . But it’s not just me: tools like actionlint , claws , and poutine are all likely to struggle with supporting YAML anchors, as they fundamentally alter each tool’s relationship to GitHub Actions’ assumed data model. As-is, this change blows a massive hole in the larger open source ecosystem’s ability to analyze GitHub Actions for correctness and security. All told: I strongly believe that GitHub should immediately remove support for YAML anchors in GitHub Actions. The “good” news is that they can probably do so with a bare minimum of user disruption, since support has only been public for a few days and adoption is (probably) still primarily at the single-use workflow layer and not the reusable action (or workflow) layer. That object model is essentially the JSON object model, where all elements appear as literal components of their source representation and take a small subset of possible types (string, number, boolean, array, object, null).  ↩ In other words: even though YAML itself is a superset of JSON, users don’t want YAML-isms to leak through to the object model. Everybody wants the JSON object model, and that means no “anchor” or “reference” elements anywhere in a deserialized structure.  ↩ To the point where I’m not clear it’s actually worth supporting anchors to any meaningful extent, and instead immediately flagging them as an attempt at obfuscation.  ↩

0 views
Jeff Geerling 3 weeks ago

You can finally manage Macs with FileVault remotely in Tahoe

It's nice to have a workstation set up that you can access anywhere via Screen Sharing to check on progress, jot down a note, etc.—and not be tied to a web app or some cloud service. I run a Wireguard VPN at home and at my studio, so I can remotely log into any of my infrastructure through the VPN—Macs included.

0 views
Marc Brooker 3 weeks ago

Seven Years of Firecracker

Time flies like an arrow. Fruit flies like a banana. Back at re:Invent 2018, we shared Firecracker with the world. Firecracker is open source software that makes it easy to create and manage small virtual machines. At the time, we talked about Firecracker as one of the key technologies behind AWS Lambda, including how it’d allowed us to make Lambda faster, more efficient, and more secure. A couple years later, we published Firecracker: Lightweight Virtualization for Serverless Applications (at NSDI’20). Here’s me talking through the paper back then: The paper went into more detail into how we’re using Firecracker in Lambda, how we think about the economics of multitenancy ( more about that here ), and how we chose virtualization over kernel-level isolation (containers) or language-level isolation for Lambda. Despite these challenges, virtualization provides many compelling benefits. From an isolation perspective, the most compelling benefit is that it moves the security-critical interface from the OS boundary to a boundary supported in hardware and comparatively simpler software. It removes the need to trade off between kernel features and security: the guest kernel can supply its full feature set with no change to the threat model. VMMs are much smaller than general-purpose OS kernels, exposing a small number of well-understood abstractions without compromising on software compatibility or requiring software to be modified. Firecracker has really taken off, in all three ways we hoped it would. First, we use it in many more places inside AWS, backing the infrastructure we offer to customers across multiple services. Second, folks use the open source version directly, building their own cool products and businesses on it. Third, it was the motivation for a wave of innovation in the VM space. In this post, I wanted to write a bit about two of the ways we’re using Firecracker at AWS that weren’t covered in the paper. Bedrock AgentCore Back in July, we announced the preview of Amazon Bedrock AgentCore . AgentCore is built to run AI agents. If you’re not steeped in the world of AI right now, you might be confused by the many definitions of the word agent . I like Simon Willison’s take : An LLM agent runs tools in a loop to achieve a goal. 1 Most production agents today are programs, mostly Python, which use a framework that makes it easy to interact with tools and the underlying AI model. My favorite one of those frameworks is Strands , which does a great job of combining traditional imperative code with prompt-driven model-based interactions. I build a lot of little agents with Strands, most being less than 30 lines of Python (check out the strands samples for some ideas ). So where does Firecracker come in? AgentCore Runtime is the compute component of AgentCore. It’s the place in the cloud that the agent code you’ve written runs. When we looked at the agent isolation problem, we realized that Lambda’s per-function model isn’t rich enough for agents. Specifically, because agents do lots of different kinds of work on behalf of different customers. So we built AgentCore runtime with session isolation . Each session with the agent is given its own MicroVM, and that MicroVM is terminated when the session is over. Over the course of a session (up to 8 hours), there can be multiple interactions with the user, and many tool and LLM calls. But, when it’s over the MicroVM is destroyed and all the session context is securely forgotten. This makes interactions between agent sessions explicit (e.g. 
via AgentCore Memory or stateful tools), with no interactions at the code level, making it easier to reason about security. Firecracker is great here, because agent sessions vary from milliseconds (single-turn, single-shot agent interactions with small models) to hours (multi-turn interactions, with thousands of tool calls and LLM interactions). Context varies from zero to gigabytes. The flexibility of Firecracker, including the ability to grow and shrink the CPU and memory use of VMs in place, was a key part of being able to build this economically. Aurora DSQL We announced Aurora DSQL, our serverless relational database with PostgreSQL compatibility, in December 2024. I’ve written about DSQL’s architecture before , but here I wanted to highlight the role of Firecracker. Each active SQL transaction in DSQL runs inside its own Query Processor (QP), including its own copy of PostgreSQL. These QPs are used multiple times (for the same DSQL database), but only handle one transaction at a time. I’ve written before about why this is interesting from a database perspective. Instead of repeating that, let’s dive down to the page level and take a look from the virtualization level. Let’s say I’m creating a new DSQL QP in a new Firecracker microVM for a new incoming connection to a database. One way I could do that is to start Firecracker, boot Linux, start PostgreSQL, start the management and observability agents, load all the metadata, and get going. That’s not going to take too long. A couple hundred milliseconds, probably. But we can do much better. With clones . Firecracker supports snapshot and restore , where it writes down all the VM memory, registers, and device state into a file, and then can create a new VM from that file. Cloning is the simple idea that once you have a snapshot, you can restore it as many times as you like. So we boot up, start the database, do some customization, and then take a snapshot. When we need a new QP for a given database, we restore the snapshot. That’s orders of magnitude faster. This significantly reduces creation time, saving the CPU used for all that booting and starting. Awesome. But it does something else too: it allows the cloned microVMs to share unchanged ( clean ) memory pages with each other, significantly reducing memory demand (with fine-grained control over what is shared). This is a big saving, because a lot of the memory used by Linux, PostgreSQL, and the other processes on the box isn’t modified again after start-up. VMs get their own copies of pages they write to (we’re not talking about sharing writable memory here), ensuring that memory is still strongly isolated between each MicroVM. Another knock-on effect is that the shared pages can also appear only once in some levels of the CPU cache hierarchy, further improving performance. There’s a bit more plumbing that’s needed to make some things like random numbers work correctly in the cloned VMs 2 .
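For a sense of the mechanics, the snapshot/restore flow against Firecracker’s API socket looks roughly like the sketch below; socket and file paths are placeholders, and exact request fields can differ between Firecracker versions.

```bash
API=/tmp/firecracker.sock

# Pause the running microVM before snapshotting it.
curl --unix-socket "$API" -X PATCH http://localhost/vm \
  -H 'Content-Type: application/json' \
  -d '{"state": "Paused"}'

# Write the VM's memory and device state out to files.
curl --unix-socket "$API" -X PUT http://localhost/snapshot/create \
  -H 'Content-Type: application/json' \
  -d '{"snapshot_type": "Full", "snapshot_path": "snap.state", "mem_file_path": "snap.mem"}'

# Later, a fresh Firecracker process can be "cloned" from those files
# instead of booting from scratch.
curl --unix-socket /tmp/firecracker-clone.sock -X PUT http://localhost/snapshot/load \
  -H 'Content-Type: application/json' \
  -d '{"snapshot_path": "snap.state", "mem_backend": {"backend_type": "File", "backend_path": "snap.mem"}, "resume_vm": true}'
```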
Last year, I wrote about our paper Resource management in Aurora Serverless . To understand these systems more deeply, let’s compare their approaches to one common challenge: Linux’s approach to memory management. At a high level, in stock Linux’s mind, an empty memory page is a wasted memory page. So it takes basically every opportunity it can to fill all the available physical memory up with caches, buffers, page caches, and whatever else it may think it’ll want later. This is a great general idea. But in DSQL and Aurora Serverless, where the marginal cost of a guest VM holding onto a page is non-zero, it’s the wrong one for the overall system. As we say in the Aurora serverless paper , Aurora Serverless fixes this with careful tracking of page access frequency: − Cold page identification: A kernel process called DARC [8] continuously monitors pages and identifies cold pages. It marks cold file-based pages as free and swaps out cold anonymous pages. This works well, but is heavier than what we needed for DSQL. In DSQL, we take a much simpler approach: we terminate VMs after a fixed period of time. This naturally cleans up all that built-up cruft without the need for extra accounting. DSQL can do this because connection handling, caching, and concurrency control are handled outside the QP VM. In a lot of ways this is similar to the approach we took with MVCC garbage collection in DSQL. Instead of PostgreSQL’s , which needs to carefully keep track of references to old versions from the set of running transactions, we bound the set of running transactions with a simple rule (no transaction can run longer than 5 minutes). This allows DSQL to simply discard versions older than that deadline, safe in the knowledge that they are no longer referenced. Simplicity, as always, is a system property.

0 views
codedge 4 weeks ago

Modern messaging: Running your own XMPP server

For years we have known, or at least suspected, that our chats are listened in on, our uploaded files are sold for advertising or whatever other purpose, and the chance that our social messengers leak our private data is incredibly high. It is about time to work against this. For three years now the European Commission has been working on a plan to automatically monitor all chat, email and messenger conversations. 1 2 If this passes, and I strongly hope it will not, the European Union is moving in a direction we know from states suppressing freedom of speech. I went for setting up my own XMPP server, as it does not have any big resource requirements and still supports clustering (for high-availability purposes), encryption via OMEMO, and file sharing, and works across platforms and operating systems. Also, the ecosystem of clients and use cases has evolved over the years to provide rock-solid software and solutions for multi-user chats or even audio and video calls. All steps and settings are bundled in a repository containing Ansible roles: https://codeberg.org/codedge/chat All code snippets written below work on either Debian or Raspberry Pi OS. The connection from your client to the XMPP server is encrypted, and we need certificates for our server. The first thing to do is to set up our domains and point them to the IP – both IPv4 and IPv6 are supported and we can specify both later in our configuration. I assume the server is going to be run under and that all the following domains have been set up. Fill in the IPv6 addresses accordingly. ejabberd is robust server software that is included in most Linux distributions. Install from the ProcessOne repository: I discovered that ProcessOne, the company behind ejabberd , also provides a Debian repository . Install from GitHub: to get the most recent version, I use the packages offered in their code repository . To install version 25.07, just download the asset from the release: Make sure the following ports are opened in your firewall, taken from the ejabberd firewall settings : 5222 : Jabber/XMPP client connections, plain or STARTTLS; 5223 : Jabber client connections, using the old SSL method; 5269 : Jabber/XMPP incoming server connections; 5280/5443 : HTTP/HTTPS for Web Admin and many more; 7777 : SOCKS5 file transfer proxy; 3478/5349 : STUN+TURN/STUNS+TURNS service. Port , used for MQTT, is also mentioned in the ejabberd docs, but we do not use it in our setup, so this port stays closed. Depending on how you installed ejabberd, the config file is either at or . The configuration is a balance of roughly 70:30 between having a privacy-focused setup for your users and meeting most of the suggestions of the XMPP compliance test . That means settings that protect the privacy of users are rated higher, even if that means not passing parts of the test. Notable privacy and security settings are therefore: XMPP over HTTP is disabled ( mod_bosh ); discovering when a user last accessed the server is disabled ( mod_last ); uploaded files are deleted on a regular basis (see the upload config ); registering an account via a web page is disabled ( mod_register_web ); in-band registration can be enabled, default off and captcha-secured ( mod_register , see the registration config ). The configuration file is in YAML format. Keep an eye on indentation. Let’s start digging into the configuration. Set the domain of your server. Set the database type: instead of using the default type, we opt for , better said . Generate DH params: generate a fresh set of parameters for the DH key exchange. In your terminal, run and link the new file in the ejabberd configuration. Ensure TLS for server-to-server connections: use TLS for server-to-server (s2s) connections. The listeners, aka inside the config, especially for , , and , are important. All of them listen on port . Only one request handler is attached to port , the . For administration of ejabberd we need a user with admin rights and properly set up ACLs and access rules. There is a separate section for ACLs inside the config in which we set up an admin user name . The name of the user is important for later, when we actually create this user. The should already be set up; just confirm that you have a correct entry for the action.
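A rough sketch of how these pieces might fit together in ejabberd.yml (the domain, the admin JID, and the subset of listeners shown are placeholders for illustration):

```yaml
hosts:
  - "example.org"

acl:
  admin:
    user:
      - "admin@example.org"   # must match the account created below

access_rules:
  configure:
    allow: admin

listen:
  - port: 5222
    module: ejabberd_c2s
    starttls_required: true
  - port: 5269
    module: ejabberd_s2s_in
```

The account referenced under the admin ACL is the one created on the console in the next step.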
Now the new user needs to be created by running this command on the console. Make sure to put in the correct domain. Another user can be registered with the same command. We set as the admin user in the config previously. That is how ejabberd knows which user has admin permissions. Enabling file uploads is done with . First, create a folder where the uploads should be stored. Now update the ejabberd configuration like this: The allowed file upload size is defined in the param and is set to 10MB. Make sure to delete uploaded files after a reasonable amount of time via a cronjob. This is an example of a cronjob that deletes files older than one week. Registration in ejabberd is done via and can be enabled with these entries in the config file: If you want to enable registration for your server, make sure you enable a captcha for it. Otherwise you will get a lot of spam and fake registrations. ejabberd provides a working captcha script that you can copy to your server and link in your configuration. You will need and installed on your system. In the config file, ejabberd can provision TLS certificates on its own. No need to install certbot . To not expose ejabberd directly to the internet, is put in front of the XMPP server. Instead of using nginx , any other web server (Caddy, …) or proxy can be used as well. Here is a sample config for nginx : The nginx vhost offers files, and , for indicating which other connection methods (BOSH, WS) your server offers. The details can be read in the XEP-0156 extension. Unlike the examples in the XEP, our server offers no BOSH, only a WebSocket connection. The BOSH part is removed from the config file. host-meta.json: put that file in a folder your nginx serves. Have a look at the path and URL where it is expected to be, see . Clients I can recommend are Profanity , an easy-to-use command-line client, and Monal for macOS and iOS. A good overview of clients can be found on the official XMPP website . Citizen-led initiative collecting information about Chat Control https://fightchatcontrol.eu   ↩︎ Explanation by Patrick Breyer, former member of the European Parliament https://www.patrick-breyer.de/en/posts/chat-control/   ↩︎

0 views

Enabling Silent Telemetry Data Transmission with InvisiFlow


0 views
マリウス 1 month ago

Njalla Has Silently Changed: A Word of Caution

I’ve been using Njalla as my primary domain service for the past few years, and I’ve had nothing but good things to say about them. Their website is simple yet functional, their support is quick and efficient, and the company offers its services in a way that should be the global standard when it comes to data and privacy protection. Njalla made sense for me on many different levels: They’re a domain provider headquartered on a former pirate island nation in the Caribbean, which is home to countless offshore trust funds, that registers your domain in their name, so none of your personal information appears in the mandatory ICANN registration data. All of this is offered without any KYC requirements, and with the option to pay using Monero. And if that’s not enough, Njalla sends every email encrypted with your GPG public key and can even forego email entirely in favor of XMPP notifications with OMEMO encryption. Yes, Njalla also provides an API (with access tokens that can be configured with granular permissions) which works seamlessly with tools like Certbot , Lego , and others to request Let’s Encrypt certificates via DNS validation. Heck, there are even Terraform providers that support it. And if that still weren’t enough reasons to like Njalla , the Njalla blog offered unrivaled transparency and entertainment for everyone, giving people the chance to see with their own eyes how Njalla was fighting for the little guys . On top of that, I’ve always sympathized with brokep , Njalla ’s founder, and his work and many of his views. If you’re unfamiliar with him or his history, I recommend the (relatively new) series The Pirate Bay , which premiered on Sveriges Television at the end of 2024. Over the past few years, I’ve been quite vocal in my praise for Njalla . In fact, if you’re a regular reader of this site or have come across me on other platforms, you’ve probably seen me plug Njalla the same way Jensen Huang plugs AI . However, a recent interaction in the VT100 community channel prompted me to do what I periodically do with every service I use: Check what’s new. This time, it was Njalla ’s turn. While browsing through various pages on their website, I came across the About page and was surprised to find the following statement: Njalla is run by njalla.srl based in Costa Rica. Curious, I checked their terms of service and confirmed that Njalla does indeed appear to have relocated its operations from Nevis to Costa Rica. I searched through my email history to see if there had been any announcement about this change, but found nothing. Wanting to know when this happened, I checked Njalla ’s Mastodon and Bluesky profiles, but again found no mention of the move. I even went as far as looking at brokep ’s social profiles , only to find that they were either deleted or inactive. At that point, I started to get a bad feeling. Had Njalla been sold to someone else? Before jumping to conclusions, I decided to contact Njalla support to clarify the situation. Subject: 1337 Services LLC -> Njalla SRL? I just stumbled upon the fact that Njalla seemingly changed hands without any notice, and I would like to understand what happened to 1337 Services LLC on Nevis and who the new owner Njalla SRL is. I would appreciate further insights into this topic. Kind regards! The support replied promptly: Internal restructuring. Nothing to worry about. 
However, while it was a response , it wasn’t particularly satisfying, so I decided to be the PITA that I am somewhat known to be and ask again: Thank you for your reply and your re-assurance. I’d like to apoligize in advance for being a PITA, but with brokep seemingly having disabled his social media profiles (Bsky, Mastodon, X), discovering this change felt “off”. Also, as someone who has a relatively decent understanding of offshore jurisdictions and their governing laws, I am wondering about the motivation for this move. Costa Rica’s offshore landscape appears to have changed over the past years and their SRL/SA seemingly requires company books to reflect share ownership, which in turn list the owner’s name and ID numbers within the Central Bank of Costa Rica via the Registry of Transparency and Final Beneficiaries (RTBF). While UBO info is not publicly available unless explicitly listed as initial shareholders in the national registry, the information is still accessible and shareable by government entities and could make it easier for foreign entities to pressure the owners (and hence Njalla) into doing things it would otherwise not do. In addition, foreign court orders appear to be somewhat easier enforceable in Costa Rica’s jurisdiction, as opposed to Nevis, where foreign entities would in theory require dealing with local courts to obtain a local court order. While the trend towards transparency, information sharing and absurd KYC hasn’t passed Nevis/St. Kitts either (especially in terms of banking infrastructure) it appears that jurisdictions like Nevis or the Seychelles still seem to be “better” choices for operating a service like Njalla. I have been very vocal to recommend Njalla on different platforms, and I would like to update my recommendation based on this new reality. I would hence be curious to understand the rational behind the change, if you wouldn’t mind sharing a few insights. If preferrable you’re welcome to reach out via email to xxx (pubkey attached) or via XMPP/OMEMO to xxx. Thank you kindly and best regards! While I wasn’t expecting Njalla to offer in-depth strategic reasoning for this move, I nevertheless hoped for them to provide solid arguments as for why they believe Costa Rica is a good option for their operations and maybe even an explanation on why they haven’t notified their customers of the change. However, the reply that I got back was disappointing, to put it mildly: We do understand your concerns, but the reasoning or insights is not something we share. If you feel you can’t recommend our services any more, then of course you shouldn’t. That is totally up to you. Kind regards, Njalla It’s clear that Njalla is operating on a take it or leave it basis here. While they are entirely within their rights to do so, one could argue that using Njalla inherently requires a significant degree of trust. After all, they are the ones who legally own your domain . Given that, I think it’s fair to say that customers deserve at least some level of transparency in return. At least enough to feel confident that Njalla isn’t working against their interests. Note: There are many reasons why moving the company might make sense. While we can draw up as many conspiracy theories as we’d like, the most banal explanation might have to do with brokep being a Swedish citizen and supposedly still residing in the EU. 
Doing business within a place like Nevis, which the EU considers a tax haven and which is grey-listed as a non-cooperative tax jurisdiction every once in a while, can be a bit of a PITA . Not only is it unlikely for brokep to benefit from the low-/no-tax advantages that the jurisdiction offers, it is on the contrary quite possible that EU CFC rules are intentionally hurting him, especially with a digital (low-substance) business like Njalla , especially when dealing with cryptocurrency, especially with Monero, to discourage the average Joe Peter from doing business in jurisdictions that have historically been reserved for the bloc’s politicians and other elites. After all, the democratization of tax havens is certainly not something world leaders are in favor of. Costa Rica has managed to escape the bloc’s Annex I (in 2023) and Annex II (in 2024, approved 2025) and is not considered a low-tax country or tax haven. With CR joining the OECD’s CRS, it has certainly become easier for EU residents to do business in the Latin American country, despite its territorial tax regime. As boring as it sounds, it might just be that brokep got sick of dealing with the EU charade around offshore tax havens – btw, hey, EU, how are things in Luxembourg, Cyprus and Monaco going? – and chose a more viable solution. I dug up Njalla ’s Terms of Service on the Internet Archive and found that the change seems to have occurred sometime between October 2, 2024, and December 16, 2024. Whether or not it’s legally permissible for a company to change something as fundamental as its jurisdiction or corporate registration without informing existing customers, I found the lack of communication troubling. What concerned me even more was that, after I pointed out the change, Njalla didn’t seem interested in offering any further explanation. On the contrary, their responses came across as an attempt to quietly brush the matter aside and move on. While the service continues to function as it always has, and I haven’t encountered any issues, I’m honestly uncertain about how to interpret the situation. As I’ve mentioned before, I deeply admire the work that brokep is doing, and I’m a strong supporter of Njalla ’s mission. I’ve been recommending their service for years, and I likely will continue to do so, although with reservations. That said, this situation has somewhat tarnished my perception of Njalla . Not only has their blog become less insightful over the years, but it also appears that they are actively concealing information from those who trust them: their customers. With Njalla ’s lack of transparency and unsatisfactory responses, I’m uncertain about what to make of the situation. I’d assume that if you have a normal domain with Njalla , there’s probably little to worry about, provided the company hasn’t been sold to a new owner. The service seems to be operating as usual, and I haven’t heard of any malicious intent regarding domain ownership. That said, if you’re considering registering a domain to poke fun at a logistics provider or other international entities that might take issue with it, I wouldn’t be so confident that Njalla will still have Batman handling the situation. As long as you don’t provide your PII and use untraceable payment methods, however, the worst-case scenario is that Njalla shuts down your domain and won’t return it to you. I continue to hold several domains with Njalla .
While I could migrate to another provider, I’m willing to wait, observe, and give Njalla the benefit of the doubt for now. That said, I will certainly be more cautious moving forward and think twice before registering any new domains with them. Frankly, there aren’t many trustworthy and reliable alternatives, especially ones backed by prominent figures with (for the most part) agreeable values. If you’re seeking services based in offshore jurisdictions, there’s a non-exhaustive list in the domains section of the infrastructure page. It’s important to note that when you allow someone else to register a domain on your behalf, you’re effectively entrusting them with ownership of the domain , meaning they could ultimately do whatever they wish with it. Therefore, trustworthiness is a critical factor when evaluating these services. Footnote: The artwork was generated using AI and further botched by me using the greatest image manipulation program .

1 views
Karboosx 1 month ago

Continuous Delivery - The easy way

Skip the complex setups. Here's how to build a simple CD pipeline for your website using nothing but a GitHub webhook and a bash script.
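As a hedged sketch of what such a pipeline can amount to (paths, branch, and the build/reload commands below are placeholders, not taken from the post), the webhook receiver just runs a script like this on every push:

```bash
#!/usr/bin/env bash
# deploy.sh: called by the webhook receiver whenever GitHub reports a push.
set -euo pipefail

REPO_DIR=/var/www/mysite      # working copy served (or built) on the server
BRANCH=main

cd "$REPO_DIR"
git fetch origin "$BRANCH"
git reset --hard "origin/$BRANCH"   # fast, idempotent update to the pushed commit

# Rebuild and reload whatever serves the site; swap in your own build step.
make build
sudo systemctl reload nginx
```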

0 views
ENOSUCHBLOG 1 month ago

One year of zizmor

This is a dual purpose post: I’ve released zizmor v1.13.0 , and zizmor has turned one year old 1 ! This release isn’t the biggest one ever, but it does include some nice changes (many of which were contributed or suggested by zizmor’s increasingly large community). Highlights include: A new (pedantic-level) undocumented-permissions audit rule, which flags explicit permission grants (like ) that lack an explanatory comment. Many thanks to @johnbillion for contributing this! (Long requested) support for disabling individual audit rules entirely, via in the configuration. Support for auto-fixing many of the obfuscation audit’s findings, including a brand new constant expression evaluator for github-actions-expressions that can unroll many common cases of obfuscated GitHub Actions expressions. Many thanks to @mostafa for contributing this, along with so much else of zizmor’s auto-fix functionality over the last few releases! Support for “grouped” configurations, which resolve a long-standing architectural limitation in how zizmor loads and applies a user’s configuration. The TL;DR of this change is that versions of zizmor prior to 1.13.0 could only ever load a single configuration file per invocation, which meant that invocations like wouldn’t honor any configuration files in or . This has been changed so that each input group (i.e. each input argument to ) loads its own isolated configuration. In practice, this should have no effect on typical users, since the average user likely only runs or . But it’s something to be aware of (and will likely benefit you) if you’re a bulk user! These changes build on a lot of other recent (v1.10.0 and later) improvements, which could easily be the subject of a much longer post. But I’d like to focus the rest of this post on some numbers for and reflections on the past year of zizmor’s growth. As of today (September 12, 2025), zizmor has: 26 unique audit rules , encompassing a wide range of reactive and proactive security checks for GitHub Actions workflows and action definitions. This is up from 10 rules in the first tagged release (v0.1.0), which predates the changelog. It also undercounts the growth of zizmor’s complexity and feature set, since many of the rules have been significantly enhanced over time. Just over 3000 stars on GitHub, growing from close to zero at the start of the year. Most of that growth was in the shape of a “hockey stick” during the first few months, but growth has been steady overall: About 3.2 million downloads from PyPI alone , just barely squeaking us into the top 10,000 most-downloaded packages on PyPI 2 . Potentially more interesting is the growth in PyPI downloads: around 1 in 5 of those downloads (633K precisely) were in the last month alone . Notably, PyPI is just one of several distribution channels for zizmor: the official Docker image has another ~140K downloads, the Homebrew formula has another ~50K, conda-forge has another ~90K, and so forth. And these are only the ones I can easily track! So, so many cool downstream users: cURL, CPython, PyPI itself, Rust, Rustls, Sigstore, Anubis, and many others are all running zizmor in their CI/CD to proactively catch potential security issues. In addition to these projects (who humble me with their use), I’ve also been thrilled to see entire communities and companies adopt and/or recommend zizmor: pyOpenSci , Grafana Labs , and Wiz have all publicly recommended zizmor based on their own experience or expertise. 
Roughly 50 contributors , excluding myself and bots, along with several dozen “regulars” who file bug reports and feature requests. This number is easily the most satisfying of the above: it’s a far cry from the project’s start, when I wasn’t sure if anyone would ever use it, let alone contribute to it. Some thoughts and observations from the past year. People like to use zizmor, even though it’s a security tool! Crazy! Most people, including myself , hate security tools : they are, as a class, frustrating to install and configure, obtuse to run, and are arrogantly incorrect in their assumption of user tolerance for signal over noise. My own negative experiences with security tooling made me hesitant to build zizmor in the open: I was worried that (1) no one would use it, or (2) lots of people would use it in anger , and hate it (and me) for it. To my surprise, this didn’t happen! The overwhelming response to zizmor has been positive: I’ve had a lot of people thank me for building it, and specifically for making it pleasant to use. I’ve tried to reflect on what exactly I did to succeed in not eliciting a “hate” reaction to zizmor, and the following things come to mind: It’s very easy to install and run: distributing it via PyPI (even though it’s a pure Rust binary) means that it’s a single or away for most users. It’s very fast by default: offline runs take tens of milliseconds for representative inputs; online runs are slower, but still fast enough to outpace typical CI/CD setups. In other words: people very rarely wait for zizmor to complete, or in the worst case wait no longer than they’re already waiting for the rest of their CI/CD. It strikes the right balance with its persona design: the default (“ regular ”) persona prioritizes signal over noise, while letting users opt into more noisy personae (like pedantic and auditor ) as they please. One outcome from this choice which has been (pleasantly) surprising is seeing users opt into a more sensitive persona than the default, including even enforcing zizmor at the pedantic or auditor level in their CI/CD. I didn’t expect this (I expected most users to ignore non-default personae), and it suggests that zizmor’s other usability-first design decisions afford us a “pain budget” that users are willing to spend on more sensitive checks. On a very basic level, I don’t really want zizmor to exist: what I really want is a CI/CD system that’s secure by construction , and that doesn’t require static analysis to be bolted onto it for a degree of safety. By analogy: I want the Rust of CI/CD, not C or C++ with a layer of ASan slapped on top. GitHub Actions could have been that system (and arguably still could be), but there appears to be relatively little internal political 3 appetite within GitHub for making that happen 4 . In this framing, I’ve come to see zizmor as an external forcing function on GitHub: a low-friction, low-complexity tool that reveals GitHub Actions’ security flaws is also a tool that gives the hardworking engineers inside of GitHub the kind of political ammunition they need to convince their leadership to prioritize the product itself. I’m not conceited enough to think that zizmor (or I) alone am the only such external forcing function: there are many others, including entire companies seeking to capitalize on GitHub Actions’ flaws. 
However, I do think the last year of zizmor’s development has seen GitHub place more public emphasis on security improvements to GitHub Actions 5 , and I would like to think that zizmor has played at least a small part in that. Six months ago, I would have probably said that zizmor is mostly done : I was happy with its core design and set of audits, and I was having trouble personally imagining what else would be reasonable to add to it. But then, stuff kept happening! zizmor’s user base has continued to identify new things that zizmor should be doing, and have continued to make contributions towards those things. They’ve also identified new ways in which zizmor should operate: the auto-fix mode (v1.10.0) and LSP support (v1.11.0) are just two examples of this. This has made me less certain about what “done” will look like for zizmor: it’s clear to me that a lot of other people have (very good!) ideas about what zizmor can and should do to make the GitHub Actions ecosystem safer, and I’m looking forward to helping to realize those ideas. Some specific things I see on the horizon: More interprocedural analysis: zizmor is largely “intraprocedural” at the moment, in the sense that it analyzes individual inputs (workflows or action definitions) in isolation. This approach makes zizmor simple, but it also leaves us with a partial picture of e.g. a repository’s overall CI/CD posture. As a simple example: the unpinned-uses audit will correctly flag any unpinned action usages, but it won’t detect transitively unpinned usages that cross input boundaries. Better support for large-scale users: it’s increasingly clear to me that a significant (and increasing) portion of zizmor’s userbase is security teams within bigger open source projects (and companies), who want to use zizmor as part of “estate management” for dozens, hundreds, or even thousands of individual repositories. zizmor itself doesn’t struggle to operate in these settings, but large scales make it harder to triage and incrementally address zizmor’s findings. I’m not sure exactly what an improved UX for bulk triage will look like, but some kind of GitHub App integration seems worth exploring 6 . A bigger architectural reevaluation: zizmor’s current architecture is naïve in the sense that individual audits don’t share or re-use computation between each other. For example, two audits that both evaluate GitHub Actions expressions will independently re-parse those expressions rather than caching and reusing that work. This is not a performance issue at zizmor’s current audit count, but will likely eventually become one. When this happens, I’ll likely need to think about a larger architectural change that allows audits to either share computed analysis state or push more analysis state into zizmor’s input collection phase. Another (unrelated) architectural change that’ll likely eventually need to happen involves , which zizmor currently uses extensively: its deprecated status is a long-term maintenance risk. My hope is that alternatives (like saphyr , which I’ve been following) will become sufficiently mature and feature-rich in the medium-long term to enable fully replacing (and potentially even some of our use of in e.g. and .). It depends on how you count: zizmor’s first commit was roughly 13 months ago, while its first tagged release (v0.1.0) was roughly 11 months ago. So I’m splitting the two and saying it’s been one year.  ↩ Number #9,047 as of time of writing.  ↩ Read: product leadership.
I have a great deal of faith in and respect for GitHub’s engineers, who seem to be uniformly upset about the slow-but-accelerating degradation of GitHub’s product quality.  ↩ A more cynical framing would be that GitHub has entered the “value strip-mining” phase of its product lifecycle, where offerings like GitHub Actions are kept alive just enough to meet contractual obligations and serve as a staging bed for the next round of “innovation” (read: AI in more places). Fixing longstanding flaws in GitHub Actions’ design and security model would not benefit this phase, so it is not prioritized.  ↩ Off the top of my head: immutable actions/releases , fixing action policies , acknowledging how dangerous is, &c.  ↩ This avenue of exploration is not without risks: my experience has been that many security tools that choose to go the “app” integration route end up as Yet Another Infernal Dashboard that security teams dread actually using.  ↩

0 views