How I make CI/CD (much) faster and cheaper
Why GitHub Actions runners are slow and how bare metal servers can make your CI/CD 2-10x faster while costing 10x less
TL;DR: for a very long time, GitHub Actions lacked support for YAML anchors. This was a good thing. YAML anchors in GitHub Actions (1) are redundant with existing functionality, (2) introduce a complication to the data model that makes workflows harder for humans and machines to comprehend, and (3) are not even uniquely useful, because GitHub has chosen not to support the one feature (merge keys) that lacks a semantic equivalent in GitHub Actions. For these reasons, YAML anchors are a step backwards that reinforces GitHub Actions' status as an insecure-by-default CI/CD platform. GitHub should immediately remove support for YAML anchors, before adoption becomes widespread.

GitHub recently announced that YAML anchors are now supported in GitHub Actions. That means users can now anchor a block of configuration once and reference it from multiple jobs or steps. On its face, this seems like a reasonable feature: the job and step abstractions in GitHub Actions lend themselves to duplication, and YAML anchors are one way to reduce that duplication. Unfortunately, YAML anchors are a terrible tool for this job. Furthermore (as we'll see), GitHub's implementation of YAML anchors is incomplete, precluding the actual small subset of use cases where YAML anchors are uniquely useful (but still not a good idea). We'll see why below.

(Pictured: the author's understanding of the GitHub Actions product roadmap.)

The simplest reason why YAML anchors are a bad idea is that they're redundant with other, more explicit mechanisms for reducing duplication in GitHub Actions. GitHub's own example could be rewritten without YAML anchors by hoisting the shared environment variables into a workflow-level env block. That version is significantly clearer, but has slightly different semantics: all jobs inherit the workflow-level env. But this, in my opinion, is a good thing: the need to template environment variables across a subset of jobs suggests an architectural error in the workflow design. In other words: if you find yourself wanting to use YAML anchors to share "global" configuration between jobs or steps, you probably actually want separate workflows, or at least separate jobs with job-level env blocks.

In summary: YAML anchors further muddy the abstractions of workflows, jobs, and steps, by introducing a cross-cutting form of global state that doesn't play by the rules of the rest of the system. This, to me, suggests that the current Actions team lacks a strong set of opinions about how GitHub Actions should be used, leading to a "kitchen sink" approach that serves all users equally poorly.

As noted above: YAML anchors introduce a new form of non-locality into GitHub Actions. Furthermore, this form of non-locality is fully general: any YAML node can be anchored and referenced. This is a bad idea for humans and machines alike.

For humans: a new form of non-locality makes it harder to preserve local understanding of what a workflow, job, or step does: a unit of work may now depend on any other unit of work in the same file, including one hundreds or thousands of lines away. This makes it harder to reason about the behavior of one's GitHub Actions without context switching.

It would only be fair to note that GitHub Actions already has some forms of non-locality: global contexts, scoping rules for env blocks, job dependencies, step and job outputs, and so on. These can be difficult to debug! But what sets them apart is their lack of generality: each has precise semantics and scoping rules, meaning that a user who understands those rules can comprehend what a unit of work does without referencing the source of an environment variable, output, &c.
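As a tiny illustration of that kind of action-at-a-distance, here is a hypothetical workflow fragment (the job names and variables are invented, not GitHub's announcement example): an anchored env block defined in one job silently shapes another job that may live far away in the same file.

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    env: &shared-env          # anchor defined here...
      CACHE_DIR: /tmp/cache
    steps:
      - run: ./build.sh "$CACHE_DIR"

  # ...potentially hundreds of lines of unrelated jobs...

  release:
    runs-on: ubuntu-latest
    env: *shared-env          # ...silently reused here, far from its definition
    steps:
      - run: ./release.sh "$CACHE_DIR"
```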
For machines: non-locality makes it significantly harder to write tools that analyze (or transform) GitHub Actions workflows. The pain here boils down to the fact that YAML anchors diverge from the one-to-one object model 1 that GitHub Actions otherwise maps onto. With anchors, that mapping becomes one-to-many: the same element may appear once in the source, but multiple times in the loaded object representation.

In effect, this breaks a critical assumption that many tools make about YAML in GitHub Actions: that an entity in the deserialized object can be mapped back to a single concrete location in the source YAML. This is needed to present reasonable source locations in error messages, but it doesn't hold if the object model doesn't represent anchors and references explicitly. Furthermore, this is the reality for every YAML parser in wide use: all widespread YAML parsers choose (reasonably) to copy anchored values into each location where they're referenced, meaning that the analyzing tool cannot "see" the original element for source location purposes.

I feel these pains directly: I maintain zizmor as a static analysis tool for GitHub Actions, and zizmor makes both of these assumptions. Moreover, zizmor's dependencies make these assumptions: its YAML parser (like most other YAML parsers) chooses to deserialize YAML anchors by copying the anchored value into each location where it's referenced 2.

One of the few things that make YAML anchors uniquely useful is merge keys: a merge key allows a user to compose multiple referenced mappings together into a single mapping. The example in the YAML spec tidily demonstrates both their use case and how incredibly confusing merge keys are. I personally find the syntax incredibly hard to read, but at least it has a unique use case that could be useful in GitHub Actions: composing multiple sets of environment variables together with clear precedence rules is manifestly useful.

Except: GitHub Actions doesn't support merge keys! They appear to be using their own internal YAML parser that already had some degree of support for anchors and references, but not for merge keys. To me, this takes the situation from a set of bad technical decisions (and a lack of strong opinions around how GitHub Actions should be used) to farce: the one thing that makes YAML anchors uniquely useful in the context of GitHub Actions is the one thing that GitHub Actions doesn't support.

To summarize, I think YAML anchors in GitHub Actions (1) are redundant with existing functionality, (2) introduce a complication to the data model that makes workflows harder for humans and machines to comprehend, and (3) are not even uniquely useful, because GitHub has chosen not to support the one feature (merge keys) that lacks a semantic equivalent in GitHub Actions.

Of these reasons, I think (2) is the most important: GitHub Actions security has been in the news a great deal recently, with the overwhelming consensus being that it's too easy to introduce vulnerabilities in (or expose otherwise latent vulnerabilities through) GitHub Actions workflows. For this reason, we need GitHub Actions to be easy to analyze for humans and machines alike. In effect, this means that GitHub should be decreasing the complexity of GitHub Actions, not increasing it. YAML anchors are a step in the wrong direction for all of the aforementioned reasons.

Of course, I'm not without self-interest here: I maintain a static analysis tool for GitHub Actions, and supporting YAML anchors is going to be an absolute royal pain in my ass 3.
But it's not just me: tools like actionlint, claws, and poutine are all likely to struggle with supporting YAML anchors, as they fundamentally alter each tool's relationship to GitHub Actions' assumed data model. As-is, this change blows a massive hole in the larger open source ecosystem's ability to analyze GitHub Actions for correctness and security.

All told: I strongly believe that GitHub should immediately remove support for YAML anchors in GitHub Actions. The "good" news is that they can probably do so with a bare minimum of user disruption, since support has only been public for a few days and adoption is (probably) still primarily at the single-use workflow layer and not the reusable action (or workflow) layer.

That object model is essentially the JSON object model, where all elements appear as literal components of their source representation and take a small subset of possible types (string, number, boolean, array, object, null). ↩

In other words: even though YAML itself is a superset of JSON, users don't want YAML-isms to leak through to the object model. Everybody wants the JSON object model, and that means no "anchor" or "reference" elements anywhere in a deserialized structure. ↩

To the point where I'm not clear it's actually worth supporting anchors to any meaningful extent, rather than immediately flagging them as an attempt at obfuscation. ↩
Imagine this scenario: your team uses Bazel for fast, distributed C++ builds. A developer builds a change on their workstation, all tests pass, and the change is merged. The CI system picks it up, gets a cache hit from the developer's build, and produces a release artifact. Everything looks green. But when you deploy to production, the service crashes with a mysterious error. What went wrong?

The answer lies in the subtle but dangerous interaction between Bazel's caching, remote execution, and differing glibc versions across your fleet. In previous posts in this series, I've covered the fundamentals of action non-determinism, remote caching, and remote execution. Now, finally, we'll build on those to tackle this specific problem. This article dives deep into how glibc versions can break build reproducibility and presents several ways to fix it: from an interesting hack (which spawned this whole series) to the ultimate, most robust solution.

The scenario

Suppose you have a pretty standard (corporate?) development environment like the following:

- Developer workstations (WS). This is where Bazel runs during daily development, and Bazel can execute build actions both locally and remotely.
- A CI system. This is a distributed cluster of machines that run jobs, including PR merge validation and production release builds. These jobs execute Bazel too, which in turn executes build actions both locally and remotely.
- The remote execution (RE) system. This is a distributed cluster of worker machines that execute individual Bazel build actions remotely. The key components we want to focus on today are the AC, the CAS, and the workers, all of which I covered in detail in the previous two articles.
- The production environment (PROD). This is where you deploy binary artifacts to serve your users. No build actions run here.

All of the systems above run some version of Linux, and it is tempting to wish to keep that version in sync across them all. The reasons would include keeping operations simpler and ensuring that build actions can run consistently no matter where they are executed. However, this wish is misguided and plain impossible. It is misguided because you may not want to run the same Linux distribution on all three environments: after all, the desktop distribution you run on WS may not be the best choice for RE workers, CI nodes, or production. And it is plain impossible because, even if you aligned versions to the dot, you would need to take upgrades at some point: distributed upgrades must be rolled out over a period of time (weeks or even months) for reliability, so you'd have to deal with version skew anyway.

To make matters more complicated, the remote AC is writable from all of WS, CI, and RE to maximize Bazel cache hits and optimize build times. This goes against best security practices (so there are mitigations in place to protect PROD), but it's a necessity to support an ongoing onboarding into Bazel and RE.

The problem

The question becomes: can the Linux version skew among all machines involved cause problems with remote caching? It sure can, because C and C++ build actions tend to pick up system-level dependencies in a way that Bazel is unaware of (by default), and those influence the output the actions produce. Here, look at this: the version of glibc leaks into binaries, and this is invisible to Bazel's C/C++ action keys.
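For instance, a quick probe of a freshly built binary shows exactly which glibc symbol versions it picked up; a minimal sketch (the binary path is illustrative):

```sh
# List the glibc symbol versions the binary depends on, e.g. GLIBC_2.17, GLIBC_2.28.
# None of this is visible to Bazel's action keys.
objdump -T bazel-bin/app/server | grep -o 'GLIBC_[0-9.]*' | sort -Vu
```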
glibc versions its symbols to provide runtime backwards compatibility when their internal details change, and this means that binaries built against newer glibc versions may not run on systems with older glibc versions. How is this a problem, though? Let's take a look by making the problem specific.

Consider the following environment: the developer workstations (WS) run a distribution with glibc 2.28, while CI-1 and the RE workers run an older one with glibc 2.17. In this environment, developers run Bazel in WS for their day-to-day work, and CI-1 runs Bazel to support development flows (PR merge-time checks) and to produce binaries for PROD. CI-2 sometimes runs builds too. All of these systems can write to the AC that lives in RE.

As it so happens, one of the C++ actions involved in the build of the production binary has a tag which forces the action to bypass remote execution. This can lead to the following sequence of events:

1. A developer runs a build on a WS. The input to that pinned action has changed, so it is rebuilt on the WS. The action uses the C++ compiler, so the object files it produces pick up the dependency on glibc 2.28. The result of the action is injected into the remote cache.
2. CI-1 schedules a job to build the production binary for release. This job runs Bazel on a machine with glibc 2.17 and leverages the RE cluster, which also contains glibc 2.17. Many C++ actions get rebuilt, but the pinned action's output is reused from the cache. The production artifact now has a dependency on symbols from glibc 2.28.
3. Release engineering picks the output of CI-1, deploys the production binary to PROD, and… boom, PROD explodes.

The fact that the developer WS could write to the AC is very problematic on its own, but we could encounter this same scenario if we first ran the production build on CI-2 for testing purposes and then reran it on CI-1 to generate the final artifact.

So, what do we do now? In a default Bazel configuration, C and C++ action keys are underspecified and can lead to non-deterministic behavior when we have a mixture of host systems compiling them.

Let's start with the case where you aren't yet ready to restrict AC writes to the RE workers alone, yet you want to prevent obvious mistakes that lead to production breaks. The idea here is to capture the glibc version that is used in the local and remote environments, pick the higher of the two, and make that version number an input to the C/C++ toolchain. This causes the version to become part of the cache keys and should prevent the majority of the mistakes we may see.

WARNING: This is The Hack I recently implemented and that drove me to write this article series! Prefer the options presented later, but know that you have this one up your sleeve if you must mitigate problems quickly.

To implement this hack, the first thing we have to do is capture the local glibc version; we can do this with a small build action that asks the system for it (the underlying query is sketched below). One important tidbit here is the use of the workspace status file, indirectly via the requirement of stamping. This is necessary to force this action to rerun on every build because we don't want to hit the case of using an old tree against an upgraded system. As a consequence, we need to modify the script pointed at by your workspace status command (you have one, right?) to emit the glibc version as well.

The second thing we have to do is capture the remote glibc version. This is… trickier because there is no tag to force Bazel to run an action remotely. Even if we assume remote execution, features like the dynamic spawn strategy or the remote local fallback could cause the action to run locally at random.
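Wherever the probe ends up running, the version query itself is a one-liner; a minimal sketch (one of several ways to ask the C library for its version, not necessarily the author's exact approach):

```sh
# Prints just the version number of the running system's glibc, e.g. "2.28".
getconf GNU_LIBC_VERSION | awk '{print $2}'
```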
To prevent problems, we have to detect whether the action is running within the RE workers or not, and the way to do that will depend on your environment.

The third part of the puzzle is to select the highest glibc version between the two that we collected. We can do this with another small rule, leveraging a flag that compares version strings. This flag is a GNU extension… but we are talking about glibc anyway here, so I'm not going to be bothered by it.

And, finally, we can go to our C++ toolchain definition and modify it to depend on the version file produced by the previous action. Ta-da! All of our C/C++ actions now encode the highest possible glibc version that the outputs they produce may depend on. And, while not perfect, this is an easy workaround to guard against most mistakes.

But can we do better? Of course. Based on the previous articles, what we should think about is plugging the AC hole and forcing build actions to always run on the RE workers. In this way, we would precisely control the environment that generates action outputs and we should be good to go.

Unfortunately, we can still encounter problems! Remember how I said that, at some point, you will have to upgrade glibc versions? What happens when you are in the middle of a rolling upgrade of your RE workers? The worker pool will end up with different "partitions", each with a different glibc version, and you will still run into this issue. To handle this case, you would need to have different worker pools, one with the old glibc version and one with the new version, and then make the worker pool name be part of the action keys. You would then have to migrate from one pool to the other in a controlled manner. This would work well at the expense of reducing cache effectiveness, imposing a big toll on operations, and making the rollout risky because the switch from one pool to another is an all-or-nothing proposition.

The real solution comes in the form of sysroots. The idea is to install multiple parallel versions of glibc in all environments and then modify the Bazel C/C++ toolchain to explicitly use a specific one. In this way, the glibc version becomes part of the cache key and all build outputs are pinned to a deterministic glibc version. This allows us to roll out a new version slowly with a code change, pinning the version switch to a specific code commit that can be rolled back if necessary, and keeping the property of reproducible builds for older commits. This is the solution outlined at the end of Picking glibc versions at runtime and is the only solution that can provide you 100% safety against the problem presented in this article. It is difficult to implement, though, because convincing GCC and Clang not to use system-provided libraries is tricky and because this solution will sound alien to most of your peers.

The problem presented in this article is far from theoretical, but it's often forgotten about because typical build environments don't present significant skew across Linux versions. This means that facing new glibc symbols is unlikely, so the chances of ending up with binary-incompatible artifacts are low. But they can still happen, and they can happen at the worst possible moment. Therefore, you need to take action. I'd strongly recommend that you go towards the sysroot solution because it's the only one that'll give you a stable path for years to come, but I also understand that it's hard to implement.
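For a flavor of what the sysroot approach boils down to, here is a minimal sketch (the path is invented; in a real setup this would be wired into the Bazel C/C++ toolchain configuration rather than passed by hand):

```sh
# Compile against a pinned copy of glibc instead of whatever the host happens to ship.
clang++ --sysroot=/opt/sysroots/glibc-2.28 -o server server.cc
```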
Therefore, take the solutions in the order I gave them to you: start with the hack to mitigate obvious problems, follow that up with securing the AC, and finally go down the sysroot rabbit hole.

As for the glibc 2.17 mentioned in passing above, well, it is ancient by today's standards at 13 years of age, but it is what triggered this article in the first place. glibc 2.17 was kept alive for many years by the CentOS 7 distribution, an LTS system used as a core building block by many companies and one that reached EOL a year ago, causing headaches throughout the industry. Personally, I believe that relying on LTS distributions is a mistake that ends up costing more money/time than tracking a rolling release, but I'll leave that controversial topic for a future opinion post.
TL;DR: GitHub Actions provides a policy mechanism for limiting the kinds of actions and reusable workflows that can be used within a repository, organization, or entire enterprise. Unfortunately, this mechanism is trivial to bypass. GitHub has told me that they don't consider this a security issue (I disagree), so I'm publishing this post as-is.

Update 2025-06-13: GitHub has silently updated the actions policies documentation to note the bypass in this post: policies never restrict access to local actions on the runner filesystem (where the path starts with ./).

GitHub Actions is GitHub's CI/CD offering. I'm a big fan of it, despite its spotty security track record. Because a CI/CD offering is essentially arbitrary code execution as a service, users are expected to be careful about what they allow to run in their workflows, especially privileged workflows that have access to secrets and/or can modify the repository itself. That, in effect, means that users need to be careful about which actions and reusable workflows they trust.

Like with other open source ecosystems, downstream consumers (i.e., users of GitHub Actions) retrieve their components (i.e., action definitions) from an essentially open index (the "Actions Marketplace" 1). To establish trust in those components, downstream users perform all of the normal fuzzy heuristics: they look at the number of stars, the number of other users, the recency of activity, whether the user/organization is a "good" one, and so forth. Unfortunately, this isn't good enough along two dimensions:

1. Even actions that satisfy these heuristics can be compromised. They're heuristics after all, not verifiable assertions of quality or trustworthiness. The recent tj-actions attack typifies this: even popular, widely-used actions are themselves software components, with their own supply chains (and CI/CD setups).
2. This kind of acceptance scheme just doesn't scale, both in terms of human effort and system complexity: complex CI/CD setups can have dozens (or hundreds) of workflows, each of which can contain dozens (or hundreds) of jobs that in turn employ actions and reusable workflows. These sorts of large setups don't necessarily have a single owner (or even a single team) responsible for gating admission and preventing the introduction of unvetted actions and reusable workflows.

The problem (as stated above) is best solved by eliminating the failure mode itself: rather than giving the system's committers the ability to introduce new actions and reusable workflows without sufficient review, the system should prevent them from doing so in the first place. To their credit, GitHub understands this! They have a feature called "Actions policies 2" that does exactly this. From the Manage GitHub Actions settings documentation:

You can restrict workflows to use actions and reusable workflows in specific organizations and repositories. Specified actions cannot be set to more than 1000. (sic) To restrict access to specific tags or commit SHAs of an action or reusable workflow, use the same syntax used in the workflow to select the action or reusable workflow. For an action, the syntax is . For example, use to select a tag or to select a SHA. For more information, see Using pre-written building blocks in your workflow. For a reusable workflow, the syntax is . For example, . For more information, see Reusing workflows. You can use the wildcard character to match patterns.
For example, to allow all actions and reusable workflows in organizations that start with , you can specify . To allow all actions and reusable workflows in repositories that start with , you can use . For more information about using the wildcard, see Workflow syntax for GitHub Actions. Use to separate patterns. For example, to allow and , you can specify .

GitHub also provides special "preset" cases for this functionality, such as allowing only actions and reusable workflows that belong to the same organization namespace as the repository itself. I set that preset up on a dummy organization and repository of mine, and when I try to violate the policy, e.g. by using an action from outside the organization in a workflow, the workflow run is rejected with a policy error. This is fantastic, except that it's trivial to bypass. Let's see how.

To understand how we're going to bypass this, we need to understand a few of the building blocks underneath actions and reusable workflows. In particular:

1. Actions and reusable workflows share the same namespace as the rest of GitHub 3.
2. When a user writes an ordinary uses: reference in a workflow, GitHub resolves that reference to mean "the action file defined at that tag in that repository".
3. The uses: keyword can also refer to relative paths on the runner itself: for example, a step can run an action checked out in a directory relative to the current working directory.
4. Relative paths from the runner are not inherently part of the repository state itself: the runner can contain any state introduced by previous steps within the same job.

These four aspects of GitHub Actions compose together into the world's dumbest policy bypass: instead of referencing the action directly, the user can clone (or otherwise fetch) its repository into the runner's filesystem, and then use a local path reference to run the very same action. (A rough sketch of the pattern appears after this post's footnotes below.) The actual contents of the step are inconsequential: I just used that repository for the demo, but anything would work. And naturally, it works just fine.

The fix for this bypass is simple, if potentially somewhat painful: GitHub Actions could consider "local" references to be another category for the purpose of policies, and reject them whenever the policy doesn't permit them. This would seal off the entire problem, since local references would simply stop working under a restrictive policy. The downside is that it would potentially break existing users of policies who also use local actions and reusable workflows, assuming there are significant numbers of them 4. The other option would be to leave it the way it is, but explicitly document local references as a limitation of this policy mechanism. I honestly think this would be perfectly fine; what matters is that users 5 are informed of a feature's limitations, not necessarily that the feature lacks limitations.

First, I'll couch this again: this is not exactly fancy stuff. It's a very dumb bypass, and I don't think it's critical by any means. At the same time, I think this matters a great deal: ineffective policy mechanisms are worse than missing policy mechanisms, because they provide all of the feeling of security through compliance while actually incentivizing malicious forms of compliance. In this case, the maliciously complying party is almost certainly a developer just trying to get their job done: like most other developers who encounter an inscrutable policy restriction, they will try to hack around it such that the policy is satisfied in name only. For that reason alone I think GitHub should fix this bypass, either by actually fixing it or at least documenting its limitations. Without either of those, projects and organizations are likely to mistakenly believe that these sorts of policies provide a security boundary where none in fact exists.

Technically "publishing" an action to the Actions Marketplace is not required; anybody can reference an action to fetch its definition even if it isn't published. All publishing does is give the action a marketplace page and the potential for a little blue checkmark of unclear security value. ↩

Actually, I don't know what this feature is called.
It's titled "Policies" under the "Actions" section of the repo/org/enterprise settings and is documented under "GitHub Actions policies" in the Enterprise documentation, but I'm not sure if that's an umbrella term or not. I'm just going to keep calling it "Actions policies" for now. ↩

Where "owner" is an individual owner or an organization, which in turn might be controlled by an enterprise. But that last bit isn't visible in the namespace. ↩

I honestly have no idea how widely used this policy feature is. ↩

Here, policy authors and enforcers. ↩
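As referenced above, here is a rough sketch of the bypass pattern (the repository and paths are placeholders, not the author's actual demo):

```yaml
jobs:
  bypass:
    runs-on: ubuntu-latest
    steps:
      # Fetch the otherwise-disallowed action onto the runner's filesystem...
      - run: git clone --depth 1 https://github.com/some-org/some-action third-party-action
      # ...and then run it as a "local" action, which the policy never inspects.
      - uses: ./third-party-action
```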
However, it's easy to forget that there's a class of service/business where this doesn't make sense. Many small businesses (if they're not using a PaaS like Fly) don't have the resources to run a Kubernetes cluster and, 95% of the time, just need an app to run on a single VPS. Sometimes, maybe during an event or a promotional period, it might make sense to add another server to the mix, load balancing traffic somewhere (DigitalOcean load balancers are pretty effective).

Assuming we have some nice way to spin up more servers running the app (a nice Ansible playbook perhaps?), and assuming we have some nice CD pipeline to deploy the app, we need some way to make sure all the servers are running the latest version. One pattern I like for this is a simplified GitOps pull model (think ArgoCD or Flux but held together with duct tape) where we:

1. Have a repo which indicates what app releases should currently be deployed.
2. Get your servers to query this repo to determine what they should be running and automatically update the app on each server to match.
3. Make your deployment pipeline update the repo with a reference to the new release.

This way, your deployment pipeline doesn't need to know anything about the servers it's deploying to - it just needs to know how to update the repo with a reference to the new release. In one case, I have a repo called 'ops' which, along with all the Ansible playbooks, has a set of files, one for each app, which contain (along with other settings) a line pinning the release that should currently be deployed. I can then run a script on each server (see the sketch below) which pulls down the repo, checks each app, and updates the systemd service file to point to the latest release. For cases where Kubernetes and a full GitOps solution are overkill, this is a nice way to get a simple deployment pipeline set up.
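Here is a minimal sketch of what that per-server script might look like (the paths, file format, and the fetch-release helper are all illustrative, not the author's exact setup):

```sh
#!/usr/bin/env bash
# Poll the ops repo and restart an app if its pinned release has changed.
set -euo pipefail

OPS_DIR=/opt/ops     # local clone of the 'ops' repo
APP=myapp

git -C "$OPS_DIR" pull --quiet

# Assume each app's file contains a line like: RELEASE=v1.2.3
desired=$(grep '^RELEASE=' "$OPS_DIR/apps/$APP" | cut -d= -f2)
current=$(cat "/var/lib/$APP/current-release" 2>/dev/null || echo none)

if [ "$desired" != "$current" ]; then
  echo "updating $APP: $current -> $desired"
  # fetch-release is a stand-in for however you download and unpack the artifact
  # and rewrite the systemd unit to point at it.
  /usr/local/bin/fetch-release "$APP" "$desired"
  systemctl restart "$APP"
  echo "$desired" > "/var/lib/$APP/current-release"
fi
```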
A couple of days ago, all of a sudden, my jobs started running out of space.
I have a lot of experience with massive CI/CD pipelines that deploy private code to public servers. I've also worked with pipelines that deploy public repositories to private servers, such as my homelab. However, I had never experimented with a pipeline that takes a private GitHub repo, builds it, and deploys it to a server on the LAN. That's precisely what I needed for a project I'm currently working on that isn't yet public.
In this post I explain how I built my automatic blogroll using GitHub Actions and GitHub Pages.
Redownloading dependencies for every step in your CI/CD pipeline can be time-consuming. You can dramatically speed up the build time of your application with caching, making your team more responsive to breaking changes and ultimately more productive. Here's how to do it.
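As a small, generic illustration of the idea (not taken from the linked post; this assumes a GitHub Actions workflow and an npm project):

```yaml
steps:
  - uses: actions/checkout@v4
  # Restore (and later save) the npm cache, keyed on the lockfile contents.
  - uses: actions/cache@v4
    with:
      path: ~/.npm
      key: npm-${{ runner.os }}-${{ hashFiles('**/package-lock.json') }}
      restore-keys: |
        npm-${{ runner.os }}-
  - run: npm ci
```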
I've written in the past that this blog is a playground for me to try various tools and play with code around it. Jenkins has been my choice as the CI for it since the start, mostly since it was something I'm used to. However, I've also stated that running it on an old laptop with no real backups is a recipe for disaster. I have since rectified the issue by hiding the laptop under a box in a closet, but that meant moving away from Jenkins to something that's lighter and more portable. The choice is the self-hosted enterprise edition of Drone.

Drone consists of two parts - server and agents. The server handles auth, users, and secret management, handles hooks for source control, and orchestrates the work for agents. The agents are responsible for actually executing the workflows they receive from the server. This basically means that you can have workers anywhere as long as they can reach the Drone server. Drone is written in Go and is extremely lightweight. This means that it has extremely low memory requirements, which is of great advantage to me as I'm trying to keep the costs to a minimum. The server I'm running takes up around 15MB of memory while the worker takes 14MB. Everything is dockerized so it's super easy to deploy as well.

My setup consists of a server running in an AWS EC2 instance and a worker running at home in an MSI Cubi Silent NUC I've recently acquired. The Raspberry I've used for Jenkins is also a great candidate to run a worker, but due to the workloads I throw at it (lots of disk IO; the SD card can't handle that; looking at you, JavaScript) it's less than ideal in my situation. I'll keep it on hand just in case I need more workers. The old laptop could also be a candidate here for the future. That's part of the joy with Drone - you can literally run it anywhere.

Drone looks for a file in an enabled repository. It's in this file that you specify your pipeline. What makes Drone great is that you can actually run the pipeline locally, using the Drone CLI. It makes testing the builds super easy. That's a huge contrast to what I'm used to with Jenkins (disclaimer: I might just be a scrub, I'm not hating on Jenkins). What this also means is that you really don't need to worry about storing the jobs themselves anywhere, as they are just as safe as the rest of your code in your source control. Hopefully anyway. The steps in the pipeline are run in Docker containers which are thrown out after the pipeline is done. It means that the jobs are nice and reproducible. And while I hate YAML, the pipelines are quite easy to understand. Click for a look at an example (a generic sketch also appears further below). I like it.

Drone seems to be on a really good path towards becoming an excellent CI tool. There are things missing though. It seems a bit basic. Things like the lack of global secrets (they are now defined per-repo instead, or I didn't manage to find them) or proper documentation, as the current docs seem a bit lacking. It took me quite a while to get my first proper job running, and I only managed that after looking at a few examples on the internet rather than the docs. There's also the question of pricing. The website is not super clear on the pricing, but from what I gather, the enterprise edition is free as long as you run fewer than 15000 jobs per year. The pricing afterwards is per user at an unknown rate. Anyways, I should be covered as I probably run less than 500 jobs per year and do not plan on adding new ones any time soon.
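As referenced above, a generic sketch of what such a pipeline file looks like (illustrative only, not this blog's actual configuration):

```yaml
kind: pipeline
type: docker
name: default

steps:
  - name: build
    image: golang:1.22
    commands:
      - go build ./...
      - go test ./...
```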
There's also the lack of an option to run multiple different jobs from the same repository, which leads to a pattern of many small repos appearing in my source control. I'm not too fond of that and wish there was a way to schedule quite a few jobs from a single repo. Nonetheless, once you get the hang of it, Drone seems to be a powerful tool. The UI is crisp, deploying it is easy, and it runs on a potato. Everything I like from my CI installation. As for the lacking features - I'm sure they are coming soon™. I'll keep this space updated with my latest projects running on Drone. Stay tuned! Have you used Drone? Do you prefer something else? Do let me know in the comments below.