Latest Posts (10 found)
Farid Zakaria 1 month ago

Nix derivation madness

I’ve written a bit about Nix and I still face moments where foundational aspects of the package system confound and surprise me. Recently I hit an issue that stumped me as it broke some basic comprehension I had of how Nix works. I wanted to produce the build and runtime graph for the Ruby interpreter. I have Ruby, but I don’t seem to have the derivation (.drv) file present on my machine. No worries, I think, I can just download it from the NixOS cache. Hmm, the NixOS cache doesn’t seem to have it. 🤷 This actually perplexed me at the time. In fact there are multiple discourse posts about it. My mental model of Nix, though, is that I must have first evaluated the derivation (drv) in order to determine the output path to even substitute. How could the NixOS cache not have it present? Is this derivation wrong somehow? Nope. According to the local database, this is the derivation Nix believes produced this Ruby binary. 🤨 What does the binary cache itself say? Even the cache itself thinks this particular derivation produced this particular Ruby output. What if I try a different command? So I seem to have a completely different derivation that resulted in the same output, which is not what the binary cache announces. WTF? 🫠 Thinking back to a previous post, I remember touching on hashing modulo fixed-output derivations . Is that what’s going on? Let’s investigate from first principles. 🤓 Let’s first create our fixed-output derivation . ☝️ Since this is a fixed-output derivation (FOD), the produced path will not be affected by changes to the derivation beyond the declared contents of its output. Now we will create a derivation that uses this FOD. The output path for this derivation will change on changes to the derivation, except if only the derivation path of the FOD changes. This is in fact what makes it “modulo” the fixed-output derivations. Let’s test this all out by changing our derivation. Let’s do this by just adding some garbage attribute to the derivation. What happens now? 
The path of the derivation itself has changed, but the output path remains consistent. What about the derivation that leverages it? It also got a new derivation path but the output path remained unchanged. 😮 That means changes to fixed-output derivations didn’t cause new outputs in either derivation, but they did create a completely new tree of derivation files. 🤯 That means in nixpkgs, changes to fixed-output derivations can give them new store paths for their .drv files while dependent derivations keep the same output path. If the output path had already been stored in the NixOS cache, then we lose the link between the new derivation and this output path. 💥 The amount of churn we create in derivations was unbeknownst to me. It can get even weirder! This example came from @ericson2314 . We will duplicate the derivation to another file whose only difference is the value of the garbage attribute. Let’s now use both of these in our derivation. We can now instantiate and build this as normal. What is weird about that? Well, let’s take the JSON representation of the derivation and remove one of the inputs. We can do this because although there are two input derivations, we know they both produce the same output! Let’s load this modified derivation back into the store and build it again! We got the same output. Not only can multiple derivations map to the same output path, but we can also take certain derivations and completely change them by removing inputs and still get the same output! 😹 The road to Nix enlightenment is no joke and full of dragons.
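The behaviour above can be modeled with a toy hashing scheme. The sketch below is a loose Python model of the “hashing modulo fixed-output derivations” idea — the helper names and derivation strings are made up for illustration; this is not Nix’s real algorithm:

```python
import hashlib

def sha256(s: str) -> str:
    return hashlib.sha256(s.encode()).hexdigest()

def drv_path(drv_text: str) -> str:
    # a .drv store path hashes the full derivation text
    return "/nix/store/" + sha256(drv_text)[:8] + "-example.drv"

def hash_modulo(drv_text, fixed_output_hash=None, input_replacements=()):
    # fixed-output derivations contribute a string derived only from
    # their declared output hash, not from their derivation text;
    # other derivations hash their text with each input .drv path
    # replaced by that input's "hash modulo" value
    if fixed_output_hash is not None:
        return sha256("fixed:out:sha256:" + fixed_output_hash)
    for old, new in input_replacements:
        drv_text = drv_text.replace(old, new)
    return sha256(drv_text)

# a FOD, before and after adding a garbage attribute
fod_v1 = 'fetchurl { url = "..."; hash = "abc"; }'
fod_v2 = fod_v1 + ' garbage = "hello";'
h1 = hash_modulo(fod_v1, fixed_output_hash="abc")
h2 = hash_modulo(fod_v2, fixed_output_hash="abc")
assert drv_path(fod_v1) != drv_path(fod_v2)  # a whole new .drv file...
assert h1 == h2                              # ...same replacement hash

# a dependent derivation referring to the FOD by its .drv path
dep_v1 = 'build { src = "%s"; }' % drv_path(fod_v1)
dep_v2 = 'build { src = "%s"; }' % drv_path(fod_v2)
out1 = hash_modulo(dep_v1, input_replacements=[(drv_path(fod_v1), h1)])
out2 = hash_modulo(dep_v2, input_replacements=[(drv_path(fod_v2), h2)])
assert out1 == out2  # the dependent output path is unchanged
```

So a garbage change to a FOD creates a brand-new tree of .drv files, yet every downstream output path — and therefore every cache entry — stays the same.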

2 views
Farid Zakaria 1 month ago

Fuzzing for fun and profit

I recently watched a keynote by Will Wilson on fuzzing – Fuzzing’25 Keynote . The talk is excellent, and one main highlight is the fact that we already have the capability to “fuzz” our software today, and yet we do not. While I’ve seen the power of QuickCheck-like tools for property-based testing, I had never used fuzzing over an application as a whole, specifically American Fuzzy Lop . I was intrigued to add this skill to my toolbelt and maybe apply it to CppNix . As with everything else, I need to learn things from first principles . I would like to create a scenario with a known failure and see how AFL discovers it. To get started, let’s first make sure we have access to AFL via Nix . We will be using AFL++ , the daughter of AFL that incorporates newer updates and features. How does AFL work? 🤔 AFL will feed your program various inputs to try and cause a crash! 💥 In order to generate better inputs, you compile your code with a compiler variant distributed by AFL which inserts special instructions to keep track of branch coverage as it creates various test cases. Let’s create a program that crashes when given a specific input. We leverage a construct so that the compiler does not optimize the multiple instructions together. We can now compile our code to get the instrumented binary. AFL needs to be given some sample inputs. Let’s feed it the simplest starter seed possible – an empty file! Now we simply run the fuzzer, and the magic happens . ✨ A really nice TUI appears that informs you of various statistics of the running fuzzer and, importantly, whether any crashes have been found! The output directory contains all the saved information, including the inputs that caused the crashes. Let’s inspect it! AFL was successfully able to find the code word that caused the crash. It found the failure case rather quickly for my simple program; for large programs, however, it can take a long time to explore the complete state space. 
Companies such as Google continuously run fuzzers such as AFL on well-known open-source projects to help detect failures.
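For intuition on what AFL’s coverage instrumentation buys you, here is a toy coverage-guided fuzzer in Python. The `target`, `coverage`, and mutation strategy are simplified stand-ins of my own invention — real AFL instruments compiled branches and mutates far more cleverly:

```python
import random

CODE_WORD = b"hi"

def target(data: bytes) -> None:
    # crashes only on the exact code word, like the instrumented C program
    if data == CODE_WORD:
        raise RuntimeError("crash!")

def coverage(data: bytes) -> frozenset:
    # stand-in for AFL's branch instrumentation: record how long a
    # prefix of the code word this input matched
    n = 0
    for got, want in zip(data, CODE_WORD):
        if got != want:
            break
        n += 1
    return frozenset(range(n + 1))

def fuzz(seed: bytes = b"", iterations: int = 200_000):
    rng = random.Random(0)
    corpus, seen = [seed], {coverage(seed)}
    for _ in range(iterations):
        data = bytearray(rng.choice(corpus))
        op = rng.randrange(3)  # mutate: overwrite, insert, or delete a byte
        if op == 0 and data:
            data[rng.randrange(len(data))] = rng.randrange(256)
        elif op == 1:
            data.insert(rng.randrange(len(data) + 1), rng.randrange(256))
        elif data:
            del data[rng.randrange(len(data))]
        data = bytes(data)
        try:
            target(data)
        except RuntimeError:
            return data  # found a crashing input
        cov = coverage(data)
        if cov not in seen:  # new coverage: keep this input in the corpus
            seen.add(cov)
            corpus.append(data)
    return None

print(fuzz())
```

Without the coverage feedback (`seen`/`corpus`), the fuzzer would have to guess the whole code word at once; with it, each matched prefix becomes a stepping stone, which is the essence of why AFL scales.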

0 views
Farid Zakaria 2 months ago

Bazel Knowledge: Smuggling capabilities through a tarball

tl;dr : Linux capabilities are just xattrs (extended attributes) on files — and since tar can preserve xattrs, Bazel can “smuggle” them into OCI layers without ever running setcap . Every so often I stumble on a trick that makes me do a double-take. This one came up while poking around needing to replace a Dockerfile that set capabilities on a file via setcap, and trying to replace it with rules_oci . I learnt this idea from reading bazeldnf . What are capabilities? 🤔 We are all pretty familiar with the all-powerful root user in Linux and escalating to it via sudo. Capabilities break that monolith into smaller, more focused privileges [ ref ]. Instead of giving a process the full keys to the kingdom, you can hand it just the one it needs. For example: Capabilities are inherited from the spawning process, but they can also be added to the file itself, such that any time that program is executed it has the desired capabilities. The Linux kernel stores these capabilities in the “extended attributes” (i.e. additional metadata) of the file [ ref ]. If the filesystem you are using does not support extended attributes, then you cannot set capabilities on a file. Let’s see an example we will work through. If we build this with Bazel and try to run it, we will see that it fails unless we either spawn it with sudo, or add the capability to the binary via setcap. Okay great – but what does this have to do with Bazel? Well, we were converting a Dockerfile that used setcap to modify the binary. If your OCI image runs as a non-root user, it will also be prevented from creating the raw socket. We can build this Docker image and notice that the entrypoint fails . If we amend the Dockerfile by adding the capability, we see it succeeds. Now we can build and run it again. Back to Bazel! Actions in Bazel are executed under the user that spawned the Bazel process. We can validate this with a simple genrule. How can we then go about creating a file with a capability set such that we can replace our layer? Escalating privileges inside a Bazel action with sudo isn’t straightforward. 
You might need to configure sudoers for the user, so that it can execute setcap without a password. You could also run the whole command as root, but that is granting too much privilege everywhere. This is where the magic happens ✨. Let’s take another detour! What are OCI images? I actually did a previous write-up on containers from first principles if you are curious for a deeper dive. We can export the image from Docker and inspect it. An OCI image is a tar archive containing metadata and a series of “blobs”, some of which are themselves tar archives. These blobs are the “layers” that are used to construct the final filesystem and contain all the files that will comprise the rootfs. For capabilities to transport themselves through a tar archive, the archive itself must be able to store extended attributes as well. You can enable this feature with the --xattrs option. If you decompress the archive, and have the necessary privileges to set extended attributes, then the unarchived file will retain the capability and everything will work! What does this have to do with building an OCI image in Bazel? 🤨 Turns out that a trick we can employ is to toggle the necessary bits that mark a file as having a capability directly in the tar archive . This is exactly what the xattrs rule in bazeldnf does! 🤓 The key idea : capabilities live in extended attributes, and tar can carry those along. That means you don’t need to run setcap inside a container at build time — Bazel can smuggle the bits straight into the image tar layer to be consumed by an OCI-compliant runtime. ☝️ This trick neatly sidesteps the need for root in your rules and keeps builds hermetic. Not every filesystem or runtime will honor these attributes, but when it works it’s a clever, Bazel-flavored way to package privileged binaries without breaking sandboxing.
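The smuggling trick can be sketched with Python’s standard tarfile module, which supports the same PAX records (`SCHILY.xattr.*`) that `tar --xattrs` emits. The capability blob layout follows my reading of the kernel’s `vfs_cap_data` struct; the file name and payload are made up:

```python
import io
import struct
import tarfile

# capability blob for cap_net_raw+ep, laid out like the kernel's
# struct vfs_cap_data (revision 2): magic_etc word, then permitted
# and inheritable words (lower 32 bits first)
VFS_CAP_REVISION_2 = 0x02000000
VFS_CAP_FLAGS_EFFECTIVE = 0x000001
CAP_NET_RAW = 13
blob = struct.pack(
    "<5I",
    VFS_CAP_REVISION_2 | VFS_CAP_FLAGS_EFFECTIVE,
    1 << CAP_NET_RAW,  # permitted (low word)
    0,                 # inheritable (low word)
    0, 0,              # permitted/inheritable (high words)
)

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
    info = tarfile.TarInfo("usr/bin/ping")
    payload = b"\x7fELF..."  # stand-in for a real binary
    info.size = len(payload)
    # smuggle the xattr as a per-member PAX record, as `tar --xattrs` would
    info.pax_headers["SCHILY.xattr.security.capability"] = blob.decode("latin-1")
    tar.addfile(info, io.BytesIO(payload))

buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    member = tar.getmember("usr/bin/ping")
    restored = member.pax_headers["SCHILY.xattr.security.capability"]
assert restored.encode("latin-1") == blob  # capability survived the roundtrip
```

An extractor that honors these records (an OCI runtime, or `tar --xattrs` run with sufficient privilege) will restore `security.capability` on the file — which is exactly the bit-toggling bazeldnf performs in the layer tar.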

0 views
Farid Zakaria 2 months ago

Writing a protoc plugin in Java

Know thy enemy. – Sun Tzu Anyone who’s used Protocol Buffers at scale will relate: we use Protocol Buffers heavily at $DAYJOB$ and they are becoming an increasingly large pain point, most notably due to challenges with coercing multiple versions in a dependency graph. Recently, a team wanted to augment the generated Java code protoc (the Protobuf compiler) emits. I was aware that the compiler had a “plugin” architecture but had never looked deeper into it. Let’s explore writing a Protocol Buffer plugin, in Java and for the Java generated code. 🤓 If you’d like to see the end result, check out github.com/fzakaria/protoc-plugin-example . Turns out that plugins are simple in that they operate solely over standard input & output and, unsurprisingly, marshal protobuf over them. A plugin is just a program which reads a protocol buffer from standard input and then writes a protocol buffer to standard output. [ ref ] The request & response protos are described in plugin.proto . Here is a dumb plugin that emits a fixed class to demonstrate. We can run this and see that the expected file is produced. Let’s now look at an example .proto file. You can generate the traditional Java code for this using protoc, which by default includes the capability to output Java. Nothing out of the ordinary here, we are merely baselining our knowledge. 👌 How can I now modify this code? If you audit the generated code you will see comments that contain insertion points, such as: Insertion points are markers within the generated source that allow other plugins to include additional content. We have to modify the file entry that we include in the response to specify the insertion point, and instead of a new file being created, the contents of the files will be merged. ✨ Our example plugin would like to add a function to every message type described in the proto file. We do this by setting the appropriate insertion point, which we found by auditing the original generated code. 
In this particular example, we want to add our new function to the class definition and pick the class scope as our insertion point. We now run the Java generator alongside our custom plugin. We can audit the generated source and we see that our new method is now included! 🔥 Note: The plugin must be listed after the Java generator, as the order matters on the command-line. While we are limited by the insertion points previously defined in the open-source implementation of the Java protobuf generator, it does provide a convenient way to augment the generated files. We can also include additional source files that may wrap the original files for cases where the insertion points may not suffice.
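To make insertion points concrete, here is a toy Python model of the merge protoc performs: a plugin’s response content is spliced in just above the matching marker comment, and the marker is preserved so later plugins can still target it. The `splice` helper and sample generated code are my own sketch, not protoc’s implementation:

```python
# a trimmed stand-in for what the Java generator emits
GENERATED = """\
public final class HelloOuterClass {
  // @@protoc_insertion_point(class_scope:Hello)
}
// @@protoc_insertion_point(outer_class_scope)
"""

def splice(source: str, insertion_point: str, content: str) -> str:
    """Insert content just above the matching insertion-point marker,
    keeping the marker so later plugins can use it too."""
    marker = "@@protoc_insertion_point(%s)" % insertion_point
    out = []
    for line in source.splitlines(keepends=True):
        if marker in line:
            # match the marker's indentation, like protoc does
            indent = line[: len(line) - len(line.lstrip())]
            out.append(indent + content + "\n")
        out.append(line)
    return "".join(out)

patched = splice(
    GENERATED,
    "class_scope:Hello",
    'public String hello() { return "hello"; }',
)
print(patched)
```

The result keeps the original class intact, with the new method sitting directly above the `class_scope:Hello` marker.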

0 views
Farid Zakaria 2 months ago

Bazel Knowledge: Testing for clean JVM shutdown

Ever run into the issue where you exit your main method in Java but the application is still running? That can happen if you have non-daemon threads still running. 🤔 The JVM specification specifically states the conditions under which the JVM may exit [ ref ]: A program terminates all its activity and exits when one of two things happens: all the threads that are not daemon threads terminate, or some thread invokes the exit method of class Runtime or class System, and the exit operation is not forbidden by the security manager. What are daemon threads? They are effectively background threads that you might spin up for tasks such as garbage collection, where you explicitly don’t want them to inhibit the JVM from shutting down. A common problem, however, is that if you have code paths on exit that fail to stop all non-daemon threads, the JVM process will fail to exit, which can cause problems if you are relying on this functionality for graceful restarts or shutdown. Let’s observe a simple example. If we run this, although we exit the main thread, we observe that the JVM does not exit and the thread continues to do its “work”. Often you will see classes implement interfaces like AutoCloseable so that an orderly shutdown of these sorts of resources can occur. It would be great, however, to test that such graceful cleanup is done appropriately for our codebases. Is this possible in Bazel? If we run this test however we notice the test PASSES 😱 Turns out that Bazel’s JUnit test runner explicitly calls System.exit after running the tests, which according to the JVM specification allows the runtime to shut down irrespective of active non-daemon threads. [ ref ] From discussion with others in the community, this explicit shutdown was added specifically because many tests would hang due to improper non-daemon thread cleanup. 🤦 How can we validate graceful shutdown then? Well, we can spawn our application as a subprocess and validate that it exits within a specific timeout. Additionally, I’ve put forward a pull-request PR#26879 which adds a new system property such that the test runner validates that there are no non-daemon threads running before exiting. 
It would have been great to remove the System.exit call completely when the property is present; however I could not find a way to then set the exit value of the test. Turns out that even simple things can be a little complicated, and it was a bit of a head-scratcher to see why our tests were passing despite our failure to properly tear down resources.
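The same daemon/non-daemon rule exists for Python threads, which makes for a compact way to demonstrate the spawn-and-wait-with-timeout testing approach described above. The child script and helper below are illustrative:

```python
import subprocess
import sys
import textwrap

# a child program that starts a worker thread; whether the thread is
# a daemon is controlled by argv so we can test both behaviours
CHILD = textwrap.dedent("""
    import sys, threading, time
    threading.Thread(target=lambda: time.sleep(60),
                     daemon=(sys.argv[1] == "daemon")).start()
    # the main thread now falls off the end of the program
""")

def exits_cleanly(mode: str, timeout: float) -> bool:
    # run the child and report whether it exited within the timeout;
    # subprocess.run kills the child if the timeout expires
    try:
        subprocess.run([sys.executable, "-c", CHILD, mode], timeout=timeout)
        return True
    except subprocess.TimeoutExpired:
        return False

assert exits_cleanly("daemon", timeout=30.0)         # daemon thread: prompt exit
assert not exits_cleanly("non-daemon", timeout=3.0)  # lingering non-daemon thread
```

A wrapper test like this validates shutdown behaviour from the outside, so it still works even if the in-process test runner would otherwise force an exit.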

0 views
Farid Zakaria 3 months ago

Bazel Knowledge: dive into unused_deps

The Java language implementation for Bazel has a great feature called strict dependencies – the feature enforces that all directly used classes are loaded from jars provided by a target’s direct dependencies. If you’ve ever seen the following message from Bazel, you’ve encountered the feature. The analog tool for removing dependencies which are not directly referenced is unused_deps . You can run this on your Java codebase to prune your dependencies to only those strictly required. That’s a pretty cool feature, but how does it work? 🤔 Turns out the Go code for the tool is relatively short, let’s dive in! I love learning the inner machinery of how the tools I leverage work. 🤓 Let’s use a simple example to explore the tool. The first thing the tool does is query which targets to look at , and it emits this to stderr, so that part is a little obvious. It performs a query searching for the common Java rule kinds, such as java_library or java_binary. Here is where things get a little more interesting . The tool emits an ephemeral Bazel repository in a temporary directory that contains a Bazel aspect . What is the aspect the tool injects into our codebase? The aspect is designed to emit additional files that contain the arguments to the compilation actions. If we inspect what this file looks like for the simple example I created , we see it’s the arguments to the Java compiler itself. Bazel uses a custom compiler plugin that will be relevant shortly. ☝️ How does the aspect get injected into our project? Well, after figuring out which targets to build via the query, unused_deps will build your target pattern, specify this additional dependency, and enable the aspect via the --aspects flag. If you are using Bazel 8+ and have WORKSPACE disabled, which is the default, you will need my PR#1387 to make it work. The end result after the build is that every Java target will have produced a params file in the output directory. Why did it go through such lengths to produce this file? 
The tool is trying to find the direct dependencies of each Java target. The tool searches the params file of each target for the line listing the dependencies that were needed to build it. QUESTION #1 : Why does the tool need to set up this aspect anyways? Bazel will already emit param files for each Java target that contain nearly identical information. The tool will then iterate through all these JAR files, open them up, and look at the manifest file within for the value of Target-Label, which is the Bazel target expression for this dependency. In this case we can see the desired value for our dependency. If you happen to use rules_jvm_external to pull in Maven dependencies, the ruleset will “stamp” the downloaded JARs, which means injecting the Target-Label entry into their manifest specifically to work with unused_deps [ ref ]. QUESTION #2 : Why does unused_deps go to such lengths to discover the labels of the direct dependencies of a particular target? Could this be replaced with a query command as well? 🕵️ For our target we have the following. After the labels of all the direct dependencies are known for each target, unused_deps will parse the jdeps file of each target, which is a binary protocol serialization of the schema found in deps.go . Using protoc we can inspect and explore the file. This is the super cool feature of Bazel integrating into the Java compiler. 🔥 Bazel invokes the Java compiler itself and will then iterate through all the symbols, via a provided symbol table, the compiler had to resolve. For each symbol, if the dependency is not from the direct list then it must have been provided through a transitive dependency. [ ref ] The presence of this implicit kind would actually trigger a failure for the strict Java dependency check if enabled. unused_deps then takes the list of the direct dependencies and keeps only the dependencies the compiler reported back as actually required to perform compilation. The set difference represents the set of targets that are effectively unused and can be reported back to the user for removal! 
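The final set-difference step can be sketched in a few lines of Python. The kind names (`EXPLICIT` for symbols resolved from a direct dependency, `IMPLICIT` for ones resolved transitively) follow my reading of the jdeps proto, and the labels here are fabricated for illustration:

```python
# direct dependencies declared on the target (recovered from the
# params file and the Target-Label manifest entries)
direct = {"//lib:a", "//lib:b", "//lib:c", "@maven//:guava"}

# per-dependency usage as recorded in the target's jdeps file:
# EXPLICIT means the compiler resolved a symbol from this direct
# dependency; IMPLICIT means it came in through a transitive one
jdeps = {
    "//lib:a": "EXPLICIT",
    "//lib:b": "IMPLICIT",
    "@maven//:guava": "EXPLICIT",
}

used = {label for label, kind in jdeps.items() if kind == "EXPLICIT"}
unused = direct - used
print(sorted(unused))  # candidates for removal: ['//lib:b', '//lib:c']
```

Note that `//lib:b` is flagged even though its symbols were seen: they arrived transitively, so the direct edge is unnecessary.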
✨ QUESTION #3 : There is a third type of dependency kind which I saw when investigating our codebase. I was unable to discern how to trigger it or what it represents. What I enjoy about Bazel is learning how you can improve developer experience and provide insightful tools when you integrate the build system deeply with the underlying language; unused_deps is a great example of this.

0 views
Farid Zakaria 3 months ago

Using Nix as a library

I have been actively trying to contribute to CppNix – mostly because using it brings me joy, and it turns out so does contributing. 🤗 Stepping into any new codebase can be overwhelming. You are trying to navigate new nomenclature, coding standards, tooling and overall architecture. Nix is over 20 years old and has its fair share of warts in its codebase. Knowledge of the codebase is bimodal: either very diffuse or consolidated in a few minds (i.e. @ericson2314 ). Thankfully everyone on the Matrix channel has been extremely welcoming. I have been actively following Snix , a modern Rust re-implementation of the components of the Nix package manager. I like the ideals from the project authors of communicating over well-defined API boundaries via separate processes and a library-first type of design. I was wondering, however, whether we could leverage CppNix as a library as well. 🤔 Is there a need to throw the baby out with the bath water? 👶 Turns out using Nix as a library is incredibly straightforward! To start, let’s create a development shell that will include our necessary packages: Nix itself (duh), a build tool and pkg-config. Adding pkg-config to our shell will initiate a setup hook for any package that contains a dev output and set up the necessary environment variables. This will be the mechanism with which our build tool finds the necessary shared objects and header files. We can also run pkg-config to see that they can be discovered. Let’s now create a trivial build file. Since we have our shell set up, we can declare “system dependencies” that we expect to be present, knowing that we are including these dependencies from our development shell. For our sample project I will recreate functionality that is already present in the Nix CLI. We will write a function that accepts a path, resolves its derivation and prints it as JSON. We can now build our project and run it! 🔥 That feels pretty cool! Lots of projects end up augmenting Nix by wrapping it with fancy bash scripts , however we can just as easily leverage it as a library and write native-first code. 
Learning the necessary functions to call is a little obtuse; however, I was able to reason through the necessary APIs by looking at unit-tests in the repository. What idea do you want to leverage Nix for, but have maybe put off since you thought doing it on top of Nix would be too hacky? Special thanks to @xokdvium who helped me through some learnings on how to leverage Nix as a library. 🙇

0 views
Farid Zakaria 3 months ago

GitHub Code Search is the real MVP

There is endless hype about the productivity boon that LLMs will usher in. While I am amazed at the utility offered by these superintelligent LLMs, at the moment (August 2025) I remain bearish on these tools having any meaningful impact on productivity, especially for production-grade codebases where correctness, maintainability, and security are paramount. They are clearly helpful for exploring ideas or any goal where the code produced may be discarded at the end. Thinking about how much promise of productivity we might gain from this tool had me reflecting on what other changes in the past 5 years had already benefited me, and a clear winner stands out: GitHub’s code search via cs.github.com . Pre-2020, code search in the open-source domain never really had a good solution, given the diaspora of various hosting platforms. If you’ve worked in any large corporate environment (Amazon, Google, Meta etc…) you might have already had exposure to the powers of an incredible code search. The lack of such a tool for public codebases was a limitation we simply worked around. This is partly why third-party libraries were consolidated into well-known projects like Apache or established companies such as Google’s Guava . GitHub capitalized on the consolidation of code on its platform with the release of its revamped code search. Made generally available in May 2023, the new engine added powerful features like symbol search and the ability to follow references. The productivity win is clear to me, even with the introduction of LLMs. I visit cs.github.com daily, more frequently and with more interaction than any of the LLMs available to me. Finding code written by other humans is fun , and for some reason, more joyful to read. There is a certain joy in finding solutions to problems you may be facing that were authored by another human. 
This psychological effect may diminish as the code I’m wading through begins to tilt toward AI-generated content. But for now, the majority of the code I’m viewing still subjectively looks human-authored. I also tend to work in niche areas such as NixOS or Bazel that don’t have a large corpus of material online, so the results from the LLM tend to be more disappointing. If given a Sophie’s choice between GitHub code search and LLMs, strictly for the purpose of code authorship, I would pick code search as of today. Humans easily adapt to their environment, a phenomenon known as the hedonic treadmill. As we all get excited for the incoming technology of generative AI, let’s take a moment to reflect on the already amazing contributions to engineering we have become accustomed to thanks to a wonderful code search.

0 views
Farid Zakaria 3 months ago

Angle brackets in a Nix flake world

At DEFCON33 , the Nix community had its first-ever presence via nix.vegas and I ended up in a fun conversation with tomberek 🙌. “What fun things can we do with angle brackets given the eventual deprecation of NIX_PATH?” The actual 💡 was from tomberek and this is a demonstration of what that might look like without necessitating any changes to CppNix itself. As a very worthwhile aside, the first-time presence of the Nix community at DEFCON was fantastic and I am extra appreciative to numinit and RossComputerguy 🙇. The badges handed out were so cool. They have strobing LEDs but can also act as a substituter for the Nix infra that was set up. Okay, back to the idea 💁. Importing nixpkgs via the NIX_PATH through the angle-bracket syntax has been a long-standing wart on the reproducibility promises of Nix. There is a really great article about all the problems with this approach to bringing in projects on nix.dev , for those who are still leveraging it. With the eventual planned removal of support for NIX_PATH, we are now presented with an opportunity for some new functionality in Nix, namely the angle brackets that can be reconstituted for a new purpose. Looks like others are already starting to think about this idea. The project htmnix demonstrates the functionality of writing pure HTML but evaluating it with Nix 😂. For something potentially more immediately useful, how about giving quicker access to the attributes of the current flake? 🤔 A common pattern that has emerged is to inject self and inputs into specialArgs so that they are available to modules in NixOS or home-manager. This lets you add the modules from your flake or reference the packages in your flake from within the modules themselves. That seems nice but also unnecessary. Why not leverage the angle brackets for the same purpose? ☝️ That would enable the equivalent example without needing to wire up specialArgs. Is this possible today? Yes! Whenever Nix sees angle brackets it desugars the expression to a call to __findFile. 
If we offer a variant of __findFile in scope, Nix will call our implementation rather than the default implementation. Let’s implement a variant that utilizes builtins.getFlake to return the current flake attributes. Our goal is to write something as simple as the following and have the contents within the angle brackets be treated as an attribute path of the flake. What do we have to do to get this to work? Well, we need to provide our own version of __findFile . We write a function that trivially splits the contents within the angle brackets to access the attrset of the flake as returned by builtins.getFlake . There is some additional magic with scopedImport 🪄 which is not documented . It allows giving a different base set of variables, via a provided attrset, to use for variables. This is how we can override __findFile in all subsequent files. So does this even work? Yes! 🔥 With the caveat that we had to provide --impure, since getting the current flake via builtins.getFlake requires it . This is a pretty ergonomic way to access the attributes of the current flake automatically, without having us all go through the same setup for what is amounting to common best practice. The need for --impure is a bit of a bummer, although this is a pretty neat improvement. There could be a new builtin which automatically provides the context of the current flake and therefore could be pure. I got some wonderful feedback from eljamm via the discourse post that we can leverage another approach and avoid having to use builtins.getFlake . We now don’t need to provide --impure 👌 and we get all the same fun new ergonomic way to access flake attributes.
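A toy Python model of the desugaring may help build intuition: the contents of the angle brackets become an argument to a lookup function, and our override just resolves that text as an attribute path into the flake. The `flake` dict and resolver below are illustrative stand-ins, not Nix code:

```python
from functools import reduce

# a toy stand-in for the attrset returned by builtins.getFlake
flake = {
    "nixosModules": {"sshd": "<module sshd>"},
    "packages": {"x86_64-linux": {"hello": "<derivation hello>"}},
}

def find_file(lookup: dict, name: str):
    # toy analog of overriding __findFile: treat the text between the
    # angle brackets as a dotted attribute path into the flake
    return reduce(lambda acc, key: acc[key], name.split("."), lookup)

assert find_file(flake, "nixosModules.sshd") == "<module sshd>"
assert find_file(flake, "packages.x86_64-linux.hello") == "<derivation hello>"
```

The real version does the same walk over the flake’s attrset, just expressed in Nix and installed into scope so every angle-bracket expression routes through it.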

0 views
Farid Zakaria 4 months ago

Bazel Knowledge: Mixing and matching how to build third_party is lunacy

Have you ever found your JAR full of mixed bytecode versions and wondered why? The original intent of Bazel , and its peer group (i.e. Buck and Pants), is to build everything from source and to consolidate into large-ish repositories. These are the practices of the companies (e.g. Google and Meta) who built these tools, and therefore the tools are originally purpose-built for this use-case. Building everything from source is very foreign to how most developers experience development, especially in open source – unless you are a fan of NixOS . This makes total sense, as the cost of setting up a mono-repository for every small effort would be a Herculean task. In order to “meet developers where they are”, Bazel itself has adopted a third-party registry system ( https://registry.bazel.build/ ) and rules for individual languages have emerged to make interoperability with pre-existing language package managers simpler, such as rules_jvm_external for Java. Unfortunately, as the use of Bazel at $DAYJOB$ continues to expand, I am beginning to see the costs and fallout of this popular approach. There is no set standard across rules for when to build from source and when to pull from a third-party artifact repository such as Maven Central, so one may find themselves with, at best, confusing builds and, at worst, broken code. Let’s take a look at an example to illustrate this point. If you’d like to see all the source online, I’ve published it to fzakaira/reproduction#protobuf-bytecode . In this example I would like to leverage a new-ish JDK (e.g. JDK21), default the language version to 11, but build a particular slice of my code at a different bytecode level (Java 14). This might seem like a good fit for transitions , however I found those to be a big complexity addition to the codebase, and if you can avoid them that might be best. 🧠 Let’s set up our JDK21 toolchain and our language version. 
Now let’s modify the toolchain such that our particular slice of the codebase builds with a different bytecode target. Here is where it gets interesting : let’s build a simple binary and check all the bytecode within it. I wrote a simple handy tool, check-jar-versions , that can quickly list out all the bytecode versions within a JAR file. We see some classes compiled at Java 14 for our code, but probably unsurprisingly we get Java 11 as well 😲. Why? This is because the proto rules automatically include dependencies on the protobuf runtime for the compiled Java code. Okay, well, to be honest, since I have a pretty basic application that is a little unsurprising, since my assumption is I’ve built everything from source and my override clearly doesn’t catch the runtime library. Where this gets trickier to find and more subtle is when you mix in prebuilt artifacts from Maven, which is popular via rules_jvm_external . We can demo that by adding a single Maven artifact to our dependencies. Now if that dependency is earlier in the graph we get different results. All those Java 11 files are now shadowed by the ones from the prebuilt protobuf JAR, which are at the Java 8 bytecode level. 🤯 We have established a pattern in our repository where we have decided to use prebuilt JARs for our third-party dependencies. Even if we don’t explicitly depend on the prebuilt Maven protobuf JAR, it may come in transitively from another dependency. The problem, however, is that our dependent ruleset @protobuf – the same is true for @grpc-java – chose to build from source, and therefore we get different results depending on the order of the dependencies in the build. It’s even more confusing since the ecosystem mixes & matches the two types [ ref ]. Mixing prebuilt jars and source-built targets without discipline creates confusing and inconsistent builds. Bazel doesn’t protect you — it just builds what you tell it to. The fact that class files may be shadowed by others in the graph can hide the problem and lead to surprising failure modes. 
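A minimal version of such a bytecode check fits in a few lines of Python: every class file starts with the `0xCAFEBABE` magic followed by minor and major version numbers (52 = Java 8, 55 = Java 11, 58 = Java 14, 65 = Java 21). This is a sketch of the idea behind check-jar-versions, not the actual tool:

```python
import struct
import zipfile

def class_major_version(data: bytes) -> int:
    # class-file header: u4 magic, u2 minor_version, u2 major_version
    magic, _minor, major = struct.unpack(">IHH", data[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a class file")
    return major

def jar_bytecode_levels(jar) -> dict:
    """Count classes per class-file major version in a JAR
    (accepts a path or a file object, since a JAR is just a zip)."""
    levels = {}
    with zipfile.ZipFile(jar) as zf:
        for name in zf.namelist():
            if name.endswith(".class"):
                major = class_major_version(zf.read(name))
                levels[major] = levels.get(major, 0) + 1
    return levels
```

Running this over a deploy JAR makes the mixing immediately visible, e.g. a map like `{58: 12, 55: 40, 52: 200}` showing Java 14, 11, and 8 classes side by side.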
Ok, so I sort of understand the problem. What can I do about it? 🤓 Pick an idiom and try to stick to it ! You might have to go out of your way to do so. For our repository, we pull in too much from Maven Central in our dependency graph, so we’ve decided to make sure all our rulesets leverage the same prebuilt JARs. In one case it meant creating a new target that uses the prebuilt JAR; in another, we had to patch the rules to do the equivalent.

0 views