Posts in Java (20 found)
Brain Baking 2 days ago

My (Retro) Desk Setup in 2025

A lot has happened since the desk setup post from March 2024: I got kicked out of my usual cosy home office upstairs as it was being rebranded into our son's bedroom. We've been trying to fit the office space into the rest of the house by exploring different alternatives: clear a corner of our bedroom and shove everything in there, cut down on stuff and integrate it into the living room, … None of the options felt particularly appealing to me. I grew attached to the upstairs place and didn't want to lose the skylight. And then we renovated our home, resulting in more shuffling around of room designations: the living room migrated to the new section with high glass windows to better connect with the back garden. That logically meant I could claim the vacant living room space. Which I did: my home office setup since May 2025.

Compared to the old setup, quite a few things changed. First, it's clear that the new space is much more roomy. But that doesn't automatically mean I'm able to fit more stuff into it. After comparing both setups, you'll probably wonder where most of my retro hardware went off to: only the 486 made it into the corner on the left. I first experimented with replicating the same setup downstairs, resulting in a very long desk shoved under the window containing the PC towers and screens. That worked, as again there's enough space, but at the same time it didn't work at all: putting a lot of stuff in front of the window not only blocks the view, it also makes the office feel cramped and cluttered. That is why the desk is now split into two. The WinXP and Win98 machines have been temporarily stashed away in a closet as I still have to find a way to fit the third desk somewhere at the back (not pictured). Currently, a stray cupboard from the old living room is refusing to let go. We have some ideas to better organize the space, but at the moment I can't find the energy to make it happen. I haven't even properly reconnected the 486 tower. The messy cables in the photo have been neatly tucked away by now; at least that's something.

Next, since I also have more wall space, I moved all board games into a new Kallax in the new space (pictured on the left). There's still ample space left to welcome new board games, which was becoming a big problem in the old shelf in the hallway that now holds the kids' games. On the opposite side of the wall (not pictured), I've mounted the Billy bookcases from upstairs that now bleed into the back wall (pictured on the right). These two components are new: the small one is currently holding Switch games and audio CDs, and the one on the far right is still mostly empty except for fountain pen ink on the top shelf. The problem with filling all that wall space is that there's almost none left to decorate with a piece of art. Fortunately, the Monkey Island posters survived the move, but I was hoping to be able to put up something else. The big window doesn't help here: the old space's skylight allowed me to optimize the wall space. The window is both a blessing and a curse. Admittedly, it's very nice to be able to stare outside in between the blue screen sessions, especially in spring/summer when everything is bright green.

The new space is far from finished. I intend to put a table down there next to the board game shelf so that noisy gaming sessions don't bother the people in the living room. The retro hardware pieces deserve a permanent spot and I'm bummed out that some of them had to be (hopefully temporarily) stowed away.
A KVM switch won't help here as I already optimized the monitor usage (see the setup of previous years). My wife suggested throwing a TV in there to connect the SNES and GameCube, but the books are eating up all the wall space and I don't want the office to degrade into a cluttered mess. I'm not even sure whether the metre-long desk is worth it for just a laptop and a second screen, compared to the one I used before. The relax chair now used for nightly baby feeds still needs to find its way back here as well. I imagine that in a year things will look different yet again. Hopefully, by then, it will feature more retroness. Related topics: setup. By Wouter Groeneveld on 12 October 2025. Reply via email.

Kix Panganiban 1 week ago

Python feels sucky to use now

I've been writing software for over 15 years at this point, and most of that time has been in Python. I've always been a Python fan. When I first picked it up in uni, I felt it was fluent, easy to understand, and simple to use -- at least compared to other languages I was using at the time, like Java, PHP, and C++. I've kept myself mostly up to date with "modern" Python -- think current tooling and syntax, and strict type hints almost everywhere. For the most part, I've been convinced that it's fine. But lately, I've been running into frustrations, especially with async workflows and type safety, that made me wonder if there's a better tool for some jobs. And then I had to help rewrite a service from Python to Typescript + Bun. I'd stayed mostly detached from Typescript before, only dabbling in non-critical path code, but oh, what a different and truly joyful world it turned out to be to write code in. Here are some of my key observations:

Bun is fast. It builds fast -- including installing new dependencies -- and runs fast, whether we're talking runtime performance or the direct loading of TS files. Bun's speed comes from its use of JavaScriptCore instead of V8, which cuts down on overhead, and its native bundler and package manager are written in Zig, making dependency resolution and builds lightning-quick compared to npm or even Python's pip with virtual environments. When I'm iterating on a project, shaving off seconds (or minutes) on installs and builds is a game-changer -- no more waiting around for dependencies to resolve or virtual envs to spin up. And at runtime, Bun directly executes Typescript without a separate compilation step. This just feels like a breath of fresh air for developer productivity.

Type annotations and type-checking in Python still feel like mere suggestions, whereas they're fundamental in Typescript. This is especially true when defining interfaces or using inheritance -- compared to ABCs (Abstract Base Classes) and Protocols in Python, which can feel clunky. In Typescript, type definitions are baked into the language - I can define an interface or a type alias with precise control over the shapes of data, and the compiler catches mismatches while I'm writing (provided that I've enabled it in my editor), enforcing this rigorously. In Python, even with a strict type-checker configured, type hints are optional and often ignored by the runtime, leading to errors that only surface when the code runs. Plus, Python's approach to interfaces via ABCs or Protocols feels verbose and less intuitive -- while Typescript's type system feels like a better mental model for reasoning about code.

About 99% of web-related code is async. Async is first-class in Typescript and Bun, while it's still a mess in Python. Sure -- Python's asyncio and the list of packages supporting it have grown, but it often feels forced and riddled with gotchas and pitfalls. In Typescript, async/await is a core language feature, seamlessly integrated with the event loop in environments like Node.js or Bun. Promises are a natural part of the ecosystem, and most libraries are built with async in mind from the ground up. Compare that to Python, where async/await was bolted on later (introduced in 3.5), and the ecosystem (in 2025!) is still only slowly catching up. I've run into issues with libraries that don't play nicely with asyncio, forcing me to mix synchronous and asynchronous code in awkward ways.

A sub-point: many Python patterns still push for workers and message queues -- think RQ and Celery -- when a simple async function in Typescript could handle the same task with less overhead. In Python, if I need to handle background tasks or I/O-bound operations, the go-to solution often involves spinning up a separate worker process with something like Celery, backed by a broker like Redis or RabbitMQ. This adds complexity -- now I'm managing infrastructure, debugging message serialization, and dealing with potential failures in the queue. In Typescript with Bun, I can often just write an async function, maybe wrap it in a promise or use a lightweight queuing library if I need queuing, and call it a day. For a recent project, I replaced a Celery-based task system with a simple async setup in Typescript, cutting down deployment complexity and reducing latency since there's no broker middleman. It's not that Python can't do async -- it's that the cultural and technical patterns around it often lead to over-engineering for problems that Typescript, in my opinion, solves more elegantly.

This experience has me rethinking how I approach projects. While I'm not abandoning Python -- it's still my go-to for many things -- I'm excited to explore more of what Typescript and Bun have to offer. It's like discovering a new favorite tool in the shed, and I can't wait to see what I build with it next.


Principles and Methodologies for Serial Performance Optimization

Principles and Methodologies for Serial Performance Optimization. Sujin Park, Mingyu Guan, Xiang Cheng, and Taesoo Kim. OSDI'25.

This paper is a psychological trojan horse for computer nerds of a certain vintage. Every paragraph of sections 3 and 4 inflates the ego a bit more. One arrives at section 5 feeling good about one's performance optimization skillset, and then one learns that these skills can be replaced by an LLM. A faint hissing sound reaches one's ears as one's ego deflates into a shriveled piece of plastic on the ground.

Eight Methodologies

The authors reviewed 477 papers containing specific instances of serial performance optimization and boiled the techniques down into eight categories. Serial is the operative word here: this paper is about optimizing portions of code that cannot be parallelized. However, some of the techniques are applicable in computations that comprise a serial portion (the critical path) and a parallel portion. Here are the techniques:

Batching: amortizing a fixed cost over many items. For example, refactoring code so that some computation can be hoisted out of a (per-item) loop.

Caching: store the results of a computation in memory so that they can be used later on. Memoization is a good example (a small Java sketch of this appears at the end of this post).

Precomputing: compute something earlier than is necessary (possibly speculatively). This works in programs which alternate between serial and parallel portions. If the precomputation can be done during a parallel portion, then it can be thought of as "free" because it is off of the critical path.

Deferring: act like one of my high schoolers: don't do work now when it could be done later. This works great in late spring, when a teacher realizes they have assigned too much work for the year, so they cancel an assignment that most students (not mine) have already completed. Deferring interacts with other techniques: similar to precomputing, deferring can move work off of the critical path, and deferring can enable batching by deferring work until a large batch has been accumulated.

Compute a quick and dirty approximation to the right answer rather than the exactly right answer.

Make a generic component more efficient for a specific use case. For example, a library user could pass hints at runtime which give the library more information to make tradeoffs. Or use profile-guided optimization to enable the compiler to make better decisions when compiling the generic library.

Use a hardware accelerator, to avoid paying the Turing Tax.

Chip away at performance inefficiencies caused by the abstractions in a layered software architecture. For example, DPDK allows applications to bypass many networking abstractions.

After finishing section 4 of the paper, I felt pretty good. My ego happily agreed with a world view that says that serial performance optimization can be expressed in terms of 8 "basis vectors", all of which I had extensive experience with. And then came section 5. The authors fine-tuned GPT-4o with a dataset derived from analyzing SOSP and OSDI papers. Each item in the dataset comprises a problem description, observations, and solutions (in terms of the 8 techniques described earlier). The authors made data and scripts available here. The fine-tuned model is called SysGPT. Table 4 shows example inputs (problems + observations), the output from GPT-4, the output from SysGPT, and the actual solution from a real-world paper.
Here is an example (source: https://www.usenix.org/conference/osdi25/presentation/park-sujin).

Table 5 has quantitative results, where each model is only asked to choose a subset of the 8 optimization strategies for a given problem; N represents the number of example problems given to the baseline GPT-4o model in a prompt (source: https://www.usenix.org/conference/osdi25/presentation/park-sujin).

Dangling Pointers

It would be interesting to extend these recipes to also include problems which can be parallelized. This paper assumes that the problem and observations are produced by humans. It would be fascinating to see how much of that can be automated. For example, a model could have access to source code and profiling information, and output a set of observations about system performance before optimization.
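To make the caching category concrete, here is a minimal, illustrative Java sketch of memoization; the class and names are mine, not from the paper:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Caching: pay for an expensive computation once, then serve repeats from memory.
final class Memoizer<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> compute;

    Memoizer(Function<K, V> compute) {
        this.compute = compute;
    }

    V get(K key) {
        // computeIfAbsent runs the underlying function only on a cache miss.
        return cache.computeIfAbsent(key, compute);
    }

    public static void main(String[] args) {
        Memoizer<Integer, Double> slowSqrt = new Memoizer<>(n -> {
            try { Thread.sleep(200); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            return Math.sqrt(n);
        });
        System.out.println(slowSqrt.get(81)); // slow: computed
        System.out.println(slowSqrt.get(81)); // fast: served from the cache
    }
}
```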

Cassidy Williams 3 weeks ago

2000 Poops

Flash back to Spring 2020, when we were all confused and uncertain about what the world was going to look like, and unsure of how we would stay connected to each other. One of my cousins texted our cousin group chat mentioning the app Poop Map as a cheeky (heh) way of keeping up with the fam. We started a family league, and it was honestly pretty great. We'd congratulate each other on our 5-star poops, and mourn the 1-stars. Over time I made other leagues with friends online and offline, and it was really fun. I even talked about it on Scott Hanselman's podcast when he asked about how to maintain social connections online (if you wanna hear about it, listen at the 11-minute mark of the episode). Eventually, people started to drop off the app, because… it's dumb? Which is fair. It's pretty dumb. But alas, I pride myself on being consistent, so I kept at it. For years. The last person I know on the app is my sister-in-law's high school friend, also known by her very apt username. She and I have pretty much no other contact except for this app, and yet we've bonded. 2000 poops feels like a good place to stop. With 12 countries covered around the world and 45 achievements in the app (including "Are you OK?" courtesy of norovirus, and "Punctual Pooper" for going on the same day for 12 months in a row), I feel good about saying goodbye. My mom is also really happy I'm stopping. Wonder why? Anyway, goodbye, Poop Map, and goodbye to all the fun usernames of the friends along the way (mine included), and of course hers. Also, before you go, here's a fun data visualization I made of all my entries! Smell ya later!

Neil Madden 1 month ago

Rating 26 years of Java changes

I first started programming Java at IBM back in 1999 as a Pre-University Employee. If I remember correctly, we had Java 1.1.8 installed at that time, but were moving to Java 1.2 ("Java 2"), which was a massive release—I remember engineers at the time grumbling that the ever-present "Java in a Nutshell" book had grown to over 600 pages. I thought I'd take a look back at 26 years of Java releases and rate some of the language and core library changes (Java SE only) that have occurred over this time. It's a very different language to what I started out with! I can't possibly cover every feature of those releases, as there are just way too many. So I'm just going to cherry-pick some that seemed significant at the time, or have been in retrospect. I'm not going to cover UI- or graphics-related stuff (Swing, Java2D etc), or VM/GC improvements. Just language changes and core libraries. And obviously this is highly subjective. Feel free to put your own opinions in the comments! The descriptions are brief and not intended as an introduction to the features in question: see the links from the Wikipedia page for more background. NB: later features are listed from when they were first introduced as a preview.

The Collections Framework: before the collections framework, there were just raw arrays, Vector, and Hashtable. It gets the job done, but I don't think anyone thinks the Java collections framework is particularly well designed. One of the biggest issues was a failure to distinguish between mutable and immutable collections, strange inconsistencies like why Iterator has a remove() method (but not, say, update or insert), and so on. Various improvements have been made over the years, and I do still use it in preference to pulling in a better alternative library, so it has stood the test of time in that respect. 4/10

The assert keyword: I remember being somewhat outraged at the time that they could introduce a new keyword! I'm personally quite fond of asserts as an easy way to check invariants without having to do complex refactoring to make things unit-testable, but that is not a popular approach. I can't remember the last time I saw an assert in any production Java code. 3/10

Regular expressions: Did I really have to wait 3 years to use regex in Java? I don't remember ever having any issues with the implementation they finally went for. The Matcher class is perhaps a little clunky, but gets the job done. Good, solid, essential functionality. 9/10

"New" I/O (NIO): Provided non-blocking I/O for the first time, but really just a horrible API (still inexplicably using 32-bit signed integers for file sizes, limiting files to 2GB, confusing interface). I still basically never use these interfaces except when I really need to. I learnt Tcl/Tk at the same time that I learnt Java, and Java's I/O always just seemed extraordinarily baroque for no good reason. It has barely improved in two and a half decades. 0/10

Also notable in this release were the new crypto APIs: the Java Cryptography Extensions (JCE) added encryption and MAC support to the existing signatures and hashes, and we got JSSE for SSL. Useful functionality, dreadful error-prone APIs. 1/10

Absolutely loads of changes in this release (Java 5). This feels like the start of modern Java to me.

Generics: as Go discovered on its attempt to speed-run Java's mistakes all over again, if you don't add generics from the start then you'll have to retrofit them later, badly. I wouldn't want to live without them, and the rapid and universal adoption of them shows what a success they've been. They certainly have complicated the language, and there are plenty of rough edges (type erasure, reflection, etc.), but God I wouldn't want to live without them. 8/10
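For anyone who missed the pre-generics era, a small before/after sketch (my own example, not from the original post):

```java
import java.util.ArrayList;
import java.util.List;

class GenericsBeforeAndAfter {
    public static void main(String[] args) {
        // Pre-generics (Java 1.4 and earlier): a raw List holds Objects,
        // and the cast is only checked at runtime.
        List rawNames = new ArrayList();
        rawNames.add("Duke");
        String first = (String) rawNames.get(0); // a ClassCastException waiting to happen

        // With generics (Java 5+): the element type is checked at compile time,
        // and no cast is needed at the use site.
        List<String> names = new ArrayList<>();
        names.add("Duke");
        String second = names.get(0);

        System.out.println(first + " " + second);
    }
}
```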
Annotations: sometimes useful, sometimes overused. I know I've been guilty of abusing them in the past. At the time it felt like they were ushering in a new age of custom static analysis, but that doesn't really seem to be used much. Mostly just used to mark things as deprecated or when overriding a method. Meh. 5/10

Autoboxing: there was a time when, if you wanted to store an integer in a collection, you had to manually convert to and from the primitive int type and the Integer "boxed" class. Such conversion code was everywhere. Java 5 got rid of that, by getting the compiler to insert those conversions for you. Brevity, but no less inefficient. 7/10

Enums: I'd learned Haskell by this point, so I couldn't see the point of introducing enums without going the whole hog and doing algebraic datatypes and pattern-matching. (Especially as Scala launched about this time.) Decent feature, and a good implementation, but underwhelming. 6/10

Vararg methods: these have done quite a lot to reduce verbosity across the standard library. A nice small improvement that's been a good quality-of-life enhancement. I still never really know when to put @SafeVarargs annotations on things though. 8/10

The for-each loop: cracking, use it all the time. Still not a patch on Tcl's foreach (which can loop over multiple collections at once), but still very good. Could be improved, and has been somewhat replaced by Streams. 8/10

Static imports: again, a good simple change. I probably would have avoided adding * imports for statics, but it's quite nice for DSLs. 8/10

Doug Lea's java.util.concurrent etc.: these felt really well designed. So well designed that everyone started using them in preference to the core collection classes, and they ended up back-porting a lot of the methods. 10/10

After the big bang of Java 5, Java 6 was mostly performance and VM improvements, I believe, so we had to wait until 2011 for more new language features.

Strings in switch: seems like a code smell to me. Never use this, and never see it used. 1/10

Try-with-resources: made a huge difference in exception safety. Combined with the improvements in exception chaining (so root cause exceptions are not lost), this was a massive win. Still use it everywhere. 10/10

Diamond operator for type parameter inference: a good minor syntactic improvement to cut down the visual noise. 6/10

Binary literals and underscores in literals: again, minor syntactic sugar. Nice to have, rarely something I care about much. 4/10

Path and Filesystem APIs: I tend to use these over the older File APIs, but just because it feels like I should. I couldn't really tell you if they are better or not. Still overly verbose. Still insanely hard to set file permissions in a cross-platform way. 3/10

Lambdas: somewhat controversial at the time. I was very in favour of them, but only use them sparingly these days, due to ugly stack traces and other drawbacks. Named method references provide most of the benefit without being anonymous. Deciding to exclude checked exceptions from the various standard functional interfaces was understandable, but also regularly a royal PITA. 4/10
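A tiny sketch of that lambda-versus-method-reference point (my own example, not the author's):

```java
import java.util.List;

class MethodRefDemo {
    public static void main(String[] args) {
        List<String> names = List.of("Ada", "Grace", "Barbara");

        // Lambda: works, but stack traces show a synthetic lambda$main$0 frame.
        names.forEach(name -> System.out.println("Hello, " + name));

        // Named method reference: same behaviour, but the frame in a stack
        // trace is a real, named method, which is easier to debug.
        names.forEach(MethodRefDemo::greet);
    }

    static void greet(String name) {
        System.out.println("Hello, " + name);
    }
}
```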
Streams: Ah, streams. So much potential, but so frustrating in practice. I was hoping that Java would just do the obvious thing and put filter/map/reduce methods onto Collection and Map, but they went with this instead. The benefits of functional programming weren't enough to carry the feature, I think, so they had to justify it by promising easy parallel computing. This scope creep enormously over-complicated the feature, makes it hard to debug issues, and yet I almost never see parallel streams being used. What I do still see quite regularly is resource leaks from people not realising that the stream returned from Files.lines() has to be close()d when you're done—but doing so makes the code a lot uglier. Combine that with ugly hacks around callbacks that throw checked exceptions, the non-discoverable API (where are the static helper functions I need for this method again?), and the large impact on lots of very common code, and I have to say I think this was one of the largest blunders in modern Java. I blogged what I thought was a better approach 2 years earlier, and I still think it would have been better. There was plenty of good research that different approaches were better, since at least Oleg Kiselyov's work in the early noughties. 1/10

Java Time: Much better than what came before, but I have barely had to use much of this API at all, so I'm not in a position to really judge how good it is. Despite knowing how complex time and dates are, I do have a nagging suspicion that surely it doesn't all need to be this complex? 8/10

Modules: I still don't really know what the point of all this was. Enormous upheaval for minimal concrete benefit that I can discern. The general advice seems to be that modules are (should be) an internal detail of the JRE and best ignored in application code (apart from when they spuriously break things). Awful. -10/10 (that's minus 10!)

jshell: cute! A REPL! Use it sometimes. Took them long enough. 6/10

Then came the start of time-based releases, and a distinct ramp-up of features from here on, trying to keep up with the kids.

Local type inference ("var"): Some love this, some hate it. I'm definitely in the former camp. 9/10

New HTTP Client: replaced the old URL.openStream() approach by creating something more like Apache HttpClient. It works for most purposes, but I do find the interface overly verbose. 6/10

This release also added TLS 1.3 support, along with djb-suite crypto algorithms. Yay. 9/10

Switch expressions: another nice mild quality-of-life improvement. Not world changing, but occasionally nice to have. 6/10

Text blocks: on the face of it, what's not to like about multi-line strings? Well, apparently there's a good reason that injection attacks remain high on the OWASP Top 10, as the JEP introducing this feature seemed intent on getting everyone writing SQL, HTML and JavaScript using string concatenation again. Nearly gave me a heart attack at the time, and still seems like a pointless feature. Text templates (later) are trying to fix this, but seem to be currently in limbo. 3/10

Pattern matching in instanceof: a little bit of syntactic sugar to avoid an explicit cast. But didn't we all agree that using instanceof was a bad idea decades ago? I'm really not sure who was doing the cost/benefit analysis on these kinds of features. 4/10

Records: about bloody time! Love 'em. 10/10

Better error messages for NullPointerExceptions: lovely. 8/10

Sealed classes: in principle I like these a lot. We're slowly getting towards a weird implementation of algebraic datatypes. I haven't used them very much so far. 8/10
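Since records, sealed classes, and the pattern-matching switch discussed below keep circling the same algebraic-datatype idea, here is a small illustrative sketch of how they combine (my example, not the author's):

```java
// A sealed hierarchy plus records gives Java a workable approximation of an
// algebraic datatype, and pattern-matching switch consumes it exhaustively.
sealed interface Shape permits Circle, Rectangle {}
record Circle(double radius) implements Shape {}
record Rectangle(double width, double height) implements Shape {}

class Shapes {
    static double area(Shape shape) {
        // No default branch: the compiler knows the sealed hierarchy is covered.
        return switch (shape) {
            case Circle c -> Math.PI * c.radius() * c.radius();
            case Rectangle r -> r.width() * r.height();
        };
    }

    public static void main(String[] args) {
        System.out.println(area(new Circle(2.0)));
        System.out.println(area(new Rectangle(3.0, 4.0)));
    }
}
```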
EdDSA signatures: again, a nice little improvement in the built-in cryptography. Came with a rather serious bug though… 8/10

Vector (SIMD) API: this will be great when it is finally done, but it is still baking several years later. ?/10

Pattern matching switch: another piece of the algebraic datatype puzzle. Seems somehow more acceptable than instanceof, despite being largely the same idea in a better form. 7/10

UTF-8 by default: Fixed a thousand encoding errors in one fell swoop. 10/10

Record patterns: an obvious extension, and I think we're now pretty much there with ADTs? 9/10

Virtual threads: being someone who never really got on with async/callback/promise/reactive stream-based programming in Java, I was really happy to see this feature. I haven't really had much reason to use them in anger yet, so I don't know how well they've been done. But I'm hopeful! ?/10

String templates: these are exactly what I asked for in A few programming language features I'd like to see, based on E's quasi-literal syntax, and they fix the issues I had with text blocks. Unfortunately, the first design had some issues, and so they've gone back to the drawing board. Hopefully not for too long. I really wish they'd not released text blocks without this feature. 10/10 (if they ever arrive).

Sequenced collections: a simple addition that adds a common super-type to all collections that have a defined "encounter order": lists, deques, sorted sets, etc. It defines convenient getFirst() and getLast() methods and a way to iterate items in the defined order or in reverse order. This is a nice unification, and plugs what seems like an obvious gap in the collections types, if perhaps not the most pressing issue? 6/10

Wildcards in patterns: adds the familiar syntax from Haskell and Prolog etc. of using _ as a non-capturing wildcard variable in patterns when you don't care about the value of that part. 6/10

Simplified console applications: Java finally makes simple programs simple for beginners, about a decade after universities stopped teaching Java to beginners… Snark aside, this is a welcome simplification. 8/10

This release also adds support for KEMs, although in the simplest possible form only. Meh. 4/10

The only significant change in this release is the ability to have statements before a call to super() in a constructor. Fine. 5/10

Primitive types in patterns: plugs a gap in pattern matching. 7/10

Markdown javadoc comments: Does anyone really care about this? 1/10

The main feature here from my point of view as a crypto geek is the addition of post-quantum cryptography in the form of the newly standardised ML-KEM and ML-DSA algorithms, and support in TLS.

Stable values: this is essentially support for lazily-initialised final variables. Lazy initialisation is often trickier than it should be in Java, so this is a welcome addition. Remembering Alice ML, I wonder if there is some overlap between the proposed StableValue and a Future? 7/10?

PEM encoding of cryptographic objects is welcome from my point of view, but someone will need to tell me why this is not just Base64? Decoding support is useful though, as that's a frequent reason I have to grab Bouncy Castle still. 7/10

Well, that brings us pretty much up to date. What do you think? Agree, disagree? Are you a passionate defender of streams or Java modules? Have at it in the comments.

Dan Moore! 1 month ago

Career Leverage as a Developer

I was recently on the “I’m a Software Engineer, What’s Next?” podcast. You can view the whole podcast episode, and you can subscribe and learn more about the podcast as well. (You can see all my podcast appearances.) We covered a lot of interesting ground, but one thing we talked about was undifferentiated but scary problems. When you find one of these in the business world, that makes for a good software company. FusionAuth is one example of this. There, we focus on authentication.

Authentication is undifferentiated because: most online apps need it; it's not a competitive advantage for most applications; and there are well known standards (OIDC, SAML, OAuth).

Authentication is scary and risky because: it impacts conversion and user experience; the risk of user data being exposed impacts reputation and bottom line; there's jargon; and there's security risk.

Of course the deeper you get into any area, the less scary it seems, but for the average developer, I think authentication is imposing. There are other undifferentiated but scary areas of software development, including performance, legacy code upkeep, real time systems, and distributed systems.

But one insight that came out of the discussion is that this applies to your career as well. If you focus on undifferentiated scary problems, then you have a lucrative career ahead of you, because the problem is important to solve (because it is scary) and transfers between companies (because it is undifferentiated). If you focus on differentiated problems, such as a scary area of the code base that is unique to the project, you'll be tied to a particular company. If you focus on safe problems, you can switch between companies but you're unlikely to make a lot of money, because the problems you are working on won't be that important.

For new developers, I wouldn't recommend immediate specialization into a scary, undifferentiated problem. There's an explore-versus-exploit continuum in careers, and early on, exploration is crucial. You have to find what you are interested in. But at some point, choosing an undifferentiated scary problem and solving it in a relatively unique way gives you significant career leverage. And leverage makes you valuable in the workplace. It also helps you avoid being a cog. Every employer wants fungible employees, but every employee should resist being fungible. Don't be "Java Engineer 2" or "React Developer with 3 years experience." Be someone people want to work with. The goal is for people to say, "I want to work with [your name]," not "I want to work with any React developer." By tackling problems that are both scary (high-impact) and undifferentiated (universally applicable), you build expertise that travels with you while positioning yourself as someone who can handle what others avoid.

matklad 1 month ago

Look Out For Bugs

One of my biggest mid-career shifts in how I write code was internalizing the idea from this post: Don't Write Bugs. Historically, I approached coding with an iteration-focused mindset — you write a draft version of a program, you set up some kind of a test to verify that it does what you want it to do, and then you just quickly iterate on your draft until the result passes all the checks. This was a great approach when I was only learning to code, as it allowed me to iterate past the things which were not relevant for me at that point, and focus on what matters. Who cares what goes where in the “паблик статик войд мэйн стринг а-эр-джи-эс” (“public static void main String args”), it's just some obscure magic spell anyway, and completely irrelevant to the maze-traversing thingy I am working on!

Carrying over this approach past the learning phase was a mistake. As Lawrence points out, while you can spend time chasing bugs in freshly written code, it is possible to dramatically cut the amount of bugs you introduce in the first place, if you focus on optimizing that (and not just the iteration time). It felt (and still feels) like a superpower! But there's already a perfectly fine article about not making bugs, so I am not going to duplicate it. Instead, I want to share a related, but different super power: you can find bugs by just reading code.

I remember feeling this superpower for the first time. I was investigating various rope implementations, and, as a part of that, I looked at the rope powering IntelliJ, very old and battle-tested code. And, by just reading the code, I found a bug, since fixed. It wasn't hard, the original code is just 500 lines of verbose Java (yup, that's all that you need for a production rope). And I wasn't even trying to find a bug, it just sort of jumped out at me while I was trying to understand how the code works. That is, you can find some existing piece of software, carefully skim through the implementation, and discover real problems that can be fixed. You can do this to your software as well! By just re-reading a module you wrote last year, you might find subtle problems. I regularly discover TigerBeetle issues by just covering this or that topic on IronBeetle: bug discovered live, fixed, and PR merged.

Here are some tips for getting better at this. The key is careful, slow reading. What you actually are doing is building the mental model of a program inside your head. Reading the source code is just an instrument for achieving that goal. I can't emphasize this enough: programming is all about building a precise understanding inside your mind, and then looking for the diff between your brain and what's in git.

Don't dodge an opportunity to read more of the code. If you are reviewing a PR, don't review just the diff, review the entire subsystem. When writing code, don't hesitate to stop and to probe and feel the context around. Go for git log or git blame to understand the historical "why" of the code.

When reading, mostly ignore the textual order; don't just read each source file top-down. Instead, use these two other frames. Start at main or the subsystem's equivalent, and use "goto definition" to follow an imaginary program counter. Or identify the key data structures and fields, and search for all places where they are created and modified. You want to see a slice across space and time, state and control flow (cf. the Concurrent Expression Problem). Just earlier today I used the second trick to debug an issue for which I haven't got a repro.
I identified the key assignment that was recently introduced, then Ctrl+F'd for it, and that immediately revealed a gap in my mental model. Note how this was helped by the fact that the thing in question was always called by the same name in the source code! If your language allows it, avoid incidental renamings; use proper names.

Identify and collect specific error-prone patterns or general smells in the code. In Zig, if there's an allocator in the same scope as certain other constructs, you need to be very careful. If there's an isolated tricky function, it's probably fine. If there's a tricky interaction between functions, it is a smell, and some bugs are lurking there.

Bottom line: reading the code is surprisingly efficient at proactively revealing problems. Create space for calm reading. When reading, find ways to build mental models quickly; this is not entirely trivial.

neilzone 1 month ago

My third Airsoft game day and perhaps I am finally getting the hang of it

I played my third Airsoft game day today, at Red Alert, near Thatcham, again. It was great fun, and, for the first time, I felt that I might be getting the hang of Airsoft. Sure, it is just running around and shooting toy guns at each other, but the first couple of times, I really had no clue what was going on, or what to do. This time was a lot better. I did have to fight a bit with my safety glasses sweating up today, and I spent part of one of the games with less than ideal vision, but I was still reasonably effective. I resorted to a sort of headband, but over my forehead, and that worked pretty well. As it gets cooler, perhaps this will become less of a problem.

I played more aggressively than before, in terms of running up and taking on the opposition. I did this whether I was attacking or defending, so more of the "front line" than hanging around at the back. I guess that I was less worried about being hit, and more keen to be involved. It doesn't hurt too much, and I go back to respawn and start again. I think that not having to think quite so much about the mechanics of the rifle helped, as I could just get on and use it, and focus on other things. Getting used to the site layout is helpful. I also tried to make use of some of the things that I had been taught in the practice evenings, especially use of cover, which definitely helped. I spent some time being sneaky and taking the long way round to flank the enemy to attack from their rear, which was also fun, but it takes a long time to walk back to respawn, which (especially on a hot day, as today was) was a bit of a pain. But I got some sneaky kills in that way.

I'm still getting used to the range of my rifle, which is a lot less than I had expected. I don't think that it is a particularly bad rifle / range - it is not noticeably worse than other people's similar rifles - but it is just less than I would have thought. I did pretty well with it, in terms of the number of kills, so I have no real complaints. I am not looking to spend much more money on a nascent hobby at the moment, but I could be tempted to upgrade the spring and see if that has a positive effect (within chrono limits for the site). The first two times, I played on semi-automatic the whole time (one BB per pull of the trigger). This time, I experimented with full auto, so BBs fire for as long as I keep the trigger depressed. I fired no more than three or four rounds at a time (short bursts), and that worked quite well. It did mean that I got through a lot more ammunition - about £10 worth, by my estimation. Some games, I got through three hicap magazines, and into a fourth. A sling has made a massive difference, in terms of comfort, and I've experimented with the attachment points. This has been a good additional purchase.

I think I'd like to give pyrotechnics a go at some point. Smoke grenades, or perhaps a frag grenade. But that feels like an unnecessary distraction at the moment, and I should get better with my rifle first. Not terrible, but definitely room for improvement. I did quite a bit of running, and sprinting between cover, but by the end of the day, I was definitely feeling it.

matklad 2 months ago

Zig's Lovely Syntax

It's a bit of a silly post, because syntax is the least interesting detail about the language, but, still, I can't stop thinking how Zig gets this detail just right for the class of curly-braced languages, and, well, now you'll have to think about that too. On first glance, Zig looks almost exactly like Rust, because Zig borrows from Rust liberally. And I think that Rust has great syntax, considering all the semantics it needs to express (see "Rust's Ugly Syntax"). But Zig improves on that, mostly by leveraging simpler language semantics, but also through some purely syntactical, tasteful decisions.

How do you spell the number ninety-two? Easy, 92. But what type is that? Statically-typed languages often come with several flavors of integers, and often a suffix syntax for literals of a particular type. Zig doesn't have suffixes, because, in Zig, all integer literals have the same type: comptime_int. The value of an integer literal is known at compile time and is coerced to a specific type on assignment or ascription. To emphasize, this is not type inference, this is implicit comptime coercion. This does mean that code relying on inference of a runtime integer type generally doesn't work, and requires an explicit type.

Raw or multiline strings are spelled with a leading \\ on each line. This syntax doesn't require a special form for escaping itself. It nicely dodges indentation problems that plague every other language with a similar feature. And, the best thing ever: lexically, each line is a separate token. As Zig has only line-comments, this means that a newline is always whitespace. Unlike most other languages, Zig can be correctly lexed in a line-by-line manner. Raw strings are perhaps the biggest improvement of Zig over Rust. Rust brute-forces the problem with r#"..."# syntax, which does the required job, technically, but suffers from the mentioned problems: indentation is messy, nesting quotes requires adjusting hashes, an unclosed raw literal breaks the following lexical structure completely, and rustfmt's formatting of raw strings tends to be rather ugly. On the plus side, this syntax at least cannot be expressed by a context-free grammar!

For the record, Zig's struct definitions and literals take after C (not that C would notice). The leading dot of a literal feels weird! It will make sense by the end of the post. Here, I want only to note the .x = 92 part, which matches the assignment syntax point.x = 92. This is great! This means that grepping for .x = gives you all instances where a field is written to. This is hugely valuable: most usages are reads, but, to understand the flow of data, you only need to consider writes. The ability to mechanically partition the entire set of usages into a majority of boring reads and a few interesting writes does wonders for code comprehension.

Where Zig departs from C the most is the syntax for types. C uses a needlessly confusing spiral rule. In Zig, all types are prefix: a pointer to u32 is *u32, an optional one is ?u32. While the pointer type is prefix, pointer dereference is postfix (ptr.*), which is a more natural subject-verb order to read.

Zig has a general syntax for "raw" identifiers: @"while". It is useful to avoid collisions with keywords, or for exporting a symbol whose name is otherwise not a valid Zig identifier. It is a bit more to type than Kotlin's delightful backticks, but manages to re-use Zig's syntax for built-ins (@import) and strings.

Like Rust, Zig goes for the fn keyword for function declarations. This is such a massive improvement over C/Java style function declarations: it puts the fn token (which is completely absent in the traditional C family) and the function name next to each other, which means that a textual search for "fn foo" allows you to quickly find the function. Then Zig adds a little twist.
While in Rust we write fn add(x: u32, y: u32) -> u32, in Zig it is fn add(x: u32, y: u32) u32. The arrow is gone! Now that I've used this for some time, I find the arrow very annoying to type, and it adds to the visual noise. Rust needs the arrow: Rust has lambdas with an inferred return type, and, in a lambda, the return type is optional. So you need some sort of explicit syntax to tell the parser whether there is a return type. And it's understandable that lambdas and functions would want to use compatible syntax. But Zig doesn't have lambdas, so it just makes the type mandatory. So the main function is pub fn main() void. Related small thing, but, as the name of the type, I think I like void more than ().

Zig uses const and var for binding values to names. This is OK; a bit weird after Rust, whose let would be const in Zig, but not really noticeable after some months. I do think this particular part is not great, because const, the more frequent one, is longer. I think Kotlin nails it: val, var, fun. Note all three are monosyllables, unlike const! The number of syllables matters more than the number of letters. Like Rust, Zig uses name: Type syntax for ascribing types, which is better than Type name, because optional suffixes are easier to parse visually and mechanically than optional prefixes.

Zig doesn't use && and || and spells the relevant operators as "and" and "or". This is easier to type and much easier to read, but there's also a deeper reason why they are not sigils. Zig marks any control flow with a keyword. And, because boolean operators short-circuit, they are control flow! Treating them as normal binary operators leads to an entirely incorrect mental model. For bitwise operations, Zig of course uses & and |.

Both Zig and Rust have statements and expressions. Zig is a bit more statement-oriented, and requires explicit returns. Furthermore, because there are no lambdas, the scope of a return is always clear. Relatedly, the value of a block expression is void. A block is a list of statements, and doesn't have an optional expression at the end. This removes the semicolon problem — while Rust's rules around semicolons are sufficiently clear (until you get to macros), there's some constant mental overhead to getting them right all the time. Zig is more uniform and mechanical here. If you need a block that yields a value, Zig supports a general syntax for breaking out of a labeled block.

Rust makes the pedantically correct choice regarding ifs: braces are mandatory. This removes the dreaded "dangling else" grammatical ambiguity. While theoretically nice, it makes one-line if-expressions feel too heavy. It's not the braces, it's the whitespace around them. But the ternary is important! Exploding a simple choice into a multi-line condition hurts readability. Zig goes with the traditional choice of making parentheses required and braces optional. By itself, this does create a risk of style bugs. But in Zig the formatter (non-configurable, user-directed) is a part of the compiler, and formatting errors that can mask bugs are caught during compilation. For example, inconsistent whitespace around a minus sign is an error, as it signals a plausible mixup of unary and binary minus. No such errors are currently produced for incorrect indentation (the value add there is relatively little, given the formatter), but this is planned. NB: because Rust requires if branches to be blocks, it is forced to make a block ending in an expression evaluate to that expression. Otherwise, the ternary would be even more unusable! Syntax design is tricky! Whether blocks are expressions and whether you make braces or parentheses mandatory in ifs are not orthogonal!

Like Python, Zig allows else on loops.
Unlike Python, loops are expressions, which leads to nicely readable imperative searches. Zig doesn't have a syntactically infinite loop like Rust's loop or Go's bare for. Normally I'd consider that a drawback, because these loops produce different control flow, affecting reachability analysis in the compiler, and I don't think it's great to make reachability dependent on the condition being visibly constant. But! As Zig places semantics front and center, and the rules for what is and isn't a comptime constant are a backbone of every feature, "anything equivalent to while (true)" becomes sufficiently precise. Incidentally, these days I tend to write "infinite" loops with an explicit iteration bound. Almost always there is an up-front bound for the number of iterations until the break, and it's worth asserting this bound, because debugging crashes is easier than debugging hangs.

if, while, for, switch, and catch all use the same Ruby/Rust-inspired |capture| syntax for naming captured values. I like how the iterator comes first, and then the name of an item follows, logically and syntactically.

I have a very strong opinion about variable shadowing. It goes both ways: I spent hours debugging code which incorrectly tried to use a variable that was shadowed by something else, but I also spent hours debugging code that accidentally used a variable that should have been shadowed! I really don't know whether on balance it is better to forbid or encourage shadowing! Zig of course forbids shadowing, but what's curious is that it's just one episode of the large crusade against any complexity in name resolution. There's no "prelude": if you want to use anything from std, you need to import it (const std = @import("std")). There are no glob imports: if you want to use an item from std, you need to name it explicitly. Zig doesn't have inheritance, mixins, argument-dependent lookup, extension functions, implicits, or traits, so, if you see foo.bar(), that is guaranteed to be a boring method declared on the type of foo. Similarly, while Zig has powerful comptime capabilities, it intentionally disallows declaring methods at compile time. Like Rust, Zig used to allow a method and a field to share a name, because it actually is syntactically clear enough at the call site which is which. But then this feature got removed from Zig. More generally, Zig doesn't have namespaces. There can be only one kind of thing with a given name in scope, while Rust allows things like a type and a function sharing a name. I am astonished at the relative lack of inconvenience in Zig's approach. Turns out that the dot is all the syntax you'll ever need for accessing things? For the historically inclined, see "The module naming situation" thread in the Rust mailing list archive to learn the story of how Rust got its path syntax.

The lack of namespaces touches on the most notable (by its absence) feature of Zig syntax, which deeply relates to the most profound aspect of Zig's semantics. Everything is an expression. By which I mean, there are no separate syntactic categories of values, types, and patterns. Values, types, and patterns are of course different things. And usually in the language grammar it is syntactically obvious whether a particular text fragment refers to a type or a value. So the standard way is to have separate syntax families for the three categories, which need to be internally unambiguous, but can be ambiguous across the categories, because the place in the grammar dictates the category: when parsing a let, everything until the : is a pattern, the stuff between the : and the = is a type, and after the = we have a value. There are two problems here.
First, there's a combinatorial explosion of sorts in the syntax, because, while the three categories describe different things, it turns out that they have the same general tree-ish shape. The second problem is that it might be hard to maintain the category separation in the grammar. Rust started with the three categories separated by a bright line. But then, changes happen. Originally, Rust only allowed place expressions on the left of an assignment. But today you can also write patterns there, to do unpacking like (a, b) = (b, a). Similarly, the turbofish used to move the parser from the value to the type mode, but now const parameters are values that can be found in the type position!

The alternative is not to pick this fight at all. Rather than trying to keep the categories separate in the syntax, use the same surface syntax to express all three, and categorize later, during semantic analysis. In fact, this already happens in the assignment example — the two sides are different things! One is a place (lvalue) and another is a "true" value (rvalue), but we use the same syntax for both. I don't think such syntactic unification necessarily implies semantic unification, but Zig does treat everything uniformly, as a value with comptime and runtime behavior (for some values, runtime behavior may be missing, for others — comptime). The fact that you can write an if where a type goes is occasionally useful. But the fact that simple types look like simple values, syntactically, consistently makes the language feel significantly less busy.

As a special case of everything being an expression, instances of generic types look like ArrayList(u8). Just a function call! Though, there's some trickery involved to make this work. Usually, languages rely on type inference to allow eliding generic arguments. That in turn requires making argument syntax optional, and that in turn leads to separating generic and non-generic arguments into separate parameter lists and some introducer sigil for generics, like angle or square brackets. Zig solves this syntactic challenge in the most brute-force way possible. Generic parameters are never inferred: if a function takes 3 comptime arguments and 2 runtime arguments, it will always be called with 5 arguments syntactically. Like with the (absence of) importing flourishes, a reasonable reaction would be "wait, does this mean that I'll have to specify the types all the time?" And, like with imports, in practice this is a non-issue. The trick is comptime closures. Consider a generic ArrayList: we have to specify the element type when creating an instance of an ArrayList. But subsequently, when we are using the array list, we don't have to specify the type parameter again, because the type of the variable already closes over the element type. This is the major truth of object-oriented programming, the truth so profound that no one even notices it: in real code, 90% of functions are happiest as (non-virtual) methods. And, because of that, the annotation burden in real-world Zig programs is low.

While Zig doesn't have Hindley-Milner constraint-based type inference, it relies heavily on one specific way to propagate types. Let's revisit the first example: an if whose branches are two different comptime constants doesn't compile — they are different values, and we can't select between the two at runtime. We need to coerce the constants to a specific runtime type. But coercing the whole if doesn't kick the can sufficiently far, and essentially reproduces the if with two incompatible branches. We need to sink the coercion down the branches. And that's exactly how Zig's "Result Location Semantics" works.
Type “inference” runs a simple left-to-right tree-walking algorithm, which resembles an interpreter’s evaluation. In fact, that is exactly what happens. Zig is not a compiler, it is an interpreter. When Zig evaluates an expression, it gets: When interpreting code like the interpreter passes the result location ( ) and type down the tree of subexpressions. If branches store the result directly into the object field (there’s a inside each branch, as opposed to one after the ), and each coerces its comptime constant to the appropriate runtime type of the result.

This mechanism enables concise syntax for specifying enums: When Zig evaluates the switch, it first evaluates the scrutinee, and realizes that it has type . When evaluating the first arm, it sets the result type to for the condition, and a literal gets coerced to . The same happens for the second arm, where the result type further sinks down the .

Result type semantics also explains the leading dot in the record literal syntax: Syntactically, we just want to disambiguate records from blocks. But, semantically, we want to coerce the literal to whatever type we want to get out of this expression. In Zig, is a shorthand for . I must confess that it did weird me out a lot at first when writing code (I don’t mind reading the dot). It’s not the easiest thing to type! But that was fixed once I added a snippet, expanding to .

The benefits of lightweight record literal syntax are huge, as they allow for some pretty nice APIs. In particular, you get named and default arguments for free: I don’t really miss the absence of named arguments in Rust, you can always design APIs without them. But they are free in Zig, so I use them liberally. Syntax-wise, we get two features (calling functions and initializing objects) for the price of one!

Finally, the thing that weirds out some people when they see Zig code, and makes others reconsider their choice of GitHub handles, even when they haven’t seen any Zig: the syntax for built-in functions. Every language needs to glue “userspace” code with primitive operations supported by the compiler. Usually, the gluing is achieved by making the standard library privileged and allowing it to define intrinsic functions without bodies, or by adding ad-hoc operators directly to the language (like Rust’s ). And Zig does have a fair amount of operators, like or . But the release valve for a lot of functionality is built-in functions in a distinct syntactic namespace, so Zig separates out , , , , , , , , , and . There’s no need to overload casting when you can give each variant a name.

There’s also for type ascription. The type goes first, because the mechanism here is result type semantics: it evaluates the first argument as a type, and then uses that as the type for the second argument. Curiously, I think it actually can be implemented in userspace: In Zig, the type of a function parameter may depend on the values of preceding (comptime) ones!

My favorite builtin is . First, it’s the most obvious way to import code: It’s crystal clear where the file comes from. But, second, it is an instance of reverse syntax sugar. You see, import isn’t really a function. You can’t do The argument of has to be a string, syntactically. It really is syntax, except that the function-call form is re-used, because it already has the right shape.

So, this is it. Just a bunch of silly syntactical decisions, which add up to a language which is positively enjoyable to read. As for big lessons, obviously, the fewer features your language has, the less syntax you’ll need.
And less syntax is generally good, because varied syntactic constructs tend to step on each other’s toes. Languages are not combinations of orthogonal aspects. Features tug and pull the language in different directions, and their combinations might turn out to be miraculous features in their own right, or might drag the language down. Even with a small feature-set fixed, there’s still a lot of work to pick a good concrete syntax: unambiguous to parse, useful to grep, easy to read and not too painful to write. A smart thing is of course to steal and borrow solutions from other languages, not because of familiarity, but because ruthless natural selection tends to weed out poor ideas. But there’s a lot of inertia in languages, so there’s no need to fear innovation. If an odd-looking syntax is actually good, people will take to it.

Is there anything about Zig’s syntax I don’t like? I thought no, when starting this post. But in the process of writing it I did discover one form that annoys me. It is the while-with-increment loop: This is two-thirds of a C-style loop (without the declarator), and it sucks for the same reason: control flow jumps all over the place and is unrelated to the source code order. We go from the condition, to the body, to the increment. But in the source order the increment is between the condition and the body. In Zig, this loop sucks for one additional reason: the sigil separating the increment is, I think, the single example of control flow in Zig that is expressed by a sigil rather than a keyword.

This form used to be rather important, as Zig lacked a counting loop. It has one now, so I am tempted to call the while-with-increment form redundant. Annoyingly, the two are almost equivalent, but not exactly: if the body contains a or , one version runs the increment one extra time, which is useless and might be outright buggy. Oh well.

Martin Fowler 2 months ago

How far can we push AI autonomy in code generation?

Birgitta Böckeler reports on a series of experiments we did to explore how far Generative AI can currently be pushed toward autonomously developing high-quality, up-to-date software without human intervention. As a test case, we created an agentic workflow to build a simple Spring Boot application end to end. We found that the workflow could ultimately generate these simple applications, but still observed significant issues in the results—especially as we increased the complexity. The model would generate features we hadn't asked for, make shifting assumptions around gaps in the requirements, and declare success even when tests were failing. We concluded that while many of our strategies — such as reusable prompts or a reference application — are valuable for enhancing AI-assisted workflows, a human in the loop to supervise generation remains essential.

Armin Ronacher 2 months ago

In Support Of Shitty Types

You probably know that I love Rust and TypeScript, and I’m a big proponent of good typing systems. One of the reasons I find them useful is that they enable autocomplete, which is generally a good feature. Having a well-integrated type system that makes sense and gives you optimization potential for memory layouts is generally a good idea. From that, you’d naturally think this would also be great for agentic coding tools. There’s clearly some benefit to it. If you have an agent write TypeScript and the agent adds types, it performs well. I don’t know if it outperforms raw JavaScript, but at the very least it doesn’t seem to do any harm. But most agentic tools don’t have access to an LSP (language server protocol). My experiments with agentic coding tools that do have LSP access (with type information available) haven’t meaningfully benefited from it. The LSP protocol slows things down and pollutes the context significantly. Also, the models haven’t been trained sufficiently to understand how to work with this information. Just getting a type check failure from the compiler in text form yields better results. What you end up with is an agent coding loop that, without type checks enabled, results in the agent making forward progress by writing code and putting types somewhere. As long as this compiles to some version of JavaScript (if you use Bun, much of it ends up type-erased), it creates working code. And from there it continues. But that’s bad progress—it’s the type of progress where it needs to come back after and clean up the types. It’s curious because types are obviously being written but they’re largely being ignored. If you do put the type check into the loop, my tests actually showed worse performance. That’s because the agent manages to get the code running, and only after it’s done does it run the type check. Only then, maybe at a much later point, does it realize it made type errors. Then it starts fixing them, maybe goes in a loop, and wastes a ton of context. If you make it do the type checks after every single edit, you end up eating even more into the context. This gets really bad when the types themselves are incredibly complicated and non-obvious. TypeScript has arcane expression functionality, and some libraries go overboard with complex constructs (e.g., conditional types ). LLMs have little clue how to read any of this. For instance, if you give it access to the .d.ts files from TanStack Router and the forward declaration stuff it uses for the router system to work properly, it doesn’t understand any of it. It guesses, and sometimes guesses badly. It’s utterly confused. When it runs into type errors, it performs all kinds of manipulations, none of which are helpful. Python typing has an even worse problem, because there we have to work with a very complicated ecosystem where different type checkers cannot even agree on how type checking should work. That means that the LLM, at least from my testing, is not even fully capable of understanding how to resolve type check errors from tools which are not from mypy. It’s not universally bad, but if you actually end up with a complex type checking error that you cannot resolve yourself, it is shocking how the LLM is also often not able to fully figure out what’s going on, or at least needs multiple attempts. As a shining example of types adding a lot of value we have Go. Go’s types are much less expressive and very structural. Things conform to interfaces purely by having certain methods. 
The LLM does not need to understand much to comprehend that. Also, the types that Go has are rather strictly enforced. If they are wrong, it won’t compile. Because Go has a much simpler type system that doesn’t support complicated constructs, it works much better—both for LLMs to understand the code they produce and for the LLM to understand real-world libraries you might give to an LLM. I don’t really know what to do with this, but these behaviors suggest there’s a lot more value in best-effort type systems or type hints like JSDoc. Because at least as far as the LLM is concerned, it doesn’t need to fully understand the types, it just needs to have a rough understanding of what type some object probably is. For the LLM it’s more important that the type name in the error message aligns with the type name in source. I think it’s an interesting question whether this behavior of LLMs today will influence future language design. I don’t know if it will, but I think it gives a lot of credence to some of the decisions that led to languages like Go and Java. As critical as I have been in the past about their rather simple approaches to problems and having a design that maybe doesn’t hold developers in a particularly high regard, I now think that they actually are measurably in a very good spot. There is more elegance to their design than I gave it credit for.
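For readers less familiar with Go, the structural flavor described above, where something conforms to an interface purely by having the right methods, can be approximated in Python with typing.Protocol. This is a rough sketch with invented names, not code from the post:

```python
# Rough sketch: Go-style structural conformance, approximated with Python's
# typing.Protocol. The Writer/MemorySink names are invented for illustration.
from typing import Protocol, runtime_checkable

@runtime_checkable
class Writer(Protocol):
    def write(self, data: str) -> int: ...

class MemorySink:
    """Never mentions Writer, but has the right method, so it conforms."""
    def __init__(self) -> None:
        self.buffer: list[str] = []
    def write(self, data: str) -> int:
        self.buffer.append(data)
        return len(data)

def log(sink: Writer, message: str) -> None:
    sink.write(message + "\n")

sink = MemorySink()
log(sink, "hello")                 # conforms structurally, no inheritance needed
print(isinstance(sink, Writer))    # True, thanks to @runtime_checkable
```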

Jefferson Heard 2 months ago

How I design backend services

First off, this is gonna be a long article. Strap in... I'm old, I get it. For the majority of my career, MVC was the way to design web applications. But I read a book about Evolutionary System Design and went to an O'Reilly conference on system architecture in 2018 and saw Martin Fowler talk on the " Microservice Design Canvas ," and the two things together completely changed my thinking on how to design systems. I've really fused these ideas to form my own theory about backend systems design. My goal is to create systems that are: The microservice design canvas has you design commands, queries, publications, and subscriptions, then data. I go a little further, because I try to take deployment, scaling aspects, and security into account at design time. Also I don't take a "microservices first" approach. In my experience the amount of work and money it takes to instrument and monitor microservices really just doesn't pay off for young companies with few or zero customers (where practically every system starts). I make it so it's easy to refactor pieces into microservices, but there's no requirement for the system to start there. There's also no requirement that you start with streaming or event sourcing solutions. This model works for traditional web APIs and for all the more advanced stuff that you end up needing when you finally do scale to hundreds of thousands to tens of millions of active daily users. All code examples in this post will be in Python, but truthfully this could work in anything from Java to Go to Typescript to LISP. My personal preferred core stack is: This is the order you want to do the design in. MVC runs deep in development culture, and most developers will try to start any backend project in the ORM. This is a mistake, as I've pointed out before. If you do that, then you will end up adding elements that serve no purpose in the final version of the service, and you'll end up creating data structures that aren't conducive to the efficient working of your interface. Start with the interface layer. Then hit the data layer. The data layer gives you your options for auth implementation, so then do that. All those tell you enough about your environment and infrastructure requirements that you can design those. And it's not that you won't or can't skip around a bit, but in general these things will resolve themselves fully in this order. Typical examples of service design start with a TODO list, but I'm going to start with a simple group calendar, because there are more moving parts that illustrate more than the trivial TODO list, and you will be able to see how much this model simplifies a service that's naturally a little more complex. This will be short, but I want to set the stage for the decisions I'm making and how I'm breaking it down. I'm treating this like a small service. User management is expected to happen elsewhere. This service will be called (indirectly) by a UI and by directly by other services that serve up the UI and broker scheduling information with other parts of the company's SaaS offering. It is an MVP, so it's not expected to handle calendar slots or anything "Calendly" like. It's just a barebones store of appointment data. Since user management is separate, the appointments can be owned by users, groups, or rooms or anything, but we don't care because that's all handled (presumably) by the user, group, and resource service. What do my customers care about for V1? What does my customer success team care about for V1? 
What do my product people care about for V1? What are the (non-universal) engineering requirements to make this system flow well? So basically my requirements are: There are also a few affordances that make my job easier:

First, a link to the post where I go into detail about Commands: Just to sum it up, your commands and queries should be governed by base classes that provide the "guts" of their mechanics, and serve as suitable points to enhance those gut mechanics. The main difference between Commands and Queries is that the Query classes should only have access to a read-only connection into the data. But if, for example, you want an audit trail of every command or query issued against the system, or if you want a total ordering or the ability to switch to event sourcing, the base classes are where you do it. I typically write bulk mutations first, because there are just so many cases where people want to import data, and bulk mutations typically have more efficient implementation options than fanning out atomic ones.

For our calendar, here is a list of commands: We want a small set of queries we can optimize our data structures for, but that will suffice to construct any kind of calendar for the UI, and to aid users in creating conflict-free appointments. Our publications: Our subscriptions: Our schedules:

Now this gets interesting. Because we've already designed our commands, there's a lot more to our data model than we first thought, as well as a lot fewer data attributes in the core model. We want as few attributes as possible. Online database migrations are pretty rock-solid these days, so accidentally getting into production without something that you want for a future release won't be a problem. Also, it's clear from above that we don't just want Postgres for our data model. We're searching appointments, so we want OpenSearch, and for simplicity's sake I'm assuming we're using Redis for pub-sub, and FastStream, which is a FastAPI-like framework for streaming applications. You could use Kafka, SQS/SNS, or RabbitMQ, depending on your scale or your dev and SRE teams' proficiency.

Now that we know what people want from our service, we can define our data and infrastructure, and the points-of-scale in our code around it. We're going to use Postgres to store our appointment information. I'm not going to go into deep detail here about the fields, since those can be derived from the Command classes and the UI requirements, but as someone who has designed more than one calendar in my lifetime, I have some notes:

Now, let's talk data models. Recurrences will be stored separately from appointments, and each appointment within the recurrence will be an actual appointment record within the database. We'll add a new scheduled task, UpdateOccurrences, which will run once a month and calculate occurrences out for 2 years (an implementation note to tell Product and CS about). The same code should be used when saving an appointment so that occurrences are saved that far out on initial appointment creation. We'll want to set a field on our Recurrence model to say how far occurrences have been calculated. That way, if someone modifies or deletes an occurrence from the set, we won't accidentally re-create it when we call UpdateOccurrences. Along with the Postgres record, we're going to want to index title, description, attendees, is external, and owning-application within OpenSearch.
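As a rough illustration of the occurrence-expansion idea just described (a Recurrence row that remembers how far out it has been materialized, plus a task that extends that horizon), here is a hedged Python sketch. It assumes the python-dateutil library, and the field and function names are invented for the example, not taken from the post:

```python
# Hedged sketch of the UpdateOccurrences idea: materialize appointment rows for
# a recurrence up to a rolling horizon, tracking how far we've already expanded.
# Field and function names are invented; the real models live in Postgres.
from dataclasses import dataclass
from datetime import datetime, timedelta
from dateutil.rrule import rrulestr  # pip install python-dateutil

@dataclass
class Recurrence:
    rrule: str                      # stored in the creator's local time (see the recurrence notes later)
    dtstart: datetime
    calculated_until: datetime      # how far out occurrences already exist

HORIZON = timedelta(days=730)       # ~2 years, re-extended by a monthly task

def update_occurrences(rec: Recurrence, now: datetime) -> list[datetime]:
    """Return the occurrence start times that still need appointment rows."""
    target = now + HORIZON
    if rec.calculated_until >= target:
        return []                   # nothing new to materialize yet
    rule = rrulestr(rec.rrule, dtstart=rec.dtstart)
    new_starts = rule.between(rec.calculated_until, target, inc=False)
    rec.calculated_until = target
    return list(new_starts)

rec = Recurrence(
    rrule="FREQ=WEEKLY;BYDAY=MO",
    dtstart=datetime(2025, 1, 6, 9, 0),
    calculated_until=datetime(2025, 1, 6),
)
print(len(update_occurrences(rec, datetime(2025, 1, 6))))  # Mondays materialized for ~2 years
```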
I won't bore you with the details of indexing these correctly because the requirements change a lot depending on the locales you operate in and tokenizers you choose. Also, you'll end up needing to query the service that expands attendee IDs into names most likely or code the search function to call it to reify full text searches to possible attendee IDs. This latter idea may be a little better for MVP, since it won't require you to setup an async task and broker to save appointments. How about that audit trail? Well you'll notice conveniently if you're using FastAPI and a pattern like my Command class that all commands are Events, and publications are also events. We can dump those events into Redshift or BigQuery and boom, our audit trail is now real. We can use the event history to cover the CS case of recreating changes in the event a bug or a person someone screws up someone's calendar. We can use the same audit trail to figure out how many appointments were created by whom for engagement metrics. And we can use the audit trail along with any logging we do in DataDog to measure our service against our performance and reliability metrics. The other great thing about everything being an Event is that we can adapt our commands to a Streaming modality easily once we get to the point where we have to scale out. Dump the command event into Kafka and consume it with FastStream. We're using an external service for all our user info, and we're proxying this service through our app-facing and user APIs, so presumably authentication is handled for us. Same with the security environment. All that is probably handled upstream. The only extra requirements we really have here is authorization. We want to allow our customers to make private appointments. That's easy, we can add that as a simple flag. But we probably also need to keep track of who can see who's calendar. I personally love AuthZed's (I swear they don't sponsor me, I just think it's a great tool) open source SpiceDB permission service. It's an implementation of Google's Zanzibar standard, which handles all the permissions storage and checks throughout Google, so you know it can handle your use case. So I'm going to suggest these new permissions in authzed without going into further implementation details: In each command I will make sure that the calendar can be modified, and in each query I will make sure that the response will be viewable to the requesting resource or person, and whether it should just be empty appointments with busy labels. Maybe this belongs above in Data and Infrastructure, but I want to treat this section separately. I prefer to use Datadog personally, but it can be quite expensive. However you create your dashboard and whatever you log data into, you want to measure, at the very least: Cost effective to develop. Easy for new developers to drop into. Performant enough to be cost effective to scale. Easy to break up into microservices or other components to support horizontal scaling as required. Easily "composable," i.e. big things don't require new code so much as aggregates of small things. Easy to monitor, react to problems, and to debug. Easy to add nontrivial things to on a time budget without creating new technical debt, or at least keeping it local to the thing added. CDK or Terraform/Helm AuthZed for permissions Creating, editing, and deleting appointments. Recurrence. Being able to see whether a new appointment will cause a conflict in their own or other attendees calendars, and what those conflicts are. 
When scheduling multiple people, they want to see the conflict-free openings in the calendar before having to click through weeks of people's calendars to find a time. Being able to view their calendar in various ways, up to a year at a time. Having all their calendar items together. All apps in the company should dump appointment data into this if they are creating it, and I should be able to load outlook and google calendars as well. Average calendar load time for a single person's calendar should be 2sec or less on good internet. Average calendar load time for a month of appointments up to 50 people should be 30 seconds or less. To be able to set 0 or more configurable reminders by email, SMS, or push. If an appointment disappears or moves in the calendar, they want to know how that happened - what app moved it and who, so that if a customer thinks it's a bug they can prove it one way or another and take the appropriate action, and we should be able to restore the old appointment. Engagement: how many calendar views across which people per day, and any appointment metadata that would be appropriate as classifiers distinguishing between the type of appointments people are adding or how many recurring vs. individual, etc. And how many appointments added / removed per day by people (as opposed to external services like Google and Outlook) There are other apps in my company with calendars. I need to make sure that appointments managed by those apps stay managed by those apps if there's metadata that my service cannot account for directly. timezone-aware timespans, including recurrence rules links and attachments appointment metadata such as originator, attendees, title and description some way to refer to an external owner of the appointment, for appts coming from services like Outlook and Google and for internal owners understanding which app made it and whether it has to be managed by that app (such as if the app is storing other metadata on the record that would be corrupted if a different app edited it. An audit trail of some sort to serve both the CS and Product use-cases. Restoring appointments can be manual by CS or engineering, but only if the audit trail exists with sufficient data on it. On average, a person will not have more than 4-5 appointments a day, 5 days a week. That totals out to 1,300 appts per person per year that we have to save, load, or destroy. It's likely I can do this much bulk writing in a single transaction and stay within the time limits we've imposed on ourselves. Across all our services, there are really no more than 150,000 daily active users, and of those, we can estimate that they're only changing their calendars a couple of times a day at most. That means that the write traffic, at least at first, will be fairly low. I can likely get to our MVP without pushing the writes to an async worker, although it's likely something we're going to want as our app grows. Traffic will be bursty, but within limits. When new users are added or when they sign up to sync their calendar from Outlook or Google, we're going to see a spike in traffic from baseline. This can likely be handled by combining FastAPI BackgroundTasks (less clunky than a queueing system and async workers for infrequent bursty traffic) with the kubernetes autoscaler, at least at first. SetOpenHours (rrule) - Allows someone to set the time of day where someone else is allowed to schedule meetings including them. Like work hours. SaveAppointments - Bulk creation and update of appointment data. 
DeleteAppointments - Bulk deletion of appointment data. Cannot delete from external services or app-management-required appointments. CheckForConflicts (timespan, attendees, limit, cursor) - Return a list of people that have conflicts for a given timespan. FindOpenings (timespan, attendees, limit, cursor) - Return a list of timespans within the input timespan where the attendees have no conflicts. GetAppointments (timespan, attendees, limit, cursor) - Return a list of appointments, sorted by attendee, and then by timespan. SearchAppointments (search_criteria, sort_criteria, timespan, limit, cursor) - Return a list of appointments that meet a certain set of search criteria, which can be one or more of: Title Description Is External Is Owned By (app) AppointmentCreated( id, owner, attendees, timespan) - lets subscribed services know when someone's calendar appointment was created. AppointmentModified (id, owner, attendees, timespan) - lets subscribed services know when someone's calendar appointment was moved. AppointmentDeleted (id, owner, attendees, timespan) - lets subscribed services know when someone's calendar appointment was deleted. AppointmentReminderSentReceived (id, owner, attendees, scheduled_time, actual_time) - lets subscribed services know when a reminder was sent out and received by the given attendees. AppointmentReminderSentFailed (id, owner, attendees, scheduled_time, code, reason) - lets subscribed services know when a reminder was sent but failed to be received. ExternalCalendarsBroker - Trust me, use a 3rd party service for this. I'm going to suggest either Cronofy or Nylas's APIs for this. But be aware of and put in extra time to design for external system failures and hangups. Your users will judge discrepancies between their Google and Outlook calendars vs. yours harshly and you want to be able to explain those differences when they happen and have something to push support on at the service you do choose when there are issues. SendReminders - Sends reminders out to attendees. Will run once a minute to check if there are reminders to send and then send them. This may actually be broken up into a couple of scheduled tasks, depending on how many reminders you're sending out per minute, and how long they take to process. There is a fair amount of subtlety in sending reminders when people are allowed to change appointments up to the last minute. You're going to want to define a "fitness function" for how often reminders can fail, and how often reminders for deleted appointments can be sent out, and use that to determine how fiddly you want to be here. Indexing Timespans - Postgres has an interval type, and that interval can be indexed with a GIST index. This gives us an optimal way to search for appointments with overlapping intervals and covering single points in time. Indexing Attendees - Likewise an array type with a GIN index will give us the ability to search for all appointments that include a given set of attendees. We may need a CTE to deal with set math, but it will still be highly optimized and relatively straightforward. Timezones - Store the timezone as an ISO code (not an offset) on a separate field and store the actual interval in UTC. If you don't store all your time info in the same timezone then you can't effectively index the timespan of an appointment. Okay, you can but your indexing trigger starts to look complicated and you're saving nothing in terms of complexity in the timespan itself. Why use a code and not an offset? 
Because if someone moves the appointment and it crosses a DST boundary when they do so, you won't know to change the offset without the UI's help, making the interaction between your service and others need more documentation and making it more error-prone.

Recurrence Info - God save you and your users, read the RRULE standard closely, and in particular read the implementation notes for the most popular JavaScript library and whatever your backend implementation language is. Better yet, this is one of those rare places where I'd advise you roll your own if you have time, rather than use the widely accepted standard, because the standard is SO loose, and because the different implementations out there often skirt different sets of details in it. But if you use RRULE, one big non-obvious detail you need to trust me on: store the RRULE itself in the local time and timezone that the user used to create the recurring appointment. If you don't, day-of-week calculations will be thrown off depending on how far away from the UTC timezone someone is, and how close to midnight their appointment starts. It's not that you can't correct for it, but one way lies 2400 lines of madness and bugs and the other way lies a different but far simpler type of madness.

CanViewCalendarDetails - Whether or not the caller can view a given attendee's calendar (group, person, whatever – the perm services can handle this) CanModifyCalendar - Whether or not the caller can modify a given attendee's calendar. CanViewCalendarFreeBusy - Whether or not the caller can view free/busy info for a given attendee, even if they can't view the full calendar.

p50 and p95 time to execute a save or delete appointment command. Alert if above 75-90% of the threshold determined by the users and schedule time to get it down if alerts are becoming more consistent. p50 and p95 times for each query. There are lots of ways to increase the performance: sharding OpenSearch indexes, Postgres partitioning, tuning queries, storing the appointment data in multiple redundant tables with different indexing schemes tuned to each specific query, and caching the results of calls to other services or whole calendar responses.

Failure rates - you want to alert on spikes in these. 404s, 401s, and 403s are generally problems downstream from you. They could indicate a broken UI or a misconfigured service. 400s could be a failure of coordination between you and your consumers, where you've released a breaking change without telling someone. 500 is definitely on you. 502 and 503 mean your services aren't scaling to meet demand. Track spikes and hourly failure rates over time. If the failure rates are increasing, then you should schedule maintenance time on it before the rates spill over your threshold. The key to good stewardship of engineering is proactivity. If you catch a failure before your customers do, you're doing well.
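To tie the earlier command/query discussion together, here is a minimal, hedged sketch of the base-class idea: commands and queries share a spine that emits an audit event for every execution, and queries only ever see a read-only connection. All names here are invented for illustration; the post's actual Command class lives in the linked article:

```python
# Minimal sketch of the command/query base-class idea described earlier.
# Every Command execution produces an audit event; Queries only receive a
# read-only connection. All names are invented for illustration.
from abc import ABC, abstractmethod
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any

AUDIT_LOG: list[dict[str, Any]] = []          # stand-in for Redshift/BigQuery

@dataclass
class Command(ABC):
    issued_by: str

    def execute(self, conn: Any) -> Any:
        result = self.apply(conn)             # read-write connection
        AUDIT_LOG.append({
            "kind": type(self).__name__,
            "at": datetime.now(timezone.utc).isoformat(),
            "payload": asdict(self),
        })
        return result

    @abstractmethod
    def apply(self, conn: Any) -> Any: ...

@dataclass
class Query(ABC):
    issued_by: str

    def execute(self, readonly_conn: Any) -> Any:
        return self.fetch(readonly_conn)      # read-only connection only

    @abstractmethod
    def fetch(self, readonly_conn: Any) -> Any: ...

@dataclass
class SaveAppointments(Command):
    appointments: list[dict[str, Any]] = field(default_factory=list)

    def apply(self, conn: Any) -> int:
        # a bulk upsert would go here; we just report how many rows we'd write
        return len(self.appointments)

cmd = SaveAppointments(issued_by="user-123", appointments=[{"title": "standup"}])
print(cmd.execute(conn=None), len(AUDIT_LOG))   # 1 1
```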

annie's blog 2 months ago

Let there be lapses

Let there be lapses Weeds in the garden, unswept porches, A walk never taken, A flower unnoticed, Missed bill, missed text, missed appointment. Let there be undone things Half-written sentences never finished A stack of books never read Blank pages, unseen lines Words never seen or heard or spoken. Let there be glory in what-is-not — All the unachieved Unbelieved Underserved Overlooked. Let us glory in these. Let there be errors Not just the tiny ones we can laugh away But enormous, life-altering errors. Huge risks taken which do not end well. Huge efforts made which result in what we call failure. (In fairness, Any effort is success in certain realities.) But let us — for a moment — judge by the world of machines, Of binaries Of industrialized morality And call it failure. Failure is the word we assign to all unexpected outcomes. So, let there be failure. Let failure warp our seeing and diminish our being, Let it ride among us waving a torch, Shame-blasting and guilt-smearing, Blinding us with ridiculously disproportional fiery judgment, Grinding nose to dirt Binding self to work. Let there be mistakes which make us weep Keep us awake at night Cause us to question our sanity, our decency, Our right to be here, Our ability to keep being here. Let there be broken edges Sawed-off pieces we cannot smooth down Pointy bits irritating and upsetting Dangling splinters and shards over chasms of regret. Let there be surrender. Let us call it what it is: giving up. Surrender sounds too noble, Enlightened, as if I didn’t have to but I chose to. That’s not what this is. Let there be quitting. Let there be Done. Not because we see what we have made, and it is good. This is not putting a bow on a gift. This is saying some things are too broken to be fixed. Let there be giving up. Lay down there, lay down, be still, give up. Face in the mud, breathing in, wheezing in the stuff of life, the dirt, The lowly dirt, the trudged-upon dirt, the worthless dirt From which we came and to which we all return. Let us lay there, breathing in this dirt, This pure self This known self This elemental self. Hell yes, failure. I embrace you. Brother! Sister! Mother! Father! Come quickly! Come and rejoice, for I have failed! Come and celebrate! Set out the feast! Call the guests! And enter into the joy of your child: Humanity raw Humanity broken Humanity dirty Humanity ill-fitted to survive Humanity traumatized Humanity doing such a fucked-up job of it Humanity violent and stumbling Humanity bruised and crusted at the edges Humanity clawing its way from the dark tunnel of history Humanity side-eyeing the stars while blood drips from our fingers Humanity bargaining for the right to squirm Humanity bringing a sword to a gunfight Humanity bullshitting Humanity asking clever little questions Humanity dressed in robes, obsessed with ovaries Humanity unhinged and in charge Humanity waving exasperated hands in the air Humanity dishing out pieces of pie Humanity weeping at the sight of spring flowers Humanity with big rough hands so careful so gentle holding a tiny new fragile thing Humanity with smooth precise hands making deals, ending lives Humanity dropping bombs Humanity being a big dumb bully Humanity the most awkward of the species Humanity voted most likely to secede from the planet Humanity pointing and saying look at this! wow! Humanity wondering, always wondering Humanity exhausted sitting in a patch of sunlight Being dirt. Dirt with form, dirt with spirit. 
Pale faces float through quiet rooms, ghostly fingers flutter in hallways. Pens move across expensive paper. Golden liquid sloshes in crystal while murmuring voices ooze and wind and hush and tell us there is nothing to worry about.  But this is no time to be civilized.  Let there be lapses: Lapses of courtesy, lapses of decorum. Failures of politeness. Refusals to conform. Let there be a wildness ringing in us for each other — Hissing, bared teeth, spitting — Reverberating, thrumming, cracking the marble palaces full of dead men’s bones.

Max Bernstein 4 months ago

What I talk about when I talk about IRs

I have a lot of thoughts about the design of compiler intermediate representations (IRs). In this post I’m going to try and communicate some of those ideas and why I think they are important. The overarching idea is being able to make decisions with only local information. That comes in a couple of different flavors. We’ll assume that we’re compiling a method at a time, instead of something more trace-like (tracing, tracelets, basic block versioning, etc).

A function will normally have some control-flow: , , , any amount of jumping around within a function. Let’s look at an example function in a language with advanced control-flow constructs: Most compilers will deconstruct this , with its many nested expressions, into simple comparisons and jumps. In order to resolve jump targets in your compiler, you may have some notion of labels (in this case, words ending with a colon): This looks kind of like a pseudo-assembly language. It has its high-level language features decomposed into many smaller instructions. It also has implicit fallthrough between labeled sections (for example, into ). I mention these things because they, like the rest of the ideas in this post, are points in an IR design space. Representing code this way is an explicit choice, not a given.

For example, one could make the jumps explicit by adding a at the end of . As soon as we add that instruction, the code becomes position-independent: as long as we start with , the chunks of code between labels could be ordered any which way: they are addressed by name and have no implicit ordering requirements. This may seem arbitrary, but it gives the optimizer more flexibility. If some optimization rule decides, for example, that a branch to may rarely be taken, it can freely re-order it toward the end of the function (or even on a different page!) so that more hot code can be on a single cache line.

Explicit jumps and labels turn the code from a strictly linear assembly into a control-flow graph (CFG). Each sequence of code without internal control-flow is called a basic block and is a vertex in this graph. The directed edges represent jumps between blocks. See for example this crude GraphViz representation: We’re actually kind of looking at extended basic blocks (EBBs), which allow for multiple control exits per block but only one control entry. A strict basic block representation of the above code would look, in text form, something like this: Notice how each block has exactly one terminator (control-flow instruction), with (in this case) 0 or 2 targets. Opinions differ about the merits and issues of extended vs normal basic blocks. Most compilers I see use normal basic blocks. In either case, bringing the IR into a graph form gives us an advantage: thanks to Cousot and Cousot, our favorite power couple, we know how to do abstract interpretation on graphs, and we can use this to build an advanced optimizer. See, for example, my intro to abstract interpretation post.

Some IRs are stack based. For concatenative languages or some newer JIT compilers, IRs are formatted in such a way that each opcode reads its operands from a stack and writes its outputs to a stack. This is reminiscent of a point-free coding style in languages such as Haskell or OCaml. In this style, there is an implicit shared state: the stack. Dataflow is explicit (pushes and pops) and instructions can only be rearranged if the stack structure is preserved. This requires some non-local reasoning: to move an instruction, one must also rearrange the stack.
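The post's own listings didn't survive this excerpt, but the stack-based flavor is easy to sketch. Here is a hedged Python illustration (opcode names invented) of a tiny stack IR where every opcode pops its operands from a shared stack and pushes its result back:

```python
# Illustrative sketch only: a tiny stack-based IR. Every opcode pops its
# operands from a shared stack and pushes its result. Opcode names are invented.
def run_stack_ir(program: list[tuple]) -> int:
    stack: list[int] = []
    for op, *args in program:
        if op == "push":
            stack.append(args[0])
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()

# (2 + 3) * 4: operands flow through the shared stack, so reordering an
# instruction means reasoning about the stack layout, not just its operands.
print(run_stack_ir([("push", 2), ("push", 3), ("add",), ("push", 4), ("mul",)]))  # 20
```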
By contrast, in a register-based IR, things are more explicit. Instructions take named inputs ( , , etc) and produce named outputs. Instructions can be slightly more easily moved around (modulo effects) as long as inputs remain defined. Local variables do not exist. The stack does not exist. Everything is IR “variables”. The constraints (names being defined) are part of the IR . This gets a little bit tricky if it’s possible to define a name multiple times. What does mean in the instruction for ? Which definition does it refer to? In order to reason about the instruction , we have to keep around some context. This is non-trivial: requiring compiler writers to constantly truck around side-tables and update them as they do analysis is slow and error-prone. Fortunately, if we enforce some interesting rules, we can push that analysis work into one pass up-front… Static single assignment (SSA) was introduced by a bunch of folks at IBM (see my blog post about the different implementations). In SSA-based IRs, each variable can only be defined once. Put another way, a variable is its defining instruction; alternately, a variable and its defining instruction are addressed by the same name. The previous example is not valid SSA; has two definitions. If we turn the previous example into SSA, we can now use a different name for each instruction. This is related to the unique name assumption or the global names property: names do not depend on context. Now we can identify each different instruction by the variable it defines. This is useful in analysis… I’d be remiss if I did not mention continuation-passing style (CPS) based IRs (and in fact, I had forgotten in the original draft of the post). As an IR, CPS is normally used in the analysis and optimization of functional programs, for example in the OCaml and Racket compilers. It is not required, however; MLton, for example, uses SSA in its compiler for Standard ML. SSA and CPS can more or less represent the same programs, but they can each feel a natural fit for different languages (and different compiler authors). I don’t feel qualified to say much more here. For a more informed opinion, check out Andy Wingo’s approaching cps soup , especially the benefits and drawbacks near the end. Speaking of CPS, I took a class with Olin Shivers and he described abstract interpretation as “automated theorem finding”. Unlike theorem provers such as Lean and Rocq, where you have to manually prove the properties you want, static analysis finds interesting facts that already exist in your program (and optimizers use them to make your program faster). Your static analysis pass(es) can annotate your IR nodes with little bits of information such as: If your static analysis is over SSA, then generally the static analysis is easier and (potentially) storing facts is easier. This is due to this property called sparseness . Where a static analysis over non-SSA programs has to store facts about all variables at all program points , an analysis over SSA need only store facts about all variables, independent of context. I sometimes describe this as “pushing time through the IR” but I don’t know that that makes a ton of sense. Potentially more subtle here is that we could represent the above IR snippet as a list of tuples, where instructions are related via some other table (say, a “variable environment”): Instead, though, we could allocate an object for each instruction and let them refer to one another by pointer (or index, if using Rust or something). 
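A hedged sketch of that object-per-instruction representation (names invented, not from the post): operands are direct references to the instructions that define them, and each instruction carries a type field that an optimizer can read directly.

```python
# Illustrative sketch: SSA-style instructions as objects. Each operand is a
# direct reference to its defining instruction, and each instruction carries
# its own inferred type, so no side-table is needed. Names are invented.
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Instr:
    op: str
    operands: list[Instr] = field(default_factory=list)
    type: str = "unknown"

    def __repr__(self) -> str:
        return f"{self.op}:{self.type}"

c0 = Instr("const 2", type="int")
c1 = Instr("const 3", type="int")
add = Instr("add", operands=[c0, c1])

# An optimizer asks type questions by reading a field on the operand directly:
if all(o.type == "int" for o in add.operands):
    add.type = "int"

print(add, add.operands)   # add:int [const 2:int, const 3:int]
```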
Then they directly refer to one another (no need for a side-table), which might be faster and more compact. We can re-create nice names as needed for printing. Then, when optimizing, we look up the type information of an operand by directly reading a field ( or similar). Another thing to note: when you start adding type information to your IR, you’re going to start asking type information questions in your analysis. Questions such as “what type is this instruction?”, where “type” could span a semilattice, and even refer to a specific run-time object by its pointer. In that case, it’s important to ask the right questions . For example: instructions are likely not the only opcodes that could produce specific objects; if you have an instruction like , for example, that burns a specific expected pointer into the generated code, the type (and therefore the pointer) will come from the instruction. The big idea is that types represent a different slice of your IR than the opcodes and should be treated as such. Anyway, SSA only stores type information about instructions and does not encode information that we might later learn in the IR. With basic SSA, there’s not a good way to encode refinements… Static single information (SSI) form gives us new ways to encode metadata about instructions (variables). It was introduced by C. Scott Ananian in 1999 in his MS thesis (PDF). (I also discussed it briefly in the Scrapscript IR post .) Consider the following SSA program (represented as pseudo-Python): is undefined at . is defined and an integer at . But then we do something interesting: we split control flow based on the run-time value of . We can take this split to add new and interesting information to . For non-sparse analysis, we can record some fact on the side. That’s fine. When doing a dataflow analysis, we can keep track of the fact that at , is nonnegative, and at , is negative. This is neat: we can then determine that all paths to this function return a positive integer. Importantly, does not override the existing known type of . Instead, it is a refinement: a set intersection. A lattice meet. The middle bit of a Venn diagram containing two overlapping circles, and . If we want to keep our information sparse, though, we have to add a new definition to the IR. This is complicated (choose which variables to split, replace all uses, to maintain SSA, etc) but gives us new places to store information inside the IR . It means that every time we refer to , we know that it is nonnegative and every time we refer to , we know that it is negative. This information is independent of context! I should note that you can get a lot of the benefits of SSI without going “full SSI”. There is no need to split every variable at every branch, nor add a special new merge instruction. Okay, so we can encode a lot of information very sparsely in the IR. That’s neat. It’s powerful. But we should also be mindful that even in this very sparse representation, we are encoding information implicitly that we may not want to: execution order. In a traditional CFG representation, the instructions are already scheduled , or ordered. Normally this comes from the programmer in the original source form and is faithfully maintained. We get data use edges in an IR like SSA, but the control information is left implicit. Some forms of IR, however, seek to reify both data and control dependencies into the IR itself. One such IR design is sea of nodes (SoN), which was originally designed by Cliff Click during his PhD. 
In sea of nodes, every instruction gets its own vertex in the graph. Instructions have use edges to their operands, which can be either data or some other ordering property (control, effects, etc). The main idea is that IR nodes are by default unordered and are only ordered later, after effect analysis has removed a bunch of use edges. Per Vognsen also notes that there is another motivating example of sea of nodes: in the previous SSI example, the cannot be validly hoisted above the check. In a “normal” IR, this is implicit in the ordering. In a sea of nodes world, this is explicitly marked with an edge from the to the . I think Graal, for example, calls these nodes “Pi nodes”. I think I need to re-read the original paper, read a modern implementation (I get the feeling it’s not done exactly the same way anymore), and then go write more about it later. For now, see Simple , by Cliff Click and friends. It is an implementation in Java and a little book to go with it. Design neighbors include value dependence graphs (VDG), value state dependence graphs (VSDG), region value state dependence graphs (RVSDG), and program dependence graphs (PDG). Speaking of Cliff Click, I once heard/read something he said that sounded really interesting. Roughly, it was “elaborate the full semantics of the operation into the IR and let the optimizer sort it out”. That is, “open code” or “inline” your semantics. For example, don’t emit code for a generic add operation that you later specialize: Instead, emit code that replicates the written semantics of the operation, whatever that is for your local language. This can include optimistic fast paths: This has the advantage that you may end up with fewer specialized rewrite rules because constant propagation and branch folding take care of these specializations “for free”. You can even attach probabilities to more or less likely branches to offer outlining hints in case all of this is never specialized. Sure, the downside of this is that the generated IR might be bigger, so your optimizer might be slower—or worse, that your resulting generated code at the end might be bigger. But outlining, deduplication (functionalization?), and probably some other clever methods can help here. Similarly, Maxime Chevalier-Boisvert and Marc Feeley write about this (PDF) in the context of basic block versioning. If the runtime’s generic add functions is written in IR, then callers to that function can specialize “through it” by calling it in different basic block contexts. That more or less gets you call-site specialization “for free”. See Figure 4 from their paper (lightly edited by me), where I think dollar-prefixed variable names indicate special functions known to the compiler: This is nice if you are starting a runtime from scratch or have resources to devote to re-writing chunks of the runtime in your IR. Then, even in a method JIT, you can get your inlined language semantics by function (partial) inlining. There’s probably more in this vein to be explored right now and probably more to invent in the future, too. Some other potentially interesting concepts to think about include: Thank you to Chris Fallin , Hunter Goldstein, and Per Vognsen for valuable feedback on drafts of this post.

Steve Klabnik 4 months ago

A tale of two Claudes

It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness, it was the spring of hope, it was the winter of despair, we had everything before us, we had nothing before us, we were all going direct to Heaven, we were all going direct the other way—in short, the period was so far like the present period, that some of its noisiest authorities insisted on its being received, for good or for evil, in the superlative degree of comparison only. I recently had two very different experiences with Claude Code. I wanted to share them both, because I find the contrast interesting. I have really been enjoying TailwindCSS . I also have started several web projects using it in the past year. Back in January, Tailwind released version 4 . Honestly, I am not a good enough Tailwind user to be able to appreciate the new features, but I do prefer to keep my projects up to date. So I’ve needed to update my Tailwind projects to version 4. Claude… is nearly completely useless for this. You’d think that upgrading a very popular CSS framework would be a straightforward task. But Claude has failed to set it up multiple times. While it’s true that Claude Code claims that its training cutoff is in January of 2025 in its system prompt, it seems to have a cutoff in March. Regardless, a useful tool shouldn’t need to be explicitly trained on the latest version of a framework to be able to help with it. Yet, it consistently fails to either set up Tailwind 4 in new projects, or upgrade existing projects to this fourth release. I did manage to get it to work earlier this week, when I asked it to update this website , but at this point, it was mostly just me messing around. I wish I had saved the exact prompt, but basically I said something like “you have failed me consistently in this task, so please search for upgrade guides and the like to guide you in this update.” It did do that, but for some reason, the upgrade tool couldn’t update my configs, and so it decided to do it itself, and well, while this time it did manage to technically update, it didn’t do it completely correctly. It removed the typography plugin, which I had to ask it to put back. And it didn’t properly import the config into my global CSS, which also caused weird issues. Regardless, it did manage to sort of do it, after some coaxing, but this was a frustrating experience that was probably the third or fourth time I had tried to get it to do this. While Tailwind 4 is a new major version release, it’s still incredibly popular, and one of the hunches I have about LLM usage for coding is that more popular tools will be easier for LLMs to use. This didn’t seem to be the case here, though. For… reasons, I am working on an assembler. Don’t worry about it, you may find out about it someday, and you may not. It’s not for work, just for fun. Anyway, if you’re not familiar, basically this process turns assembly code into machine code. Specifically, in this case, I’m encoding x86_64 assembly into an ELF executable. ELF is a file format that is used by, among other things, Linux, for executables. I’m working with Claude on this, and I again didn’t save my prompts, which I am annoyed by, but I roughly said something like “please implement codegen such that it produces an ELF executable” and let it go. I did not expect this to work, or go well. 
As I said above, I sort of expect more common tasks to be easier for LLMs, given that they have more training data. While I’m sure there’s some code in Rust out there that does this, I still don’t expect it to be anywhere near as common as TailwindCSS. A few minutes later, Claude said “hey, I did it!” For completely unrelated reasons, I got a message from Ixi , a friend of mine who would be particularly interested in this project, and so I said something along the lines of “hey, I’m working on a project you’d be interested in, claude has just made some large claims: Steve, we’ve successfully implemented a complete end-to-end x86-64 code generation pipeline Do you want to look at the PR together?” and they said yes. So I shared my screen, and we went through the PR together. It was pretty fine. A lot of the stuff at first was normal, but a bit boring: the error handling was fine, and maybe slightly different than I would have done, but it wasn’t so bad that if a co-worker had sent in the PR, I would have objected to merging it. But Ixi noticed something: These sizes are… wrong. They should (probably) be one byte. Fun! Most of the code looked overall reasonable, though. So let’s look at that ELF it produced. I ran the executable, and it segfaulted. Surprise! While Claude did think it was done, because it had produced an ELF file, and the tests were passing, it didn’t actually work. Given that those sizes were wrong, this wasn’t really a surprise. We decided to dig into the code a bit more, but first: I told Claude “hey, this example you compiled segfaults, something is wrong, can you fix it?” Again, not the exact prompt. I wish I had saved it, but I didn’t think I was writing this blog post at the time, so I didn’t. But it was very basic like that. And I let Claude go in the background while we took a look. We used to see some of the information about the ELF file. We ended up using to see where it was crashing: There’s no debug info here, so we only have addresses. There was a jump going to an address that ends in . This is incorrect: was three bytes into a , so… we were jumping to the middle of an instruction, which is nonsense, hence the segfault. (Now that I’m looking at this, I also find these other addresses suspect. I wish I had taken better notes while we were looking at this… the point is this was a quick check as to what was going on, and it was very clear that this is just straight-up wrong.) How could this happen? Well, if the sizes of the instructions were wrong, when the code figures out where to jump, it would go to the wrong place, as it would mis-calculate how far away the jump target was. So this all made sense. Hypothesis: instruction sizes are incorrect, which means the jump calculations are incorrect, which means fail. I had to run, so we ended the call, but I left Claude running. A few minutes later, I checked in on it, and that’s when I was really surprised. Claude had fixed the issue! It wasn’t really even about that, but also, how it did so: Hmmm. It’s kind of right, but also maybe not? I kept scrolling: That also didn’t feel right? More scrollback through. It was creating sample programs in assembly, and then assembling them, and then examining the contents of the files. It was double checking its assumptions and understanding of the code, and then, there it was: Okay, wait… , and not / ? Sure enough, not only had it figured out that and were one byte, not two, but it had also figured out that had a few forms with different sizes, and that did too. 
After fixing that… the executables worked! But really, the most impressive part is that it had basically done exactly what we had done, but in the background, while we were looking at the code. It took a little longer to get there, because it went down one rabbit hole, but it had also found other bugs that weren't immediately obvious to us. And, I'll be extra honest: while I understand what's going on here, I am not super great with those tools, and if Ixi wasn't around, Claude would probably have found this before me. This is part of why having friends is great, but if I hadn't had that friend around at that moment, this would have helped tremendously. I am also impressed because, as mentioned above, I would expect this task to be much more rare than web development stuff. Yet Claude did a far better job, finding the issue quickly and fixing it, as opposed to the multiple times I gave it a chance with Tailwind. What does this mean? I don't know, but I figured I want to start documenting these experiences, good and bad, and so I wanted to share both of these stories, one good, and one bad. LLMs are not magic, or perfect, but I have found them to be useful. It's important to share both the successes, and also the failures. I'm hoping to do a better job of tracking the prompts and details to give you a better understanding of the specifics of how I'm using these tools, but the perfect is the enemy of the good, and so these are the stories you're going to get for now.


That was easy: A keyboard for vibecoding

I've been spending a lot of my coding time lately working with a variety of AI coding assistants. If you have any contact with the tech ecosystem in 2025, tools like Cursor, WindSurf, Claude Code, Cline, Aider, Codex, and VS Code Copilot are pretty hard to avoid running into. At their core, these tools are fancy interfaces around AI models like Claude or ChatGPT or Gemini that have extra tools and affordances for writing code. If you believe the hype, making an app is now almost as easy as...pressing a button. The truth is a little bit more complicated. The quality of the tools is somewhat variable, but they've all been evolving pretty rapidly. At this point, my mental model for AI coding assistants is an incredibly well-read junior engineer with questionable taste, unlimited enthusiasm, no ego, ADHD, and a traumatic brain injury that causes pretty serious memory issues. If they write something down, they're good. They've developed all sorts of wild coping strategies that are far more effective than they have any right to be. They generally have no spine and back down way too fast when you question something they're doing...even if they're right. They're prone to distraction. Again, getting them to write down what they're supposed to be doing can help a ton. But also, they're prone to just kind of trailing off without having quite finished what they're working on. I find myself typing the same "keep working" nudge on a pretty regular basis. When I mentioned this to my friend Harper, he said that he runs into the same thing and finds himself typing it many, many times per day. And then he said he wanted a keyboard that just does that for him. Way back in 2021, Stack Overflow famously shipped a three-key keyboard optimized for their workflows, with Ctrl, C, and V keys for copy and paste. In the Stack Overflow era, we really did need those three keys to be productive as engineers. LLMs have made software development simultaneously a lot more complicated and at least 3x "easier". Once Harper asked, I knew exactly what I had to do. Sadly, genuine Staples Easy Buttons are a little bit harder to come by, but knockoffs are easy to find on your choice of online mega-retailer. Once it showed up, I found the four screws hidden under the four cheap rubber feet, and took a look at what I was working with. Inside was a tiny circuit board with a single IC, a cheap speaker, a pair of AA batteries and a button. In place of the spring you might find in a more premium switch was a stamped circle of open-cell foam. This thing was incredibly cheap, but it was suitable for our purposes. I dug out a Raspberry Pi Pico, opened up the Arduino IDE and dashed out a quick purpose-built keyboard firmware. Before I started soldering, I tested things out by shorting pin 21 to ground with a pair of tweezers, accidentally spamming Slack. A little bit of solder and hot-glue later and we were in business. When you push the button, it tells your AI buddy to keep working and still plays an inspirational message to remind you just how much AI is improving your life. https://youtu.be/3t6V3p3hR0g If you take the batteries out, it still works as a keyboard, but stays silent. Which is a good thing, because the sound effects get old really fast.

Jefferson Heard 5 months ago

So You Wanna Buy a Tech Company

I've run the tech side of the M&A playbook now I think 10 times. I want to talk to fellow tech executives who are looking at acquiring a company about tech diligence and what it's for. In 2021 we bought a 30 acre hobby farm. Our house was built in 1947, and it was maintained largely by DIYers for the last quarter century. When we had our home inspection done it wasn't to decide whether or not to buy the house, but to lay out for us clearly what work would likely need to be done, and to help us prioritize it. The well needed replaced. The basement needed shored up in one place. The heating system memorably involved a heat pump, an electric emergency coil, a heating-oil switchover, and a wood burning furnace. But ultimately, we bought the house. The inspection was important, but it didn't carry much weight with us in terms of whether to buy. It just helped us understand what the purchase implied about where our money was going to go in improvements. As the technical executive running diligence – whether you do it yourself or you contract it out to a company that specializes in it – what you're ultimately doing is the same thing. You need to know how to get the new acquisition incorporated cleanly and quickly and at what cost. So then, if not "whether to buy," what do you want out of tech diligence?

- An assessment of the technical strengths, debt, and risk.
- An orderly plan for incorporating the team and the technology post-acquisition.
- A clear understanding of the likely costs to budgets and timelines of the above.

That's it. It's not about whether you're going to buy that house that comes with the 30 acres, mature walnut grove, and spring-fed pond. Diligence is about strategy. And honestly? I don't think most people really understand that. It's about being able to hit the ground running as soon as the deal closes. In the series of blog posts starting with So You Wanna Buy a Tech Company, I will develop deep-dives on the different aspects of technical diligence and what the roles of the CTO, the parent company, and the target tech team are in them. But first I want to spend the rest of this post talking about how to get your money's worth out of tech diligence. I'm going to talk about what kinds of questions to ask, and how to build those three bullet points above. If you're interested in a deeper conversation about tech diligence, reach out to me. I love this stuff, and I'd be more than happy to talk about running your play with my network of experts or one-on-one helping you build any of the plans I talk about here afterwards. Okay, first things first. There is an occasion where you tell your fellow executives to run screaming. It's possible for a company to be so underwater technically that they're teetering on the precipice of disaster and inviting that company into your organization will drag you down with them. But it's unlikely. Most of what can go wrong after acquisition boils down to poor planning or poorly set expectations. Regarding expectations, the second goal of technical diligence is to introduce the target team to the culture, and to the way engineering leadership thinks about building, and how they lead. Before I continue, it's of utmost importance that you're open and nonjudgemental throughout your interactions with the tech and product teams. And it's important that you do interact, even if you've hired an outside firm for the diligence work. They need to get a feel for your company, and you need to see how their team is reacting to the prospect of working with you. With that out of the way, let's talk about how you make a plan. Something that almost never gets covered in tech diligence is unwanted overlap or redundancy.

When you buy a company and don't kill their product, there will be some announcement from marketing that "Company X is excited to join the Y family of products!" This implies integration to customers, and they will expect it. What will that integration take to achieve? There's a big emotional component to this. It's less important that the guts of your product are seamlessly integrated than that your customers get the experience of integration in a timely fashion. Determine what constitutes the experience of integration, and develop a plan for it. Use your tech diligence people to ask questions like:

- How is auth handled? Do they support SSO?
- How are profiles stored and updated?
- Are there "conflict-prone" resources of the user that the product keeps track of that our product also does, like calendar appointments?

In general, scarce or conflict-prone resources are things about the customer's experience that can come into conflict between your product and your acquisition target's product. The important things to make customers feel like the acquisition was of value to them are:

- Avoiding double entry
- Avoiding conflicts between systems
- Giving them a clear choice of which product to open.

The quicker you reduce redundancy and confusion, the faster you get to added value. You should also be careful when you take features away from the target product and direct people into your own. It should feel natural and not be something someone has to remember. Also, they won't want to lose history from the product with the lost feature. You should talk to their product team about their vision for integration. How they're thinking about this will tell you a lot about how they're going to work with the rest of your team. Then talk to the tech diligence team about that vision and brainstorm questions to ask to figure out feasibility, horizon, and cost. Every piece of software has tech debt. If the tech team tells you theirs doesn't, that tells you a lot about them. Ask them to define what tech debt means to them, as well. I have another post which talks about tech debt in-depth, and their own definition ought to align with it on some level. The more their view of debt aligns with aesthetics and trends and the less impactful it is on business, the more direct leadership they will need. Mainly your job here is to get the tech team to talk about their debt, give you their gut impressions of impact and priority, and to get them comfortable with the idea that you're not judging them for doing what was necessary to get their business to where it is. The important thing here isn't so much that there might be "gotchas" like AGPL licenses or production data on Dropbox, but that you find them and have a plan to do something about them. At a high level you need to understand:

- How and where their software is deployed.
- How resources with an ongoing cost are divided between customers.
- The basic ongoing cost profile per customer, commonly called Cost of Goods Sold or COGS.
- How customers receive updates, and whether there are special customers with leverage over the update process.
- Whether there are looming deadlines that pose a significant risk to the software.
- How the tech team develops software. That is, their SDLC.

There are clearly other details you need, and I'll go over this topic in depth in another post, but at a high level this is what I consider the most important take-away. Most of the time you'll be acquiring a company that's significantly smaller than you. They're at an earlier stage in their journey, and as such they will have security risks you've already mitigated. You want to ask yourself and your tech diligence folks, "If we incorporate their technology, will it materially impact our security posture or certifications?" You don't need them to be perfect, but you do need to know how far off they are from your own standards. A very non-exhaustive list of good things to understand:

- Do their code repositories contain private keys or passwords?
- How is tenancy handled?
- Do they keep dependencies up to date? Do they review CVEs regularly on the tech they use?
- Where all does PII end up? Ask the hard questions here, because people do do things like put PII in application logs, even though they shouldn't.
- Are they externally or self-certified for SOC2?
- If they deal with specially classified data (e.g. health or education or data about EU citizens) what level of compliance do they maintain? They need to have a non hand-wavy answer about FERPA, HIPAA, GDPR, etc.

A lot of acquisitions will have a "bring things up to standards" period immediately post-merge where they cannot release new features or provide any evidence to customers of integration with your products. You want to know how long this period will be so you can inform your customers' and colleagues' expectations. To get at that, what you're really after is:

- What are the remedies needed to get them up to our standards?
- Is their tech team capable of implementing them, or do we need to pull resources?
- What is the priority order to mitigate critical vulnerabilities to our preexisting business if we incorporate them?
- What is the likely cost of that to timelines and budgets?

When you acquire the company, you acquire the team. Its trust issues in its own leadership, in your leadership, its overall sentiment and culture. The size, strength, independence, and makeup of the team gives you your strategy for incorporating them. Diligence here is about determining that and about first impressions. You're not (likely) up front deciding who you're going to keep and who you're going to lose. That puts the cart before the horse. You want an incorporation strategy first and foremost. And here alone, more than answering questions you have, it's more important to establish trust. They're going through a big change, and you and your fellow executives are the architects of that change. They need a sense that they're participating in that plan and that it's not being made for them. They need to get a feel for what's likely to happen on day 1, 30, 60, 90, and forward. Does their product have a trajectory or is it likely to be sunset? You don't have to tell them, but they're going to walk away with a theory anyway, so it's best if you establish trust and rapport early. This is the most important part, but it's also the part I have the least advice for. It's not because I don't have an opinion but because the questions you have to ask are so culture and team dependent. You should work with your diligence folks to get the facts about the team and use your impressions to create questions that then get answered. Questions like:

- Are there people with ridiculously high "bus factor"?
- Who are the people on your team who are most natural partners for the leadership post-merger? E.g. are they reporting to you or is there a director that you think is a great fit to work with them?
- Is the team excited about this opportunity or worried? No really, what do you think the balance is there?
- Are there detractors who hate your company or the whole idea that their product might not "win"?
- Does the target tech team trust their own leaders?

But it's all going to be highly dependent on whether you're acquiring a couple of founders or a 50-person tech team, and what the state of their culture is. Your overall goals, though, don't change:

- You need a plan to pair your leadership with theirs.
- You need a way to gauge whether the merge is progressing well down the road.
- You need a set of strategies to employ if it needs to be brought on track.

This doesn't go half into the detail I'd like it to and it's still probably too long. In future posts on this subject, I'm going to go into detail about each of the subjects and talk about what I look for, what questions I've asked in the past, and what my experiences have been, good and bad, in incorporating companies into the fold. I'm not going to give horror stories. There's plenty of clickbait on the internet for that if you want it. But I can talk about what mistakes I've made and what I think I'd do again. I love the process of technical diligence. I love meeting new teams, talking about cool technology, and figuring out how to make the best marriage out of two companies on a course for greatness. If you'd like to talk to me in depth about this, send me a note and we'll set something up!


Python is an interpreted language with a compiler

After I put up a post about a Python gotcha, someone remarked that "there are very few interpreted languages in common usage," and that they "wish Python was more widely recognized as a compiled language." This got me thinking: what is the distinction between a compiled or interpreted language? I was pretty sure that I do think Python is interpreted [1], but how would I draw that distinction cleanly? On the surface level, it seems like the distinction between compiled and interpreted languages is obvious: compiled languages have a compiler, and interpreted languages have an interpreter. We typically call Java a compiled language and Python an interpreted language. But on the inside, Java has an interpreter and Python has a compiler. What's going on? A compiler takes code written in one programming language and turns it into a runnable thing. It's common for this to be machine code in an executable program, but it can also be bytecode for a VM, or assembly language. On the other hand, an interpreter directly takes a program and runs it. It doesn't require any pre-compilation to do so, and can apply a variety of techniques to achieve this (even a compiler). That's where the distinction really lies: what you end up running. An interpreter runs your program, while a compiler produces something that can run later [2] (or right now, if it's in an interpreter). A compiled language is one that uses a compiler, and an interpreted language uses an interpreter. Except... many languages [3] use both. Let's look at Java. It has a compiler, which you feed Java source code into and you get out an artifact that you can't run directly. No, you have to feed that into the Java virtual machine, which then interprets the bytecode and runs it. So the entire Java stack seems to have both a compiler and an interpreter. But it's the usage, that you have to pre-compile it, that makes it a compiled language. And similarly, Python [4]. It has an interpreter, which you feed Python source code into and it runs the program. But on the inside, it has a compiler. That compiler takes the source code, turns it into Python bytecode, and then feeds that into the Python virtual machine. So, just like Java, it goes from code to bytecode (which is even written to the disk, usually) and bytecode to VM, which then runs it. And here again we see the usage, where you don't pre-compile anything, you just run it. That's the difference. And that's why Python is an interpreted language with a compiler! Ultimately, why does it matter? If I can run my Rust program with a single command, the same as I run my Python program with a single command, don't they feel the same? On the surface level, they do, and that's because it's a really nice interface so we've adopted it for many interactions! But underneath it, you see the differences peeping out from the compiled or interpreted nature. When you run a Python program, it will run until it encounters an error, even if there's malformed syntax! As long as it doesn't need to load that malformed syntax, you're able to start running. But if you do the same with a Rust program, it won't run at all if it encounters an error in the compilation step! It has to run the entire compilation process before the program will start at all. The difference in approaches runs pretty deep into the feel of an entire toolchain. That's where it matters, because it is one of the fundamental choices that everything else is built around. The words here are ultimately arbitrary. But they tell us a lot about the language and tools we're using.
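The "produces something that can run later" framing has a nice analogue inside Rust itself (footnote 2 below draws the same parallel for Python's async functions): calling an async fn doesn't run anything, it only builds a value that a separate step runs later. A small sketch; it assumes the futures crate purely to get a minimal executor:

```rust
// Sketch of the "produces something that can run later" idea, in Rust.
// Assumes the `futures` crate (e.g. futures = "0.3") just for a simple executor.
async fn greet() {
    println!("hello");
}

fn main() {
    let later = greet();                 // nothing runs yet; we only built a future
    futures::executor::block_on(later);  // running it is a separate, later step
}
```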
Thank you to Adam for feedback on a draft of this post. It is worth occasionally challenging your own beliefs and assumptions! It's how you grow, and how you figure out when you are actually wrong. ↩ This feels like it rhymes with async functions in Python. Invoking a regular function runs it immediately, while invoking an async function creates something which can run later. ↩ And it doesn't even apply at the language level, because you could write an interpreter for C++ or a compiler for Hurl , not that you'd want to, but we're going to gloss over that distinction here and just keep calling them "compiled/interpreted languages." It's how we talk about it already, and it's not that confusing. ↩ Here, I'm talking about the standard CPython implementation. Others will differ in their details. ↩

Dan Moore! 6 months ago

Why Are Interviews Harder Than The Job?

I’ve seen a lot of startups try to make hiring easier over the years, but they all seem to converge on becoming slightly better applicant tracking systems. Then, a few months ago, I saw this LinkedIn post. Here’s the post in case you don’t want to log in to see it. It’s similar in tone to this meme. I’m currently leading an interview process at my current job. Whenever I do this, I remember how grueling and painful hiring is. And that is as a hiring manager–I know it is even tougher as a candidate in this job market. After all, I am getting paid and many candidates are unemployed. I’ve been in the latter situation and the situation is often quite urgent. But today, I wanted to dig into why interviews for software related jobs are often harder than the job itself. This is a common gripe, where the interview digs into all kinds of technical aspects that are not really needed in the day to day job, or is much tougher than the day to day work. The reason for this is that interview time is limited. Interviewers want to get as much information as they can about the candidate. (Interviewees should do this too. Even in this market, finding out what you can about the company where you’ll be working is a good idea.) An interview is like running 100m and a job is like a 10k. If someone wants to see who is the better runner, but only has a certain amount of time, they are going to have everyone run 100 meters, and not a 10k. Even if the real goal is to find the best 10k runner. Hence the Leetcode tests. This is, of course, not great. But it is the least bad option. Some alternatives include:

- contract to hire: This is great if the candidate has flexibility and risk tolerance (health care not tied to a job, willing to risk moving to another job in N months if the contract doesn’t lead to a hire). Many great possible hires will pass at contract to hire, though it does work for some people.
- homework for interviews: Asking a candidate to solve some problem which lets a candidate work in a slightly less high stakes environment, but requires candidates to do extra work, taking longer than just the interview. This also takes longer to evaluate as a hiring manager. And if you are doing this, make sure you ask candidates to explain their solution, which helps mitigate AI assisted or copy paste solutions.
- pair programming: During the hiring process, work on an actual work related project. Companies that have OSS projects can use those, otherwise use something similar to what a new hire would be working on. Viable, but can be hard to pick up enough signal about non-technical skills. Also high-pressure for the candidate–I remember trying to use IntelliJ for the first time at an interview to write some Java code.
- leverage your network: Hire people you’ve worked with in the past. Time-tested, works well, but limits opportunities for those without experience or a network. Also means as a company you’re going to be more homogeneous, which can limit you (see this 1996 HBR article).
- historical interview: Beloved by the authors of “Who”, with this method you ask a series of questions about the candidate’s history, gleaning insight from it. If they have done something similar to what you are looking for in the past, they’ll be able to do it in the future. I did this for the current hiring process, so the jury is still out for me, but I am hopeful.

Hiring is hard because both parties have limited data and are trying to present in the best way, yet success is multi-dimensional and may not be visible for months. No easy answers.

baby steps 7 months ago

Rust in 2025: Language interop and the extensible compiler

For many years, C has effectively been the “lingua franca” of the computing world. It’s pretty hard to combine code from two different programming languages in the same process–unless one of them is C. The same could theoretically be true for Rust, but in practice there are a number of obstacles that make that harder than it needs to be. Building out silky smooth language interop should be a core goal of helping Rust to target foundational applications . I think the right way to do this is not by extending rustc with knowledge of other programming languages but rather by building on Rust’s core premise of being an extensible language. By investing in building out an “extensible compiler” we can allow crate authors to create a plethora of ergonomic, efficient bridges between Rust and other languages. When it comes to interop… When it comes to extensibility… In my head, I divide language interop into two core use cases. The first is what I call Least Common Denominator (LCD), where people would like to write one piece of code and then use it in a wide variety of environments. This might mean authoring a core SDK that can be invoked from many languages but it also covers writing a codebase that can be used from both Kotlin (Android) and Swift (iOS) or having a single piece of code usable for everything from servers to embedded systems. It might also be creating WebAssembly components for use in browsers or on edge providers. What distinguishes the LCD use-case is two things. First, it is primarily unidirectional—calls mostly go from the other language to Rust. Second, you don’t have to handle all of Rust. You really want to expose an API that is “simple enough” that it can be expressed reasonably idiomatically from many other languages. Examples of libraries supporting this use case today are uniffi and diplomat . This problem is not new, it’s the same basic use case that WebAssembly components are targeting as well as old school things like COM and CORBA (in my view, though, each of those solutions is a bit too narrow for what we need). When you dig in, the requirements for LCD get a bit more complicated. You want to start with simple types, yes, but quickly get people asking for the ability to make the generated wrapper from a given language more idiomatic. And you want to focus on calls into Rust, but you also need to support callbacks. In fact, to really integrate with other systems, you need generic facilities for things like logs, metrics, and I/O that can be mapped in different ways. For example, in a mobile environment, you don’t necessarily want to use tokio to do an outgoing networking request. It is better to use the system libraries since they have special cases to account for the quirks of radio-based communication. To really crack the LCD problem, you also have to solve a few other problems too: Obviously, there’s enough here to keep us going for a long time. I think the place to start is building out something akin to the “serde” of language interop: the serde package itself just defines the core trait for serialization and a derive. All of the format-specific details are factored out into other crates defined by a variety of people. I’d like to see a universal set of conventions for defining the “generic API” that your Rust code follows and then a tool that extracts these conventions and hands them off to a backend to do the actual language specific work. It’s not essential, but I think this core dispatching tool should live in the rust-lang org. 
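To make the baseline concrete: today the lowest common denominator really is the C ABI, and, roughly speaking, scaffolding over it is what crates like uniffi generate for you. Here is a minimal hand-written sketch of that kind of surface; the names are invented for illustration:

```rust
// A minimal sketch of the "least common denominator" surface: Rust
// functions exported over the C ABI so that any language with C FFI can
// call them. Real tools like uniffi generate this kind of scaffolding
// (plus type conversions and per-language bindings) for you.

// In Cargo.toml you'd build this as a `cdylib` so other runtimes can load it:
// [lib]
// crate-type = ["cdylib"]

#[no_mangle]
pub extern "C" fn sdk_add(a: i32, b: i32) -> i32 {
    a.wrapping_add(b)
}

// Strings and richer types are where it gets hard: everything has to be
// flattened into C-compatible representations at the boundary.
#[no_mangle]
pub extern "C" fn sdk_version() -> *const u8 {
    // A NUL-terminated static string keeps ownership trivial for the caller.
    b"1.0.0\0".as_ptr()
}
```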
All the language-specific details, on the other hand, would live on crates.io as crates that can be created by anyone. The second use case is what I call the deep interop problem. For this use case, people want to be able to go deep in a particular language. Often this is because their Rust program needs to invoke APIs implemented in that other language, but it can also be that they want to stub out some part of that other program and replace it with Rust. One common example that requires deep interop is embedded developers looking to invoke gnarly C/C++ header files supplied by vendors. Deep interop also arises when you have an older codebase, such as the Rust for Linux project attempting to integrate Rust into their kernel or companies looking to integrate Rust into their existing codebases, most commonly C++ or Java. Some of the existing deep interop crates focus specifically on the use case of invoking APIs from the other language (e.g., bindgen and duchess) but most wind up supporting bidirectional interaction (e.g., pyo3, napi-rs, and neon). One interesting example is cxx, which supports bidirectional Rust-C++ interop, but does so in a rather opinionated way, encouraging you to make use of a subset of C++’s features that can be readily mapped (in this way, it’s a bit of a hybrid of LCD and deep interop). I want to see smooth interop with all languages, but C and C++ are particularly important. This is because they have historically been the language of choice for foundational applications, and hence there is a lot of code that we need to integrate with. Integration with C today in Rust is, in my view, “ok” – most of what you need is there, but it’s not as nicely integrated into the compiler or as accessible as it should be. Integration with C++ is a huge problem. I’m happy to see the Foundation’s Rust-C++ Interoperability Initiative as well as projects like Google’s crubit and of course the venerable cxx. The traditional way to enable seamless interop with another language is to “bake it in”, e.g., Kotlin has very smooth support for invoking Java code and Swift/Zig can natively build C and C++. I would prefer for Rust to take a different path, one I call the extensible compiler. The idea is to enable interop via, effectively, supercharged procedural macros that can integrate with the compiler to supply type information, generate shims and glue code, and generally manage the details of making Rust “play nicely” with another language. In some sense, this is the same thing we do today. All the crates I mentioned above leverage procedural macros and custom derives to do their job. But procedural macros today are the “simplest thing that could possibly work”: tokens in, tokens out. Considering how simplistic they are, they’ve gotten us remarkably far, but they also have distinct limitations. Error messages generated by the compiler are not expressed in terms of the macro input but rather the Rust code that gets generated, which can be really confusing; macros are not able to access type information or communicate information between macro invocations; macros cannot generate code on demand, as it is needed, which means that we spend time compiling code we might not need but also that we cannot integrate with monomorphization. And so forth. I think we should integrate procedural macros more deeply into the compiler. [2]
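For contrast with that wish list, here is roughly what the interface is today: tokens in, tokens out. The derive below is a made-up example (it assumes a proc-macro crate with the usual syn and quote dependencies); nothing in it can ask the compiler about types or about what any other macro invocation generated.

```rust
// Roughly today's interface: a proc macro is a function from a TokenStream
// to a TokenStream, nothing more. This hypothetical derive just emits an
// inherent method; it cannot ask the compiler about types, lifetimes, or
// what other invocations have generated. (Requires the `syn` and `quote`
// crates, and a crate with `proc-macro = true`.)
use proc_macro::TokenStream;
use quote::quote;
use syn::{parse_macro_input, DeriveInput};

#[proc_macro_derive(Describe)]
pub fn derive_describe(input: TokenStream) -> TokenStream {
    let input = parse_macro_input!(input as DeriveInput);
    let name = input.ident;
    let text = name.to_string();

    // Tokens in, tokens out: we can only splice strings and syntax together.
    quote! {
        impl #name {
            pub fn describe() -> &'static str {
                #text
            }
        }
    }
    .into()
}
```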
I’d like macros that can inspect types, that can generate code in response to monomorphization, that can influence diagnostics [3] and lints, and maybe even customize things like method dispatch rules. That will allow people to author crates that provide awesome interop with all those languages, but it will also help people write crates for all kinds of other things. To get a sense for what I’m talking about, check out F#’s type providers and what they can do. The challenge here will be figuring out how to keep the stabilization surface area as small as possible. Whenever possible I would look for ways to have macros communicate by generating ordinary Rust code, perhaps with some small tweaks. Imagine macros that generate things like a “virtual function”, that has an ordinary Rust signature but where the body for a particular instance is constructed by a callback into the procedural macro during monomorphization. And what format should that body take? Ideally, it’d just be Rust code, so as to avoid introducing any new surface area. So, it turns out I’m a big fan of Rust. And, I ain’t gonna lie, when I see a prominent project pick some other language, at least in a scenario where Rust would’ve done equally well, it makes me sad. And yet I also know that if every project were written in Rust, that would be so sad. I mean, who would we steal good ideas from? I really like the idea of focusing our attention on making Rust work well with other languages, not on convincing people Rust is better [4]. The easier it is to add Rust to a project [1], the more people will try it – and if Rust is truly a better fit for them, they’ll use it more and more. This post pitched out a north star; how do we get there? I think there are some concrete next steps.
[1] Well, as easy as it can be. ↩︎ [2] Rust’s incremental compilation system is pretty well suited to this vision. It works by executing an arbitrary function and then recording what bits of the program state that function looks at. The next time we run the compiler, we can see if those bits of state have changed to avoid re-running the function. The interesting thing is that this function could as well be part of a procedural macro, it doesn’t have to be built-in to the compiler. ↩︎ [3] Stuff like the tool attribute namespace is super cool! More of this! ↩︎ [4] I’ve always been fond of this article Rust vs Go, “Why they’re better together”. ↩︎
