Posts in Haskell (20 found)
Neil Madden 1 month ago

Rating 26 years of Java changes

I first started programming Java at IBM back in 1999 as a Pre-University Employee. If I remember correctly, we had Java 1.1.8 installed at that time, but were moving to Java 1.2 (“Java 2”), which was a massive release—I remember engineers at the time grumbling that the ever-present “Java in a Nutshell” book had grown to over 600 pages. I thought I’d take a look back at 26 years of Java releases and rate some of the language and core library changes (Java SE only) that have occurred over this time. It’s a very different language to what I started out with! I can’t possibly cover every feature of those releases, as there are just way too many. So I’m just going to cherry-pick some that seemed significant at the time, or have been in retrospect. I’m not going to cover UI- or graphics-related stuff (Swing, Java2D etc), or VM/GC improvements. Just language changes and core libraries. And obviously this is highly subjective. Feel free to put your own opinions in the comments! The descriptions are brief and not intended as an introduction to the features in question: see the links from the Wikipedia page for more background. NB: later features are listed from when they were first introduced as a preview. The Collections Framework: before the collections framework, there were just raw arrays, Vector, and Hashtable. It gets the job done, but I don’t think anyone thinks the Java collections framework is particularly well designed. One of the biggest issues was a failure to distinguish between mutable and immutable collections, strange inconsistencies like why Iterator has a remove() method (but not, say, update or insert), and so on. Various improvements have been made over the years, and I do still use it in preference to pulling in a better alternative library, so it has stood the test of time in that respect. 4/10 The assert keyword: I remember being somewhat outraged at the time that they could introduce a new keyword! I’m personally quite fond of asserts as an easy way to check invariants without having to do complex refactoring to make things unit-testable, but that is not a popular approach. I can’t remember the last time I saw an assert in any production Java code. 3/10 Regular expressions: Did I really have to wait 3 years to use regex in Java? I don’t remember ever having any issues with the implementation they finally went for. The Matcher class is perhaps a little clunky, but gets the job done. Good, solid, essential functionality. 9/10 “New” I/O (NIO): Provided non-blocking I/O for the first time, but really just a horrible API (still inexplicably using 32-bit signed integers for file sizes, limiting files to 2GB, confusing interface). I still basically never use these interfaces except when I really need to. I learnt Tcl/Tk at the same time that I learnt Java, and Java’s I/O always just seemed extraordinarily baroque for no good reason. Has barely improved in 2 and a half decades. 0/10 Also notable in this release was the new crypto APIs: the Java Cryptography Extensions (JCE) added encryption and MAC support to the existing signatures and hashes, and we got JSSE for SSL. Useful functionality, dreadful error-prone APIs. 1/10 Absolutely loads of changes in this release. This feels like the start of modern Java to me. Generics: as Go discovered on its attempt to speed-run Java’s mistakes all over again, if you don’t add generics from the start then you’ll have to retrofit them later, badly.
I wouldn’t want to live without them, and the rapid and universal adoption of them shows what a success they’ve been. They certainly have complicated the language, and there are plenty of rough edges (type erasure, reflection, etc), but God I wouldn’t want to live without them. 8/10. Annotations: sometimes useful, sometimes overused. I know I’ve been guilty of abusing them in the past. At the time it felt like they were ushering in a new age of custom static analysis, but that doesn’t really seem to be used much. Mostly just used to mark things as deprecated or when overriding a method. Meh. 5/10 Autoboxing: there was a time when, if you wanted to store an integer in a collection, you had to manually convert to and from the primitive int type and the Integer “boxed” class. Such conversion code was everywhere. Java 5 got rid of that, by getting the compiler to insert those conversions for you. Brevity, but no less inefficient. 7/10 Enums: I’d learned Haskell by this point, so I couldn’t see the point of introducing enums without going the whole hog and doing algebraic datatypes and pattern-matching. (Especially as Scala launched about this time). Decent feature, and a good implementation, but underwhelming. 6/10 Vararg methods: these have done quite a lot to reduce verbosity across the standard library. A nice small change that’s been a genuine quality-of-life improvement. I still never really know when to put @SafeVarargs annotations on things though. 8/10 The for-each loop: cracking, use it all the time. Still not a patch on Tcl’s foreach (which can loop over multiple collections at once), but still very good. Could be improved and has been somewhat replaced by Streams. 8/10 Static imports: Again, a good simple change. I probably would have avoided adding * imports for statics, but it’s quite nice for DSLs. 8/10 Doug Lea’s java.util.concurrent etc.: these felt really well designed. So well designed that everyone started using them in preference to the core collection classes, and they ended up back-porting a lot of the methods. 10/10 After the big bang of Java 5, Java 6 was mostly performance and VM improvements, I believe, so we had to wait until 2011 for more new language features. Strings in switch: seems like a code smell to me. Never use this, and never see it used. 1/10 Try-with-resources: made a huge difference in exception safety. Combined with the improvements in exception chaining (so root cause exceptions are not lost), this was a massive win. Still use it everywhere. 10/10 Diamond operator for type parameter inference: a good minor syntactic improvement to cut down the visual noise. 6/10 Binary literals and underscores in literals: again, minor syntactic sugar. Nice to have, rarely something I care about much. 4/10 Path and Filesystem APIs: I tend to use these over the older File APIs, but just because it feels like I should. I couldn’t really tell you if they are better or not. Still overly verbose. Still insanely hard to set file permissions in a cross-platform way. 3/10 Lambdas: somewhat controversial at the time. I was very in favour of them, but only use them sparingly these days, due to ugly stack traces and other drawbacks. Named method references provide most of the benefit without being anonymous. Deciding to exclude checked exceptions from the various standard functional interfaces was understandable, but also regularly a royal PITA. 4/10 Streams: Ah, streams. So much potential, but so frustrating in practice.
I was hoping that Java would just do the obvious thing and put filter/map/reduce methods onto Collection and Map, but they went with this instead. The benefits of functional programming weren’t enough to carry the feature, I think, so they had to justify it by promising easy parallel computing. This scope creep enormously over-complicated the feature, makes it hard to debug issues, and yet I almost never see parallel streams being used. What I do still see quite regularly is resource leaks from people not realising that the stream returned from Files.lines() has to be close()d when you’re done—but doing so makes the code a lot uglier. Combine that with ugly hacks around callbacks that throw checked exceptions, the non-discoverable API (where are the static helper functions I need for this method again?), and the large impact on lots of very common code, and I have to say I think this was one of the largest blunders in modern Java. I blogged what I thought was a better approach 2 years earlier, and I still think it would have been better. There was plenty of good research that different approaches were better, since at least Oleg Kiselyov’s work in the early noughties. 1/10 Java Time: Much better than what came before, but I have barely had to use much of this API at all, so I’m not in a position to really judge how good this is. Despite knowing how complex time and dates are, I do have a nagging suspicion that surely it doesn’t all need to be this complex? 8/10 Modules: I still don’t really know what the point of all this was. Enormous upheaval for minimal concrete benefit that I can discern. The general advice seems to be that modules are (should be) an internal detail of the JRE and best ignored in application code (apart from when they spuriously break things). Awful. -10/10 (that’s minus 10!) jshell: cute! A REPL! Use it sometimes. Took them long enough. 6/10 The start of time-based releases, and a distinct ramp-up of features from here on, trying to keep up with the kids. Local type inference (“var”): Some love this, some hate it. I’m definitely in the former camp. 9/10 New HTTP Client: replaced the old URL.openStream() approach by creating something more like Apache HttpClient. It works for most purposes, but I do find the interface overly verbose. 6/10 This release also added TLS 1.3 support, along with djb-suite crypto algorithms. Yay. 9/10 Switch expressions: another nice mild quality-of-life improvement. Not world changing, but occasionally nice to have. 6/10 Text blocks: on the face of it, what’s not to like about multi-line strings? Well, apparently there’s a good reason that injection attacks remain high on the OWASP Top 10, as the JEP introducing this feature seemed intent on getting everyone writing SQL, HTML and JavaScript using string concatenation again. Nearly gave me a heart attack at the time, and still seems like a pointless feature. Text templates (later) are trying to fix this, but seem to be currently in limbo. 3/10 Pattern matching in instanceof: a little bit of syntactic sugar to avoid an explicit cast. But didn’t we all agree that using instanceof was a bad idea decades ago? I’m really not sure who was doing the cost/benefit analysis on these kinds of features. 4/10 Records: about bloody time! Love ‘em. 10/10 Better error messages for NullPointerExceptions: lovely. 8/10 Sealed classes: in principle I like these a lot. We’re slowly getting towards a weird implementation of algebraic datatypes. I haven’t used them very much so far.
8/10 EdDSA signatures: again, a nice little improvement in the built-in cryptography. Came with a rather serious bug though… 8/10 Vector (SIMD) API: this will be great when it is finally done, but still baking several years later. ?/10 Pattern matching switch: another piece of the algebraic datatype puzzle. Seems somehow more acceptable than instanceof, despite being largely the same idea in a better form. 7/10 UTF-8 by default: Fixed a thousand encoding errors in one fell swoop. 10/10 Record patterns: an obvious extension, and I think we’re now pretty much there with ADTs? 9/10 Virtual threads: being someone who never really got on with async/callback/promise/reactive stream-based programming in Java, I was really happy to see this feature. I haven’t really had much reason to use them in anger yet, so I don’t know how well they’ve been done. But I’m hopeful! ?/10 String templates: these are exactly what I asked for in A few programming language features I’d like to see , based on E’s quasi-literal syntax, and they fix the issues I had with text blocks. Unfortunately, the first design had some issues, and so they’ve gone back to the drawing board. Hopefully not for too long. I really wish they’d not released text blocks without this feature. 10/10 (if they ever arrive). Sequenced collections: a simple addition that adds a common super-type to all collections that have a defined “encounter order”: lists, deques, sorted sets, etc. It defines convenient getFirst() and getLast() methods and a way to iterate items in the defined order or in reverse order. This is a nice unification, and plugs what seems like an obvious gap in the collections types, if perhaps not the most pressing issue? 6/10 Wildcards in patterns: adds the familiar syntax from Haskell and Prolog etc of using as a non-capturing wildcard variable in patterns when you don’t care about the value of that part. 6/10 Simplified console applications: Java finally makes simple programs simple for beginners, about a decade after universities stopped teaching Java to beginners… Snark aside, this is a welcome simplification. 8/10 This release also adds support for KEMs , although in the simplest possible form only. Meh. 4/10 The only significant change in this release is the ability to have statements before a call to super() in a constructor. Fine. 5/10 Primitive types in patterns: plugs a gap in pattern matching. 7/10 Markdown javadoc comments: Does anyone really care about this? 1/10 The main feature here from my point of view as a crypto geek is the addition of post-quantum cryptography in the form of the newly standardised ML-KEM and ML-DSA algorithms, and support in TLS. Stable values: this is essentially support for lazily-initialised final variables. Lazy initialisation is often trickier than it should be in Java, so this is a welcome addition. Remembering Alice ML , I wonder if there is some overlap between the proposed StableValue and a Future? 7/10 ? PEM encoding of cryptographic objects is welcome from my point of view, but someone will need to tell me why this is not just ? Decoding support is useful though, as that’s a frequent reason I have to grab Bouncy Castle still. 7/10 Well, that brings us pretty much up to date. What do you think? Agree, disagree? Are you a passionate defender of streams or Java modules? Have at it in the comments.

0 views
Abhinav Sarkar 1 month ago

A Fast Bytecode VM for Arithmetic: The Compiler

In this series of posts, we write a fast bytecode compiler and a virtual machine for arithmetic in Haskell. We explore the following topics: In this post, we write the compiler for our AST to bytecode, and a decompiler for the bytecode. This post was originally published on abhinavsarkar.net . This is the second post in a series of posts: AST interpreters are well known to be slow because of how AST nodes are represented in the computer’s memory. The AST nodes contain pointers to other nodes, which may be anywhere in the memory. So while interpreting an AST , the interpreter jumps all over the memory, causing a slowdown. One solution to this is to convert the AST into a more compact and optimized representation known as Bytecode . Bytecode is a flattened and compact representation of a program, usually manifested as a byte array. Bytecode is essentially an Instruction Set (IS), but custom-made to be executed by a Virtual Machine (VM), instead of a physical machine. Each bytecode instruction is one byte in size (that’s where it gets its name from). A bytecode and its VM are created in synergy so that the execution is as efficient as possible 1 . Compiling source code to bytecode and executing it in a VM also allows the program to be run on all platforms that the VM supports without the developer caring much about portability concerns. The most popular combo of bytecode and VM is probably the Java bytecode and the Java virtual machine . The VM s can be stack-based or register-based . In a stack-based VM , all values created during the execution of a program are stored only in a Stack data-structure residing in the memory. Whereas, in a register-based VM , there is also an additional set of fixed number of registers that are used to store values in preference to the stack 2 . Register-based VM s are usually faster, but stack-based VM s are usually simpler to implement. For our purpose, we choose to implement a stack-based VM . We are going to write a compiler that compiles our expression AST to bytecode. But first, let’s design the bytecode for our stack-based VM . Here is our expression AST as a reminder: Let’s figure out the right bytecode for each case. First, we create Opcodes for each bytecode, which are sort of mnemonics for actual bytecode. Think of them as Assembly is to Machine Code . For a number literal, we need to put it directly in the bytecode so that we can use it later during the execution. We also need an opcode to push it on the stack. Let’s call it with an parameter. Binary operations recursively use for their operands. To evaluate a binary operation, we need its operands to be evaluated before, so we compile them first to bytecode. After that, all we need is an opcode per operator. Let’s call them , , , and for , , , and operators respectively. Variables and expressions are more complex 3 . In the AST interpreter we chucked the variables in a map, but we cannot do that in a VM . There is no environment map in a VM , and all values must reside in the stack. How do we have variables at all then? Let’s think for a bit. Each expression, after being evaluated in the VM , must push exactly one value on the stack: its result. expressions are a trivial case. When a binary operation is evaluated, first its left operand is evaluated. That pushes one value on the stack. Then its right operand is evaluated, and that pushes another value on the stack. 
Finally, the operation pops the two values from the top of the stack, does its thing, and pushes the resultant value back on the stack—again one value for the entire expression. A expression binds a variable’s value to its name, and then the variable can be referred from the body of the expression. But how can we refer to a variable when the stack contains only values, not names? Let’s imagine that we are in middle of evaluating a large expression, wherein we encounter a expression. First we evaluate its assignment expression, and that pushes a value on the top of the stack. Let’s say that the stack has values at this point. After this we get to evaluate the body expression. At all times when we are doing that, the value from assignment stays at the same point in the stack because evaluating sub-expressions, no matter how complicated, only adds new values to the stack, without popping an existing value from before. Therefore, we can use the stack index of the assignment value ( ) to refer to it from within the body expression. So, we encode as an opcode and an integer index into the stack. We choose to use a to index the stack, limiting us to a stack depth of 256. We encode the variable references with an opcode , which when executed gets the value from the stack at the given index and pushes it on the stack. For a expression, after we compile its assignment and body expressions, we need to make sure that the exactly-one-value invariant holds. Evaluating the assignment and body pushes two values on the stack, but we can have only one! So we overwrite the assignment value with the body value, and pop the stack to remove the body value. We invent a new opcode to do this, called so because its effect is equivalent to swapping the topmost two values on the stack, and then popping the new top value 4 . Putting all the opcodes together, we have the ADT: Notice that we also assigned bytecodes—that is, a unique byte value—to each above, which are just their ordinals. Now we are ready to write the compiler. The compiler takes an expression with the bytecode size, and compiles it to a strict of that size. Recall that in the previous post , we wrote our parser such that the bytecode size for each AST node was calculated while parsing it. This allows us to pre-allocate a bytestring of required size before compiling the AST . We compile to actual bytes here, and don’t use the opcodes. We use the function from the module that allocates enough memory for the provided bytecode size, and gives us a pointer to the allocated memory. We call this pointer for frame pointer . Then we traverse the AST recursively, writing bytes for opcodes and arguments for each case. We use pointer arithmetic and the function to write the bytes. numbers are encoded as two bytes in little endian fashion. In the recursive traversal function , we pass and return the current stack pointer and instruction pointer . We update these correctly for each case 5 . We also take care of checking that the pointers stay in the right bounds, failing which we throw appropriate errors. We also pass an parameter that is similar to the variable names to values environment we use in the AST interpreter, but this one tracks variable names to stack indices at which they reside. We update this information before compiling the body of a expression to capture the stack index of its assignment value. When compiling a expression, we use the map to lookup the variable’s stack index, and encode it in the bytecode. 
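A minimal sketch of such an opcode ADT (the constructor names and byte values here are illustrative assumptions, not necessarily the ones used in the actual code): a push with an immediate operand, one opcode per arithmetic operator, a stack-indexed get, and the combined swap-and-pop.

```haskell
import Data.Int (Int16)
import Data.Word (Word8)

data Opcode
  = OpPush Int16   -- push an immediate 2-byte integer on the stack
  | OpAdd          -- pop two values, push their sum
  | OpSub          -- pop two values, push their difference
  | OpMul          -- pop two values, push their product
  | OpDiv          -- pop two values, push their integer quotient
  | OpGet Word8    -- push a copy of the value at the given stack index
  | OpSwapPop      -- overwrite the second-from-top value with the top, then pop
  deriving (Show, Eq)

-- Each opcode maps to a unique byte (its ordinal), as described above;
-- the concrete values are assumptions for the sketch.
opcodeByte :: Opcode -> Word8
opcodeByte (OpPush _) = 0
opcodeByte OpAdd      = 1
opcodeByte OpSub      = 2
opcodeByte OpMul      = 3
opcodeByte OpDiv      = 4
opcodeByte (OpGet _)  = 5
opcodeByte OpSwapPop  = 6
```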
At the end of compilation, we check that the entire bytestring is filled with bytes till the very end, failing which we throw an error. This check is required because otherwise the bytestring may have garbage bytes, and may fail inexplicably during execution. All the errors are thrown in the monad using the function, and are caught after compilation using the function. The final result or error is returned wrapped into . Let’s see it in action: You can verify that the resultant bytes are indeed correct. I assume that it is difficult for you to read raw bytes. We’ll fix this in a minute. Meanwhile, let’s ponder upon some performance characteristics of our compiler. You may be wondering why I chose to write the compiler in this somewhat convoluted way of pre-allocating a bytestring and using pointers. The answer is: performance. I didn’t actually start with pointers. I iterated through many different data and control structures to find the fastest one. The table below shows the compilation times for a benchmark expression file when using different data structures to implement the function: I started with the bread-and-butter data structure of Haskellers, the humble and known to be slow , which was indeed quite slow. Next, I moved on to and thereafter , which are known to be faster at concatenation/consing. Then I abandoned the use of intermediate data structures completely, choosing to use a bytestring to create the bytestring. Finally, I had the epiphany that the bytestring size was known at compile time, and rewrote the function to pre-allocate the bytestring, thereby reaching the fastest solution. I also tried using , which has more or less the same performance as bytestring, but it is inconvenient to use because there are no functions to do IO with bytearrays. So I’d anyway need to use bytestrings for reading from STDIN or writing to STDOUT, and converting to-and-fro between bytearray and bytestring is a performance killer. Thus, I decided to stick to bytestrings. The pre-allocated bytestring approach is 80 times faster than using lists, and almost 10 times faster than using . For such a gain, I’m okay with the complications it brings to the code. Here are the numbers in a chart (smaller is better): The other important data structure used here is the map (or dictionary) in which we add the mappings from identifiers to their stack indices. This data structure needs to be performant because we do a lookup for each variable we encounter while compiling. I benchmarked compilation for some data structures: Strict hashmap turns out to be the fastest one, but interestingly, linked list is a close second. Mutable hashtable is the slowest even though I expected it to be the fastest. Here are the times in a chart (smaller is better): Another choice I had to make was how to write the function. I ended up passing and returning pointers and the environment map, and throwing errors in , but a number of solutions are possible. I tried out some of them, and noted the compilation times for the benchmark expression file: I tried putting the pointer in s and state instead of passing them back-and-forth. I tried putting the environment in a config. I tried using for throwing errors instead of using IO errors. Then I tried various combinations of these monad transformers. Finally, I also tried converting the function to be tail-recursive by using Continuation-passing style (CPS), and then defunctionalizing the continuations, as well as using the monad transformer. All of these approaches resulted in slower code.
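To make the pre-allocation idea concrete, here is a rough sketch of the low-level writing involved (the helper names are assumptions, not the post's actual code; Data.ByteString.Internal.create and pokeByteOff are the real library functions being leaned on):

```haskell
import Data.Bits ((.&.), shiftR)
import Data.ByteString (ByteString)
import qualified Data.ByteString.Internal as BSI
import Data.Int (Int16)
import Data.Word (Word8)
import Foreign.Ptr (Ptr)
import Foreign.Storable (pokeByteOff)

-- Write one byte at the given offset, returning the next free offset.
writeByte :: Ptr Word8 -> Int -> Word8 -> IO Int
writeByte buf off b = pokeByteOff buf off b >> pure (off + 1)

-- Write a 2-byte integer in little-endian order (low byte first).
writeInt16LE :: Ptr Word8 -> Int -> Int16 -> IO Int
writeInt16LE buf off n = do
  pokeByteOff buf off       (fromIntegral (n .&. 0xff) :: Word8)
  pokeByteOff buf (off + 1) (fromIntegral ((n `shiftR` 8) .&. 0xff) :: Word8)
  pure (off + 2)

-- Example: pre-allocate a 3-byte buffer holding a single hypothetical
-- "push 7" instruction (an assumed opcode byte 0 plus its operand).
pushSeven :: IO ByteString
pushSeven = BSI.create 3 $ \buf -> do
  off <- writeByte buf 0 0
  _   <- writeInt16LE buf off 7
  pure ()
```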
The times are interesting to compare (smaller is better): There is no reason to use s here because they result in slower and uglier code. Using one monad transformer at a time results in slight slowdowns, which may be worth the improvement in the code. But using more than one of them degrades performance by a lot. Also, there is no improvement caused by CPS conversion, because GHC is smart enough to optimize the non-tail-recursive code to be faster than the handwritten tail-recursive one that allocates a lot of closures (or objects in case of defunctionalization). Moving on … It is a hassle to read raw bytes in the compiler output. Let’s write a decompiler to aid us in debugging and testing the compiler. First, a disassembler that converts bytes to opcodes: A disassembled program is a sequence of opcodes. We simply go over each byte of the bytecode, and append the right opcode for it to the program, along with any parameters it may have. Note that we do not verify that the disassembled program is correct. Here are the helpers that read instruction bytes and their arguments from a bytestring: Next, we decompile the opcodes to an expression: Decompilation is the opposite of compilation. While compiling there is an implicit stack of expressions that are yet to be compiled. We make that stack explicit here, capturing expressions as they are decompiled from opcodes. For compound expressions, we inspect the stack and use the already decompiled expressions as the operands of the expression being decompiled. This way we build up larger expressions from smaller ones, culminating in the single top-level expression at the end 7 . Finally, we check the stack to make sure that there is only one expression left in it. Note that like the disassembler, we do not verify that the decompiled expression is correct. There is one tricky thing in decompilation: we lose the names of the variables when compiling, and are left with only stack indices. So while decompiling, we generate variable names from their stack indices by indexing a list of unique names. Let’s see it in action: That’s all for compilation and decompilation. Now, we use them together to make sure that everything works. We write some unit tests for the compiler, targeting both success and failure cases: In each test, we parse and compile an expression, and then disassemble the compiled bytes, which we match with the expected list of opcodes, or an error message. Let’s put these tests with the parser tests, and run them: Awesome, it works! That’s it for this post. Let’s update our checklist: In the next part, we write a virtual machine that runs our compiled bytecode, and do some benchmarking. If you have any questions or comments, please leave a comment below. If you liked this post, please share it. Thanks for reading! There are VMs that execute hardware ISs instead of bytecode. Such VMs are also called Emulators because they emulate actual CPU hardware. Some examples are QEMU and video game console emulators . ↩︎ VMs use virtual registers instead of actual CPU registers, which are often represented as a fixed size array of 1, 2, 4 or 8 byte elements. ↩︎ I call them variables here but they do not actually vary. A better name is let bindings. ↩︎ We could have used two separate opcodes here: and . That would result in the same final result when evaluating an expression, but we’d have to execute two instructions instead of one for expressions.
Using a single instruction speeds up execution, not only because we reduce the number of instructions, but also because we don’t need to do a full swap, only a half swap is enough because we pop the stack anyway after the swap. This also shows how we can improve the performance of our VM s by inventing specific opcodes for particular operations. ↩︎ Notice the use of strict s here, for performance reasons. ↩︎ Used as Association List . ↩︎ The decompiler is a bottom-up shift-reduce parser from the opcodes to the expression tree. ↩︎ If you liked this post, please leave a comment .

0 views
Abhinav Sarkar 2 months ago

A Fast Bytecode VM for Arithmetic: The Parser

In this series of posts, we write a fast bytecode compiler and a virtual machine for arithmetic in Haskell. We explore the following topics: In this post, we write the parser for our expression language to an AST , and an AST interpreter. This post was originally published on abhinavsarkar.net . This is the first post in a series of posts: The language that we are going to work with is that of basic arithmetic expressions, with integer values, and addition, subtraction, multiplication and integer division operations. However, our expression language has a small twist: it is possible to introduce a variable using a binding and use the variable in the expressions in the body of 1 . Furthermore, we use the same syntax for as Haskell does. Here are some examples of valid expressions in our language: The only gotcha here is that the body of a expression extends as far as possible while accounting for nested s. It becomes clear when we look at parsed expressions later. The eventual product is a command-line tool that can run different commands. Let’s start with a demo of the tool: We can parse an expression, or compile it to bytecode. We can also disassemble bytecode to opcodes, or decompile it back to an expression. We can interpret an expression either as an AST or as bytecode. We can also run a bytecode file directly. Finally, we have a handy command to generate random expressions for testing/benchmarking purposes 2 . Let’s start. Since this is Haskell, we start with listing many language extensions and imports: We use the extension here that enables a lot of useful GHC extensions by default. We are using the bytestring and attoparsec libraries for parsing, strict , containers and unordered-containers for compilation, deepseq , mtl and primitive for interpreting, and QuickCheck for testing. The first step is to parse an expression into an Abstract Syntax Tree (AST). We represent the AST as Haskell Algebraic Data Type s (ADTs): We add instances for ADT s so that we can pretty-print the parsed AST 3 . Now, we can start parsing. The EBNF grammar for expressions is as follows: The , , , and productions take care of having the right precedence of arithmetic operations. The and productions are trivial. Our language is fairly oblivious of whitespaces; we allow zero-or-more spaces at most places. The expressions grammar is pretty standard, except we require one-or-more spaces after the and keywords to make them unambiguous. We use the parser combinator library attoparsec for creating the parser. attoparsec works directly with bytestrings so we don’t incur the cost of decoding unicode characters 4 5 . We write the parser in a top-down recursive-descent fashion, same as the grammar, starting with the parser: One small complication: our parsers not only return the parsed expressions, but also the number of bytes they occupy when compiled to bytecode. We gather this information while building the AST in parts, and propagate it upward in the tree. We use the bytecode size later in the compilation pass 6 . Both and chain the right higher precedence parsers with the right operators between them 7 using the combinator. uses lookahead to dispatch between one of the primary parsers, which is faster than using backtracking . simply skips the parenthesis, and recursively calls . uses the and parsers from the attoparsec library to parse an optionally signed integer. We restrict the numbers to 2-byte integers (-32768–32767 inclusive) 8 . 
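As a rough sketch, an AST for this language can be modelled with an ADT along these lines (the constructor names here are assumptions, not necessarily the ones used in the actual code): integer literals, variables, the four arithmetic operators, and let-bindings.

```haskell
import Data.Int (Int16)

data Op = Add | Sub | Mul | Div
  deriving (Show, Eq)

data Expr
  = Num Int16             -- integer literal, restricted to 2 bytes
  | Var String            -- variable reference
  | BinOp Op Expr Expr    -- arithmetic operation
  | Let String Expr Expr  -- let <name> = <assignment> in <body>
  deriving (Show, Eq)

-- "let x = 1 + 2 in x * 3" would parse to something like:
example :: Expr
example = Let "x" (BinOp Add (Num 1) (Num 2)) (BinOp Mul (Var "x") (Num 3))
```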
The helper from attoparsec names parsers so that the error message shown in case of failures point to the right parser. and are straightforward. We restrict identifiers to upper-and-lowercase ASCII alphabetic letters. We also check that our reserved keywords ( and ) are not used as identifiers. Finally, we write the parser for expressions: In we use to parse the variable name, and recursively call to parse the assignment and body expressions, while making sure to correctly parse the spaces. The helper parser is used to parse known string tokens ( , and ), and provide good error messages in case of failures. Talking about error messages … Let’s figure out an error handling strategy. We use an type wrapped in to propagate the errors in our program: The type also captures the in which the error is thrown. is a type alias that represents either an error or a result. Finally, we put all the parsers together to write the function. Our function uses the function from attoparsec to run the over an input. The function deals with intricacies of how attoparsec returns the parsing result. Basically, we inspect the returned result and throw appropriate errors with useful error messages. We use from the typeclass that works with all its instances, which is one of. Finally, we throw away the bytecode size from the result of in the function. The parser is done. But as good programmers, we must make sure that it works correctly. Let’s write some unit tests. We use the hspec library to write unit tests for our program. Each test is written as a spec 9 . We have a bunch of tests for the parser, testing both success and failure cases. Notice how spaces are treated in the expressions. Also notice how the expressions are parsed. We’ll add property-based tests for the parser in the next post. There is not much we can do with the parsed AST s at this point. Let’s write an interpreter to evaluate our AST s. The AST interpreter is a standard and short recursive interpreter with an environment mapping variables to their values: This interpreter serves both as a performance baseline for the bytecode VM we write later, and as a definitional interpreter for testing the VM 10 . We are careful in detecting division-by-zero and arithmetic overflow errors, but we ignore possible integer overflow/underflow errors that may be caused by the arithmetic operations. We write some unit tests for the interpreter following the same pattern as the parser: Now, we can run the parser and interpreter tests to make sure that everything works correctly. Awesome, it works! That’s it for this post. Let’s update our checklist: In the next part , we write a bytecode compiler for our expression AST . If you have any questions or comments, please leave a comment below. If you liked this post, please share it. Thanks for reading! Variables are scoped to the body of the expressions they are introduced in, that is, our language has lexical scoping . Also, variables with same name in inner s shadow the variables in outer s. ↩︎ If you are wondering why do this at all, when we can directly run the expressions while parsing, I think this is a great little project to learn how to write performant bytecode compilers and VM s in Haskell. ↩︎ Bangs ( ) that enforce strictness are placed in the ADT (and also in the later code) at the right positions that provide performance benefits. The right positions were found by profiling the program. 
A bang placed at a wrong position (for example in front of inside ) may ruin the compiler-provided optimizations and make the overall program slower. ↩︎ attoparsec is very fast, but there are faster parsing libraries in Haskell. On the other hand, attoparsec does not provide great error messages. If the user experience were a higher priority, I’d use the megaparsec library. I find attoparsec to have the right balance of performance, developer experience and user experience. Handwritten parsers from scratch could be faster, but they’d be harder to maintain and use. ↩︎ I wrote the first version of the parser using the library that comes with the Haskell standard library. I rewrote it to use attoparsec and found that the rewritten parser was more than 10x faster. ↩︎ You don’t need to think about the bytecode size of expressions right now. It’ll become clear when we go over compilation in the next post. ↩︎ Certain functions such as are inlined using the pragma to improve the program performance. The functions to inline were chosen by profiling. ↩︎ Since the numbers need to be encoded into bytes when we compile to bytecode, we need to choose some encoding for them. For simpler code, we choose 2-byte integers. ↩︎ Testing your parsers is crucial because that’s your programming language’s interface to its users. Also because writing (fast) parsers is difficult and error-prone. Most of the bugs I found in this program were in the parser. ↩︎ Again, notice the carefully placed bangs to enforce strictness. Try to figure out why they are placed at some places and not at others. ↩︎ If you liked this post, please leave a comment.
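For a concrete picture of the recursive AST interpreter described above, here is a minimal sketch over the illustrative Expr type from the earlier snippet (again, the names and error strings are assumptions, not the actual code):

```haskell
import Data.Int (Int16)
import qualified Data.Map.Strict as Map

type Env = Map.Map String Int16

interpret :: Env -> Expr -> Either String Int16
interpret _   (Num n)    = Right n
interpret env (Var name) =
  case Map.lookup name env of
    Just v  -> Right v
    Nothing -> Left ("undefined variable: " <> name)
interpret env (BinOp op l r) = do
  a <- interpret env l
  b <- interpret env r
  case op of
    Add -> Right (a + b)
    Sub -> Right (a - b)
    Mul -> Right (a * b)
    Div | b == 0    -> Left "division by zero"
        | otherwise -> Right (a `div` b)
    -- (the post's interpreter also detects arithmetic overflow, omitted here)
interpret env (Let name assign body) = do
  v <- interpret env assign
  interpret (Map.insert name v env) body
```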

1 views
(think) 2 months ago

Learning OCaml: Having Fun with the Fun Module

When I started to play with OCaml I was kind of surprised that there was no (identity) function available out of the box (in the module that’s auto-opened). A quick search led me to the Fun module, which is part of the standard library and is nested under . It was introduced in OCaml 4.08, alongside other modules such as , and . 1 The module provides a few basic combinators for working with functions. Let’s go over them briefly: The identity function: returns its single argument unchanged. Returns a function that always returns the first argument, ignoring its second argument. Composes two functions, applying the second function to the result of the first. Haskell and F# have special syntax for function composition, but that’s not the case in OCaml. (although you can easily map this to some operator if you wish to do so) Also, introduced a bit later than the other functions in the module - namely in OCaml 5.2. Reverses the order of arguments to a two-argument function. Negates a boolean-returning function, returning the opposite boolean value. Useful when you want to provide a pair of inverse predicates (e.g. and ) I believe that those functions are pretty self-explanatory, but still below we’ll go over a few examples of using them: Admittedly the examples are not great, but I hope they managed to convey how to use the various combinators. Those are definitely not the type of functions that you would use every day, but they can be useful in certain situations. Obviously I needed at some point to discover the module in the first place, and all of the functions there can be considered “classic” combinators in functional programming. In practice most often I need and , and infrequently and . Right now I’m struggling to come up with good use-cases for , but I’m sure those exist. Perhaps you’ll share some examples in the comments? How often do you use the various combinators? Which ones do you find most useful? I find myself wondering if such fundamental functions shouldn’t have been part of module directly, but overall I really like the modular standard library approach that OCaml’s team has been working towards in the past several years. 2 The important thing at the end of the day is to know that these functions exist and you can make use of them. Writing this short article will definitely help me to remember this. That’s all I have for you today. Keep hacking! It was part of some broader efforts to slim down and move in the direction of a more modular standard library.  ↩ And obviously you can open the module if you wish to at whatever level you desire.  ↩
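For comparison with the Haskell composition syntax mentioned above, these combinators correspond roughly to standard Haskell Prelude functions (a loose mapping; the negation combinator has no single-name Prelude counterpart and is approximated by composing with not):

```haskell
identityExample :: Int
identityExample = id 42                -- the identity function

constExample :: Int
constExample = const 1 "ignored"       -- always returns its first argument

composeExample :: Int -> Int
composeExample = (* 2) . (+ 1)         -- function composition via (.)

flipExample :: [Int]
flipExample = flip take [1 .. 10] 3    -- reverse the argument order of take

negateIsOdd :: Int -> Bool
negateIsOdd = not . even               -- negate a boolean-returning predicate
```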

0 views
Max Bernstein 4 months ago

What I talk about when I talk about IRs

I have a lot of thoughts about the design of compiler intermediate representations (IRs). In this post I’m going to try and communicate some of those ideas and why I think they are important. The overarching idea is being able to make decisions with only local information. That comes in a couple of different flavors. We’ll assume that we’re compiling a method at a time, instead of something more trace-like (tracing, tracelets, basic block versioning, etc). A function will normally have some control-flow: , , , any amount of jumping around within a function. Let’s look at an example function in a language with advanced control-flow constructs: Most compilers will deconstruct this , with its many nested expressions, into simple comparisons and jumps. In order to resolve jump targets in your compiler, you may have some notion of labels (in this case, words ending with a colon): This looks kind of like a pseudo-assembly language. It has its high-level language features decomposed into many smaller instructions. It also has implicit fallthrough between labeled sections (for example, into ). I mention these things because they, like the rest of the ideas in this post, are points in an IR design space. Representing code this way is an explicit choice, not a given. For example, one could make the jumps explicit by adding a at the end of . As soon as we add that instruction, the code becomes position-independent: as long as we start with , the chunks of code between labels could be ordered any which way: they are addressed by name and have no implicit ordering requirements. This may seem arbitrary, but it gives the optimizer more flexibility. If some optimization rule decides, for example, that a branch to may rarely be taken, it can freely re-order it toward the end of the function (or even on a different page!) so that more hot code can be on a single cache line. Explicit jumps and labels turn the code from a strictly linear assembly into a control-flow graph (CFG). Each sequence of code without internal control-flow is called a basic block and is a vertex in this graph. The directed edges represent jumps between blocks. See for example this crude GraphViz representation: We’re actually kind of looking at extended basic blocks (EBBs), which allow for multiple control exits per block but only one control entry. A strict basic block representation of the above code would look, in text form, something like this: Notice how each block has exactly one terminator (control-flow instruction), with (in this case) 0 or 2 targets. Opinions differ about the merits and issues of extended vs normal basic blocks. Most compilers I see use normal basic blocks. In either case, bringing the IR into a graph form gives us an advantage: thanks to Cousot and Cousot, our favorite power couple, we know how to do abstract interpretation on graphs and we can use this to build an advanced optimizer. See, for example, my intro to abstract interpretation post. Some IRs are stack based. For concatenative languages or some newer JIT compilers, IRs are formatted in such a way that each opcode reads its operands from a stack and writes its outputs to a stack. This is reminiscent of a point-free coding style in languages such as Haskell or OCaml. In this style, there is an implicit shared state: the stack. Dataflow is explicit (pushes and pops) and instructions can only be rearranged if the stack structure is preserved. This requires some non-local reasoning: to move an instruction, one must also rearrange the stack.
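As a tiny made-up illustration (a sketch in Haskell, not one of the post's examples), a stack-based encoding of a + b * c might look like the following, with every operand passed through the shared stack rather than named:

```haskell
-- A toy stack-based instruction set: each instruction reads its operands
-- from the implicit stack and pushes its result back, so reordering an
-- instruction means reasoning about the stack layout around it.
data StackInstr
  = PushConst Int     -- push an immediate value
  | LoadLocal String  -- push the value of a local
  | Add               -- pop two values, push their sum
  | Mul               -- pop two values, push their product
  deriving (Show)

-- "a + b * c" in stack form: operands are never named, only pushed and popped.
stackExample :: [StackInstr]
stackExample =
  [ LoadLocal "a"
  , LoadLocal "b"
  , LoadLocal "c"
  , Mul   -- consumes b and c
  , Add   -- consumes a and (b * c)
  ]
```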
By contrast, in a register-based IR, things are more explicit. Instructions take named inputs ( , , etc) and produce named outputs. Instructions can be slightly more easily moved around (modulo effects) as long as inputs remain defined. Local variables do not exist. The stack does not exist. Everything is IR “variables”. The constraints (names being defined) are part of the IR . This gets a little bit tricky if it’s possible to define a name multiple times. What does mean in the instruction for ? Which definition does it refer to? In order to reason about the instruction , we have to keep around some context. This is non-trivial: requiring compiler writers to constantly truck around side-tables and update them as they do analysis is slow and error-prone. Fortunately, if we enforce some interesting rules, we can push that analysis work into one pass up-front… Static single assignment (SSA) was introduced by a bunch of folks at IBM (see my blog post about the different implementations). In SSA-based IRs, each variable can only be defined once. Put another way, a variable is its defining instruction; alternately, a variable and its defining instruction are addressed by the same name. The previous example is not valid SSA; has two definitions. If we turn the previous example into SSA, we can now use a different name for each instruction. This is related to the unique name assumption or the global names property: names do not depend on context. Now we can identify each different instruction by the variable it defines. This is useful in analysis… I’d be remiss if I did not mention continuation-passing style (CPS) based IRs (and in fact, I had forgotten in the original draft of the post). As an IR, CPS is normally used in the analysis and optimization of functional programs, for example in the OCaml and Racket compilers. It is not required, however; MLton, for example, uses SSA in its compiler for Standard ML. SSA and CPS can more or less represent the same programs, but they can each feel a natural fit for different languages (and different compiler authors). I don’t feel qualified to say much more here. For a more informed opinion, check out Andy Wingo’s approaching cps soup , especially the benefits and drawbacks near the end. Speaking of CPS, I took a class with Olin Shivers and he described abstract interpretation as “automated theorem finding”. Unlike theorem provers such as Lean and Rocq, where you have to manually prove the properties you want, static analysis finds interesting facts that already exist in your program (and optimizers use them to make your program faster). Your static analysis pass(es) can annotate your IR nodes with little bits of information such as: If your static analysis is over SSA, then generally the static analysis is easier and (potentially) storing facts is easier. This is due to this property called sparseness . Where a static analysis over non-SSA programs has to store facts about all variables at all program points , an analysis over SSA need only store facts about all variables, independent of context. I sometimes describe this as “pushing time through the IR” but I don’t know that that makes a ton of sense. Potentially more subtle here is that we could represent the above IR snippet as a list of tuples, where instructions are related via some other table (say, a “variable environment”): Instead, though, we could allocate an object for each instruction and let them refer to one another by pointer (or index, if using Rust or something). 
Then they directly refer to one another (no need for a side-table), which might be faster and more compact. We can re-create nice names as needed for printing. Then, when optimizing, we look up the type information of an operand by directly reading a field ( or similar). Another thing to note: when you start adding type information to your IR, you’re going to start asking type information questions in your analysis. Questions such as “what type is this instruction?”, where “type” could span a semilattice, and even refer to a specific run-time object by its pointer. In that case, it’s important to ask the right questions . For example: instructions are likely not the only opcodes that could produce specific objects; if you have an instruction like , for example, that burns a specific expected pointer into the generated code, the type (and therefore the pointer) will come from the instruction. The big idea is that types represent a different slice of your IR than the opcodes and should be treated as such. Anyway, SSA only stores type information about instructions and does not encode information that we might later learn in the IR. With basic SSA, there’s not a good way to encode refinements… Static single information (SSI) form gives us new ways to encode metadata about instructions (variables). It was introduced by C. Scott Ananian in 1999 in his MS thesis (PDF). (I also discussed it briefly in the Scrapscript IR post .) Consider the following SSA program (represented as pseudo-Python): is undefined at . is defined and an integer at . But then we do something interesting: we split control flow based on the run-time value of . We can take this split to add new and interesting information to . For non-sparse analysis, we can record some fact on the side. That’s fine. When doing a dataflow analysis, we can keep track of the fact that at , is nonnegative, and at , is negative. This is neat: we can then determine that all paths to this function return a positive integer. Importantly, does not override the existing known type of . Instead, it is a refinement: a set intersection. A lattice meet. The middle bit of a Venn diagram containing two overlapping circles, and . If we want to keep our information sparse, though, we have to add a new definition to the IR. This is complicated (choose which variables to split, replace all uses, to maintain SSA, etc) but gives us new places to store information inside the IR . It means that every time we refer to , we know that it is nonnegative and every time we refer to , we know that it is negative. This information is independent of context! I should note that you can get a lot of the benefits of SSI without going “full SSI”. There is no need to split every variable at every branch, nor add a special new merge instruction. Okay, so we can encode a lot of information very sparsely in the IR. That’s neat. It’s powerful. But we should also be mindful that even in this very sparse representation, we are encoding information implicitly that we may not want to: execution order. In a traditional CFG representation, the instructions are already scheduled , or ordered. Normally this comes from the programmer in the original source form and is faithfully maintained. We get data use edges in an IR like SSA, but the control information is left implicit. Some forms of IR, however, seek to reify both data and control dependencies into the IR itself. One such IR design is sea of nodes (SoN), which was originally designed by Cliff Click during his PhD. 
In sea of nodes, every instruction gets its own vertex in the graph. Instructions have use edges to their operands, which can be either data or some other ordering property (control, effects, etc). The main idea is that IR nodes are by default unordered and are only ordered later, after effect analysis has removed a bunch of use edges. Per Vognsen also notes that there is another motivating example of sea of nodes: in the previous SSI example, the cannot be validly hoisted above the check. In a “normal” IR, this is implicit in the ordering. In a sea of nodes world, this is explicitly marked with an edge from the to the . I think Graal, for example, calls these nodes “Pi nodes”. I think I need to re-read the original paper, read a modern implementation (I get the feeling it’s not done exactly the same way anymore), and then go write more about it later. For now, see Simple, by Cliff Click and friends. It is an implementation in Java and a little book to go with it. Design neighbors include value dependence graphs (VDG), value state dependence graphs (VSDG), region value state dependence graphs (RVSDG), and program dependence graphs (PDG). Speaking of Cliff Click, I once heard/read something he said that sounded really interesting. Roughly, it was “elaborate the full semantics of the operation into the IR and let the optimizer sort it out”. That is, “open code” or “inline” your semantics. For example, don’t emit code for a generic add operation that you later specialize: Instead, emit code that replicates the written semantics of the operation, whatever that is for your local language. This can include optimistic fast paths: This has the advantage that you may end up with fewer specialized rewrite rules because constant propagation and branch folding take care of these specializations “for free”. You can even attach probabilities to more or less likely branches to offer outlining hints in case all of this is never specialized. Sure, the downside of this is that the generated IR might be bigger, so your optimizer might be slower—or worse, that your resulting generated code at the end might be bigger. But outlining, deduplication (functionalization?), and probably some other clever methods can help here. Similarly, Maxime Chevalier-Boisvert and Marc Feeley write about this (PDF) in the context of basic block versioning. If the runtime’s generic add function is written in IR, then callers to that function can specialize “through it” by calling it in different basic block contexts. That more or less gets you call-site specialization “for free”. See Figure 4 from their paper (lightly edited by me), where I think dollar-prefixed variable names indicate special functions known to the compiler: This is nice if you are starting a runtime from scratch or have resources to devote to re-writing chunks of the runtime in your IR. Then, even in a method JIT, you can get your inlined language semantics by function (partial) inlining. There’s probably more in this vein to be explored right now and probably more to invent in the future, too. Some other potentially interesting concepts to think about include: Thank you to Chris Fallin, Hunter Goldstein, and Per Vognsen for valuable feedback on drafts of this post.

0 views
Corrode 4 months ago

Rust For Foundational Software

Ten years of stable Rust; writing this feels surreal. It’s only been yesterday that we all celebrated the 1.0 release of this incredible language. I was at Rust Week where Niko Matsakis gave his talk “Our Vision for Rust” in which he made a profound and insightful statement: Rust is a language for building foundational software . That deeply struck me. I highly recommend you read his blog post titled “Rust in 2025: Targeting foundational software” , which is a great summary on the topic. I wanted to expand on the idea and share what this means to corrode (and perhaps to a wider extent to Rust in the industry). First off, do we really need another term? After all, many people still think of Rust as a systems programming language first and foremost, so why can’t we just stick to “systems programming”? I believe the framing is all wrong. From the outside, “systems programming” might establish that it is about “building systems,” but the term is loaded with historical baggage that feels limiting and prohibitive. It creates an artificial distinction between systems programming and “other types of programming.” The mindset “We are not a systems programming company so we don’t need Rust” is common, but limiting. If I may be candid for a moment, I believe well-known systems-programming domains have a tendency to be toxic. Even the best developers in the world have had that experience. The first contribution that I had to the Linux kernel was some fix for the ext3 file system. It was a very emotional moment for me. I sent a patch to the Linux Kernel and then I saw an email response from Al Viro - one of those developers I’d only heard about and dreamed of meeting someday. He responded, ‘I’ve never seen code this bad in my life. You managed to introduce three new bugs in two new lines of code. People like you should never be allowed to get close to a keyboard again.’ That was my introduction to Linux. – Glauber Costa, co-founder of Turso, on the Top Shelf Podcast Glauber went on to work at Red Hat, Parallels, ScyllaDB, and Datadog on schedulers, databases, and performance optimizations, but just imagine how many capable developers got discouraged by similar early feedback or never even tried to contribute to the Linux kernel in the first place. I find that ironic because people once dismissed Linux itself as just a toy project . The Linux kernel wasn’t built in a day. People need time to learn. The whole idea of Rust is to enable everyone to build reliable and efficient software. To me, it’s about breaking down the barriers to entry and making larger parts of the software stack accessible to more people. You can sit with us. We are committed to providing a friendly, safe and welcoming environment for all – The Rust Code of Conduct That is also where the idea for corrode comes from: To cut through red tape in the industry. The goal is to gradually chip away at the crust of our legacy software with all its flaws and limitations and establish a better foundation for the future of infrastructure. To try and defy the ‘common wisdom’ about what the tradeoffs have to be . The term corrode is Latin for “to gnaw to bits, wear away.” 1 “I think ‘infrastructure’ is a more useful way of thinking about Rust’s niche than arguing over the exact boundary that defines ‘systems programming’.” “This is the essence of the systems Rust is best for writing: not flashy, not attention-grabbing, often entirely unnoticed. 
Just the robust and reliable necessities that enable us to get our work done, to attend to other things, confident that the system will keep humming along unattended.” – Graydon Hoare, 10 Years of Stable Rust: An Infrastructure Story In conversations with potential customers, one key aspect that comes up with Rust a lot is this perception that Rust is merely a systems programming language. They see the benefit of reliable software, but they don’t see how Rust could help them. I used to reply along the lines of Rust’s mantra: “empowering everyone to build reliable and efficient software” – and while I love this mission, it didn’t always “click” with people. In practice, my clients use Rust for a much broader range of software , not just kernel drivers and embedded code! They use Rust for writing software that underpins other software . Then I used to tell my own story: I did some C++ in the past, but I wouldn’t call myself a hardcore systems person. And yet, I help a lot of clients with really interesting and complex pieces of software. I ship code that is used by many people and companies like Google, Microsoft, AWS, and NVIDIA. Rust is a great enabler, a superpower, a fireflower . I found that my clients often don’t use Rust as a C++ replacement. Many of them don’t even have any C++ in production to begin with. They also don’t need to work on the hardware-software interface or spend their time in low-level code. What all clients have in common, however, is that the services they build with Rust are foundational to their core business . Rust is used for building platforms: systems which enable building other systems on top. These services need to be robust and reliable and serve as platforms for other code that might or might not be written in Rust. This is, in my opinion, the core value proposition of Rust: to build things that form the bedrock of critical infrastructure and must operate reliably for years. Rust is a day-2 language, i.e. it only starts to shine after taking a leap of faith and using it for an extensive period of time. In Rust, all of the problems that you have during the lifecycle of your application surface early in development. Once a service hits production, maintaining it is a breeze. There is very little on-call work. The focus should be on what Rust enables: a way to express very complicated ideas on a type-system level, which will help build complex abstractions through simple core mechanics: ownership, borrowing, lifetimes, and its trait system. This mindset takes away the focus from Rust as a C++ replacement and also explains why so many teams which use languages like Python, TypeScript, and Kotlin are attracted to Rust. What is less often talked about is that Rust is a language that enables people to move across domain boundaries: from embedded to cloud, from data science to developer tooling. Few other languages are so versatile and none offer the same level of correctness guarantees. If you know Rust, you can program simple things in all of these domains. But don’t we just replace “Systems Programming” with “Foundational Software”? Does using the term “Foundational Software” simply create a new limiting category? Crucially, foundational software is different from low-level software and systems software. For my clients, it’s all foundational. For example, building a data plane is foundational. Writing a media-processing pipeline is foundational. 
Rust serves as a catalyst: companies start using it for critical software but then, as they get more comfortable with the language, expand into using it in other areas of their business: I’ve seen it play out as we built Aurora DSQL - we chose Rust for the new dataplane components, and started off developing other components with other tools. The control plane in Kotlin, operations tools in TypeScript, etc. Standard “right tool for the job” stuff. But, as the team has become more and more familiar and comfortable with Rust, it’s become the way everything is built. A lot of this is because we’ve seen the benefits of Rust, but at least some is because the team just enjoys writing Rust. – Marc Brooker, engineer at Amazon Web Services in Seattle on lobste.rs That fully aligns with my experience: I find that teams become ambitious after a while. They reach for loftier goals because they can . The fact they don’t have to deal with security issues anymore enables better affordances. From my conversations with other Rustaceans, we all made the same observation: suddenly we can build more ambitious projects that we never dared tackle before. It feels to me as if this direction is more promising: starting with the foundational tech and growing into application-level/business-level code if needed/helpful. That’s better than the other way around, which often feels unnecessarily clunky. Once the foundations are in Rust, other systems can be built on top of it. Just because we focus on foundational software doesn’t mean we can’t do other things. But the focus is to make sure that Rust stays true to its roots. So, what is foundational software? It’s software that organizations deem critical for their success. It might be: and many, many more. All of these things power organizations and must not fail or at least do so gracefully . My clients and the companies I interviewed on our podcast all have one thing in common: They work on Rust projects that are not on the sideline, but front and center, and they shape the future of their infrastructure. Rust is useful in situations where the “worse is better” philosophy falls apart ; it’s a language for building the “right thing”: With the right thing, designers are equally concerned with simplicity, correctness, consistency, and completeness. I think many companies will choose Rust to build their future platforms on. As such, it competes with C++ as much as it does with Kotlin or Python. I believe that we should shift the focus away from memory safety (which many other languages also offer) and instead focus on the explicitness, expressiveness, and ecosystem of Rust that is highly competitive with these languages. It is a language for teams which want to build things right and are at odds with the “move fast and break things” philosophy of the past. Rust is future-looking. Backwards-compatibility is enforced by the compiler and many people work on the robustness aspect of the language. Dropbox was one of the first production users of Rust. They built their storage layer on top of it. At no point did they think about using Rust as a C++ replacement. Instead, they saw the potential of Rust as a language for building scalable and reliable systems. Many more companies followed suit: Amazon, Google, Microsoft, Meta, Discord, Cloudflare, and many more. These organizations build platforms and Rust, a tool for professional programmers, developed by world experts over more than a decade of hard work, is best equipped to fit the bill. Is Rust used for real? 
“At this point, we now know the answer: yes, Rust is used a lot. It’s used for real, critical projects to do actual work by some of the largest companies in our industry. We did good.” “[Rust is] not a great hobby language but it is a fantastic professional language, precisely because of the ease of refactors and speed of development that comes with the type system and borrow checker.” – Graydon Hoare, 10 Years of Stable Rust: An Infrastructure Story To build a truly industrial-strength ecosystem, we need to remember the professional software lifecycle, which is often measured in decades. Stability plays a big role in that. The fact that Rust has stable editions and a language specification is crucial for industry adoption. But Rust is more than just a compiler and its standard library: the tooling and wider ecosystem are equally important. To build foundational software, you need guarantees that vulnerabilities get fixed and that the ecosystem evolves and adapts to the customer’s needs. The ecosystem is still mostly driven by volunteers who maintain key libraries and tools in their free time. There is more to be said about supply-chain security and sustainability in a future post. Building foundational systems is rooted in the profound belief that the efforts will pay off in the long run because organizations and society will benefit from them for decades. We are building systems that will be used by people who may not even know they are using them, but who will depend on them every day; that’s critical infrastructure. And Rust allows us to do so with great ergonomics. Rust inherited the pragmatism from C++ and the purism from Haskell. Rust enables us to build sustainable software that stays within its means and is concerned about low resource usage. Systems where precision and correctness matter. Solutions that work across language boundaries and up and down the stack. Rust is a language for decades and my mission is to be a part of this shift. On to the next 10 years! Conveniently, it also has a ‘C’ and an ‘R’ in the name, which bridges both languages. ↩

0 views
Abhinav Sarkar 5 months ago

Reading Time Estimates for Pandoc Based Blog Generators

If you, like me, are one of those who have written their own static-site generators based on Pandoc using Haskell, or use a framework made on top of it 1 , this post provides a little code snippet to compute the reading time estimate for your blog posts. You can stick this in your setup at the right place, and get pretty accurate estimates 2 . This note was originally published on abhinavsarkar.net . The function takes a tree data structure, and recursively traverses it to get the count of textual words. For raw HTML code embedded in the tree, it parses the HTML code using the tagsoup library, and extracts the inner text after removing the tags. The words are counted after removing words that are composed of only punctuation. The function then divides the word count by a words-per-minute ( ) parameter — that I’ve set here to 220, close to the average reading speed of literate adult humans — to return the reading time estimate in minutes. You can change the parameter as per your liking. That’s it. I hope this is helpful to you. If you have any questions or comments, please leave a comment below. If you liked this post, please share it. Thanks for reading! Like Hakyll , Slick or Ema . ↩︎ I cross-checked the estimates with those provided by the Miniflux feed reader, and they are quite close. ↩︎ If you liked this note, please leave a comment .
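The snippet itself did not survive the aggregation here, but a minimal sketch of the approach described (walking the Pandoc AST with query from pandoc-types, stripping raw HTML with tagsoup's innerText, and dividing by 220 words per minute) might look roughly like this; the function names, and the choice of which inline elements count as textual words, are mine rather than the author's:

```haskell
import           Data.Char              (isPunctuation)
import           Data.Monoid            (Sum (..))
import qualified Data.Text              as T
import           Text.HTML.TagSoup      (innerText, parseTags)
import           Text.Pandoc.Definition (Block (..), Inline (..), Pandoc)
import           Text.Pandoc.Walk       (query)

-- | Estimated reading time in minutes, at 220 words per minute.
readingTime :: Pandoc -> Double
readingTime doc = fromIntegral total / wordsPerMinute
  where
    wordsPerMinute = 220
    total = getSum (query inlineWords doc <> query blockWords doc)

    -- Plain-text inlines; raw HTML has its tags stripped via tagsoup first.
    inlineWords :: Inline -> Sum Int
    inlineWords (Str t)         = countWords t
    inlineWords (RawInline _ t) = countWords (innerText (parseTags t))
    inlineWords _               = mempty

    -- Raw HTML blocks embedded directly in the document.
    blockWords :: Block -> Sum Int
    blockWords (RawBlock _ t) = countWords (innerText (parseTags t))
    blockWords _              = mempty

    -- Words made up entirely of punctuation don't count.
    countWords :: T.Text -> Sum Int
    countWords = Sum . length . filter (not . T.all isPunctuation) . T.words
```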

0 views

Analyzing API Design via Algebraic Laws

The other day, someone asked: Why doesn’t [the Data.Map function] allow for different value types the way does? This is a very reasonable question, and it led down an interesting rabbit hole at the intersection of API design and efficient implementation. To answer the original question, what would the type of a different value type of look like? It would be something in the flavor of: But this new parameter is somewhat lossy, in that it gives the impression that it could be called with as parameters, which doesn’t fit into the vibe of being a “union.” So instead we could restrict that possibility by using : which seems reasonable enough.

But let’s take reasonableness out of the picture and start again from first principles. Instead let’s ask ourselves the deep philosophical question of what even IS a map? A is a particularly efficient implementation of functions with type . But why is this here? It’s really only to encode the “default” value of performing a lookup. Nothing goes wrong if we generalize this to be . In fact, it helps us make sense of the right bias present in , where we see: This equality is hard to justify under the normal understanding of being an encoding of a function . But under the general monoid interpretation, we get a nice semigroup homomorphism: where the monoid in question has been specialized to be . Of course, we also have a monoid homomorphism:

Let’s re-evaluate the original question in terms of this newly-generalized . Now that we’ve removed all of the unnecessary baggage of , we can again think about the desired type of : which looks awfully familiar . This new type signature automatically resolves our original concerns about “what should we do if the key isn’t present?”—just call the function with as a parameter! We can give some semantics as to what ought to do again by relating it to the observation . The relevant law here seems like it ought to be: By choosing a degenerate function , say, , where is some value that is not , we can see the beginnings of a problem: Regardless of the key we look up in our ed , we need to get back . How can we implement such a thing? I see only two ways: #1 is clearly a non-starter, given that we want our s to be efficient encodings of functions, which leaves us with only #2. This is actually a pretty common construction, which stems immediately from the fact that a pair of monoids is itself a monoid. The construction would look something like this: Seems fine, right?

The nail in the coffin comes when we reintroduce our semigroup homomorphism: Without loss of generality, take (where is just with a constant function.) This gives us: Making this thing efficient is a further complication! We again have two options: #1 clearly requires \(O(n)\) work, which again forces us to look at #2. But #2 seems very challenging, because the monoidal values we need to suspend need not span the entire . For example, consider a constructed a la: Representing this thing efficiently certainly isn’t impossible, but you’re not going to be able to do it on the balanced binary search trees that underlie the implementation of .

I find this quite an interesting result. I always assumed that (or at least, ) didn’t have an instance because it would require a constraint on its output—but that’s not the sort of thing we can express in Haskell. But the analysis above says that’s not actually the reason! It’s that there can be no efficient implementation of , even if we could constrain the result.
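For reference, the "pair of monoids" construction mentioned above might be sketched like this; the type and its field names are mine, not the post's:

```haskell
import           Data.Map (Map)
import qualified Data.Map as Map

-- A "total map": a finite Map plus one monoidal value observed at every
-- key that is absent from the finite part.
data TMap k v = TMap
  { everywhereElse :: v
  , finitePart     :: Map k v
  }

lookupT :: (Ord k, Monoid v) => k -> TMap k v -> v
lookupT k (TMap d m) = Map.findWithDefault mempty k m <> d

instance (Ord k, Semigroup v) => Semigroup (TMap k v) where
  TMap d1 m1 <> TMap d2 m2 = TMap (d1 <> d2) (Map.unionWith (<>) m1 m2)

instance (Ord k, Monoid v) => Monoid (TMap k v) where
  mempty = TMap mempty Map.empty

-- As the argument above shows, the trouble starts when you ask this
-- representation to respect the lookup homomorphism and an fmap-like
-- operation while staying efficient on the balanced BSTs underneath Data.Map.
```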
What I find so cool about this style of analysis is that we didn’t actually write any code, nor did we peek into the implementation of (except to know that it’s implemented as a balanced BST.) All we did was look at the obvious laws, instantiate them with degenerate inputs, and think about what would be required to efficiently get the right answer.
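To make the "obvious laws" concrete, here is one way to write them down as QuickCheck-style properties over today's Data.Map; the Monoid-valued reading of lookup is the post's, while the property names are mine:

```haskell
import           Data.Map (Map)
import qualified Data.Map as Map

-- Read a Map k v as an efficient encoding of a total function k -> v,
-- where a missing key stands for mempty.
lookupM :: (Ord k, Monoid v) => k -> Map k v -> v
lookupM = Map.findWithDefault mempty

-- The semigroup homomorphism that makes sense of the bias in union:
-- lookup distributes over a monoidal union.
prop_lookupUnion :: (Ord k, Monoid v, Eq v) => k -> Map k v -> Map k v -> Bool
prop_lookupUnion k m1 m2 =
  lookupM k (Map.unionWith (<>) m1 m2) == lookupM k m1 <> lookupM k m2

-- The law that sinks the generalised map: a structure-preserving "mapped"
-- would need
--   lookupM k (mapped f m) == f (lookupM k m)
-- and instantiating f with a constant function forces *every* key,
-- including the absent ones, to observe that constant.
```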

0 views
baby steps 6 months ago

Dyn you have idea for `dyn`?

Knock, knock. Who’s there? Dyn. Dyn who? Dyn you have ideas for ? I am generally dissatisfied with how in Rust works and, based on conversations I’ve had, I am pretty sure I’m not alone. And yet I’m also not entirely sure the best fix. Building on my last post, I wanted to spend a bit of time exploring my understanding of the problem. I’m curious to see if others agree with the observations here or have others to add. It’s worth stepping back and asking why we have in the first place. To my mind, there are two good reasons. The most important one is that it is sometimes strictly necessary. If you are, say, building a multithreaded runtime like or , you are going to need a list of active tasks somewhere, each of which is associated with some closure from user code. You can’t build it with an enum because you can’t enumerate the set of closures in any one place. You need something like a . The second reason is to help with compilation time. Rust land tends to lean really heavily on generic types and . There are good reasons for that: they allow the compiler to generate very efficient code. But the flip side is that they force the compiler to generate a lot of (very efficient) code. Judicious use of can collapse a whole set of “almost identical” structs and functions into one. Right now, both of these goals are expressed in Rust via , but actually they are quite distinct. For the first, you really want to be able to talk about having a . For the second, you might prefer to write the code with generics but compile in a different mode where the specifics of the type involved are erased, much like how the Haskell and Swift compilers work. Now that we have the two goals, let’s talk about some of the specific issues I see around and what it might mean for to be “better”. We’ll start with the cases where you really want a value. One interesting thing about this scenario is that, by definition, you are storing a explicitly. That is, you are not working with a where just happens to be . This is important because it opens up the design space. We talked about this some in the previous blog post: it means that You don’t need working with this to be exactly the same as working with any other that implements (in the previous post, we took advantage of this by saying that calling an async function on a trait had to be done in a context). For this pattern today you are almost certainly representing your task a or (less often) an . Both of these are “wide pointers”, consisting of a data pointer and a vtable pointer. The data pointer goes into the heap somewhere. In practice people often want a “flattened” representation, one that combines a vtable with a fixed amount of space that might, or might not, be a pointer. This is particularly useful to allow the equivalent of . Today implementing this requires unsafe code (the type is an example). Another way to reduce the size of a is to store the vtable ‘inline’ at the front of the value so that a is a single pointer. This is what C++ and Java compilers typically do, at least for single inheritance. We didn’t take this approach in Rust because Rust allows implementing local traits for foreign types, so it’s not possible to enumerate all the methods that belong to a type up-front and put them into a single vtable. Instead, we create custom vtables for each (type, trait) pair. Right now traits cannot have methods. This means for example you cannot have a closure. 
You can work around this by using a method, but it’s annoying: One specific thing that hits me fairly often is that I want the ability to clone a value: This is a hard one to fix because the trait can only be implemented for types. But dang it would be nice.

Building on the above, I would like to have traits that have methods with generic parameters. I’m not sure how flexible this can be, but anything I can get would be nice. The simplest starting point I can see is allowing the use of in argument position: Today this method is not dyn compatible because we have to know the type of the parameter to generate a monomorphized copy, so we cannot know what to put in the vtable. Conceivably, if the trait were dyn compatible, we could generate a copy that takes (effectively) a – except that this wouldn’t quite work, because is short for , and is not . But maybe we could finesse it. If we support in argument position, it would be nice to support it in return position. This of course is approximately the problem we are looking to solve to support dyn async trait: Beyond this, well, I’m not sure how far we can stretch, but it’d be nice to be able to support other patterns too.

One last point is that sometimes in this scenario I don’t need to be able to access all the methods in the trait. Sometimes I only have a few specific operations that I am performing via . Right now though all methods have to be dyn compatible for me to use them with . Moreover, I have to specify the values of all associated types, lest they appear in some method signature. You can work around this by factoring out methods into a supertrait, but that assumes that the trait is under your control, and anyway it’s annoying. It’d be nice if you could have a partial view onto the trait.

So what about the case where generics are fine, good even, but you just want to avoid generating quite so much code? You might also want that to be under the control of your user. I’m going to walk through a code example for this section, showing what you can do today, and what kind of problems you run into. Suppose I am writing a custom iterator method, , which returns an iterator that alternates between items from the original iterator and the result of calling a function. I might have a struct like this: The impl itself might look like this: Now an iterator will be if the base iterator and the closure are but not otherwise. The iterator and closure will be able to make use of references found on the stack, too, so long as the itself does not escape the stack frame. Great!

But suppose I am trying to keep my life simple and so I would like to write this using traits: You’ll notice that this definition is somewhat simpler. It looks more like what you might expect from . The function and the are also simpler: There’s a problem, though: this code won’t compile! If you try, you’ll find you get an error in this function: The reason is that traits have a default lifetime bound. In the case of a , the default is . So e.g. the field has type . This means the closure and iterators can’t capture references to things. To fix that we have to add a somewhat odd lifetime bound: OK, this looks weird, but it will work fine, and we’ll only have one copy of the iterator code per output type instead of one for every (base iterator, closure) pair.

Except there is another problem: the iterator is never considered . To make it , you would have to write and , but then you couldn’t support non-Send things anymore. That stinks and there isn’t really a good workaround.
Ordinary generics work really well with Rust’s auto trait mechanism. The type parameters and capture the full details of the base iterator plus the closure that will be used. The compiler can thus analyze a to decide whether it is or not. Unfortunately really throws a wrench into the works – because we are no longer tracking the precise type, we also have to choose which parts to keep (e.g., its lifetime bound) and which to forget (e.g., whether the type is ).

This gets at another point. Even ignoring the issue, the type is not ideal. It will make fewer copies, but we still get one copy per item type, even though the code for many item types will be the same. For example, the compiler will generate effectively the same code for as or even . It’d be cool if we could have the compiler go further and coalesce code that is identical. 1 Even better if it can coalesce code that is “almost” identical but pass in a parameter: for example, maybe the compiler can coalesce multiple copies of by passing the size of the type in as an integer variable.

I really like using in argument position. I find code like this pretty easy to read: But if I were going to change this to use I can’t just change from to , I have to add some kind of pointer type: This then disturbs callers, who can no longer write: but now must write this: You can work around this by writing some code like this… but to me that just begs the question, why can’t the compiler do this for me dang it?

In the iterator example I was looking at a struct definition, but with (and in the future with ) these same issues arise quickly from functions. Consider this async function: If you rewrite this function to use , though, you’ll find the resulting future is never Send nor Sync anymore:

This has been a useful mental dump; I found it helpful to structure my thoughts. One thing I noticed is that there is kind of a “third reason” to use – to make your life a bit simpler. The versions of that used and felt simpler to me than the fully parametric versions. That might be best addressed though by simplifying generic notation or adopting things like implied bounds. Some other questions I have:

If the code is byte-for-byte identical, LLVM and the linker will in fact sometimes do this today, but it doesn’t work reliably across compilation units as far as I know. And anyway there are often small differences.  ↩︎

0 views
Andre Garzia 7 months ago

Why I choose Lua for this blog

This blog used to run with a stack based on Racket using Pollen and lots of hacks on top of it . At some point I realised that my setup was working against me. The moving parts and workflow I created added too much friction to keep my blog active. That happened mostly because it was a static generator trying to behave as if it was a dynamic website with an editing interface. That can be done really well — cue Grav CMS — but that was not the case for me. Once I decided to rewrite this blog as a simpler system, I faced the dilemma of what stack to choose. The obvious choice for me would be Javascript: it is the language I use most often and one that I am quite comfortable with. Still, I don't think it is a wise choice for the kind of blog I want to maintain.

Talking to some friends recently, I noticed that many people I know that have implemented their own blogging systems face many challenges keeping them running over many years. Not because it is hard to keep software running, but because their stack of choice is moving faster than their codebase. This problem is especially prevalent in the Javascript world. It is almost a crime that JS as understood by the browser is this beautiful language with extreme retrocompatibility, while JS as understood and used by the current tooling and workflows is this mess moving at lightspeed.

Let me unpack that for a bit. You can open a web page from 1995 on your browser of choice and it will just work because browser vendors try really hard to make sure they don't break the web. Developers who built the whole ecosystem of NodeJS, NPM, and all those libraries and frameworks don't share the same ethos. They all make a big deal of semantic versioning and thus being able to handle breaking changes, but they have breaking changes all the time. You'd be hard-pressed to actually run some JS code from ten years ago based on NodeJS and NPM. There is a big chance that dependencies might be gone, broken, or that the code might be incompatible with the current NodeJS. I know this sounds like FUD, and that for many, many projects, maybe even most projects, that will not be the case. But I heard from many people that keeping their blogging systems up to date requires a lot more work than they would like to do and if they don't, then they're screwed. That is also true about other languages even though many of them move at a slower speed. A friend recently complained about a blogging system he implemented that requires Ruby 2.0 and that keeping that running sucks.

I want a simpler blogging system; one that requires minimal changes over time. Lua is a wonderful and nimble language that is often misunderstood . One characteristic that I love about it is that it evolves very slowly. Lua 5.1 was introduced in 2006; Lua 5.4, the current version, was initially released in 2020. Yes, there are point releases in between, but you can see how much slower it moves when compared to JS. The differences between Lua 5.1 and Lua 5.4 are minimal when compared with how much other languages changed in the same time period. Lua only requires a C89 compiler to bootstrap itself. It is very easy to make Lua work and even easier to make it interface with something. JS is a lot larger than Lua; there is more to understand and more to remember. My blog needs are very simple and Lua can handle them with ease.

This is an old-school blog. I use cgi-bin — aka Common Gateway Interface — scripts to run it. It is a dynamic website with a SQLite database holding its data.
When you open a page, it fetches the data from a database and assembles HTML to send to the browser using Mustache templates. One process per request. Like the old days. You might argue that if I went with NodeJS, I'd be able to serve more requests using fewer resources. That is true. I don't need to serve that many requests though. My peak access was a couple years ago with 50k visitors in a week, and even my old Racket blog could handle that fine. The Lua one should handle it too; and if it breaks it breaks. I'm a flawed human being, my code can be flawed too, we're in this together, holding hands.

Your blog is your place to experiment and program how you want it. You can drop the JS fatigue, you can drop your fancy Haskell types, you can just do whatever you find fun and keep going (and that includes JS and Haskell if that's your thing. You do you). Because I'm using Lua, I don't have as many libraries and frameworks available to me as JS people have, but I still have quite a large collection via Luarocks .

I try not to add many dependencies to my blog. At the moment there are about ten and that is mostly because Lua is a batteries-not-included language so you start from a minimal core and build things up to suit your needs. For a lot of things I went with the questionable choice of implementing things myself. I got my own little CGI library. It is 200 lines long and does the bare minimum to make this blog work. I got my own little libraries for many things. Micropub and IndieAuth were all implemented by hand. At the moment I'm despairing frustrated having a lot of fun implementing WebMentions . Doing the Microformats2 exorcism extraction on my own is teaching me a lot of things.

What I want to say is that by choosing a small language that moves very slowly and very few dependencies, I can keep all of my blogging system in my head. I can make sure it will run without too much change for the next ten or twenty years. Lua is a lego set, a toolkit, it adapts to you and your needs. I don't need to keep chasing the new shiny or the latest framework du jour. I can focus on making the features I want and actually understanding how they work. Instead of installing a single dependency in another language and it pulling a hundred other small dependencies, all of which were transpiled into something the engine understands, to the point that understanding how all the pieces work and fit together takes more time than learning a new language, I decided to keep things simple. I got 29 Luarocks installed here and that is for all my Lua projects on this machine. That is my blog, my game development, my own work scripts for my day job. Not even half of those are for my blog.

I often see wisdom in websites such as Hacker News and Lobsters around the idea of "choosing boring" because it is proven, safe, easier to maintain. I think that boring is not necessarily applicable to my case. I don't find Lua boring at all, but everything those blog posts say about that kind of mindset is applicable to my own choices here.

Next time you're building your own blogging software, consider for a bit how long you want to maintain it. I first started blogging on macOS 8 in 2001. I chose badly many times and in the end couldn't keep my content moving forward in time with me as software I used or created became impossible to run. The last two changes, from JS to Racket and from Racket to Lua, have been a lot safer and I managed to carry all my content forward into increasingly simpler setups and workflows.
My blogging system is not becoming more complex over the years; it is becoming smaller, because with each change I select a stack that is more nimble and smaller than the one I had before. I don't think I can go smaller than Lua though. By small I mean:
A language I can fully understand and keep in my head.
A language whose engine I know how to build, and can do so if needed.
An engine that requires very few resources and is easy to interface with third-party libraries in native code.
I chose Lua because of all that, and I'm happy with it and hope this engine will see me through the next ten or so years.

0 views

Bidirectional Instance Contexts

Just a quick one today, but I wanted to point out a little trick you can do with Haskell’s typeclass inference. Imagine we have some little class, the details of which matter not in the least: We can give some instances of this type: Regular, everyday stuff. But the instances for type constructors are more interesting, because they come with an instance context: Then, of course, if we know both and , we can infer . To make this fact overwhelmingly explicit, we can reify the usual constraint-solving logic by using the type, and thus the following program will typecheck: Perhaps tipped off by the name here, the gentle reader is asked to notice the asymmetry, since the converse program will not typecheck: But why should it not typecheck? 1 Recall from the relevant instance definition that these instances must, in fact, exist:

As a testament to just how good GHC is, we can support this bidirectionality via a minor tweak to the definition of class and its instances. The trick is to add an associated type family to , and to use it as a superclass constraint: Because we’ve given a default implementation of the type family, our existing simple instances work as before: with the only change required coming from the type constructor instances: or, if you want to be cute about it: By sticking into the superclass constraint, GHC knows that this dictionary is always available when you’ve got a dictionary around. And our earlier program now typechecks as expected. This is all available in a play session if you’d like to fool around with it.

Rhetorical question. I don’t want to hear about orphans or overlapping instances or whatever. ↩︎
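The code blocks did not survive the aggregation here, but the trick can be reconstructed roughly as follows; the class name, the method-free class, and the use of Dict from the constraints package are my choices and may differ from the post's:

```haskell
{-# LANGUAGE ConstraintKinds         #-}
{-# LANGUAGE FlexibleContexts        #-}
{-# LANGUAGE TypeFamilies            #-}
{-# LANGUAGE UndecidableSuperClasses #-}

import Data.Constraint (Dict (..))  -- from the "constraints" package
import Data.Kind       (Constraint)

class Evidence a => C a where
  type Evidence a :: Constraint
  type Evidence a = ()              -- simple instances need no context

instance C Int
instance C Bool

-- The instance context is also recorded as the superclass, so GHC can
-- recover it *from* a C (Maybe a) dictionary, not just use it to build one.
instance C a => C (Maybe a) where
  type Evidence (Maybe a) = C a

forwards :: C a => Dict (C (Maybe a))
forwards = Dict

-- This is the direction that fails with a plain instance context:
backwards :: C (Maybe a) => Dict (C a)
backwards = Dict
```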

0 views
baby steps 8 months ago

How I learned to stop worrying and love the LLM

I believe that AI-powered development tools can be a game changer for Rust—and vice versa. At its core, my argument is simple: AI’s ability to explain and diagnose problems with rich context can help people get over the initial bump of learning Rust in a way that canned diagnostics never could, no matter how hard we try. At the same time, rich type systems like Rust’s give AIs a lot to work with, which could be used to help them avoid hallucinations and validate their output. This post elaborates on this premise and sketches out some of the places where I think AI could be a powerful boost.

Is Rust good for every project? No, of course not. But it’s absolutely great for some things—specifically, building reliable, robust software that performs well at scale. This is no accident. Rust’s design is intended to surface important design questions (often in the form of type errors) and to give users the control to fix them in whatever way is best. But this same strength is also Rust’s biggest challenge. Talking to people within Amazon about adopting Rust, perceived complexity and fear of its learning curve are the biggest hurdles. Most people will say, “Rust seems interesting, but I don’t need it for this problem” . And you know, they’re right! They don’t need it. But that doesn’t mean they wouldn’t benefit from it. One of Rust’s big surprises is that, once you get used to it, it’s “surprisingly decent” at a very large number of things beyond what it was designed for. Simple business logic and scripts can be very pleasant in Rust. But the phrase “once you get used to it” in that sentence is key, since most people’s initial experience with Rust is confusion and frustration .

Some languages are geared to say yes —that is, given any program, they aim to run it and do something . JavaScript is of course the most extreme example (no semicolons? no problem!) but every language does this to some degree. It’s often quite elegant. Consider how, in Python, you write to get the last element in the list: super handy! Rust is not (usually) like this. Rust is geared to say no . The compiler is just itching for a reason to reject your program. It’s not that Rust is mean: Rust just wants your program to be as good as it can be. So we try to make sure that your program will do what you want (and not just what you asked for). This is why , in Rust, will panic: sure, giving you the last element might be convenient, but how do we know you didn’t have an off-by-one bug that resulted in that negative index? 1

But that tendency to say no means that early learning can be pretty frustrating. For most people, the reward from programming comes from seeing their program run—and with Rust, there are a lot of niggling details to get right before your program will run. What’s worse, while those details are often motivated by deep properties of your program (like data races), the way they are presented is as the violation of obscure rules, and the solution (“add a ”) can feel random. Once you get the hang of it, Rust feels great, but getting there can be a pain. I heard a great phrase from someone at Amazon to describe this: “Rust: the language where you get the hangover first”. 3

My favorite thing about working at Amazon is getting the chance to talk to developers early in their Rust journey. Lately I’ve noticed an increasing trend—most are using Q Developer.
Over the last year, Amazon has been doing a lot of internal promotion of Q Developer, so that in and of itself is no surprise, but what did surprise me a bit was hearing from developers the way that they use it. For most of them, the most valuable part of Q Dev is not authoring code but rather explaining it. They ask it questions like “why does this function take an and not an ?” or “what happens when I move a value from one place to another?”. Effectively, the LLM becomes an ever-present, ever-patient teacher. 4

Some time back I sat down with an engineer learning Rust at Amazon. They asked me about an error they were getting that they didn’t understand. “The compiler is telling me something about , what does that mean?” Their code looked something like this: And the compiler was telling them : This is a pretty good error message! And yet it requires significant context to understand it (not to mention scrolling horizontally, sheesh). For example, what is “borrowed data”? What does it mean for said data to “escape”? What is a “lifetime” and what does it mean that “ must outlive ”? Even assuming you get the basic point of the message, what should you do about it? Ultimately, the answer to the engineer’s problem was just to insert a call to 5 . But deciding on that fix requires a surprisingly large amount of context. In order to figure out the right next step, I first explained to the engineer that this confusing error is, in fact, what it feels like when Rust saves your bacon , and talked them through how the ownership model works and what it means to free memory. We then discussed why they were spawning a task in the first place (the answer: to avoid the latency of logging)—after all, the right fix might be to just not spawn at all, or to use something like rayon to block the function until the work is done. Once we established that the task needed to run asynchronously from its parent, and hence had to own the data, we looked into changing the function to take an so that it could avoid a deep clone. This would be more efficient, but only if the caller themselves could cache the somewhere. It turned out that the origin of this string was in another team’s code and that this code only returned an . Refactoring that code would probably be the best long-term fix, but given that the strings were expected to be quite short, we opted to just clone the string.

An error message is often your first and best chance to teach somebody something.—Esteban Küber (paraphrased)

Working through this error was valuable. It gave me a chance to teach this engineer a number of concepts. I think it demonstrates a bit of Rust’s promise—the idea that learning Rust will make you a better programmer overall, regardless of whether you are using Rust or not. Despite all the work we have put into our compiler error messages, this kind of detailed discussion is clearly something that we could never achieve. It’s not because we don’t want to! The original concept for , for example, was to present a customized explanation of each error, tailored to the user’s code. But we could never figure out how to implement that. And yet tailored, in-depth explanation is absolutely something an LLM could do. In fact, it’s something they already do, at least some of the time—though in my experience the existing code assistants don’t do nearly as good a job with Rust as they could.

Emery Berger is a professor at UMass Amherst who has been exploring how LLMs can improve the software development experience.
Emery emphasizes how AI can help close the gap from “tool to goal”. In short, today’s tools (error messages, debuggers, profilers) tell us things about our program, but they stop there. Except in simple cases, they can’t help us figure out what to do about it—and this is where AI comes in. When I say AI, I am not talking (just) about chatbots. I am talking about programs that weave LLMs into the process, using them to make heuristic choices or proffer explanations and guidance to the user. Modern LLMs can also do more than just rely on their training and the prompt: they can be given access to APIs that let them query and get up-to-date data. I think AI will be most useful in cases where solving the problem requires external context not available within the program itself. Think back to my explanation of the error, where knowing the right answer depended on how easy/hard it would be to change other APIs. I’ve thought about a lot of places I think AI could help make working in Rust more pleasant. Here is a selection. Consider this code: This function will give a type error, because the signature (thanks to lifetime elision) promises to return a string borrowed from but actually returns a string borrowed from . Now…what is the right fix? It’s very hard to tell in isolation! It may be that in fact the code was meant to be (in which case the current signature is correct). Or perhaps it was meant to be something that sometimes returns and sometimes returns , in which case the signature of the function was wrong. Today, we take our best guess. But AI could help us offer more nuanced guidance. People often ask me questions like “how do I make a visitor in Rust?” The answer, of course, is “it depends on what you are trying to do”. Much of the time, a Java visitor is better implemented as a Rust enum and match statements, but there is a time and a place for something more like a visitor. Guiding folks through the decision tree for how to do non-trivial mappings is a great place for LLMs. When I start writing a Rust program, I start by authoring type declarations. As I do this, I tend to think ahead to how I expect the data to be accessed. Am I going to need to iterate over one data structure while writing to another? Will I want to move this data to another thread? The setup of my structures will depend on the answer to these questions. I think a lot of the frustration beginners feel comes from not having a “feel” yet for the right way to structure their programs. The structure they would use in Java or some other language often won’t work in Rust. I think an LLM-based assistant could help here by asking them some questions about the kinds of data they need and how it will be accessed. Based on this it could generate type definitions, or alter the definitions that exist. A follow-on to the previous point is that, in Rust, when your data access patterns change as a result of refactorings, it often means you need to do more wholesale updates to your code. 6 A common example for me is that I want to split out some of the fields of a struct into a substruct, so that they can be borrowed separately. 7 This can be quite non-local and sometimes involves some heuristic choices, like “should I move this method to be defined on the new substruct or keep it where it is?”. When you run the command today it will automatically apply various code suggestions to cleanup your code. With the upcoming Rust 2024 edition , will do the same but for edition-related changes. 
All of the logic for these changes is hardcoded in the compiler and it can get a bit tricky. For editions, we intentionally limit ourselves to local changes, so the coding for these migrations is usually not too bad, but there are some edge cases where it’d be really useful to have heuristics. For example, one of the changes we are making in Rust 2024 affects “temporary lifetimes”. It can affect when destructors run. This almost never matters (your vector will get freed a bit earlier or whatever) but it can matter quite a bit, if the destructor happens to be a lock guard or something with side effects. In practice when I as a human work with changes like this, I can usually tell at a glance whether something is likely to be a problem—but the heuristics I use to make that judgment are a combination of knowing the name of the types involved, knowing something about the way the program works, and perhaps skimming the destructor code itself. We could hand-code these heuristics, but an LLM could do it better, and it could ask questions if it was feeling unsure.

Now imagine you are releasing the 2.x version of your library. Maybe your API has changed in significant ways. Maybe one API call has been broken into two, and the right one to use depends a bit on what you are trying to do. Well, an LLM can help here, just like it can help in translating idioms from Java to Rust. I imagine the idea of having an LLM help you migrate makes some folks uncomfortable. I get that. There’s no reason it has to be mandatory—I expect we could always have a more limited, precise migration available. 8

Premature optimization is the root of all evil, or so Donald Knuth is said to have said. I’m not sure about all evil, but I have definitely seen people rathole on microoptimizing a piece of code before they know if it’s even expensive (or, for that matter, correct). This is doubly true in Rust, where cloning a small data structure (or reference counting it) can often make your life a lot simpler. Llogiq’s great talks on Easy Mode Rust make exactly this point. But here’s a question: suppose you’ve been taking this advice to heart, inserting clones and the like, and you find that your program is running kind of slow? How do you make it faster? Or, even worse, suppose that you are trying to tune your network service. You are looking at the blizzard of available metrics and trying to figure out what changes to make. What do you do? To get some idea of what is possible, check out Scalene , a Python profiler that is also able to offer suggestions (from Emery Berger’s group at UMass, the professor I talked about earlier).

Let’s look a bit to the future. I want us to get to a place where the “minimum bar” for writing unsafe code is that you test that unsafe code with some kind of sanitizer that checks for both C and Rust UB—something like miri today, except one that works “at scale” for code that invokes FFI or does other arbitrary things. I expect a smaller set of people will go further, leveraging automated reasoning tools like Kani or Verus to prove statically that their unsafe code is correct 9 . From my experience using miri today, I can tell you two things. (1) Every bit of unsafe code I write has some trivial bug or other. (2) If you enjoy puzzling out the occasionally inscrutable error messages you get from Rust, you’re gonna love miri!
To be fair, miri has a much harder job—the (still experimental) rules that govern Rust aliasing are intended to be flexible enough to allow all the things people want to do that the borrow checker doesn’t permit. This means they are much more complex. It also means that explaining why you violated them (or may violate them) is that much more complicated. Just as an AI can help novices understand the borrow checker, it can help advanced Rustaceans understand tree borrows (or whatever aliasing model we wind up adopting). And just as it can make smarter suggestions for whether to modify the function body or its signature, it can likely help you puzzle out a good fix.

Anyone who has used an LLM-based tool has encountered hallucinations, where the AI just makes up APIs that “seem like they ought to exist”. 10 And yet anyone who has used Rust knows that “if it compiles, it works” is true way more often than it has a right to be. 11 This suggests to me that any attempt to use the Rust compiler to validate AI-generated code or solutions is going to also help ensure that the code is correct. AI-based code assistants right now don’t really have this property. I’ve noticed that I kind of have to pick between “shallow but correct” or “deep but hallucinating”. A good example is statements. I can use rust-analyzer to fill in the match arms and it will do a perfect job, but the body of each arm is . Or I can let the LLM fill them in and it tends to cover most-but-not-all of the arms but it generates bodies. I would love to see us doing deeper integration, so that the tool is talking to the compiler to get perfect answers to questions like “what variants does this enum have” while leveraging the LLM for open-ended questions like “what is the body of this arm”. 12

Overall AI reminds me a lot of the web around the year 2000. It’s clearly overhyped. It’s clearly being used for all kinds of things where it is not needed. And it’s clearly going to change everything. If you want to see examples of what is possible, take a look at the ChatDBG videos published by Emery Berger’s group. You can see how the AI sends commands to the debugger to explore the program state before explaining the root cause. I love the video debugging bootstrap.py , as it shows the AI applying domain knowledge about statistics to debug and explain the problem.

My expectation is that compilers of the future will not contain nearly so much code geared around authoring diagnostics. They’ll present the basic error, sure, but for more detailed explanations they’ll turn to AI. It won’t be just a plain old foundation model; they’ll use RAG techniques and APIs to let the AI query the compiler state, digest what it finds, and explain it to users. Like a good human tutor, the AI will tailor its explanations to the user, leveraging the user’s past experience and intuitions (oh, and in the user’s chosen language).

I am aware that AI has some serious downsides. The most serious to me is its prodigious energy use, but there are also good questions to be asked about the way that training works and the possibility of not respecting licenses. The issues are real but avoiding AI is not the way to solve them. Just in the course of writing this post, DeepSeek was announced, demonstrating that there is a lot of potential to lower the costs of training. As far as the ethics and legality, that is a very complex space.
Agents are already doing a lot to get better there, but note also that most of the applications I am excited about do not involve writing code so much as helping people understand and alter the code they’ve written. We don’t always get this right. For example, I find the combinator of iterators annoying because it takes the shortest of the two iterators, which is occasionally nice but far more often hides bugs.  ↩︎ The irony, of course, is that AI can help you to improve your woeful lack of tests by auto-generating them based on code coverage and current behavior.  ↩︎ I think they told me they heard it somewhere on the internet? Not sure the original source.  ↩︎ Personally, the thing I find most annoying about LLMs is the way they are trained to respond like groveling serveants. “Oh, that’s a good idea! Let me help you with that” or “I’m sorry, you’re right I did make a mistake, here is a version that is better”. Come on, I don’t need flattery. The idea is fine but I’m aware it’s not earth-shattering. Just help me already.  ↩︎ Inserting a call to is actually a bit more subtle than you might think, given the interaction of the future here.  ↩︎ Garbage Collection allows you to make all kinds of refactorings in ownership structure without changing your interface at all. This is convenient, but—as we discussed early on—it can hide bugs. Overall I prefer having that information be explicit in the interface, but that comes with the downside that changes have to be refactored.  ↩︎ I also think we should add a feature like View Types to make this less necessary. In this case instead of refactoring the type structure, AI could help by generating the correct type annotations, which might be non-obvious.  ↩︎ My hot take here is that if the idea of an LLM doing migrations in your code makes you uncomfortable, you are likely (a) overestimating the quality of your code and (b) underinvesting in tests and QA infrastructure 2 . I tend to view an LLM like a “inconsistently talented contributor”, and I am perfectly happy having contributors hack away on projects I own.  ↩︎ The student asks, “When unsafe code is proven free of UB, does that make it safe?” The master says, “Yes.” The student asks, “And is it then still unsafe?” The master says, “Yes.” Then, a minute later, “Well, sort of.” (We may need new vocabulary.)  ↩︎ My personal favorite story of this is when I asked ChatGPT to generate me a list of “real words and their true definition along with 2 or 3 humorous fake definitions” for use in a birthday party game. I told it that “I know you like to hallucinate so please include links where I can verify the real definition”. It generated a great list of words along with plausible looking URLs for merriamwebster.com and so forth—but when I clicked the URLs, they turned out to all be 404s (the words, it turned out, were real—just not the URLs).  ↩︎ This is not a unique property of Rust, it is shared by other languages with rich type systems, like Haskell or ML. Rust happens to be the most widespread such language.  ↩︎ I’d also like it if the LLM could be a bit less interrupt-y sometimes. Especially when I’m writing type-system code or similar things, it can be distracting when it keeps trying to author stuff it clearly doesn’t understand. I expect this too will improve over time—and I’ve noticed that while, in the beginning, it tends to guess very wrong, over time it tends to guess better. 
I’m not sure what inputs and context are being fed by the LLM in the background but it’s evident that it can come to see patterns even for relatively subtle things.  ↩︎

0 views

Use Monoids for Construction

There’s a common anti-pattern I see in beginner-to-intermediate Haskell programmers that I wanted to discuss today. It’s the tendency to conceptualize the creation of an object by repeated mutation. Often this takes the form of repeated insertion into an empty container, but comes up under many other guises as well. This anti-pattern isn’t particularly surprising in its prevalence; after all, if you’ve got the usual imperative brainworms, this is just how things get built. The Gang of Four “builder pattern” is exactly this; you can build an empty object, and setters on such a thing change the state but return the object itself. Thus, you build things by chaining together setter methods: Even if you don’t subscribe to the whole OOP design principle thing, you’re still astronomically likely to think about building data structures like this: To be more concrete, maybe instead of doodads and widgets you have s and s. Or dictionaries and key-value pairs. Or graphs and edges. Anywhere you look, you’ll probably find examples of this sort of code.

Maybe you’re thinking to yourself “I’m a hairy-chested functional programmer and I scoff at patterns like these.” That might be true, but perhaps you too are guilty of writing code that looks like: Just because it’s dressed up with functional combinators doesn’t mean you’re not still writing C code. To my eye, the great promise of functional programming is its potential for conceptual clarity, and repeated mutation will always fall short of the mark.

The complaint, as usual, is that repeated mutation tells you how to build something, rather than focusing on what it is you’re building. An algorithm cannot be correct in the absence of intention—after all, you must know what you’re trying to accomplish in order to know if you succeeded. What these builder patterns, for loops, and s all have in common is that they are algorithms, strategies for building something. But you’ll notice none of them come with comments. And therefore we can only ever guess at what the original author intended, based on the context of the code we’re looking at.

I’m sure this all sounds like splitting hairs, but that’s because the examples so far have been extremely simple. But what about this one? which I found by grepping through for , and then mangled to remove the suggestive variable names. What does this one do? Based solely on the type we can presume it’s using that function to partition the list somehow. But how? And is it correct? We’ll never know—and the function doesn’t even come with any tests!

The shift in perspective necessary here is to reconceptualize building-by-repeated-mutation as building-by-combining. Rather than chiseling out the object you want, instead find a way of gluing it together from simple, obviously-correct pieces. The notion of “combining together” should evoke in you a cozy warm fuzzy feeling. Much like being in a secret pillow fort. You must come to be one with the monoid. Once you have come to embrace monoids, you will have found inner programming happiness. Monoids are a sacred, safe place, at the fantastic intersection of “overwhelmingly powerful” and yet “hard to get wrong.”

As an amazingly fast recap, a monoid is a collection of three things: some type , some value of that type , and a binary operation over that type , subject to a bunch of laws: which is to say, does nothing and doesn’t care where you stick the parentheses.
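Spelled out with Haskell's names for the pieces (mempty for the unit, <> for the binary operation), the laws being alluded to are:

```haskell
-- The monoid laws, written as a reusable QuickCheck-style property.
monoidLaws :: (Monoid m, Eq m) => m -> m -> m -> Bool
monoidLaws x y z =
  and
    [ mempty <> x == x                  -- left identity: mempty does nothing
    , x <> mempty == x                  -- right identity
    , (x <> y) <> z == x <> (y <> z)    -- associativity: parentheses don't matter
    ]
```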
If you’re going to memorize any two particular examples of monoids, it had better be these two: the instance for lists and the instance for pairs. The first says that lists form a monoid under the empty list and concatenation: mempty is [] and (<>) is (++). The second says that products preserve monoids: if a and b are both monoids, then so is (a, b), with mempty = (mempty, mempty) and combination happening componentwise.

The list monoid instance is responsible for the semantics of the ordered, “sequency” data structures. That is, if I have some sequential flavor of data structure, its monoid instance should probably agree with list concatenation: converting two structures to lists and appending them should give the same elements, in the same order, as combining the structures and then converting. Sequency data structures are things like lists, vectors, queues, deques, that sort of thing. Data structures where, when you combine them, you assume there is no overlap.

The second monoid instance here, over products, is responsible for pretty much all the other data structures. The first thing we can do with it is remember that functions are just really, really big product types, with one “slot” for every value in the domain. We can show an isomorphism between pairs and functions out of booleans, for example: a pair (x, y) corresponds to the function sending True to x and False to y, and a function f corresponds back to the pair (f True, f False). Under this isomorphism, we should thereby expect the monoid on Bool -> a to agree with the monoid on (a, a). If you generalize this out, you get the standard instance for functions, Monoid b => Monoid (a -> b), which combines values in the codomain monoidally: (f <> g) x = f x <> g x, and mempty is the function that ignores its argument and returns mempty. Chasing the booleans isomorphism through these definitions shows that the function monoid agrees with the preservation-of-products monoid. The same argument works for any type in the domain of the function, but showing it generically is challenging. Anyway, I digress.

The reason to memorize this instance is that it’s the monoid instance that every data structure is trying to be. Recall that almost all data structures are merely different encodings of functions, designed to make some operations more efficient than they would otherwise be. Don’t believe me? A Map k v is an encoding of the function k -> Maybe v, optimized to efficiently query which keys map to something. That is to say, it’s a sparse representation of a function.

What does all of this look like in practice? Stuff like worrying about individual inserts and folds is surely programming-in-the-small, which is worth knowing, but isn’t the sort of thing that turns the tides of a successful application. The reason I’ve been harping on about the function and product monoids is that they are compositional. The uninformed programmer will be surprised by just how far one can get by composing these things.

At work, we need to reduce a tree (+ nonlocal references) into an honest-to-goodness graph. While we’re doing it, we need to collect certain nodes. And the tree has a few constructors which semantically change the scope of their subtrees, so we need to preserve that information as well. It’s actually quite the exercise to sketch out an algorithm that will accomplish all of these goals when you’re thinking about explicit mutation. Our initial attempts at implementing this were clumsy. We’d fold the tree into a graph, adding fake nodes for the constructors. Then we’d filter all the nodes in the graph, trying to find the ones we needed to collect. Then we’d do a graph traversal from the root, trying to find those nodes, and propagating their information downstream. Rather amazingly, this implementation kinda sorta worked! But it was slow, and took O(10k) SLOC to implement.

The insight here is that everything we needed to collect was monoidal: a record whose fields are all themselves monoids (a hypothetical sketch of its shape follows below), where the deriving stanza gives us the semigroup and monoid instances that we’d expect from being the product of a bunch of other monoids. And now for the coup de grâce: we hook everything up with the Writer monad.
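I can’t show the real record, so the following is only a sketch of its shape; the names Collected, NodeId, and the field names are all invented for illustration. The point is that every field is a monoid, so deriving via Generically (in GHC.Generics since base 4.17; earlier GHCs can use the generic-data package) hands the whole record the product monoid for free:

```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE DerivingVia #-}

import Data.Monoid (Any)
import Data.Set (Set)
import GHC.Generics (Generic, Generically (..))

-- Hypothetical node identifier; stands in for whatever the real graph uses.
type NodeId = Int

-- Everything the traversal needs to collect, as one big product of monoids:
-- the Sets combine by union, Any combines by logical OR.
data Collected = Collected
  { collectedEdges   :: Set (NodeId, NodeId)  -- edges discovered so far
  , collectedSpecial :: Set NodeId            -- the nodes we were asked to collect
  , collectedDirty   :: Any                   -- has a scope-changing constructor been seen?
  }
  deriving stock (Show, Generic)
  deriving (Semigroup, Monoid) via Generically Collected
```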
Writer is a chronically slept-on type, because most people seem to think it’s useful only for logging, and underwhelming at logging compared to a real logger type at that. But the charm is in the details: Writer w is a monad whenever w is a monoid, which makes it the perfect monad for solving data-structure-creation problems like the one we’ve got in mind. Such a thing gives rise to a few helper functions (sketched at the end of this post), each of which is responsible for adding a little piece to the final solution. Our algorithm is thus a function from the tree into Writer over our collected monoid; it traverses the tree, recursing with a different scope whenever it comes across one of the scope-changing constructors, and calling our helper functions as it goes. At each step of the way, the only thing it needs to return is the root of the section of the graph it just built, which recursive calls can use to break up the problem into inductive pieces.

This new implementation is roughly 20x smaller, coming in at O(500) SLOC, and was free of all the bugs we’d been diligently trying to squash under the previous implementation. Chalk it up to another win for induction!
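The real code isn’t mine to share, so here is a hedged sketch of what those helpers and the traversal might look like, assuming the hypothetical Collected record and NodeId type from the sketch above are in scope; the Tree and Scope types and every function name here are likewise invented for illustration:

```haskell
import Control.Monad.Writer (Writer, tell)
import Data.Monoid (Any (..))
import qualified Data.Set as Set

-- Hypothetical input tree and scope types.
data Scope = Global | Local
data Tree = Leaf NodeId | Branch NodeId [Tree] | Scoped Scope Tree

-- Each helper tells one small, obviously-correct piece of the final value:
-- a Collected that is mempty everywhere except a single field.
addEdge :: NodeId -> NodeId -> Writer Collected ()
addEdge from to = tell mempty { collectedEdges = Set.singleton (from, to) }

collectNode :: NodeId -> Writer Collected ()
collectNode n = tell mempty { collectedSpecial = Set.singleton n }

markDirty :: Writer Collected ()
markDirty = tell mempty { collectedDirty = Any True }

-- The traversal returns only the root of the subgraph it just built;
-- Writer accumulates everything else monoidally behind the scenes.
buildGraph :: Scope -> Tree -> Writer Collected NodeId
buildGraph _ (Leaf n) = pure n
buildGraph scope (Branch n children) = do
  roots <- traverse (buildGraph scope) children
  mapM_ (addEdge n) roots
  pure n
buildGraph _ (Scoped scope' subtree) = do
  markDirty
  buildGraph scope' subtree
```

The design point is that each helper contributes a value that is mempty in every field but one, so the Writer’s monoidal append assembles the final Collected result without any explicit mutation.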
