Evolving Languages Faster with Type Tailoring
Programming languages are too slow! I’m not talking about execution speed—I’m talking about evolution speed. Programmers are always building new libraries and embedded DSLs, but the host programming language—particularly its type system—doesn’t understand the domain-specific aspects of these things. Consider regular expressions—most programmers would understand that a regular expression like , if it matches, will capture two digits in the first capture group. If I try to write this code in Rust or Typed Racket, the type checker complains: The syntax is a type union of types and , whatever types those are. The equivalent of in Typed Racket is . The problem is that getting the first capture group ( in Rust, in Typed Racket) returns an optional type ( in Rust, . The thing is, we know that since the regex match succeeded on line 5, (or ) should definitely succeed because there was one capture group in the regex we matched. Instead, we’re forced to unwrap in Rust: Likewise, in Typed Racket, we have to insert some casts and checks manually to convince the type checker that this code should run. This is a small example; the key point is this: type systems usually aren’t smart enough to know about the structure of regexes —all the compiler sees are opaque strings. This goes for things beyond regular expressions. Consider SQL queries, which are so often embedded as strings in languages: when a query goes wrong, you usually only find out at runtime. What would a solution look like? Some people have tried making a new way of writing regexes with composable objects that the type system has a better chance of understanding; that is a neat approach, but that both requires me to rewrite my program and doesn’t solve the issue of indexing into lists of known size (the results of a capture) more efficiently. Another option would be to make the type system smarter. But that too has its drawbacks: language implementers are often leery of tinkering with the type checker—and rightly so! So much relies on the type checker being correct. Moreover, even if you did make the type checker able to understand regexes, someone’s going to build a new library tomorrow that will stump your type checker just as before. Here’s a more attractive option: we can use the metaprogramming tools the language provides to teach the type system some new tricks. This is something that end-users of languages can do without waiting for the language designer. We call this type tailoring . Type Tailoring is the title and subject of a paper I wrote with my advisor Ben Greenman and our coauthors Stephen Chang and Matthias Felleisen . It has been accepted at European Conference on Object-Oriented Programming (ECOOP). Like the ACM conference OOPSLA, ECOOP has in recent years focused on more than object-oriented programming. The name stuck around. ¯\_(ツ)_/¯ You can get a preprint here . Here’s a high-level sketch of how we would solve the problem: Some years ago, my advisor made the library for Typed Racket. It can tailor the following code so that it typechecks and runs efficiently: The library is available as a Racket package. If you have Racket installed on your system, run to install it. For specific details on how the library works, see § Appendix: How does the trivial library work? at the end of this post. The problem is that, like Rust, Typed Racket must assign an overly conservative type to the result of matching a regular expression. Consequently, the programmer has to insert casts. The library can analyze Typed Racket and insert these casts and checks automatically . The end-result for the user is that this code Just Works™ as you would expect. Notice that the library is a library —it doesn’t require modifications to the compiler or type checker or anything. This means that normal users of programming languages can create their own tailorings without breaking the compiler or messing with the build pipeline. What do you need to make type tailoring work? Let’s step back a second and look at what we need to do in the first place. Our problem is that the type checker doesn’t know as much about our program as we do. What we can do to teach the type checker is program the elaboration step : surface syntax typically doesn’t have type annotations at every point; elaboration takes the syntax that a programmer writes, and adds types and type holes wherever needed. This elaborated syntax gets sent off to the constraint solver for type checking and inference. How do we program the elaboration step? Almost all languages that have macros do type checking after macroexpansion. This is crucial for type tailoring. We can write macros that add checks, casts, type annotations, or whatever else we need to make the type checker happy. Here are the key features that you must have to make type tailoring work: These are the essential elements, without which tailoring can’t happen. Besides these three things, you also will want some of the following tailoring features: No language supports all of these features—that’s exciting because it means there’s room to explore! In our paper we go into detail about each of these features and the kinds of tailoring they enable. We also have a chart showing how a handful of languages stack up against each other. We invented the term “type tailoring”, but we didn’t invent the idea—programmers have wanted to teach their type systems how to understand domain-specific concerns for a long time. Here are just a few existing projects we found that were doing type tailoring: Rust’s SQLx library reaches out to the database at compile-time to check if the schema in the code matches how the database is set up. This will warn you at compile-time if your query is malformed. Julia’s StaticArrays package rewrites lists of a static, known size into tuples. This lets the compiler track how long the lists are and automatically eliminates lots of bounds checks—handy when you’re doing lots of numerical work. Elixir’s Phoenix web framework will check routes in your template files against your route handler; if you make a typo or forget to implement a handler for a route, Phoenix will warn you at compile-time. This feature is called verified routes . Again, that’s just a small sample. Please see our paper for details on how these projects are using type tailoring, as well as for more examples that we found. The big contributions of our paper are: We introduce the term type tailoring . The ideas have appeared in many forms across many languages, but there hasn’t been any underlying rationale unifying their efforts. Now that we’ve identified the phenomenon, we can talk about and support it directly and consciously. We identified the main things you need to make tailoring work. Language designers can use this to build in better support for type tailoring in their languages. We show users how tailorings can balance ease-of-use with features typically only found in dependent type systems. Furthermore, we built two libraries: for Racket—which tailors things like vectors, regular expressions, etc., and for Rhombus —which turns Rhombus into a gradually-typed language through a tailoring. We expect more will be built in the future. Again, please see our paper for all the details. Our paper comes with an artifact that contains all the code in the paper. You can simply download a Docker container to run the code and verify all our claims. Yay for reproducible research! If you have any questions, feel free to email me. (Email in the paper, as well as here on my blog .) If you’re going to ECOOP in Vienna this year, let me know and we can talk in person there! Here is the example with the library that we saw in § Sketch of a solution : Normally, Typed Racket rejects the code because the type of is too general to be applied to . Here is how the library tailors the example to make it work: First, it overrides the implicit form that wraps literal values like the string ; this lets it read the string and collect any interesting information about it at compile time. The library sees that the string has one set of matched parentheses; moreover, the pattern inside the parentheses consists entirely of digits. It attaches this information as a syntax property to the syntax object for that string. This information gets propagated to all occurrences of the identifier . The library also overrides , so that it looks at the syntax properties on its first argument. In this case, it sees that is a string with one capture group. The library updates the return type of from to . In the true branch of the statement, Typed Racket is automatically able to refine the type of to . The library overrides to check the type of its argument; it sees that is long enough for this call to succeed, so it tailors this to a faster, unsafe lookup function. also gets overridden to look at the information about the match. Since step 2 was able to see that the match consists only of digits, it updates its type from returning to . That’s a lot going on! The neat thing is that is able to do all this in a fairly generalized way: one component works with strings, another works with regular expressions, and another works with lists of known size. They’re able to share all this information through syntax properties which respect the scoping rules of Typed Racket. It also plays nicely with other metaprogrammings; we could have written a macro that e.g., turns into and flips the branches, but the information we needed about the variable still would have gotten to the right place. Unfortunately, there’s not a way right now that we could make this example work for Rust—at least, not in its current form. That’s because different languages have different support for different kinds of tailoring. In our paper, we explore all the different dimensions for how languages can support tailorings. Here is how you might build a little tailoring. In this tailoring, we’d like to turn array lookups that are known to be in-bounds into versions that use a fast, unsafe lookup. In a dependently typed language, we would be able to do this by looking at index and the type of the vector, since the type of the vector would include its length. In Racket, we have no such power. However, we can get a little bit of that power through a tailoring. This section is taken almost verbatim from §3.8 from our paper. This code actually runs! Here’s an example of what we would like to tailor: if we know the length of a vector and the index, then we can either tailor to , which skips the bounds check, or tailor to an error if the index is out-of-bounds. Otherwise, if we don’t know either the length of the vector or the index, we stick with the safe function which does run the bounds check: Here’s how we do it: The bit says that, when this module gets imported, export the function under the name . This means, whenever this module gets imported, all calls to automatically use . Now we import some helpers: gives us the function, which is what we tailor to. This function can segfault if misused, so we have to be careful. We pull in to get the excellent macro facility. We also pull in four symbols from , which deserve a mention: The file is small and really just provides these convenience functions; you can find it in our artifact . Now comes the meat of the tailoring: the tailoring is a macro so that it can statically rewrite source code. First it parses its input syntax object to extract and expand two subexpressions and : Now the tailoring checks whether these expanded expressions have the static information needed; specifically, it looks for a vector length (key: ) and an integer value (key: ): If the information is present, we expand to if safe to do so; otherwise we expand to code that raises an error. If the information is not present, fall back to plain-old : That’s it! A Racket program can import this tailoring to make this code run faster: Save that program to a file and run to see the expanded version; you should see in the expanded code. Something would notice that the regex has a single capture group. The function would get this information and update the its type. This information would further by leveraged by the type of , to indicate that or will always succeed. Rust’s SQLx library reaches out to the database at compile-time to check if the schema in the code matches how the database is set up. This will warn you at compile-time if your query is malformed. Julia’s StaticArrays package rewrites lists of a static, known size into tuples. This lets the compiler track how long the lists are and automatically eliminates lots of bounds checks—handy when you’re doing lots of numerical work. Elixir’s Phoenix web framework will check routes in your template files against your route handler; if you make a typo or forget to implement a handler for a route, Phoenix will warn you at compile-time. This feature is called verified routes . We introduce the term type tailoring . The ideas have appeared in many forms across many languages, but there hasn’t been any underlying rationale unifying their efforts. Now that we’ve identified the phenomenon, we can talk about and support it directly and consciously. We identified the main things you need to make tailoring work. Language designers can use this to build in better support for type tailoring in their languages. We show users how tailorings can balance ease-of-use with features typically only found in dependent type systems. First, it overrides the implicit form that wraps literal values like the string ; this lets it read the string and collect any interesting information about it at compile time. The library sees that the string has one set of matched parentheses; moreover, the pattern inside the parentheses consists entirely of digits. It attaches this information as a syntax property to the syntax object for that string. This information gets propagated to all occurrences of the identifier . The library also overrides , so that it looks at the syntax properties on its first argument. In this case, it sees that is a string with one capture group. The library updates the return type of from to . In the true branch of the statement, Typed Racket is automatically able to refine the type of to . The library overrides to check the type of its argument; it sees that is long enough for this call to succeed, so it tailors this to a faster, unsafe lookup function. also gets overridden to look at the information about the match. Since step 2 was able to see that the match consists only of digits, it updates its type from returning to .