Open and closed universes
Type systems are tools for modeling some aspect of reality. Some types need to represent one of several different choices. How should these different kinds of choices be modeled? If you’re writing an end-user application, none of this matters. You have full control over all aspects of your code, so use whatever is most convenient. However, if you’re writing a library that other developers will use, you may care about API stability and forward compatibility. A simple example is the type in Rust, or its analogs like in Haskell. is defined as : Part of the definition of is that it can only be either or . This is never going to change, so values form a closed universe. A simple enum is appropriate for such cases. The main advantage of this approach is that it minimizes the burden on consumers: they know that an can only be or , and only have to care about such cases. The main disadvantage is that any sort of new addition would constitute API breakage. You’re expanding a universe you formerly assumed to be closed, and everyone who depends on you needs to be made aware of this. As a library author, you may run into two kinds of open universes: If you’re trying to model semi-open universes, consider using a non-exhaustive enum. For example, you may have an error type in a library that bubbles up other errors. One way to write this is: The problem with this approach is that errors typically do not form a closed universe. A future version may depend on some other library, for example , and you’d have to add a new option to the type: Anyone matching on the type would have to handle this new, additional case… unless they’ve specified a wildcard pattern . Some programming languages allow library developers to force consumers to specify a wildcard pattern. In Rust, you can annotate a type with : Anyone matching against the error type would have to add an additional wildcard pattern: This ensures that the library developer can continue to add more options over time without causing breakages. The main downside is that all consumers need to handle the additional wildcard pattern and do something sensible with it 1 . The non-exhaustive enum approach is best suited for semi-open universes. What about situations where consumers should have the ability to add new choices? The best way is often to use a string, or a newtype wrapper around it. For example, one may wish to represent a choice between computer architectures such as , , and . More architectures may be added in the future, too, but often the library just needs to pass the architecture down to some other tool without doing much processing on it. One way to represent this is 2 : Well-known strings can be represented as constants: This allows known choices to not be stringly-typed , and currently-unknown ones to be used without having to change the library. The format doesn’t have to be restricted to strings. For example, an may wish to also carry its bitness . In general, this works best for choices that are: One pattern I’ve seen is to use another variant that represents an unknown value. For example: The problem with this approach is that it becomes very hard to define equality and comparisons on these types. Are and the same, or are they different? In practice, this becomes tricky very quickly. In general, this approach is best avoided. The final, most general approach is to use traits, typeclasses or other interface mechanisms. For example, architectures can be represented as a trait in Rust: Then, other library methods could use: And custom architectures can be defined as: This also works for output types such as returned errors. You can try and downcast the error into a more specific type: but this turns a formerly compile-time check into a runtime one. A common bug that happens with this approach is version skew: for example, let’s say the library had originally been using . In a minor release, it upgrades to and return errors from the new version, while the consumer continues to downcast to the old version. Such cases will fail silently 3 . Traits are similar to the many-strings approach because they both model fully open universes. However, there are some tradeoffs between the two. One benefit of traits is that they offer greater implementation flexibility: trait methods allow for custom behavior on user-defined types. A downside is that traits make each choice in an open universe a type , not a value , so operations like equality become harder to define. It isn’t impossible, though: the trait could require a method which returns a many-strings type. The contents of can then be used for equality checks. Another downside is ergonomics: traits and interfaces are more awkward than values. A sufficiently complex trait may even require some sort of code generation to use effectively. Rust has declarative and procedural macros to help, which work okay but add even more complexity on top. Whether strings are sufficient or full-blown traits are required is a case-by-case decision. In general, the many-strings approach is much simpler and is powerful enough for many kinds of choices; traits provide greater power and are always worth keeping in mind, though. A trait can also model a semi-open universe if you seal it : The library itself may add new implementations of the trait, but downstream consumers cannot because they do not have access to 4 . Type systems are often used to represent one of several choices. The choices can either be fixed (closed universe), or might continue to change over time (open universe). These are different situations that usually need to be handled in different ways. Hope this helps! The idea of writing this post arose from several conversations with members of the Rust community. Thanks to Munin , Jane Lusby , Manish Goregaokar , Michael Gattozzi and Izzy Muerte for reviewing drafts of this post. Updated 2021-08-04: edited some text, and corrected the footnote about to indicate that it is required. Updated 2021-08-13: added a note about more complex versions of the many-strings pattern, expanded on the difference between it and traits, and expanded on what it would mean to unify sealed traits and non-exhaustive enums. Thanks to Manish for the inspiration. I wish that Rust had a way to lint, but not fail to compile, on missing patterns in a non-exhaustive match. ↩︎ This implementation uses the type, which with the lifetime stands for either a borrowed string that lasts the duration of the entire program—typically one embedded in the binary—or an owned created at runtime. If this were just a , then the implementation below wouldn’t be possible because s can’t allocate memory. ↩︎ With static types, this sort of version skew error is usually caught at compile time. With dynamic casting it can only be caught at runtime. You might argue that if the error type comes from a public dependency, an incompatible upgrade to the dependency constitutes a semantic version breakage. However, this is a bit of a boundary case; the library authors might disagree with you and say that their stability guarantees don’t extend to downcasting. Semantic versioning is primarily a low-bandwidth way for library developers and consumers to communicate about relative burdens in the event of a library upgrade. ↩︎ Given that both sealed traits and non-exhaustive enums model the same kinds of semi-open universes in fairly similar fashions, a direction languages can evolve in is to unify the two. Scala has already done something similar . Rust could gain several features that bring non-exhaustive enums and sealed traits incrementally closer to each other: Sometimes, all the choices may be known in advance and will likely never change—this is often called a closed universe of values. Other times, the set of options will change over time, and the type needs to represent an open universe . Open with respect to the library itself: new choices can be defined by library authors in the future, but not by downstream consumers. I’m going to call these semi-open universes . Open with respect to users of the library: new choices can be defined by both library authors and consumers. These might be described as fully open universes . used as input types to a library that doesn’t do much processing on them, and the data associated with each choice doesn’t vary by choice. I wish that Rust had a way to lint, but not fail to compile, on missing patterns in a non-exhaustive match. ↩︎ This implementation uses the type, which with the lifetime stands for either a borrowed string that lasts the duration of the entire program—typically one embedded in the binary—or an owned created at runtime. If this were just a , then the implementation below wouldn’t be possible because s can’t allocate memory. ↩︎ With static types, this sort of version skew error is usually caught at compile time. With dynamic casting it can only be caught at runtime. You might argue that if the error type comes from a public dependency, an incompatible upgrade to the dependency constitutes a semantic version breakage. However, this is a bit of a boundary case; the library authors might disagree with you and say that their stability guarantees don’t extend to downcasting. Semantic versioning is primarily a low-bandwidth way for library developers and consumers to communicate about relative burdens in the event of a library upgrade. ↩︎ Given that both sealed traits and non-exhaustive enums model the same kinds of semi-open universes in fairly similar fashions, a direction languages can evolve in is to unify the two. Scala has already done something similar . Rust could gain several features that bring non-exhaustive enums and sealed traits incrementally closer to each other: Allow statements to match sealed traits if a wildcard match is specified. Make enum variants their own types rather than just data constructors . Automatically convert the return values of functions like into enums if necessary. (The trait doesn’t need to be sealed in this case, because the enum would be anonymous and matching on its variants wouldn’t be possible.) Inherent methods on an enum would need to be “lifted up” to become trait methods. As of Rust 1.54, there are several kinds of methods that can be specified on enums but not on traits. Features like existential types help narrow that gap.