Posts in Scala (7 found)
Neil Madden 2 weeks ago

Monotonic Collections: a middle ground between immutable and fully mutable

This post covers several topics around collections (sets, lists, maps/dictionaries, queues, etc.) that I'd like to see someone explore more fully. To my knowledge, there are many alternative collection libraries for Java and for many other languages, but I'm not aware of any that provide support for monotonic collections. What is a monotonic collection, I hear you ask? Well, I'm about to answer that. Jesus, give me a moment.

It's become popular, in the JVM ecosystem at least, for collections libraries to provide parallel class hierarchies for mutable and immutable collections: Set vs MutableSet, List vs MutableList, etc. I think this probably originated with Scala, and has been copied by Kotlin and various alternative collection libraries, e.g. Eclipse Collections, Guava, etc. There are plenty of articles out there on the benefits and drawbacks of each type. But the gulf between fully immutable and fully mutable objects is enormous: they are polar opposites, with wildly different properties, performance profiles, and gotchas.

I'm interested in exploring the space between these two extremes. (Actually, I'm interested in someone else exploring it, hence this post). One such point is the idea of monotonic collections, and I'll now explain what that means. By monotonic I mean here logical monotonicity: the idea that any information that is entailed by some set of logical formulas is also entailed by any superset of those formulas. For a collection data structure, I would formulate that as follows: if any (non-negated) predicate is true of the collection at time t, then it is also true of the collection at any time t' > t. For example, if c is a collection and c.contains(x) returns true at some point in time, then it must always return true from then onwards. To make this concrete, a MonotonicList (say) would have an append operation, but not insert, delete, or replace operations.
More subtly, monotonic collections cannot have any aggregate operations, i.e., operations that report statistics or summary information on the collection as a whole. For example, you cannot have a size method, as the size will change as new items are added (and thus a predicate such as size(c) == n can become false). You can have (as I understand it) map and filter operations, but not a reduce/fold.

So why are monotonic collections an important category to look at? Firstly, monotonic collections can have some of the same benefits as immutable data structures, such as simplified concurrency. Secondly, monotonic collections are interesting because they can be (relatively) easily made distributed, per the CALM principle: Consistency As Logical Monotonicity (insecure link, sorry). This says that monotonic collections are strongly eventually consistent without any need for coordination protocols. Providing such collections would thus somewhat simplify building distributed systems.

Interestingly, Kotlin decided to make their mutable collection classes sub-types of the immutable ones: MutableList is a sub-type of List, etc. (They also decided to make the arrows go the other way from normal in their inheritance diagram, crazy kids). This makes sense in one way: mutable structures offer more operations than immutable ones. But it seems backwards from my point of view: it says that all mutable collections are immutable, which is logically false. (But then they don't include the word Immutable in the super-types). It also means that consumers of a List can't actually assume it is immutable: it may change underneath them. Guava seems to make the opposite decision: ImmutableList extends the built-in (mutable) List type, probably for convenience. Both options seem to have drawbacks. I think the way to resolve this is to entirely separate the read-only view of a collection from the means to update it.
On the view side, we would have a class hierarchy consisting of ImmutableList, which inherits from MonotonicList, which inherits from the general List. On the mutation side, we'd have ListAppender and ListUpdater classes, where the latter extends the former. Creating a mutable or monotonic list would return a pair of the read-only list view and the corresponding mutator object.

This seems to allow the natural sub-type relationships between types on both sides of the divide. It's a sort of CQRS at the level of data structures, but it seems to solve the issue that the inheritance direction for read-only consumers is the inverse of the natural hierarchy for mutating producers. (This has a relationship to covariant/contravariant subtyping, but I'm buggered if I'm looking that stuff up again in my free time). Anyway, these thoughts are obviously pretty rough, but maybe some inklings of ideas if anyone is looking for an interesting project to work on.

0 views

Improving my Distributed System with Scala 3: Consistency Guarantees & Background Tasks (Part 2)

Improving Bridge Four, a simple, functional, effectful, single-leader, multi-worker, distributed compute system optimized for embarrassingly parallel workloads, by providing consistency guarantees and improving overall code quality (or something like that).

0 views
emiruz 1 year ago

Advent of Code in Prolog, Haskell, Python and Scala

Here are some Advent of Code solutions: 2023 (Prolog), 2022 (Haskell), 2021 (Python & Scala) (in progress at the time of writing). Here are some comparative notes: My Haskell solutions were mostly < 27 LoC. The Prolog solutions were considerably longer. The Prolog solutions were, on average, much harder to code for me. My Prolog solutions ended up looking rather functional for the most part.

0 views

Tiny Telematics: Tracking my truck's location offline with a Raspberry Pi, redis, Kafka, and Flink (Part 2)

Tracking vehicle location offline with a Raspberry Pi, Part 2: Apache Flink, Scala, Kafka, and road-testing.

0 views

Functional Programming concepts I actually like: A bit of praise for Scala (for once)

Types, type classes, implicits, tagless-final, effects, and other things: Not everything in the world of functional programming is bleak and overly academic. A view on FP & Scala concepts that someone who loves to complain actually likes.

0 views

Scala, Spark, Books, and Functional Programming: An Essay

Reviewing 'Essential Scala' and 'Functional Programming Simplified', while explaining why Spark has nothing to do with Scala, and asking why learning Functional Programming is such a pain. A (maybe) productive rant (or an opinionated essay).

0 views
sunshowers 4 years ago

Open and closed universes

Type systems are tools for modeling some aspect of reality. Some types need to represent one of several different choices. How should these different kinds of choices be modeled? Sometimes, all the choices may be known in advance and will likely never change: this is often called a closed universe of values. Other times, the set of options will change over time, and the type needs to represent an open universe.

If you're writing an end-user application, none of this matters. You have full control over all aspects of your code, so use whatever is most convenient. However, if you're writing a library that other developers will use, you may care about API stability and forward compatibility.

A simple example is the Option type in Rust, or its analogs like Maybe in Haskell. Option is defined as an enum with exactly two variants, Some and None. Part of the definition of Option is that it can only be either Some or None. This is never going to change, so Option values form a closed universe. A simple enum is appropriate for such cases. The main advantage of this approach is that it minimizes the burden on consumers: they know that an Option can only be Some or None, and only have to care about those cases. The main disadvantage is that any sort of new addition would constitute API breakage. You're expanding a universe you formerly assumed to be closed, and everyone who depends on you needs to be made aware of this.

As a library author, you may run into two kinds of open universes:

- Open with respect to the library itself: new choices can be defined by library authors in the future, but not by downstream consumers. I'm going to call these semi-open universes.
- Open with respect to users of the library: new choices can be defined by both library authors and consumers. These might be described as fully open universes.

If you're trying to model semi-open universes, consider using a non-exhaustive enum. For example, you may have an error type in a library that bubbles up other errors. One way to write this is as a plain enum with one variant per underlying error. The problem with this approach is that errors typically do not form a closed universe. A future version may depend on some other library, and you'd have to add a new variant to the type. Anyone matching on the type would have to handle this new, additional case… unless they've specified a wildcard pattern. Some programming languages allow library developers to force consumers to specify a wildcard pattern.
In Rust, you can annotate a type with the #[non_exhaustive] attribute. Anyone matching against the error type then has to add an additional wildcard pattern. This ensures that the library developer can continue to add more variants over time without causing breakages. The main downside is that all consumers need to handle the additional wildcard pattern and do something sensible with it [1]. The non-exhaustive enum approach is best suited for semi-open universes.

What about situations where consumers should have the ability to add new choices? The best way is often to use a string, or a newtype wrapper around it. For example, one may wish to represent a choice between computer architectures such as x86_64 and aarch64. More architectures may be added in the future, too, but often the library just needs to pass the architecture down to some other tool without doing much processing on it. One way to represent this is as a newtype around a Cow<'static, str> [2]. Well-known strings can be represented as constants. This allows known choices to not be stringly-typed, and currently-unknown ones to be used without having to change the library. The format doesn't have to be restricted to strings: for example, an architecture value may also wish to carry its bitness. In general, this works best for choices that are used as input types to a library that doesn't do much processing on them, and where the data associated with each choice doesn't vary by choice.

One pattern I've seen is to use another variant that represents an unknown value (say, an Unknown(String) variant alongside the known ones). The problem with this approach is that it becomes very hard to define equality and comparisons on these types. Is a known variant the same as an Unknown variant carrying the same name, or are they different? In practice, this becomes tricky very quickly. In general, this approach is best avoided.

The final, most general approach is to use traits, typeclasses, or other interface mechanisms. For example, architectures can be represented as a trait in Rust; other library methods then accept any implementation of the trait, and consumers can define custom architectures by implementing it themselves. This also works for output types such as returned errors: you can try to downcast the error into a more specific type, but this turns a formerly compile-time check into a runtime one.
A common bug that happens with this approach is version skew: for example, let's say the library had originally been exposing errors from version 1 of a dependency. In a minor release, it upgrades to version 2 and returns errors from the new version, while the consumer continues to downcast to the old version's error type. Such cases will fail silently [3].

Traits are similar to the many-strings approach because they both model fully open universes. However, there are some tradeoffs between the two. One benefit of traits is that they offer greater implementation flexibility: trait methods allow for custom behavior on user-defined types. A downside is that traits make each choice in an open universe a type, not a value, so operations like equality become harder to define. It isn't impossible, though: the trait could require a method which returns a many-strings type, whose contents can then be used for equality checks. Another downside is ergonomics: traits and interfaces are more awkward than values. A sufficiently complex trait may even require some sort of code generation to use effectively. Rust has declarative and procedural macros to help, which work okay but add even more complexity on top. Whether strings are sufficient or full-blown traits are required is a case-by-case decision. In general, the many-strings approach is much simpler and is powerful enough for many kinds of choices; traits provide greater power and are always worth keeping in mind, though.

A trait can also model a semi-open universe if you seal it. The library itself may add new implementations of the trait, but downstream consumers cannot, because they do not have access to the private sealing trait [4].

Type systems are often used to represent one of several choices. The choices can either be fixed (closed universe), or might continue to change over time (open universe). These are different situations that usually need to be handled in different ways. Hope this helps! The idea of writing this post arose from several conversations with members of the Rust community.
Thanks to Munin, Jane Lusby, Manish Goregaokar, Michael Gattozzi and Izzy Muerte for reviewing drafts of this post.

Updated 2021-08-04: edited some text, and corrected the footnote about the wildcard pattern to indicate that it is required. Updated 2021-08-13: added a note about more complex versions of the many-strings pattern, expanded on the difference between it and traits, and expanded on what it would mean to unify sealed traits and non-exhaustive enums. Thanks to Manish for the inspiration.

[1] I wish that Rust had a way to lint, but not fail to compile, on missing patterns in a non-exhaustive match.

[2] This implementation uses the Cow<'static, str> type, which with the 'static lifetime stands for either a borrowed string that lasts the duration of the entire program (typically one embedded in the binary) or an owned String created at runtime. If this were just a &'static str, then the implementation wouldn't be possible, because &'static strs can't allocate memory.

[3] With static types, this sort of version skew error is usually caught at compile time. With dynamic casting it can only be caught at runtime. You might argue that if the error type comes from a public dependency, an incompatible upgrade to the dependency constitutes a semantic version breakage. However, this is a bit of a boundary case; the library authors might disagree with you and say that their stability guarantees don't extend to downcasting. Semantic versioning is primarily a low-bandwidth way for library developers and consumers to communicate about relative burdens in the event of a library upgrade.

[4] Given that both sealed traits and non-exhaustive enums model the same kinds of semi-open universes in fairly similar fashions, a direction languages can evolve in is to unify the two. Scala has already done something similar.
Rust could gain several features that bring non-exhaustive enums and sealed traits incrementally closer to each other:

- Allow match statements to match on sealed traits if a wildcard pattern is specified.
- Make enum variants their own types rather than just data constructors.
- Automatically convert the return values of such functions into enums if necessary. (The trait doesn't need to be sealed in this case, because the enum would be anonymous and matching on its variants wouldn't be possible.) Inherent methods on an enum would need to be "lifted up" to become trait methods. As of Rust 1.54, there are several kinds of methods that can be specified on enums but not on traits. Features like existential types help narrow that gap.

0 views