Hillel Wayne 2 weeks ago

A Very Early History of Algebraic Data Types

Been quiet around here! I’ve been putting almost all of my writing time into Logic for Programmers and my whole brain is book-shaped. Trust me, you do not want to read my 2000-word rant on Sphinx post-build LaTeX customization. But I spent the past week in a historical rabbit hole and had to share what I found.

It started with Algebraic [Data] Types are not Scary, Actually. The post covers AlgDTs 1 in more detail, but a quick overview is:

Product types are things like pairs and structs. They have a value for each field. Almost all modern languages have product types, usually calling them something like "structs" or "records" or "value objects" or whatever.

Sum types are "tagged unions": a choice between several options. The "tag" means that we do not eliminate overlapping choices: a sum of two booleans has four possible values, colloquially "left true", "left false", "right true", and "right false". Sum types are relatively rare in modern programming languages, outside functional programming and some places like Rust.

Where do these get their names? The usual explanation is that it comes from their cardinalities (number of possible values): if |T| is the cardinality of type T, then |A + B| = |A| + |B| and |A × B| = |A| × |B|. Try this with a couple of small types if you feel unsure. But it could instead be that the names come from the fact that sums and products of types act like sums and products of numbers! I think this is pretty well known but it doesn't appear very often in introductions to AlgDTs. For example:

Addition over numbers is associative, meaning (a + b) + c = a + (b + c). This tells us that (A + B) + C is isomorphic (≅) to A + (B + C): we can convert values between the two types without losing any useful information. A more interesting example is that AlgDTs follow the distributive property A × (B + C) ≅ (A × B) + (A × C). If we have a value of type A × (B + C), we can transform it into a value of type (A × B) + (A × C). That's pretty neat!

So this got me curious: which property of AlgDTs gave the name to "sum" and "product" type, and which properties were discovered later? As I continued to research this, the blog post sort of blew up into an early history of algebraic data types. Normally finding the origins of terms is a nightmare, but this is general academia nonsense so everything relevant is in a paper somewhere. There are two research techniques I like to use:

- Read a paper that definitely uses the terms, see what it cites, read those papers, repeat.
- Go to the ACM Digital Library, search a term, and read every paper starting from the oldest. 2

Usually I find a combination of the two works best. This eventually got me to John McCarthy's (the inventor of LISP) "A Basis for a Mathematical Theory of Computation", published in 1961. As far as I can tell, this is the first treatment of AlgDTs in any form.

So first off, the paper doesn't use "type" (in the CS sense) or "algebra" at all, or even really cover computer science. But it does cover "A way of defining new data spaces in terms of given base spaces", and one of these new data spaces is "Π of truth values whose only elements are T (for truth) and F (for falsity)", aka 'booleans'. He was certainly familiar with types, being on the ALGOL-60 committee, so I'm guessing he deliberately avoids the word "type" here.

The whole paper is about a simple mathematical model of program functions and their properties. In section 2.6 he defines two ways of generating new data spaces from existing ones:

"The Cartesian product A × B of two sets A and B is the set of all ordered pairs (a · b) with a ∈ A and b ∈ B. If A and B are finite sets and n(A) and n(B) denote the numbers of members of A and B respectively then n(A × B) = n(A) · n(B). The direct union A ⊕ B of the sets A and B is the union of two non-intersecting sets one of which is in 1-1 correspondence with A and the other with B. If A and B are finite, then n(A ⊕ B) = n(A) + n(B) even if A and B intersect [emphasis mine]. The elements of A ⊕ B may be written as elements of A or B subscripted with the set from which they come, i.e. a_A or b_B."

Unsurprisingly, the product type is named after the Cartesian product. The record is a tagged version of the Cartesian product. Here we also see that, in our lingo, |A + B| = |A| + |B| holds even when A and B overlap. As far as I can tell, "direct union" was coined by McCarthy, but it corresponds exactly with the disjoint union. McCarthy later refers to these two as the "direct sum and Cartesian product".
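To make the modern terminology concrete, here's a sketch in TypeScript (my own example; its discriminated unions play the role of McCarthy's direct union):

```typescript
// Product type: one value for each field. |Point| = |number| * |number|.
type Point = { x: number; y: number };

// Sum type: a tagged choice between options. The tags keep overlapping
// choices distinct, so |TwoBools| = |boolean| + |boolean| = 4.
type TwoBools =
  | { tag: "left"; value: boolean }
  | { tag: "right"; value: boolean };

const a: TwoBools = { tag: "left", value: true };  // "left true"
const b: TwoBools = { tag: "right", value: true }; // "right true", a distinct value
```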
He does not name them "sum and product types" but I think it's believable that they are named after the corresponding disjoint union and Cartesian product set operations. I'm not going to dig into the history of those names but I'd guess they were named in analogy to the arithmetic sum and product.

Shortly after defining the direct sum and Cartesian product 3 McCarthy defines isomorphisms between data spaces:

"We will not regard the sets as the same, but there is a canonical 1-1 mapping between them … We shall write A ≅ B to express the fact that these sets are canonically isomorphic."

He goes on to show a number of isomorphisms, including the distributive property A × (B ⊕ C) ≅ (A × B) ⊕ (A × C) that I found so neat. All of these properties of "algebraic data types" were known from the start, even if they weren't called that yet.
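To see that isomorphism as running code, here's one direction of the conversion (again a TypeScript sketch of my own; the inverse direction is just as mechanical, which is why no information is lost):

```typescript
type Sum<B, C> = { tag: "left"; value: B } | { tag: "right"; value: C };

// One direction of A * (B + C) ≅ (A * B) + (A * C).
function distribute<A, B, C>([a, s]: [A, Sum<B, C>]): Sum<[A, B], [A, C]> {
  return s.tag === "left"
    ? { tag: "left", value: [a, s.value] }
    : { tag: "right", value: [a, s.value] };
}
```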
While researching this I found a curious article called "Some Observations Concerning Large Programming Efforts" (1964). The author presented the Agile manifesto almost 50 years before the Agile manifesto! Nobody ever cited it and the author published nothing else. What I'm saying is that it doesn't matter if you're the first to have an idea if nobody else hears you have it.

So, let's continue tracing the development of algebraic types. In 1964 McCarthy published Definition of new data types in ALGOL x. 4 In this he proposes adding two new types, "cartesian" and "union", to the next version of ALGOL. Notably this is the first place we see "tagged" algebraic types, where both sums and products have string tags. ALGOL-68 would later implement both of them but also be a curious dead end in language history; it would have little impact on modern programming languages.

Here's where the history gets messy. In looking for next steps, I found two influential papers that cite McCarthy: Tony Hoare's Notes on Data Structuring (1970) and Rod Burstall's Proving properties of programs by structural induction (1968). These correspond to a split between imperative and functional languages, so I'll treat them in separate sections.

Hoare seems to have independently come up with the idea of sum and product types. In "A contribution to the development of ALGOL" (1966) he proposes records tagged with "classes" and a keyword to join disjoint classes into a single supertype. He does not give any algebraic properties of records or unions, though. By the time of "Notes on Data Structuring" his terminology shifts to match McCarthy's:

"The types in which we are interested are those already familiar to mathematicians; namely, Cartesian Products, Discriminated Unions, Sets, Functions, Sequences, and Recursive Structures."

Both his "cartesian products" and "discriminated unions" are tagged, matching how we use algebraic types today. As he is mostly concerned with data storage, he does not list any algebraic properties, but he does note their cardinalities as part of a discussion on how to most efficiently store these types. This paper may also be the first time sum types are discussed as a means of detecting errors at compile time:

"If the programmer attempts to convert a discriminated union value back to a type from which it did not originate, this is a serious programming error, which could lead to meaningless results. This error can be detected only by a runtime check, which tests the tag field whenever such a conversion is explicitly or implicitly invoked. Such a check is time consuming and when it fails, highly inconvenient. We therefore seek a notational technique which will guarantee that this error can never occur in a running program; and the guarantee is given by merely inspecting the text, without any knowledge of the runtime values being processed. Such a guarantee could be given by an automatic compiler, if available."

This is also the first time I can find anybody calling sum types "discriminated union". 5 Barbara Liskov then uses "discriminated union" to describe CLU's sum type, and Niklaus Wirth uses "discriminated union" when explaining why Pascal doesn't have sum types. For this reason, I suspect the imperative world first learned about sum types from Hoare's lecture, and Hoare was influenced (but not totally inspired) by McCarthy. 6

After this point sum types disappear from imperative programming. Late 20th century imperative languages were dominated by the influence of Pascal and C. Pascal did not have sum types because Wirth thought they were less flexible than untagged unions. C possibly does not have sum types because it adopted the type system of IBM's PL/I. PL/I, first released in 1964, had product types (which it called "structures") but only untagged unions. Recent imperative languages instead got sum types from the functional programming world, which brings us back to "Proving properties of programs by structural induction".

Burstall is incredibly frustrating to research. His obituary lists an enormous number of papers and influential students, so he clearly had a huge role in the early development of functional programming. But very few of those papers are online, so there are huge gaps in what I'm able to easily research. This is my best attempt to figure out his part of the history.

The paper is about proving properties of recursive types, like lists and trees. This is how he defines a list:

"I suggest that each construction operation introduce a new type and that disjunctions of these types should also be types, (roughly following Landin 1964) a cons has an atom and a list, a nil has no components, a list is a cons or a nil."

Burstall says he gets his notation from P. J. Landin's "The Mechanical Evaluation of Expressions" (1964). Landin's use doesn't seem to be quite sum types — it's not clear he has a mechanism to discriminate between types if they had overlapping values. Rather, it seems to me that Burstall is "following" Landin's means of writing a recursive definition. It's hard to tell from a few readings, though, and I could easily be wrong.

The ISWIM near miss

Burstall also cites Landin's "The Next 700 Programming Languages" (1966), which proposes the influential ISWIM language. According to "Some History of Functional Programming Languages" (2012, hereafter SHFP), this was the first language to have algebraic data types:

"The ISWIM paper also has the first appearance of algebraic type definitions used to define structures. This is done in words, but the sum-of-products idea is clearly there."

However, I don't think this is correct. For one, ISWIM was a proposed language, not an implemented one. If we count language proposals then McCarthy beat it by two years and Hoare/Wirth by a couple of months. But also, I don't think that ISWIM defined sum types at all! This is a weaker belief but I have it for three reasons:

- I read the paper six times and cannot figure out where it's using "sum-of-products". It could be that SHFP is referring to one of Landin's definitions, but I think that section is about the structure of an ISWIM program, not the data structures inside of it.
- I read the specs of two ISWIM implementations, PAL and POP-2, and neither has sums-of-products.
- SHFP later says that Burstall's paper "extended ISWIM with algebraic type definitions — still defined in words", which implies ISWIM didn't have them at the start.

None of these are super strong reasons and I'm open to being wrong. Burstall does have sum types, though, which we see in his definition of a tree. A tree is "a node or a tip or a niltree", but a tip "has an item", not is an item.
And later, he gives pseudocode along the same lines. Burstall also says his ideas "are based on the work of John McCarthy", citing his mathematical theory paper. It's not impossible that Burstall was inspired by McCarthy's notion of the direct union. As of 2025 this paper has been cited over 500 times. Burstall would later go on to make an even bigger contribution, but that's in 1980 so we have to cover ML first.

So far we've seen the definitions of algebraic data types and common use of "Cartesian product", but not the terms "sum type" and "product type". As far as I can tell, that comes from Robin Milner's ML, as publicized in "A Theory of Type Polymorphism in Programming" (1977):

"The fully determined types (i.e., the monotypes) of ML are built from a set of basic types (int, bool etc.) by the binary infixed operators × (Cartesian product), + (disjoint sum) and -> (function type), and the unary postfixed operator list."

A few things to note about this paper. First, of course, is that it describes constructing types with the "disjoint sum" and uses the + symbol. Second, it does not justify nor define "disjoint sum", in contrast with Hoare and McCarthy. By this point it may have been common knowledge for Milner's audience. Finally, the paper claims that sums and products had already been implemented in ML for at least two years.

We don't have a complete spec for the oldest versions of ML, but we do have the released source code of the LCF prover that ML was designed for. There, in files.ml, we have it: "sum" and "product" types! These aren't tagged, though. Li-yao Xia's post Where does the name "algebraic data type" come from? notes that "early versions did not feature data types (see the first version of Edinburgh LCF)", which I take to mean tagged algebraic types; ML had a keyword for naming constructed types, so you could define a type to be the sum of two strings, but you couldn't call the left one "email".

"A Theory of Type Polymorphism" doesn't cite the McCarthy paper, nor the Hoare or Burstall papers, so it's not clear if Milner was influenced by them or developed AlgDTs independently. Some tentative evidence is the followup paper "A Metalanguage for Interactive Proof in LCF" (1978), where he thanks all three for their contributions:

"We are indebted to Dana Scott for providing a large part of the theoretical basis for our work: to John McCarthy for encouraging the forerunner of this project at Stanford: to Richard Weyhrauch who contributed greatly to that project: to Tony Hoare and Jerry Schwarz for help with the notion of abstract types in ML: to Avra Cohn for help in the final design stages through her experiments: to Rod Burstall and many colleagues in Edinburgh for illuminating discussions."

Okay, just three more languages to go. Around this time Burstall was applying category theory to specify software, eventually leading him to develop HOPE. "HOPE: An experimental applicative language" (1980) marks the first introduction of tagged unions to functional programming, but not tagged products: to define a type "tree-of-numbers", the paper writes out a declaration where the # sign "gives the Cartesian product of types."

On top of this, the paper introduces a couple of important ideas. The first is that the compiler can exhaustively check that all disjoint choices in a sum type are handled:

"Each data type will have a set of disjoint subtypes, each with a different constructor function; e.g., lists are made with cons or nil. It should be easy to do case analysis with respect to these constructors and easy for a compiler to check whether the analysis is exhaustive.
This avoids mistakes like forgetting to include a test for the empty list."

On top of this, the paper proposes that the use of sum types can be dramatically simplified via pattern matching. This is not the first appearance of structural pattern matching in a language (Prolog had it first), but it's the first appearance of pattern matching in functional programming.
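Both ideas survive in modern compilers. Here's a hedged TypeScript sketch of compiler-checked exhaustive case analysis over a cons/nil list (my example, not HOPE syntax):

```typescript
type NumList =
  | { tag: "nil" }
  | { tag: "cons"; head: number; rest: NumList };

function total(l: NumList): number {
  // The compiler knows these two cases are exhaustive. Delete the "nil"
  // case and it rejects the function, since control could fall off the
  // end without returning a number.
  switch (l.tag) {
    case "nil":
      return 0;
    case "cons":
      return l.head + total(l.rest);
  }
}
```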
Around the same time as HOPE, Burstall's grandstudent Luca Cardelli 7 was working on porting ML to VAX and Unix, in the process adding record and variant types. As far as I can tell, this is the first time tagged sums and tagged products were present in a functional programming language. All of these ideas would later be incorporated into Standard ML, codifying what people consider core features of algebraic types today: tags, compiler-based exhaustiveness checking, and pattern matching.

Finally, to reference Xia again, "Miranda: A non-strict functional language with polymorphic types" (1985) is the first paper to actually use the term "Algebraic Data Type". I highly recommend reading Xia's post for more info.

In summary:

- McCarthy first proposes the concepts and properties of algebraic data types in his paper, though he doesn't call them that. This has at least some influence on Tony Hoare and Rod Burstall, who write influential papers of their own. AlgDTs are implemented in ALGOL-68, CLU, and ML. It's unclear if Milner knew of McCarthy's theories, but I'd venture it's more likely than not.
- Pascal and C prevent sum types from getting a foothold in imperative programming.
- Burstall and Luca Cardelli fill out the main AlgDT features, establishing them as a cornerstone of typed FP.

This is the best I could do with the time and resources I had available. Let me know if you spot any mistakes!

If you liked this post, come join my newsletter! I write new essays there every week. I train companies in formal methods, making software development faster, cheaper, and safer. Learn more here.

1. Algebraic Data Types are normally abbreviated ADT, but so are Abstract Data Types, so for clarity I abbreviate them as "AlgDT" and "AbDT". [return]
2. Kagi makes this a lot easier: I have a custom bang for searching the DL by "earliest first". [return]
3. For fans of early programming language theory, there's a bunch of other fun type-related stuff here. For example, he presents the famous "power series" definition of the sequence type by first writing S = 1 ⊕ (A × S) and then "solving formally for S, getting S = 1/(1 − A) which we expand in geometric series to get S = 1 ⊕ A ⊕ A² ⊕ …" [return]
4. Tony Finch sent this my way, thank you! [return]
5. This might also be the first appearance of the "pack of cards" example, which Hoare writes in his sum-and-product notation. [return]
6. On the other hand, Wirth coauthored "A Contribution to the Development of ALGOL", so he probably had a sense of their value, just maybe not a formal model. [return]
7. Cardelli's doctoral advisor was Gordon Plotkin, and his doctoral advisor was Burstall. Small world! [return]

Hillel Wayne 6 months ago

Gamer Games for Non-Gamers

It's April Cools! It's like April Fools, except instead of cringe comedy you make genuine content that's different from what you usually do. For example, last year I talked about the 3400-year history of the name "Daniel". This year I wrote about one of my hobbies in hopes it would take less time. It didn't.

The video game industry is the biggest entertainment industry in the world. In 2024, it produced almost half a trillion dollars in revenue, compared to the film industry's "mere" 90 billion. For all the money in it, it feels like gaming is a very niche pastime. It would surprise me if a friend had never watched a movie or listened to music, but it's pretty normal for a friend to have never played a video game.

The problem is that games are highly inaccessible. To most people, games are "big budget games", which are targeted at people who have already played lots of games. They assume the player has a basic "vocabulary" of how games work: conventional keyboard layouts, how to parse information on a screen, etc. If you don't already know these, it's a huge barrier and highly demotivating.

This sucks! I love games. My earliest memory is playing Crystal Caves on my dad's computer. I'd call myself a gamer, in the same way I'd call some of my friends cinephiles. I want to make gaming more accessible to my non-gamer friends. That means finding good entry points. Here are my criteria for a good introductory game:

- Does not require any special hardware to play. This automatically rules out any consoles or high-end PC games. It should either run on a low-end laptop or a mobile phone.
- Does not expect the player to already know the vocabulary or gaming culture, and does not require a lot of investment to get started. Is Inscryption a good game? Yeah. Does it make any goddamn sense if you're not familiar with "haunted cartridge" creepypasta? Hell no.
- Is culturally meaningful as a game. At the very least, it's critically acclaimed, and also shows something about what's possible in the medium.
- Most of all, I have personally seen a non-gamer enjoy the game.

So here are four games that I know fit these criteria. I also included a tiny bit about each game's historical context because I'm a history nerd. Caveats: I'm a PC gamer, and my favorite genre is puzzle games. This ends up working out here because lots of popular genres need motor skills or investment before they get fun. This is also based on just my own experience. Other enthusiasts might know other good introductory games.

All of the games listed below are available on mobile and PC. Mobile is more convenient, PC is generally a better experience. Bigger screen, you can sync saves to the cloud, you get to use a keyboard, etc. By far the most common way to get PC games is via Steam, an app for purchasing, managing, and automatically updating computer games. You need to make an account first, though. For three of the four games I also included a link to purchase it directly from the developer.

Baba is You

Genre: Puzzle (Sokoban)

Description: Move the baba to the flag. That's it, that's the whole game. So what do you do when the flag is completely blocked by walls? You change the rules, of course! Walls only stop you because there is a line of three tiles that spell out WALL IS STOP. If you break the sentence then you can move through walls. Every other rule, down to the win condition and what you control, works the same way. If you make FLAG IS YOU, then you control the flag. As the game progresses, it introduces new rules, objects, and interactions. You use conveyor belts as keys to open doors, melt lava with flowers, and turn bees into teleporters.

Why it's fun: It's mentally challenging and makes you feel smart. A good puzzle game is designed around "insights", where you realize that something you never considered is possible within the game rules. Baba is You is a very good puzzle game. One clever thing that the game does is Extra Levels. After you beat certain levels, it opens up an optional level with a very slight difference to the puzzle layout… a slight difference that leads to a wildly different solution.
Why it's good: Well first of all, like all the other games here, it's good because I've shown it to a non-gamer and watched them fall in love. Any rambling about the nature of games is secondary to that reason. On top of that, it also shows just how much creativity is possible in a game. The rules of Baba is You are about how you change the rules. It's "thinking outside the box", the game. I just find that people are very surprised to realize that this is a real thing a game can make happen, and that games can get even stranger than this.

More on the genre: Baba is You is a "Sokoban"-style game. This category is named after… Sokoban, a puzzle game from 1982 where you push blocks around on a grid. Early Sokoban-style games were all about long chains of exacting moves. Modern Sokobans instead prioritize short solutions based on an insight: levels should feel impossible until they suddenly become trivial. One of the best examples of this philosophy is Stephen's Sausage Roll, which profoundly influenced modern puzzle games. It is also ludicrously difficult and impossible to recommend to puzzling beginners. 1 But if you enjoy other Sokobans, this one's sublime.

"I bought this game because I thought it had to be a joke. Turns out, it was a joke. The punchline is that I am stupid." — SSR Review

If you liked Baba is You, the next game I'd recommend is Patrick's Parabox. It starts with regular block-pushing and rapidly descends into recursive infinities.

Stardew Valley

Genre: Farm sim (cozy game)

Description: You just inherited your grandfather's farm in a small rural town. Raise crops, keep animals, build relationships with the townsfolk, explore the world. Gameplay is broken up into days. You might start a day by watering and harvesting crops, move to chopping wood and setting crab pots, then close out by exploring a cave for gems. You might spend the next day just fishing at the beach. Seasons change, holidays come and go, and townsfolk keep their own schedules. A large part of the game is tied up with the in-game characters: most of them have personal stories that can only be gradually learned by befriending them. There are a few "milestones" in the game but you are not required to complete them and can play the game in the way that is most fun for you.

Why it's fun: You get to reap the rewards of investing time into something. The game does a great job of giving you a list of short-term and long-term goals, so it always feels like you are making progress without falling behind on anything. It's also fun to explore the open world and discover new things.

Why it's good: It shows that games don't have to be violent or intense to be good. Exceptional games can be slow-paced, relaxing, and ground themselves in everyday concerns. It's also the most intricate game on this list, but it introduces the complexity in an easy-to-understand way. Lots of games I play that others say are "too complicated" are significantly less complex than Stardew, but they do a bad job of making the complexity accessible.

More on the genre: Stardew Valley is the reason that farming simulations are a genre. There were a handful of farming sims around before, but Stardew made them a thing. 2 It's like comparing a telegraph machine to an iPhone. Stardew is also a cozy game. Cozy games have been around for a while, but the term itself is fairly new and there's no consensus on what exactly it means yet.
My very rough criteria are: 1) there are no consequences for failure, and 2) intensity and frustration aren't part of the game, even in a positive way. Action games are intense when you're in the thick of a fight, puzzle games are frustrating when you hit an unsolvable puzzle, Stardew is neither. 3 I think the two most famous cozy games are Animal Crossing and The Sims. (I couldn't tell you what other farming sims to play. At least one game reviewer I respect seems to think none yet measure up to Stardew.)

The Case of the Golden Idol

Genre: Puzzle (deduction)

Description: You are presented with the aftermath of a death and have to figure out exactly what happened. To do this, you can zoom in on details in the scene, like inspect people's pockets, read letters, stuff like that. This gives you keywords, which you put into a giant fill-in-the-blanks prompt to complete the level. There are no penalties for guessing wrong. Early levels are simple, but things get convoluted very quickly. Sound confusing? Well guess what, you can try a demo right now in your own browser. See for yourself what it's like!

Why it's fun: The game makes you feel smart. Like Baba, every case here relies on triggering that moment of insight. Except instead of the insight being "making both this rule and this rule might make this move possible", you have insights like "the knife is *copper*… this is insurance fraud made to look like a murder-suicide." 4 It's even better when you're in a group and you race to blurt out the insight first. Levels can have multiple locations or even the same location at different times of day.

Why this game: For one, it showcases that games can tell good stories. And not just good stories, but good stories that are good because they are told through the medium of games. Mr Invincible works because it plays with the medium of comic books. Golden Idol works because of its interactivity. You learn the story by opening drawers and rifling through purses. It also shows how games can be a rewarding social experience. 5 I played through the whole game with two other friends across several sessions. First dinner, then a few hours of play, rotating who took notes each level. I can't imagine playing it any other way, and recommend you play it collaboratively, too.

More on the genre: "Deduction games" are a relatively new style of puzzle. In my experience, they're the closest a game gets to making you feel like a detective, as opposed to merely detective-themed set dressing. Return of the Obra Dinn was the first game to really nail the concept. It uses a similar mechanic of "viewing the scene of death" but gives you different goals and emphasizes different kinds of clues. Some passing knowledge of 1800s maritime cliches can be helpful. The newest critical darling in the genre is The Roottrees are Dead, where you use the power of the 90s internet to trace the convoluted family tree of a candy dynasty. Originally a free online game, a deluxe version is available on Steam. If it's half as good as the free online version I can heartily recommend it. (Also Golden Idol has a sequel. But play the original game first!)

Balatro

Genre: Roguelike (Spirelike, Balatrolike)

Description: A game of Balatro is played in 24 rounds. In each round, you have to assemble a poker hand from a deck of playing cards. Playing a hand gives you points, with rarer hands (like straight flush) giving more points. In between rounds, you get a chance to buy "Jokers", which change the gameplay in your favor.
A joker might give you extra points for playing aces, or let you make flushes with only four cards, or let you remove one card from your deck per round. The jokers you can buy are random, and you will only see a subset in a given game. The strategy, then, is to figure out which combination of jokers to buy in order to beat the growing score requirements of the rounds. A single game (or "run") takes about 40-60 minutes total.

Why it's fun: It's deeply satisfying to pull off a combo that gives you a trillion trillion trillion points. It's even more satisfying to win a run by the skin of your teeth, and then brag about it to all your Balatro friends.

Why it's good: Same reason. Also it swept every 2024 "game of the year" award and is videogame history in the making, so there's that.

More on the genre: Baba is You and Golden Idol are organized into handcrafted levels, while Stardew Valley is a lovingly-designed open world. Balatro, by contrast, is a roguelike. Each run is a little bit random: you will get slightly different challenges, draw different cards, and get different jokers. It's kind of like Solitaire. Instead of learning how to beat specific levels, you learn how to better adapt to different circumstances. This randomness makes roguelikes popular for their replayability, where you can win a game several times and still find it fun to start over.

There are many different subcategories of roguelikes, some of which look more like traditional "gamer games" and some of which are more exotic. Many modern roguelikes were profoundly influenced by the 2017 game Slay the Spire, so much so that "Spirelike" is often used as a term for the subcategory. Play a few and you quickly get a feel for what they all share in common. One of these Spirelikes is Luck be a Landlord, which asks "what if slots took actual strategy?" Balatro draws from both these games, but in an innovative way that makes it feel almost like a piece of outsider art.

"The legion of imitators it [Balatro] already spawned are testament to its rightful place in the PC gaming canon." — PC Gamer

We know Balatro is going to inspire a new generation of roguelikes, but we don't entirely know what they'll look like. And that's really exciting to me! We are watching a tiny piece of gaming history happen right now.

Thanks to Predrag Gruevski for feedback. If you enjoyed this, check out the April Cools website for all the other pieces this year!

1. My one bit of video game cred is that I beat SSR without any hints. Yes, including "The Spine". It took me four days. [return]
2. "FarmVille had way more players than Stardew!!!" Can you ride a horse in FarmVille? No? Then it's not a farming sim. [return]
3. Except for Junimo cart. To hell with Junimo cart. [return]
4. Not a real case. [return]
5. You can also play Stardew Valley with multiple people (I think everybody shares one farm?)
but I’ve never tried it and can’t comment about how good it is. [return]

Hillel Wayne 7 months ago

A Perplexing Javascript Parsing Puzzle
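A minimal version of the puzzle (my reconstruction of a snippet with the same shape, not the original):

```js
let x = 2;
--> x = x + 1;
x - 1
```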

What does this print? Think it through, then try it in a browser console! Answer and explanation below.

It prints 1. At the beginning of a line (and only at the beginning of a line), --> starts a comment. The JavaScript engine parses the --> line as a comment and skips it. The browser then displays the value of the last expression, which of course is 1.

It's a legacy hack. Netscape Navigator 2 introduced both JavaScript and the <script> tag. Older browsers in common use (like Navigator 1) had no idea that <script> content was anything special and wrote it as regular text on the page. To ensure graceful degradation, webdevs would wrap their scripts in HTML comment blocks: an <!-- right after the opening <script> tag, and a commented-out --> right before the closing one. Old browsers would parse the content as an HTML comment and ignore it, new browsers would parse the content as JavaScript and execute it. I'm not quite sure why <!-- and --> weren't syntax errors; presumably there was special code in the JS engines to handle them, but I can't figure out where.

All modern browsers at least recognize the <script> tag. 1 But since some old websites still have the hack and the standardization committee will never, ever break the web, they added <!-- and --> as legal comment tokens to the 2015 standard. <!-- and --> both act like //, i.e. starting line comments. --> is only valid at the start of a line (to avoid ambiguity with a postfix decrement followed by a greater-than operator), while <!-- can occur anywhere in the line.

"Web browsers are required to support this syntax, while other engines are not." — MDN Web Docs

Node and Electron both support it, though, as they share Chromium's V8 engine.

If you liked this post, come join my newsletter! I write new essays there every week. I train companies in formal methods, making software development faster, cheaper, and safer. Learn more here.

1. Text-only browsers like Lynx recognize the <script> tag, they just choose to ignore the contents. [return]

Hillel Wayne 7 months ago

The Hierarchy of Controls (or how to stop devs from dropping prod)

The other day a mechanical engineer introduced me to the Hierarchy of Controls (HoC), an important concept in workplace safety. 1 To protect people from hazards, system designers should seek to use the most effective controls available. This means elimination over substitution, substitution over engineering controls, etc.

Can we use the Hierarchy of Controls in software engineering? Software environments are different than physical environments, but maybe there are some ideas worth extracting. Let's go through the exercise of applying HoC to an example problem: a production outage I caused as a junior developer. About ten years ago I was trying to debug an issue in production. I had an SSHed production shell and a local developer shell side by side, tabbed into the wrong one, and ran the wrong query. That's when I got a new lesson: how to restore a production database from backup.

In process safety, the hazard is whatever makes an injury possible: a ladder makes falls possible, a hydraulic press makes crushed fingers possible. In my case, the hazard might have been a production shell with unrestricted privileges, which made losing production data possible. The specific injuries include both dropping the database directly (like I did) and running a delete query on the database. I'll use both injuries to discuss different kinds of hazard controls.

The HoC was designed to protect people from machines, not machines from people! I found most of the concepts translate pretty well, but I ran into some issues with PPE. There's also lots of room to quibble over whether something is one category or the other. As part of writing this, I'm also trying to figure out the qualitative differences between the categories. This is my own interpretation as a total beginner. Finally, the HoC is concerned with prevention of injury, not recovery from injury. Software best practices like "make regular database backups" and "have a postmortem after incidents" don't fit into the hierarchy. All quotes come from the OSHA HoC worksheet.

Elimination

"Elimination makes sure the hazard no longer exists."

Elimination is the most direct way to prevent an accident: don't have the hazard. All the various OSHA materials use the same example: if you want to reduce falling injuries, stop making workers do work at heights. In our case, we could eliminate the production environment or we could eliminate the database. Neither is exactly practical, and most HoC resources are quick to point out that proper elimination often isn't possible. We work with hazardous materials because they are essential to our process. "Elimination" seems most useful as a check of "is this dangerous thing really necessary?" Essential hazards cannot be eliminated, but inessential hazards can. Before it was decommissioned in 2020, Adobe Flash was one of the biggest sources of security exploits. The easiest way to protect yourself? Uninstall Flash.

Substitution

"Substitution means changing out a material or process to reduce the hazard."

Unlike elimination, substitution keeps the hazard but reduces how dangerous it is. If the hazard is toxic pesticides, we can use a lower-toxicity pesticide. If it's a noisy machine, we can replace it with a quieter machine. If the hazard is a memory-unsafe language, we use Rust.

For our problem I can see a couple of possible substitutions. We can substitute the production shell for a weaker shell. Consider if one "production" server could only see a read replica of the database. Delete queries would do nothing and even dropping the database wouldn't lose data. Alternatively, we could use an immutable record system, like an event source model. Then "deleting data" takes the form of "adding deletion records to the database". Accidental deletions are trivially reversible by adding more "undelete" records on top of them.

Engineering controls

"Engineering controls reduce exposure by preventing hazards from coming into contact with workers. They still allow workers to do their jobs, though."

Engineering controls maintain the same danger of the hazard but use additional physical designs to mitigate the risk and severity of accidents. We can do this in a lot of ways: we can reduce the need for workers to expose themselves to a hazard, make it less likely for a hazard to trigger an accident, or make accidents less likely to cause injury. There is a lot more room for creativity in engineering controls than in elimination or substitution. Some ideas for engineering controls:

- With better monitoring and observability, I might not need to log into production in the first place.
- With better permissions policies, I could forbid the environment from dropping the database or require a special developer key to take this action.
- Or maybe junior engineers shouldn't have access to production environments at all. If I want to debug something, I must ask someone more experienced to do it.
- Autologouts could keep me from having an idle production terminal just lying around, waiting to be accidentally alt-tabbed into.

Some engineering controls are more effective than others. A famously "weak" control is a confirmation box ("really drop the database? y/N"). The problem with this is that if I run this a lot in my local environment, I'll build up the muscle memory to press y, which will ruin my day when I do the same thing in prod. A famously "strong" control is the "full name" confirmation box, 2 which makes you type out the full name of the thing you're about to destroy. Even if I have muscle memory, it'd be muscle memory to type the development database's name, which would block the action. You can see a real example of this if you try to delete a repo on Github.
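Here's a sketch of the difference in code (hypothetical Node/TypeScript; the names are mine, not from any real CLI):

```typescript
import * as readline from "node:readline/promises";
import { stdin, stdout } from "node:process";

// Hypothetical helper showing the strong, "full name" form of the control.
// A habitual "y" (or the dev database's name) can't confirm a prod drop.
async function confirmDrop(dbName: string): Promise<boolean> {
  const rl = readline.createInterface({ input: stdin, output: stdout });
  // Weak version, for contrast:
  // const ok = (await rl.question("Really drop? [y/N] ")) === "y";
  const typed = await rl.question(`Type "${dbName}" to drop it: `);
  rl.close();
  return typed === dbName;
}
```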
Some of OSHA's examples of engineering controls look like substitutions to me, and vice versa. My heuristic for the distinction is that engineering controls can fail, substitutions cannot. What if permissions are misconfigured and they don't actually prevent me from deleting the database? Whereas if the environment is only exposed to a read replica, I can't magically destroy the write replica. I hope. This is an imperfect heuristic, though! Swapping C for Rust would be a substitution, even though some of Rust's guarantees are bypassable with unsafe.

Administrative controls

"Administrative controls change the way work is done or give workers more information by providing workers with relevant procedures, training, or warnings."

Engineering controls change the technology, administrative controls change the people. Controls that change how people interact with the technology could be in either category; I got into a long discussion over whether "full name" confirmation boxes are an administrative or engineering control. You could reasonably argue it either way. Some ideas for administrative controls:

- A company policy that juniors should not connect to prod
- Showing us training videos about how easy it is to drop a database
- Having only one terminal open at a time
- Requiring that you can only connect to prod if you're pairing with another developer
- Reducing hours and crunch time so engineers are less sleep-deprived
- Regularly wargaming operational problems

OSHA further classifies "automated warnings" as a kind of administrative control. That would be something like "logging into production posts a message to Slack".

On the hierarchy, administrative controls are lower than engineering controls for a couple of reasons. One, engineering controls are embedded in the system, while administrative controls are social. Everybody needs to be trained to follow them. Second, in all of the HoC examples I've seen, it takes effort to break an engineering control, while it takes effort to follow an administrative control. You might not follow them if you are rushed, forgetful, inattentive, etc.

Some administrative controls can be lifted into engineering controls. A company policy that junior engineers should never SSH into prod is an administrative control, and relies on everybody following the rules. A setup where juniors don't have the appropriate SSH keys is an engineering control.
PPE

"Personal protective equipment (PPE) includes clothing and devices to protect workers."

This is the lowest level of control: provide equipment to people to protect them from the hazard. PPE can reduce the risk of injury (I am less likely to be run over by a forklift if I am wearing a reflective vest) or the severity (a hard hat doesn't prevent objects from falling on me, but it cushions the impact).

I think PPE is the least applicable control in software. First of all, HoC is meant to protect humans, while in software we want to protect systems. So is software PPE worn by people to prevent damage to systems, or worn by systems to prevent damage from people? An example of "human PPE" could be to use a red background for production terminals and a blue background for development terminals. All the examples of "system PPE" I can come up with arguably count as engineering controls.

Second, PPE isn't an engineering control because engineering controls modify hazards, while PPE is a third entity between people and the hazard. But anything between a person and the software is also software! Even something like "use Postman instead of raw curl" is more a mix of engineering/admin than "true" PPE.

I can think of two places where PPE makes more sense. The first is security, where secure browsers, 2FA, and password managers are all kinds of PPE. The second is PPE for reducing common software developer injuries, like carpal tunnel, back pain, and eyestrain. These places "work" because they involve people being injured, which was what the HoC was designed for in the first place.

PPE comes lower in the hierarchy than administrative controls because employees need discipline and training to use PPE effectively. In the real world, PPE is often bulky and uncomfortable, and 90% of the time isn't actually protecting you (because you're not in danger). One paper on construction accidents found that injured workers were not wearing PPE in 65% of the studied accidents. To maximize the benefit of PPE you need to train people and enforce use, and that means already having administrative controls in place.

Higher tiers of controls do more to eliminate danger, but are harder to implement. Lower tiers are less effective, but more versatile and cheaper to implement. I'm still working out exactly what this thought is, but: in the real world, people interact with hazards through "natural interfaces", basically as an object situated in space. If I'm working with a hydraulic press, by default I can put my hand in there. We need to add additional physical entities to the system to prevent the injury, or train people on how to use the physical entity properly.

In software, all interfaces are constructed and artificial: hazards come from us adding capabilities to do harm. It's easier for us to construct the interface in a different way that diminishes the capacity for injury, or in a way that enforces the administrative control as a constraint. This came up once at a previous job. Our usage patterns meant that it was safest to deploy new versions during off-peak hours. "Only deploy off-peak" is an administrative control. So we added a line to the deploy script that checked when it was being run and threw an error if it was during peak times. That's turning an administrative control into an engineering one. 3
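As a sketch, the lifted control might look like this (the names and peak window are made up; the footnote below confirms the real script had an override flag):

```typescript
// Throw instead of deploying during peak hours, unless explicitly overridden.
const PEAK = { startHour: 9, endHour: 17 }; // assumed peak window

function assertOffPeak(now: Date = new Date()): void {
  const hour = now.getHours();
  const isPeak = hour >= PEAK.startHour && hour < PEAK.endHour;
  if (isPeak && process.env.FORCE_DEPLOY === undefined) {
    throw new Error("Peak hours: refusing to deploy (set FORCE_DEPLOY to override).");
  }
}

assertOffPeak(); // first line of the deploy script
```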
Also, real-world engineering controls are expensive, which is a big reason to choose administrative controls and PPE. But software can be modified much more quickly and cheaply than physical systems can, which makes engineering controls more effective. (I think there's a similar argument for substitutions, too.)

If you looked at the OSHA worksheet I linked above, you'd see an interesting section here:

"Any possible control method can potentially introduce new risks into the workplace."

Lots of administrative alarms cause alarm fatigue, so people miss the critical alerts. Forbidding personnel from entering a warehouse might force them to detour through a different hazard. Reflective vests can be caught on machinery. The safety of the whole system must be considered holistically: local improvements in safety can cause problems elsewhere.

We see this in software too. Lorin Hochstein has a talk on how lots of Netflix outages were caused by software meant to protect the system. How could my controls add new hazards?

- Substituting an append-only database could require significant retraining and software changes, introducing more space for mistakes and new bugs.
- Strict access policies could slow me down while trying to fix an ongoing problem, making that problem more severe.
- Too many administrative controls could make people "go through the motions" on autopilot instead of being alert to possible danger.

I find that it's harder to identify new hazards that could be caused by controls than it is to identify existing hazards.

That's HoC in a nutshell. I think it's a good idea! If you liked this post, come join my newsletter! I write new essays there every week. I train companies in formal methods, making software development faster, cheaper, and safer. Learn more here.

1. A few places refer to it as a hierarchy of hazard controls (HoHC), but places like NIOSH and OSHA call it HoC. I think it's like how we say "interpreted language" and not "interpreted programming language": the topic is implicit when we're writing for professionals, but you need to clarify it when writing for the general public. [return]
2. I have not found any common name for this control. I asked on a Discord and everybody called it by a different name. [return]
3. Yes, there was a flag. [return]

Hillel Wayne 1 year ago

Hiatus

All of my budgeted blogwriting time is going to Logic for Programmers . Should be back early 2025. (I’m still writing the weekly newsletter .)

Hillel Wayne 1 year ago

Toolbox languages

A toolbox language is a programming language that's good at solving problems without requiring third-party packages. My default toolbox languages are Python and shell scripts, which you probably already know about. Here are some of my more obscure ones.

AutoHotKey

Had to show up! AutoHotKey is basically "shell scripting for GUIs". Just a fantastic tool to smooth over using unprogrammable applications. It's Windows-only but similar things exist for Mac and Linux. Things I've used it for:

- Audacity doesn't let you configure mouse shortcuts, so I used AutoHotKey to map the middle-mouse to a keyboard shortcut anyway.
- I made typing a short hotstring fill in the current date. This is a tool I use to take timestamped notes.
- Other uses: launch REPLs for toolbox languages. Input the 100-keypress sequence required to beat one videogame (if you know, you know).

What makes it nice:

- You can configure shortcuts that are only active for certain programs, if a global flag is set, if certain text appears on screen, etc.
- Simple access to lots of basic win32 functionality. Opening the file selection dialog is just a single call.
- The GUI framework is really, really good. Honestly the best of any language I've used, at least for small things.

Further reading: Learn AutoHotKey by stealing my scripts; Somehow AutoHotKey is Kinda Good Now; In Praise of AutoHotKey.

J

An array language, like APL. Really good at doing arithmetic on arrays, hair-pullingly frustrating at doing anything with strings or structured data. I used to use it a lot but I've mostly switched to other tools, like Excel and Raku. But it's still amazing for its niches:

- Get all of the prime factors of a number.
- Given two processes, each running a four-step algorithm, how many possible interleavings are there? What if I wanted a table of interleavings for each value of 1 to 3 processors and 1 to 3 steps?

What makes it nice:

- It is insanely terse. Things that would take several lines in most languages take a few characters in J, so I like it for quickly doing a bunch of math.
- First-class multidimensional arrays. + can add two numbers together, two arrays elementwise, a single number to every element of an array, or an array to every row (or column) of a higher-dimension array.
- There are lots of top-level primitives that do special-case mathematical things, like decompose a number into its prime factors.

Further reading: J notation as a tool of thought; J as a desktop calculator; How to solve sudoku (the appendix, mostly).

Frink

Possibly the most obscure language on this list. Frink is designed for dimensional analysis (math with units), but it's also got a bunch of features covering whatever the developer thinks is interesting. Which is quite a lot of things! It's probably the closest to "a better calculator" of any programming language I've seen: easy to get started with, powerful, and doesn't have the unfamiliar syntax of J or Raku. Things I've asked it:

- If someone was born at midnight on Jan 1st 1970, when do they become a billion seconds old?
- If I run a certain distance in a certain time, what's my average pace?
- What's (6 ± 2) * (8 ± 1)?

What makes it nice:

- Lots of builtin units and unit modifiers, carefully distinguished (one kind of year is exactly 365 days, another is 365.24, and one obscure unit comes to about 1.6 seconds).
- Date literal notation, so you can subtract two dates and get an answer like 2199.01 years.
- There's a builtin interval type for working with uncertainties. It's a little clunky but it works.

Further reading: The Frink is Good, the Unit is Evil; A brief introduction to interval arithmetic.

Raku

Raku (née Perl 6) is a really weird language filled to the brim with dark magic. It's very powerful and also very easy to screw up. I'm not yet comfortable using it for a production program. But for personal scripting and toolkits, it's incredible. Things I've used it for:

- Generate three random 10-character lowercase strings.
- Parse unusual structured data formats with grammars (see link).
- Copy a bunch of SVG ids over into Inkscape labels.
- Write a CLI with a few fiddly combinations of options (example).

What makes it nice:

- You can define your own infix operators! And postfix operators. And circumfix operators.
- Lots and lots of syntactic sugar, to a level that worries me. Raku Just Knows™ what to do: if you define a MAIN function, its parameters are turned into CLI arguments.
- Multimethods with multiple dispatch, based on runtime values. Combining this with MAIN makes small CLI tooling really easy.
- Many of the mathematical operators have unicode equivalents (like ∈ for (elem)), which synergizes well with all of my AutoHotKey hotstrings.

Further reading: Raku, a Language for Gremlins; An RNG that runs in your brain; Raku is Surprisingly good for CLIs.

Picat

My newest toolbox language, and the language that got me thinking about toolboxes in general. A heady mix of logic programming, constraint solving, and imperative escape hatches. I first picked it up as a Better Constraint Solver and kept finding new uses for it. Things I've used it for:

- If I run at 4.5 miles/hour for X minutes and 5.1 for Y minutes, what should X and Y be to run 3.1 miles in 38 minutes?
- Given a bunch of activities, time constraints, and incompatibilities, figure out a vacation plan.
- Checking if a logic puzzle has multiple solutions.
- Checking if the clues of a logic puzzle are redundant, or if one could be removed and preserve the unique solution.
- Mocking up a quick Petri net reachability solver.

What makes it nice:

- Assignment to variables. Shockingly useful in a logic language. Lots of problems felt almost right for logic programming, but there'd always be one small part of the algorithm I couldn't figure out how to represent 100% logically. Imperative assignment provided the escape hatch I needed.
- The planner module. I love the planner module. It is my best friend. Give it a goal and a list of possible actions, and Picat will find a sequence of actions that reaches the goal. It is extremely cool.

Further reading: Planner programming blows my mind; Picat is my favorite new toolbox language; Solving a math problem with planner programming.

What makes a toolbox language

Most of the good toolbox languages I've seen are for computation and calculation. I think toolbox languages for effects and automation are possible (like AutoHotKey) but that space is less explored. A toolbox language should be really, REALLY fast to write. At the very least, faster than Python. Compare "ten pairs of random numbers":
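A sketch of that comparison with my own snippets (Python against Raku, one of the toolbox languages above):

```python
# Python: fine, but there's ceremony
import random
pairs = [(random.random(), random.random()) for _ in range(10)]
print(pairs)
```

```raku
# Raku: `xx` re-evaluates its left side, so this builds ten fresh pairs
say (rand, rand) xx 10;
```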
A few things lead to this: a terse syntax means typing less. Lots of builtins means less writing basic stuff myself. Importing from a standard library is less than ideal, but acceptable. Having to install a third-party package bothers me. Raku does something cool here; the Rakudo Star Bundle comes with a bunch of useful community packages preinstalled.

If you can do something in a single line, you can throw it in a REPL. So you want a good REPL. Most of the languages I use have good REPLs, though I imagine my Lisp and Smalltalk readers will have words about what "good REPL" means.

Ideally the language has a smooth on-ramp. Raku has a lot of complexity but you can learn just a little bit and still be useful, while J's learning curve is too steep to recommend to most people. This tends to conflict with being "fast to write", though.

Thanks to Saul Pwanson for feedback. If you liked this post, come join my newsletter! I write new essays there every week. I train companies in formal methods, making software development faster, cheaper, and safer. Learn more here. My new book, Logic for Programmers, is now in early access! Find it here.
Other toolbox tools I use:

- jq for JSON processing
- Javascript, so I can modify other people's websites via the dev console

Toolbox languages I'm still looking for:

- Some kind of APL that offers the benefits of J but without the same frustrations I keep having
- A concatenative PL, if I ever find out what small problems a CPL is really good for
- Something that makes webscraping and parsing as easy as calculation. Requests and bs4 ain't it.

Hillel Wayne 1 year ago

Composing TLA+ Specifications with State Machines

Last year a client asked me to solve a problem: they wanted to be able to compose two large TLA+ specs as part of a larger system. Normally you're not supposed to do this and instead write one large spec with both systems hardcoded in, but these specs were enormous and had many internal invariants of their own. They needed a way to develop the two specs independently and then integrate them with minimal overhead. This is what I came up with.

Warning: this is a complex solution aimed at advanced TLA+ users. For a much (much) gentler introduction, check out my website learntla.

Let's start by giving a motivating system: a Worker sends an authentication request to a Server. If the password matches the server's internal password, the server responds "valid", otherwise it responds "invalid". If the worker receives "invalid" as a response, it goes into an error state. The worker can retry from that state and submit a new authentication request. The worker and server have shared state via the request/response. As an additional complication, we'll add internal state to the server, in the form of a request log that is hidden from the worker. We can use this example to show the problems of composition and my solution (though I'll say the example is a little too simple to make it worthwhile).

What we want is for the composition to be as simple and painless as possible. If our specs are Worker and Server, the easiest composition would just be their conjunction, Worker!Spec /\ Server!Spec. I talk about the problems we have in-depth here, but the gist is that if Worker and Server are "normal" specs, they'll place contradictory constraints on the shared variables. For example, the worker will likely read the server response, but not modify it. So for the worker to run independently of the composition, we have to say the response never changes, which is equivalent to saying we can't change it, which makes it impossible for the server to send a response!

The normal way around this is to break apart both specs into collections of actions, and then carefully stitch them together in non-contradictory ways. Which is about as complex as it sounds: composing two specs can be as much work as writing them in the first place. This is why I'm trying to find a better way. What we need to do is write specs intended to represent part of the world and then incorporate them into a "whole world" main spec.

To do this, we'll use one of TLA+'s most powerful features: a primed variable like x' can both assign the next value of x and constrain what the next value can be.
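Say we have a spec along these lines (a minimal sketch; the module, variables, and actions are my own illustration, not the client's specs):

```tla
---- MODULE scratch ----
VARIABLES x, y

Init == x = FALSE /\ y = FALSE

A == x' \in BOOLEAN /\ y' \in BOOLEAN
B == x' = TRUE /\ y' = FALSE

Next == A /\ B
====
```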
Since the whole system is based around the worker's state machine, let's start with the worker spec. This won't represent what happens when the worker transitions states, just what the transitions are. One operator represents the set of valid transitions; the three remaining operators are just helpers. Adding them is good practice; most TLA+ specs don't have enough helpers. The constraint is a little complex, but all it's saying is that we can't always take the same transition out of one particular state: if we get there often enough, we'll eventually take the other transition instead. Otherwise this spec doesn't put any conditions on the transitions: we don't need to "do anything" to go to a given state. That's what the composing spec is for. I used this to directly test the worker. It passes, meaning this spec guarantees the liveness property.

Next up is the server, the outside component we'll integrate with the worker. We're doing three unusual things in this spec. First is that we split the variables into external and internal vars (a). External vars represent state that's shared with other parts of the composed specification, while internal vars only pertain to the server's own properties. Here the external vars are the worker's requests to the server and the server's responses. Second, the spec will trivially deadlock (b). The next-state relation only contains the one responding action, which is only enabled if the request isn't null, but nothing in the spec makes the request not-null. This spec isn't "self-contained", and it needs to be used by another specification to be meaningful. Third, the spec is only stuttering invariant with respect to the external variables. This is done by writing [][Next]_external instead of [][Next]_vars (c). This is the easiest part to miss but will make both spec composition and spec refinement easier down the line.

Now let's put the server and the worker together. This is by far the most complex part. Let's start with the whole spec, and then cover a breakdown of the most important sections. I like to group my variables by purpose and spec. Since both the server and the worker use the request and response, I put them in a separate grouping. I hard-coded the server's password constant for pedagogical convenience. We don't have to instantiate the constant because it has the same name in both specs, so it propagates automatically. The composed init just shells out to both components' inits to set up their respective variables. It's easy here because they don't share any variables. If there was something that they both used, or if the composition needed any extra bookkeeping variables, I'd handle them specially here. The server's step just says "the server can handle its own state updates", but accounts for stuttering (because every variable needs to be assigned a value on every step). The server is independently active in the system and its behavior doesn't synchronously depend on the worker. Keeping the components separate isn't always possible when composing, but it makes things more convenient when applicable. The worker's step is the same: it behaves independently of the server. But here's where the "system" starts to play a role, in the composing action. That action is where things get interesting. To recap the earlier explanation, primed variables can be used in other expressions after being assigned. This means we can attempt a transition in one action and then check if the transition is valid in a later action. This is the key to making this all work. Without this, we'd have to put the guards and side effects in the same action as the transitions. For example, one conjunct constrains the worker's error transition to only be possible if the server response was "invalid". That's only possible if the server rejected the password. But our worker doesn't need to know this. We can develop the components independently and rely on the composing action to correctly compose their behaviors.
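As a concrete sketch of such a guard, assuming hypothetical variables state and response and made-up state names (none of these are from the original spec):

```tla
VARIABLES state, response

\* A fragment of the system's composing action. The worker's own
\* action has already assigned state'; this conjunct only rules out
\* transitions that don't match the server's response.
GuardInvalid ==
    (state = "awaiting" /\ state' = "error") => (response = "invalid")
```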
We can also use this to drive effects on the system: we can make a worker transition also trigger a request (and clear any outstanding response). That request then enables the server's responding action, so the server can react to the system.

Remember, we added all of this machinery in the first place in order to compose two complex specifications together. We need to make sure we're composing them in a way that doesn't violate either of their own properties. That's handled by a pair of refinement properties. At a high level, these check that the composition doesn't make the worker or the server do anything they're not normally able to do. For example, the server's property would fail if we removed something from the log in the composition, since that's not possible in the server spec. 1

In TLA+, refinements are transitive. The worker spec had a liveness property. Since I already verified it in the worker spec, I don't need to test it in the composition. If the refinement passes, then the liveness property is guaranteed to pass, too! (Why refinement is transitive: when we test the refinement, what we're verifying is that the composed spec implies the worker spec. We already know from model checking that the worker spec implies the liveness property. Implication is transitive: if A => B and B => C, then A => C.)

And currently it fails: the problem is that the worker can keep picking the wrong password, so the server keeps rejecting it and the worker never completes. One fix is to say the system never retries the same password. This makes the spec pass. If you're concerned about leaking the server's internal log to the system, you can add a log to the worker, too, either in the worker spec or in a separate component that you include in the composition. 2 An alternative fix uses a strong fairness constraint. This says that if it's always-eventually possible to retry in a way that sends the right password, the spec eventually will. This makes the spec pass without changing the essential logic of the system.

One last thing we need to do: make the server spec model-checkable. We can't test it directly because it's not a "closed" specification: it relies on some other spec to send requests. For more complex specs we want to be able to test that spec's properties independently of the composition. Fortunately, this is an "already solved problem" in the TLA+ community: refine the open spec into a closed one. We do this with a new file that augments the server with an external environment that can send requests. Now we can test the behavior of the server by model checking the closed spec. If you're on the VSCode TLA+ extension nightly, running the "Check model with TLC" command will automatically run the model checker with the right model. It also comes with a TLA+ debugger. The VSCode extension is great.

This is a lot of work! So why do this instead of writing one large spec? The point of composing specs at all is so we can work on them independently. It's not a big deal when each spec is ~30 lines, but when they're each 200+ lines, you can't just rewrite them as one spec. So let's compare it to a more conventional approach to composition. I'll use the MongoDB Raft composition, which Murat Demirbas analyzes here. The independent specs are MongoStaticRaft and MongoLoglessDynamicRaft, which are composed together in MongoRaftReconfig. Compare a component's next-state action with the compositional one: the actions under (a) must be manually repeated in the composition, while the actions under (b) must be carefully interlaced with the other spec. This is the standard way of composing two specs, which gets progressively more difficult the more specs you need to integrate. It's also hard to check refinements in this paradigm. I'm trying to avoid both issues with my state-machine-based approach. Each component already has its own next-state action that we can use without changes.
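As a sketch of that reuse, assuming operators WorkerNext, ServerNext, and SystemNext and a serverVars tuple (all names mine, not the original spec's):

```tla
\* The composed next-state relation reuses each component's action
\* verbatim; only SystemNext is new.
Next ==
    /\ \/ ServerNext            \* unchanged, from the server spec
       \/ UNCHANGED serverVars  \* or the server stutters this step
    /\ WorkerNext               \* unchanged, from the worker spec
    /\ SystemNext               \* new: guards and effects on shared state
```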
Anything that affects shared variables is handled by the additional composing operator. Would this spec benefit from my approach? I don't know. I designed the approach for a primary "machine" interacting with external components, whereas here the two specs are on equal footing. I have a sketch for what the conversion would look like, but I don't know if it's necessarily better.

Sketch of changes: this is all without adding an extra state machine. Pick one "primary" spec and give it a composing action like before. We need to add guards to the other spec so it doesn't take the shared steps on its own— those need to happen through the composition. The composition will share some variables in common with both components, and its actions would look similar to theirs. It annoys me that we're repeating things in both the components and the composition, but it's the most straightforward way to ensure that both use the same values for the shared variables. I figured out some ways to deduplicate this but they all rely on creative misuse of TLA+ semantics.

My prediction is that conventional composition works for a wider variety of cases, but the state-machine approach is more maintainable and scales better in the cases where it does work. Neither approach handles the "one-to-many" case well. That's where you have a spec for a single worker and try to use it to add N workers, where N is a model parameter. I discuss why this is so difficult in my article Using Abstract Data Types in TLA+.

This is a powerful technique, but also one that takes experience and adds complexity. It's good if you need to write a very large specification or need a library of reusable components. For smaller specifications, I'd recommend the standard technique of putting all of the components in one spec. This was developed for a consulting client and worked beautifully. I'm excited to share it with all of you! If you liked this, you can read other advanced TLA+ techniques here, like how to make model-checking faster. Thanks to Murat Demirbas and Andrew Helwer for feedback. If you liked this post, come join my newsletter! I write new essays there every week. I train companies in formal methods, making software development faster, cheaper, and safer. Learn more here.

Some company styleguides forbid using primed variables as an expression in later actions. In that case, you can get the same effect by "saving" the transition you take and passing it into the later actions (a sketch follows these footnotes); this effectively just emulates the behavior of priming. It is also useful for preserving parameters passed into an action, which is sometimes necessary for composition.

This is why we needed to write the server spec as [][Next]_external and not [][Next]_vars: the refinement
only needs to hold for the internal variables, not the shared ones! [return] I had a long section on composition via multilayer refinement: one spec refines a second, which refines a third. But it ended up being too complex to thoroughly explain. Maybe that'll be a part 2! [return]
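Here is the sketch promised above for the styleguide workaround of "saving" the transition. All names (Transitions, Take, Guard) are mine, and it assumes Transitions is a set of records with `from` and `to` fields:

```tla
\* e.g. Transitions == { [from |-> "awaiting", to |-> "error"], ... }
\* Instead of reading state' inside the guards, quantify over the
\* transition once and pass it to both the step and the guards.
Take(t) == state = t.from /\ state' = t.to

Guard(t) ==
    (t.from = "awaiting" /\ t.to = "error") => (response = "invalid")

Next == \E t \in Transitions : Take(t) /\ Guard(t)
```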

Hillel Wayne 1 year ago

What We Know We Don't Know: Empirical Software Engineering

This version of the talk was given at DDD Europe, 2024. Technology is a multitrillion dollar industry, but we know almost nothing about how it's best practiced. Empirical Software Engineering, or ESE, is the study of what works in software and why. Instead of trusting our instincts we collect data, run studies, and peer-review our results. This talk is all about how we empirically find the facts in software and some of the challenges we face, concluding with a guide on how to find existing research and an overview of some things we've learned about DDD. Slides are here.

I referenced a bunch of papers in my talk. These are links so you can read them yourself:

- Scalability! But at what COST?
- The Pragmatics of TDD
- TDD is dead. Long live testing.
- Comparing syntax highlightings and their effects on code comprehension
- Simple Testing Can Prevent Most Critical Failures
- A Large Scale Study of Programming Languages and Code Quality in Github (original faulty study)
- On the Impact of Programming Languages on Code Quality (replication study)
- My 6,000 word writeup of the whole fiasco
- Fixing Faults in C and Java Source Code: Abbreviated vs. Full-Word Identifier Names (preprint)
- Domain-Driven Design in Software Development: A Systematic Literature Review on Implementation, Challenges, and Effectiveness
- Design, Monitoring, and Testing of Microservices Systems: The Practitioners' Perspective
- Practitioner Views on the Interrelation of Microservice APIs and Domain-Driven Design: A Grey Literature Study Based on Grounded Theory
- Refactoring with domain-driven design in an industrial context
- Tackling Consistency-related Design Challenges of Distributed Data-Intensive Systems - An Action Research Study

Note that there are other interesting papers in the survey paper; these are just the ones I brought up in the talk. Books and other resources:

- Teaching tech together
- Leprechauns of Software Engineering
- The Programmer's Brain
- Making Software
- It Will Never Work In Theory
- ACM digital library

What does science say about DDD as a whole? Is it good or bad?

This isn't a question science can answer. It's kind of like asking "is competition good": the scope is simply too vast and the criteria too ambiguous to have a meaningful answer. Instead, we have to look at the specific things people do as part of DDD, and the specific ways it affects their projects. Do microservice architects applying bounded contexts create more services than ones who don't use any part of DDD? Do domains modeled with event storming "look different" than domains that don't use it? What are the most common unique issues in systems that use CQRS?

Are there any studies on how to teach "thinking in abstractions"?

Off the top of my head, the first place I'd look is at Shriram Krishnamurthi's corpus. His group focuses on how we can teach abstraction better and has developed a lot of interesting tools exploring this.

Does it even make sense to study "which languages are more error-prone?" Maybe different languages attract different types of people, and that's what matters.

In his video debunking the original paper, Jan Vitek agrees that this is a fundamental issue with the original paper, but he focused the replication on the methodological errors because those are easier to conclusively prove. See my writeup for more details.

How long is it usually between software developers adopting a new technique and scientists studying it?

No idea, sorry.

Hillel Wayne 1 year ago

Comment Section: Software Friction

These are some of the responses to Software Friction . Laurie Tratt wrote What Factors Explain the Nature of Software? which touches on the topic of friction, too. I’m an Infantry Officer in the US Marine Corps, we talk quite a bit about friction. One book you may be interested in if you’re reading Clausewitz is Marine Corps Doctrinal Publication 1 (MCDP 1) Warfighting (link to PDF at bottom). It’s a pretty quick read. We approach it largely in the ways you just laid out, we train people early in their careers about friction and that helps prime people to identify ways to solve it as they gain experience. One tactic we use quite a bit that I don’t see you mention but that I think would work really well for software teams as well is using “hot washes”. Basically, immediately after an operation, whether in training or real-life, sitting down with all the key players to go over what went well, what went poorly, and whether or not there should be updates to Standard Operating Procedures for better future performance. Doing these regularly also helps take the “ego sting” out of it, for lack of a better phrase, where it can be hard to get called out about a mistake you made in a public setting with peers but like anything else it gets easier with repetition. Regular hot washes also build shared understanding between teammates which helps communication and reduces friction. I asked how “hot washes” compare to postmortems. The response: The main difference I would say (and maybe this is common in agile circles and I just don’t know about it) is that typically there will be a “scribe” that will record the points in a specific format (typically Topic-Discussion-Recommendation) and then these points are saved for posterity and stored in libraries at the unit, or frequently at the service level. This lets you reference mistakes you previously made, or even reference mistakes made years ago by units facing similar problems that you are about to face. The Marine Corps Center for Lessons Learned maintains the service-level library of these. One of the challenges that I think is a bit more unique to us relative to a large software firm is that we have higher turnover, with people rotating out of a unit typically after around two years. So a persistent store of knowledge is hard to maintain in the heads of staff, it needs to get written down to be effective.

Hillel Wayne 1 year ago

Software Friction

In his book On War, Clausewitz defines friction as the difference between military theory and reality:

Thus, then, in strategy everything is very simple, but not on that account very easy. Everything is very simple in war, but the simplest thing is difficult. These difficulties accumulate and produce a friction, which no man can imagine exactly who has not seen war. As an instance of [friction], take the weather. Here, the fog prevents the enemy from being discovered in time, a battery from firing at the right moment, a report from reaching the general; there, the rain prevents a battalion from arriving, another from reaching in right time, because, instead of three, it had to march perhaps eight hours; the cavalry from charging effectively because it is stuck fast in heavy ground.

Ever since reading this, I've been seeing "friction" everywhere in software development:

- A vendor's API doesn't work quite as you thought it did, or it did and then they changed it.
- Bugs. Security alerts. A dependency upgrade breaks something.
- Someone gets sick. Someone's kid gets sick. Someone leaves the company. Someone leaves for Burning Man.
- The requirements are unclear, or a client changes what they want during development. A client changes what they want after development.
- A laptop breaks or gets stolen. Slack goes down for the day. Tooling breaks. Word changes every font to wingdings. (This is a real thing)

This list is non-exhaustive and it's not possible to catalogue all possible sources of friction. Friction matters more over large time horizons and large scopes, simply because more things can go wrong. Friction compounds with itself: two setbacks are more than twice as bad as one setback. This is because most systems are at least somewhat resilient and can adjust themselves around one problem, but that makes the next issue harder to deal with. (This is a factor in the controversial idea of "don't deploy on Fridays". The friction caused by a mistake during deployment, or of needing to do a rollback, would be made much worse by the friction of people going offline for the weekends. The controversy is between people saying "don't do this" and people advocating for systemic changes to the process. Either way the goal is to make sure friction doesn't cause problems; it's a debate over how exactly to do this.) Addressing friction can also create other sources of friction, like if you upgrade a dependency to fix a security alert but the new version is subtly backwards incompatible. And then if you're trying to fix this with a teammate who lives in a different timezone…

Friction is inevitable and impossible to fully remove. I don't think it's possible to even fully anticipate. But there are things that can be done to reduce it, and plans can be made more resilient to it. I don't have insight into how military planners reduce friction. This is stuff I've seen in software:

One interesting book on this is the Naval War College Fundamentals of War Gaming. In it they argue that there are two purposes to wargaming: gathering information on how situations can develop, and giving commanders (some) experience in a safe environment. If a trainee learns that "weather can disrupt your plan" in a wargame, they don't have to learn it with real human lives. Similarly, I'd rather practice how to retrieve database backups when I'm not desperately trying to restore a dropped table. To my understanding, both security and operations teams use gaming for this reason. (At the same time, people have to devote time to running and participating in games, which is inefficient in the same way adding redundancy is.) Ways of formalizing tacit knowledge of dealing with particular problems.

Is it useful to subcategorize sources of friction? Does calling a tooling problem "technical" as opposed to "social" friction do anything useful to us? How do other fields handle friction? I asked some people in the construction industry about friction and they recognized the idea but didn't have a word for it. What about event planners, nurses, military officers? How do we find the right balance between "doing X reduces the effect of friction" and "not doing X is more efficient right now"? Is friction important to individuals? Do I benefit from thinking about friction on a project, even if nobody else on my team does?

Thanks to Jimmy Koppel for feedback. If you liked this post, come join my newsletter! I write new essays there every week. I train companies in formal methods, making software development faster, cheaper, and safer. Learn more here. I've collected some of the comments I received on this post here.

Hillel Wayne 1 year ago

Don't let Alloy facts make your specs a fiction

I've recently done a lot of work in Alloy and it's got me thinking about a common specification pitfall. Everything in the main post applies to all formal specifications; everything in the "Alloy tip" asides is for experienced Alloy users.

Consider a simple model of a dependency tree. We have a set of top-level dependencies for our program, which have their own dependencies, etc. We can model this in Alloy as a set of packages with a "depends on" relation. (Alloy tip: I'm going to use a slightly different model for the next example, because it gives visualizations with readable atom names. Alloy has built-in enums but they don't play nice with the rest of the language— you can't extend them or give them fields.)

If we look through some of the generated examples, we see something odd: a package can depend on itself! These kinds of nonsensical situations arise often when we're specifying, because we have an intent of what the system should be but don't explicitly encode it. When this happens, we need to add additional constraints to prevent it. For this reason, Alloy has a special "fact" keyword. 1 (Alloy tip: you can write the same fact purely relationally. In general, the model checker evaluates purely-relational expressions much faster than quantified expressions. This can make a big difference in large models!)

Alloy will not generate any models that violate a fact, nor will it look for invariant violations in them. It's a "fact of reality" and doesn't need to be explored at all. "No self-deps" is a great example of a fact. It's also an example of a terrible fact. Beginners often make this mistake where they use facts to model the system, which quickly leads to problems. Consider the real system for a second, not the spec. Where do the package manager dependencies come from? Usually a plain text manifest file. What if someone manually puts a self-dependency in that file? Presumably, you want the package manager to detect the self-dep and reject the input as an error. How do you know the error-handling works? By having the checker verify that it accepts valid manifests and rejects ones with self-loops. Except it can't test the rejection because we told it not to generate any self-dependencies. Our fact made the self-deps unrepresentable.

Normally in programming languages, "making illegal states unrepresentable" (MISU) is a good thing (1 2 3 4). But specification covers both the software you are writing and the environment the software is running in, the machine and the world. If you cannot represent the illegal state, you cannot represent the world creating an illegal state that your software needs to handle. (Alloy tip: there is a technique to make invalid states representable in the world but not in the machine: refinement. The link goes to a TLA+ explanation but the same principle works in Alloy too: write an abstract spec without MISU, write an implementation spec with it, then show that the implementation refines the abstract spec. But you'll do this with signatures and predicates, not with facts.)

Instead of facts, you want predicates. Then you can test for the predicate being true or check that it's necessary to get other properties. Make the constraint explicit instead of implicit. Predicates have the additional benefit of being "locally scoped": if you have three facts and want to check a model with only two of them, you have to comment the third fact out. So where should we use facts? When does it make sense to universally enforce constraints, when doing so could potentially weaken our model?
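Before answering, here's a minimal sketch of the fact-versus-predicate contrast; the sig and all names are mine, not the post's original model:

```alloy
sig Package {
  depends: set Package
}

-- As a fact, the analyzer would never even generate a self-dependency,
-- so we could never check how our error handling deals with one:
-- fact NoSelfDeps { no p: Package | p in p.depends }

-- As a predicate, the constraint is explicit, locally scoped, and testable:
pred no_self_deps {
  no p: Package | p in p.depends
}

-- We can look at both the well-formed worlds and the error states:
run { no_self_deps } for 4
run { not no_self_deps } for 4
```

With the predicate form, the analyzer can still generate the self-loop worlds that our rejection logic has to handle.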
First, facts are useful for narrowing the scope of a problem. "No self-deps" is a perfectly reasonable fact if we're only specifying the package installer and something else is responsible for validating the manifests. Writing the constraint as a fact makes it clear to the reader that we're not supposed to validate the manifest. This means we don't make any guarantees if the assumption is false.

Second, facts rule out fundamentally uninteresting cases. Say I'm modeling linked lists: the model generates regular lists and lists with cycles, which are interesting to me. I don't want to constrain away either case. But it also generates models with two disjoint lists. If I only care about single linked lists, I can eliminate the extra lists with a fact. (Alloy tip: this also rules out "two linked lists that merge into one". If that's something you want to keep, use the graph module instead.) Similarly, you can use facts to eliminate extraneous detail. If I'm modeling users and groups, I don't want any empty groups. I'd add a fact saying every group has at least one user.

Third, you can use constraints to optimize a slow model. This is usually through symmetry breaking.

Finally, you can use facts to define necessary relationships that Alloy can't express natively. In the project I worked on, we had Red and Blue nodes in our graph. Red nodes had at least one edge to another node, Blue nodes had at most one. We first wrote this as separate fields on the two signatures, but then we couldn't write generic predicates on nodes that used the field, because Alloy treated it as a type error. Instead we wrote it with a fact on a shared field. (Alloy tip: okay, one more (rather niche) use case. Say you have a temporal model and a canonical predicate for system behavior. Then a lot of your assertions repeat the same "behavior implies property" shape. You can clean this up a lot by exporting all of the properties and writing them directly against the canonical predicate.)

Constraints are dangerous because you need error states in order to check that your program avoids error states. If you're interested in learning more about Alloy, there's a good book here and I maintain some reference documentation. I'm also working on a new Alloy workshop. I ran an alpha test last month and plan to run a beta test later this summer. Sign up for my newsletter to stay updated! Thanks to Jay Parlar and Lorin Hochstein for feedback. If you liked this post, come join my newsletter! I write new essays there every week. I train companies in formal methods, making software development faster, cheaper, and safer. Learn more here.

In other specification languages this is usually either a runtime constraint (like in TLA+) or a direct modification to the system spec itself. [return]

Hillel Wayne 1 year ago

The Tale of Daniel

It's April Cools! It's like April Fools, except instead of cringe comedy you make genuine content that's different from what you usually do. For example, last year I talked about the strangest markets on the internet. This year I went down a different rabbit hole.

Daniel has been a top 5 baby name twice since 2000. There are over 1.5 million Daniels in the United States (and almost 100,000 last-name Danielses). There are who knows how many Dans, Dannies, and Daniellas out there. "Daniel" is, by any way you slice it, a very modern name. It may also be the oldest attested name in common use.

Last year was Chicago's last Newberry Library book sale. Part of my haul was several collections of ancient literature. I'm interested in ancient history, which shouldn't be a surprise to anybody who reads this blog. And I love how their literature reflects the modern human experience. Take this lament carved in a pyramid from the Egyptian Middle Kingdom:

I have heard the words of Imohotep and Hardedef \ Whose sayings are recited whole. \ What of their places? Their walls have crumbled, \ Their places are gone, \ As though they had never been! (pg 196)

This anonymous poet, writing 4500 years ago, shares our fears about losing our past and of our memories being forgotten. Beautiful. But I found something far more relevant in the Mesopotamian literature.

Straightway Daniel the Raphaman … gives oblations to the gods to eat (pg 118)

Daniel the Raphaman

The story comes from the Tale of Aqhat, pulled out of the ruins of a forgotten city, dated to approximately 1350 BCE. That's a hundred years older than the oldest known Chinese writing, one thousand years older than the book of Daniel, and thirty-four hundred years before today. I needed to know if this was the same name, transmitted across cultures, stretching 3400 years of unbroken "Daniels" to our modern day. Is this where "Daniel" comes from, or is it just a strange coincidence?

A historian of the modern world has to learn how to sip from a firehose of evidence, while the historian of the ancient world must learn how to find water in the desert. — Bret Devereaux

I've done some historical work before, but all of it was in software, a "history of the modern world". I decided to try anyway. This is what I've put together from reading books and trawling JSTOR. 1 This is the tale of Daniel.

Obviously, finding an unbroken lineage of specific Daniels across three millennia is impossible. Instead, I want to learn if the name was in the "cultural consciousness". If so, we can be modestly confident that real people are getting named Daniel, like how real people are being named Khaleesi today. So let's start with where the name was first found.

Ugarit was a port city on the shores of Ras Shamra, unknown until 1928. Scholars believe that it was a city of roughly 8,000 people and the capital of a kingdom of 20,000 (ref). While never a great power, Ugarit was a fairly "literate" society and had extensive trade relations with all of the regional powers. Along with a trove of archaeological material (pots and doodads), excavators also found large collections of clay tablets. Like with many Bronze Age civilizations, most of these recovered documents are either religious, diplomatic, or administrative. That's because clay tablets are expensive and reserved only for important matters. Most of these were dated to very approximately 1500-1200 BCE. But also in that collection, we found three pieces of literature.
The first, the Baal Cycle, tells of how Baal Hadad became one of the great gods of the Ugaritic pantheon. The second, the Legend of Keret, probably matters a lot to another person chasing a completely different white whale, but doesn't matter to me. The final one, the one that started this obsession of mine, is the Tale of Aqhat. I found this picture in the InscriptiFact collection. Sadly no digital transcription! It tells the story of the great hero Aqhat, who was loved and murdered by the goddess Anath. But I don't actually care about any of that; I'm only after the Daniel. I'm pretty sure this is Daniel's name. I have a few friends who can read cuneiform, so I asked if any understood this text. I found out 1) that's like asking a person who "can read ASCII" if they can understand Spanish, and 2) this is not actually cuneiform. It's just an alphabet that looks like cuneiform. Aqhat's father is 𐎄𐎐𐎛𐎍. Left to right: D, N, 'I, L. "Judged by El", El being the chief God in the Ugaritic pantheon.

The name could be older than this text, since most early written myths were first oral traditions. This is just the first time the name "Daniel" is attested. The name could also be a lot younger, at least the modern name. The old name could just be an unusual coincidence, like how people in both England and China have the last name "Lee". 2 For "Daniel" to be that old, it's not enough for the Ugarites to have had the name, it has to come from them, in a chain of Daniels reaching into the present. And here we run into a problem.

We know roughly how old the tablets are because we know roughly when the city collapsed. By cross-referencing names of Egyptian rulers that appear in Ugarit tablets with the much better preserved Egyptian corpus, historians estimate that the city collapsed around 1190 BCE. This corresponds with the "Bronze Age Collapse", a cataclysmic era where dozens of cities and kingdoms in the region were turned to ruins. Scholars still don't have a clear picture why. Regardless of the why, the Ugarites disappeared from history. The name Daniel would have to be carried by a new generation. But this is also where the chain breaks.

The Ras Shamra find was a one-in-a-million miracle, an enormous corpus of about 40,000 words, roughly as much as one Harlequin romance novel. After that? Well, here's the Hadad Statue: If you look closely, you can see the bottom is covered in faint writing. The language is "Samalian", one of the many languages in the region. The engraving is roughly 400 words long and represents over half of all Samalian. This essay is four times longer than the Samalian corpus.

So what happened? It's easy to imagine the Bronze Age Collapse turning the broader region into a Stone Age wasteland, but the archaeological evidence suggests the opposite. Rather, it points to a rapid economic recovery of the region, with new kingdoms soon establishing a large number of seafaring trade routes. Eventually they'd come in contact with the Greeks, who'd call all of the many distinct groups "Phoenicians". But we know so little of what the "Phoenicians" (in all their languages) wrote because of how they wrote it. Instead of Ugarit's clay tablets, the Phoenicians used papyrus. Papyrus is lighter and cheaper, but also rots after a few decades. The fragments of text we do have— with one big exception— are engravings. 3 Inscriptions on tombs, statues, weapons, etc.
This is a major problem for historians, but an even bigger problem for me, because it makes the question of Daniel impossible to answer. Nonetheless, we can look for clues that the same stories persisted, even if we don't have concrete evidence. Let's start with the first two lines of the Hadad statue:

𐤀𐤍𐤊 𐤐𐤍𐤌𐤅 · 𐤁𐤓 · 𐤒𐤓𐤋 · 𐤌𐤋𐤊 · 𐤉𐤀𐤃𐤉 · 𐤆𐤉 · 𐤄𐤒𐤌𐤕 · 𐤍𐤑𐤁 · 𐤆𐤍 · 𐤋𐤄𐤃𐤃 · 𐤁𐤏𐤋𐤌𐤉‎· 𐤒𐤌𐤅 · 𐤏𐤌𐤉̇ ·· 𐤀̇𐤋𐤄𐤅 ·· 𐤄𐤃𐤃 · 𐤅𐤀𐤋 · 𐤅𐤓𐤔𐤐 · 𐤅𐤓𐤊𐤁𐤀𐤋 · 𐤅𐤔𐤌𐤔 · 𐤅𐤍𐤕𐤍 · 𐤁𐤉𐤃𐤉

This is the "Phoenician alphabet"—the same one that inspired the Greek alphabet— which many distinct cultures in the region used for their own languages. Unlike Ugaritic, it's read right-to-left. Scholars don't think this alphabet descended from Ugaritic, but the two writing systems are closely related: the same sounds appear in roughly the same places in the alphabet. 4 Here's the translation (from wiki):

I am Panamuwa, son of Qarli, king of Y'DY, who have erected this statue for Hadad in my eternal abode (burial chamber). The gods Hadad and El and Rašap and Rākib-El and Šamaš supported me.

The same gods, El and Baal Hadad, are important gods in the Ugarit corpus! And this is coming four hundred years after the fall of Ugarit. It suggests that Baal Hadad and El weren't gods local to the Ugarites, but regional gods, part of a shared culture. The Tale of Aqhat could have been preserved as a literary tradition, just on sources we don't have anymore. And with Aqhat comes his dad(niel).

To get out of the Daniel dark ages, we have to look at a different inscription found 600 miles away. Like the Hadad statue, the Mesha Stele is one of our only sources on a long-dead language. In this case, Moabite. It's a distinct language from Samalian but it uses the same Phoenician alphabet. Here's a sample (also from wiki):

I have built its gates and I have built its towers. I have built the palace of the king, and I made the prisons for the criminals within the wall. And there were no wells in the interior of the wall in Karchah. And I said to all the people, 'Make you every man a well in his house.' And I dug the ditch for Karchah with the chosen men of Israel.

Yes, Israel. The Greeks called this region Phoenicia, but the locals always called it Canaan. The same "Canaan" from the Bible. The Israelites were a Canaanite population with Canaanite religions: Israel literally means "one who fought El". Slowly, over years and centuries, they developed their own ethnic identity. At some point, there were two Israelite kingdoms: the kingdom of Israel/Samaria in the north and the kingdom of Judah (where we get "Jew") in the south. Israel was conquered in 720 BCE while the kingdom of Judah limped along, in one form or another, until 130 CE. 5 And it's through the people of Judah that we have the best corpus of that region's writing.

The Tanakh 6, or "Old Testament", is a 300,000 word corpus of ancient Hebrew, chronicling the stories, religious practices, and history (mythic and attested) of the Jews. 7 It's the reason we can decipher the scant inscriptions of other Canaanite languages like Moabite, Samalian, and even Ugaritic itself! And it's where the tale of Daniel continues. Quick recap on Daniel so far: we know the Ugarites wrote about a son of Daniel, the Canaanites (the Greeks' "Phoenicians") shared a culture with the Ugarites, and the Israelites shared a culture with the Canaanites. In 600 BCE, the Neo-Babylonian empire conquered Judah, which led to the sixty-year "Babylonian exile" of Jews from Canaan.
It's allegedly during this time that we get the collection of prophecies known as the "Book of Ezekiel". 8 Mostly it's about how the Jews lost the favor of God and will one day earn it back, and with it redemption. Far, far more importantly, it marks the triumphant return of Daniel.

"Son of man, if a land sins against Me by trespassing grievously, I shall stretch forth My hand upon it and break its staff of bread, and I shall send famine upon it and cut off from it both man and beast. Now should these three men be in its midst- Noah, Daniel, and Job-they would save themselves with their righteousness, says the Lord God. 9 — Ezekiel 14:13-14

Daniel is written (דנאל). As a distant descendant of Phoenician, Hebrew reads right-to-left: Daled, Nun, Aleph, Lamed. "God judged me." Same meaning as the Ugaritic name, same letters as 𐎄𐎐𐎛𐎍. I'm amazed that the same meaning can be preserved across so many centuries. But I also want to be as thorough as possible. Two names, eight hundred years apart, with the same meaning, are still not enough for me. I still worry that it could just be an enormous coincidence that two cultures in the same region, speaking sister languages, came up with the same name-meaning. Ezekiel provides the missing link.

In the Tanakh, both Noah and Job are treated as pious men, but critically, neither are Israelites. 10 Making Daniel part of the trio implies he was also a righteous non-Jew. Ezekiel would expect people to know who "Daniel" was if he used that name. This would make sense if Daniel was, say, a righteous figure in the broader Canaanite folklore. At least the Tale of Aqhat implies he was a righteous and just man who was worthy of El's blessing. He prayed for a son and his wish was granted. He'd be a distinct character, a known righteous non-Israelite, which Ezekiel used as a point of reference. A tenuous chain, I know, but enough to persuade me at least. Enough to persuade other scholars too. It may be the best we'll ever do.

We can finally use the Bible to establish that Daniel wasn't just a folkloric name, but a name people actually used. From the Book of Ezra:

And these are the heads of their father's houses and the lineage of those who ascended with me in the kingdom of King Artaxerxes from Babylon. Of the sons of Phinehas, Gershom: of the sons of Ithamar, Daniel; of the sons of David, Hattush. — Ezra 8:1-2

And these were the sons of David who were born to him in Hebron: the firstborn, Amnon, to Ahinoam the Jezreelitess; the second, Daniel, to Abigail the Carmelitess. — Chronicles 3:1

If they're using Daniel as the name of a minor son of a minor figure, it's probably a real name. But now we have a problem. The Ezekiel Daniel has four letters. But all of the later Daniels have five: Daled, Nun, Yud, Aleph, Lamed (דניאל). It's still possible to read the four-letter version as "Daniel" but it's more likely to be "Danel" or "Danil". And Ezekiel's Daniel, being closest to the Ugaritic DN'IL, suggests that DN'IL was also "Danel". That's how most modern scholars now transliterate the name. So yes, I may have been chasing a false Daniel this whole time. Theophanu is not Tiffany, after all. Maybe "Daniel" is at best attested from 600 BCE, putting it in contention with things like Deborah and Zhou. There is a tradition of pronouncing the Ezekiel DN'IL as "Daniel", so it could be the actual pronunciation and the Yud was added later. I don't know when the tradition started, though.
At the very least, this strengthens the connection between Ezekiel's דנאל and the Ugarites' 𐎄𐎐𐎛𐎍: Ezekiel was working off a different name than the rest of the Tanakh authors. Then again, maybe this is all just quibbling over specifics. The Latin alphabet wasn't meant for Canaanite names, and the Hebrew alphabet wasn't meant for Ugaritic ones. The Biblical שרה could be Romanized as "Sarah" or "Sara", חנכה could be Hannukah, Channuka, or Chanukah. Maybe it doesn't matter whether 𐎄𐎐𐎛𐎍 is דניאל or Danel or דנאל or Daniel. It's still the same name. Regardless, we're following a chain, and there's still one more link.

There's one last question here: lots of people use Tanakh names, like Sarah and Benjamin and David. But many Tanakh names, like Tamar and Yehuda and Hillel, are only used by Jews. Why is Daniel in the former group and not the latter? I'd speculate that the difference is how likely it is for a Christian reader to encounter the name. "Adam" appears in the first chapters of Genesis, while Shamgar is a minor figure in the book of Judges. Christians would most likely encounter names that appear either really early, have entertaining stories, or are directly relevant to the New Testament. And here's where we finally get to the last link between then and now: the Book of Daniel.

Like the Book of Ezekiel, this recounts the story of someone living during the Babylonian exile. Unlike Ezekiel, it was pretty clearly written four hundred years after the exile. Daniel contains an awful lot of prophecies that seem very relevant to the Greek occupation (the one that leads to the story of Channukah). The most famous of them is the prophecy of 70 weeks:

And you shall know and understand that from the emergence of the word to restore and to rebuild Jerusalem until the anointed king [shall be] seven weeks, and [for] sixty-two weeks it will return and be built street and moat, but in troubled times. — Daniel 9:25

Later writers took this to be a prophecy about Jesus. I can't claim to understand how it's supposed to be about Jesus, but it is. So the book of Daniel is really important in Christian thought. This kept the name "Daniel" in the Christian mainstream for the next 2000 years, carrying it across the world to Europe and India and East Asia and South America, all the way to the present day. The end of a 3400-year chain.

So that's the story of the name of Daniel, from an ancient clay tablet to the millions of Daniels alive now. I want to reiterate that I'm not a trained historian and almost certainly got a lot of details wrong. Also, this isn't and wouldn't pass peer review. Don't cite this for a class paper. Happy April Cools! Thanks to Predrag Gruevski for feedback. If you enjoyed this, check out the April Cools website for all the other pieces this year!

I accessed JSTOR as a UChicago alumnus benefit, but I also found out that anyone with a Chicago public library card has access too! Hundreds of public libraries do this! [return] Note though that the Chinese "Lee" is just one possible Romanization of 李. "Li" is another. [return] We have a much larger Egyptian corpus because papyrus lasts longer in Egypt's hot and arid climate. We have large Greek and Roman corpuses because monks tirelessly copied old texts into new books for centuries. Volcanoes helped. [return] We know the order because we've recovered Ugarit abecedaries (inscriptions of all an alphabet's letters in order). [return] At least, that's when Jews were permanently expelled from the region.
Growing up I was taught that the important event was the burning of the Second Temple, which happened in 70 CE. [return] I originally wrote "the Torah" here, but as one commenter pointed out, the Torah is just the first five books of Moses. The Tanakh also includes the Nevi'im and Ketuvim, both of which I rely on later in the essay. I know all my old Rebbes are very disappointed in me. [return] I grew up an Orthodox Jew, and it's extremely weird to study the actual history of your religion. Like vivisecting holy books. [return] From my research it seems that it's generally accepted that most of Ezekiel was written contemporaneously with the Babylonian exile, though it may have been smoothed out later. [return] For Old Testament verses I'm using the Chabad translations. My Hebrew never got past a first grade level. [return] Noah is obvious: his story takes place before Abraham's. Job is tougher, as there's nothing in the story that makes it clear he's not Jewish. I'm basing this off the claims of many different Rabbis and modern scholars. [return]

Hillel Wayne 1 year ago

Comment Section: The Hunt For The Missing Data Type

I got a lot of responses to The Hunt For the Missing Data Type . I’ve included some of the most interesting ones below. Everything in quotes is verbatim. My name is Michel Pelletier, I’m one of the contributors to the GraphBLAS API standard and python bindings. Congrats on your blog going up to the front page on HN. I replied with some information that I think you might find useful to your question, but it’s pretty buried under a lot of the discussion there. I noticed that you invite people to email you, so I decided to send you the information directly. The graph data type you’re missing is the one you already mentioned, a matrix. You mention adjacency matrices, but in the context of your blog post, they’re considered only a storage format, not the graph itself. But graphs and matrices are conceptually and algebraically isomorphic. All graphs, and thus all composite data structures, are mathematically matrices, and all matrices are graphs. Hypergraphs and Multigraphs are represented by “Incidence Matrices” which are node-to-edge to edge-to-node adjacencies using two rectangular matrices. A really excellent introductory paper by a large group of researchers, spearheaded by Dr. Jeremy Kepner, Director of the Supercomputing Center and MITs Lincoln Laboratory is: https://arxiv.org/pdf/1606.05790.pdf The issue with computers and thinking of graphs as matrices is that most graphs are sparse, and most matrix libraries (like numpy) are dense. This makes using adjacency matrices very expensive, since most of the “space” in a dense matrix ends up being zero. This is extremely inefficient and defeats cache hierarchies typical of von Neumann architectures. These two worlds have not quite converged yet. There is however a lot of research and development around efficient sparse matrix computation, and thus, sparse graph analysis. While these may seem different, they are actually one in the same: matrix multiplication is a breadth first step across a graph. This is part of the isomorphic nature between them. A lot of ML and AI research involves both sparse matrices and graphs, and this research is very much unified in improving the performance of both paradigms. A big question I get about this is “why”. Why write a linear algebraic formula instead of a function that traverses the nodes and edges? And one of the most important answers is parallelization over huge graphs. When graphs get really, really big, billions or trillions of edges, you need to divide the work of your algorithm efficiently. How are you going to do that? Fork every edge? Use a thread pool? How to schedule the work and partition the graph efficiently? Now do that on CUDA… the problem becomes almost impossible for even the smartest programmers to tackle. With the GraphBLAS, a graph operation is a linear algebraic formula, decomposed into a series of matrix multiplications, you just say something like Ax = b, and the underlying library figures out how to do the work most efficiently on a specific target architecture. Run it on a chromebook or a supercomputer, the code doesn’t change, just the capacity of the machine for bigger graphs. You can think of the GraphBLAS as a language that can be “JIT” compiled based not only on the underlying architecture but also the shape and type of problem you feed it. Since LA is the universal language of math, science and engineering, this technique has natural application to a lot of existing work. So I just wanted to send that your way, good luck on your further exploration of the subject. 
I’m happy to chat with you anytime if you’d like to find out more, as a member of the C API Committee, evangelizing is part of my job and I enjoy introducing the subject. In earlier drafts of the essay I talked about TinkerPop, Apache’s graph computing framework, and Gremlin, the query language. I removed it from the final version, and several people noticed it was missing. Here’s one response. You mentioned Cypher, but didn’t talk about Gremlin, which is an expressive graph query language for Neo4j/TinkerPop: https://tinkerpop.apache.org/ It was used to power the Joern SAST tool that was acquired by ShiftLeft, and I think is used a lot in the finance world. I last used it to make a graph of all of the software packages in Maven and their interdependencies. It’s got nice bindings into programming languages - I used Python Gremlin so I could drive it from a familiar scripting language, since the default REPL/script is Groovy that I’m not so handy with. You can interchange between querying the graph with Gremlin and doing “normal” imperative scripting in Python, then do further querying from the results. It felt quite natural. I don’t know about the state of whole graph algorithms for it - I was always interested in traversal-based queries vs whole graph statistics like centrality. I ran across your article about missing software libraries for graphs because it was linked from the CodeProject newsletter. It was a wonderful article. I’m not of the same caliber as the people you consulted when you were writing the article, but nevertheless I thought I would offer you one more example of the difficulties in creating generic software libraries for graphs. Namely, I’m a part of a very informal group of people who have been using computers for years to investigate the mathematics of Rubik’s Cube. The math underlying Rubik’s Cube is called group theory and one of the things you can do with groups is to map them out with graphs called Cayley graphs. That know-it-all called Google describes it as follows: “Cayley graphs are frequently used to render the abstract structure of a group easily visible by way of representing this structure in graph form. Properties of a group G, such as its size or number of generators, become much easier to examine when G is rendered as a Cayley graph.” In particular, each possible Rubik’s Cube position can be represented as a node in a Cayley graph and adjacent nodes are those which can be reached in exactly one move. You mentioned the 15 puzzle your article. It turns out that one of the Rubik’s Cube investigators has written a complete and very fast solver for the 15 puzzle. And it also turns out that the size of the 15 puzzle is a drop in the bucket as compared to the size of the Cayley graph for Rubik’s Cube. In any case, I have been writing code since 1985 to investigate Rubik’s Cube. This is not just to “solve” Rubik’s Cube, which is actually quite easy. Rather, it’s to determine for each of the nearly 10^20 positions the least number of moves that are required to solve position. The problem remains unsolved simply because it is so large. It has been determined that any position can be solved in either 20 moves or in 26 moves, depending on what you count as one move. But that’s not the same thing as finding the solution for each possible position that requires the least possible number of moves. 
In any case, I know from much experience that I have had to develop my own data structures that are quite specific to Rubik’s Cube in order to address the problem in any practical manner. The key issue (and here I’m quoting from your article) is that Performance Is Too Important. No libraries I could find did what I needed, so I rolled my own. Thanks for a wonderful article, I asked for more information on what “20 or 26” moves meant and whether he could share more about the data structures he uses. His reply: As you guessed, whether you count a single half rotation as one move or two is the primary example of what counts as one move. If you only count quarter rotations as one move, then that’s called the quarter turn metric. Any position can be solved in 26 moves in this metric. If you count either quarter rotations or half rotations as one move, then that’s called the face turn metric. Any position can be solved in 20 moves in this metric. But there are other ways to choose what counts as one move. A Rubik’s Cube has three layers in any direction. Typically you only count moves of the outer layers such as the top and bottom but not the layer in between the top and bottom or the outer layers such as the right and left but not the layer between the right and left. But it is sometimes interesting to count moves of those middle layers as one move. Another variation is the stuck axle problem where you pretend one of the axles is stuck. For example, you don’t move the top face and only move the layers on the other five faces of the cube. With this variation, you can still reach all the possible positions but the Cayley graph does not have the same symmetry as it would have if none of the axles were stuck. Also, many more moves can be required to solve a cube with a stuck axle than the standard 20 moves or 26 moves. I don’t think there is any standard data structure for Rubik’s Cube. Each person working on the cube has their own, except that there is an obvious sense that any data structure that faithfully represents the cube has to be somewhat isomorphic to any other data structure that faithfully represents the cube. A big distinction is whether the data structure consists only of positions or whether it consists both of positions and moves. For example, two consecutive clockwise quarter turns of the front face results in the same position as two consecutive counter-clockwise quarter turns of the front face. In a Cayley graph, these move sequences become a loop if continued. The loop is a four move cycle (a 4-cycle). So do you store moves and positions, or do you just store positions? I really don’t know how anybody else’s data structures handles these issues. For myself, I don’t explicitly store the entire Cayley graph. Instead, I store positions and with each position I store single bit for each possible move from that position that indicates whether the move takes you further from the solved position or closer to the solved position. There are 12 such bits for each position in the quarter turn metric and 18 such bits for each position in the face turn metric. Those bits implicitly define a Cayley graph, but I do not store the graph explicitly. Other people working on the problem talk about using canonical sequences of moves, for example you can make two successive clockwise moves of the front face but not two successive counter-clockwise moves of the front face. I do something similar but not exactly the same using my bits. 
The other issue is that I needed a tree structure and a tree can be thought of as a special case of a graph. Which is to say, a tree is just a graph where one node is declared to be a root node and where there are no loops in the graph. I had to roll my own tree structure. The tree structure I needed arises as follows. There are 54 color tabs on a standard Rubik’s Cube. In the standard mathematical model, the center color tab on each face does not move which leaves 48 color tabs which move. Of the 48 color tabs, 24 are on the corners of 3x3 squares and 24 are on the edges of 3x3 squares. The corner color tabs and edge color tabs are disjoint, so I represent a cube position by labeling each corner color tab with a letter from A to X and by labeling each edge color tab with a letter from A to X. Each position is then an ordered pair of words where each word consists of 24 letters and where each letter appears exactly one time in each word. So why do I need a tree? Well, I need to be able to find these words very quickly. It’s like finding words very quickly in a spell check dictionary. Nominally, each node in the tree needs 24 pointers to other nodes in the tree. Except that unlike real words in a real spell check dictionary, each letter can appear only one time in each word. So as I get towards the leaf nodes of the tree, each node is going to consist mostly of null pointers, a huge waste of very valuable memory. So I had to create a tree structure to accommodate the storage of the positions for fast retrieval. No standard library routines were fast enough and conservative enough of memory. So essentially I have two data structures layered together. One of them uses bit switches to define the Cayley graph for the cube, and the other of them uses a spell check dictionary style tree to locate specific positions very quickly. And trees are just special cases of graphs. I don’t know if that answers your questions, but I hope it helps. Hillel — Thank you for sharing your exploration of graph representations with clear explanations as to why there is no clear winner. I’m wondering if the problems you cite aren’t so large as to preclude viable alternatives. I come to this question first as a Smalltalk programmer and then as the creator of Wiki. Both have a graph like structure assembled in bits and persistent in use. The small community around my most recent wiki implementation is highly motivated to add graphs to pages along with paragraphs, images, outline, and tables. We have partially met this with Graphviz using Dot as a markup. But this does not yield to the computation and sharing as we get with files of yaml or csv. We’ve recently adopted the practice of representing graphs (in javascript, for example) as an object with arrays for nodes and relations. The empty graph would be: Nodes and relations are themselves objects with a string for type and an additional object for properties and some managed indices that hook them together. This then serializes conveniently into (loop free) json, a format widespread and adequate for graphs measured in kilobytes. These graphs converts easily into Neo4j objects for which we have considerable experience. More often we avoid maintaining a database beyond wiki itself. Although we have also built a modest Cypher interpreter we have not found that useful. 
We have made the surprising discovery that we are more likely to merge many small graphs into the one that solves the problem at hand rather than build one large Neo4j graph and query out the graph needed to solve the same problem. More recently we have come to separate problems into “aspects” based on cross-cutting concerns in the problem space. We might have tens of these graphs or even hundreds. We browse these as one might browse a wiki where the equivalent of a wiki-link comes from recognizing the same node appearing in yet-to-be-included graphs. This is a “recommending” process, and query is replaced by choosing and un-choosing recommendations, with work-in-progress rendered immediately with Graphviz running in the browser. I started this email with a more complete description of our graph abstraction but stepped back from that as I was unsure you would find our experience interesting. I could also describe applications where this has proven useful, usually involving some collaboration problem within a community. I would love to correspond if you think there is any overlap in interests. Thank you and best regards — Ward His code for this is available here , along with docs and examples. Followup email: Hillel — We are on the hunt for small graphs representing aspects of larger problems. Here is a case where, when asked to read a paper headed to the European Patterns conference, I chose to read it very carefully and map every “relational” sentence in their patterns. Here is a picture where an unexpected overlap between aspects is shown in yellow. This particular graph viewer has had a circuitous life: started as scripts on wiki pages; extracted as a stand-alone web app on top of Croquet for online collaboration; then here as a single-user “solo” application heading back into wiki. It is not yet self-explanatory. But you can bring it up with the “open” link in the last paragraph of the wiki page where I coordinated with a co-reviewer who shares an interest in this sort of exploration. http://ward.dojo.fed.wiki/aspects-of-pattern-relations.html Two more similar projects that show off different “aspect” possibilities: chopping up a year of recent changes, and annotating a search engine’s source code with node-relation comments and extracting them into graph files with GitHub actions. The “missing” graph datatype already exists. It was invented in the ’70s: it’s called Datalog.

Hillel Wayne 1 year ago

The Hunt for the Missing Data Type

A (directed) graph is a set of nodes, connected by arrows ( edges ). The nodes and edges may contain data. Here are some graphs: Graphs are ubiquitous in software engineering: package dependencies form directed graphs, as do module imports; the internet is a graph of links between webpages; model checkers analyze software by exploring the “state space” of all possible configurations, where nodes are states and edges are valid transitions between states; relational databases are graphs where the nodes are records and the edges are foreign keys; and graphs are a generalization of linked lists, binary trees, and hash tables. 1 Graphs are also widespread in business logic. Whitepapers with references form graphs of citations. Transportation networks are graphs of routes. Social networks are graphs of connections. If you work in software development long enough, you will end up encountering graphs somewhere . I see graphs everywhere and use them to analyze all sorts of systems. At the same time, I dread actually using graphs in my code. There is almost no graph support in any mainstream language. None have it as a built-in type, very few have them in the standard library, and many don’t have a robust third-party library in the ecosystem. Most of the time, I have to roll graphs from scratch. There’s a gap between how often software engineers could use graphs and how little our programming ecosystems support them. Where are all the graph types? As I ran into more and more graphs in my work, this question became more and more intriguing to me. So late last year I finally looked for an answer. I put a call out on my newsletter asking for people with relevant expertise— graph algorithm inventors, language committee members, graph library maintainers— to reach out. I expected to interview a dozen people, but in the end I only needed to talk to four: Zayenz , former core developer of the Gecode constraint solver , who has “implemented every graph algorithm there is”; Bradford , author of the Nosey Parker security library and inventor of several new graph algorithms; Nicole , a former graph database engineer; and Kelly , a maintainer on the NetworkX python graph library and compiler developer . After these four people all gave similar answers, I stopped interviewing and started writing. So far I’ve been describing directed graphs. There are also undirected graphs, where edges don’t have a direction. Both directed and undirected graphs can either be simple graphs , where there is a maximum of one edge between two nodes, or multigraphs , where there can be many edges. And then for each of those types we have hypergraphs, where an edge can connect three or more nodes, and ubergraphs, where edges can point to other edges. For each possible variation you have more choices to make: do you assign ids to edges or just to nodes? What data can be stored in a node, and what can be stored in an edge? That’s a lot of decisions for a library to make! But wait, do these distinctions matter at all? A simple graph is just a degenerate multigraph, and an undirected edge can be losslessly transformed into two directed edges. A language could just provide directed hyperubermultigraphs and let users restrict it however they want. There are two problems with this. First of all, it changes the interface, like whether various operations return single values or lists. Second, as I’ll discuss later, graph algorithm performance is a serious consideration and the special cases really matter . Kelly raised the example of maximum weight matching . If you know that your graph is “bipartite”, you can use a particular fast algorithm to find a matching, while for other graphs you need to use a slow, more general algorithm. [It] ties back to the “algorithm dispatch problem.” Given a Problem P, a Graph G, and Algorithms A, B, C to solve P on G… which one do you run? If we don’t know that G is bipartite, and Algorithm C only works on bipartite graphs, how much time can we afford to determine whether or not G is bipartite? — Kelly The perfect graph library would support a lot of different kinds of graphs. But that takes time away from supporting what people want to do with graphs. Graph algorithms are notoriously hard to get right. In this essay , the inventor of Python implemented his own algorithm.
It had to be updated with corrections five times! Every single implementation of pagerank that I compared to was wrong. — Nicole So which algorithms should come with the library? “The amount of things people want to do with graphs is absurd,” Kelly told me. That matches my experience, and the experiences of all my interviewees. It sometimes seems like graphs are too powerful , that all their possibilities are beyond my understanding. “The question is,” Kelly said, “where do you draw the line?” For NetworkX, “the line” is approximately 500 distinct graph algorithms, by themselves making up almost 60,000 lines of code. By comparison, the entire Python standard library, composed of 300 packages, is just under 600,000 lines. 2 With all that, it’s unsurprising that you don’t see graphs in standard libraries. The language maintainers would have to decide which types of graphs to support, what topologies to special-case, and what algorithms to include. It makes sense to push this maintenance work onto third parties. This is already the mainstream trend in language development; even Python, famous for being “batteries included”, is removing 20 batteries . Third parties can make opinionated decisions on how to design graphs and what algorithms to include. But then they’re faced with the next problem: once you have a graph interface, how do you represent it? Let’s imagine we’re supporting only barebones simple directed graphs: nodes have identities, edges do not, neither has any associated data. How do we encode this graph? Here are four possible ways a programming language could internally store it: as an edge list, as an adjacency list, as an adjacency matrix, or as a set of three structs with references to each other. Different graph operations have different performance characteristics on different representations. Take a directed graph with 100 nodes and 200 edges. If we use an adjacency matrix representation, we need a 100×100 matrix containing 200 ones and 9,800 zeros. If we instead use an edge list we need only 200 pairs of nodes. Depending on your PL and level of optimizations that could be a memory difference of 20x or more. Now instead take a graph with 100 nodes and 8,000 edges and try to find whether an edge exists between node 0 and node 93. In the matrix representation, that’s an O(1) lookup. In the edge list representation, that’s an O(|edges|) iteration through all 8,000 edges. 3 Graphs with only a few edges are sparse and graphs with almost all edges are dense . The same program may need to do both operations on both kinds of graph topologies: if you’re constructing a graph from external data, you could start out with a sparse graph and later have a dense one. There’s no “good option” for the internal graph representation.
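To make the tradeoff concrete, here is a toy sketch of the same lookup in both representations (my own illustration in Python, not taken from any library):

```python
n = 100
edge_list = [(0, 93), (93, 5), (5, 0)]  # an edge is just a pair of node ids

# Adjacency matrix: n*n cells, nearly all zero when the graph is sparse.
matrix = [[0] * n for _ in range(n)]
for a, b in edge_list:
    matrix[a][b] = 1

print(matrix[0][93] == 1)    # matrix lookup: one O(1) indexing operation
print((0, 93) in edge_list)  # edge-list lookup: an O(|edges|) scan
```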
And all this trouble is just for the most barebones directed graph! What about implementing node data? Edge data? Different types of nodes and edges? Most third party libraries roughly fall in one of two categories: Offer a single rich datatype that covers all use-cases at the cost of efficiency. NetworkX stores graphs as a dict of dicts of dicts, so that both nodes and edges can have arbitrary data. 4 Offer separate graph types for each representation, and rely on the user to store node and edge data separately from the graph type. An example of the second case would be Petgraph , the most popular graph library for Rust. Petgraph has , , and for different use-cases. Bradford used Petgraph for Nosey Parker , a security tool that scans for secrets across an entire history of a git repo. His benchmarking graph is CPython, which has 250k commits and 1.3M objects but only a few edges per commit node. He went with an adjacency list. Supporting many representations has a serious downside: you have to do a lot more work to add algorithms. If you write a separate version of the algorithm for each graph representation, you’re tripling or quadrupling the maintenance burden. If you instead write a generic abstraction over polymorphic types, then your library is less performant. One programmer I talked to estimated that a hand-rolled graph algorithm can be 20x faster or more than a generic algorithm. And this gets into every interviewee’s major complaint. A “generic” graph implementation often doesn’t cut it. — Bradford This is the big one. Many, many graph algorithms are NP-complete or harder. 5 While NP-complete is often tractable for large problems , graphs can be enormous problems. The choice of representation plays a big role in how fast you can complete it, as do the specifics of your algorithm implementation. Everyone I talked to had stories about this. In Nosey Parker, Bradford needed to reconstruct a snapshot of the filesystem for each commit, which meant traversing the object graph. None of the four provided graph walkers scaled to his use case. Instead he had to design a “semi-novel” graph traversal algorithm on the fly, which reduced the memory footprint by a factor of a thousand. I was able to get working a proof of concept pretty quickly with [petgraph], but then… this is one of those cases where the performance constraints end up meeting reality. — Bradford Zayenz raised a different problem: what if the graph is simply too big to work with? He gave the example of finding a solution to the 15 puzzle . This is done by running an A* search on the state space. A state space with over 20 trillion states . If you generate all the nodes, you’ve lost already. — Zayenz Zayenz oversaw one research project to add graphs to the Gecode constraint solver. They eventually found that a generic graph type simply couldn’t compete with handpicking the representation for the problem. Even graph databases, designed entirely around running complex graph algorithms, struggle with this problem. Nicole, the graph database engineer, told me about some of the challenges with optimizing even basic graph operations. If you’re doing a traversal, you either have to limit your depth or accept you’re going to visit the entire graph. When you do a depth search, like “go out three steps from this and find the path if it exists”, then you’re just committing to visiting quite a bit of data. — Nicole After leaving that job, she worked as a graph query performance consultant. This usually meant migrating off the graph database. She told me about one such project: to speed the graph queries up, she left one computation as-is and rewrote the rest as MapReduce procedures. “Which was a lot harder to understand,” she said, “But would actually finish overnight.” All of this means that if you have graph problems you want to solve, you need a lot of control over the specifics of your data representation and algorithm. You simply cannot afford to leave performance on the table. So, the reasons we don’t have widespread graph support: there are many different kinds of graphs; there are many different representations of each kind of graph; there are many different graph algorithms; graph algorithm performance is very sensitive to graph representation and implementation details; and people run very expensive algorithms on very big graphs. No, really. This explains why languages don’t support graphs in their standard libraries: too many design decisions, too many tradeoffs, and too much maintenance burden. It explains why programmers might avoid third party graph libraries, because they’re either too limited or too slow.
And it explains why programmers might not want to think about things in terms of graphs except in extreme circumstances: it’s just too hard to work with them. Since starting this research, I’ve run into several new graph problems in my job. I still appreciate analyzing systems as graphs and dread implementing them. But now I know why everybody else dreads them, too. Thank you for reading! Thanks to Predrag Gruevski for research help, Lars Hupel , Predrag Gruevski , Dan Luu , and Marianne Bellotti for feedback, and to all of the people who agreed to do interviews. If you liked this post, come join my newsletter ! I write new essays there every week. I train companies in formal methods, making software development faster, cheaper, and safer. Learn more here . Graph querying languages (GQLs) 6 are to graph databases what SQL is to relational databases. There is no widely-used standard, but two of the most popular are SPARQL for querying RDF triples and Neo4j’s cypher . Ironically, GraphQL is not a graph querying language, instead being named for its connection to the Facebook Graph Search . I considered graph databases themselves mostly distinct from graphs in programming languages, but their query languages show how graphs could work in a PL. The main difference between all GQLs and SQL is that the “joins” (relationships) are first-class entities. Imagine a dataset of movies and people, where people act in, direct, or produce movies. In SQL you’d implement each relationship as a many-to-many table, which makes it easy to query “who acted in movie X” but hard to query “who had any role in movie Y, and what was that role”. In SPARQL relationships are just edges, making the same query easy. Cypher has a similar construct. GQLs can also manipulate edges: reverse them, compose them together, take the transitive closure, etc. If we wanted to find all actors with some degree of separation from Kevin Bacon, we could write a short query over the transitive closure. SPARQL cannot give the length of the path nor do computation along the path, like collecting the chain of movies linking two actors. GQLs that support this are significantly more complicated. My main takeaway from looking at GQLs is that there’s a set of useful traversal primitives that a PL with graph support would need to provide. Interestingly, the formal specification language Alloy has all of these primitives for its “relation” datatype. For this reason I find working with a graph representation in Alloy much easier than in a proper programming language. That said, these all work with labeled edges and may not work for other graph representations. Python added a graphlib in 2020. Based on the discussion here , it was because topological sorting is a “fundamental algorithm” and it would be useful for “pure Python implementations of MRO [Method Resolution Order] logic”. Graphlib has no other methods besides , which only takes graphs represented as node dicts. Unusually, the direction of the node dict is reversed : the graph is represented as . As of 2023, nothing in CPython uses graphlib and there are fewer than 900 files referencing it on Github . By comparison, another package added in 2020, zoneinfo, appears in over 6,000 files, and the term appears in 4,000. I’d guess a lot of these are from before 2020, though. Some skimming suggests that all of these custom topological sorts take different graph representations than graphlib, so they wouldn’t be convertible regardless. Graph representation matters.
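For concreteness, here is what using graphlib looks like (a minimal sketch; the task names are made up):

```python
from graphlib import TopologicalSorter

# The node dict maps each node to the nodes that must come *before* it,
# i.e. its predecessors rather than its successors.
graph = {"compile": {"fetch"}, "build": {"compile"}, "test": {"build"}}
print(list(TopologicalSorter(graph).static_order()))
# ['fetch', 'compile', 'build', 'test']
```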
There are two other languages I found with graph types: Erlang and SWI-Prolog . I don’t know either language and cannot tell when they were added; with Erlang, at least, it was before 2008. I reached out to a person on the Erlang core language committee but did not hear back. There are also programming languages where “everything is a graph”, in the same way that everything in bash is a string and everything in lisp is a list. Some examples include GP2 and Grape . Based on some correspondence with people in the field, right now this is still highly academic. Mathematica, MATLAB, Maple, etc all have graph libraries of some form or another. I am not paying the thousands of dollars in licensing needed to learn more. I’ve collected some of the comments I received on this post here . Hash tables are bipartite graphs . This was used to prove performance of cuckoo hashing operations . [return] I derived both computations with cloc 1.96. I ran cloc in (56989) and in (588167). The whole networkX library is ~90,000 lines of code. [return] You can make this more efficient by keeping the edge list sorted and doing a binary search, at the cost of making edge insertions more expensive. [return] NetworkX has functions to convert graphs into other representations but not for working with those representations directly. [return] 14 of the 21 canonical NP-complete problems are graph problems. [return] Not to be confused with the GQL language, a proposed GQL standard that’s still under development. [return]

Hillel Wayne 1 year ago

Planner programming blows my mind

Picat is a research language intended to combine logic programming, imperative programming, and constraint solving . I originally learned it to help with vacation scheduling but soon discovered its planner module, which is one of the most fascinating programming models I’ve ever seen. First, a brief explanation of logic programming (LP). In imperative and functional programming, we take inputs and write algorithms that produce outputs. In LP and constraint solving, we instead provide a set of equations and find assignments that satisfy those relationships. For example: Non-function identifiers that start with lowercase letters are “atoms”, or unique tokens. Identifiers that start with uppercase letters are variables. So is a list of four atoms, while and are variables. So returns true as you’d expect. The interesting thing is . Y wasn’t defined yet! So Picat finds a value for Y that makes the equation true. Y could be any of , , or . Then the line after that makes it impossible for to be , so this prints either or . Picat can even handle expressions like , instantiating Z as a list! Planning pushes this all one step further: instead of finding variable assignments that satisfy equations, we find variable mutations that reach a certain end state. And this opens up some really cool possibilities. To showcase this, we’ll use Picat to solve a pathing problem. We place a marker on the grid, starting at the origin (0, 0), and pick another coordinate as the goal . At each step we can move one step in any cardinal direction, but cannot go off the boundaries of the grid. The program is successful when the marker is at the goal coordinate. As a small example: One solution would be to move to (1, 0) and then to (1, 1). To solve this with planning, we need to provide three things: a starting state , which contains both the origin and goal coordinates; a set of action functions that represent state transitions (in Picat these functions must all be named and take four parameters: a current state, a next state, an action name, and a cost; we’ll see that all below); and a function named that determines if S is a final state. Once we define all of these, we can call the builtin which will assign to the shortest sequence of steps needed to reach a final state. 1 Explanation is the default entry point into a Picat program. Here we’re just setting up the initial state, calling , and printing . is the syntax for a Picat array, which is basically a tuple. Every expression in a Picat body must be followed by a comma except the last clause, which must be followed with a period. This makes moving lines around really annoying . Writing it in that “bullet point” style helps a little. Since takes just one argument, we’ll need to store both the current position and the goal into said argument. Picat has great pattern matching so we can just write it like this: Explanation Without the pattern matching, we’d have to write it like this: If we write a second predicate, the plan succeeds if either returns true. Finally, we need to define the actions which the planner can take. We only need one action here. Explanation is the initial state, is the next state, is the name of the action— in this case, . 1 You can store metadata in the action, which we use to store the new coordinates. Writing instead of makes backtrackable, which I’ll admit I don’t fully understand? I’m pretty sure it means that if this definition of pattern-matches but doesn’t lead to a viable plan, then Picat can try other definitions of action. This’ll matter more for later versions. As with the introductory example up top, we’re using to both find values (on line ) and test values (on lines ). Picat also has a non-assigning predicate, , which just does testing. If I wasn’t trying to showcase Picat I could instead have used that for the testing part, as it cannot assign. is the “cost” of the action.
tries to minimize the total cost. Leaving it at 1 means the cost of a plan is the total number of steps. And that’s it, we’re done with the program. Here’s the output: That’s a little tough to read, so I had Picat output structured data that I could process into a picture. I used a Raku script to visualize it. 2 Here’s what we now get: To show that the planner can route around an “obstacle”, I’ll add a rule that the state cannot be a certain value: Let’s comment that out for now, leaving this as our current version of the code: Next I’ll add multiple goals. In order to succeed, the planner needs to reach every single goal in order. We start with one change to : Goal now represents a “queue” of goals to reach, in order. Then we add a new action which removes a goal from our queue once we’ve reached it. Explanation splits a list into the first element and the rest. Since was defined in the line before, is only true if the first goal on the list is equal to . Then we drop that goal from our new state by declaring the new goal state to just be . (This is where backtracking with becomes important: if we didn’t make the actions backtrackable, Picat would match on the first and never .) Since we’re now destructively removing goals from our list when we reach them, needs to be adjusted: And that’s it. We didn’t even have to update our first action! Going through the goals in order doesn’t always lead to the shortest total path. What if we didn’t care about the order of the goals and just wanted to find the shortest path? Then we only need to change two lines: Now the planner can delete any goal it’s passing over regardless of where it is in the list. So Picat can “choose” which goal it moves to next so as to minimize the overall path length. Final code: Picat supports a lot more variations on planning: The coolest thing to me is that the planning integrates with all the other Picat features. I whipped up a quick demo that combines planning and constraint solving. The partition problem is an NP-complete problem where you partition a list of numbers into two equal sums. This program takes a list of numbers and finds the sublist with the largest possible equal partitioning: 3 This is all so mindblowing to me. It’s almost like a metaconstraint solver, allowing me to express constraints on the valid constraints . I would not recommend using Picat in production. It’s a research language and doesn’t have a lot of affordances, like good documentation or clear error messages. Here’s what you get when there’s no plan that solves the problem: But hey it runs on Windows, which is better than 99% of research languages. Picat seems more useful as a “toolkit” language, one you learn to solve a specific class of computational problems, and where you’re not expecting to maintain or share the code afterwards. But it’s really good in that niche! There’s a handful of problems I struggled to do with regular programming languages and constraint solvers. Picat solves a lot of them quite elegantly. While originally pioneered for robotics and AI, “planning” is most often used for video game AIs, where it’s called “Goal Oriented Action Planning” (GOAP). Usually it’s built as libraries on top of other languages, or implemented as a custom search strategy . You can read more about GOAP here . There is also PDDL , a planning description language that independent planners take as input, in the same way that DIMACS is a description format for SAT .
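For a rough feel of what the planner is searching over, here is the pathing problem from earlier as a plain breadth-first search (a Python sketch of the problem itself, not of how Picat's planner is implemented):

```python
from collections import deque

def plan(width, height, start, goal):
    """Shortest sequence of moves from start to goal on a bounded grid."""
    moves = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        (x, y), path = frontier.popleft()
        if (x, y) == goal:
            return path
        for name, (dx, dy) in moves.items():
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height and (nx, ny) not in seen:
                seen.add((nx, ny))
                frontier.append(((nx, ny), path + [name]))

print(plan(2, 2, (0, 0), (1, 1)))  # ['up', 'right']
```

The difference is that this hardcodes one action and one kind of goal, while in Picat the actions, costs, and final states are all declarative and the search comes for free.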
Thanks to Predrag Gruevski for feedback. I first shared my thoughts on Picat on my newsletter . I write new newsletter posts weekly. After writing this I realize I should have instead used a structure for the action, writing it instead as . I didn’t want to rewrite all of my code though so I’m leaving it like this. [return] caps the maximum cost at — good for failing early. For each , there’s a that finds every possible best plan. restricts the possible actions based on the current partial plan, so we can add restrictions like “you have to move twice before you turn”. If you want just any plan, regardless of how long it is, you can call instead. [return] The commented version of the formatter was originally up on my Patreon . You can now see it here . [return] “Find the most elements you can remove without getting you a valid partition” is a little more convoluted, but you can see it here . [return]

Hillel Wayne 1 year ago

An RNG that runs in your brain

Humans are notoriously bad at coming up with random numbers. I wanted to be able to quickly generate “random enough” numbers. I’m not looking for anything that great, I just want to be able to come up with the random digits in half a minute. Some looking around brought me to an old usenet post by George Marsaglia: Choose a 2-digit number, say 23, your “seed”. Form a new 2-digit number: the 10’s digit plus 6 times the units digit. The example sequence is 23 –> 20 –> 02 –> 12 –> 13 –> 19 –> 55 –> 35 –> … and its period is the order of the multiplier, 6, in the group of residues relatively prime to the modulus, 10. (59 in this case). The “random digits” are the units digits of the 2-digit numbers, ie, 3,0,2,2,3,9,5,… the sequence mod 10. Marsaglia is most famous for the diehard suite of RNG tests, so he knows his stuff. 1 I’m curious why this works and why he chose 6. We’re going to use Raku , the language for gremlins . 2 I’ll be explaining all the weird features I use in dropdowns in case you’re a bit of a gremlin, too. The sequence is periodic, meaning that if we iteratively apply it we’ll eventually get the same element. Let’s start with a function (“subroutine”) that produces the whole sequence: Explanation Raku is an extremely weird language but I’ll keep it as straightforward as I can. Once we have a sequence we can print it with the or commands, which have subtly different behaviors I’m not going to get into. Remember, the random numbers are the last digit. So the RNG goes 1 -> 6 -> 6 -> 9 -> 7 -> 7 -> … If the RNG is uniform then each digit should appear in the sequence the same number of times. We can check this by casting the last digits to a multiset, or “bag”. Explanation is the hyper metaoperator and “maps” the inside operator across both lists, recursively going into list of lists too . IE is ! Hyperoperators have a lot of other weird properties that make them both useful and confusing. Bags count the number of elements in something. . Confusingly though they can only contain scalars, not arrays or lists or the like. That seems to be a uniform-enough distribution, though I’m a bit less likely to get a 0 or a 9. My next idea comes from the diehard tests. From the wiki : Overlapping permutations : Analyze sequences of five consecutive random numbers. The 120 possible orderings should occur with statistically equal probability. There are only 54 5-number sequences in the dataset, so I’ll instead apply this to 2-number “transitions”. I’ll do this by outputting a 10-by-10 grid where the (i, j)th index (from 0) is the number of transitions from last-digit i to last-digit j. For example, the sequence includes the transition , and no other transitions of form , so cell (8, 0) should be a . Explanation concats directly onto . Without the it’d be a two-element list instead. The in is a whatever , a weird little operator that does a lot of things in a lot of different contexts, but usually in this case lifts the expression into a closure. Usually . It’s the same as writing . 1 rotor gets two elements, then goes one element back, then gets two more, etc. is . You could also do to get , or to get . Rotor is really cool. is just . For once something easy! X is the cross product metaoperator . So if , then would be . And yes, the operator can get much, much stranger. returns a list of all elements smart-matching , and is the number of elements in a list. So is the number of elements of the list matching . This took me way too long to figure out. ( Actually but close enough. )
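Here is the same tally in plain Python, in case the Raku is opaque (my own sketch, which also wraps around the cycle rather than stopping at the last element):

```python
def step(x):
    return x // 10 + 6 * (x % 10)  # Marsaglia's rule with multiplier 6

orbit = [23]
while step(orbit[-1]) != orbit[0]:
    orbit.append(step(orbit[-1]))

digits = [x % 10 for x in orbit]  # the "random" digits; the period is 58
table = [[0] * 10 for _ in range(10)]
for a, b in zip(digits, digits[1:] + digits[:1]):
    table[a][b] += 1
for row in table:
    print(row)
```

Row 0 only has entries in columns 1 through 5: after a 0, the next value is just the old tens part, which is at most 5.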
We can see from this table that some transitions are impossible. If I generate a 0, I can’t get a 6 right after. Obviously not a great RNG, but my expectations were pretty low anyway. What if instead of multiplying the last digit by 6, I multiply by 4? I dunno, I kinda like an RNG that never gives me 3. The distinct sequences are called orbits and their lengths are called periods . Let’s see all the possible orbits we can get by using 4 as the multiplier: Explanation is the sigil for “callable” or function subroutine. The method does a partial function application, and passing a makes it partially apply the second parameter. 1 The returns a sequence of lists, which we pass to . converts every sequence in the to a set and compares those for uniqueness, instead of the original lists. But the final result uses the elements prior to conversion. If that’s confusing, a simpler example is that returns , while is . Explanation Quoting Daniel Sockwell : The is just there to prettify the output. returns a of s; mapping over each with converts it into a string surrounded by square brackets and puts a newline between each of these strings. If you picked 13 as your starting value, your random digits would be 3, 3, 3, 3, 3, 3. Preempting 50,000 emails (source) For obvious reasons, 4 should never be our multiplier. In fact for a multiplier to give a “good” RNG, it needs to have exactly one orbit. As we’ll see later, this guarantees a(n almost) uniform distribution. Explanation applies a top-level routine as a method. is the equivalent of . returns a list. coerces both inputs to numbers, and coercing a list to a number returns the number of elements. So we’re testing if the returned list has one element, ie there’s exactly one orbit. (If you want to compare without coercion, use either === or eqv .) This way of doing things is pretty slow and also only looks for orbits that start with a number up to 20. So it would miss the orbit for . We’ll fix both of these issues later. So some “good” choices for are 6, 11, and 18. Note that if you end up with a three digit number, you treat the first two digits as a single number. For , leads to , not (or ). Here’s a part of the explanation that really confused me: and its period is the order of the multiplier, 6, in the group of residues relatively prime to the modulus, 10. (59 in this case). After talking with some friends and a lot of reading Wiki articles, it started making more sense. I’m mentally computing a “ multiply with carry ” RNG with constants and . This choice has a cool property: if , then ! That’s pretty neat! It’s easier for me to mathematically reason about than “multiply the last digit by six and add the first digit”. For example, it’s clear why the RNG generates 0 and 9 slightly less often than the other digits: no matter which multiplier we pick, the generated sequence will go from 1 to 10n-2, “leaving out” 10n-1 (which ends with 9) and 10n (ends with 0). “Multiply and modulo” is also known as the Lehmer RNG . So what other numbers work? We already know that a good multiplier will produce only one orbit, and I showed some code above for calculating that. Unfortunately, it’s an O(n²) worst-case algorithm. 3 Thinking about the MWC algorithm as “Lehmer in reverse” gives us a better method: if is a good multiplier, then the period of the orbit starting from 1 should be .
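That check is easy to express directly (again a Python sketch of my own, with a made-up name):

```python
def good(n):
    """n is a 'good' multiplier iff iterating x -> x//10 + n*(x%10)
    from 1 returns to 1 in exactly 10n - 2 steps (a single orbit)."""
    x, period = 1, 0
    while True:
        x = x // 10 + n * (x % 10)
        period += 1
        if x == 1:
            return period == 10 * n - 2

print(good(6), good(11), good(18))  # True True True
print(good(4))  # False: 1 -> 4 -> 16 -> 25 -> 22 -> 10 -> 1 is only 6 steps
```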
The Lehmer approach also gives us a faster way of computing the orbit: Explanation (source) Real Explanation Writing instead of as a param lets us use instead of in the body. is the sequence operator . It can do a lot of different things, but the important one for us is that if you write , it will start with 10 and then keep applying until it eventually generates . In , the first is a Whatever and the second is regular multiply. This then lifts into the function . This actually produces the orbit in reverse but we’re only interested in the period so nbd. Then we check the period using the same “ coerces lists to lengths” trick as before. I can see why Marsaglia chose 6: most programmers know their 6 times-table and it never returns a 3-digit number, so the addition step is real easy. The orbit has only 58 numbers and you won’t get some digit sequences, but if you need to pull out a few random digits quickly it’s totally fine. If you want more randomness, I see a couple of candidates. 50 has a period of 498 and is incredibly easy to compute. If the final digit is even then you don’t need to do any carries: 23 8 -> 4 23 ! That said, the 50-sequence doesn’t seem as random as other sequences. There’s a point where it generates 9 even numbers followed by 8 odd ones. Don’t use it to simulate coin flips. The last interesting number is 18. It has a respectable period of 178 and has every possible digit transition: The downside is that you have to learn the 18 times-table. This isn’t too bad: I internalized it with maybe 10 minutes of practice. I’m still not great at doing the whole MWC step but I can consistently produce another random digit every five seconds or so. That’s good enough for me. You can see the Raku code I used to research this here . It’s set up as a CLI so you can use it in a few different ways; see the file for more info. Thanks to Codesections for feedback and Quinn Wilton and Jeremy Kun for helping me understand the math. and are sigils for “positional” (listlike) and “scalar” respectively. Defining a positional without assigning anything defaults it to the empty array. checks for membership, and can be applied to negate any infix operator. is the ASCII version— Raku also accepts ∈. polymod splits a number into a remainder and dividend, ie . It takes multiple parameters, so you can do things like to get hours-minutes-seconds. Z is the “zip” metaoperator, applying an infix op elementwise between two lists. = .
Daniel Sockwell (aka codesections) kindly agreed to read a first draft of this post, and he told me about . Thanks Daniel! [return] Assume that every time I say “RNG” I mean “PRNG”. [return] Also full disclosure the code I’m showing is less gremliny than the code I originally wrote. So just know it can be more gremlins than this. [return] If the multiplier has a single orbit, then we’ll run on ~10n-2 numbers, and the function will iterate 10n-2 times (since it has to go through every number in the orbit). If I bothered to skip numbers I’d already seen in an orbit then the runtime would collapse to O(n). [return]

Hillel Wayne 1 year ago

The World and the Machine

This is just a way of thinking about formal specification that I find really useful. The terms originally come from Michael Jackson’s Software Requirements and Specifications . In specification, the machine is the part of the system you have direct control over and the world is all the parts that you don’t. Take a simple transfer spec: The code that handles the transfer (represented by the process) is the machine . It currently has a race condition that can break the invariant , where someone with $5 can submit two cheques for $4 each and get them both past the guard clause. One way you could solve this is by adding a lock so that the banking service only resolves one cheque at a time. One way you can’t solve it is by forcing people to deposit one cheque at a time. You don’t have any control over what people do with their chequebook! The people and their behavior are part of the world . Whether something belongs to the world or the machine depends on your scope in the system. If you maintain one service and the other teams aren’t returning your calls, then their components are part of the world. If they’re sending you bad data, you need to faithfully model receiving that bad data in your spec as part of designing your machine. While you need to model the whole system, you’re only designing the machine. So the implementation details of the machine matter, but you don’t need to implement the world. It can be abstracted away, except for how it affects the machine. In the above example, we don’t need to model a person deciding to write a cheque or the person depositing it, just the transfer entering the system. Like in OOP, some system state is restricted to the world or machine. We can divide properties into internal properties that concern just the machine and external (observable) properties that can be seen by the outside world. is observable: if it’s violated, someone will be able to see that they have a negative bank balance. By contrast, “at most one process can be in the withdraw step” ( ) is internal. The world doesn’t have access to your transaction logs; nobody can tell whether is satisfied or not. Internal properties are useful but they’re less important than observable properties. An violation might be the root cause of a violation, but is what actually matters to people. If a property isn’t observable, it doesn’t have any connection to the broader world, so nobody is actually affected by it breaking. If both the world and machine can write to a variable, generally the world should be able to do more with the variable than the machine can. IE the machine can only modify by processing transfers, while the world can also do deposits and withdrawals. It’s exceedingly hard to enforce MISU on state the world can modify. If the world can break an invariant, it’s not an invariant. Instead you want “ resilience ”, the ability to restore some system property after it’s been broken. See here for more on modeling resilience. It’s not uncommon for a spec to break not because the machine has a bug, but because the surrounding world has changed. Thanks to Andrew Helwer and Lars Hupel for feedback. If you liked this, come join my newsletter ! I write new essays there every week. The world can both read and write the and variables, but the machine can only read them. In most specifications these restrictions are implicit; you could write the spec so that the machine changes , but your boss wouldn’t let you build it.
The “program counter”, or line each process is currently executing, isn’t readable or writable by the world. It’s an implementation detail of the machine. can be written by the machine and read by the world. I call these observable .
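In code form, the race from the transfer example looks something like this sketch (Python, my own illustration with made-up names):

```python
import threading, time

balance = 5  # world-visible account state

def cash_cheque(amount):
    global balance
    if balance >= amount:  # the guard clause
        time.sleep(0.01)   # another transfer can interleave right here...
        balance -= amount  # ...so two $4 cheques both get past the guard

cheques = [threading.Thread(target=cash_cheque, args=(4,)) for _ in range(2)]
for t in cheques:
    t.start()
for t in cheques:
    t.join()
print(balance)  # -3: the "no negative balances" invariant is broken
```

A lock inside cash_cheque fixes the machine. No lock can stop the world from writing the two cheques in the first place, which is why the world side has to be modeled rather than controlled.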

Hillel Wayne 1 year ago

Notes on Every Strangeloop 2023 Talk I Attended

This is my writeup of all the talks I saw at Strangeloop, written on the train ride back, while the talks were still fresh in my mind. Now that all the talks are online I can share it. This should have gone up like a month ago but I was busy and then sick. Enjoy! Topic: How to define what “success” means to you in your career and then be successful. Mostly focused on psychological maxims, like “put in the work” and “embrace the unknown”. I feel like I wasn’t the appropriate audience for this; it seemed intended for people early in their career. I like that they said it’s okay to be in it for the money. Between the “hurr dur you must be in it for the passion” people and the “hurr durr smash capitalism” people, it’s nice to hear some acknowledgement that money makes your life nice. Topic: the value of “play” (as in “play make believe”, or “play with blocks”) in engineering. Some examples of how play leads to innovation, collaboration, and cool new things. Most of the talk is about the unexpected directions her “play” went in, like how her work in education eventually led to a series of collaborations with OK Go. I think it was more inspirational than informative, to try to get people to “play” rather than to provide deep insight into the nature of the world. Still pretty fun. (Disclosure: I didn’t actually see this talk live; I watched Zac Hatfield-Dodds rehearse and gave feedback. Also Zac is a friend and we’ve collaborated before on Hypothesis stuff.) Topic: Some of the unexpected things we observe in working LLMs, and some of the unexpected ways they’re able to self-reference themselves. Zac was asked to give the talk at the last minute due to a sickness cancellation by another speaker. Given the time crunch, I think he pulled it together pretty well. Even so it was a bit too technical for me; I don’t know if he was able to simplify it in time for the final presentation. Like most practical talks on AI, intentionally or not he slipped in a few tricks to eke more performance out of an LLM. Like if you ask them to answer a question and then rate their confidence in the answer, the confidence tends to be decently accurate. Zac’s also a manager at Anthropic , which gave the whole talk some neat “forbidden knowledge” vibes. (Disclaimer: Douglas is a friend, too.) Topic: Why stack-based languages are an interesting computational model, how they can be Turing-complete, and some of the unusual features you get from stack programming. The first time I’ve seen a stack-based language talk that wasn’t about Forth. Instead, it used his own homegrown stack language so he could just focus on the computer science aspects. The two properties that stick out to me are: Stack programs don’t need to start from an empty stack, which means entire programs will naturally compose. Like you can theoretically pipe the output of a stack program into another stack program, since they’re all effectively functions of type . Stack ops are associative: if you chop a stack program into subprograms and pipe them into each other, it doesn’t matter where you make the cuts, you still get the same final stack. That’s really, really cool. My only experience with stack machines is golfscript . Maybe I’ll try to pick up uiua or something. Topic: “Small” generative AI models, like “taking all one-star amazon reviews for the statue of liberty and throwing them into a Markov chain ”. This was my favorite session of the conference. The technical aspects are pretty basic, and it’s explained simply enough that even layfolk should be able to follow. His approach generates dreamy nonsense that should be familiar to anyone who’s played with Markov chains before. And then he pulls out a guitar. The high point was his “Weird Algorithm”, which was like a karaoke machine which replaced the lyrics of songs with corpus selections that matched the same meter and syllables. Like replacing “oh I believe in yesterday” with “this is a really nice Hyundai”.
I don’t know how funny it’ll be in the video, it might be one of those “you had to be there” things. Topic: The modern pace of tech leaves a lot of software, hardware, and people behind. How can we make software more sustainable, drawn from the author’s experiences living on a boat. Lots of thoughts on this one. The talk was a crash course on all the different kinds of sustainability: making software run on old devices, getting software guaranteed to run on future devices, computing under significant power/space/internet constraints, and everything in between. I think it’s intended as a call to arms for us to think about doing better. I’m sympathetic to the goals of permacomputing; what do I do with the five old phones in my closet? That’s a huge amount of computing power just gone to waste. The tension I always have is how this scales. Devine Lu Linvega is an artistic savant (they made Orca !) and the kind of person who can live on a 200-watt boat for seven years. I’m not willing to give up my creature comforts of central heating and Geforce gaming. Obviously there’s a huge spectrum between “uses less electricity than a good cyclist ” and “buys the newest iPhone every year”, the question is what’s the right balance between sustainability and achievability. There’s also the whole aesthetic/cultural aspect to permacomputing. Devine used images in dithered black/white. AFAICT this is because Hypercard was black/white, lots of retrocomputing fans love Hypercard, and there’s a huge overlap between retrocomputing and permacomputing. But Kid Pix is just as old as Hypercard and does full color. It’s just not part of the culture. Nit: at the end Devine discussed how they were making software preservation easier by writing a special VM. This was interesting but the discussion on how it worked ended up going way over time and I had to run to the next session. (Disclaimer: Jesus I’m friends with way too many people on the conference circuit now) Topic: Formal methods is useful to reason about existing legacy systems, but has too high a barrier to entry. Marianne made a new FM language called “Fault” with a higher level of abstraction. Some discussion of how it’s implemented. This might just be the friendship talking, but Fault looks like one of the more interesting FM languages to come out recently. I’m painfully aware of just how hard grokking FM can be, and anything that makes it more accessible is good. I’ll have to check it out. When she said that the hardest part is output formatting I felt it in my soul. Topic: Lots of “simple” things take years to learn, like Bash or DNS. How can we make it easier for people to learn these things? Four difficult technologies, and different approaches to making them tractable. I consider myself a very good teacher. This talk made me a better one. Best line was “behind every best practice is a gruesome story.” That’ll stick with me. Topic: Randall Munroe (the xkcd guy)’s closing keynote. No deep engineering lessons, just a lot of fun. Before Julia Evans’ talk, Alex did A Long Strange Loop , how it went from an idea to the monolith it is today. Strange Loop was his vision, an eclectic mix of academia, industry, art, and activism. And it drew a diverse crowd because of that. I’ve made many friends at Strangeloop, people like Marianne and Felienne . I don’t know if I’ll run into them at any other conferences, because I don’t think other conferences will capture that lightning in a bottle. I’ll miss them. I also owe my career to Strangeloop.
Eight years ago they accepted Tackling concurrency bugs with TLA+ , which got me started both speaking and writing formal methods professionally. There’s been some talk about running a successor conference (someone came up with the name “estranged loop”) but I don’t know if it will ever be the same. There are lots of people who can run a good conference, but there’s only one person who can run Alex’s conference . Whatever comes next will be fundamentally different. Still good, I’m sure, but different.

Hillel Wayne 1 year ago

A better explanation of the Liskov Substitution Principle

Short version: If X inherits from Y, then X should pass all of Y’s black box tests. I first encountered this idea at SPLASH 2021 . In A Behavioral Notion of Subtyping , Liskov originally defined subtyping in inherited objects as follows: Subtype Requirement: Let P(x) be a property provable about objects x of type T. Then P(y) should be true for objects y of type S where S is a subtype of T. Later, Robert Martin named the Liskov Substitution Principle : Functions that use pointers or references to base classes must be able to use objects of derived classes without knowing it. If you had , then you should be able to call on any subtype of . This means that and couldn’t be subtypes of , since you cannot call on a flightless bird! Liskov would later “approve” of the rule in her book Program Development in Java . The LSP looks simple; the problem is applying it. How do you check that a function works on all subclasses of a class? Functions can do a lot of things! So we have a set of enforceable rules, where if you follow all the rules, you (presumably) satisfy LSP. The most well known of these rules is that inherited methods of a class must not strengthen preconditions or weaken postconditions . 1 That means the child’s method must accept a superset of the values the parent method accepts and output a subset of the parent’s possible outputs. Given , cannot be overloaded to take only even integers, or to also output odd numbers. Preconditions and postconditions together are called contracts . This is just the first rule and already we have three problems. First, it’s not super clear why this follows from the LSP. Second, it’s hard to remember whether it’s weaken preconditions and strengthen postconditions or the other way around. Third, most languages don’t have dedicated pre/postcondition syntax. For these languages the rule is taught as a restriction on the method’s type signature : the parameters must be contravariant and the return values covariant . This makes things even more arbitrary and confusing and isn’t as accurate as the original version. Long tangent on types and contracts The above is historically inaccurate: the original Liskov paper lists both the contract rule and the type rule as separate requirements for subtyping. But IMHO the type rule is equivalent to the contract rule. Saying “parameters must be contravariant” is the same as saying “preconditions must not be strengthened.” Consider this case of contravariant parameters: We can rewrite it purely as contracts: It’s less intuitive that we can rewrite contracts as types, but that’s also possible: It’s true that most languages can’t statically typecheck this, but that’s a limitation on our current typecheckers, not an essential problem. Some languages, like Liquid Haskell , handle it just fine. That said, even if the two rules are theoretically equivalent, in practice your programming language will favor checking some things as contracts and other things as types. We can’t ensure LSP with function contracts alone. Consider! 2 is obviously not a subtype of , but it has the same method contracts! The difference is that has an extra invariant , a property that holds true of an object at all times. doesn’t have that property, so it can’t be a child class. That leads to rule two, subtypes must not weaken the class invariants . The method rule and the invariant rule look somewhat related, but the next one comes out of left field. Let’s look at our two rules: Does it pass the method test? Yup, both have the exact same type signatures and preconditions. Does it pass the invariant test? Yup, they both guarantee that . But is clearly not a subtype of . It doesn’t wrap!
We need a third rule, the history rule : the subtype cannot change in a way prohibited by the supertype. In this case, this means that must remain unchanged. follows this rule and violates it, so it’s not a subtyping relation. 3 The history rule always felt like a tack-on. It makes me worry that we’re just patching up holes as we find them, and that somebody will come up with a counterexample that follows all three rules but still isn’t a subtype. Like what happens when we add in exceptions or generics? We might have to add another arbitrary rule. It all just makes me lose trust in the applicability of LSP. In their college classes, Elisa Baniassad and Alexander J Summers found a different approach : For a class to be substitutable for its supertype, it must pass all the supertype’s black box tests. 4 To check substitutability, we come up with a test: This passes, so we’d expect the equivalent test to pass for : This also passes. If every test we write passes, then is a subtype of . So let’s write a second test: And this passes for but fails for . Therefore is not a subtype of . Show limitation Note that passing tests doesn’t guarantee you’ve got a subtype, since there might be a failing test you just haven’t considered writing. It’s similar to how, even if you have a rule violation, you might never actually run into a subtype violation in production. This is more a pedagogical technique to help people internalize LSP. I like how the testing approach to LSP explains all the rules. If you give me a case where an LSP rule violation causes a problem, I can give you back a test case which fails for the same reason. Let’s Pythonize my earlier example of a precondition rule violation: Here’s my test: We don’t need to do anything with the output of , just call it. The first test (with ) will do nothing while the second test (with ) will throw an error, failing the test. is not a subtype. So that’s one way the testing approach makes things easier. It also shows why the history rule is necessary and also why it originally seemed like a crude hack to me. We originally needed the history rule because contracts and invariants only span the lifetime of one method. But now we can just call a sequence of methods in one test!
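To see the history rule fall out of testing, here is a small reconstruction in the same spirit (my own code, not the paper's): a counter whose subclass wraps around.

```python
import unittest

class Counter:
    """Supertype contract: incr() only ever increases the value."""
    def __init__(self):
        self.value = 0
    def incr(self):
        self.value += 1

class WrappingCounter(Counter):
    def incr(self):
        self.value = (self.value + 1) % 3  # wraps back to 0

class CounterTests(unittest.TestCase):
    make = Counter  # black-box tests, written against the supertype

    def test_sequence_of_incrs(self):
        c = self.make()
        for _ in range(3):
            before = c.value
            c.incr()
            self.assertGreater(c.value, before)

class WrappingCounterTests(CounterTests):
    make = WrappingCounter  # inherit the whole suite, swap the class under test

if __name__ == "__main__":
    unittest.main()
```

The wrap only shows up once the test drives the counter past its bound, which takes a sequence of calls in one test; no single-method check from a fresh object would see it.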
So does the testing approach actually work? By “work”, I mean “does it help students internalize the LSP.” I found the paper a little confusing here: it says they “observed that students were able to better understand the responsibilities of the subtype in a general sense”, but also that “the same proportion of students” understood the rules, which kinda sounds like ‘no impact’? I emailed the researchers for a clarification and here’s their response: It was a big improvement. Students went from not demonstrating any intuition around the rule, to “getting it”. “Oh - subclass has to pass all superclass black box tests! I get it!”. And then you can delve into ways tests can break (narrowing preconditions, etc). But at the root of it all - it’s about the tests. They did way better on LSP questions, and I believe there was one question in particular that we asked before and after, and it was way better done (like from a failing grade to a good passing grade on average) with the testing approach. They also told me about an indirect benefit: it also helped students understand the difference between black- and white-box tests! It helped clearly delineate whether tests were about “the packaging” (like what’s written on the box) or the “how” you’ve done it (which is specific to the implementation, and can change if you make different choices). You have to test both, and make sure both the packaging is okay, and check that the underlying implementation is not having unwanted side effects, or introducing errors — and it then became clear to students why we needed both kinds of tests, and why differentiating was useful. Pretty cool! Thanks to Liz Denhup for feedback. If you liked this, come join my newsletter ! I write new essays there every week. How did we go from functions to methods? Think of the method as being syntactic sugar for the function , kind of like how Python methods have a required parameter. [return] If I wanted to actually guarantee that , I could add a post_init method to the dataclass. [return] In this specific case you can catch this violation with the postcondition , but this is just a simple illustrative example. [return] Where “black box” means “public methods and accesses only”. [return]
