Ivan Sagalaev 3 weeks ago

Mental capacity

I'm a fan of the hypothesis that "mental capacity" is a finite quantity you spend while actively using your brain and recover during rest. It may sound obvious, but history is full of seemingly "obvious" truisms that were ultimately disproved by science (like the geocentric planetary model, or cold weather being the cause of the common cold). I don't know of any actual research into this hypothesis, but my life experience keeps providing me with confirmations.

Everdell is a beautiful board game where you juggle competing strategies in your head, constantly trying to calculate ahead and see which one is most likely to win. It's pretty engaging! Trouble is, when we started playing it at home I was losing to my wife all the time. Like, I lost our first 20 or so games, and not by a narrow margin. This was surprising, because our general board game winning rate is normally pretty even: sometimes she wins, sometimes I do. The effect was there even when we played on weekends, when my head didn't physically hurt after an exhausting day at work. Still, I was losing…

Now, imagine my (and her) surprise when I won three of our last games in a row! I haven't found some magical winning strategy, and she didn't suddenly get much worse, judging by the winning points at the end of the game. But I did notice that it's now easier for me to hold several strategies in my head and calculate further into potential future moves. The only explanation I could come up with is that now that I haven't been working full time for a while, I have more mental capacity at my disposal.

Keeping an eye on a child is not very complicated in itself, especially when they don't speak yet. But it does require continuous focus and keeps you in a state of tension, because it is your child and not some computer stuff belonging to your employer! What I discovered during my kid's early months was that keeping this continuous focus was also exhausting my mental capacity.
At that time I was also trying to help friends with a startup, and even though not much was required of me, I remember simply not being able to focus on anything creative even after having a few hours of child-free time. ("Hours", ha-ha!)

Later I got another confirmation of this when schools were closed during the 2020 pandemic. Even though technically I didn't spend much time directly engaged in schooling activities, the mere fact that our entire life turned into a never-ending problem-solving exercise was enough to drop my productivity so low that I almost got fired (although working under a shitty manager at the time could have had something to do with that too).

But here's a somewhat contradicting example. I've noticed myself, and heard anecdotally from others, that achieving something satisfying, like finding and fixing a tricky bug, or seeing a rough implementation of an idea come alive, seems to reinvigorate you and keep you running through the day. A day when you work hard and accomplish something often feels easier on the brain than one when you don't seem to be able to work at all.

Ivan Sagalaev 3 months ago

The Dawn of Everything

The Dawn of Everything is a book by archaeologist David Wengrow and anthropologist David Graeber, who methodically dismantle the simplified view of the progress of human civilization: that we went from small bands of hunter-gatherers to tribes, to agriculture, to cities, to empires, and ended up with our current idea of nation states, which is supposed to be the pinnacle of societal development. Turns out, this is simply not what happened, and modern archaeological data suggests that many societies around the planet and throughout history tried many different ways of organizing their living:

- huge cities without a central government, where power concentrated in local neighborhoods
- empires with a king but without a meaningful hierarchy for enforcing their rule
- people who tried agriculture and then consciously abandoned it while their neighbors still practiced it
- and even societies cycling between different forms of governance depending on the season

All of this is written based on real archaeological data and prior work by other researchers. It's a thick book full of references, which makes it hard to read: you have to constantly switch between the text and the references, as half of them are not just titles of other works but long full paragraphs that should have really been part of the text. Plus, the first third of the book is really like a preview of the rest of it, which then regurgitates and repeats a lot of the points stated earlier.

Here's a few random facts and tidbits I found fascinating that stuck in my head:

- European Renaissance thinkers likely got the idea of egalitarian societies and personal freedoms from American Indians. The latter were really not impressed with the idea of people having lords telling them what to do :-)
- One of the very first big settlements in the world was found in Ukraine, contemporary with the earliest cities of Mesopotamia.
- Early European archaeology's focus on Mesopotamia, and its bias towards establishing the one true way from "primitive" cultures to dynasties, is probably due to a desire to find the kingdoms described in the Bible. They simply were not interested in a world-wide picture, as they already had a narrative.
- Schismogenesis is a conscious choice of some peoples to live differently from their neighbors, despite sharing essentially the same environment. For example, the ancient Pacific Northwest was split between heroic societies practicing war, slavery and display of wealth, and more egalitarian societies with an emphasis on personal freedom and sharing.
- It's tempting to look with irony on ancient Egyptians, with their cult of dead kings and priests speaking on their behalf, until you look at the modern USA, with monuments like Mount Rushmore and the Supreme Court interpreting the sacred words of the Founding Fathers.
- Throughout history, egalitarian societies respecting personal freedoms tended to also respect women, while oppressive hierarchies worshiping heroic warriors tended to subjugate women and denigrate their roles. There are counter-examples, of course, but there is a strong correlation.

Ivan Sagalaev 11 months ago

Left DataDog

Today was my last day working at DataDog. Despite me having built and launched a pretty useful service within our department, the leadership decided to set it aside and build its next iteration with different people and different technologies. I resisted for a while :-), but wasn't really trying to save it at any cost: after all, nothing you do at work really belongs to you, so it's up to the company to decide what they want. That left me with nothing (big) to do, and starting anything new would've meant spending another 3 years building something from scratch without any real prospects or promises. So I left.

Ivan Sagalaev 1 year ago

Debounce

When last time I lamented the unoriginality of my tool, I also entertained the idea of salvaging some value from it by extracting the event debouncing part into a stand-alone tool. So that's what I did. Meet debounce, a Rust library and a prototype command-line tool.

In all honesty, I didn't really need it as a library: the tool was already working fine. But it felt like the Right Thing™ to do, and gave me an excellent opportunity to play with Rust's synchronization primitives. For a clean solution (that is, one without the busy polling I employed before) I needed two threads: the main one to block and wait forever for external events, and a worker to wait out timeouts and perform specified actions. The worker would also have a mode where there are no events and it should block and wait for the main thread to supply one. This sounds like a job for a conditional variable, and I had hoped Rust would have some idiomatic higher-level wrapper around them. Turned out, Rust had three :-)

- Condvar, which is exactly the Rusty wrapper around the idea of a conditional variable
- channel, a higher-level interface for consumers to wait on data supplied by producers (which I suspect is built on top of the former)
- parking, a built-in lightweight ability for a thread to suspend ("park") until another thread wakes it up

Somehow it's very on-brand for a language that gives you 5 kinds of pointers and 4 kinds of strings :-) But this is also what makes it fun! Anyway, I ended up using parking, as it didn't need any extra code and worked well for my use case, where I don't mind the worker thread being occasionally randomly woken up out of turn.

Another purely Rustian puzzle I stumbled upon had to do with polymorphism. I have two kinds of event buffer types with identical interfaces for getting values out of them. In Rust you express this with a trait, which concrete types then implement in their own way, parameterized by the type of data stored in the buffer. Here seasoned Rustians are probably already asking their screens something along the lines of "wait, if your return type strictly depends on what's in the buffer, it doesn't really make sense for it to be a parameter of the trait…" And they are totally correct, but I didn't know that at that point. So far so good.

Then I thought that, having such a trait, it should be pretty natural for the buffers to also implement the standard Iterator, yielding values as long as there are items in the buffer in the ready state. So I wrote the obvious blanket implementation, which says "this is an implementation of the standard trait for any type which implements my buffer trait". This however produced a compiler error which proved too intricate for me to understand. So, long story short, I went to the Rust user forum, where nice people over a couple of days imparted to me deep knowledge about traits, blanket implementations and associated types (which I think I finally get). Now my buffers are also iterators, and I don't need to repeatedly call the getter in my tests :-)

Here's a couple of things I had a chance to reflect on, following this story:

- Having such a go-to place as users.rust-lang.org is exactly what I'm missing while developing for Android. To my knowledge, there just isn't anything like this for that ecosystem, and everyone just shouts into the abyss of Stack Overflow and tries to sort out random pieces of code coming from there.
- This type system wrangling is one of the things that makes dynamically typed languages more productive. And yes, I'm aware of the downsides, so no need to repeat the mantra of "typed languages remove a whole class of bugs" in the comments. Better think of the whole new class of code structures you need to learn and maintain to do it :-)

Rust's packaging tool, Cargo, has a built-in notion of "examples", where you can implement something working without affecting your library dependencies and have it automatically built alongside the main code. So I implemented a CLI tool which works exactly in the way I described in the previous post, by removing sequential duplicates that happen within a specified grace period. It's very bare-bones, as much as you would expect from a working example. I encourage anyone who needs additional options and features to write their own solution.

(Here's a free idea: let the user specify by which part of the string to test equality, either with a regex or a field index, or something.)

So this technically makes me an open source maintainer. Again. But this time, having 10+ years of experience maintaining highlight.js, I think I'm going to do things differently. I don't like the default assumptions about what FLOSS maintainers are supposed to do these days. You're supposed to write code, do regular releases (lest your project be pronounced dead), react to issues, review PRs from random people, and be extra energized when dealing with anything that has the word "security" attached to it. And as a bonus for particularly good work you'll be rewarded with a Community™, whose self-proclaimed leaders will harass you for being a dictator who should feel guilty about not having the Community's interests, as formulated by the "leaders", in mind every single second of your life. This is all bullshit, of course. But this is also reality. And I used to bitch about it before; there's nothing new here. So here's what I'm going to do:

- I won't develop the code past what I need from it myself. If someone needs more features, they should write their own solution and maintain it (or not maintain it!) in the way they want. The license explicitly allows it.
- I am interested in what other people would make of it, but I make no promises about accepting all derivative work into my code. As long as you don't forcefully insist on having your PR merged, I remain a nice person and encourage sharing of ideas!
- I am especially interested in suggestions (in any form) on improving my Rust. This is, after all, what I wrote the thing for!

In this light, my choice of pijul as a version control system plays well into this, as I expect to be somewhat shielded from Github's crowd, where that sense of needy entitlement is especially strong. A random recent example is this thread, where people with Opinions™ have been harassing the maintainers of black about a minor issue for three years, and not a single one of them has thought of volunteering to maintain a fork with the stability guarantees they ostensibly require so hard. Such work is not much fun, of course, but they assume the maintainers owe it to them.

P.S. I think I should write more about pijul, it's an interesting project!
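The two-thread, parking-based design described above can be sketched roughly like this. This is a simplified, hypothetical skeleton, not the actual debounce code: the queue layout and the function names are mine, and it only shows the parking handshake, not the deduplication logic.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

// The main thread queues events and unparks the worker; the worker parks
// when idle and waits out a grace period before acting on each event.
fn run_worker(events: &[&str], grace: Duration) -> Vec<String> {
    let queue: Arc<Mutex<VecDeque<(String, Instant)>>> =
        Arc::new(Mutex::new(VecDeque::new()));

    let worker_queue = Arc::clone(&queue);
    let worker = thread::spawn(move || {
        let mut processed = Vec::new();
        loop {
            let next = worker_queue.lock().unwrap().pop_front();
            match next {
                // Nothing queued: park until the main thread wakes us up.
                // Spurious wakeups are harmless -- we just re-check the queue.
                None => thread::park_timeout(Duration::from_millis(100)),
                Some((event, queued_at)) => {
                    // Wait out the grace period before acting on the event.
                    let deadline = queued_at + grace;
                    let now = Instant::now();
                    if deadline > now {
                        thread::sleep(deadline - now);
                    }
                    if event == "<stop>" {
                        break;
                    }
                    processed.push(event);
                }
            }
        }
        processed
    });

    // The "main" thread's side: supply events and wake the worker up.
    for &event in events {
        queue.lock().unwrap().push_back((event.to_string(), Instant::now()));
        worker.thread().unpark();
    }
    queue
        .lock()
        .unwrap()
        .push_back(("<stop>".to_string(), Instant::now()));
    worker.thread().unpark();

    worker.join().unwrap()
}

fn main() {
    let processed = run_worker(&["save", "build"], Duration::from_millis(10));
    println!("worker processed: {:?}", processed);
}
```

The nice property of parking here is exactly what the post mentions: no extra condition state to maintain, and an occasional spurious wakeup just results in one harmless extra check of the queue.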

Ivan Sagalaev 1 year ago

Python stdlib gems: collections.Counter

Here's a toy problem. Given a corpus of phone numbers for different countries, determine the most prevalent display format in each country and use it to re-format an arbitrary phone number for its country. For example, if most US numbers in our data corpus are written in a particular format, then an arbitrary string of digits should be converted to that format. For simplicity, let's assume that all numbers are local, so we don't have to deal with the complexity of international prefixes.

The actual conversion is not what this post is about, but for completeness imagine a pair of complementary functions, one determining a pattern, and the other formatting a phone number according to it. (Error handling omitted for clarity.)

The usual approach is to use a dict-like structure that would hold patterns as keys and count how often they appear in the source database, so the result is a per-country mapping of patterns to counts. (Don't mind the numbers, it's a small database!) As far as I have seen, people usually turn to defaultdict for something like this. It automatically initializes values of absent keys on first access, which lets you simply increment a count every time. In our case we need a more complicated thing: a defaultdict for countries which would initialize keys with nested counting dicts.

Here's a rule of thumb: whenever you see defaultdict(int) you can replace it with a Counter. It can do everything a defaultdict(int) can, plus a few convenient tricks, like the most_common() method, which we can use instead of an ungainly sorting expression.

Let me offer you another solution, which uses the magic of Counter even more. I am also going to replace for-loops with a functional approach, so it won't be so embarrassingly imperative :-) Look ma, no loops! Let me explain all that's happening here.

- As it happens, a Counter can consume a flat sequence of things and count them in one go. For this to work in our case we'll have to replace the nested data structure with a flat one. This can be done by simply gluing keys together in a tuple, replacing a nested lookup like d[country][pattern] with an equivalent d[country, pattern] (you can do it for any number of keys).
- The counting expression looks like a list comprehension without square brackets. It's a generator expression, which you can iterate over without constructing an actual list in memory. It is usually enclosed in parentheses, but when it is the sole argument in a function call you can omit them and avoid ugly doubling like ((…)).
- most_common() with no arguments returns the entire contents of the structure as a list of (key, count) pairs, sorted by count. It's sorted with no regard for countries, but we don't care, as you'll see later, as well as why we need to reverse it.
- The last expression is a dict comprehension which walks over the reversed list, destructuring its nested pairs into individual variables with a pattern that mimics the structure of one pair; _ is a conventional name for things we don't use, the count in our case. So in the sequence we actually fold into a final dict, countries become keys and patterns become values.

Here's the important thing: subsequent values with the same key replace the ones already in the dict. This is why we need to reverse the list, so the patterns with bigger counts replace less common ones as we go along. This is also why we don't care about countries being intermixed, as they won't affect each other.

Admittedly, the last bit is the most unobvious here. I don't really care, you can judge for yourself :-) I only wanted to demonstrate Counter in action!
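The whole approach can be put together as follows. This is a hedged reconstruction: the function names, the pattern encoding (digits replaced with "#") and the sample corpus are my own illustration, not necessarily the original code.

```python
from collections import Counter

def detect_pattern(number: str) -> str:
    """Replace every digit with '#', keeping punctuation as-is."""
    return "".join("#" if c.isdigit() else c for c in number)

def apply_pattern(number: str, pattern: str) -> str:
    """Format a bare string of digits according to a pattern."""
    digits = iter(c for c in number if c.isdigit())
    return "".join(next(digits) if c == "#" else c for c in pattern)

# A toy corpus of (country, number) pairs:
corpus = [
    ("US", "212 555-0182"),
    ("US", "212 555-0145"),
    ("US", "(212) 555-0199"),
    ("FR", "01 23 45 67 89"),
]

# Counter consumes a flat sequence of (country, pattern) tuples and
# counts them in one go; the generator expression needs no extra parens.
counts = Counter((country, detect_pattern(number)) for country, number in corpus)

# most_common() returns all pairs sorted by count; walking the reversed
# list lets more common patterns overwrite rarer ones per country.
patterns = {country: pattern for (country, pattern), _ in reversed(counts.most_common())}

print(patterns["US"])                                # ### ###-####
print(apply_pattern("2125550123", patterns["US"]))   # 212 555-0123
```

Note how the flat tuple keys make the country dimension and the pattern dimension coexist in one Counter without any nesting.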

Ivan Sagalaev 1 year ago

The config

This story is about an accomplishment at work. Due to it being work-specific I won't go into too much detail, so it won't necessarily be useful. It's just a personal piece.

For various reasons (such as changing places, raising a child, workplace politics, poor luck) I haven't had any clear wins under my belt for quite some time, probably since building a Python team at Yandex 15 years ago. So this one feels a little special.

A couple of years ago our department — we're doing billing at Datadog — reached a point at which individual teams had accumulated a lot of hard-coded, duplicated, undocumented, constantly diverging configuration about products. My team had it especially bad, as we had to maintain essentially identical pieces of config in two different languages across two different code repositories, and synchronize bits of it to yet another one. So at the next department-wide gathering I volunteered to create a prototype of a solution in the form of a shared config service. The idea wasn't controversial, and there were no details yet to object to, so it became the moment where I put my stake in the ground.

It went way slower than I expected (and I thought my initial estimates were conservative!) Sure, the prototype within our team took shape pretty quickly, and soon significantly simplified making ongoing changes to products. It was convincing other teams to bite the bullet and switch their code to the config that took most of the effort. Because by its nature, such a change makes things harder at first: you suddenly need to make network requests instead of simply relying on hard-coded data, and you now need to think about backwards compatibility of your data consumed by others. It's only after a while that you (or not even you) start profiting from reduced duplication and simplified processes due to data being available from a single source of truth. But we got there.
We got Product people on board, we acquired a manager, we started producing OKRs, we involved more engineers… And while there wasn't a clear moment of a "launch", at our latest department summit I realized people from various teams were mentioning "the config" not just as a nice idea, but as something definitely existing, and I was in the middle of all those conversations. It felt like a pivotal moment, and it was a warm, fuzzy feeling :-)

Looking back, here's what I think worked:

- I offered my idea in a very rough form, almost on a whim, as I wasn't even sure if I should maybe first create a prototype before talking about it. I didn't have any slides or a prepared talk, I just shared my text editor on a screen during the "unconference" part and drafted a few ideas right there. Not many people were present, but it gave me an owning reference point for the effort.
- I named the project with a generic name, claiming it as a solution for the whole department. Not a fancy name, not a team-specific name, not a "draft" name. It's hard to prove how much it helped, but I believe it did.
- Finding early stakeholders and talking about their problems in their language instead of focusing on technology: managers want to streamline communications, product wants visibility, engineers want less churn in their code, bosses want to see a strategic vision.
- Making personal connections. Not only through docs, chat and Zoom, but actual face-to-face meetings. When you win people over and they have trust in your project, they start inadvertently helping by mentioning it in meetings and tying their plans to its success. I saw this happening on a few occasions: an almost dead conversation suddenly turned around by someone credible saying "actually, we're going to use it in such and such way."
- Dumb architecture. None of the above would matter if I couldn't deliver a working prototype and quickly iterate it into something dependable. Since at first I was the only engineer/architect on the project, my approach was to use boring, dumb technology: static files instead of a database, served over HTTP via nginx, some plain Python modules to do validation at build time, a number in a file name as a "versioning strategy", deploys run on a schedule instead of requiring teams to implement hooks, etc. Almost all of it is going to change over time, of course. But: "A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work." — John Gall
- Having a live spec. A single document with a high-level overview of the system's current state. Way better than relying on updating people via random meetings or producing a new half-complete document for every separate occasion. It does take effort to maintain, but it's worth it. Thank you, Mr. Spolsky, for teaching me this a long time ago.

Ivan Sagalaev 2 years ago

On Kotlin

I've been writing code in Kotlin on and off over a few months, and I think I'm now at that unique stage of learning something new where I already have a sense of what's what, but am not yet so far along that I've forgotten the beginner's pain points. Here's a dump of some of my impressions, good and bad.

"We were not out to win over the Lisp programmers; we were after the C++ programmers. We managed to drag a lot of them about halfway to Lisp." — Guy Steele

Kotlin drags Java programmers another half of the rest of the way. That is to say, Kotlin doesn't feel like a real functional-first language. It's still mostly Java with all its imperativism, mutability and OO, but layered with some (quite welcome) syntactic sugar that makes it less verbose and actually encourages a functional style. Where it still feels mostly Java-ish is when you need to work with Java libraries. Which is most of the time, since the absolutely transparent Java interop doesn't make writing Kotlin-flavored libraries a necessity.

For starters, you don't have to put everything in classes with methods any more. Plain top-level functions are perfectly okay. You also don't need to write/generate a full-blown class if what you really need is a struct/record: you just declare a data class. These have some handy features (like structural equality) implemented out of the box, which is nice. And then you can pass them to functions as plain arguments, without necessarily having to make them methods on those arguments' classes.

Like other newer languages (Swift, Rust), Kotlin allows you to add your own methods to existing classes, even to built-in types. They are neatly scoped to whatever package they're defined in, and don't hijack the type for the entirety of the code in your program. The latter is what happens when you add a new method to a built-in class dynamically in Ruby, and as far as I know, it's a constant source of bad surprises. It doesn't require any special magic: just keep in mind that an extension function is not really different from a plain function, only its first parameter is going to be named this, and it's going to be available implicitly.

This, I think, is actually a big deal, because looser coupling between types and functions operating on them pushes you away from building rigid hierarchies. And by now I believe most people have realized that inheritance doesn't scale. So these days the only real value in having methods over plain functions is the ability to compose calls in the natural direction: value.f().g(), as opposed to g(f(value)). Yes, I know your Haskell/OCaml/Clojure have their own ways of doing it. Good. Kotlin has chaining.

Kotlin uses val and var for declaring local data as immutable and mutable, respectively. val is encouraged as the default, and the compiler will yell at you if you use var without actually needing to mutate the variable. This is very similar to Rust's let and let mut. Unfortunately, however, Kotlin doesn't enforce immutability of a class instance inside its methods, so it's still totally possible to call a mutating method on a val and have internal state changed unpredictably.

Kotlin is another new language adopting the "everything is an expression" paradigm. You can assign the result of, say, an if statement to a variable, or return it. This plays well with a shortened syntax for functions consisting of a single expression, which doesn't involve curly braces or the return keyword. You still need return in imperative functions and for early bail-outs. This is all good, I don't know of any downsides.

I think Kotlin easily has the best syntax for nameless in-place functions of all languages with curly braces. You put the body of the function within braces, no extra keywords or symbols required. If it has one argument (which is very common), it gets an implicit short name, it. And this one is really cool: if the lambda is the last argument of the accepting function, you can take it outside the parentheses, and if there are no other arguments, you can omit the parentheses altogether.
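A little sketch tying these pieces together; the Point class and the manhattan function are made-up examples of mine, not from the post:

```kotlin
// A data class gives you a record with structural equality out of the box.
data class Point(val x: Int, val y: Int)

// An extension function: looks like a method, but it's really a plain
// function whose receiver is available implicitly as `this`.
fun Point.manhattan(): Int = x + y

fun main() {
    // Structural equality comes for free with data classes:
    check(Point(1, 2) == Point(1, 2))

    val points = listOf(Point(1, 2), Point(3, 4), Point(5, 6))

    // Trailing lambdas with the implicit `it` parameter, chained
    // in the natural left-to-right direction:
    val total = points
        .filter { it.x > 1 }            // no parentheses needed at all
        .map { it.manhattan() }
        .fold(0) { acc, d -> acc + d }  // the initial value stays in parentheses

    println(total)  // 18
}
```

Note that manhattan is a single-expression function, so it needs neither braces nor the return keyword.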
So filtering, mapping and reducing a collection looks like a chain of filter, map and fold calls with a lambda hanging off each one. Note the absence of parentheses after the first two calls. The line with fold is more complicated because it does have an extra argument, an initial value, which has to go into parentheses, and it also has a two-argument lambda, so it needs to name the arguments. Many times you can get away with not inventing a name for another temporary variable: let takes the object on which it was called, passes it as a single argument to its lambda, where you can use it as, well, it, and then returns whatever was returned from the lambda. This makes for succinct, closed pieces of code which otherwise would either bleed their local variables outside the scope, or require a named function. This reminds me of a similar idiom in Clojure, and Kotlin also has its own variant, ?.let, which only works when the value is not null: if the value is null, the ?. operator safely short-circuits the whole thing and doesn't call the block. Speaking of let, it's actually one of no fewer than five slight variations of the same idea. They vary by which name the object is passed under inside the lambda block, and by what they return, the object itself or the result of the lambda. Here they are: also (passes the object as it, returns the object), let (passes the object as it, returns the result of the block), apply (passes the object as this, returns the object) and run (passes the object as this, returns the result of the block). Technically, you can get by with only ever using let, because you can always return the object explicitly, and the difference between it and this is mostly cosmetic: sometimes omitting the explicit name saves a few characters, sometimes it gets in the way, so you switch to the other one. The real reason for all these variations is that they're supposed to convey different semantics. In practice I would say it creates more fuss than it helps, but it may be just my lack of habit. And no, I didn't forget about the fifth one, with, which is just a variant of run, but you pass the object in parentheses instead of putting it in front of a dot: with(foo) { … } instead of foo.run { … }. I can only probably justify its existence by a (misplaced) nostalgia for the similar with statement from Pascal and early JavaScript.
And there's a reason nobody uses it anymore: the implicit this was a reliable source of hard-to-spot bugs. By the way, this sudden language complexity is something that Lisps manage to avoid by simply not having the distinction between "functions" and "methods", and always returning the last expression from a form. "An elegant weapon for a more civilized age", and all that :-) That one caught me off guard. Turns out there's a difference depending on what kind of value you call filter, map and such on. Calling them on a List does not produce a lazy sequence, it actually produces a concrete list. If you want a lazy result you should convert the concrete collection to a Sequence first, with asSequence(). That's one more gotcha to be aware of if you want to avoid allocating memory for temporary results at every step of your data transformations. In Python, tuples are a workhorse as much as dicts and lists. One of their underappreciated properties is their natural orderability: as long as corresponding elements of two tuples are comparable with each other, the tuples are also comparable, with leftmost elements being the most significant, so (1, 2) < (1, 10) < (2, 0). This is tremendously convenient when sorting collections of custom elements, because you only need to provide a function mapping your custom value to a tuple. Kotlin doesn't have tuples. It has pairs, but they aren't orderable and, well, sometimes you need three elements. Or four! So when you want to compare custom elements you have two options: Define comparability for your custom class. Which you do at the class declaration, way too far away from the place where you're sorting them. Or it may not work for you at all if you need to sort these same elements in more than one way. Define a comparator function in place. Kotlin lambdas help here, but since it needs to return a -1/0/1, it's going to be sprawling and repetitive: for all elements, subtract one from another, check for zero, return if non-zero, move to the next element otherwise.
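For contrast, here's the Python tuple-key idiom in full (the Person class is a made-up example of mine, not from either language's docs):

```python
from dataclasses import dataclass

@dataclass
class Person:
    last_name: str
    first_name: str

people = [
    Person("Smith", "Jane"),
    Person("Adams", "Bob"),
    Person("Smith", "Ann"),
]

# The key function maps each value to a tuple; tuples compare element by
# element, leftmost first, so this sorts by last name, then first name.
people.sort(key=lambda p: (p.last_name, p.first_name))

assert [p.first_name for p in people] == ["Bob", "Ann", "Jane"]
```

One key function, no comparator boilerplate.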
Bleh… It's probably to widespread type inference that we owe the resurgence in popularity of typed languages. It's what makes them palatable. But implementations are not equally capable across the board. I can't claim a lot of cross-language experience here, but one thing I noticed about Kotlin is that it often doesn't go as far as, say, Rust in figuring out what it is that you meant. For example, Kotlin can't figure out the element type of an initially empty list based on what data you're adding to it later; Rust does this just fine. It's a contrived example, but in practice I also stumbled against Kotlin's inability to look into how a type is being used later. This is not a huge problem of course… I'm going to bury the lead here and first give you two examples that look messy (to me) before uncovering the True Source of Evil. The first one is the in and out modifiers for type parameters. There is a long, detailed article about them in the docs on generics which I could only sort of understand after the third time I read it. It all has to do with trying to explain to the compiler the IS-A relationship between containers of sub- and supertypes. Like, a List<Int> could be treated as a List<Number> if you only read items from it, but you obviously can't write a random Number into it. Or something… The second example is about extension methods (those that you define on some third-party class in your namespace) that can't be virtual. It may not be immediately apparent why, until you realize that slapping a method on a class is not the same as overriding it in a descendant, but is simply syntactic sugar for a free-standing function taking the object as its first parameter. So when you call it, it doesn't actually look into the VMT of the object's class, it looks for a free-standing function in the local namespace.

Ivan Sagalaev 2 years ago

Trie in Python

A post about Haskell vs. Python readability came onto my radar the other day. It compares two implementations of a trie structure, and after looking at the Python version I wanted to make my own attempt. I didn't do it to necessarily compare or "battle" against the other solutions, it's more of an exercise in the vein of "how would I do it". Here's the original code (for easier lookup, as I refer to a few things in it in the notes): A few notes: Not storing complete words does seem to reduce complexity, perhaps counter-intuitively. The oft-neglected setdefault allowed me to inline the entire insertion. Another Pythonism, yield and yield from, is a nice pattern for recursive tree walking that would otherwise require temporary containers. It also usually results in tighter code. I attempted a couple more experiments with making the code more functional, like using recursion instead of for-loops, but it didn't really improve things. Python is not a functional language in its soul :-) The idea of re-binding a variable while tree-walking may scare some people, but I thought introducing another name just to avoid this was a bit silly :-)
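The code blocks didn't survive here, so here's a rough reconstruction of the approach the notes describe — a trie as nested dicts, with setdefault inlining the insert and yield from doing the recursive walk (the names are mine, not necessarily the post's originals):

```python
END = object()  # sentinel key marking the end of a complete word

def insert(trie, word):
    for char in word:
        # setdefault inlines the "create child node if missing" step
        trie = trie.setdefault(char, {})
    trie[END] = True

def contains(trie, word):
    for char in word:
        if char not in trie:
            return False
        trie = trie[char]
    return END in trie

def words(trie, prefix=''):
    # recursive tree walk with yield/yield from, no temporary containers
    for key, child in trie.items():
        if key is END:
            yield prefix
        else:
            yield from words(child, prefix + key)
```

Note the re-binding of trie inside the loops: that's the part that may scare some people.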

Ivan Sagalaev 3 years ago

nfp

So what happened was, I got fed up with manually uploading pictures I export from Darktable to my Flickr photo stream using the browser's file picker. So I decided to do something about it. The initial idea was to craft a FUSE file system which would automatically upload new files, but this turned out to be hard, so I switched to a much simpler solution: a little inotify watcher handing over new files to an upload script. I managed to code up a working solution over a weekend! More interestingly, I made the watcher part — "nfp", for "New File Processor" — as a generic configurable tool, which I published. It was only when I started writing this very blog post that I stumbled upon a standard Linux tool that does it, inotifywait :-) Still, I hope there's something to be salvaged from this project. Read on! Darktable is my tool of choice for working with camera RAWs, and I just want to take a moment to share my appreciation for the folks making it. It's a ridiculously advanced, polished photo processor. A real testament to open-source software. It actually used to have a Flickr export plugin, but it hadn't been working for a while, and got dropped in recent versions. Which is totally fair, because it's very much out of scope for photo editing software. Having a generic solution like nfp makes much more sense, because it can connect arbitrary file producers and consumers. It doesn't even have to be about images. Since "inotify" sounds very "systems" and "core", I immediately took it as an opportunity to play with Rust once more. That was the main reason. A nice side effect is that it builds into a small self-contained binary which you can bring with you anywhere. As long as it's Linux, anyway :-) If I had to mention a single gripe with the language during this last foray, it would be implementing an ordered type with the PartialEq/Eq/PartialOrd/Ord trait family. This just feels unnecessarily hard.
I still don't get what the point of having the partial variants is, and why these things couldn't be inferred from each other. Like, even the official docs on Ord recommend writing boilerplate for partial_cmp that just calls out to cmp. I'm sure there are Reasons™ for it, but somehow Python can infer total ordering from just __eq__ and __lt__. After using the tool for a week I noticed that the uploaded photos didn't have any metadata on them. After some digging this turned out to be due to the way Darktable writes exported files: it does it twice for every file. The second write, I assume, is specifically to add metadata to the already fully written JPEG. The problem was, nfp had been snatching the file away immediately after the first write. The only way I know to deal with this problem is "debouncing", a term familiar to programmers working with UI and hardware. Which means adding a short grace period of waiting until a jittery signal stops appearing on the input, or the user stops rapidly clicking a button. Or Darktable stops rapidly overwriting a file. A quick search for a generic debouncer for Rust turned up only specific solutions tied to mpsc channels, or async streams, or hardware sensors. So I wrote my own debouncer, which is a passive data structure with a couple of methods that doesn't want to know anything about where you get the data or what the waiting mechanism is. It just tracks time and removes duplicates. I may yet turn it into a full-blown crate, and maybe build a unixy-feeling debounce tool along the lines of a pipe filter. Update: this has been implemented. To do it properly though, I'll have to implement it as a two-threaded process, which will give me an opportunity to play with concurrency in Rust, something I haven't done yet. In nfp I cheated: it waits for new notifications on the same thread that sleeps for debounce timeouts, so it uses an ugly hack of sleeping in short chunks and constantly checking for new events. The uploader script was a story in itself.
Ironically, I spent more time trying to make various existing upload solutions work for me than I did with nfp itself, but didn't have any luck. So I ended up cobbling together a Python script using flickrapi. The ugly part of all these scripts is OAuth. More precisely, its insistence on having to register a client app to get a unique id and secret (apart from the user auth for whoever is going to be using it). It's totally fine for a web service, but in anything distributed to user-owned general-purpose computers it means that a determined user can fish out the client credentials and use them for something else (oh horrors!) I remember dealing with this problem when we worked on an OAuth service for Yandex around 2009, and we didn't come up with a good solution for it. These days I believe client credentials should be optional, akin to the User-Agent header in HTTP, and shouldn't be used for anything outside of coarse statistical data. Anyway… Since I'm using this script only for myself, I registered it on Flickr, put the credentials in a config and forgot about it :-) Here's the whole script for posterity. P.S. I love dict destructuring with **! I'm not yet sure what to do with nfp. Good sense tells me to extract the debouncer out of it and drop the rest in favor of inotifywait, but it actually does add some extra value: it has a sensible config format, and I can modify it further to be able to exec multiple processor scripts in parallel. Although I suspect the latter part can be handled by yet another bit of unix voodoo :-)
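For the record, the passive-debouncer idea from above can be sketched in a few lines of Python (a rough sketch of mine, not the actual Rust code from nfp):

```python
import time

class Debouncer:
    """Passive debouncer: feed it events, then ask which ones went quiet.

    It knows nothing about threads or where events come from; it only
    tracks the last time each key was seen, deduplicating repeats.
    """

    def __init__(self, grace_period):
        self.grace_period = grace_period
        self.pending = {}  # key -> timestamp of the most recent event

    def add(self, key, now=None):
        # A repeated event simply refreshes the timestamp.
        self.pending[key] = time.monotonic() if now is None else now

    def ready(self, now=None):
        # Return (and forget) keys quiet for at least grace_period seconds.
        now = time.monotonic() if now is None else now
        done = [k for k, t in self.pending.items() if now - t >= self.grace_period]
        for k in done:
            del self.pending[k]
        return done
```

The caller decides how to wait between ready() calls, which is exactly the "passive data structure" property described above.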

Ivan Sagalaev 4 years ago

New pet project

So anyway, I'm making a shopping list app for Android. As I understand it, a "shopping list" is something of a hello-world exercise of Android development, which may explain why there are so many rudimentary ones in Google Play. Only in my case I actually need one, and I know exactly what I want from it. See, for the past 10 years or so I've been in charge of food supply in our family, which includes everything from grocery shopping logistics, to cooking, to arranging dishes in the dishwasher. And the app is an essential part of the first stage of that chain. Up until recently I used Out of Milk, which someone suggested to me a long time ago, and at that time it was probably the best choice. I remember being quite happy to pay for a full version. Over time though it got a little bloated in ways I didn't need and a little neglected in places I cared about. The UI got very "traditional", requiring fiddly, unnecessary motions for core functionality. Here's the short list of its wrongs I still remember: Start-up time of several seconds, sometimes overflowing into dozens. I believe my 4-year-old phone should be perfectly able to load a shopping list in sub-second time. Adding an item when it's already on the list results in two identical items on the list. (Yes, really.) Auto-suggest when adding an item has arbitrary ordering and limits the number of displayed results. This meant I could never get "Tomatoes" in there, as they were buried under "Roma tomatoes", "Cherry tomatoes", and a few others, with no way to scroll to it. Tiny click target to check an item off the list. I was constantly fat-fingering around those and getting into a different screen. Checking an item off the list puts it into another list below the main one, which you either have to empty all the time, or end up with a huge scroll height. As I understand, the idea was that you could uncheck items from there to put them back on the list, but that's unrealistic with my catalog of ~150 items.
"Smart" categorization kept inventing excessively detailed categories, leading to several one-item categories clogging up the list. Sometimes unsuccessful synchronization would "forget" added items on the list. Which is funny, because I didn't have anything to synchronize with! I probably could spend some time searching for an app that'd suit me better, but… Look, I'm a programmer. Writing code is what I do! And I've wanted to play with Android development since forever, and the recent exposure to Kotlin gave me all the reasons I didn't really need in the first place :-) Here's a laundry list of what I want from a shopping list: Automatic ordering based on the order in which I buy things. I've had this idea ever since I started using Out Of Milk, because ordering manually sucks, and it feels like something computers should be able to do well, right? However, it's really not trivial to implement, if you think about it. So it was my main challenge and a trigger to actually start the project. Fuzzy search for suggested items. I'm used to typing 3-4 characters in my Sublime Text to jump to any file or identifier in a project. I want the same service here. Smart sorting of suggested items. It could take into account closeness of matching, frequency and recency of buying. Multiple lists with separate histories. Different stores have different orders of aisles, and I buy different things in them. A single list won't cut it. Renaming and annotating items. I get annoyed by typos and spelling errors, I want to correct them. And sometimes I want to add a short note to an item (like a particular brand of cheese, or a reminder that I need two cartons of milk this time). Color-coded categories, to give visual aid in scanning what otherwise would be a plain list of strings. They don't have to be terribly detailed. Fewer buttons, check boxes and dialogs. I want to interact with the content itself as much as possible. Swiping items off the list instead of clicking a checkbox.
Having the lists themselves in a carousel, instead of choosing their names from a drop-down, etc. Oh, and no settings, if I can get away with it! Undo. It's really annoying to accidentally swipe off something covered by your thumb, only to realize it's not what you intended, and now you have no clue what it was. GPS pinning. This is one aspirational feature I'll probably tackle last, if ever. I want to pin a list to a particular geo location, so the app would automatically select it when I'm at that store again. Also, no tracking, ads or other such bullshit. Should be self-explanatory :-) Not having some ugly API SDK making network calls at startup should really help with performance. I actually first started working on it at the end of 2019 and made good progress into 2020… but then something got in the way. Yeah… Anyway, after making an effort to restart the project, I'm making good progress again and actually feel really happy about it all! About a month ago I started dogfooding the app and was able to delete Out Of Milk from my phone (So long, and thanks for all the fish!) I've got the first five features mostly done, but there's nothing like actually using it to keep showing me various edge cases I could never think of. I love this process :-) Crucially, I can now add "Tomatoes" by just typing "t", "m" — and have them as the first suggestion. The app looks pretty rudimentary, as you'd expect at this stage. But really, this time I don't want to just fool around and dump the code somewhere in the open, I actually want to make a finished, sellable product out of it. Going to be a fun adventure! (Technically, me and my wife already tried selling my shareware tools at some point in the previous century, but we only managed to sell about two copies, so it doesn't count.)
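The fuzzy search on the wish list above is, at its core, subsequence matching; here's a minimal sketch in Python (the app itself is in Kotlin, and these names are my own invention):

```python
def fuzzy_match(query, item):
    """True if the characters of `query` appear in `item` in order."""
    chars = iter(item.lower())
    # `ch in chars` consumes the iterator up to the first match,
    # so subsequent characters must appear further to the right.
    return all(ch in chars for ch in query.lower())

def suggest(query, catalog):
    # Real ranking would weigh closeness of the match, plus frequency
    # and recency of buying; here we just keep catalog order.
    return [item for item in catalog if fuzzy_match(query, item)]
```

With this, typing "t", "m" matches "Tomatoes" (and "Roma tomatoes"); smart sorting is what then decides which comes first.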

Ivan Sagalaev 4 years ago

Status update: 2020

Pretty sure I'm in the minority on this one, but this year was actually a good one for me! One of the best in a while, to be honest… There's nothing good about the pandemic itself, and we all could do without it. But even then, being introverted probably helped me fare better than most with the whole sitting-at-home thing. Also, none of my family and friends died or had any severe health problems. So I'm filing it under the "could be worse" category. Unlike the many, many people who lost their jobs this year, I actually got a new one. I was lucky enough to wrap up my interviewing and even do an onboarding week in New York in the first week of March, right before the quarantine. I distinctly remember a conversation with the manager of my favorite breakfast place in NY about the virus: she was worried, and I was completely dismissive about the severity of the thing, saying that it was going to be gone soon like another flu, and the probability of catching it was really low. Yeah… It didn't take long to realize how wrong I was! So anyway, this is now my third job in a row where I'm working remotely from the West Coast for a New York-based company :-) This time it's Datadog. Dealing with Python, TypeScript, React, Kubernetes, etc. Nothing out of the ordinary. But going from a household with no income to having two people with income obviously sweetens the deal quite a lot! It was a long process, but now it is fair to say that maintenance of the library is finally out of my hands. I still own the domain name and do the occasional merges and fixes to the site code that Josh asks me for, but I'm very glad the project hasn't died with me losing interest! This is something that makes me really, really happy. After quite a few years of hiatus I've made enough time to go back to programming something for myself. Not nearly as regularly as I'd like, but it is happening, and it's immensely satisfying!
This time it's an Android app (in Kotlin, of course), and the one I've wanted to make for myself for more than 5 years: a shopping list. It may sound totally lame, but I don't care :-) And I'll definitely blog about it in some more detail. One thing that made it possible is that our school has smoothed out the remote learning process enough that my wife and I don't have to babysit our 3rd-grader through the day. She does everything mostly by herself, so we have time to actually work during the day and I, consequently, have time to do other stuff in the evenings. Running-wise, this was the best year ever, literally! After recovering from yet another of innumerable injuries at the beginning of the year, I'm back to running fast, and setting personal bests at both 6 and 13 miles. This is probably in part due to the new running shoes, too. We used to play board games a lot with various friends and colleagues back when we lived in Moscow, but after everyone moved everywhere, we pretty much only played once a year when we visited our oldest friends for the New Year celebration. But this year we realized our kid is old enough to be interested in board games, so we went big on that and dedicated a whole room to it! We dug out the old games we hauled with us from Russia and bought some new ones. And now we have some family fun most nights. I have to say, our girl is scary good at Munchkin!

Ivan Sagalaev 5 years ago

Server upgrade 2020

Regular readers of this blog have probably noticed that it is now served from a different IP address… Okay, okay, I'm kidding! This blog doesn't have any regular readers, of course. Anyway, what I'm trying to say is that I recently spent quite some effort to move all my stuff — softwaremaniacs.org, highlightjs.org, their supporting databases and, most importantly, my personal email — to a new and shiny Linode instance under control of an up-to-date Ubuntu. Here's the (heavily compressed) war story. It all started with an email from letsencrypt politely telling me my client software (certbot) must be too old, as it uses an obsolete version of their protocol, and it's going to break when they drop support for it. Little did I know at that moment just how hairy a yak I was staring in the eyes… Try upgrading the certbot package via apt: no new versions available for this version of Ubuntu (I think it was 18.04). Try upgrading Ubuntu (long overdue) via do-release-upgrade: no more versions of Ubuntu available for the 32-bit architecture. Reach out to Linode support asking why my 64-bit machine tells me it's 32-bit: get a few useful pointers revealing that my system is in fact a Frankenstein's monster: a 32-bit kernel with a 64-bit userspace. (By the way, Linode has the best support team!) Figure there's no way to upgrade the mess in place: need to provision a new machine and build all the services from scratch, moving data from the old machine. Django apps, HTTP configs, Postgres databases and all the mail. Fire up a new instance, wait weeks for "when there's more time" to deal with it (you know how that ends). Meanwhile, the software running highlightjs.org really needs an upgrade to support its new release process (it's always node.js for some reason): realize I can do it relatively easily on the new server without touching softwaremaniacs.org (thinking of migrating mail gives me headaches).
Move the entire highlightjs.org stuff in a semi-manual process, switch DNS, wait for the traffic on the old instance to die down in a day. Now I have two servers: really need to find more time to finish the transition! Move the entire softwaremaniacs.org stuff in a semi-manual process, but don't switch DNS: I can't just move my mail server in a similar way, because unlike mostly read-only Web apps, I will have to be accepting mail on both hosts while DNS is expiring. Anyway, need to set up a mail server on the new instance: turns out the package I used before is not supported anymore. If you didn't know, it was a brilliant package maintained by Canonical that installed all the packages needed for a personal mail server and then went ahead and wrote all the configs for you. But fret not, there's now a tasksel task ostensibly doing the same: alas, it actually only installs postfix and dovecot but doesn't configure anything. Work up the courage to read the 3-part personal email howto on Ars Technica which I'd had open in my browser for the past 80 years or so: realize it's opinionated, complicated and in places outdated (but still a good resource for orientation). Educate myself on "postfix", "dovecot", "Maildir", "sasl", "starttls", "spamassassin", "milters", "sieve", "fail2ban", "spf", "dkim": I guess I now know why Canonical doesn't want to support an out-of-the-box solution anymore. Set up mail, test the hell out of it by using my local hosts file to point softwaremaniacs.org to the new server. Set up a temporary relay from the old server to the new one: realize postfix is not okay with relaying mail to another postfix that thinks it has the same name. Pretend the new server is called something else, switch DNS, wait for traffic on the old one to die down. Finally successfully install certbot on the new server, obtain new certificates. They were about to expire in just two days! Shut down the old server. RIP. I now have a cleaner, simpler setup which I understand.
My quarantined spam now ends up in a regular mail folder (before, I had to SSH onto the machine and dig through some obscure file names). My outgoing mail has a better chance of reaching its destinations (but it's still an ongoing fight). My mail server doesn't use weird custom ports, so it's easier to set up mail clients. I'm getting less spam. My certificates are not in the immediate danger of expiring. My Python code is now using all the 3.8-isms I want! I even refactored quite a bit of site code, just for fun.

Ivan Sagalaev 5 years ago

Dicts are now ordered, get used to it

There were several moments over the last few weeks when I heard people discuss differences between Python lists and dicts, and one of the first things mentioned was that lists are ordered and dicts are not. Well, not anymore. Quoting the Python docs: "Changed in version 3.7: Dictionary order is guaranteed to be insertion order. This behavior was an implementation detail of CPython from 3.6." So if you want to discuss fundamental differences, you can pretty much only point out that dict values are accessible by keys, which can be of any hashable type, while list values are indexed with integers. That's it :-) A plain hash table holds both keys and values in a pseudo-random order determined by hashes calculated from the keys. It is also sparse, with unoccupied holes in a pre-allocated array. Since version 3.6, CPython holds keys and values in a separate dense array, while the hash table itself only holds indexes into it. Since the entries array is populated sequentially, it naturally preserves insertion order. As far as I know, the initial reason for this change was saving space by sharing hash tables between multiple dicts with the same set of keys (which in Python means, for example, instances of the same class). I don't know the exact politics of how this useful property progressed from an implementation detail to a guaranteed behavior. Does it mean that a dict with int keys is the same as a list? There are still a few practical differences. One obvious difference is the API: you always have to mention indexes explicitly when working with a dict[int], as there's no appending to the end or popping the last value. Frankly, I can't see any point in trying to make that work :-) Also, dicts are bigger, by an order of magnitude. Surprisingly, I couldn't find any consistent difference in speed, neither in inserting new values into a list vs. a dict[int], nor in traversing them.
My gut feeling tells me it's mostly because the hash value of an int is the int itself, so there's no time wasted on hashing. And yep, to my surprise, sets are still unordered.
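All of the above is easy to check in a few lines (a sketch for CPython 3.7+; the exact `getsizeof` numbers vary by version, and set iteration order is merely an implementation accident):

```python
import sys

# hash() of a small int is the int itself, so dict[int] wastes no time hashing
assert hash(42) == 42

# dicts preserve insertion order (guaranteed since Python 3.7)
d = {2: "two", 0: "zero", 1: "one"}
assert list(d) == [2, 0, 1]

# a dict[int] is noticeably bigger than the equivalent list
lst = list(range(1000))
dct = dict(enumerate(lst))
print(sys.getsizeof(lst), sys.getsizeof(dct))
assert sys.getsizeof(dct) > sys.getsizeof(lst)

# sets make no ordering promise; iteration simply follows the hash table
s = {2, 0, 1}
print(list(s))  # an implementation detail, not insertion order
```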

Ivan Sagalaev 6 years ago

Misconception about OSS support

You wouldn't think a free syntax highlighting library would be a strong dependency of a business's development process, and yet I woke up on a Monday to a flurry of comments and even one personal email from engineers eager to ask me to work for free for their employers. So of course I took the time to scathingly turn it into a teachable moment. https://github.com/highlightjs/highlight.js/issues/1984#issuecomment-466941892 : "I would like if you revert the change. It is currently blocking a lot of build from other people" Let me take this as an opportunity to explain something about the current sorry state of the relationship between businesses and open source projects. (Yeah, I know, but people still don't get it.) highlight.js is not a business, it's a hobby. It means that whatever gets pushed to this repository or npm should be assumed to be the result of someone having fooled around before going away for a weekend with their family. Or off to a busy working day at their job. If a business has made a decision to rely on this artifact for anything requiring any sort of stability (i.e. "blocking a lot of build from other people"), it made a stupid and uninformed decision. Or, more realistically, it simply relies on maintainers feeling ashamed enough to quickly fix problems when they happen. Even more realistically, it just accepts that its engineers are going to deal with maintainers by soliciting free support, because it has always worked this way. I, for one, don't feel any urge at all to support someone's misplaced expectations :-) So, dear fellow engineers, please take this build hiccup as an opportunity to explain to your particular business people that their entire intellectual property is a thin layer on top of a shaky foundation of open-source code lazily maintained by hobbyists or paid for by other businesses with their own goals in mind. Mention the leftpad story for more effect. If they really want stability, they have to invest in it.

Ivan Sagalaev 6 years ago

Status update: 2018

Hey, I almost managed an entire calendar year without a post! Not that I was ever adhering to any notion of a schedule here, but I have to admit that lately I miss blogging more and more. It was nice to have a not-insignificant voice and be able to make some difference with things I used to care about, and still do. Let's see if I can revive some activity here. No promises, of course… First of all, I did a slight restyling of this blog. It now uses a non-standard font face, and the front page lost both the "Recent comments" section and the "Essential topics". I plan to bring the latter back when I figure out how I want it to look and devise an algorithm that would keep it filled with something interesting without me maintaining it by hand. Other plans include reworking categories and comments, and simplifying the whole bilingual machinery. But all of that is secondary. For now I'm happy that anyone coming to a post from an external link can see when the post was written and who wrote it. 2018 wasn't a fun year for me at all. I had to quit my job at Shutterstock due to it turning more or less to shit, and even though I was lucky enough to be picked up by the people who quit before me to work at another company, AlphaSights, I'm still painfully trying to decide what I want to do with my professional life in the future. I'm a bit tired of being stuck in senior engineering roles where I can't really influence any strategic decisions. Also, being picky about morality, not wanting to commute every day and wanting to keep my hobbies limits my options pretty significantly. Capitalism sucks :-) But hopefully I will resolve these problems one way or another before spiraling down into depression. My family is wonderful, I can't wish for anything better, really!

Ivan Sagalaev 6 years ago

Treat HTTP status codes as strings

I usually see HTTP status codes being checked as integer numbers, with range comparisons. This is rather verbose. But if you think about it, it's the first digit of the number that carries the meaning, and there are only five of them:

1xx: server programmer is too smart
2xx: success
3xx: your client HTTP library is too dumb
4xx: you screwed up
5xx: server screwed up

When treated as strings, checking for the error class looks a bit better. Unfortunately, the ensuing party is pre-pooped by most client HTTP libraries helpfully casting the status to an integer.
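The difference can be sketched like this (the helper names are made up for illustration; real code would likely inline the checks):

```python
def is_error_int(status: int) -> bool:
    # the usual integer-range dance
    return 400 <= status < 600

def is_error_str(status: int) -> bool:
    # looking only at the class digit
    return str(status)[0] in ("4", "5")

# both agree, but the string version reads as "is it a 4xx or 5xx?"
assert is_error_int(404) and is_error_str(404)
assert not is_error_int(200) and not is_error_str(200)
```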

Ivan Sagalaev 6 years ago

Status update: 2016

I thought I'd post something for no reason… We're still living in Washington state, I'm still into programming, with much the same hobbies as before. However, these days I actually work for money. The company is Shutterstock, and I'm a resident "refactorer" on one of the teams in Search. So for the last few months I mostly write and re-write Python code. Which feels scarily comfortable. The scare is due to my understanding that, from a self-development point of view, feeling comfortable is a road to nowhere. But I'm not scared too much, as it has only just started and I'm constantly thinking of how to make my work life more difficult. I've been missing that for the past few years. My employment is somewhat unusual because the company, though gladly accepting remote workers, still mostly adheres to New York working hours, which results in a funny schedule for me, living 3 hours behind. So I enjoy doing my groceries on weekdays, and running, and an occasional beach trip! Work is work though, and it inevitably took time away from other things, and since my other priorities (also inevitably) lie with my family, the things I chose to sacrifice are Rust and highlight.js. The latter isn't actually dead though, thanks to the established contributor community providing patches at a rate none of the core team has any hope of keeping up with, and to the well-greased release process. Speaking of which… The latest release of highlight.js was the first after a pretty long streak of completely automatic successful deployments which… didn't. The reason was me having updated the code of both highlightjs.org and softwaremaniacs.org to something resembling a current-century setup. The projects are now running in their own virtual environments, and both use Django 1.10 and Python 3 (yes, I don't have Python 2 code on my server anymore). I took some cues from The 12-factor App, and my settings are less insane now. Also, the server can now survive reboots without assistance.
And I moved it from London to the US East coast, which is faster for me. What else… Tried working on a Mac, didn't like it. There are two Thinkpad X1s on my desk right now! Both running Ubuntu. Started and paused learning guitar (again). I own a Les Paul, if you're interested… Drive a car with a stick shift, a Ford Fiesta ST. If you're a car nut, I advise you to indulge yourself with the new gig of Clarkson and Co: The Grand Tour. I watched the first episode with a happy grin on my face from the start to the very end! Also, it's good that it's on Amazon, so I don't need to hunt torrents or buy some stupid subscription with a lot of TV I don't need. Disappointed with the Trump thing, as is everyone I know… in my echo chamber. But no despair, democracy should work out in the end!

Ivan Sagalaev 8 years ago

Why Rust's ownership/borrowing is hard

Working with pure functions is simple: you pass arguments, you get a result, and no side effects happen. If, on the other hand, a function does have side effects, like mutating its arguments or global objects, it's harder to reason about. But we've got used to those too: if you see a method called on a player object, you can be reasonably certain that it's going to mutate that player in a predictable way (and maybe send some signals somewhere, too). Rust's ownership/borrowing system is hard because it creates a whole new class of side effects. Consider passing a value into a function. Nothing in the experience of most programmers would prepare them for that value suddenly ceasing to work after being passed in! The compiler won't let you use it on the next line. This is the side effect I'm talking about: something has happened to the argument, but not the kind of something you've seen in other languages. Here it happens because the value gets moved (instead of being copied) into the function, so the function becomes responsible for destroying it, and the compiler prevents you from using it after that point. The way to fix it is to either pass the argument by reference or to teach it how to copy itself. It makes total sense once you've learned about "move by default". But these things tend to jump out at you in a seemingly random fashion while you're doing some innocent refactoring or, say, adding logging. Consider a parser that takes bits of data from an underlying lexer and maintains some state. A seemingly unnecessary helper method is just a convenience wrapper around a somewhat longer chain of calls that I have in the actual code. It returns a self-sufficient lexeme by copying data from the lexer's internal buffer. Now, we want to optimize it so lexemes would only hold references into that data and avoid copying. We change the method declaration to carry a lifetime annotation, which effectively says that the lifetime of a lexeme is now tied to the lifetime of the lexer reference on which we call it.
A lexeme can't live all by itself anymore but depends on data in the lexer's buffer; the lifetime annotation just spells that out explicitly. And now the code stops compiling. In plain English, Rust tells us that as long as we have a lexeme available in this block of code, it won't let us change a different part of the parser. And this does not make any sense whatsoever! The culprit here is the helper. Although it only actually needs the lexer, we tell the compiler that it takes a mutable reference to the entire parser. And because it's a mutable reference, the compiler won't let anyone else touch any part of the parser, lest they change the data that the lexeme currently depends on. So here we have this nasty side effect again: though we didn't change the actual types in the function signature, and the code is still sound and should work correctly, a different ownership dynamic suddenly doesn't let it compile anymore. Even though I understood the problem in general, it took me no less than two days until it all finally clicked and the fix became obvious. Changing the helper to accept a reference to just the lexer instead of the whole parser fixed the problem, but the code looked a bit non-idiomatic, having changed from dot-method notation into a plain function call. Luckily, Rust actually makes it possible to have it the right way, too. Since in Rust the definition of data fields (the struct) is separate from the definition of methods (the impl block), I can define my own local methods for any struct, even if it's imported from a different namespace.
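The "move by default" side effect from the beginning of the post can be reduced to a minimal sketch (made-up function names, not the actual parser code):

```rust
// Takes ownership: the String is moved in and dropped when the function ends
fn consume(s: String) -> usize {
    s.len()
}

// Borrows: a shared reference leaves the caller's value intact
fn borrow(s: &String) -> usize {
    s.len()
}

fn main() {
    let greeting = String::from("hello");
    assert_eq!(borrow(&greeting), 5); // fine: `greeting` is only borrowed
    assert_eq!(consume(greeting), 5); // `greeting` is moved into `consume`...
    // println!("{}", greeting);      // ...so this line would not compile:
    // error[E0382]: borrow of moved value: `greeting`
}
```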

Ivan Sagalaev 9 years ago

Liberal JSON

Tim Bray beat me to writing about this with some very similar thoughts to mine: Fixing JSON. I especially like his idea about native times, along with prefixing them with a special character as a parser hint. I'd like to propose some tweaks, however, based on my experience of writing JSON parsers (twice).

Commas and colons. Not only do you not need either of them, they actually make parsing more complicated. When you're inside an array or an object, you already know when to expect the next value or key, but you have to diligently check for commas and colons with the sole purpose of signaling errors if you don't find them where expected. Add to that the edge cases with trailing commas and empty containers, and you get a really complicated state machine with no real purpose. My proposal is simpler than Tim's, though: no need to actually remove them, just equate them to whitespace. That's it. It removes all the complications from parsing, and humans can still write them for aesthetics. And by the way, this approach works fine in Clojure for vectors and maps.

\uXXXX escapes. JSON is defined as a UTF-8 encoded stream of bytes. This is already enough for encoding the entire Unicode. Yet, on top of that, there's another escaping scheme using \uXXXX sequences. One could probably speculate it was added to enable authoring tools that can only operate in the ASCII subset of UTF-8, but thankfully we've moved away from those dark ages already. Handling those is a pain in the ass for a parser, especially a streaming one. Dealing with single-letter escapes like \n is easy, but with \uXXXX you need an extra buffer, you need to check for edge cases where not enough characters have arrived yet, and you're probably going to need a whole separate class of errors for those. Gah…
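The \uXXXX pain is easy to demonstrate with the standard library (a sketch; Python's json module does the buffering and surrogate-pair recombination for you):

```python
import json

# One astral-plane character escaped the \uXXXX way takes *two* escapes,
# a UTF-16 surrogate pair, which a streaming parser has to buffer and pair up:
assert json.loads('"\\ud83d\\ude00"') == "\U0001F600"  # the emoji U+1F600

# The same character as plain UTF-8 needs no special handling at all:
assert json.loads('"\U0001F600"') == "\U0001F600"

# Single-letter escapes, by contrast, are a trivial one-character lookup:
assert json.loads('"\\n"') == "\n"
```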

Ivan Sagalaev 9 years ago

highlight.js turns 10

Almost exactly ten years ago, on August 14, I wrote on this very blog (albeit in a different language): "So on yesterday's night I got worked up and decided to try and write [it]. But on the condition of not dragging it on for many days if it didn't work out on the first take; I've got enough on my mind as it is." It did work out. Which makes August 14 the official birthday of highlight.js! Although it wasn't until 5 days later that the first meaningful commit was recorded. Using any form of source control was only an afterthought for me back then :-) With the obligatory self-congratulatory stuff out of the way, let me now get to the main purpose of this anniversary post: explaining what makes highlight.js different among other highlighters. I'm not going to talk about the obvious features listed on the front page of highlightjs.org. I'll try to document the philosophy that up until this point I was only referring to in various places but was never able to put together. I'll try to keep it short (otherwise I'll never finish this post!)

It is my deep conviction that highlighting should make code more readable instead of simply making it… fun, for lack of a better word. Let me explain by example. Here are some things that serve better readability when highlighted:

- Keywords, because they define the overall structure of the code, and because they need prominent highlighting simply because they otherwise look too much like user variables.
- Function and class titles at the place of declaration, because they effectively define a domain-specific language, an API. They have a very distinct semantics.
- Built-ins and special literals, because it helps to know what in the code belongs to the language and what is defined by the user.
And here are the things that, in my humblest opinion, make no sense to highlight:

- CamelCase identifiers, because it's not consistent: you get identifiers of the same nature either highlighted or not, simply because they happen to be named differently.
- Function calls, because I, frankly, can't even invent a plausible reason why they should be highlighted in any way.
- Punctuation, because it significantly increases the amount of color clutter in any given snippet, which makes it hard on the eyes.

I have a hypothesis that the only reason these things get highlighted traditionally is that they can easily be picked up by a regexp :-) In highlight.js we sometimes go to great lengths to highlight what makes sense instead of what's easy ("semantics highlighting?"). In lisps we highlight the first thing in parentheses regardless of whether it's a built-in, and we have special rules to not highlight it in quoted lists, and even in argument lists in lambdas in Scheme. In VimScript we try our best to distinguish between strings and line comments even though they seem to be deliberately designed to trip up parsers. And we recognize quite a few ways of spelling out attributes in HTML. The downside is that highlight.js is heavier and probably slower than it could have been. These were the reasons why we recently lost a bid to replace the incumbent highlighting library on Stack Overflow. I still think they made a mistake :-) Because quality beats lightness! Of course no code base is ideal, especially a 10-year-old one; there's always so much to do! However, since our way of dealing with the stress of Open Source maintenance is to not let it happen to us, the development of highlight.js goes at a rather leisurely pace. Which means we've accumulated quite a few plans without any reasonable expectation of when they might happen. There's a new exciting parser in the making. We'd like to do an overhaul of our build system and packaging.
There are plans to have pluggable renderers in addition to HTML.

Some stats from the past ten years:

- Switched through 3 version control systems (Subversion, Bazaar, Git).
- Made 71 (seventy-one!) public releases, with a regular 6-week cadence for the past year.
- 166 languages and 77 styles created by 216 contributors and 3 core developers.
- Accumulated 8062 stars on GitHub.
- Went from being a single .js file to being provided as a custom-built package, a node.js library, and served from two independent CDNs.
- Acquired a mighty 490-strong unit test suite.
