Implementing a 2d-tree in Clojure
Recently I followed the very good Coursera course “ Algorithms, Part I ” from Coursera. The exercises were in Java, and the most fun one was implementing a two-dimensional version of a k-d tree . Since I sometimes do generative art in Clojure , I thought this would be a fun algorithm to implement myself. There already exists other implementations, for example this one , but this time I wanted to learn, not use. A 2-d tree is a spatial data structure that is efficient for nearest neighbour and range searches in a two-dimensional coordinate system. It is a generalization of a binary search tree to two dimensions. Recall that in a binary search tree, one builds a tree structure by inserting elements such that the left nodes always are lower, and the right nodes always are higher. In that way, one only needs $O(\log(n))$ lookups to find a given element. See Wikipedia for more details. In a 2-d tree one manages to do the same with points $(x,y)$ by alternating the comparison on the $x$ or $y$ coordinate. For each insertion, one splits the coordinate system in two. Look at the following illustration: This is the resulting tree structure after having inserted the points $(0.5, 0.5)$, $(0.6, 0.3)$, $(0.7, 0.8)$, $(0.4, 0.8)$, and $(0.4, 0.6)$. For each level of the tree, the coordinate to compare with is alternating. The following illustration shows how the tree divides the coordinate system into sub-regions: To illustrate searching, let’s look up $(0.4, 0.6)$. Then we first compare it with $(0.5, 0.5)$. The $x$ coordinate is lower, so we look at the left subtree. Now the $y$ coordinate is lower, so we look at the left subtree again, and we found our point. This is 2 compares instead of the maximum 5. There’s a lot more explanation on Wikipedia . Let’s jump straight to the implementation in Clojure. We first define a node to contain tree values: the point to insert, a boolean indicating if we are comparing vertically or horizontally (vertical means comparing the $x$-coordinate), and a rectangle, indicating which subregion the node corresponds to. (note: we don’t really need to carry around the rectangle information - it can be computed from the boolean and the previous point. I might optimize this later.) Insertion is almost identical to insertion into a binary search tree. Where the structure shines, is when looking for nearest neighbour. The strategy is as follows: keep track of the “best so far” point, and only explore subtrees that are worth exploring. When is a subtree wort exploring? Only when its region is closer to the search point than the current best point: In addition, we do one optimization when there are two subtrees. We explore the closest subtree first. Here’s the full code: In the recursion, we keep a stack of paths (it looks like ). When exploring a new node, we add it to the top of the stack, and when recurring, we pop the current stack. Here’s how the data structure looks after inserting the same points as in the illustration above: I wanted the tree structure to behave like a normal Clojure collection. The way to do this, is to implement the required interfaces. For example, to be able to use , , , , etc, we have to implement the interface. To find out which methods we need to implement, I found this Gist very helpful. I create a new type that I call using : When implementing , I took a lot of inspiration (and implementation) from this blog post by Nathan Wallace. Also, thanks to the Reddit user for pointing out a bad implementation. Here is the diff after his comments. We can create a helper method to create new trees: Now we can create a new tree like this: Also, the following code works: (get all points whose seconds coordinate is greater than 0.5) The full code can be seen here on Github . I did this partly to learn a simple geometric data structure, but also to make another tool in my generative art toolbox . Implementing an algorithm helps immensely when trying to understand it: I got better and better visualizing how these trees looked like. There are two main things about Clojure I want to mention in this section: the library, and the Clojure interfaces. The library is a property based testing library. In a few words, given constraints on inputs, in can generate test data for your function. In one particular case, it helped me verify that my code had a bug and produce a minimal example of this (the bug was that I forgot to recur in the clause in the function). By writing some “simple-ish” code, I got an example of an input that made the return a wrong answer. Here is the code: It is probably more verbose than needed, but the summary is this: the function return a generator , which, given some restraints, can return sample inputs (in this case: vectors of points). Then I compare the result from the tree-search with the brute force result given by first sorting the points, then picking the first point. The way I’ve set it up, whenever I run my tests, generates 10000 test cases and fails the test if my implementation doesn’t return the correct result. This was very handy, and quite easy to set up. It was rewarding to implement the Clojure core interfaces for my type ( , , etc.). What was a bit frustrating though, was the lack of documentation. I ended up reading a lot of Clojure source code to understand the control flows. Basically, the only thing I now about , is that it is a Java interface like this: Then I had to search the Clojure source code to understand how it was supposed to be used. It would be nice with a docstring or two. I found many blog posts that implemented custom types ( this one , this one , or this one ), but very little in Clojure own documentation. On the flip side, I got to read some of the Clojure source code, which was very educational. I also got to understand a bit more the usefulness of protocols (using and to provide several implementations). Here it was very useful to read the source code of thi-ng/geom . I learned a lot, and I got one more tool to make generative art. Perhaps later I could publish the code as a library, but I should really battle test it a bit more first (anyone can copy the code, it is open source on my Github) I used the data structure to create the following pictures (maybe soon I’ll link to my own page instead of Instagram). The function was very useful in making the code fast enough. Se dette innlegget på Instagram Et innlegg delt av Fredrik Meyer (@generert) Until then, thanks for reading this far!