Florian Zenker 2 weeks ago

Diff Algorithms

However, heuristics like this trade diff minimality for performance, which is not always desirable. Sometimes, a minimal diff is exactly what's required, so there is an option that disables these heuristics to always find a minimal diff irrespective of the cost.

We established before that a diff algorithm finds one of many possible solutions. Given such a solution, we can discover more solutions by varying it locally and then selecting the best one according to some metric. This is exactly how Michael Haggerty's indentation heuristic works for text. For any given diff, we can often slide a run of edits up or down in a way that doesn't change the meaning of the diff. We call edits that can be slid up or down "sliders". The question is: how do we select the best slide? Michael collected human ratings for different sliders of the same diff and used them to develop a heuristic that matches these ratings: diff-slider-tools. However, this heuristic only works for text and is tuned towards code instead of prose, so I decided to make it optional; it can be enabled with a dedicated option.

The representation used during the execution of the diff algorithm has a surprising impact on the algorithm's performance and the result's readability. This is not at all obvious, and so it took me a while to figure out that the best approach is akin to a side-by-side view of a diff: you use two slices to represent the left side and the right side respectively. An element present only in the left-side slice represents a deletion, one present only in the right-side slice an insertion, and a position filled on both sides is a matching element. This representation has four big advantages: it can be preallocated, the order in which edits are discovered doesn't matter, it's easy to mutate during post-processing, and it's easy to generate other representations from it.

Diff algorithms are relatively complicated by themselves, but they pale in comparison to what's necessary to provide a high-quality diff library.
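The side-by-side representation described above can be sketched as follows. This is a minimal illustration, not the library's actual internals: the `pair` type, the `none` sentinel, and the `render` helper are all hypothetical names chosen for this sketch.

```go
package main

import "fmt"

// none is a hypothetical sentinel marking a gap on one side of the
// side-by-side representation.
const none = -1

// pair aligns one position of the left input with one position of the
// right input:
//   (i, none) -> deletion of left[i]
//   (none, j) -> insertion of right[j]
//   (i, j)    -> left[i] matches right[j]
type pair struct{ left, right int }

// render turns the side-by-side pairs into unified-diff-style lines,
// showing how easy it is to derive other representations.
func render(edits []pair, left, right []string) []string {
	var out []string
	for _, e := range edits {
		switch {
		case e.left == none:
			out = append(out, "+"+right[e.right])
		case e.right == none:
			out = append(out, "-"+left[e.left])
		default:
			out = append(out, " "+left[e.left])
		}
	}
	return out
}

func main() {
	left := []string{"a", "b", "c"}
	right := []string{"a", "c", "d"}
	// Because each pair stands alone, edits can be recorded in any order
	// and slid around during post-processing without bookkeeping.
	edits := []pair{{0, 0}, {1, none}, {2, 1}, {none, 2}}
	for _, l := range render(edits, left, right) {
		fmt.Println(l)
	}
}
```

Preallocation is straightforward because the number of pairs is bounded by the sum of the input lengths.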
This article tries to explain what went into my new diff library, but there's still more that I haven't implemented yet.

Footnotes:

- Here is one real-world example of why worst-case scenarios are important: imagine you're breaking an existing feature in a way that triggers a worst-case scenario in a test. If the test is running for a very long time or runs out of memory, you're going to have to debug two problems instead of one.
- See benchmark_comparison.txt for the source of these ratings.
- No support for arbitrary sequences.
- No support for unified diffs.
- The diffmatchpatch API is very hard to use.
- No support for structured results.
- Quadratic memory use; for my test cases, this resulted in >30 GB of memory used.
- The mb0 API is from before generics and is a bit cumbersome to use.
- udiff has a very low threshold for when it stops searching for an optimal solution. This improves the speed, but it also results in relatively large diffs.
- There's no single patience diff heuristic; instead, there are different implementations with different performance characteristics.
- Stolen from https://blog.jcoglan.com/2017/03/22/myers-diff-in-linear-space-theory/
- I can recommend https://blog.robertelder.org/diff-algorithm/ and this 5-part series: https://blog.jcoglan.com/2017/02/12/the-myers-diff-algorithm-part-1/
- Myers, E.W. An O(ND) difference algorithm and its variations. Algorithmica 1, 251-266 (1986). https://doi.org/10.1007/BF01840446

Florian Zenker 3 months ago

Using Automated Refactoring Tools in Anger

Go supports functions as a first-class concept, and it's pretty common to pass a function to inject custom behavior. This can have a performance impact. In the ideal case, like in the example below, the compiler can inline everything, resulting in optimal performance: the loop is inlined into the caller, as is the function we're passing in as a parameter, and the resulting loop body is free of function calls (no CALL instruction).

However, in many cases, where the function that's called can't be inlined because it's too complex (or if we forbid the compiler to inline it), Go needs to make an indirect function call from within the loop. Let's forbid the compiler from inlining and compare the generated assembly code: in this case, the loop is not inlined, and we clearly see an indirect function call. The CALL instruction is the function call, and it's indirect because it's using a register as an argument instead of a label.

In many cases, the difference between the two is unnoticeable. It would definitely not be the right decision to always inline this function, and if the function is truly hot, Go may still inline it when using profile-guided optimization.

I ran into this while investigating optimization opportunities for znkr.io/diff. I had started out with a generic implementation of the diffing algorithm that works for any type, taking a comparison function in its signature. However, the implementation of diffing algorithms is rather complex, and so that function was never inlined. After a number of rounds of optimizations, I ended up with a changed API and an implementation that collapsed all diffs to running the algorithm on an integer type¹. The problem was that this still used the underlying generic algorithm and supplied a comparison function that was never inlined. My hypothesis was that I could improve the performance by making sure that this function was inlined. However, I didn't know how to do that.
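The pattern can be reconstructed roughly like this. The `sum` and `double` names are hypothetical stand-ins for the post's elided example; `//go:noinline` is the standard compiler directive for forbidding inlining, and the generated assembly can be inspected with `go build -gcflags=-S`.

```go
package main

import "fmt"

// sum applies f to every element. When both sum and f are inlined at the
// call site, the loop body compiles down to straight-line code with no
// CALL instruction.
func sum(xs []int, f func(int) int) int {
	total := 0
	for _, x := range xs {
		total += f(x)
	}
	return total
}

// double is trivially inlinable.
func double(x int) int { return 2 * x }

// hardDouble is the same function, but the directive below forbids the
// compiler from inlining it, forcing an indirect CALL through a register
// inside sum's loop.
//
//go:noinline
func hardDouble(x int) int { return 2 * x }

func main() {
	xs := []int{1, 2, 3}
	fmt.Println(sum(xs, double))     // fully inlinable path
	fmt.Println(sum(xs, hardDouble)) // same result, but via an indirect call
}
```

Both calls compute the same value; only the generated code differs, which is exactly why the difference is easy to miss until a hot loop makes it visible.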
I tried to validate the hypothesis with a simple hack that duplicated the implementation and specialized it by hand. To my surprise, that hack resulted in a runtime reduction of up to -40%. So clearly, that's a worthwhile optimization!

Unfortunately, the diff implementation is a bit too complicated to be a good example for this blog post. Instead, let's use a very simple (but inefficient) Quick Sort implementation (which incidentally has a similar recursive structure to the diff implementation I am using). Manually specializing this function for integers is straightforward. Comparing the runtime of the two versions using a simple benchmark (sorting slices with 100 random numbers) on my M1 MacBook Pro also shows a runtime improvement of -40%.

I tried a number of ideas but found no way to ensure the passed-in function was inlined. The alternative of maintaining two copies of the same algorithm didn't sound very appealing. It felt like I had to either reduce the API surface by removing the customization option, take a performance hit to keep both, or maintain two versions of the same algorithm. I really wished for specialization support that would allow me to write the algorithm once and change aspects of it for specific types. I liked none of these options, and I was despairing about which one to pick when it hit me: I could implement specialization myself!

Go has excellent support for refactoring Go code thanks to the go/ast package. We can use it to build a code generator that performs the manual specialization described above automatically. This is overkill for a simple function like Quick Sort, but keep in mind that this is a simple example by construction. The same principle can be applied to much more complicated functions or even, as in the case that triggered me to do this, whole types with multiple functions. It's also fairly trivial to hook the specialization into a unit test to validate that the specialized version matches the output of the generator, and to regenerate the specialized version when it doesn't.
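The Quick Sort example might look roughly like this; the exact code from the post is lost, so this is a reconstruction of the idea, not the original. The generic version takes the comparison as a function parameter; the specialized copy replaces it with a direct `<` that the compiler folds into the loop.

```go
package main

import "fmt"

// quickSort is the generic version: the comparison is an indirect call
// unless the compiler manages to inline less all the way down.
func quickSort[T any](xs []T, less func(a, b T) bool) {
	if len(xs) < 2 {
		return
	}
	p := partition(xs, less)
	quickSort(xs[:p], less)
	quickSort(xs[p+1:], less)
}

// partition is a Lomuto partition around the last element.
func partition[T any](xs []T, less func(a, b T) bool) int {
	pivot := xs[len(xs)-1]
	i := 0
	for j := 0; j < len(xs)-1; j++ {
		if less(xs[j], pivot) {
			xs[i], xs[j] = xs[j], xs[i]
			i++
		}
	}
	xs[i], xs[len(xs)-1] = xs[len(xs)-1], xs[i]
	return i
}

// quickSortInt is the hand-specialized copy: identical structure, but the
// comparison is a plain <, so there is no function call in the loop.
func quickSortInt(xs []int) {
	if len(xs) < 2 {
		return
	}
	pivot := xs[len(xs)-1]
	i := 0
	for j := 0; j < len(xs)-1; j++ {
		if xs[j] < pivot {
			xs[i], xs[j] = xs[j], xs[i]
			i++
		}
	}
	xs[i], xs[len(xs)-1] = xs[len(xs)-1], xs[i]
	quickSortInt(xs[:i])
	quickSortInt(xs[i+1:])
}

func main() {
	xs := []int{3, 1, 4, 1, 5}
	quickSort(xs, func(a, b int) bool { return a < b })
	ys := []int{3, 1, 4, 1, 5}
	quickSortInt(ys)
	fmt.Println(xs, ys)
}
```

A code generator built on go/ast essentially produces `quickSortInt` from `quickSort` mechanically: it clones the declarations, substitutes the type parameter, and replaces calls to the comparison function with the inlined expression.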
The full version of what I ended up using for the diffing algorithm is here. I found a way to specialize functionality that doesn't require manual maintenance. Is it a good idea, though? I don't really know, but I can drop the specialized version and take the performance hit with only a few lines of code. If this turns out to be a bad idea, or if I or someone else finds a better solution, it's quite easily removed or replaced.

¹ The details don't matter here, but the idea is to assign a unique integer to every line in both inputs. This happens almost naturally when reducing the problem size by removing every line unique to either input. Both together significantly speed up a number of diffs.
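The footnote's idea of assigning a unique integer to every line can be sketched as follows. This is a minimal illustration of line interning, not the library's actual code; the `intern` name is invented for this sketch.

```go
package main

import "fmt"

// intern maps every distinct line to a unique small integer, so the core
// diff algorithm only ever compares ints instead of strings. Lines that
// are equal in either input get the same id.
func intern(left, right []string) (l, r []int) {
	ids := make(map[string]int)
	id := func(s string) int {
		n, ok := ids[s]
		if !ok {
			n = len(ids)
			ids[s] = n
		}
		return n
	}
	l = make([]int, len(left))
	for i, s := range left {
		l[i] = id(s)
	}
	r = make([]int, len(right))
	for i, s := range right {
		r[i] = id(s)
	}
	return l, r
}

func main() {
	l, r := intern([]string{"a", "b", "a"}, []string{"b", "c"})
	fmt.Println(l, r) // [0 1 0] [1 2]
}
```

While building the map, it's also cheap to count occurrences per side, which is what makes removing lines unique to either input fall out almost for free.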

Florian Zenker 1 year ago

Point-to-Point Routing

When I upgraded my NAS to have a 10G network interface, I didn't have a 10G switch in my network. To still have a fast 10G connection, I installed a direct link between my desk and my TrueNAS Scale box. That link existed in parallel to my regular 1G LAN link. Let's say we use one subnet for the LAN and a separate one for the point-to-point link¹.

An easy way to use the 10G link would be to always use my NAS's IP address on the point-to-point subnet when accessing my NAS. That's a little inconvenient though; I would prefer to use the hostname, which resolves to the LAN address. If my computer were stationary and always connected to the two links, I could simply override the hostname (e.g., by editing /etc/hosts). However, my computer is a MacBook, and I use it from the couch as well as from my desk. When I am at my desk, I use a Thunderbolt dock that's connected to all my peripherals and the two network links. In that case, I want NAS connections to use the 10G link. But when I am using it from anywhere else, I want to use my Wifi link or VPN. This is not an impossible setup; IP routing is very powerful, and my intuition was that it should be possible to use the 10G link transparently using IP routing. However, I did not find a guide describing how to set this up and needed to piece it together.

All IP routing works by sending packets intended for one IP address to a different IP address without actually changing the destination IP address in the IP packet. On Ethernet links, that usually means sending these packets to the next hop's MAC address. The receiver of such a packet then decides what to do with it. A router will usually forward the packet to a different router until it reaches its destination. The most common case is to just send everything that's not on any local subnet to a default gateway. In home networks, that's often the router that connects the LAN to the internet and forwards the packets to the ISP's router.
For point-to-point routing, we add a static entry into the routing table on both ends of the point-to-point link that routes all packets intended for the other end's LAN address to its point-to-point address instead. That is, my MacBook, when connected to the 10G link, should send all IP packets intended for the NAS's LAN address to the NAS's point-to-point address, and my NAS should do the same in reverse for my MacBook. This is enough because both Linux and macOS implement the weak host model: they don't care at which interface packets arrive. If a packet arrives at one interface but is addressed to an IP address configured on a different interface, both Linux and macOS will still accept it.

On TrueNAS, adding a static route is straightforward: specifying the MacBook's LAN address as the destination and its point-to-point address as the gateway is all that's necessary. The destination uses a /32 suffix, which is CIDR for "only this one address", so only packets intended for that IP address are routed over the point-to-point link.

The setup on macOS requires some command-line shenanigans though. First, we need to figure out the name of the 10G device. The simplest way I found was to list the hardware ports on the command line and compare them with System Settings. In my case, "Thunderbolt Ethernet Slot 0" is the 10G point-to-point link and "USB 10/100/1000 LAN" is the 1G LAN link. Then we can set a static route on the "Thunderbolt Ethernet Slot 0" device. Here, instead of the CIDR notation used by TrueNAS, a subnet mask of 255.255.255.255 makes sure to select only the single IP address to route over the point-to-point link.

Neither of the two systems allows setting the source IP address. That's a bummer, because without that, connections using the 10G link use the point-to-point IP address instead of the LAN address. However, I haven't run into any issues because of that yet.

¹ I know I could be using a smaller subnet here, but those are a bit special and I didn't want to run into any issues. It's unlikely that I'll ever be using more than one point-to-point link, so the subnet is already more for documentation than to save IP address resources.
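On the macOS side, the commands might look roughly like this. The IP addresses are hypothetical placeholders (the post's real addresses are lost): assume the NAS's LAN address is 192.168.1.10 and its point-to-point address is 192.168.2.2.

```shell
# List hardware ports to map device names (en0, en5, ...) to the names
# shown in System Settings, e.g. "Thunderbolt Ethernet Slot 0":
networksetup -listallhardwareports

# Add a static route on that service: destination, host mask, gateway.
# The 255.255.255.255 mask selects exactly one address, like /32 in CIDR.
sudo networksetup -setadditionalroutes "Thunderbolt Ethernet Slot 0" \
    192.168.1.10 255.255.255.255 192.168.2.2
```

Note that `-setadditionalroutes` replaces the whole list of additional routes for that service, so re-running it with different arguments overwrites earlier entries.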

Florian Zenker 1 year ago

TrueNAS ACME using deSEC.io

TrueNAS can secure connections to its WebUI using ACME-provisioned TLS certificates. Because I don't want to expose TrueNAS to the internet, the best possible way to get a certificate is to use ACME DNS-01. Unfortunately, TrueNAS only supports Cloudflare and AWS Route 53 out of the box. For other DNS providers, a shell script can be used. I am using deSEC.io, and so I needed to write such a shell script. I struggled quite a bit with this because there's not really any documentation out there about how to do it.

When TrueNAS tries to get a certificate for a domain, it calls the shell script twice: once to set the DNS-01 challenge record, and once more, after the challenge has either been solved or failed, to remove the challenge record from DNS. It doesn't attempt to find the zone for you; instead, it gives you the full domain name rather than the zone it belongs to. That's annoying, because for deSEC I need the zone. I decided to hard-code the zone into the script to avoid anything more complicated than a few direct calls to the deSEC API. It's not the most robust piece of software, but it does its job.
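A sketch of such a script could look like this. This is not the post's actual script: the call shape (`set`/`unset` plus full domain and TXT value), the `DESEC_TOKEN` environment variable, and the example zone are all assumptions; the deSEC endpoints (`PUT`/`DELETE` on `/api/v1/domains/<zone>/rrsets/`) are from deSEC's public REST API.

```shell
#!/bin/sh
# Hypothetical TrueNAS ACME DNS-01 shell script for deSEC.io.
ZONE="example.com"   # hard-coded zone, as described in the post

# TrueNAS hands us the full domain name; strip the hard-coded zone
# suffix to get the deSEC subname.
subname() {
    printf '%s' "${1%.$ZONE}"
}

# Create/replace the _acme-challenge TXT record for the given domain.
set_challenge() {
    curl -sf -X PUT "https://desec.io/api/v1/domains/$ZONE/rrsets/" \
        -H "Authorization: Token $DESEC_TOKEN" \
        -H "Content-Type: application/json" \
        -d "[{\"subname\":\"_acme-challenge.$(subname "$1")\",\"type\":\"TXT\",\"ttl\":3600,\"records\":[\"\\\"$2\\\"\"]}]"
}

# Delete the challenge record again once validation is done.
unset_challenge() {
    curl -sf -X DELETE \
        "https://desec.io/api/v1/domains/$ZONE/rrsets/_acme-challenge.$(subname "$1")/TXT/" \
        -H "Authorization: Token $DESEC_TOKEN"
}

case "$1" in
    set)   set_challenge "$2" "$3" ;;
    unset) unset_challenge "$2" "$3" ;;
esac
```

Note that TXT record contents must be wrapped in inner double quotes for deSEC, hence the escaped quoting in the JSON payload.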
