GreatReads - Blog Aggregator · Phoenix Framework

Posts in Julia (18 found)

Media Diet

📺 Wondla — 10/10 kids show. I was way into it. Post-apoc situation with underground bunkers (apparently Apple loves that theme) where when the protagonist girl busts out of it, the world is quite different. The premise and payoff in Season 1 was better than the commentary vibe of Season 2, but I liked it all. Apparently there is one more season coming . 🎥 Downton Abbey: The Grand Finale — The darkest of the three movies? Weird. I love spending time in this world though so I was happy to be there. But honestly I was coming off a couple of day beers when I saw it in the theater and it put me in a weird mood and I should probably watch it again normally. How to proper movie critics review movies without their random current moods affecting the review?! 📕 Annie Bot — Sierra Greer is like, what if we turned AI into sex bots? Which honestly feels about 7 minutes away at this point. I’m only like half through it and it’s kinda sexy in that 50-shades kinda way where there is obviously some dark shit coming. 📔 Impossible People — Binge-able graphic novel by Julia Wertz about a redemption arc out of addiction. I’m an absolute sucker for addiction stories. This is very vulnerable and endearing. Like I could imagine having a very complicated friendship with Julia. It doesn’t go down to the absolute bottom of the well like in books like A Million Little Pieces or The Book of Drugs , so I’d say it’s a bit safer for you if you find stuff like that too gut wrenching.

Books Julia

Entertainment

0 views

DYNOMIGHT 6 months ago

DumPy: NumPy except it’s OK if you’re dum

What I want from an array language is: I say NumPy misses on three of these. So I’d like to propose a “fix” that—I claim—eliminates 90% of unnecessary thinking, with no loss of power. It would also fix all the things based on NumPy, for example every machine learning library. I know that sounds grandiose. Quite possibly you’re thinking that good-old dynomight has finally lost it. So I warn you now: My solution is utterly non-clever. If anything is clever here, it’s my single-minded rejection of cleverness. To motivate the fix, let me give my story for how NumPy went wrong. It started as a nice little library for array operations and linear algebra. When everything has two or fewer dimensions, it’s great. But at some point, someone showed up with some higher-dimensional arrays. If loops were fast in Python, NumPy would have said, “Hello person with ≥3 dimensions, please call my ≤2 dimensional functions in a loop so I can stay nice and simple, xox, NumPy.” But since loops are slow, NumPy instead took all the complexity that would usually be addressed with loops and pushed it down into individual functions. I think this was a disaster, because every time you see some function call like , you have to think: Different functions have different rules. Sometimes they’re bewildering. This means constantly thinking and constantly moving dimensions around to appease the whims of particular functions. It’s the functions that should be appeasing your whims! Even simple-looking things like or do quite different things depending on the starting shapes. And those starting shapes are often themselves the output of previous functions, so the complexity spirals. Worst of all, if you write a new ≤2 dimensional function, then high-dimensional arrays are your problem. You need to decide what rules to obey, and then you need to re-write your function in a much more complex way to— Voice from the back : Python sucks! If you used a real language, loops would be fast! This problem is stupid! That was a strong argument, ten years ago. But now everything is GPU, and GPUs hate loops. Today, array packages are cheerful interfaces that look like Python (or whatever) but are actually embedded languages that secretly compile everything into special GPU instructions that run on whole arrays in parallel. With big arrays, you need GPUs. So I think the speed of the host language doesn’t matter so much anymore. Python’s slowness may have paradoxically turned out to be an advantage , since it forced everything to be designed to work without loops even before GPUs took over. Still, thinking is bad, and NumPy makes me think, so I don’t like NumPy . Here’s my extremely non-clever idea: Let’s just admit that loops were better. In high dimensions, no one has yet come up with a notation that beats loops and indices. So, let’s do this: That’s basically the whole idea. If you take those three bullet-points, you could probably re-derive everything I do below. I told you this wasn’t clever. Suppose that and are 2D arrays, and is a 4D array. And suppose you want to find a 2D array such that . If you could write loops, this would be easy: That’s not pretty. It’s not short or fast. But it is easy! Meanwhile, how do you do this efficiently in NumPy? Like this: If you’re not a NumPy otaku, that may look like outsider art. Rest assured, it looks like that to me too, and I just wrote it. Why is it so confusing? At a high level, it’s because and and multiplication ( ) have complicated rules and weren’t designed to work together to solve this particular problem nicely. That would be impossible, because there are an infinite number of problems. So you need to mash the arrays around a lot to make those functions happy. Without further ado, here’s how you solve this problem with DumPy (ostensibly D ynomight N umPy ): Yes! If you prefer, you can also use this equivalent syntax: Those are both fully vectorized. No loops are executed behind the scenes. They’ll run on a GPU if you have one. While it looks magical, the way this actually works is fairly simple: If you index a DumPy array with a string (or a object), it creates a special “mapped” array that pretends to have fewer dimensions. When a DumPy function is called (e.g. or (called with )), it checks if any of the arguments have mapped dimensions. If so, it automatically vectorizes the computation, matching up mapped dimensions that share labels. When you assign an array with mapped dimensions to a , it “unmaps” them into the positions you specify. No evil meta-programming abstract syntax tree macro bytecode interception is needed. When you run this code: This is what happens behind the scenes: It might seem like I’ve skipped the hard part. How does know how to vectorize over any combination of input dimensions? Don’t I need to do that for every single function that DumPy includes? Isn’t that hard? It is hard, but did it already. This takes a function defined using ( JAX ’s version of) NumPy and vectorizes it over any set of input dimensions. DumPy relies on this to do all the actual vectorization. (If you prefer your janky and broken, I heartily recommend PyTorch’s .) But hold on. If already exists, then why do we need DumPy? Here’s why: That’s how you solve the same problem with . (And basically what DumPy does behind the scenes.) I think is one of the best parts of the NumPy ecosystem. The above code seems genuinely better than the base NumPy version. But it still involves a lot of thinking! Why put in the inner and in the outer one? Why are all the axes even though you need to vectorize over the second dimension of ? There are answers, but they require thinking. Loops and indices are better. OK, I did do one thing that’s a little clever. Say you want to create a Hilbert matrix with . In base NumPy you’d have to do this: In DumPy, you can just write: Yes! That works! It works because a acts both like a string and like an array mapped along that string. So the above code is roughly equivalent to: In reality, the choose random strings. (The class maintains a stack of active ranges to prevent collisions.) So in more detail, the above code becomes something like this: To test if DumPy is actually better in practice, I took six problems of increasing complexity and implemented each of them using loops, NumPy, JAX (with ), and DumPy. Note that in these examples, I always assume the input arrays are in the class of the system being used. If you try running them, you’ll need to add some conversions with / / . (Pretending doesn’t exist.) The goal is to create with The goal of this problem is, given a list of vectors and a list of Gaussians parameters, and arrays mapping each vector to a list of parameters, evaluate each corresponding vector/parameter combination. Formally, given 2D , , , and and 3D , the goal is to create with See also the discussion in the previous post . I gave each implementation a subjective “goodness” score on a 1-10 scale. I always gave the best implementation for each problem 10 points, and then took off points from the others based on how much thinking they required. According to this dubious methodology and these made-up numbers, DumPy is 96.93877% as good as loops! Knowledge is power! But seriously, while subjective, I don’t think my scores should be too controversial. The most debatable one is probably JAX’s attention score. The only thing DumPy adds to NumPy is some nice notation for indices. That’s it. What I think makes DumPy good is it also removes a lot of stuff. Roughly speaking, I’ve tried to remove anything that is confusing and exists because NumPy doesn’t have loops. I’m not sure that I’ve drawn the line in exactly the right place, but I do feel confident that I’m on the right track with removing stuff. In NumPy, works if and are both scalar. Or if is and is . But not if is and is . Huh? In truth, the broadcasting rules aren’t that complicated for scalar operations like multiplication. But still, I don’t like it, because every time you see , you have to worry about what shapes those have and what the computation might be doing. So, I removed it. In DumPy you can only do if one of or is scalar or and have exactly the same shape. That’s it, anything else raises an error. Instead, use indices, so it’s clear what you’re doing. Instead of this: write this: Indexing in NumPy is absurdly complicated . When you write that could do many different things depending on what all the shapes are. I considered going cold-turkey and only allowing scalar indices in DumPy. That wouldn’t have been so bad, since you can still do advanced stuff using loops. But it’s quite annoying to not be able to write when and are just simple 1D arrays. So I’ve tentatively decided to be more pragmatic. In DumPy, you can index with integers, or slices, or (possibly mapped) s. But only one index can be non-scalar . I settled on this because it’s the most general syntax that doesn’t require thinking. Let me show you what I mean. If you see this: It’s “obvious” what the output shape will be. (First the shape of , then the shape of , then the shape of ). Simple enough. But as soon as you have two multidimensional array inputs like this: Suddenly all hell breaks loose. You need to think about broadcasting between and , orthogonal vs. pointwise indices, slices behaving differently than arrays, and quirks for where the output dimensions go. So DumPy forbids this. Instead, you need to write one of these: They all do exactly what they look like they do. Oh, and one more thing! In DumPy, you must index all dimensions . In NumPy, if has three dimensions, then is equivalent to . This is sometimes nice, but it means that every time you see , you have to worry about how many dimensions has. In DumPy, every time you index an array or assign to a , it checks that all indices have been included. So when you see option (4) above, you know that: Always, always, always . No cases, no thinking. Again, many NumPy functions have complex conventions for vectorization. sort of says, “If the inputs have ≤2 dimensions, do the obvious thing. Otherwise, do some extremely confusing broadcasting stuff.” DumPy removes the confusing broadcasting stuff. When you see , you know that and have no more than two dimensions, so nothing tricky is happening. Similarly, in NumPy, is equivalent to . When both inputs have ≤2 or fewer dimensions, this does the “obvious thing”. (Either an inner-product or some kind of matrix/vector multiplication.) Otherwise, it broadcasts or vectorizes or something? I can never remember. In DumPy you don’t have that problem, because it restricts to arrays with one or two dimensions only. If you need more dimensions, no problem: Use indices. It might seem annoying to remove features, but I’m telling you: Just try it . If you program this way, a wonderful feeling of calmness comes over you, as class after class of possible errors disappear. Put another way, why remove all the fancy stuff, instead of leaving it optional? Because optional implies thinking! I want to program in a simple way. I don’t want to worry that I’m accidentally triggering some confusing broadcasting insanity, because that would be a mistake. I want the computer to help me catch mistakes, not silently do something weird that I didn’t intend. In principle, it would be OK if there was a method that preserves all the confusing batching stuff. If you really want that, you can make it yourself: You can use that same wrapper to convert any JAX NumPy function to work with DumPy. Think about math: In two or fewer dimensions, coordinate-free linear algebra notation is wonderful. But for higher dimensional tensors , there are just too many cases, so most physicists just use coordinates. So this solution seems pretty obvious to me. Honestly, I’m a little confused why it isn’t already standard. Am I missing something? When I complain about NumPy, many people often suggest looking into APL -type languages, like A, J, K, or Q. (All single-letter languages are APL-like, except C, D, F, R, T, X, and many others. Convenient, right?) The obvious disadvantages of these are that: None of those bother me. If the languages are better, we should learn to use them and make them do autodiff on GPUs. But I’m not convinced they are better. When you actually learn these languages, what you figure out is that the symbol gibberish basically amounts to doing the same kind of dimension mashing that we saw earlier in NumPy: The reason is that, just like NumPy and , these languages choose align dimensions by position , rather than by name. If I have to mash dimensions, I want to use the best tool. But I’d prefer not to mash dimensions at all. People also often suggest “NumPy with named dimensions” as in xarray . (PyTorch also has a half-hearted implementation .) Of course, DumPy also uses named dimensions, but there’s a critical difference. In xarray, they’re part of the arrays themselves, while in DumPy, they live outside the arrays. In some cases, permanent named dimensions are very nice. But for linear algebra, they’re confusing. For example, suppose is 2-D with named dimensions and . Now, what dimensions should have? ( twice?) Or say you take a singular value decomposition like . What name should the inner dimensions have? Does the user have to specify it? I haven’t seen a nice solution. xarray doesn’t focus on linear algebra, so it’s not much of an issue there. A theoretical “DumPy with permanent names” might be very nice, but I’m not sure how it should work. This is worth thinking about more. I like Julia ! Loops are fast in Julia! But again, I don’t think fast loops matter that much, because I want to move all the loops to the GPU. So even if I was using Julia, I think I’d want to use a DumPy-type solution. I think Julia might well be a better host language than Python, but it wouldn’t be because of fast loops, but because it offers much more powerful meta-programming capabilities. I built DumPy on top of JAX just because JAX is very mature and good at calling the GPU, but I’d love to see the same idea used in Julia (“Dulia”?) or other languages. OK, I promised a link to my prototype, so here it is: It’s just a single file with around 700 lines. I’m leaving it as a single file because I want to stress that this is just something I hacked together in the service of this rant . I wanted to show that I’m not totally out of my mind, and that doing all this is actually pretty easy. I stress that I don’t really intend to update or improve this. (Unless someone gives me a lot of money?) So please do not attempt to use it for “real work”, and do not make fun of my code. PS. DumPy works out of the box with both and . For gradients, you need to either cast the output to a JAX scalar or use the wrapper. PPS. If you like this, you may also like einx or torchdim . Update : Due to many requests, I have turned this into a “real” package, available on PyPi as . You can install it by typing: Or, if you use uv (you should) you can play around with DumPy by just typing this one-liner in your terminal: For example: Don’t make me think. Run fast on GPUs. Really, do not make me think. OK, what shapes do all those arrays have? And what does do when it sees those shapes? Bring back the syntax of loops and indices. But don’t actually execute the loops. Just take the syntax and secretly compile it into vectorized operations. Also, let’s get rid of all the insanity that’s been added to NumPy because loops were slow. If you index a DumPy array with a string (or a object), it creates a special “mapped” array that pretends to have fewer dimensions. When a DumPy function is called (e.g. or (called with )), it checks if any of the arguments have mapped dimensions. If so, it automatically vectorizes the computation, matching up mapped dimensions that share labels. When you assign an array with mapped dimensions to a , it “unmaps” them into the positions you specify. has 4 dimensions has 2 dimensions has 1 dimension has 4 dimensions They’re unfamiliar. The code looks like gibberish. They don’t usually provide autodiff or GPU execution.

Programming

Python

Data Analysis Julia

0 views

fnands 7 months ago

A quick first look at GPU programming in Mojo

The day has finally arrived. Well actually, the day arrived in February, but who’s counting. The Mojo language has finally publicly released the ability to do GPU programming - if you have a reasonably modern NVIDIA GPU. Luckily for me, I have an RTX 3090, and although it isn’t officially supported , it is basically an A10, which is. Looking at some of the comments on the nightly releases, it does seem that AMD support is on the way as well. The Modular team publicly released the ability to do GPU programming in Mojo in release 25.1, with further support and documentation in release 25.2. Fun fact: release 25.2 also saw my first (tiny) contribution to the Mojo standard library. This is a really important step for Mojo, a language that bills itself as a language designed to solve a variety of AI development challenges, which in this day and age basically means programming an increasingly heterogeneous stack of hardware. Today this mostly means GPUs, but there is an explosion of new accelerators like the ones from Cerebras, Groq and SambaNova, not to mention the not-so-new TPU from Google. As DeepSeek showed the world recently: if you’re willing to put the work in, there is a lot more to be squeezed out of current-gen hardware than most people thought. Now, I don’t think every ML engineer or researcher should be looking for every possible way to get more out of their compute, but there are definitely some wins to be had. As an example, I’m really fascinated by the work of Tri Dao and his collaborators, who work on deeply hardware aware improvements in machine learning, e.g. FlashAttention , which is mathematically equivalent to the attention mechanism that powers all transformer models, but with hardware aware optimizations that take into account the cost of memory access in GPUs. This does make me wonder what other optimizations are out there to be discovered. This however is not easy, as the authors note in the “Limitations and Future Directions” section of the FlashAttention paper: Our current approach to building IO-aware implementations of attention requires writing a new CUDA kernel for each new attention implementation. This requires writing the attention algorithm in a considerably lower-level language than PyTorch, and requires significant engineering effort. Implementations may also not be transferrable across GPU architectures. These limitations suggest the need for a method that supports writing attention algorithms in a high-level language (e.g., PyTorch), and compiling to IO-aware implementations in CUDA What makes GPU programming in Mojo interesting is that you don’t need the CUDA toolkit to do so, and compiles down to PTX which you can think of as NVIDIA’s version of assembly. If Mojo (and Max in general) can make it easier to write GPU kernels in a more user-friendly language, it could be a game changer. If you want to get started, there is a guide for getting started with GPU programming in Mojo from Modular (the company behind Mojo), which I strongly recommend. I learn by doing, so I wanted to try to implement something with relatively simple using the GPU. The example idea I chose is to transform an RGB image to grayscale, which is an embarrassingly parallel problem without a lot of complexity. I was halfway through writing this post before I realized that there was already an example of how to do grayscale conversion in the Mojo repo, but oh well. I basically just start with what’s in the documentation, but I added another example that I did do myself. To start, let’s read in an image using mimage , an image processing library I am working on. The image is represented here as a rank three tensor with the dimensions being width, height and channels, and the data type is an unsigned 8-bit integer. In this case we have four channels: red, green, blue and alpha (transparency), the latter being 255 for all pixels. So what we want to do here is to sum together the RGB values for each pixel, using the weights , and for red, green and blue respectively. If you want to know why we are using these weights, read this article . Now that we have that, let’s define a simple version of the transform we want on CPU. So hopefully that worked! Let’s see if it’s correct. I haven’t implemented image saving in mimage yet, so let’s use the good old Python PIL library to save the image. Now that we have a working CPU implementation, let’s try to implement the same function on the GPU. But first, let’s check if Mojo can actually find my GPU: Now that we know that Mojo can find our GPU, let’s define the function that will do the actual conversion. This kernel reads a pixel from the input tensor, converts it to grayscale and writes the result to the output tensor. It is parallelized across the output tensor, which means that each thread is responsible for one pixel in the output tensor. As you can see, it takes in as parameters the layout specifications of the input and output tensors, the width and height of the image, and the input and output tensors themselves. Now, the first slightly awkward thing I had to do was convert the image from a , which is what is returned by , to a , which is the new tensor type that is compatible with GPU programming. I am assuming that will be deprecated in the future. With this new tensor type you can explicitly set which device the tensor should be allocated on. In this case I will allocate it to the CPU, i.e. the host device, and then copy over the data from the old tensor to the new one. Next, we have to move the tensor to the GPU. Now that was easy enough. The next step is to allocate the output grayscale tensor. As we don’t need to copy over the data from the old tensor, we can just allocate it on the GPU immediately. Next, we get the layout tensors for the input and output tensors. The documentation on LayoutTensor is a bit sparse, but it seems to be there to make it easy to reason about memory layouts. There seems to be two ways to use GPU functions in Mojo. The first is to use the function, which is what I do here. This compiles the gpu kernel into a function which can be called as normal. While this function is being executed on the GPU, the host device will wait until it is finished before moving on. Later in this post I will show the other option which allows the host device to do other things while waiting for the GPU. And that’s it! Let’s call the GPU function. Here I will device the image up into blocks of 32x32 pixels, and then call the function. I have to admit, I have no clue what the best practices are for choosing the block size, so if you know a good rule of thumb, please let me know. I wonder if there is a way to tune these parameters at compile time? Once that is run, we move the grayscale tensor back to the CPU and compare the results. and there we have it! We have successfully converted an image to grayscale using the GPU. Another example I wanted to try is downsampling an image. This is a bit more complex than the grayscale conversion, because we need to handle the different dimensions of the input and output tensors. First let’s define some test images to make sure the function is doing what we expect. If this works we should have a downsampled 8x8 image with the same values as the original image. Let’s start with a CPU implementation: So it works! This does make some assumptions about the input image, like that it is a multiple of the factor. But good enough for a blog post. Now let’s try to do the same on the GPU. We again define our output tensor on the GPU, get the layout tensor and move the data from the host device to the GPU. This time we will try the other way of using GPU functions: enqueing the function(s) to be executed on the GPU. This means the host device will not wait for the GPU to finish the function, but can do other things while the GPU is running. When we call the host device will wait for the GPU to finish all enqueued functions. This allows for some interesting things, like running the GPU function in parallel with some other code on the host device. This is can also be a little bit dangerous if you try to access the GPU memory from the host device while the GPU is still running. Let’s try it out: Again, it works! Let’s try it on our original image, and downsample it by a factor of 2 and 4. Let’s also do a CPU version for comparison, and define the output tensors on the GPU. Now we can call the GPU function. Notice how we can enqueue a second function while the first one is still running. As it does not depend on the first function to finish, it can potentially start running before the first function has finished. Now let’s verify the results: Great! We can save these and see what they look like: And as we can see, the images get progressively more blurry the more we downsample. This was my first quick look at GPU programming in Mojo. I feel the hardest thing is conceptually understanding how to properly divide the work between threads, and how to assign the correct numbers of threads, blocks and warps (which I didn’t even get into here). I guess the next move is to look up some guide on how to efficiently program GPUs, and to maybe try some more substantial examples. The documentation on GPU programming in Mojo is still a bit sparse, and there aren’t many examples out there in the wild to learn from, but I am sure that will change soon. The Moduar team did say they are releasing it unpolished so that they can gather some community feedback early. For someone who uses GPUs a lot in my day job, I never really interact with the GPUs at a low level; it’s always through PyTorch or JAX or some other layer of abstraction from Python. It’s quite fun to have such low level access to the hardware in a language that doesn’t feel that dissimilar from Python. I think this is really where I am starting to see the vision behind Mojo more clearly. I think the shallow take is that Mojo is a faster Python, or basically some ungodly hybrid between Python and Rust, but the more I play with it the more I feel it’s a language designed to make programming heterogenous hardware easier. I don’t think it will be the only language like this we’ll see, and I am curious to see if other languages based on MLIR will pop up soon, or if some existing languages will adapt. Maybe basing Julia 2.0 off MLIR instead of LLVM is a good next move for the language. You only need to look at the schematic off Apple silicon chips these days to see which way the wind is blowing: a significant fraction of the chip is dedicated to GPU cores. I think the days where having a GPU attached to your computer was only for specialists is going out the window, and we might pretty soon be able to safely assume that every modern computer will have at least a decent amount of GPU cores available for general purpose tasks, and not just graphics. Still, I doubt most programmers will ever have to worry about actually directly programming GPUs, but I am interested to see how libraries take advantage of this fact.

Programming Julia

Hardware

Python

0 views

A Room of My Own 11 months ago

Life Without Envy

Jealousy is always a mask for fear: fear that we aren’t able to get what we want; frustration that somebody else seems to be getting what is rightfully ours even if we are too frightened to reach for it. At its root, jealousy is a stingy emotion. It doesn’t allow for the abundance and multiplicity of the universe. Jealousy tells us there is room for only one—one poet, one painter, one whatever you dream of being. Julia Cameron, The Artist's Way I read Life Without Envy: Ego Management for Creative People by Camille DeAngelis a long time ago, but I never really processed its highlights. Usually, I read on my Kindle and manage my highlights through Readwise. I always have these ambitious plans to revisit my highlights and reflect on them, and while I do that occasionally, most of the time, they just sit there, waiting for the "right" moment. This particular book comes back to me often—not because it was especially brilliant or life-changing. I wouldn’t call it a great book, but it resonated in many ways. It put into words something I’d been feeling for a long time: the belief that my creative work isn’t worth anything until it’s publicly validated by someone else. Or as the author puts it: I just need to prove myself as soon as possible, and then I’ll be someone important. I’ve struggled with this, especially when it comes to blogging. I wrote about my hesitation before —how much time I spent questioning whether what I wrote had any value or whether people I know would read my posts and judge me. Whether I sound ridiculous. That fear held me back for a long time. What helped me, at least partially, was stepping outside of my comfort zone in other ways. I joined a critique group for fiction writing, where we read our work out loud and received feedback from others. It was terrifying at first—sharing something so personal and opening myself up to critique—but it taught me to detach a little from the fear of judgment. I learned to listen, take what was useful, and leave the rest. Eventually, I moved further outside my comfort zone by sending my manuscript for assessment. When the feedback came back, it was also incredibly helpful. That experience reinforced the importance of putting my work out there. Below, I’ve shared the highlights I took from this book (in italics), along with my own commentary. Life Without Envy: Ego Management for Creative People by Camille DeAngelis How many times did the hunger for approval win out over curiosity and imagination? We are made to feel that we must always be striving for more. A bigger house, more money, more success, because if you feel complete just as you are, then you’re no longer a cog in the system. So you see, you’re not special. And that is the one great and profound benediction underwriting your entire existence. The idea here is pretty simple: you’re not special—and that’s actually a good thing. Not being inherently "special" frees you from the pressure of living up to extraordinary expectations or comparisons. Instead, it emphasizes that the lack of special status allows you to simply be —to exist, learn, grow, and find meaning without the weight of proving uniqueness. When we let go of trying to be "special," you realize something beautiful: we’re all in this together. Life isn’t about being the best or standing out—it’s about living, growing, and connecting with others. The real gift of life is knowing your worth doesn’t depend on being unique or exceptional. You’re enough just as you are, and that’s more than okay—it’s freeing. When we talk about wanting to be “great,” we implicitly set ourselves above others. We see ourselves as chosen where others are not. We oftent think: I just need to prove myself as soon as possible, and then I’ll be someone important. The frantic desire to produce an amazing work of art as soon as possible so that everyone will hail your genius before any of your contemporaries can edge you out. If I can’t be the best, then I don’t deserve to be here. I’m not a “real” artist until I make my work public. Even commercially successful artists sometimes work under a scarcity mentality: there are only so many artists who can be taken seriously, and I am not one of them. Here’s the saddest part: if all along we’ve been creating from a place of lack, what might we be capable of if we drew from a full well? As the entrepreneur and business coach Marie Forleo says: “Envy is often a clue that there’s something latent in you that needs to be expressed.” If you keep wanting what someone else has, you can’t grow into everything you could be. When you hinge your perception of success or failure on how your work is received, you create your own misery. You may think, “But I don’t think I’m entitled. I know I have to earn it.” Yet if you look back over the past ten or twenty or thirty years, at the various ways in which you may have waited for your life to happen to you, you begin to see that this passivity has been an expectancy: entitlement in a softer guise. So, what can you do? DeAngelis concludes that we don’t a castle or a faraway destination. Step outside and soak in the natural world—whether it’s a quiet park, a beach, or even your backyard. Shake up your daily routine with something new, no matter how small. And most importantly, set the intention that something extraordinary can happen. Bliss is right here, waiting for us to notice it. Related Post: Recognizing the Scarcity Mentality I just need to prove myself as soon as possible, and then I’ll be someone important. The frantic desire to produce an amazing work of art as soon as possible so that everyone will hail your genius before any of your contemporaries can edge you out. If I can’t be the best, then I don’t deserve to be here. I’m not a “real” artist until I make my work public.

Writing Julia

Career

0 views

Julia Evans 1 years ago

Importing a frontend Javascript library without a build system

I like writing Javascript without a build system and for the millionth time yesterday I ran into a problem where I needed to figure out how to import a Javascript library in my code without using a build system, and it took FOREVER to figure out how to import it because the library’s setup instructions assume that you’re using a build system. Luckily at this point I’ve mostly learned how to navigate this situation and either successfully use the library or decide it’s too difficult and switch to a different library, so here’s the guide I wish I had to importing Javascript libraries years ago. I’m only going to talk about using Javacript libraries on the frontend, and only about how to use them in a no-build-system setup. In this post I’m going to talk about: There are 3 basic types of Javascript files a library can provide: I’m not sure if there’s a better name for the “classic” type but I’m just going to call it “classic”. Also there’s a type called “AMD” but I’m not sure how relevant it is in 2024. Now that we know the 3 types of files, let’s talk about how to figure out which of these the library actually provides! Every Javascript library has a build which it uploads to NPM. You might be thinking (like I did originally) – Julia! The whole POINT is that we’re not using Node to build our library! Why are we talking about NPM? But if you’re using a link from a CDN like https://cdnjs.cloudflare.com/ajax/libs/Chart.js/4.4.1/chart.umd.min.js , you’re still using the NPM build! All the files on the CDNs originally come from NPM. Because of this, I sometimes like to the library even if I’m not planning to use Node to build my library at all – I’ll just create a new temp folder, there, and then delete it when I’m done. I like being able to poke around in the files in the NPM build on my filesystem, because then I can be 100% sure that I’m seeing everything that the library is making available in its build and that the CDN isn’t hiding something from me. So let’s a few libraries and try to figure out what types of Javascript files they provide in their builds! First let’s look inside Chart.js , a plotting library. This library seems to have 3 basic options: option 1: . The suffix tells me that this is a CommonJS file , for using in Node. This means it’s impossible to use it directly in the browser without some kind of build step. option 2: . The suffix by itself doesn’t tell us what kind of file it is, but if I open it up, I see which is an immediate sign that this is an ES module – the syntax is ES module syntax. option 3: . “UMD” stands for “Universal Module Definition”, which I think means that you can use this file either with a basic , CommonJS, or some third thing called AMD that I don’t understand. When I was using Chart.js I picked Option 3. I just needed to add this to my code: and then I could use the library with the global environment variable. Couldn’t be easier. I just copied into my Git repository so that I didn’t have to worry about using NPM or the CDNs going down or anything. A lot of libraries will put their build in the directory, but not always! The build files’ location is specified in the library’s . For example here’s an excerpt from Chart.js’s . I think this is saying that if you want to use an ES Module ( ) you should use , but the jsDelivr and unpkg CDNs should use . I guess is for Node. ’s also says , which according to this documentation tells Node to treat files as ES modules by default. I think it doesn’t tell us specifically which files are ES modules and which ones aren’t but it does tell us that something in there is an ES module. is a library for logging into Bluesky with OAuth in the browser. Let’s see what kinds of Javascript files it provides in its build! It seems like the only plausible root file in here is , which looks something like this: This syntax means it’s an ES module . That means we can use it in the browser without a build step! Let’s see how to do that. Using an ES module isn’t an easy as just adding a . Instead, if the ES module has dependencies (like does) the steps are: The reason we need an import map instead of just doing something like is that internally the module has more import statements like , and we need to tell the browser where to get the code for and all of its other dependencies. Here’s what the importmap I used looks like for : Getting these import maps to work is pretty fiddly, I feel like there must be a tool to generate them automatically but I haven’t found one yet. It’s definitely possible to write a script that automatically generates the importmaps using esbuild’s metafile but I haven’t done that and maybe there’s a better way. I decided to set up importmaps yesterday to get github.com/jvns/bsky-oauth-example to work, so there’s some example code in that repo. Also someone pointed me to Simon Willison’s download-esm , which will download an ES module and rewrite the imports to point to the JS files directly so that you don’t need importmaps. I haven’t tried it yet but it seems like a great idea. I did run into some problems with using importmaps in the browser though – it needed to download dozens of Javascript files to load my site, and my webserver in development couldn’t keep up for some reason. I kept seeing files fail to load randomly and then had to reload the page and hope that they would succeed this time. It wasn’t an issue anymore when I deployed my site to production, so I guess it was a problem with my local dev environment. Also one slightly annoying thing about ES modules in general is that you need to be running a webserver to use them, I’m sure this is for a good reason but it’s easier when you can just open your file without starting a webserver. Because of the “too many files” thing I think actually using ES modules with importmaps in this way isn’t actually that appealing to me, but it’s good to know it’s possible. If the ES module doesn’t have dependencies then it’s even easier – you don’t need the importmaps! You can just: If you don’t want to use importmaps, you can also use a build system like esbuild . I talked about how to do that in Some notes on using esbuild , but this blog post is about ways to avoid build systems completely so I’m not going to talk about that option here. I do still like esbuild though and I think it’s a good option in this case. CanIUse says that importmaps are in “Baseline 2023: newly available across major browsers” so my sense is that in 2024 that’s still maybe a little bit too new? I think I would use importmaps for some fun experimental code that I only wanted like myself and 12 people to use, but if I wanted my code to be more widely usable I’d use instead. Let’s look at one final example library! This is a different Bluesky auth library than . Again, it seems like only real candidate file here is . But this is a different situation from the previous example library! Let’s take a look at : There’s a bunch of stuff like this in : This syntax is CommonJS syntax, which means that we can’t use this file in the browser at all, we need to use some kind of build step, and ESBuild won’t work either. Also in this library’s it says which is another way to tell it’s CommonJS. Originally I thought it was impossible to use CommonJS modules without learning a build system, but then someone Bluesky told me about esm.sh ! It’s a CDN that will translate anything into an ES Module. skypack.dev does something similar, I’m not sure what the difference is but one person mentioned that if one doesn’t work sometimes they’ll try the other one. For using it seems pretty simple, I just need to put this in my HTML: and then put this in . It seems to Just Work, which is cool! Of course this is still sort of using a build system – it’s just that esm.sh is running the build instead of me. My main concerns with this approach are: I also learned that you can also use to convert a CommonJS module into an ES module, though there are some limitations – the syntax doesn’t work. Here’s a github issue about that . I think the approach is probably more appealing to me than the approach because it’s a tool that I already have on my computer so I trust it more. I haven’t experimented with this much yet though. Here’s a summary of the three types of JS files you might encounter, options for how to use them, and how to identify them. Unhelpfully a or file extension could be any of these 3 options, so if the file is you need to do more detective work to figure out what you’re dealing with. The main difference between CommonJS modules and ES modules from my perspective is that ES modules are actually a standard. This makes me feel a lot more confident using them, because browsers commit to backwards compatibility for web standards forever – if I write some code using ES modules today, I can feel sure that it’ll still work the same way in 15 years. It also makes me feel better about using tooling like because even if the esbuild project dies, because it’s implementing a standard it feels likely that there will be another similar tool in the future that I can replace it with. A lot of the time when I talk about this stuff I get responses like “I hate javascript!!! it’s the worst!!!”. But my experience is that there are a lot of great tools for Javascript (I just learned about https://esm.sh yesterday which seems great! I love esbuild!), and that if I take the time to learn how things works I can take advantage of some of those tools and make my life a lot easier. So the goal of this post is definitely not to complain about Javascript, it’s to understand the landscape so I can use the tooling in a way that feels good to me. Here are some questions I still have, I’ll add the answers into the post if I learn the answer. Here’s a list of every tool we talked about in this post: Writing this post has made me think that even though I usually don’t want to have a build that I run every time I update the project, I might be willing to have a build step (using or something) that I run only once when setting up the project and never run again except maybe if I’m updating my dependency versions. Thanks to Marco Rogers who taught me a lot of the things in this post. I’ve probably made some mistakes in this post and I’d love to know what they are – let me know on Bluesky or Mastodon!

Julia

Frontend

JavaScript

Tutorial

HTML

0 views

Julia Evans 1 years ago

New microblog with TILs

I added a new section to this site a couple weeks ago called TIL (“today I learned”). One kind of thing I like to post on Mastodon/Bluesky is “hey, here’s a cool thing”, like the great SQLite repl litecli , or the fact that cross compiling in Go Just Works and it’s amazing, or cryptographic right answers , or this great diff tool . Usually I don’t want to write a whole blog post about those things because I really don’t have much more to say than “hey this is useful!” It started to bother me that I didn’t have anywhere to put those things: for example recently I wanted to use diffdiff and I just could not remember what it was called. So I quickly made a new folder called /til/ , added some custom styling (I wanted to style the posts to look a little bit like a tweet), made a little Rake task to help me create new posts quickly ( ), and set up a separate RSS Feed for it. I think this new section of the blog might be more for myself than anything, now when I forget the link to Cryptographic Right Answers I can hopefully look it up on the TIL page. (you might think “julia, why not use bookmarks??” but I have been failing to use bookmarks for my whole life and I don’t see that changing ever, putting things in public is for whatever reason much easier for me) So far it’s been working, often I can actually just make a quick post in 2 minutes which was the goal. My page is inspired by Simon Willison’s great TIL blog , though my TIL posts are a lot shorter. This came about because I spent a lot of time on Twitter, so I’ve been thinking about what I want to do about all of my tweets. I keep reading the advice to “POSSE” (“post on your own site, syndicate elsewhere”), and while I find the idea appealing in principle, for me part of the appeal of social media is that it’s a little bit ephemeral. I can post polls or questions or observations or jokes and then they can just kind of fade away as they become less relevant. I find it a lot easier to identify specific categories of things that I actually want to have on a Real Website That I Own: and then let everything else be kind of ephemeral. I really believe in the advice to make email lists though – the first two (blog posts & comics) both have email lists and RSS feeds that people can subscribe to if they want. I might add a quick summary of any TIL posts from that week to the “blog posts from this week” mailing list.

Web Development Julia

Programming

0 views

blog.philz.dev 1 years ago

Safari Top, Part 2

I posted recently about getting the top memory-using tabs from Safari. This is the sort of pickle you get into if you're using a laptop with only 8GB of RAM. There are two problems: (1) how to map tabs to process ids and (2) how to get the memory usage of the underlying processes. Once you enable AppleScript works well enough to get the mapping of tabs to process ids, but, crucially for the second problem, was underreporting memory usage. For example, Claude is reportedly using 1GB of memory, but is reporting just 1MB. This led me down a rabbit hole of finding the command, and seeing memory usage more in the 1GB ballpark. I learned about from Julia Evans' blog post and went on a little bit of a detour to try to replicate it. It turns out that to get a "mach port" you need several "Entitlements" like and . So, you make the Rust work, figure out how to and, voila, it still doesn't work . Safari is protected by System Integrity Protection and doesn't allow you to open a mach port to it. So, back at square two, we find out about , find the header files in , and use Python's package. The field seems to match with Activity Monitor says. (The documentation is sparse, and I haven't delved deeper.) The reason to use Python rather than compiling a binary is to avoid a compile or installation step. So, here's the result: Here's the Python code : This time, I converted the AppleScript into "Javascript for Automation" (JXA), and learned that the Script Editor app has an "Open Dictionary" feature which lets you browse what's possible. If you find out how Activity Monitor actually gets the pids of the tabs, let me know!

JavaScript Julia

Python

Programming

0 views

Lambda Land 1 years ago

My Top Emacs Packages

If you ask anyone what the best Emacs packages are, you’ll almost definitely hear Magit (the only Git porcelain worth using) and Org Mode (a way to organize anything and everything in plain text) listed as #1 and #2. And they’re right! I use those packages extensively every day. Besides those two powerhouses, there are a handful of packages that make using Emacs a delight. If I had to ever use something else, I would miss these packages most: Jump around your screen crazy fast. Teleport to any character with ~5 key strokes. See https://karthinks.com/software/avy-can-do-anything/ for more reasons why it’s awesome. I almost exclusively rely on and have it bound to . Kind of like a super-charged right-click for Emacs. Works beautifully in dired, when selecting files in the minibuffer. There’s an easy way to make it play well with Avy which is just the best. Eat is a terminal emulator that’s faster almost all the other terminal emulators for Emacs. The only emulator it’s not faster than is Vterm, which is pretty dang speedy. Eat has been more than fast enough for all my needs however. Additionally, it can make a terminal emulator in a particular region , so if you use Eshell, you can get a little terminal emulator for every command you run. Normally, if you run, say, , you see the ugly terminal escape characters printed as text. With Eat, however, those terminal escape characters get interpreted correctly. Interactive programs (e.g. the Julia and Elixir REPLs) work flawlessly with it. Best spellchecking ever. It can spellcheck based off of the fontlock face; I keep this on when I program to get on-the-fly spellchecking of code comments and strings. I keep bound to à la flyspell because it is so darn helpful. Supports checking documents with mixed languages. This is one of the packages I miss most when I’m editing text outside of Emacs. The best way to add citations in Emacs, hands-down. Reads bibtex, inserts in org-mode, LaTeX, whatever. These next packages are all by Daniel Mendler . These packages improve selecting commands, buffers, files, etc. from the and interfaces. These make Emacs insanely ergonomic and excellent. These replace packages like Helm , Ivy/Counsel/Swiper , and Company . In comparison to these packages, Vertico + Consult + Corfu are lighter-weight, faster, less buggy (in my experience; I’ve tried them all!), and work better with other Emacs packages because they follow the default built-in APIs. Lighter-weight, less buggy vertical completing-read interface. Replaces Ivy. Incredibly flexible. Works out-of-the-box with everything that has a interface, so you don’t need special packages to make it play nice. Recommend adding Marginalia as well by the same author to add extra infos. Better than counsel. The live preview is amazing; I use instead of , instead of Swiper. is :fire: for searching large projects with instant previewable results. Pairs well with Embark to save results to a buffer. Lightweight pop-up library. Pairs well with Cape by the same author. See also Orderless which enhances everything from to to the Corfu popup. Vertico + Consult + Orderless + Marginalia + Corfu + Cape + Embark is sometimes called the “minad stack”. Embark and Orderless are both developed by Omar Camarena (oantolin) who frequently collaborates with Daniel Mendler. When I asked Omar on Reddit about the name, Omar replied that “minad stack” is fine; another name they’ve tried for the stack is “iceberg”, which I think is a good name too. It’s the new hotness—that said, it’s gotten really really stable over the past two years. If you like these packages, consider sponsoring their maintainers! These are some of my favorite open-source projects and I try to support them when I can. If you like these packages, you might like my Emacs Bedrock starter kit which, unlike many other starter kits, is meant to be a no-nonsense no-fluff no-abstraction bare-bones start for you to fork and tinker with to your liking. The stock configuration only installs one package ( which-key , which is amazing) but includes some extra example configuration. The extras/base.el file includes sample starter configuration for most of the above packages. (I should add to it, come to think of it…) Avy Jump around your screen crazy fast. Teleport to any character with ~5 key strokes. See https://karthinks.com/software/avy-can-do-anything/ for more reasons why it’s awesome. I almost exclusively rely on and have it bound to . Embark Kind of like a super-charged right-click for Emacs. Works beautifully in dired, when selecting files in the minibuffer. There’s an easy way to make it play well with Avy which is just the best. Eat Eat is a terminal emulator that’s faster almost all the other terminal emulators for Emacs. The only emulator it’s not faster than is Vterm, which is pretty dang speedy. Eat has been more than fast enough for all my needs however. Additionally, it can make a terminal emulator in a particular region , so if you use Eshell, you can get a little terminal emulator for every command you run. Normally, if you run, say, , you see the ugly terminal escape characters printed as text. With Eat, however, those terminal escape characters get interpreted correctly. Interactive programs (e.g. the Julia and Elixir REPLs) work flawlessly with it. Jinx Best spellchecking ever. It can spellcheck based off of the fontlock face; I keep this on when I program to get on-the-fly spellchecking of code comments and strings. I keep bound to à la flyspell because it is so darn helpful. Supports checking documents with mixed languages. This is one of the packages I miss most when I’m editing text outside of Emacs. Citar The best way to add citations in Emacs, hands-down. Reads bibtex, inserts in org-mode, LaTeX, whatever. Vertico Lighter-weight, less buggy vertical completing-read interface. Replaces Ivy. Incredibly flexible. Works out-of-the-box with everything that has a interface, so you don’t need special packages to make it play nice. Recommend adding Marginalia as well by the same author to add extra infos. Consult Better than counsel. The live preview is amazing; I use instead of , instead of Swiper. is :fire: for searching large projects with instant previewable results. Pairs well with Embark to save results to a buffer. Corfu Lightweight pop-up library. Pairs well with Cape by the same author. Eat is not the fastest terminal emulator, Vterm is. Thanks to a Redditor who pointed this out.

Programming

Elixir Julia

0 views

blog.philz.dev 1 years ago

A Bibliography of Sorts and Some Quotes

These were impactful to me, one way or another. Did you just tell me to... is a classic from @jrecursive. Migrations are a fact of software engineering life, and this, by Manu Cornet , is on point. Julia Evans's comics and zines are a national treasure. I learned some options to ! I learned about CSS! I've shared the post on SQL queries don't start with SELECT many times! Google published The Standard of Code Review which includes the following: In general, reviewers should favor approving a CL once it is in a state where it definitely improves the overall code health of the system being worked on, even if the CL isn’t perfect. I learned a lot from the ggplot2 book . I've followed Jeff Heer's work including Vega Lite, and you could do much worse than reading some of it. Matt Eccleston wrote a blog post on code-centric versus product-goal-centric teams . My friend Dan wrote Effective Typescript . Sometimes, coming up with the right approach to testing a problem is the key to solving the problem. Don't take just my word for it; here's FoundationDB Testing and debugging distributed systems is at least as hard as building them. Unexpected process and network failures, message reorderings, and other sources of non determinism can expose subtle bugs and implicit assumptions that break in reality, which are extremely difficult to reproduce or debug. The consequences of such subtle bugs are especially severe for database systems, which purport to offer perfect fidelity to an unambiguous contract. Moreover, the stateful nature of a database system means that any such bug can result in subtle data corruption that may not be discovered for months. FDB took a radical approach— before building the database itself, we built a deterministic database simulation framework that can simulate a network of interacting processes and a variety of disk, process, network, and request-level failures and recoveries, all within a single physical process. This rigorous testing in simulation makes FDB extremely stable, and allows its developers to introduce new features and releases in a rapid cadence. When I approach a new code base, I look first at what happens on disk (typically, the database schemas) and what happens across the wire (the RPC definitions like protobuf or thrift files, OpenAPI/swagger/typescript schemas, clicking around in the network tab in Chrome). This quote, from The Mythical Man-Month (Brooks) strikes a chord: Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious. Tracing. I had the privilege of using Dapper while at Google. X-Trace is similar. The ad-hoc version of this is to add a random identifier to log statements (especially "canonical log lines") and pass those identifiers along across RPC boundaries (e.g., via an HTTP header). Add canonical log lines (thanks, Stripe and Brandur Leach, for the write up) to your system. If you can make it queryable with SQL, you have a lovely analytics system! Nelson Elhage's "What does a cache do?" is a great discussion of why read replicas and caches are different. E-mail me if you have pointers to these! A history of UI library APIs. How did we get from Apple II graphics to React? These libraries seem so dynamic; I'd love to read a history! A picture of the Cauldron visualization for a CDH build. (This is very specific!)

Programming

CSS

Database

Testing Julia

0 views

NorikiTech 1 years ago

Writing by hand for side projects

In the past few months I made good progress on my side projects ( DDPub , Typingvania ) compared to the previous period of drought. What changed? I started writing about them by hand. (Every journal is 140 A4 pages and I’m halfway through my third.) I’ve been following a practice called “ morning pages ” created by Julia Cameron. I learned about it by reading her book “ The Artist’s Way .” Previously I tried a similar practice called “freewriting” where you would ask a question and write for a predetermined short amount of time, but it didn’t stick. I’ve been doing the pages for about six months now. It is as simple as is explained on the page above—you sit down, preferably every day, and write down a certain amount straight from your thoughts. For me the right amount is two pages of A4 which takes me about an hour to write. It’s quite a lot and I skip some days, but it works so I keep at it. I use some space to write about my personal life, but over time I also started writing about the projects I’m working on. Usually it takes the form of a narrative around the next steps: Here I am in project right now: … The next two things I want to work on are this and this. How would I implement this approach? I would need to extend this object with such and such fields and add methods that would do this and this. When I transform this data, I’ll store it in this type of structure… I wonder if I could do this instead and rewrite that bit? The same internal monologue would happen if I were sitting in front of the code editor, but it’s hard to fully follow the thread of thought when you’re there trying to write some code at the same time. Writing by hand forces you to complete the thought to the end. There is some recent research showing higher brain activation when writing by hand. I don’t know if it’s that or not, but I often have really neat ideas when writing that I’m sure I would have missed otherwise—and I write them down immediately because I’m writing! Solutions to some architectural or UI problems simply presented themselves to me while I was writing. Clearly I thought about them previously, but I was also receptive during the writing sessions. Another benefit of the morning pages (that I usually do during lunch or even in the evening) is that divorcing the planning from the execution really helps when you are mentally fatigued. I work full time and do my side projects in the evening, and after a full day of work there is sometimes nothing but static in my head. It’s hard to stop working only to sit down and start working again. If I have written my pages, I already have a specific plan of the next steps, what exactly I want to do and how to implement it. It’s easier to sit down and start working on the side project when I’ve already done the hard part (came up with the implementation) and I only need to execute. The progress I make informs my next writing session, and the cycle repeats. The pages may take quite a lot of time, but as I said, they work (for me). I started to work and make progress on ideas that were sitting in my notes for five years or more. If I don’t write the pages, I’m likely to spend the same amount of time just starting at the editor trying to collect my thoughts and understand what to do next, that is if I sit down to work on the project at all. The pages help with that too. I reacknowledge to myself why I’m doing what I’m doing and what result I’m expecting. I’m prone to be distracted and anxious, and the pages give me more focus than I feel otherwise. The practice is simple so if you have similar problems as me, I encourage you to try it for a week and see if it makes any difference.

Programming Julia

Writing

0 views

blog.philz.dev 1 years ago

Creating a monorepo out of a multirepo

Inspired by Julia Evans' posts on git , I'm jotting down an obscure trick to combine repos. Sometimes you have multiple repos, and you want to create a monorepo out of them. Perhaps you have a distribution of many components, and it's convenient to across all of them together. The following annotated snippet creates two repos and joins them together. The key insight is that a commit in git is made up of (roughly) a message, pointers to parent commits, and a "tree," the latter of which can be made synthetically by using plumbing commands. This approach is probably overkill for a one time merging of two repos. In that case, create a commit in your second repo that moves everything to a subdirectory, add the second repo as a remote, and merge in that second repo using the . Rendered another way (with from and questionable abuse of ): If you're doing this for real, note that the above will fail spectacularly if you have spaces (and their ilk) in your names. Use or something. Create two repos, a-repo and b-repo, and initialize them with a file and some commits. Initialize the monorepo Add both subrepos as remotes and fetch them. Synthesize a tree listing and create it. Synthesize a commit with all the parents Update our working copy You can see how the tree preserves the subrepo histories Now let's let the A repo change Now we have to redo the merge. We use the same trick, sorta. Since we have that nice "README.md" in the mono repo tree, we want to preserve that. But, when we pull out , we have those and trees, and we want to filter those out. So, here we're abusing to filter them out. The and expression is producing . And, sure enough, voila!

Julia

Git

0 views

Lambda Land 1 years ago

Functional Languages Need Not Be Slow

Somewhere in my adolescence I got stuck with the notion that functional languages were slow while languages like C were fast. Now, a good C programmer can eke more performance out of their code than probably anyone else, but the cost you pay to keep your code correct goes exponential as you get closer and closer to the machine. Functional languages abstract a lot away from the machine. Higher languages in general abstract away the machine and make code easier to maintain. So I think I had it in my head that functional languages, being far away from the bare metal, must necessarily be slow. It didn’t help that I also thought of functional languages as being necessarily interpreted languages. Turns out, functional languages are just as amenable to compilation as imperative ones. Many popular/well-known functional are in fact compiled. (E.g. Haskell, Scala, Rust, Julia, etc.) Moreover, these languages can be just as fast—if not faster—than their more “mainstream” counterparts. I wanted to pit my favorite language (Racket) against a slightly more well-known language (Python) to see how they handled a simple single-threaded program. For good measure I threw Rust, Julia, and JavaScript into the mix for comparison. If you’re impatient, just jump to the results . I wrote the original program in Racket, then had ChatGPT help me rewrite it in Python. ChatGPT did astoundingly well. I eventually rewrote the program to be a little more idiomatic—I wanted it to use a loop instead of tail recursion, as Python is much better with loops than it is with lots and lots of function calls. I also had ChatGPT help me rewrite the program to Rust, Julia, and JavaScript. Impressive—and unsettling. I ran these programs on my M1 Pro MacBook Pro. Here’s what I got: In graphical form: Wow! I did not expect Python to get so pummeled by everything else! It makes sense that Julia with its sweet JIT engine is the fastest. Rust does well—no surprise there. (Note that I’m not counting compilation time here—all the more impressive for Julia!) Racket holds its own though, coming in third place by a wide margin. If you did take Rust’s compile time into account, I think that would make it switch places with Racket. Of course, you use Rust for compile-once-run-lots sorts of scenarios. Are these authoritative? No, of course not. This is a simple synthetic benchmark. I don’t consider myself expert programmer in any of these languages, so there are likely some performance tweaks that could make the respective language’s benchmark run faster. (Maybe I should in Racket… it’s kind of hard to consider yourself an expert in a language when its creator works down the hall from you though.) That said, I hope this dispels the myth that functional languages are necessarily slow. That is not the case. If Python is fast enough for your use-case, then there’s no reason you shouldn’t consider using Racket on performance grounds. Library support is a different matter: of all the programming that goes on in the world, the majority is probably just gluing libraries together to do the thing you want. This is a good thing: it means we’re not reinventing so many wheels and people are getting good work done. That said, there’s plenty of exciting work for which no library exists! If you find yourself in one of these exciting cases, consider using the tool of maximum power: in this regard, nothing beats Racket for its flexibility and extensibility . I love the Racket Plot library. So easy to use, so powerful. If you run it inside of DrRacket, the graphs are actually interactive: you can rotate 3D graphs and zoom in on sections of 2D graphs. So neat! Here’s the code I used to generate the above graph:

JavaScript Julia

Performance Haskell

Programming

0 views

Hillel Wayne 1 years ago

Notes on Every Strangeloop 2023 Talk I Attended

This is my writeup of all the talks I saw at Strangeloop, written on the train ride back, while the talks were still fresh in my mind. Now that all the talks are online I can share it. This should have gone up like a month ago but I was busy and then sick. Enjoy! Topic: How to define what “success” means to you in your career and then be successful. Mostly focused on psychological maxims, like “put in the work” and “embrace the unknown”. I feel like I wasn’t the appropriate audience for this; it seemed intended for people early in their career. I like that they said it’s okay to be in it for the money. Between the “hurr dur you must be in it for the passion” people and the “hurr durr smash capitalism” people, it’s nice to hear some acknowledgement that money makes your life nice. Topic: the value of “play” (as in “play make believe”, or “play with blocks”) in engineering. Some examples of how play leads to innovation, collaboration, and cool new things. Most of the talk is about the unexpected directions her “play” went in, like how her work in education eventually lead to a series of collaborations with OK Go. I think it was more inspirational than informative, to try to get people to “play” rather than to provide deep insight into the nature of the world. Still pretty fun. (Disclosure, I didn’t actually see this talk live, I watched Zac Hatfield-Dodds rehearse and gave feedback. Also Zac is a friend and we’ve collaborated before on Hypothesis stuff.) Topic: Some of the unexpected things we observe in working LLMs, and some of the unexpected ways they’re able to self-reference themselves. Zac was asked to give the talk at the last minute due to a sickness cancellation by another speaker. Given the time crunch, I think he pulled it together pretty well. Even so it was a bit too technical for me; I don’t know if he was able to simplify it in time for the final presentation. Like most practical talks on AI, intentionally or not he slipped in a few tricks to eke more performance out of an LLM. Like if you ask them to answer a question, and then rate the confidence of the question they asked, they tend to be decently accurate at their confidence. Zac’s also a manager at Anthropic , which gave the whole talk some neat “forbidden knowledge” vibes. (Disclaimer: Douglas is a friend, too.) Topic: Why stack-based languages are an interesting computational model, how they can be Turing-complete, and some of the unusual features you get from stack programming. The first time I’ve seen a stack-based language talk that wasn’t about Forth. Instead, it used his own homegrown stack language so he could just focus on the computer science aspects. The two properties that stick out to me are: My only experience with stack machines is golfscript . Maybe I’ll try to pick up uiua or something. Topic: “Small” generative AI models, like “taking all one-star amazon reviews for the statue of liberty and throwing them into a Markov chain ”. This was my favorite session of the conference. The technical aspects are pretty basic, and it’s explained simply enough that even layfolk should be able to follow. His approach generates dreamy nonsense that should be familiar to anyone who’s played with Markov chains before. And then he pulls out a guitar. The high point was his “Weird Algorithm”, which was like a karaoke machine which replaced the lyrics of songs with corpus selections that matched the same meter and syllables. Like replacing “oh I believe in yesterday” with “this is a really nice Hyundai”. I don’t know how funny it’ll be in the video, it might be one of those “you had to be there” things. Topic: The modern pace of tech leaves a lot of software, hardware, and people behind. How can we make software more sustainable, drawn from the author’s experiences living on a boat. Lots of thoughts on this one. The talk was a crash course on all the different kinds of sustainability: making software run on old devices, getting software guaranteed to run on future devices, computing under significant power/space/internet constraints, and everything in between. I think it’s intended as a call to arms for us to think about doing better. I’m sympathetic to the goals of permacomputing; what do I do with the five old phones in my closet? That’s a huge amount of computing power just gone to waste. The tension I always have is how this scales. Devine Lu Levinga is an artistic savant (they made Orca !) and the kind of person who can live on a 200-watt boat for seven years. I’m not willing to give up my creature comforts of central heating and Geforce gaming. Obviously there’s a huge spectrum between “uses less electricity than a good cyclist ” and “buys the newest iPhone every year”, the question is what’s the right balance between sustainability and achievability. There’s also the whole aesthetic/cultural aspect to permacomputing. Devine used images in dithered black/white. AFAICT this is because Hypercard was black/white, lots of retrocomputing fans love hypercard, and there’s a huge overlap between retrocomputing and permacomputing. But Kid Pix is just as old as Hypercard and does full color. It’s just not part of the culture. Nit: at the end Devine discussed how they were making software preservation easier by writing a special VM. This was interesting but the discussion on how it worked ended up going way over time and I had to run to the next session. (Disclaimer: Jesus I’m friends with way too many people on the conference circuit now) Topic: Formal methods is useful to reason about existing legacy systems, but has too high a barrier to entry. Marianne made a new FM language called “Fault” with a higher levels of abstraction. Some discussion of how it’s implemented. This might just be the friendship talking, but Fault looks like one of the more interesting FM languages to come out recently. I’m painfully aware of just how hard grokking FM can be, and anything that makes it more accessible is good. I’ll have to check it out. When she said that the hardest part is output formatting I felt it in my soul. Topic: Lots of “simple” things take years to learn, like Bash or DNS. How can we make it easier for people to learn these things? Four difficult technologies, and different approaches to making them tractable. I consider myself a very good teacher. This talk made me a better one. Best line was “behind every best practice is a gruesome story.” That’ll stick with me. Topic: Randal Munroe (the xkcd guy)’s closing keynote. No deep engineering lessons, just a lot of fun. Before Julia Evans’ talk, Alex did A Long Strange Loop , how it went from an idea to the monolith it is today. Strange Loop was his vision, an eclectic mix of academia, industry, art, and activism. And it drew a diverse crowd because of that. I’ve made many friends at Strangeloop, people like Marianne and Felienne . I don’t know if I’ll run into them at any other conferences, because I don’t think other conferences will capture that lightning in a bottle. I’ll miss them. I also owe my career to Strangeloop. Eight years ago they accepted Tackling concurrency bugs with TLA+ , which got me started both speaking and writing formal methods professionally. There’s been some talk about running a successor conference (someone came up with the name “estranged loop”) but I don’t know if it will ever be the same. There are lots of people can run a good conference, but there’s only one person who can run Alex’s conference . Whatever comes next will be fundamentally different. Still good, I’m sure, but different. Stack programs don’t need to start from an empty stack, which means entire programs will naturally compose. Like you can theoretically pipe the output of a stack program into another stack program, since they’re all effectively functions of type . Stack ops are associative: if you chop a stack program into subprograms and pipe them into each other, it doesn’t matter where you make the cuts, you still get the same final stack. That’s really, really cool.

Julia

Programming Bash

0 views

fnands 2 years ago

Mojo 0.5.0 and SIMD

Another month, another Mojo release! It feel like every time I run into a missing feature in Mojo it gets added in the next version: In my first post I complained about a lack of file handling, which was then added soon after in version . For version I ran into the issue that you can’t print Tensors, which has now been added in the release. So this means Mojo has now unlocked everyone’s favourite method of debugging: printing to stdout. In addition to that, Tensors can now also be written to and read from files with and . There have also been a couple of updates to the SIMD type, which lead me to ask: How does the SIMD type work in Mojo? For a bit of background, you might have noticed that CPUs clock speeds haven’t really increased by much in the last decade or so, but computers have definitely gotten faster. One of the factors that have increased processing speed has been a focus on vectorization through SIMD, which stands for , i.e. applying the same operation to multiple pieces of data. Modern CPUs come with SIMD registers that allow the CPU to apply the same operation over all the data in that register, resulting in large speedups, especially in cases where you are applying the same operation to multiple pieces of data, e.g. in image processing where you might apply the same operation to the millions of pixels in an image. One of the main goals of the Mojo language is to leverage the ability of modern hardware, both CPUs and GPUs, to execute SIMD operations. There is no native SIMD support in Python, however Numpy does make this possible. Note: SIMD is not the same as concurrency, where you have several different threads running different instructions. SIMD is doing the same operation on different data. Generally, SIMD objects are initialized as , so to create a SIMD object consisting of four 8-bit unsigned integers we would do: And actually, SIMD is so central to Mojo that the builtin type is actually just an alias for : Modern CPUs have SIMD registers, so lets use the package in Mojo to see what the register width on my computer is: This means we can pack 256 bits of data into this register and efficiently vectorize an operation over it. Some CPUs support AVX-512 , with as the name suggests 512 bit SIMD registers. Most modern CPUs will apply the same operation to all values in their register in one step, allowing for significant speedup for functions that can exploit SIMD vectorization. In my case, we’ll have to live with 256 bits. This means in this register we can either put 4 64-bit values, 8 32-bit values 16 16-bit values, or even 32 8-bit values. We can use the utility function to tell us how many 32-bit floating point numbers will fit in our register: One of the new features in Mojo is that SIMD types will default to the width of the architecture, meaning if we call: Mojo will automatically pack 8 32-bit values, or 32 8-bit values int the register. This is equivalent to calling: Operations over SIMD types are quite intuitive. Let’s try adding two SIMD objects together: Additionally, since the version 0.5.0, we can also concatenate SIMD objects with : Operations applied to a SIMD object will be applied element-wise to the data in it, if the function is set up to handle this: As far as I can tell, this doesn’t just work automatically. If I define a function as: Then applying it to a single floating point number works as expected: But trying this on a SIMD object does not: However, if I define a version of the function to take a SIMD object: Then (with the additional specification of the parameter ), it will apply the function to all the values: While still working on single floating point values, as they are just SIMD objects of width one under the hood: I do miss the flexibility of Julia a bit, where you can define one function and then vectorize it with a dot, i.e. if you have a function that operates on scalar values, then calling will apply it element-wise to all values of that vector, and return a vector of the same shape. But for the most part, defining functions just to apply to SIMD values in Mojo doesn’t lose you much generality anyway. To be honest, I was a little bit daunted when I first saw the SIMD datatype in Mojo. I vaguely remember playing around with SIMD in C++, where it can be quite complicate to implement SIMD operations. But in Mojo, it really is transparent and relatively straightforward to get going with SIMD. It is clear that exploiting vectorization is a top priority for the Modular team, and a lot of through has clearly gone into making it easy to exploit the SIMD capabilities of modern hardware. I might take a look at vectorization vs parallelization in Mojo in the future, and maybe even try my hand at a bit of benchmarking.

Julia

C++

Performance

Programming

Hardware

0 views

Lambda Land 2 years ago

Towards Fearless Macros

Macros are tricky beasts. Most languages—if they have macros at all—usually include a huge “here there be dragons” warning to warn curious would-be macro programmers of the dangers that lurk ahead. What is it about macros that makes them so dangerous and unwieldy? That’s difficult to answer in general: there are many different macro systems with varying degrees of ease-of-use. Moreover, making macros easy to use safely is an open area of research—most languages that have macros don’t have features necessary to implement macros safely. Hence, most people steer clear of macros. There are many ways to characterize macro systems; I won’t attempt to cover them all here, but here’s the spectrum I’ll be covering in this post: Figure 1: A spectrum of how easy macro systems are to use safely If you’ve done any C programming, you’ve likely run into things like: That bit is a macro—albeit a C macro. These operate just after the lexer: they work on token streams. It’s a bit like textual search-and-replace, though it knows a little bit about the structure of the language (not much: just what’s a token and what’s not) so that you won’t run into problems if you do something like this: because that in the string is not a token—it’s just part of a string. C macros can’t do very much: you scan the token stream for a macro, then fill in the variables to the macro, and then replace the macro and the arguments its consumed with the filled-out template that is the macro definition. This prevents you from doing silly things like replacing something sitting inside of a string literal, but it’s far, far from being safe, as we’ll see in the next section. In contrast to C’s macros, Lisp’s macros are much more powerful. Lisp macros operate after the lexer and the parser have had a go at the source code—Lisp macro operate on abstract syntax trees —or ASTs, which is what the compiler or interpreter works with. Why is this a big deal? The ASTs capture the language’s semantics around precedence, for instance. In C you can write a macro that does unexpended things, like this: The macro didn’t know anything about precedence and we computed the wrong thing. This means that, to use a macro in C, you have to have a good idea of how it’s doing what it’s intended to do. That means C macros are leaky abstractions that prevent local reasoning: you have to consider both the macro definition and where it’s used to understand what’s going on. In contrast, Lisp macros are an improvement because they will rewrite the AST and the precedence you’d expect will be preserved. You can do this, for example: Lisp macros are also procedural macros , meaning you can execute arbitrary code inside of a macro to generate new ASTs. Macros in Lisp and its descendants are essentially functions from AST → AST. This opens up a whole world of exciting possibilities! Procedural macros constitute a “lightweight compiler API”. [ 4 ] “Same except for variable names” is also called alpha-equivalence. This comes from the λ-calculus, which states that the particular choice of variable name should not matter. E.g. \(\lambda x.x\) and \(\lambda y.y\) are the same function in the lambda calculus, just as \(f(x) = x + 2\) and \(g(y) = y + 2\) are the same function in algebra. Lisp macros aren’t without danger—many a Lisp programmer has shot their foot off with a macro. One reason is that Lisp macros are not hygienic —variables in the macro’s implementation may leak into the context of the macro call. This means that two Lisp programs that are the same except for different variable names can behave differently: The fact that the macro implementation uses a variable named ( tmp-leaky ) has leaked through to the user of the macro. ( tmp-capture ) This phenomenon is called variable capture , and it exposes this macro as a leaky abstraction! There are ways to mitigate this using , but those are error-prone manual techniques. It makes macro writing feel like you’re writing in an unsafe lower-level language. Scheme’s macros introduce a concept known as hygiene , which prevents variable capture automatically: In this case, the variable that the macro introduces ( tmp-intro-macro ) is not the same thing that the variable from the calling context ( tmp-intro-let ) refers to. This separation of scopes happens automatically behind the scenes, so there’s now no chance of accidental variable capture. Breaking hygiene has some utility in some cases—for example, one might want to add a form inside the body of a loop. There are ways around hygiene, but these are not without some problems. For more details see [ 2 ]. If you’d like to know more about hygiene, [ 1 ] is an excellent resource. Since Scheme macros (and Lisp macros more generally) allow running arbitrary Scheme code—including code from other modules—the dependency graph between modules can get so tangled that clean builds of a Scheme codebase are impossible. Racket solves this problem with its phase separation , which puts clear delimiters between when functions and macros are available to different parts of the language. This detangles dependency graphs without sacrificing the expressive power of macros. I wrote a little bit about phase separation ; you can read more on the Racket docs as well as Matthew Flatt’s paper [ 3 ] on phase separation. Racket also has a robust system for reasoning about where a variable’s definition comes from called a scope set . This is a notion makes reasoning about where variables are bound sensible. See a blog post as well as [NO_ITEM_DATA:flattBindingSetsScopes2016b] by Matthew Flatt for more on scope sets. Phase separation and scope sets make Racket macros the safest to use: Racket macros compose sensibly and hide their implementation details so that it is easy to write macros that are easy to use as if they were built-in language constructs. Racket also goes beyond the form that it inherited from Scheme; Racket’s macro-building system makes generating good error messages easy. There’s a little bug in the macro we used earlier, and that is the form only takes an identifier (i.e. a variable) as its first argument. We don’t have any error checking inside the macro; if we were to call with something that wasn’t an identifier, we’d get an error in terms of the the macro expands to, not the macro call itself: This isn’t good because there’s no in our code at all! We could add some error handling in our macro to manually check that and are identifiers, but that’s a little tedious. Racket’s helps us out: Much better! Now our error is in terms that the macro user will recognize. There are lots of other things that can do that make it easy to write correct macros that generate good error messages—a must for macros that become a part of a library. Many modern languages use macros; I’ll only talk about a few more here. If something’s missing, that’s probably because I didn’t want to be exhaustive. Julia macros have a lot of nice things: they operate on ASTs and they’re hygienic, though the way hygiene is currently implemented is a little strange: all variables get ’d automatically Meaning, they all get replaced with some generated symbol that won’t clash with any possible variable or function name. whether or not they come from inside the macro or they originated from the calling code. Part of the problem is that all variables are represented as simple symbols, which [ 1 ] shows is insufficient to properly implement hygiene. Evidently there is some ongoing work to improve the situation. This is a good example of research ideas percolating into industry languages I think. Elixir has robust AST macros, and its standard library makes heavy use of macros; many “core” Elixir constructs like , , , , and others are actually macros that expand to smaller units of Elixir. Elixir actually gets hygiene right! Unlike Julia, variables in Elixir’s AST have metadata—including scope information—attached to them. This and other aspects of Elixir’s macro system open it up to lots of exciting possibilities. The Nx library brings support for numerical and GPU programming to Elixir, and it works essentially by implementing a custom Elixir compiler in Elixir itself , and macros play a big role in this. Me thinking that Elixir is a big mainstream language should tell you something about the languages I spend my time with in my job as a PhD student. I think Elixir macros are really neat—they’re the most powerful I’ve seen in a “big mainstream” language. Rust supports two kinds of macros: macros-by-example, and procedural macros. Macros-by-example are a simple pattern-to-pattern transformation. Here’s an example from The Rust Book : This macro takes a pattern like and expands it to a pattern like Notice how the marks a part of the template that can be repeated. pattern-repeat This is akin to Racket or Scheme’s repetition form. Macros-by-example work on AST, but you can’t perform arbitrary computation on the AST. For that, you need procedural macros. Rust’s procedural macros (called “proc macros”) work on a token stream, and you can perform arbitrary computation, which puts them in a bit of a funny middle ground between C and Lisp. There is a Rust crate that you can use to parse a Rust token stream into Rust AST, but you don’t get any nice source information from the AST nodes, which makes producing good error messages a challenge. I personally find Rust macros to be disappointing. There’s a wide variety of macro systems. The best macro systems: Different languages have different features in their macro systems; some languages make it easy to use macros sensibly, while for others macros are a formidable challenge to use properly—make sure you know what your language provides and the trade-offs involved. Turns out you can do a lot with functions. Powerful function programming languages let you do so much with first-class functions. If you can get access to first-class continuations , as you can in Racket and Scheme, then you can create powerful new programming constructs without having to resort to macros. I came across the JuliaCon 2019 keynote talk, where Steven Johnson explains how many of the things that you can do with macros can be solved just with Julia’s type dispatch. If you can do something with functions, you probably should: functions are first-class values in most languages these days, and you’ll enjoy increased composability, better error messages, and code that is easier to read and understand by your peers. Macros introduce little languages wherever you use them. For simple macros, you might not have any constraints on what you may write under the scope of a macro. As an example, consider a macro that adds a -loop construct to a language by rewriting to another kind of looping mechanism: you shouldn’t have any restriction on what you can write inside the body of the loop. However, more complex macros can impose more restrictions on what can and cannot be written under their lexical extent. These restrictions may or may not be obvious. Examples: accidental variable capture limits what can be safely written, and grammatical errors (e.g. using an expression where an identifier was expected) can lead to inscrutable errors. Better macro systems mitigate these problems. It’s not enough to just have a macro system that uses ASTs; you need a macro system that makes it easy to write correct macros with clear error messages so they truly feel like natural extensions of the language. Few languages do this right. Macro systems have improved since the 1960s. While Lisp excluded many of the pitfalls of C macros by construction , you still had to use kluges like to manually avoid variable capture. Scheme got rid of that with hygienic macros, and Racket improved matters further by improving macro hygiene through scope sets and introducing phase separation. It is so much easier to build robust macro-based abstractions. Macros are good—anyone can write macros and experiment with new syntactic constructs. This makes development and extension of the language no longer the sole domain of the language designer and maintainer—library authors can experiment with different approaches to various problems. We see this a lot with Elixir: Elixir’s core language is really rather small; most of the magic powering popular libraries like Ecto or Phoenix comes from a choice set of macro abstractions. These and other libraries are free to experiment with novel syntax without fear of cluttering and coupling the core language with bad abstractions that would then need to be maintained in perpetuity. Macros can be powerful when used correctly—something made much easier by modern macro systems. \(\lambda x.x\) and \(\lambda y.y\) are the same function in the lambda calculus, just as \(f(x) = x + 2\) and \(g(y) = y + 2\) are the same function in algebra. Lisp macros aren’t without danger—many a Lisp programmer has shot their foot off with a macro. One reason is that Lisp macros are not hygienic —variables in the macro’s implementation may leak into the context of the macro call. This means that two Lisp programs that are the same except for different variable names can behave differently: The fact that the macro implementation uses a variable named ( tmp-leaky ) has leaked through to the user of the macro. ( tmp-capture ) This phenomenon is called variable capture , and it exposes this macro as a leaky abstraction! There are ways to mitigate this using , but those are error-prone manual techniques. It makes macro writing feel like you’re writing in an unsafe lower-level language. Scheme’s macros introduce a concept known as hygiene , which prevents variable capture automatically: In this case, the variable that the macro introduces ( tmp-intro-macro ) is not the same thing that the variable from the calling context ( tmp-intro-let ) refers to. This separation of scopes happens automatically behind the scenes, so there’s now no chance of accidental variable capture. Breaking hygiene has some utility in some cases—for example, one might want to add a form inside the body of a loop. There are ways around hygiene, but these are not without some problems. For more details see [ 2 ]. If you’d like to know more about hygiene, [ 1 ] is an excellent resource. Racket macros: phase separation and scope sets # Since Scheme macros (and Lisp macros more generally) allow running arbitrary Scheme code—including code from other modules—the dependency graph between modules can get so tangled that clean builds of a Scheme codebase are impossible. Racket solves this problem with its phase separation , which puts clear delimiters between when functions and macros are available to different parts of the language. This detangles dependency graphs without sacrificing the expressive power of macros. I wrote a little bit about phase separation ; you can read more on the Racket docs as well as Matthew Flatt’s paper [ 3 ] on phase separation. Racket also has a robust system for reasoning about where a variable’s definition comes from called a scope set . This is a notion makes reasoning about where variables are bound sensible. See a blog post as well as [NO_ITEM_DATA:flattBindingSetsScopes2016b] by Matthew Flatt for more on scope sets. Phase separation and scope sets make Racket macros the safest to use: Racket macros compose sensibly and hide their implementation details so that it is easy to write macros that are easy to use as if they were built-in language constructs. Racket also goes beyond the form that it inherited from Scheme; Racket’s macro-building system makes generating good error messages easy. There’s a little bug in the macro we used earlier, and that is the form only takes an identifier (i.e. a variable) as its first argument. We don’t have any error checking inside the macro; if we were to call with something that wasn’t an identifier, we’d get an error in terms of the the macro expands to, not the macro call itself: This isn’t good because there’s no in our code at all! We could add some error handling in our macro to manually check that and are identifiers, but that’s a little tedious. Racket’s helps us out: Much better! Now our error is in terms that the macro user will recognize. There are lots of other things that can do that make it easy to write correct macros that generate good error messages—a must for macros that become a part of a library. Other languages # Many modern languages use macros; I’ll only talk about a few more here. If something’s missing, that’s probably because I didn’t want to be exhaustive. Julia # Julia macros have a lot of nice things: they operate on ASTs and they’re hygienic, though the way hygiene is currently implemented is a little strange: all variables get ’d automatically Meaning, they all get replaced with some generated symbol that won’t clash with any possible variable or function name. whether or not they come from inside the macro or they originated from the calling code. Part of the problem is that all variables are represented as simple symbols, which [ 1 ] shows is insufficient to properly implement hygiene. Evidently there is some ongoing work to improve the situation. This is a good example of research ideas percolating into industry languages I think. Elixir # Elixir has robust AST macros, and its standard library makes heavy use of macros; many “core” Elixir constructs like , , , , and others are actually macros that expand to smaller units of Elixir. Elixir actually gets hygiene right! Unlike Julia, variables in Elixir’s AST have metadata—including scope information—attached to them. This and other aspects of Elixir’s macro system open it up to lots of exciting possibilities. The Nx library brings support for numerical and GPU programming to Elixir, and it works essentially by implementing a custom Elixir compiler in Elixir itself , and macros play a big role in this. Me thinking that Elixir is a big mainstream language should tell you something about the languages I spend my time with in my job as a PhD student. I think Elixir macros are really neat—they’re the most powerful I’ve seen in a “big mainstream” language. Rust # Rust supports two kinds of macros: macros-by-example, and procedural macros. Macros-by-example are a simple pattern-to-pattern transformation. Here’s an example from The Rust Book : This macro takes a pattern like and expands it to a pattern like Notice how the marks a part of the template that can be repeated. pattern-repeat This is akin to Racket or Scheme’s repetition form. Macros-by-example work on AST, but you can’t perform arbitrary computation on the AST. For that, you need procedural macros. Rust’s procedural macros (called “proc macros”) work on a token stream, and you can perform arbitrary computation, which puts them in a bit of a funny middle ground between C and Lisp. There is a Rust crate that you can use to parse a Rust token stream into Rust AST, but you don’t get any nice source information from the AST nodes, which makes producing good error messages a challenge. I personally find Rust macros to be disappointing. Conclusion # There’s a wide variety of macro systems. The best macro systems: Operate on the AST rather than on a stream of tokens Avoid leaking implementation details through inadvertent variable capture by being hygienic Produce good error messages that are in terms of the caller’s context (Bonus) have good phase separation to enforce clear separation between complex macro systems

Elixir Julia

Programming

0 views

fnands 2 years ago

Stereo vision and disparity maps (in Julia)

I’ve been working a lot recently with stereo vision and wanted to go through the basics of how disparity is calculated. I’m partially doing this as an excuse to get better at Julia (v1.9.3 used here). You can view the notebook for this blog post on Github: In much the same way that we as humans can have depth perception by sensing the difference in the images we see between our left and right eyes, we can calculate depth from a pair of images taken from different locations, called a stereo pair. If we know the positions of out cameras, then we can use matching points in our two images to estimate how far away from the camera those points are. Taking a look at the image below (from OpenCV ): If we have two identical cameras, at points and at a distance from each other, with focal length , we can calculate the distance ( ) to object by using the disparity between where the object appears in the left image ( ) and where it appears in the right image ( ). In this simple case, the relation between disparity and distance is simply: If we know an , then we can rearrange this to give us distance as a function of disparity: You might notice that in case the disparity is zero, you will have an undefined result. This is just due to the fact that in this case the cameras are pointing in parallel, so in principle a disparity of zero should not be possible. The general case is more complicated, but we will focus on this simple setup for now. We can define the function as: Where and are measured in pixels, and is measured in centimeters. There is an inverse relation between distance and disparity: So once we have a disparity, it’s relatively straightforward to get a distance. But how do we find disparities? We usually represent the disparities for a given pair of images as a disparity map , which is an array with the same dimensions as (one of) your images, but with disparity values for each pixel. In principle, this is a two-dimensional problem, as an object might be matched to a point that has both a horizontal and vertical shift, but luckily, you can always find a transformation to turn this into a one dimensional problem. The cartoon below illustrates what a disparity map might look like: Above, we calculate the disparity with respect to the right image (you can do it with respect to the left image as well), and as you can see the disparity map tells us how many pixels to the right each object shifted in the left image vs the right image. For a set of images (taken from the Middlebury Stereo Datasets ): The corresponding disparity map can be visualized as follows: With darker pixels having lower disparity values, and brighter pixels having higher disparity values, meaning the dark objects are far away from the cameras, while the bright ones are close. The ground truth disparity as shown above is usually calculated from LiDAR or some other accurate method, and our goal is to get as close as possible to those values using only the images above. So let’s try and calculate disparity for the images above. There are many, many approaches to calculating disparity, but let us begin with the most simple approach we can think of. As a start, let us go through each pixel in the right image, and for that pixel, try and find the most similar pixel in the left image. So let us try and take the squared difference between pixels values as our similarity metric. As we are going to be doing the same thing for every row of pixels, we are just going to define a function that does the basic logic, and then apply the same function to every case. Let’s define a distance metric as the squared distance: And as a test case let’s create the cartoon image we had above: Now we can try and match pixels in the right image to pixels in the left image. So how did we do? So the toy example works! The top line, which moved more pixels, shows up brighter (i.e. larger disparity values), and the lower line is dimmer. So let’s move on to real images. We’ll start with the example case above, but for simplicity we’ll stick to grayscale at first: Redefining slightly… So let’s see how we did? Looking at the predicted disparity, we can see there is some vague resemblance to the input image, but we’re still pretty far from the target: A significant problem seems to be erroneous matches, especially in the background. As you can imagine, we are only comparing single channel pixels values, and it’s very likely that we might just find a better match by chance. In grayscale we are only matching pixel intensity, and we have no idea whether something is bright green, or bright red. So let’s try and improve the odds of a good match by adding colour. So, a slight improvement! There seem to be fewer random matches in the background, but still not that close to the desired outcome. Is there more we can do? The obvious downside of the naive approach above is that it only ever looks at one pixel (in each image) at a time. That’s not a lot of information, and also not how we intuitively match objects. Look at the image below. Can you guess the best match for the pixel in the row of pixels below it? Given only this information, it’s impossible for us to guess whether the green pixel matches with the pixels at location 3, 5 or 7. If however I was to give you more context, i.e. a block of say 3x3 pixels, would this make things simpler? In this case, there is an unambiguous answer, which is the principle behind block-matching. To confirm our idea that more context results in better matches, we can take a quick look at a row of pixels: Given the pixel above, where in the row below do you think this pixel matches? You would guess somewhere in the orange part on the left right? But which pixel exactly is almost impossible to say. If we now take a block with more context: And compare it to the row below, the location of the match becomes more obvious: Calculating the difference metric for each point with different block sizes, we can clearly see that for low block sizes, the lowest metric value is ambiguous, while for larger block sizes it becomes more clear where exactly the best match is: And now we are ready to define our block matching algorithm, much in the way we did our pixel matching algorithm: Let’s see how this does on the full image in comparison to the pixel matching: Now we are getting somewhere! Compared to the earlier results we can now start making out the depth of the separate objects like the lamp, bust and camera. There are still a few things we could do to improve our simple algorithm (like only accepting matches that have below a certain score for the metric), but I will leave those as an exercise to the reader. Above we went through a basic introduction to stereo vision and disparity, and built a bare-bones block matching algorithm from scratch. The above is pretty far away from the state of the art, and there are many more advanced methods for calculating disparity, ranging from relatively simple methods like block matching to Deep Learning methods. Below are some posts/guides I found informative: Introduction to Epipolar Geometry and Stereo Vision Stereo Vision: Depth Estimation between object and camera Depth Map from Stereo Images

Tutorial Julia Computer Vision

0 views

fnands 2 years ago

A first look at Mojo 🔥

The Mojo programming language was officially released in May, but could only be used through some notebooks in a sandbox. Last week, the SDK (version 0.2.1) got released, so I decided to give it a look. Mojo’s goal is to “combine the usability of Python with the performance of C” , and bills itself as “the programming language for all AI developers” . It’s clear that Python is the dominant language when it comes to ML/AI, with great libraries like Pytorch and a few others being the main drivers of that. The problem comes with depth: all the fast libraries in Python are written in a performant language, usually C or C++, which means that if you want to dig into the internals of the tools you are using you have to switch languages, which greatly raises the barrier of entry for doing so. There are other languages that try to go for the usability of Python while retaining performance, and the first language that comes to mind for me in this respect is Julia. Julia is a pretty neat language, and writing math-heavy, fast code in it feels very elegant, while retaining a very Python like syntax. Julia is about twenty years younger than Python, and to me seems like they took the best aspects of Python and Fortran and rolled them into one language, allowing you to have performant and elegant code that is Julia all the way down. Given all this, in vacuum, Julia would seem like the obvious language to choose when it comes to ML/AI programming. The one major downside of Julia is that it doesn’t have the robust ecosystem of libraries that Python has, and unless something major changes, it seems that Python will keep winning. Enter Mojo, a language that then (aspires to) keep interoperability with Python, while itself being very performant and allowing you to write code that is Mojo all the way down. Basically if Mojo achieves its goals then we get to have our cake and eat it: we can keep the great ecosystem of packages that Python brings with it, while getting to write new performant code in a single. My guess is if this works out that all the major packages will eventually get rewritten in Mojo, but we can have a transition period where we still get to keep the C/C++ version of them until this can be done. The people behind Mojo (mostly Chris Lattner ) seem to know what they are doing, so I wish them all the best. I wanted to start with something basic, so I thought I would have a look at the first puzzle from the 2022 advent of code . Basically you are given a text file with a several lists of numbers representing the amount of calories some elves are carrying (go read up on the advent of code if you are unfamiliar, it will make sense then), and have to find which elves are carrying the most calories. So effectively a little bit of file parsing, with some basic arithmetic, i.e. a little puzzle to ease into Mojo. I won’t share the input because the creator of the AoC has explicitly asked people not to , but you can download your own and try the code below. At first glance, a lot of Python code will “just work”: However, it’s clear a lot is still missing, e.g. lambda functions don’t work yet: This is likely coming, but for now we have to live without it. So for the first step, let’s parse some text files. The first thing I found was that Mojo doesn’t have a native way to parse text yet. But luckily, you can just get Python to do it for you! In this case, you have to import Python as a module and call the builtin Python open function. It’s standard practice in Python to open text files with the incantation, but this doesn’t work in Mojo, so have to open and close files manually. All in all, it’s relatively standard Python, with a couple of caveats. One of the big things is that there is a distinction between Python types and Mojo types, i.e. the Python is not the same as Mojo’s , so if you want to get the most out of Mojo, you need to cast from the one to the other. Right now, there seems to be no direct way to go from to , so I had to take a detour via . I tried to keep the Python imports in the function, so that the other functions can be in “pure” Mojo. The my first impulse was to create a Python-esque list, but the builtin list in Mojo is immutable , so I had to go for a DynamicVector, which had a strong C++ flavour to it. Once that was done I was done with Python for this program and could go forth in pure Mojo. Below you can see I declare functions with while above I used . Both work in Mojo, but functions forces you to be strongly typed and enfoces some memory safe behaviour . You can see here the values are all declared as mutable ( ). You can also declare immutables with . This is enforced in functions. Other than that, a relatively standard loop over a container. Again, relatively straightforward. I’m definitely missing Python niceties like being able to easily sum over a container (can’t call in Mojo 😢). To put it all together we create a main , and notice that we need to indicate that it might raise errors as we are calling the unsafe . Mojo feels relatively familiar, but I will also say that when writing “pure” Mojo it feels like writing C with Python syntax. This makes sense given the goals of the language, but caught me a little off guard; I was expecting something a little closer to Julia, which still feels a lot like Python in most cases. This was not the greatest example to show Mojo off, as Mojo really shines in high performance environments, so the language didn’t really get to stretch its legs here. You can find some more performance oriented examples on the official Mojo website . I will probably give Mojo another look and try out something a bit more suited for the language in the future, maybe when the version of the language drops. I think I’ve been spoiled by mostly writing in two well supported languages (Python and C++) for which there are countless reference examples or StackOverflow posts on how to do things. Due to the fact that Mojo is brand new, there are very few examples to look to about how to do even relatively basic things. For now if you want to get started, I recommend starting with the exercises on mojodojo.dev .

Performance

C++ Julia

Programming

0 views

Pat Shaughnessy 3 years ago

Visiting an Abstract Syntax Tree

In my last post , I explored how Crystal parsed a simple program and produced a data structure called an abstract syntax tree (AST). But what does Crystal do with the AST? Why bother going to such lengths to create it? After Crystal parses my code, it repeatedly steps through all the entries or nodes in the AST and builds up a description of the intended meaning and behavior of my code. This process is known as semantic analysis . Later, Crystal will use this description to convert my program into a machine language executable. But what does this description contain? What does it really mean for a compiler to understand anything? Let’s pack our bags and visit an abstract syntax tree with Crystal to find out. Imagine several tourists visiting a famous tree: Each of them sees the same tree in a different way. The tree doesn’t change, but the perspective of each person looking at it is different. They each take a different photo, or remember different details. In Computer Science this separation of the data structure (the tree) from the algorithms using it (the tourists) is known as the visitor pattern . This technique allows compilers and other programs to run multiple algorithms on the same data structure without making a mess. The visitor pattern calls for two functions: and . First, a node in the data structure “accepts” a visitor: After accepting a visitor, the node turns around and calls the method on : The method implements whatever algorithm that visitor is interested in. This seems kind of pointless… why use at all? We could just call directly. The key is that, after calling the visitor and passing itself, the node passes the visitor to each of its children, recursively: And then the visitor can visit each of the child nodes also. The class doesn’t necessarily need to know anything about how to navigate the node data structure. And more and more visitor classes can implement new algorithms without changing the underlying data structure and breaking each other. In order to understand what my code means, Crystal reads through my program’s AST over and over again using different visitors. Each algorithm looks for certain syntax, records information about the types and objects my code uses or possibly even transforms my code into a different form. Crystal implements the basics of the visitor pattern in visitor.cr , inside the superclass of all AST nodes: Each subclass of implements its own version of . To get a sense of how the visitor pattern works inside of Crystal, let’s look at one line of code from my last post: As I explained last month, the Crystal parser generates this AST tree fragment: Once the parser is finished and has created this small tree, the Crystal compiler steps through it a number of different times, looking for classes, variables, type declarations, etc. Each of these passes through the AST is performed by a different visitor class: , or among many others. The most important visitor class the Crystal compiler uses is called simply . You can find the code for in main_visitor.cr : Since Crystal supports typed parameters and method overloading, the visitor class implements a different method for each type of node that it visits, for example: Now let’s look at three examples of what the class does with my code: identifying variables, assigning types and expanding array literals. The Crystal compiler is much too complex to describe in a single blog post, but hopefully I can give you glimpse into the sort of work Crystal does during semantic analysis. Obviously, my example code creates and initializes a variable called : But how does Crystal identify this variable’s name and value? What does it do with ? The class starts to process my code by visiting the root node of this branch of my AST, the node: As you can see, earlier during the parsing phrase Crystal had saved the target variable and value of this assign statement in the child AST nodes. The target variable, , appears in the node, and the value to assign is an node. The now knows I declared a new variable, called , in the current lexical scope. Since my program has no classes, methods or any other lexical scopes, Crystal saves this variable in a table of variables for the top level program: Actually, to be more accurate, there will always be many other variables in this table along with . All Crystal programs automatically include the standard library, so Crystal also saves all of the top level variables from the standard library in this table. In a more normal program, there will be many lexical scopes for different method and class or module definitions, and will save each variable in the corresponding table. Probably the most important function of is to assign a type to each value in my program. The simplest example of this is when visits a node: Looking at the size of the numeric value, Crystal determines the type should be . Crystal then saves this type right inside of the node: Strictly speaking, this violates the visitor pattern because the visitors shouldn’t be modifying the data structure they visit. But the type of each node, the type of each programming construct in my program, is really an integral part of that node. In this case the is really just completing each node. It’s not changing the structure of the AST in this case… although as we’ll see in a minute the does this for other nodes! Sometimes type values can’t be determined from the intrinsic value of an AST node. Often the type of a node is determined by other nodes in the AST. Recall my example line of code is: Here Crystal automatically sets the type of the arr variable to the type of the array literal expression: . In Computer Science, this is known as type inference . Because Crystal can automatically determine the type of , I don’t need to declare it explicitly by writing something more complicated like this: Type inference allows me to write concise, clean code with fewer type annotations. Most modern, statically typed languages such as Swift, Rust, Julia, Kotlin, etc., use type inference in the same way as Crystal. Even newer versions of Java or C++ use type inference. The Crystal compiler implements type inference when the MainVisitor encounters an AST node, what we saw above. After encountering the node, Crystal recursively processes one of the two child nodes, the value, and its child nodes. When this process finishes, Crystal knows the type of the node is : I’ll take a closer look at how Crystal processes the node next. But for now, once Crystal has the type of the node it copies that type over to the node and sets its type also: But Crystal does something else interesting here: It sets up a dependency between the two AST nodes: it “binds” the variable to the value: This binding dependency allows Crystal to later update the type of the variable whenever necessary. In this case the value will always have the same type, but I suspect that sometimes the Crystal compiler can update types during semantic analysis. In this way if the Crystal compiler ever changed its mind about the type of some value, it can easy update the types of any dependent values. I also suspect Crystal uses these type dependency connections to produce error messages whenever you pass an incorrect type to some function, for example. These are just guesses, however; if anyone from the Crystal team knows exactly what these type bindings are used for let me know. Update: Ary Borenszweig explained that sometimes the Crystal compiler updates the type of variables based on how they are used. He posted an interesting example on The Crystal Programming Language Forum . So far we’ve seen Crystal set the type of the node to , and we’ve seen Crystal assign a type of . But how did Crystal determine the type of the array literal ? This is where things get even more complicated. Sometimes during semantic analysis the Crystal compiler completely rewrites parts of your code, replacing it with something else. This happens even with my simple example. When visiting the node, the expands this simple line of code into something more complex: Reading this, you can see how later my compiled program will create the new array. First Crystal creates an empty array with a capacity of 2, and an element type of . returns the type (or multiple types inside a union type) found in the given set of values, in this case just . Later Crystal sets the two elements in the array just by assigning them. Crystal achieves this by replacing part of my program’s AST with a new branch: For clarity, I’m not drawing the AST nodes for the inner assign operations, only the first line: With this new, updated AST we can see exactly how Crystal determines the type of my variable . Starting at the root of my AST, visits all of the AST nodes in this order in a series of recursive calls: And it determines the types of each of these nodes as it returns from the recursive calls: Some interesting details here that I don’t understand completely or have space to explain here: The node calculates a common union type using a type formula. In this example, it just returns because both elements of my array, and , are simple 32 bit integers. I believe the node refers to a Crystal generic class via the node shown above, in this example . When processes the node, it sets to the type , arriving at the type . The node looks up the method my code is calling ( ) and uses the type from that method’s return value. I didn’t have time to explore how method lookup works in Crystal, however, so I’m not sure about this. Today we looked at a tiny piece of what the Crystal compiler can do. There are many more types of AST nodes, each of which the class handles differently. And there are many different visitor classes also, beyond . When analyzing a more complex program Crystal has to understand class and module definitions, instance and class variables, type annotations, different lexical scopes, macros, and much, much more. Crystal will need all of this information later, during the code generation phase, the next step that follows semantic analysis. But I hope this article gave you a sense of what sort of work a compiler has to do in order to understand your code. As you can see, for a statically typed language like Crystal the compiler spends much of its time identifying all of the types in your code, and determining which programming constructs or AST nodes have which types. Next time I’ll look at code generation: Now that Crystal has identified the variables, function calls and types in my code it is ready to generate the machine language code needed to execute my program. To do that, it will leverage the LLVM framework. The node calculates a common union type using a type formula. In this example, it just returns because both elements of my array, and , are simple 32 bit integers. I believe the node refers to a Crystal generic class via the node shown above, in this example . When processes the node, it sets to the type , arriving at the type . The node looks up the method my code is calling ( ) and uses the type from that method’s return value. I didn’t have time to explore how method lookup works in Crystal, however, so I’m not sure about this.

C++

Java

Programming Julia

Tutorial

0 views