An Algebraic Theory of Music
In my last post, I was struggling towards an algebraic theory of music. This idea has been burning in my mind ever since, and I wanted to give some updates on where I’ve landed.

We begin by modeling a musical voice, which is, roughly speaking, the abstract version of a human voice. The voice can be doing one thing at a time, or can choose to not be doing anything. Voices are modeled by step functions, which are divisions of the real line into discrete chunks. We interpret each discrete chunk as a note being played by the voice for the duration of the chunk.

This gives rise to a nice applicative structure that I alluded to in my previous post, where we take the union of the note boundaries in order to form the applicative. If either voice is resting, so too is the applicative. There is also an instance here which chooses the first non-rest.

There is a similar monoidal structure here, where multiplication is given by “play these two things simultaneously,” relying on an underlying instance for the meaning of “play these two things.” If either voice is resting, we treat its value as the identity element, and can happily combine the two parts in parallel. All of this gives voices a rich algebraic structure.

Voices, therefore, give us our primitive notion of monophony. But real music usually has many voices doing many things, independently. This was the point at which I got stuck in my previous post. The solution here is surprisingly easy: assign a voice to each voice name.

We get an extremely rich structure here completely for free. Our monoid combines all voices in parallel; our applicative combines voices pointwise; etc. However, we also have a new instance, whose characteristic method allows us to trade lines between voices.
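To make the structure above concrete, here is a minimal sketch of one way it could be encoded. Everything in it is an assumption for illustration rather than the actual implementation: the `Voice` representation (an initial value plus a map of boundaries), the helper names `sample`, `zipVoice`, `orElse`, and `reassign`, and the choice to model multi-part music as a function from voice names.

```haskell
import Control.Applicative ((<|>))
import qualified Data.Map.Strict as M

-- | A step function: a value before the first boundary, then a new value
-- starting at each boundary. 'Nothing' means the voice is resting.
data Voice a = Voice (Maybe a) (M.Map Rational (Maybe a))

-- | What the voice is doing at time t.
sample :: Voice a -> Rational -> Maybe a
sample (Voice v0 m) t = maybe v0 snd (M.lookupLE t m)

-- | Combine two voices over the union of their note boundaries.
zipVoice :: (Maybe a -> Maybe b -> Maybe c) -> Voice a -> Voice b -> Voice c
zipVoice f va@(Voice a0 ma) vb@(Voice b0 mb) =
  Voice (f a0 b0) (M.fromSet at (M.keysSet ma <> M.keysSet mb))
  where at t = f (sample va t) (sample vb t)

instance Functor Voice where
  fmap f (Voice v0 m) = Voice (fmap f v0) (fmap (fmap f) m)

-- | Pointwise application; if either side rests, so does the result.
instance Applicative Voice where
  pure a = Voice (Just a) M.empty   -- one note, extending forever
  (<*>) = zipVoice (<*>)

-- | Choose the first non-rest (a plain function, not a claimed instance).
orElse :: Voice a -> Voice a -> Voice a
orElse = zipVoice (<|>)

-- | Play two voices simultaneously; a rest acts as the identity.
instance Semigroup a => Semigroup (Voice a) where
  (<>) = zipVoice (<>)

instance Semigroup a => Monoid (Voice a) where
  mempty = Voice Nothing M.empty

-- | Multi-part music: a voice for each voice name.
newtype Music v a = Music (v -> Voice a)

-- | Trade lines between voices by renaming who plays what.
reassign :: (w -> v) -> Music v a -> Music w a
reassign f (Music g) = Music (g . f)
```

In this sketch, combining a melody that sounds on [0, 1) with a bass that sounds on [1/2, 2) via `<>` yields a voice with boundaries at 0, 1/2, 1, and 2, while combining them applicatively only sounds on the overlap.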
In addition to the in-parallel monoid instance, we can also define a tile product operator over music, which composes things sequentially [1]. The constraint in its type arises from the fact that pieces of music might extend off to infinity in either direction (which pure must do), and we need to deal with that.

There are a few other combinators we care about. First, we can lift anonymous voices (colloquially “tunes”) into multi-part music, and we can assign the same line to everyone. There are also primitives for building little tunes, which you can then compose sequentially via the tile product and assign to voices via the lifting combinators.

One of the better responses to my last blog post was a link to Dmitri Tymoczko’s FARM 2024 talk. There’s much more in this video than I can possibly do justice to here, but my big takeaway was that this guy is thinking about the same sorts of things that I am. So I dove into his work, and that led to his quadruple hierarchy: voices move within chords, which move within scales, which move within macro-harmonies.

Tymoczko presents an algebra which is a geometric space for reasoning about voice leadings. He’s got a lot of fun websites for exploring the ideas, but I couldn’t find an actual implementation of the idea anywhere, so I cooked one up myself.

The idea here is that we have some structure which describes a hierarchy of abstract scales moving with respect to one another. For example, the Western tradition of having triads move within the diatonic scale, which moves within the chromatic scale, is one such hierarchy. The algebra forms a monoid, and has some simple generators that give rise to smooth voice leadings (chord changes). Having a model for smooth harmonic transformations means we can use it constructively.
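As a rough illustration of the triads-in-diatonic-in-chromatic hierarchy, here is a deliberately simplified sketch. It is my own reduction, not Tymoczko's formulation and not the implementation mentioned above; the names `Move`, `index`, `triad`, `diatonic`, and `pitch` are all hypothetical, and the hierarchy is hard-coded to three levels rooted on a major scale.

```haskell
-- | How far to move at each level of the hierarchy (hypothetical names).
data Move = Move
  { chordSteps  :: Int  -- steps within the triad
  , scaleSteps  :: Int  -- steps within the diatonic scale
  , chromaSteps :: Int  -- semitones within the chromatic scale
  } deriving (Eq, Show)

-- | Moves compose by adding their components level by level.
instance Semigroup Move where
  Move a b c <> Move x y z = Move (a + x) (b + y) (c + z)

instance Monoid Move where
  mempty = Move 0 0 0

-- | Index into a periodic scale, given as offsets within one period.
-- 'divMod' floors, so negative indices wrap downwards correctly.
index :: [Int] -> Int -> Int -> Int
index offsets period n = octave * period + offsets !! degree
  where (octave, degree) = n `divMod` length offsets

triad, diatonic :: [Int]
triad    = [0, 2, 4]               -- triad as scale degrees, period 7
diatonic = [0, 2, 4, 5, 7, 9, 11]  -- major scale in semitones, period 12

-- | Chromatic pitch (semitones above the tonic) of chord voice i.
pitch :: Move -> Int -> Int
pitch (Move c s k) i = k + index diatonic 12 (s + index triad 7 (c + i))
```

Under these assumptions, `map (pitch mempty) [0, 1, 2]` gives the root-position tonic triad `[0, 4, 7]`, one chord step (`Move 1 0 0`) gives its first inversion `[4, 7, 12]`, and one scale step (`Move 0 1 0`) gives the triad on the second degree, `[2, 5, 9]`.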
I am still working out the exact details here, but the rough shape of the idea is to build an underlying field of key changes (represented as smooth voice leadings in this algebra). We can then make an underlying field of functional harmonic changes (chord changes), also modeled as smooth voice leadings. Our voices responsible for harmony can now be written as values in this algebra as well, and we can use the applicative musical structure to combine the elements together, which we can later resolve into concrete pitches.

The result is that we can completely isolate key changes, chord changes, how voices express the current harmony, and the rhythms of all of the above, and the result is guaranteed to compose in a way that the ear can interpret as music. Not necessarily good music, but undeniably music.

The type indices on these values are purely for my book-keeping, and nothing requires them to be there. This means we could also use the applicative structure to modulate over different sorts of harmony (e.g., move from triads to seventh chords).

I haven’t quite gotten a feel for melody yet; I think it probably lives in one of these spaces, but it would be nice to be able to target chord tones as well. Please let me know in the comments if you have any insight here.

However, I have been thinking about contouring, which is the overall “shape” of a musical line. Does it go up, peak in the middle, and then come down again? Or maybe it smoothly descends. We can use the discrete intervals intrinsic to voices to find “reasonable” times to sample them. In essence this assigns a time to each segment, and we can then use these times to sample a function. This allows us to apply contours (given as regular functions) to arbitrary rhythms.

In my current typing, the outputs get rounded to their nearest integer values. I’m not deeply in love with this type, but the rough idea is great: turn arbitrary real-valued functions into musical lines. This generalizes contouring, as well as scale runs.
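The contouring idea can be sketched in a few lines. This is only an illustration under assumptions of mine: segments are given as explicit `(start, end)` pairs, the “reasonable” sample time is taken to be each segment's midpoint, and the name `contour` is hypothetical.

```haskell
-- | Apply a real-valued contour to a rhythm. Each segment (start, end) is
-- sampled at its midpoint (an assumed choice of sample time), and the
-- result is rounded to the nearest integer step.
contour :: (Double -> Double) -> [(Double, Double)] -> [((Double, Double), Int)]
contour f segments =
  [ (segment, round (f ((start + end) / 2)))
  | segment@(start, end) <- segments
  ]
```

For example, sampling the arch `\t -> 4 * sin (pi * t)` over the rhythm `[(0, 0.25), (0.25, 0.5), (0.5, 1)]` produces the steps 2, 4, 3: a line that rises, peaks, and comes back down. Note that Haskell's `round` sends exact halves to the nearest even integer, which may or may not be the rounding you want for pitches.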
I’m writing all of this up because I go back to work on Monday and life is going to get very busy soon. I’m afraid I won’t be able to finish all of this!

I’m pretty certain the types above are relatively close to perfect. They seem to capture everything I could possibly want, and nothing I don’t want. Assuming I’m right about that, they must make up the basis of musical composition. The next step therefore is to build musical combinators on top.

One particular combinator I’ve got my eye on is some sort of general “get from here to there” operator, which I imagine would bridge the gap between the end of one piece of music and the beginning of another. I think this would be roughly as easy as moving each voice linearly in space from where it was to where it needs to be. This might need to be a ternary operation in order to also associate a rhythmic pattern to use for the bridge.

But I imagine it would be great for lots of dumb little musical things. When applied over the chord dimension, it would generate arpeggios. Over the scale dimension, it would generate runs. And it would make chromatic moves in the chroma dimension. Choosing exactly what moves to make for voice leadings with components in multiple axes might just be some bespoke order, or could do something more intelligent.

I think the right approach would be to steal the idea of an envelope and attach some relevant metadata. We could then write the operator as a function of those envelopes, but I must admit I don’t quite know what this would look like.

As usual, I’d love any insight you have! Please leave it in the comments. Although I must admit I appreciate comments of the form “have you tried $X” much more than those of the form “music is sublime and you’re an idiot for even trying this.”

Happy new year!
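For what it's worth, the simplest version of the “get from here to there” idea, moving one voice linearly along a single axis, might look like the sketch below. The name `bridge`, its argument order, and the restriction to one integer coordinate are all assumptions; the rhythm and multi-axis questions raised above are left untouched.

```haskell
-- | Positions for n evenly spaced intermediate notes strictly between
-- 'from' and 'to', rounded to the nearest step on whichever axis
-- (chord, scale, or chroma) the coordinates are measured in.
bridge :: Int -> Int -> Int -> [Int]
bridge n from to =
  [ from + round (fromIntegral (i * (to - from)) / fromIntegral (n + 1) :: Double)
  | i <- [1 .. n]
  ]
```

Read over the scale dimension, `bridge 3 0 4` fills the gap between degrees 0 and 4 with the run 1, 2, 3; read over the chord dimension, the same call would walk through successive chord tones, which is the arpeggio case.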
[1] Strictly speaking, the tile product can also do parallel composition, as well as synchronizing composition, but that’s not super important right now. ↩︎