Latest Posts (9 found)

Notes on the WASM Basic C ABI

The WebAssembly/tool-conventions repository contains "Conventions supporting interoperability between tools working with WebAssembly". Of special interest, in contains the Basic C ABI - an ABI for representing C programs in WASM. This ABI is followed by compilers like Clang with the wasm32 target. Rust is also switching to this ABI for extern "C" code. This post contains some notes on this ABI, with annotated code samples and diagrams to help visualize what the emitted WASM code is doing. Hereafter, "the ABI" refers to this Basic C ABI. In these notes, annotated WASM snippets often contain descriptions of the state of the WASM value stack at a given point in time. Unless otherwise specified, "TOS" refers to "Top Of value Stack", and the notation [ x  y ] means the stack has y on top, with x right under it (and possibly some other stuff that's not relevant to the discussion under x ); in this notation, the stack grows "to the right". The WASM value stack has no linear memory representation and cannot be addressed, so it's meaningless to discuss whether the stack grows towards lower or higher addresses. The value stack is simply an abstract stack, where values can be pushed onto or popped off its "top". Whenever addressing is required, the ABI specifies explicitly managing a separate stack in linear memory. This stack is very similar to how stacks are managed in hardware assembly languages (except that in the ABI this stack pointer is held in a global variable, and is not a special register), and it's called the "linear stack". By "scalar" I mean basic C types like int , double or char . For these, using the WASM value stack is sufficient, since WASM functions can accept an arbitrary number of scalar parameters. This C function: Will be compiled into something like: And can be called by pushing three values onto the stack and invoking call $add_three . The ABI specifies that all integral types 32-bit and smaller will be passed as i32 , with the smaller types appropriately sign or zero extended. For example, consider this C function: It's compiled to the almost same code as add_three : Except the last i32.extend8_s , which takes the lowest 8 bits of the value on TOS and sign-extends them to the full i32 (effectively ignoring all the higher bits). Similarly, when $add_three_chars is called, each of its parameters goes through i32.extend8_s . There are additional oddities that we won't get deep into, like passing __int128 values via two i64 parameters. C pointers are just scalars, but it's still educational to review how they are handled in the ABI. Pointers to any type are passed in i32 values; the compiler knows they are pointers, though, and emits the appropriate instructions. For example: Is compiled to: Recall that in WASM, there's no difference between an i32 representing an address in linear memory and an i32 representing just a number. i32.store expects [ addr  value ] on TOS, and does *addr = value . Note that the x parameter isn't needed any longer after the sum is computed, so it's reused later on to hold the return value. WASM parameters are treated just like other locals (as in C). According to the ABI, while scalars and single-element structs or unions are passed to a callee via WASM function parameters (as shown above), for larger aggregates the compiler utilizes linear memory. Specifically, each function gets a "frame" in a region of linear memory allocated for the linear stack. This region grows downwards from high to low addresses [1] , and the global $__stack_pointer points at the bottom of the frame: Consider this code: When do_work is compiled to WASM, prior to calling pair_calculate it copies pp into a location in linear memory, and passes the address of this location to pair_calculate . This location is on the linear stack, which is maintained using the $__stack_pointer global. Here's the compiled WASM for do_work (I also gave its local variable a meaningful name, for readability): Some notes about this code: Before pair_calculate is called, the linear stack looks like this: Following the ABI, the code emitted for pair_calculate takes Pair* (by reference, instead of by value as the original C code): Each function that needs linear stack space is responsible for adjusting the stack pointer and restoring it to its original place at the end. This naturally enables nested function calls; suppose we have some function a calling function b which, in turn, calls function c , and let's assume all of these need to allocate space on the linear stack. This is how the linear stack looks after c 's prologue: Since each function knows how much stack space it has allocated, it's able to properly restore $__stack_pointer to the bottom of its caller's frame before returning. What about returning values of aggregate types? According to the ABI, these are also handled indirectly; a pointer parameter is prepended to the parameter list of the function. The function writes its return value into this address. The following function: Is compiled to: Here's a function that calls it: And the corresponding WASM: Note that this function only uses 8 bytes of its stack frame, but allocates 16; this is because the ABI dictates 16-byte alignment for the stack pointer. There are some advanced topics mentioned in the ABI that these notes don't cover (at least for now), but I'll mention them here for completeness: This is similar to x86 . For the WASM C ABI, a good reason is provided for the direction: WASM load and store instructions have an unsigned constant called offset that can be used to add a positive offset to the address parameter without extra instructions. Since $__stack_pointer points to the lowest address in the frame, these offsets can be used to efficiently access any value on the stack. There are two instance of the pair pp in linear memory prior to the call to pair_calculate : the original one from the initialization statement (at offset 8), and a copy created for passing into pair_calculate (at offset 0). Theoretically, as pp is unused used after the call, the compiler could do better here and keep only a single copy. The stack pointer is decremented by 16, and restored at the end of the function. The first few instructions - where the stack pointer is adjusted - are usually called the prologue of the function. In the same vein, the last few instructions where the stack pointer is reset back to where it was at the entry are called the epilogue . "Red zone" - leaf functions have access to 128 bytes of red zone below the stack pointer. I found this difficult to observe in practice [2] . Since we don't issue system calls directly in WASM, it's tricky to conjure a realistic leaf function that requires the linear stack (instead of just using WASM locals). A separate frame pointer (global value) to be used for functions that require dynamic stack allocation (such as using C's VLAs ). A separate base pointer to be used for functions that require alignment > 16 bytes on the stack.

0 views

LaTeX, LLMs and Boring Technology

Depending on your particular use case, choosing boring technology is often a good idea. Recently, I've been thinking more and more about how the rise and increase in power of LLMs affects this choice. By definition, boring technology has been around for a long time. Piles of content have been written and produced about it: tutorials, books, videos, reference manuals, examples, blog posts and so on. All of this is consumed during the LLM training process, making LLMs better and better at reasoning about such technology. Conversely, "shiny technology" is new, and has much less material available. As a result, LLMs won't be as familiar with it. This applies to many domains, but one specific example for me personally is in the context of LaTeX. LaTeX certainly fits the "boring technology" bill. It's decades old, and has been the mainstay of academic writing since the 1980s. When I used it for the first time in 2002 (for a project report in my university AI class), it was already very old. But people keep working on it and fixing issues; it's easy to install and its wealth of capabilities and community size are staggering. Moreover, people keep working with it, producing more and more content and examples the LLMs can ingest and learn from. I keep hearing about the advantages of new and shiny systems like Typst. However, with the help of LLMs, almost none of the advantages seem meaningful to me. LLMs are great at LaTeX and help a lot with learning or remembering the syntax, finding the right packages, deciphering errors and even generating tedious parts like tables and charts, significantly reducing the need for scripting [1] . You can use LLMs either as standalone or fully integrated into your LaTeX environment; Overleaf has a built-in AI helper, and for local editing you can use VSCode plugins or other tools. I'm personally content with TeXstudio and use LLMs as standalone help, but YMMV. There are many examples where boring technology and LLMs go well together. The main criticism of boring technology is typically that it's "too big, full of cruft, difficult to understand". LLMs really help cutting through the learning curve though, and all that "cruft" is very likely to become useful some time in the future when you graduate from the basic use cases. To be clear: Typst looks really cool, and kudos to the team behind it! All I'm saying in this post is that for me - personaly - the choice for now is to stick with LaTeX as a "boring technology". For finding the right math symbols, I rarely need to scan reference materials any longer. LLMs will easily answer questions like "what's that squiggly Greek letter used in math, and its latex symbol?" or "write the latex for Green's theorem, integral form". For the trickiest / largest equations, LLMs are very good at "here's a picture I took of my equation, give me its latex code" these days [2] . "Here's a piece of code and the LaTeX error I'm getting on it; what's wrong?" This is made more ergonomic by editor integrations, but I personally find that LaTeX's error message problem is hugely overblown. 95% of the errors are reasonably clear, and serious sleuthing is only rarely required in practice. In that minority of cases, pasting some code and the error into a standalone LLM isn't a serious time drain. Generating TikZ diagrams and plots. For this, the hardest part is getting started and finding the right element names, and so on. It's very useful to just ask an LLM to emit something initial and then tweak it manually later, as needed. You can also ask the LLM to explain each thing it emits in detail - this is a great learning tool for deeper understanding. Recently I had luck going "meta" with this: when the diagram has repetitive elements, I may ask the LLM to "write a Python program that generates a TikZ diagram ...", and it works well. Generating and populating tables, and converting them from other data formats or screenshots. Help with formatting and typesetting (how do I change margins to XXX and spacing to YYY). When it comes to scripting, I generally prefer sticking to real programming languages anyway. If there's anything non-trivial to auto-generate I wouldn't use a LaTeX macro, but would write a Python program to generate whatever I need and embed it into the document with something like \input{} . Typst's scripting system may be marketed as "clean and powerful", but why learn yet another scripting language? Ignoring LaTeX's equation notation and doing their own thing is one of the biggest mistakes Typst makes, in my opinion. LaTeX's notation may not be perfect, but it's near universal at this point with support in almost all math-aware tools. Typst's math mode is a clear sign of the second system effect, and isn't even stable . For finding the right math symbols, I rarely need to scan reference materials any longer. LLMs will easily answer questions like "what's that squiggly Greek letter used in math, and its latex symbol?" or "write the latex for Green's theorem, integral form". For the trickiest / largest equations, LLMs are very good at "here's a picture I took of my equation, give me its latex code" these days [2] . "Here's a piece of code and the LaTeX error I'm getting on it; what's wrong?" This is made more ergonomic by editor integrations, but I personally find that LaTeX's error message problem is hugely overblown. 95% of the errors are reasonably clear, and serious sleuthing is only rarely required in practice. In that minority of cases, pasting some code and the error into a standalone LLM isn't a serious time drain. Generating TikZ diagrams and plots. For this, the hardest part is getting started and finding the right element names, and so on. It's very useful to just ask an LLM to emit something initial and then tweak it manually later, as needed. You can also ask the LLM to explain each thing it emits in detail - this is a great learning tool for deeper understanding. Recently I had luck going "meta" with this: when the diagram has repetitive elements, I may ask the LLM to "write a Python program that generates a TikZ diagram ...", and it works well. Generating and populating tables, and converting them from other data formats or screenshots. Help with formatting and typesetting (how do I change margins to XXX and spacing to YYY).

0 views

Notes on using LaTeX to generate formulae

This post collects some notes on using LaTeX to render mathematical documents and formulae, mostly focused on a Linux machine. For background, I typically use LaTeX for one of two (related) purposes: I don't currently use LaTeX for either precise typesetting or for authoring very large, book-sized documents. For day-to-day authoring, I find TeXstudio to be excellent. It has everything I need for local editing with a convenient preview window. I really like that TeXstudio doesn't hide the fact that it's just a graphical veneer on top of command-line LaTeX tooling, and lets you examine what it's doing through logs. Note that web-based solutions like Overleaf exist; I can see myself using that, especially if collaborating with others or having to author LaTeX from a diverse set of computers and OSes, but for local editing of git-backed text files, TeXstudio is great. pandoc is very capable for converting documents from LaTeX to other formats. Recently I find that it's easier to write math-heavy blog posts in LaTeX, and then convert them to reStructuredText with pandoc . For example, the recent post on Hilbert spaces was written like this and then converted using this command: The resulting reStructuredText is very readable and requires very little tweaking before final publishing. pandoc supports many formats, so if you use Markdown or something else, it should work similarly well. A useful feature of LaTeX tooling is the ability to render a specific formula in standalone mode to an image. We can write the formula into its own file (call it standaloneformula.tex ): In case you were wondering, this is the Gaussian integral: Once we have that standalone .tex file, there's a number of things we can do. First, the texlive package should be installed [1] . Using apt : Now, we can run the tools from texlive , for example pdflatex : This creates a PDF file that's useful for previews. To convert the .tex file to an image in SVG format, we'll use a two-step process: If we want a PNG file instead of SVG: The latexmk tool can build a .tex file into a PDF whenever the input file changes, so running: And opening the PDF in a separate window, we can observe live refreshes of edits without having to recompile explicitly. While useful in some scenarios, I find that TeXstudio already does this well. The same tooling flow works for TikZ diagrams! A standalone LaTeX document containing a single tikzpicture element can also be rendered to a SVG or PNG using the same exact commands. If you'd rather not install all these tools directly but use Docker instead, the texlive image can be used to do the same things: And now we can use the same invocations, just through docker: When a formula like \frac{n+1}{n^2-1} is embedded in text, it should be aligned properly to look good with the surrounding text. The information required to do this is emitted by tools like dvisvgm and dvipng ; for example: Note the height=..., depth=... line in the output. The height is the total height of the formula, and depth is its height below the "baseline" (how much down it should stick out from the line). In my blog, these two are translated to attributes on the image element embedding the SVG. Height is translated to style="height: ... and depth to vertical-align: ... . Render math for my blog posts, which are usually written using reStructuredText . This sometimes includes diagrams generated using TikZ. Write personal (unpublished) notes on math-y subjects entirely in LaTeX. These are typically short (up to 10-20 pages), single-subject booklets.

0 views

Summary of reading: July - September 2025

"The Compromise" by Sergei Dovlatov - (read in Russian) the author was a journalist in the Soviet Union in the 60s and 70s. This book is a humorous, semi-biographic account of some of the issues faced by Soviet journalists in their attempt to report news aligned with party lines. Very good writing, though the Russian in this book was a bit difficult for me at times. "Twilight of the Gods: War in the Western Pacific, 1944-1945" by Ian Toll - the third part of the trilogy. As an overall conclusion to the series, I will reiterate the earlier feedback: the writing is great, the book is very readable for such immense size, but I wish the author's focus was elsewhere. If you're looking for very detailed tactical accounts of key battles, this is the book for you. It doesn't have much about the more strategic aspects, and especially the U.S. industrial capacity that played such a key role in the war. How was the production scaled so much, especially with millions of people drafted? I'd be definitely interested in looking for additional sources of information on this subject. "Threaded Interpretive Languages" by R.G. Loeliger - describes some traditional approaches to implementing FORTH (which is the prime example of a thread-interpretive language, or TIL) in assembly. This book is from the late 1970s, so the target machine used is a Z80. Overall it's pretty good, with useful diagrams and quirky humor, but it certainly shows its age. "System Design Interview – An insider's guide" by Alex Xu - a book form of the author's guidelines for system design interviews. It's okay , far from great. The sections are all very repetitive and the sum total of unique insights and ideas in the book is low. Moreover, it's some sort of samizdat instant-Amazon printing of rather low quality, no index, unfocused diagrams and barely any copywriting. "Why Nations Fail: The Origins of Power, Prosperity, and Poverty" by Daron Acemoglu and James A. Robinson - describes the author's theory of why some countries are rich and others are poor. The crux if the theory is extractive vs. inclusive political and economical institutions; in other words, a dictatorship vs. a pluralist government. Overall, the theory is interesting and insightful; the book is a bit scattered, though, with the authors jumping between examples haphazardly, making it difficult to focus. I like that the book doesn't shy away from making predictions for the future rather than just analyzing history. "A biography of the Pixel" by Alvy Ray Smith - the history of computer graphics, told by one of the founders of Pixar. Some parts of this book are good, but I can't say I really enjoyed most of it. Lots of very detailed history and names, and project names, etc. "The Age of Revolution: A History of the English Speaking Peoples, Volume III" by Winston Churchill - covers the period from 1688 to 1815. Though this series is ostensibly about all the "English speaking peoples", the focus is clearly on England. There's some coverage of the USA, but it mostly focuses on the interactions with the British (revolution and war of 1812), and there's also quite a bit on Napoleon and France. The series becomes somewhat more interesting as it approaches the more modern era. "The Nvidia Way: Jensen Huang and the making of a tech giant" by Tae Kim - a very interesting and well-written biography of Nvidia, from the early founding days to ~2024. "Babylon: Mesopotamia and the Birth of Civilization" by Paul Kriwaczek - an interesting historic account of Mesopotamia, from Eridu and until the fall of Babylon. "Demon Copperhead" by Barbara Kingsolver - a novel about a boy coming of age as an orphan in foster care, houses of friends, etc. The backdrop is the opioid epidemic of the early 2000s in the Appalachia, with broken families and lots of drugs. The book is pretty good, but the Pulitzer prize here is clearly for the unsettling coverage of an ongoing hot topic, not for any sort of literary flourish. "The Color of Our Sky" by Amita Trasi - the fictional story of two girls from different castes in India who find their lives intertwined in complex ways. Some thought provoking and troubling accounts of traditions still prevalent in India in relation to discrimination, human trafficking, child abuse and modern slavery. "El murmullo de las abejas" by Sofía Segovia - (read in Spanish) slightly mystical novel about the life of an aristocratic family in the north of Mexico in the early 20th century. Maybe it's just the Spanish, but I definitely got "100 años de soledad" vibes from this book: the mysticism, the multi-generational story going in circles, the ambience. "The Mysterious Island" by Jules Verne

0 views

Consistent hashing

As a concrete experiment, I ran a similar simulation to the one above, but with 10 virtual nodes per node. We'll consider the total portion of the circle mapped to a node when it maps to either of its virtual nodes. While the average remains 18 degrees, the variance is reduced drastically - the smallest one is 11 degrees and the largest 26. You can find the code for these experiments in the demo.go file of the source code repository. This is close to the original motivation for the development of consistent caching by researchers at MIT. Their work, described in the paper "Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web" became the foundation of Akamai Technologies . There are other use cases of consistent hashing in distributed systems. AWS popularized one common application in their paper "Dynamo: Amazon’s Highly Available Key-value Store" , where the algorithm is used to distribute storage keys across servers.

1 views

Implementing Forth in Go and C

I first ran into Forth about 20 years ago when reading a book about designing embedded hardware . The reason I got the book back then was to actually learn more about the HW aspects, so having skimmed the Forth chapter I just registered an "oh, this is neat" mental note and moved on with my life. Over the last two decades I heard about Forth a few more times here and there, such as that time when Factor was talked about for a brief period, maybe 10-12 years ago or so. It always occupied a slot in the "weird language" category inside my brain, and I never paid it much attention. Until June this year, when a couple of factors combined fortuitously: And something clicked. I'm going to implement a Forth, because... why not? So I spent much of my free hacking time over the past two months learning about Forth and implementing two of them. It's useful to think of Forth (at least standard Forth , not offshoots like Factor) as having two different "levels": Another way to look at it (useful if you belong to a certain crowd) is that user-level Forth is like Lisp without macros, and hacker-level Forth has macros enabled. Lisp can still be great and useful without macros, but macros take it to an entire new level and also unlock the deeper soul of the language. This distinction will be important when discussing my Forth implementations below. There's a certain way Forth is supposed to be implemented; this is how it was originally designed, and if you get closer to the hacker level, it becomes apparent that you're pretty much required to implement it this way - otherwise supporting all of the language's standard words will be very difficult. I'm talking about the classical approach of a linked dictionary, where a word is represented as a "threaded" list [1] , and this dictionary is available for user code to augment and modify. Thus, much of the Forth implementation can be written in Forth itself. The first implementation I tried is stubbornly different. Can we just make a pure interpreter? This is what goforth is trying to explore (the Go implementation located in the root directory of that repository). Many built-in words are supported - definitely enough to write useful programs - and compilation (the definition of new Forth words using : word ... ; ) is implemented by storing the actual string following the word name in the dictionary, so it can be interpreted when the word is invoked. This was an interesting approach and in some sense, it "works". For the user level of Forth, this is perfectly usable (albeit slow). However, it's insufficient for the hacker level, because the host language interpreter (the one in Go) has all the control, so it's impossible to implement IF...THEN in Forth, for example (it has to be implemented in the host language). That was a fun way to get a deeper sense of what Forth is about, but I did want to implement the hacker level as well, so the second implementation - ctil - does just that. It's inspired by the jonesforth assembly implementation, but done in C instead [2] . ctil actually lets us implement major parts of Forth in Forth itself. For example, variable : Conditionals: These are actual examples of ctil's "prelude" - a Forth file loaded before any user code. If you understand Forth, this code is actually rather mind-blowing. We compile IF and the other words by directly laying our their low-level representation in memory, and different words communicate with each other using the data stack during compilation . Forth made perfect sense in the historic context in which it was created in the early 1970s. Imagine having some HW connected to your computer (a telescope in the case of Forth's creator), and you have to interact with it. In terms of languages at your disposal - you don't have much, even BASIC wasn't invented yet. Perhaps your machine still didn't have a C compiler ported to it; C compilers aren't simple, and C isn't very great for exploratory scripting anyway. So you mostly just have your assembly language and whatever you build on top. Forth is easy to implement in assembly and it gives you a much higher-level language; you can use it as a calculator, as a REPL, and as a DSL for pretty much anything due to its composable nature. Forth certainly has interesting aspects; it's a concatenative language , and thus inherently point-free . A classical example is that instead of writing the following in a more traditional syntax: You just write this: There is no need to explicitly pass parameters, or to explicitly return results. Everything happens implicitly on the stack. This is useful for REPL-style programming where you use your language not necessarily for writing large programs, but more for interactive instructions to various HW devices. This dearth of syntax is also what makes Forth simple to implement. All that said, in my mind Forth is firmly in the "weird language" category; it's instructive to learn and to implement, but I wouldn't actually use it for anything real these days. The stack-based programming model is cool for very terse point-free programs, but it's not particularly readable and hard to reason about without extensive comments, in my experience. Consider the implementation of a pretty standard Forth word: +! . It expects and address at the top of stack, and an addend below it. It adds the addend to the value stored at that address. Here's a Forth implementation from ctil's prelude: Look at that stack wrangling! It's really hard to follow what goes where without the detailed comments showing the stack layout on the right of each instruction (a common practice for Forth programs). Sure, we can create additional words that would make this simpler, but that just increases the lexicon of words to know. My point is, there's fundamental difficulty here. When you see this C code: Even without any documentation, you can immediately know several important things: Written in Forth [3] : How can you know the arity of the functions without adding explicit comments? Sure, if you have a handful of words like bar and foo you know like the back of your hand, this is easy. But imagine reading a large, unfamiliar code base full of code like this and trying to comprehend it. The source code of my goforth project is on GitHub ; both implementations are there, with a comprehensive test harness that tests both. The learn Forth itself, I found these resources very useful: To learn how to implement Forth: Implementing Forth is a great self-improvement project for a coder; there's a pleasantly challenging hump of understanding to overcome, and you gain valuable insights into stack machines, interpretation vs. compilation and mixing these levels of abstraction in cool ways. Also, implementing programming languages from scratch is fun! It's hard to beat the feeling of getting to interact with your implementation for the first time, and then iterating on improving it and making it more featureful. Just one more word ! Which is another deviation from the norm. Forth is really supposed to be implemented in assembly - this is what it was designed for, and it's very clear from its structure that it must be so in order to achieve peak performance. But where's the fun in doing things the way they were supposed to be done? Besides, jonesforth is already a perfectly fine Forth implementation in assembly, so I wouldn't have learned much by just copying it. I had a lot of fun coding in C for this one; it's been a while since I last wrote non-trivial amounts of C, and I found it very enjoyable.

0 views