Posts in Programming (20 found)

Notes on Lagrange Interpolating Polynomials

Polynomial interpolation is a method of finding a polynomial function that fits a given set of data perfectly. More concretely, suppose we have a set of n+1 distinct points [1]:

\[ (x_0, y_0), (x_1, y_1), \dots, (x_n, y_n) \]

We want to find the polynomial coefficients {a_0\cdots a_n} such that:

\[ p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n \]

fits all our points; that is p(x_0)=y_0, p(x_1)=y_1 etc. This post discusses a common approach to solving this problem, and also shows why such a polynomial exists and is unique.

When we assign all points (x_i, y_i) into the generic polynomial p(x), we get:

\[ \begin{aligned} a_0 + a_1 x_0 + a_2 x_0^2 + \cdots + a_n x_0^n &= y_0 \\ a_0 + a_1 x_1 + a_2 x_1^2 + \cdots + a_n x_1^n &= y_1 \\ &\;\;\vdots \\ a_0 + a_1 x_n + a_2 x_n^2 + \cdots + a_n x_n^n &= y_n \end{aligned} \]

We want to solve for the coefficients a_i. This is a linear system of equations that can be represented by the following matrix equation:

\[ \begin{bmatrix} 1 & x_0 & x_0^2 & \cdots & x_0^n \\ 1 & x_1 & x_1^2 & \cdots & x_1^n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^n \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_n \end{bmatrix} = \begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_n \end{bmatrix} \]

The matrix on the left is called the Vandermonde matrix. This matrix is known to be invertible (see Appendix for a proof); therefore, this system of equations has a single solution that can be calculated by inverting the matrix. In practice, however, the Vandermonde matrix is often numerically ill-conditioned, so inverting it isn’t the best way to calculate exact polynomial coefficients. Several better methods exist.

Lagrange interpolation polynomials emerge from a simple, yet powerful idea. Let’s define the Lagrange basis functions l_i(x) (i \in [0, n]) as follows, given our points (x_i, y_i):

\[ l_i(x_j) = \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases} \]

In words, l_i(x) is constrained to 1 at x_i and to 0 at all other x_j. We don’t care about its value at any other point. The linear combination:

\[ p(x) = \sum_{i=0}^{n} y_i l_i(x) \]

is then a valid interpolating polynomial for our set of n+1 points, because it’s equal to y_i at each x_i (take a moment to convince yourself this is true).

How do we find l_i(x)? The key insight comes from studying the following function:

\[ l'_i(x) = (x-x_0) \cdots (x-x_{i-1})(x-x_{i+1}) \cdots (x-x_n) = \prod_{j \neq i} (x - x_j) \]

This function has terms (x-x_j) for all j\neq i. It should be easy to see that l'_i(x) is 0 at all x_j when j\neq i. What about its value at x_i, though? We can just assign x_i into l'_i(x) to get:

\[ l'_i(x_i) = \prod_{j \neq i} (x_i - x_j) \]

And then normalize l'_i(x), dividing it by this (constant) value. We get the Lagrange basis function l_i(x):

\[ l_i(x) = \frac{l'_i(x)}{l'_i(x_i)} = \prod_{j \neq i} \frac{x - x_j}{x_i - x_j} \]

Let’s use a concrete example to visualize this.
Suppose we have the following set of points we want to interpolate: (1,4), (2,2), (3,3). We can calculate l'_0(x), l'_1(x) and l'_2(x), and get the following:

\[ l'_0(x) = (x-2)(x-3), \quad l'_1(x) = (x-1)(x-3), \quad l'_2(x) = (x-1)(x-2) \]

Note where each l'_i(x) intersects the x axis. These functions have the right values at all x_{j\neq i}. If we normalize them to obtain l_i(x), we get these functions:

\[ l_0(x) = \frac{(x-2)(x-3)}{2}, \quad l_1(x) = -(x-1)(x-3), \quad l_2(x) = \frac{(x-1)(x-2)}{2} \]

Note that each polynomial is 1 at the appropriate x_i and 0 at all the other x_{j\neq i}, as required. With these l_i(x), we can now form the interpolating polynomial p(x)=\sum_{i=0}^{n}y_i l_i(x), which fits our set of input points:

\[ p(x) = 4 l_0(x) + 2 l_1(x) + 3 l_2(x) = \frac{3}{2}x^2 - \frac{13}{2}x + 9 \]

We’ve just seen that the linear combination of Lagrange basis functions:

\[ p(x) = \sum_{i=0}^{n} y_i l_i(x) \]

is a valid interpolating polynomial for a set of n+1 distinct points (x_i, y_i). What is its degree? Since the degree of each l_i(x) is n, the degree of p(x) is at most n. We’ve just derived the first part of the polynomial interpolation theorem:

Polynomial interpolation theorem: for any n+1 data points (x_0,y_0), (x_1, y_1)\cdots(x_n, y_n) \in \mathbb{R}^2 where no two x_j are the same, there exists a unique polynomial p(x) of degree at most n that interpolates these points.

We’ve demonstrated existence and degree, but not yet uniqueness. So let’s turn to that. We know that p(x) interpolates all n+1 points, and its degree is \leq n. Suppose there’s another such polynomial q(x). Let’s construct:

\[ r(x) = p(x) - q(x) \]

What do we know about r(x)? First of all, its value is 0 at all our x_i, so it has n+1 roots. Second, we also know that its degree is at most n (because it’s the difference of two polynomials of such degree). These two facts are a contradiction. No non-zero polynomial of degree \leq n can have n+1 roots (a basic algebraic fact related to the Fundamental theorem of algebra). So r(x) must be the zero polynomial; in other words, our p(x) is unique \blacksquare.

Note the implication of uniqueness here: given our set of n+1 distinct points, there’s only one polynomial of degree \leq n that interpolates it.
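As a quick sanity check of the worked example and the uniqueness claim, here is a minimal Python sketch that evaluates the Lagrange form directly for the points (1,4), (2,2), (3,3). The names `lagrange_basis`, `interpolate` and `expanded` are made up for illustration, exact `Fraction` arithmetic is an assumption chosen to avoid rounding noise, and the monomial coefficients in `expanded` come from expanding the Lagrange form by hand.

```python
from fractions import Fraction


def lagrange_basis(xs, i, x):
    """Evaluate l_i(x): the product of (x - x_j) / (x_i - x_j) over all j != i."""
    value = Fraction(1)
    for j, xj in enumerate(xs):
        if j != i:
            value *= Fraction(x - xj) / (xs[i] - xj)
    return value


def interpolate(points, x):
    """Evaluate p(x) = sum of y_i * l_i(x) for the given (x_i, y_i) points."""
    xs = [px for px, _ in points]
    return sum(y * lagrange_basis(xs, i, x) for i, (_, y) in enumerate(points))


def expanded(x):
    """The same polynomial in the monomial basis: 9 - (13/2) x + (3/2) x^2."""
    return Fraction(9) - Fraction(13, 2) * x + Fraction(3, 2) * x * x


points = [(1, 4), (2, 2), (3, 3)]

# The interpolant reproduces every input point exactly.
for px, py in points:
    assert interpolate(points, px) == py

# Uniqueness in action: two different constructions agree wherever we look.
for x in range(-5, 6):
    assert interpolate(points, x) == expanded(x)
```

The nested loop makes evaluation cost O(n^2) per point, which is fine for a demonstration; nothing here is tuned for large n.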
We can find its coefficients by inverting the Vandermonde matrix, by using Lagrange basis functions, or any other method [2].

The set P_n(\mathbb{R}) consists of all real polynomials of degree \leq n. This set - along with addition of polynomials and scalar multiplication - forms a vector space. We called l_i(x) the "Lagrange basis" previously, and they do - in fact - form an actual linear algebra basis for this vector space. To prove this claim, we need to show that Lagrange polynomials are linearly independent and that they span the space.

Linear independence: we have to show that

\[ s(x) = \sum_{i=0}^{n} a_i l_i(x) = 0 \]

implies a_i=0 \quad \forall i. Recall that l_i(x) is 1 at x_i, while all other l_j(x) are 0 at that point. Therefore, evaluating s(x) at x_0, we get:

\[ s(x_0) = a_0 = 0 \]

Similarly, we can show that a_i is 0, for all i \blacksquare.

Span: we’ve already demonstrated that the linear combination of l_i(x):

\[ p(x) = \sum_{i=0}^{n} y_i l_i(x) \]

is a valid interpolating polynomial for any set of n+1 distinct points. Using the polynomial interpolation theorem, this is the unique polynomial interpolating this set of points. In other words, for every q(x)\in P_n(\mathbb{R}), we can identify any set of n+1 distinct points it passes through, and then use the technique described in this post to find the coefficients of q(x) in the Lagrange basis. Therefore, the set l_i(x) spans the vector space \blacksquare.

Previously we’ve seen how to use the \{1, x, x^2, \dots x^n\} basis to write down a system of linear equations that helps us find the interpolating polynomial. This results in the Vandermonde matrix. Using the Lagrange basis, we can get a much nicer matrix representation of the interpolation equations. Recall that our general polynomial using the Lagrange basis is:

\[ p(x) = \sum_{i=0}^{n} a_i l_i(x) \]

Let’s build a system of equations for each of the n+1 points (x_i,y_i). For x_0:

\[ p(x_0) = a_0 l_0(x_0) + a_1 l_1(x_0) + \cdots + a_n l_n(x_0) \]

By definition of the Lagrange basis functions, all l_i(x_0) where i\neq 0 are 0, while l_0(x_0) is 1. So this simplifies to:

\[ p(x_0) = a_0 \]

But the value at node x_0 is y_0, so we’ve just found that a_0=y_0.
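We can produce similar equations for the other nodes as well, p(x_1)=a_1, etc. In matrix form:

\[ \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_n \end{bmatrix} = \begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_n \end{bmatrix} \]

We get the identity matrix; this is another way to trivially show that a_0=y_0, a_1=y_1 and so on.

Given some numbers \{x_0 \dots x_n\}, a matrix of this form:

\[ V = \begin{bmatrix} 1 & x_0 & x_0^2 & \cdots & x_0^n \\ 1 & x_1 & x_1^2 & \cdots & x_1^n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^n \end{bmatrix} \]

is called the Vandermonde matrix. What’s special about a Vandermonde matrix is that we know it’s invertible when the x_i are distinct. This is because its determinant is known to be non-zero. Moreover, its determinant is [3]:

\[ \det(V) = \prod_{0 \leq i < j \leq n} (x_j - x_i) \]

Here’s why. To get some intuition, let’s consider some small Vandermonde matrices. Starting with a 2-by-2:

\[ \det \begin{bmatrix} 1 & x_0 \\ 1 & x_1 \end{bmatrix} = x_1 - x_0 \]

Let’s try 3-by-3 now:

\[ \det \begin{bmatrix} 1 & x_0 & x_0^2 \\ 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \end{bmatrix} \]

We can use the standard way of calculating determinants to expand from the first row:

\[ 1 \cdot (x_1 x_2^2 - x_1^2 x_2) - x_0 (x_2^2 - x_1^2) + x_0^2 (x_2 - x_1) \]

Using some algebraic manipulation, it’s easy to show this is equivalent to:

\[ (x_1 - x_0)(x_2 - x_0)(x_2 - x_1) \]

For the full proof, let’s look at the generalized n+1-by-n+1 matrix V again. Recall that subtracting a multiple of one column from another doesn’t change a matrix’s determinant. For each column k>1, we’ll subtract the value of column k-1 multiplied by x_0 from it (this is done on all columns simultaneously). The idea is to make the first row all zeros after the very first element:

\[ \det(V) = \det \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 1 & x_1 - x_0 & x_1^2 - x_1 x_0 & \cdots & x_1^n - x_1^{n-1} x_0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_n - x_0 & x_n^2 - x_n x_0 & \cdots & x_n^n - x_n^{n-1} x_0 \end{bmatrix} \]

Now we factor out x_1-x_0 from the second row (after the first element), x_2-x_0 from the third row and so on, to get:

\[ \det(V) = \det \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 1 & (x_1 - x_0) & x_1 (x_1 - x_0) & \cdots & x_1^{n-1} (x_1 - x_0) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & (x_n - x_0) & x_n (x_n - x_0) & \cdots & x_n^{n-1} (x_n - x_0) \end{bmatrix} \]

Imagine we erase the first row and first column of this matrix. We’ll call the resulting matrix W. Because the first row of the full matrix is all zeros except the first element, we have:

\[ \det(V) = \det(W) \]

Note that the first row of W has a common factor of x_1-x_0, so when calculating \det(W), we can move this common factor out. Same for the common factor x_2-x_0 of the second row, and so on. Overall, we can write:

\[ \det(V) = (x_1 - x_0)(x_2 - x_0) \cdots (x_n - x_0) \cdot \det \begin{bmatrix} 1 & x_1 & \cdots & x_1^{n-1} \\ 1 & x_2 & \cdots & x_2^{n-1} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_n & \cdots & x_n^{n-1} \end{bmatrix} \]

But the smaller matrix is just the Vandermonde matrix for \{x_1 \dots x_n\}. If we continue this process by induction, we’ll get:

\[ \det(V) = \prod_{0 \leq i < j \leq n} (x_j - x_i) \]

If you’re interested, the Wikipedia page for the Vandermonde matrix has a couple of additional proofs.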
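The determinant identity from the appendix is easy to spot-check numerically. The sketch below is not part of the original post: the helper names (`vandermonde`, `det`, `product_formula`) are hypothetical, the sample inputs are arbitrary, and the cofactor-expansion determinant is only an assumption that keeps the example dependency-free (it is exponential-time, so suitable for tiny matrices only).

```python
from fractions import Fraction
from itertools import combinations
from math import prod


def vandermonde(xs):
    """Build the matrix whose row i is [1, x_i, x_i^2, ..., x_i^n]."""
    n = len(xs)
    return [[Fraction(x) ** k for k in range(n)] for x in xs]


def det(matrix):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(matrix) == 1:
        return matrix[0][0]
    total = Fraction(0)
    for col, entry in enumerate(matrix[0]):
        # Drop the first row and the current column to form the minor.
        minor = [row[:col] + row[col + 1:] for row in matrix[1:]]
        total += (-1) ** col * entry * det(minor)
    return total


def product_formula(xs):
    """Product of (x_j - x_i) over all index pairs i < j."""
    return prod(Fraction(xj - xi) for xi, xj in combinations(xs, 2))


# The direct determinant and the closed-form product agree on each sample.
for xs in ([1, 2], [1, 2, 3], [0, 1, 3, 7]):
    assert det(vandermonde(xs)) == product_formula(xs)

# Repeating a value makes two rows identical, so the determinant vanishes.
assert det(vandermonde([1, 2, 2])) == 0
```

The last assertion mirrors the invertibility condition in the text: as soon as two of the x_i coincide, one factor of the product is zero and the matrix is singular.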

Jim Nielsen 2 days ago

Computers and the Internet: A Two-Edged Sword

Dave Rupert articulated something in “Priority of idle hands” that’s been growing in my subconscious for years:

“I had a small, intrusive realization the other day that computers and the internet are probably bad for me […] This is hard to accept because a lot of my work, hobbies, education, entertainment, news, communities, and curiosities are all on the internet. I love the internet, it’s a big part of who I am today”

Hard same. I love computers and the internet. Always have. I feel lucky to have grown up in the late 90’s / early 00’s where I was exposed to the fascination, excitement, and imagination of PCs, the internet, and then “mobile”. What a time to make websites!

Simultaneously, I’ve seen how computers and the internet are a two-edged sword for me: I’ve cut out many great opportunities with them, but I’ve also cut myself a lot (and continue to). Per Dave’s comments, I have this feeling somewhere inside of me that the internet and computers don’t necessarily align with my own, personal perspective of what a life well lived is for me. My excitement and draw to them also often leave me with a feeling of “I took that too far.” I still haven’t figured out a completely healthy balance (but I’m also doing ok).

Dave comes up with a priority of constituencies to deal with his own realization. I like his. Might steal it. But I also think I need to adapt it and make it my own, though I don’t know what that looks like yet. To be honest, I don’t think I was ready to confront any of this, but reading Dave’s blog forced it out of my subconscious and into the open, so now I gotta deal. Thanks Dave.

ava's blog 2 days ago

[photo dump] recent few weeks

Another photo dump is due. I saw a funny and unconventional ring online and I had to have it. Sorry, I love it so much. I already lost the white paper in it because it is just glued on, but I like it even more without it. Moving on to food... My wife made sushi. She also made matcha strawberry cookies. We're also on a bread baking journey because bread prices are ridiculous now. Our first few attempts failed, but now we have some awesome breads and they keep getting better and better. One time, our sourdough starter escaped containment. It was also Valentine's Day and my wife's and my anniversary: some chocolates, chocolate pancakes in bed, and flowers. We also played some Commander at the LGS. And I tidied up my wardrobe, and accidentally melted a container top on the toaster.

Published 27 Feb, 2026

David Bushell 2 days ago

Croissant and CORS proxy update

Croissant is my home-cooked RSS reader. I wish it was only a progressive web app (PWA) but due to missing CORS headers, many feeds remain inaccessible. My RSS feeds send the required CORS header and so should yours! Blogs Are Back has a guide to enable CORS for your blog.

Bypassing CORS requires some kind of proxy. Other readers use a custom browser extension. That is clever, but extensions can be dangerous. I decided on two solutions. I wrapped my PWA in a Tauri app. This is also dangerous if you don’t trust me. I also provided a server proxy for the PWA. A proxy has privacy concerns but is much safer.

I’m sorry if anyone is using Croissant as a PWA because the proxy is now gone. If a feed has the correct CORS headers it will continue to work. Sorry for the abrupt change. That’s super lame, I know! To be honest I’ve lost a bit of enthusiasm for the project and I can’t maintain a proxy. Croissant was designed to be limited in scope to avoid too much burden. In hindsight the proxy was too ambitious.

Technically, yes! But you’ll have to figure that out by yourself. If you have questions, such as where to find the code, how the code works etc, the answer is no. I don’t mean to be rude, I just don’t have any time! You’re welcome to ask for support but unless I can answer in 30 seconds I’ll have to decline.

Croissant is feature complete! It does what I set out to achieve. I have fixed several minor bugs and tweaked a few styles. Until inspiration (or a bug) strikes I won’t do another update anytime soon. Maybe later in the year I’ll decide to overhaul it? Who can predict!

Robin Moffatt 2 days ago

Interesting links - February 2026

Phew, what a month! February may be shorter but that’s not diminished the wealth of truly interesting posts I’ve found to share with you this month.

(think) 2 days ago

Building Emacs Major Modes with TreeSitter: Lessons Learned

Over the past year I’ve been spending a lot of time building TreeSitter-powered major modes for Emacs – clojure-ts-mode (as co-maintainer), neocaml (from scratch), and asciidoc-mode (also from scratch). Between the three projects I’ve accumulated enough battle scars to write about the experience. This post distills the key lessons for anyone thinking about writing a TreeSitter-based major mode, or curious about what it’s actually like. Before TreeSitter, Emacs font-locking was done with regular expressions and indentation was handled by ad-hoc engines (SMIE, custom indent functions, or pure regex heuristics). This works, but it has well-known problems: Regex-based font-locking is fragile. Regexes can’t parse nested structures, so they either under-match (missing valid code) or over-match (highlighting inside strings and comments). Every edge case is another regex, and the patterns become increasingly unreadable over time. Indentation engines are complex. SMIE (the generic indentation engine for non-TreeSitter modes) requires defining operator precedence grammars for the language, which is hard to get right. Custom indentation functions tend to grow into large, brittle state machines. Tuareg’s indentation code, for example, is thousands of lines long. TreeSitter changes the game because you get a full, incremental, error-tolerant syntax tree for free. Font-locking becomes “match this AST pattern, apply this face”: And indentation becomes “if the parent node is X, indent by Y”: The rules are declarative, composable, and much easier to reason about than regex chains. In practice, ’s entire font-lock and indentation logic fits in about 350 lines of Elisp. The equivalent in tuareg is spread across thousands of lines. That’s the real selling point: simpler, more maintainable code that handles more edge cases correctly . That said, TreeSitter in Emacs is not a silver bullet. Here’s what I ran into. 
TreeSitter grammars are written by different authors with different philosophies. The tree-sitter-ocaml grammar provides a rich, detailed AST with named fields. The tree-sitter-clojure grammar, by contrast, deliberately keeps things minimal – it only models syntax, not semantics, because Clojure’s macro system makes static semantic analysis unreliable. 1 This means font-locking forms in Clojure requires predicate matching on symbol text, while in OCaml you can directly match nodes with named fields. To illustrate: here’s how you’d fontify a function definition in OCaml, where the grammar gives you rich named fields: And here’s the equivalent in Clojure, where the grammar only gives you lists of symbols and you need predicate matching: You can’t learn “how to write TreeSitter queries” generically – you need to learn each grammar individually. The best tool for this is (to visualize the full parse tree) and (to see the node at point). Use them constantly. You’re dependent on someone else providing the grammar, and quality is all over the map. The OCaml grammar is mature and well-maintained – it’s hosted under the official tree-sitter GitHub org. The Clojure grammar is small and stable by design. But not every language is so lucky. asciidoc-mode uses a third-party AsciiDoc grammar that employs a dual-parser architecture – one parser for block-level structure (headings, lists, code blocks) and another for inline formatting (bold, italic, links). This is the same approach used by Emacs’s built-in , and it makes sense for markup languages where block and inline syntax are largely independent. The problem is that the two parsers run independently on the same text, and they can disagree . The inline parser misinterprets and list markers as emphasis delimiters, creating spurious bold spans that swallow subsequent inline content. 
The workaround is to use on all block-level font-lock rules so they win over the incorrect inline faces: This doesn’t fix inline elements consumed by the spurious emphasis – that requires an upstream grammar fix. When you hit grammar-level issues like this, you either fix them yourself (which means diving into the grammar’s JavaScript source and C toolchain) or you live with workarounds. Either way, it’s a reminder that your mode is only as good as the grammar underneath it. Getting the font-locking right in was probably the most challenging part of all three projects, precisely because of these grammar quirks. I also ran into a subtle behavior: the default font-lock mode ( ) skips an entire captured range if any position within it already has a face. So if you capture a parent node like and a child was already fontified, the whole thing gets skipped silently. The fix is to capture specific child nodes instead: These issues took a lot of trial and error to diagnose. The lesson: budget extra time for font-locking when working with less mature grammars . Grammars evolve, and breaking changes happen. switched from the stable grammar to the experimental branch because the stable version had metadata nodes as children of other nodes, which caused and to behave incorrectly. The experimental grammar makes metadata standalone nodes, fixing the navigation issues but requiring all queries to be updated. pins to v0.24.0 of the OCaml grammar. If you don’t pin versions, a grammar update can silently break your font-locking or indentation. The takeaway: always pin your grammar version , and include a mechanism to detect outdated grammars. tests a query that changed between versions to detect incompatible grammars at startup. Users shouldn’t have to manually clone repos and compile C code to use your mode. Both and include grammar recipes: On first use, the mode checks and offers to install missing grammars via . 
This works, but requires a C compiler and Git on the user’s machine, which is not ideal. 2 The TreeSitter support in Emacs has been improving steadily, but each version has its quirks: Emacs 29 introduced TreeSitter support but lacked several APIs. For instance, (used for structured navigation) doesn’t exist – you need a fallback: Emacs 30 added , sentence navigation, and better indentation support. But it also had a bug in offsets ( #77848 ) that broke embedded parsers, and another in that required to disable its TreeSitter-aware version. Emacs 31 has a bug in where an off-by-one error causes to leave ` *)` behind on multi-line OCaml comments. I had to skip the affected test with a version check: The lesson: test your mode against multiple Emacs versions , and be prepared to write version-specific workarounds. CI that runs against Emacs 29, 30, and snapshot is essential. Most TreeSitter grammars ship with query files for syntax highlighting ( ) and indentation ( ). Editors like Neovim and Helix use these directly. Emacs doesn’t – you have to manually translate the patterns into and calls in Elisp. This is tedious and error-prone. For example, here’s a rule from the OCaml grammar’s : And here’s the Elisp equivalent you’d write for Emacs: The query syntax is nearly identical, but you have to wrap everything in calls, map upstream capture names ( ) to Emacs face names ( ), assign features, and manage behavior. You end up maintaining a parallel set of queries that can drift from upstream. Emacs 31 will introduce which will make it possible to use files for font-locking, which should help significantly. But for now, you’re hand-coding everything. When a face isn’t being applied where you expect: TreeSitter modes define four levels of font-locking via , and the default level in Emacs is 3. It’s tempting to pile everything into levels 1–3 so users see maximum highlighting out of the box, but resist the urge. 
When every token on the screen has a different color, code starts looking like a Christmas tree and the important things – keywords, definitions, types – stop standing out. Less is more here. Here’s how distributes features across levels: And follows the same philosophy: The pattern is the same: essentials first, progressively more detail at higher levels. This way the default experience (level 3) is clean and readable, and users who want the full rainbow can bump to 4. Better yet, they can use to cherry-pick individual features regardless of level: This gives users fine-grained control without requiring mode authors to anticipate every preference. Indentation issues are harder to diagnose because they depend on tree structure, rule ordering, and anchor resolution: Remember that rule order matters for indentation too – the first matching rule wins. A typical set of rules reads top to bottom from most specific to most general: Watch out for the empty-line problem : when the cursor is on a blank line, TreeSitter has no node at point. The indentation engine falls back to the root node as the parent, which typically matches the top-level rule and gives column 0. In neocaml I solved this with a rule that looks at the previous line’s last token to decide indentation: This is the single most important piece of advice. Font-lock and indentation are easy to break accidentally, and manual testing doesn’t scale. Both projects use Buttercup (a BDD testing framework for Emacs) with custom test macros. Font-lock tests insert code into a buffer, run , and assert that specific character ranges have the expected face: Indentation tests insert code, run , and assert the result matches the expected indentation: Integration tests load real source files and verify that both font-locking and indentation survive on the full file. This catches interactions between rules that unit tests miss. has 200+ automated tests and has even more. 
Investing in test infrastructure early pays off enormously – I can refactor indentation rules with confidence because the suite catches regressions immediately. When I became the maintainer of clojure-mode many years ago, I really struggled with making changes. There were no font-lock or indentation tests, so every change was a leap of faith – you’d fix one thing and break three others without knowing until someone filed a bug report. I spent years working on a testing approach I was happy with, alongside many great contributors, and the return on investment was massive. The same approach – almost the same test macros – carried over directly to when we built the TreeSitter version. And later I reused the pattern again in and . One investment in testing infrastructure, four projects benefiting from it. I know that automated tests, for whatever reason, never gained much traction in the Emacs community. Many popular packages have no tests at all. I hope stories like this convince you that investing in tests is really important and pays off – not just for the project where you write them, but for every project you build after. This one is specific to but applies broadly: compiling TreeSitter queries at runtime is expensive. If you’re building queries dynamically (e.g. with called at mode init time), consider pre-compiling them as values. This made a noticeable difference in ’s startup time. The Emacs community has settled on a suffix convention for TreeSitter-based modes: , , , and so on. This makes sense when both a legacy mode and a TreeSitter mode coexist in Emacs core – users need to choose between them. But I think the convention is being applied too broadly, and I’m afraid the resulting name fragmentation will haunt the community for years. For new packages that don’t have a legacy counterpart, the suffix is unnecessary. I named my packages (not ) and (not ) because there was no prior or to disambiguate from. 
The infix is an implementation detail that shouldn’t leak into the user-facing name. Will we rename everything again when TreeSitter becomes the default and the non-TS variants are removed? Be bolder with naming. If you’re building something new, give it a name that makes sense on its own merits, not one that encodes the parsing technology in the package name. I think the full transition to TreeSitter in the Emacs community will take 3–5 years, optimistically. There are hundreds of major modes out there, many maintained by a single person in their spare time. Converting a mode from regex to TreeSitter isn’t just a mechanical translation – you need to understand the grammar, rewrite font-lock and indentation rules, handle version compatibility, and build a new test suite. That’s a lot of work. Interestingly, this might be one area where agentic coding tools can genuinely help. The structure of TreeSitter-based major modes is fairly uniform: grammar recipes, font-lock rules, indentation rules, navigation settings, imenu. If you give an AI agent a grammar and a reference to a high-quality mode like , it could probably scaffold a reasonable new mode fairly quickly. The hard parts – debugging grammar quirks, handling edge cases, getting indentation just right – would still need human attention, but the boilerplate could be automated. Still, knowing the Emacs community, I wouldn’t be surprised if a full migration never actually completes. Many old-school modes work perfectly fine, their maintainers have no interest in TreeSitter, and “if it ain’t broke, don’t fix it” is a powerful force. And that’s okay – diversity of approaches is part of what makes Emacs Emacs. TreeSitter is genuinely great for building Emacs major modes. The code is simpler, the results are more accurate, and incremental parsing means everything stays fast even on large files. I wouldn’t go back to regex-based font-locking willingly. But it’s not magical. 
Grammars are inconsistent across languages, the Emacs APIs are still maturing, you can’t reuse upstream query files (yet), and you’ll hit version-specific bugs that require tedious workarounds. The testing story is better than with regex modes – tree structures are more predictable than regex matches – but you still need a solid test suite to avoid regressions. If you’re thinking about writing a TreeSitter-based major mode, do it. The ecosystem needs more of them, and the experience of working with syntax trees instead of regexes is genuinely enjoyable. Just go in with realistic expectations, pin your grammar versions, test against multiple Emacs releases, and build your test suite early. Anyways, I wish there was an article like this one when I was starting out with these projects, so there you have it. I hope that the lessons I’ve learned along the way will help build better modes with TreeSitter down the road. That’s all I have for you today. Keep hacking!

For font-lock problems: Use to verify the node type at point matches your query. Set to to see which rules are firing. Check the font-lock feature level – your rule might be in level 4 while the user has the default level 3. The features are assigned to levels via . Remember that rule order matters. Without , an earlier rule that already fontified a region will prevent later rules from applying. This can be intentional (e.g. builtin types at level 3 take precedence over generic types) or a source of bugs.

For indentation problems: Set to – this logs which rule matched for each line, what anchor was computed, and the final column. Use to understand the parent chain. The key question is always: “what is the parent node, and which rule matches it?”

1. See the excellent scope discussion in the tree-sitter-clojure repo for the rationale.
2. There’s ongoing discussion in the Emacs community about distributing pre-compiled grammar binaries, but nothing concrete yet.

baby steps 2 days ago

How Dada enables internal references

In my previous Dada blog post, I talked about how Dada enables composable sharing. Today I’m going to start diving into Dada’s permission system; permissions are Dada’s equivalent to Rust’s borrow checker. Dada aims to exceed Rust’s capabilities by using place-based permissions. Dada lets you write functions and types that capture both a value and things borrowed from that value . As a fun example, imagine you are writing some Rust code to process a comma-separated list, just looking for entries of length 5 or more: One of the cool things about Rust is how this code looks a lot like some high-level language like Python or JavaScript, but in those languages the call is going to be doing a lot of work, since it will have to allocate tons of small strings, copying out the data. But in Rust the values are just pointers into the original string and so is very cheap. I love this. On the other hand, suppose you want to package up some of those values, along with the backing string, and send them to another thread to be processed. You might think you can just make a struct like so… …and then create the list and items and store them into it: But as experienced Rustaceans know, this will not work. When you have borrowed data like an , that data cannot be moved. If you want to handle a case like this, you need to convert from into sending indices, owned strings, or some other solution. Argh! Dada does things a bit differently. The first thing is that, when you create a reference, the resulting type names the place that the data was borrowed from , not the lifetime of the reference . 
So the type annotation for would say 1 (at least, if you wanted to write out the full details rather than leaving it to the type inferencer): I’ve blogged before about how I would like to redefine lifetimes in Rust to be places as I feel that a type like is much easier to teach and explain: instead of having to explain that a lifetime references some part of the code, or what have you, you can say that “this is a that references the variable ”. But what’s also cool is that named places open the door to more flexible borrows. In Dada, if you wanted to package up the list and the items, you could build a type like so: Note that last line – . We can create a new class and move into it along with , which borrows from list. Neat, right? OK, so let’s back up and talk about how this all works. Let’s start with syntax. Before we tackle the example, I want to go back to the example from previous posts, because it’s a bit easier for explanatory purposes. Here is some Rust code that declares a struct , creates an owned copy of it, and then gets a few references into it. The Dada equivalent to this code is as follows: The first thing to note is that, in Dada, the default when you name a variable or a place is to create a reference. So doesn’t move , as it would in Rust, it creates a reference to the stored in . You could also explicitly write , but that is not preferred. Similarly, creates a reference to the value in the field . (If you wanted to move the character, you would write , not as in Rust.) Notice that I said “creates a reference to the stored in ”. In particular, I did not say “creates a reference to ”. That’s a subtle choice of wording, but it has big implications. The reason I wrote that “creates a reference to the stored in ” and not “creates a reference to ” is because, in Dada, references are not pointers . Rather, they are shallow copies of the value, very much like how we saw in the previous post that a acts like an but is represented as a shallow copy. 
So where in Rust the following code… …looks like this in memory… in Dada, code like this would look like so Clearly, the Dada representation takes up more memory on the stack. But note that it doesn’t duplicate the memory in the heap, which tends to be where the vast majority of the data is found. This gets at something important. Rust, like C, makes pointers first-class. So given , refers to the pointer and refers to its referent, the . Dada, like Java, goes another way. is a value – including in memory representation! The difference between a , , and is not in their memory layout, all of them are the same, but they differ in whether they own their contents . 2 So in Dada, there is no operation to go from “pointer” to “referent”. That doesn’t make sense. Your variable always contains a string, but the permissions you have to use that string will change. In fact, the goal is that people don’t have to learn the memory representation as they learn Dada, you are supposed to be able to think of Dada variables as if they were all objects on the heap, just like in Java or Python, even though in fact they are stored on the stack. 3 In Rust, you cannot move values while they are borrowed. So if you have code like this that moves into … …then this code only compiles if is not used again: There are two reasons that Rust forbids moves of borrowed data: Neither of these apply to Dada: OK, let’s revisit that Rust example that was giving us an error. When we convert it to Dada, we find that it type checks just fine: Woah, neat! We can see that when we move from into , the compiler updates the types of the variables around it. So actually the type of changes to . And then when we move from to , that’s totally valid. In PL land, updating the type of a variable from one thing to another is called a “strong update”. Obviously things can get a bit complicated when control-flow is involved, e.g., in a situation like this: OK, let’s take the next step. 
Let’s define a Dada function that takes an owned value and another value borrowed from it, like the name, and then call it: We could call this function like so, as you might expect: So…how does this work? Internally, the type checker type-checks a function call by creating a simpler snippet of code, essentially, and then type-checking that . It’s like desugaring but only at type-check time. In this simpler snippet, there are a series of statements to create temporary variables for each argument. These temporaries always have an explicit type taken from the method signature, and they are initialized with the values of each argument: If this type checks, then the type checker knows you have supplied values of the required types, and so this is a valid call. Of course there are a few more steps, but that’s the basic idea. Notice what happens if you supply data borrowed from the wrong place: This will fail to type check because you get: So now, if we go all the way back to our original example, we can see how the example worked: Basically, when you construct a , that’s “just another function call” from the type system’s perspective, except that in the signature is handled carefully. I should be clear, this system is modeled in the dada-model repository, which implements a kind of “mini Dada” that captures what I believe to be the most interesting bits. I’m working on fleshing out that model a bit more, but it’s got most of what I showed you here. 5 For example, here is a test that you get an error when you give a reference to the wrong value. The “real implementation” is lagging quite a bit, and doesn’t really handle the interesting bits yet. Scaling it up from model to real implementation involves solving type inference and some other thorny challenges, and I haven’t gotten there yet – though I have some pretty interesting experiments going on there too, in terms of the compiler architecture. 6 I believe we could apply most of this system to Rust. 
Obviously we’d have to rework the borrow checker to be based on places, but that’s the straight-forward part. The harder bit is the fact that is a pointer in Rust, and that we cannot readily change. However, for many use cases of self-references, this isn’t as important as it sounds. Often, the data you wish to reference is living in the heap, and so the pointer isn’t actually invalidated when the original value is moved. Consider our opening example. You might imagine Rust allowing something like this in Rust: In this case, the data is heap-allocated, so moving the string doesn’t actually invalidate the value (it would invalidate an value, interestingly). In Rust today, the compiler doesn’t know all the details of what’s going on. has a impl and so it’s quite opaque whether is heap-allocated or not. But we are working on various changes to this system in the Beyond the goal, most notably the Field Projections work. There is likely some opportunity to address this in that context, though to be honest I’m behind in catching up on the details. I’ll note in passing that Dada unifies and into one type as well. I’ll talk in detail about how that works in a future blog post.  ↩︎ This is kind of like C++ references (e.g., ), which also act “as if” they were a value (i.e., you write , not ), but a C++ reference is truly a pointer, unlike a Dada ref.  ↩︎ This goal was in part inspired by a conversation I had early on within Amazon, where a (quite experienced) developer told me, “It took me months to understand what variables are in Rust”.  ↩︎ I explained this some years back in a talk on Polonius at Rust Belt Rust , if you’d like more detail.  ↩︎ No closures or iterator chains!  ↩︎ As a teaser, I’m building it in async Rust, where each inference variable is a “future” and use “await” to find out when other parts of the code might have added constraints.  ↩︎ References are pointers, so those pointers may become invalidated. 
In the example above, points to the stack slot for , so if were to be moved into , that makes the reference invalid. The type system would lose track of things. Internally, the Rust borrow checker has a kind of “indirection”. It knows that is borrowed for some span of the code (a “lifetime”), and it knows that the lifetime in the type of is related to that lifetime, but it doesn’t really know that is borrowed from in particular. 4 Because references are not pointers into the stack, but rather shallow copies, moving the borrowed value doesn’t invalidate their contents. They remain valid. Because Dada’s types reference actual variable names, we can modify them to reflect moves.


Does ChatGPT know what a question is?

I was recently explaining to a friend that ChatGPT is, in essence, “only” a next-word prediction model: it predicts the word that comes after a sequence of other words. So when you ask it “What is the capital of France?”, it does not (really) answer the question: rather, it completes a sequence of words on which it has been trained, in depth and very efficiently.

The Coder Cafe 3 days ago

Build Your Own Key-Value Storage Engine—Week 7

Agenda
Week 0: Introduction
Week 1: In-Memory Store
Week 2: LSM Tree Foundations
Week 3: Durability with Write-Ahead Logging
Week 4: Deletes, Tombstones, and Compaction
Week 5: Leveling and Key-Range Partitioning
Week 6: Block-Based SSTables and Indexing
Week 7: Bloom Filters and Trie Memtable

Over the last few weeks, you refined your LSM tree to introduce leveling. In case of a key miss, the process requires the following steps:
Lookup from the memtable.
Lookup from all the L0 SSTables.
Lookup from one L1 SSTable.
Lookup from one L2 SSTable.

Last week, you optimized the lookups by introducing block-based SSTables and indexing, but a lookup is still not a “free” operation. Worst case, it requires fetching two pages (one for the index block and one for the data block) to find out that a key is missing in an SSTable.

This week, you will optimize searches by introducing a “tiny” level of caching per SSTable. If you’re an avid reader of The Coder Cafe 1 , we already discussed a great candidate for such a cache:
One that doesn’t consume too much memory to make sure we don’t increase space amplification drastically.
One that is fast enough so that a lookup doesn’t introduce too much overhead, especially if we have to check a cache before making any lookup in an SSTable.

You will implement a cache using Bloom filters : a space-efficient, probabilistic data structure to check for set membership. A Bloom filter can return two possible answers:
The element is definitely not in the set (no false negatives).
The element may be in the set (false positives are possible).

In addition to optimizing SSTable lookups, you will also optimize your memtable. In week 2, you implemented a memtable using a hashtable.
Let’s get some perspective to understand the problems of using a hashtable: A memtable buffers writes. As it’s the main entry point for writes, a write has to be fast. → OK: a hashtable has average inserts, plus ( : the length of the key) for hashing. For reads, doing a key lookup has to be fast → OK: average lookups, plus to hash. Doing range scanning operations (week 5, optional work), such as: “ Give me the list of keys between bar and foo “ → A hashtable, because it’s not an ordered data structure, is terrible: you end up touching everything so with the number of elements in the hashtable. Flush to L0 → A hashtable isn’t ordered, so it requires sorting all the keys ( ) with n the number of elements) to produce the SSTables. Because of these negative points, could we find a better data structure? Yes! This week, you will switch the memtable to a radix trie (see Further Notes for a discussion on alternative data structures). A trie is a tree-shaped data structure usually used to store strings efficiently. The common example to illustrate a trie is to store a dictionary. For example, suppose you want to store these two words: Despite that starts with the same four letters, you need to store a total of 4 + 5 = 9 letters. Tries optimize the storage required by sharing prefixes. Each node stores one letter. Here’s an example of a trie storing these two words in addition to the word foo ( nodes represent the end of a word): As you can see, we didn’t duplicate the first four letters of to store . In this very example, instead of storing 9 letters for and , we stored only five letters. Yet, you’re not going to implement a “basic” trie for your memtable; instead, you will implement a compressed trie called a radix trie (also known as a patricia 2 trie). Back to the previous example, storing one node (one square) has an overhead. It usually means at least one extra field to store the next element, usually a pointer. 
In the previous example, we needed 11 nodes in total, but what if we could compress the number of nodes required? The idea is to combine nodes with a single child: This new trie stores the exact same information, except it requires 6 nodes instead of 11. That’s what radix tries are about.

To summarize the benefits of switching a memtable from a hashtable to a radix trie:
Ordered by design: Tries keep keys in order and make prefix/range lookups natural, which helps for and for streaming a sorted flush.
No rebalancing/rehashing pauses: The shape doesn’t depend on insertion order, and operations don’t need rebalancing; you avoid periodic rehash work.
Prefix compression: A radix trie can cut duplicated key bytes in the memtable, reducing in-memory space.

Let’s size the Bloom filter. You will target:
p (false-positive rate) = 1%
n (max elements per SSTable) = 1,953
k (hash functions) = 5

Using the formula from the Bloom Filters post: We get m ≈ 19,230 bits, i.e., 2,404 B. We will round up to 2,496 B (39 × 64 B), so the bitset is a whole number of cache lines.

NOTE : Using k=7 would shave only ~2–3% space for ~40% more hash work, so k=5 is a good trade-off.

To distribute elements across the bitvector, you will use the following approach. You will use xxHash64 with two different constant seeds to get two base hashes, then derive k indices by double hashing (pseudo-code):

The required changes to introduce Bloom filters:
Startup: For each SSTable in the MANIFEST, cache its related Bloom filter in memory. Since each Bloom filter requires only a small amount of space, this optimization has a minimal memory footprint. For example, caching 1,000 Bloom filters of the type you designed requires less than 2.5 MB of memory.
SSTable creation: For each new SSTable you write, initialize an empty bitvector of 2,496 B.
Build the Bloom filter in memory as you emit the keys (including tombstones): Compute based on the key. For each , set bit at position . When the SSTable is done, persist a sidecar file next to it (e.g., and ) and the file. Update the cache containing the Bloom filters. Compaction: Delete from memory the Bloom filters corresponding to deleted SSTables. Before reading an SSTable: Compute based on the key. If all the bits of are set: The key may be present, therefore, proceed with your normal lookup in the SSTable. Otherwise: Skip this SSTable. Now, let’s replace your hashtable with a trie. : Compressed edge fragment. : A map keyed by the next character after to a node. : An enum with the different possible values: : The node is just a prefix, no full key ends here. : A full key exists at this node. : This key was explicitly deleted. : If is , the corresponding value. Root is a sentinel node with an empty . Walk from the root, matching the longest common prefix against . If partial match in the middle of an edge, split once: Create a parent with the common part, two children: the old suffix and the new suffix. Descend via the next child (next unmatched character). At the terminal node: set and Walk edges by longest-prefix match. If an edge doesn’t match, return not found. At the terminal node: If : return If or , return not found. Walk as in . If the path doesn’t fully exist, create the missing suffix nodes with so that a terminal node exists. At the terminal node: set (you may have to clear ). Flush process: In-order traversal: : Emit tombstone. : Emit nothing. There are no changes to the client. Run it against the same file ( put-delete.txt ) to validate that your changes are correct. Use per-SSTable random seeds for the Bloom hash functions. Persist them in the Bloom filter files. In Bloom Filters , you introduced blocked Bloom filters, a variant that optimizes spatial locality by: Dividing the bloom filter into contiguous blocks, each the size of a cache line. 
Restricting each query to a single block to ensure all bit lookups stay within the same cache line.

Switch to blocked Bloom filters and see the impacts on latency and throughput.

If you implemented the operation from week 5 (optional work), wire it to your memtable radix trie.

That’s it for this week! You optimized lookups with per-SSTable Bloom filters and switched the memtable to a radix trie, an ordered data structure. Since the beginning of the series, everything you built has been single-threaded, and flush/compaction remains stop-the-world. In two weeks, you will finally tackle the final boss of LSM trees: concurrency.

If you want to dive more into tries, Trie Memtables in Cassandra is a paper that explains why Cassandra moved from a skip list + B-tree memtable to a trie, and what it changed for topics such as GC and CPU locality.

A popular variant of radix trie is the Adaptive Radix Tree (ART): it dynamically resizes node types based on the number of children to stay compact and cache-friendly, while supporting fast in-memory lookups, inserts, and deletes. This paper (or this summary ) explores the topic in depth.

You should also be aware that tries aren’t the only option for memtables, as other data structures exist. For example, RocksDB relies on a skip list. See this resource for more information.

As for Bloom filters, some engines keep a Bloom filter not only per SSTable but per data-block range as well. This was the case for RocksDB’s older block-based filter format ( source ). RocksDB later shifted toward partitioned index/filters, which partition the index and full-file filter into smaller blocks with a top-level directory for on-demand loading. The official doc delves into the new approach.
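The Bloom filter sizing and the double-hashing scheme described in this post can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the series' reference implementation: the sizing formula m = -k·n / ln(1 - p^(1/k)) is the standard one for a fixed k and may not be the exact form used in the Bloom Filters post; `hashlib.blake2b` with two fixed keys stands in for xxHash64 with two constant seeds (xxHash is not part of the Python standard library); and the names `bloom_bits`, `BloomFilter`, `SEED_1`, and `SEED_2` are hypothetical.

```python
import hashlib
import math


def bloom_bits(n: int, p: float, k: int) -> float:
    """Bits needed for n elements, false-positive rate p, and k hash functions."""
    return -k * n / math.log(1.0 - p ** (1.0 / k))


# Stand-ins for the two constant xxHash64 seeds (assumption, see lead-in).
SEED_1 = b"bloom-seed-1"
SEED_2 = b"bloom-seed-2"


def _hash64(key: bytes, seed: bytes) -> int:
    # 64-bit keyed hash; a stand-in for xxHash64(key, seed).
    digest = hashlib.blake2b(key, digest_size=8, key=seed).digest()
    return int.from_bytes(digest, "little")


class BloomFilter:
    def __init__(self, size_bytes: int = 2496, k: int = 5):
        self.bits = bytearray(size_bytes)  # 2,496 B = 39 cache lines of 64 B
        self.m = size_bytes * 8            # number of bits
        self.k = k

    def _indices(self, key: bytes):
        # Double hashing: index_i = (h1 + i * h2) mod m, for i in [0, k).
        h1 = _hash64(key, SEED_1)
        h2 = _hash64(key, SEED_2) | 1  # force odd so the stride is never zero
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: bytes) -> None:
        for idx in self._indices(key):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def may_contain(self, key: bytes) -> bool:
        # False means "definitely absent"; True means "maybe present".
        return all(self.bits[idx // 8] & (1 << (idx % 8))
                   for idx in self._indices(key))
```

With p = 1%, n = 1,953 and k = 5, `bloom_bits` returns a value close to the roughly 19,230 bits quoted in the post, which is then rounded up to 2,496 B. On the read path, a `may_contain` result of `False` lets you skip the SSTable entirely.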
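The radix trie memtable described in this post (node layout, put, get, delete, and the in-order traversal used by the flush) can be sketched as follows. This is only an illustration under assumptions: the field names `prefix`, `children`, `state`, and `value`, the `State` members, and the `RadixTrie` API are hypothetical, and keys and values are plain Python strings.

```python
from enum import Enum


class State(Enum):
    PREFIX = 0     # the node is just a prefix, no full key ends here
    VALUE = 1      # a full key exists at this node
    TOMBSTONE = 2  # the key was explicitly deleted


class Node:
    def __init__(self, prefix=""):
        self.prefix = prefix   # compressed edge fragment
        self.children = {}     # first character of the child's prefix -> child
        self.state = State.PREFIX
        self.value = None


def _common_len(a, b):
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


class RadixTrie:
    def __init__(self):
        self.root = Node()  # sentinel node with an empty prefix

    def _terminal(self, key):
        """Walk to the node for key, creating or splitting nodes as needed."""
        node, rest = self.root, key
        while rest:
            child = node.children.get(rest[0])
            if child is None:
                child = Node(rest)  # create the missing suffix node
                node.children[rest[0]] = child
                return child
            n = _common_len(child.prefix, rest)
            if n < len(child.prefix):
                # Partial match in the middle of an edge: split once.
                parent = Node(child.prefix[:n])
                child.prefix = child.prefix[n:]
                parent.children[child.prefix[0]] = child
                node.children[rest[0]] = parent
                child = parent
            node, rest = child, rest[n:]
        return node

    def put(self, key, value):
        node = self._terminal(key)
        node.state, node.value = State.VALUE, value

    def delete(self, key):
        node = self._terminal(key)
        node.state, node.value = State.TOMBSTONE, None

    def get(self, key):
        node, rest = self.root, key
        while rest:
            child = node.children.get(rest[0])
            if child is None or not rest.startswith(child.prefix):
                return None
            node, rest = child, rest[len(child.prefix):]
        return node.value if node.state is State.VALUE else None

    def items(self, node=None, path=""):
        """Yield (key, state, value) in sorted key order, skipping prefix-only nodes."""
        node = node or self.root
        path += node.prefix
        if node.state is not State.PREFIX:
            yield path, node.state, node.value
        for ch in sorted(node.children):
            yield from self.items(node.children[ch], path)
```

A flush would iterate over `items()` and emit a regular entry for `State.VALUE` and a tombstone for `State.TOMBSTONE`. Because the traversal is already sorted, no extra sort is needed before writing the SSTable.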

Brain Baking 3 days ago

Managing Multiple Development Ecosystem Installs

In the past year, I occasionally required another Java Development Kit besides the usual one defined in to build certain modules against older versions and certain modules against bleeding edge versions. In the Java world, that’s rather trivial thanks to IntelliJ’s project settings: you can just interactively click through a few panels to install another JDK flavour and get on with your life. The problem starts once you close IntelliJ and want to do some command line work. Luckily, SDKMan , the “The Software Development Kit Manager”, has got you covered. Want to temporarily change the Java compiler for the current session? . Want to change the default? . Easy! will point to , a symlink that gets rewired by SDKMan. A Java project still needs a dependency management system such as Gradle, but you don’t need to install a global specific Gradle version. Instead, just points to the jar living at . Want another one? Change the version number in and it’ll be auto-downloaded. Using Maven instead? Tough luck! Just kidding: don’t use but , the Maven Wrapper that works exactly the same. .NET comes with built-in support to change the toolchain (and specify the runtime target), more or less equal to a typical Gradle project. Actually, the command can both build list its own installed toolchains: . Yet installing a new one is done by hand. You switch toolchains by specifying the SDK version in a global.json file and tell the compiler to target a runtime in the file. In Python , the concept of virtual environments should solve that problem: each project creates its own that points to a specific version of Python. Yet I never really enjoyed working with this system: you’ve got , , , , , … That confusing mess is solved with a relatively new kid in town: uv , “An extremely fast Python package and project manager, written in Rust.” It’s more than as it also manages your multiple development ecosystems. Want to install a new Python distribution? . 
Want to temporarily change the Python binary for the current session? . Creating a new project with will also create a virtual environment, meaning you don’t run your stuff with but with that auto-selects the correct version. Lovely! What about JS/TS and Node ? Of course there the options are many: there’s nvm —but that’s been semi-abandoned ?—and of course someone built a Rust-alternative called fnm , but you can also manage Node versions with . I personally don’t care and use instead, which is aimed at not managing but replacing the Node JS runtime. But who will manage the bun versions? PHP is more troublesome because it’s tied to a web server. Solutions such as Laravel Nerd combine both PHP and web server dependency management into a sleek looking tool that’s “free”. Of course you can let your OS-system package manager manage your SDK packages: and then . That definitely feels a bit more hacky. For PHP, I’d even consider Mise. Speaking of which… Why use a tool that limits the scope to one specific development environment? If you’re a full-stack developer you’ll still need to know how to manage both your backend and frontend dev environment. That’s not needed with Mise-en-place , a tool that manages all these things . Asdf is another popular one that manages any development environment that doesn’t have its own dedicated tool. I personally think that’s an extraction layer too far. You’ll still need to dissect these tools separately in case things go wrong. Some ecosystems come with built-in multi-toolkit support, such as Go : simply installs into your directory 1 . That means you’ve installed the compiler (!) in exactly the same way as any other (global) dependency, how cool is that? The downside of this is that you’ll have to remember to type instead of so there’s no symlink rewiring involved. or can do that—or the above Mise. But wait, I hear you think, why not just use containers to isolate everything? 
Spinning up containers to build in an isolated environment: sure, that’s standard practice in continuous integration servers, but locally? Really? Really. Since the inception of Dev Containers by Microsoft, specifically designed for VS Code, working “inside” a container is as easy as opening up the project and “jumping inside the container”. From that moment on, your terminal, IntelliSense, … runs inside that container. That means you won’t have to wrestle Node/PHP versions on your local machine, and you can even use the same container to build your stuff on the CI server. That also means your newly onboarded juniors don’t need to wrestle through a week of “installing stuff”. Microsoft open sourced the Dev Container specification and the JetBrains folks jumped the gun: it has support for but I have yet to try it out. Of course the purpose was to integrate this into GitHub: their cloud-based IDE Codespaces makes heavy use of the idea (and yes, there’s an open-source alternative ). Is there Emacs support for Dev Containers? Well, Tramp allows you to remotely open and edit any file, also inside a container . So just install the Dev Container CLI, run it and point Emacs to a source file inside it. From then on, everything Emacs does (including the LSP server, compilation, …) happens inside that container. That means you’ll also have to install your LSP binaries in there. devcontainer.el just wraps compilation commands to execute inside the container whilst still letting you edit everything locally in case you prefer a hybrid approach. And then there’s Nix and devenv . Whatever that does, it goes way over my head!

You’ll still have to execute after that.  ↩︎

By Wouter Groeneveld on 26 February 2026.


Notes on Linear Algebra for Polynomials

We’ll be working with the set P_n(\mathbb{R}) , real polynomials of degree \leq n . Such polynomials can be expressed using n+1 scalar coefficients a_i as follows:

p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n

The set P_n(\mathbb{R}) , along with addition of polynomials and scalar multiplication, forms a vector space . As a proof, let’s review how the vector space axioms are satisfied. We’ll use p(x) , q(x) and r(x) as arbitrary polynomials from the set P_n(\mathbb{R}) for the demonstration. Similarly, a and b are arbitrary scalars in \mathbb{R} .

Associativity of vector addition : p(x) + [q(x) + r(x)] = [p(x) + q(x)] + r(x) . This is trivial because addition of polynomials is associative [1] . Commutativity is similarly trivial, for the same reason.

Commutativity of vector addition : p(x) + q(x) = q(x) + p(x) .

Identity element of vector addition : The zero polynomial 0 serves as an identity element. \forall p(x)\in P_n(\mathbb{R}) , we have 0 + p(x) = p(x) .

Inverse element of vector addition : For each p(x) , we can use q(x)=-p(x) as the additive inverse, because p(x)+q(x)=0 .

Identity element of scalar multiplication : The scalar 1 serves as an identity element for scalar multiplication. For each p(x) , it’s true that 1\cdot p(x)=p(x) .

Associativity of scalar multiplication : For any two scalars a and b : a\cdot[b\cdot p(x)] = (ab)\cdot p(x) .

Distributivity of scalar multiplication over vector addition : For any p(x) , q(x) and scalar a : a\cdot[p(x) + q(x)] = a\cdot p(x) + a\cdot q(x) .

Distributivity of scalar multiplication over scalar addition : For any scalars a and b and polynomial p(x) : (a+b)\cdot p(x) = a\cdot p(x) + b\cdot p(x) .

Since we’ve shown that polynomials in P_n(\mathbb{R}) form a vector space, we can now build additional linear algebraic definitions on top of that.

A set of k polynomials p_i(x)\in P_n(\mathbb{R}) is said to be linearly independent if

\sum_{i=1}^{k} a_i p_i(x) = 0

implies a_i=0 \quad \forall i . In words, the only linear combination resulting in the zero vector is when all coefficients are 0. As an example, let’s discuss the fundamental building blocks of polynomials in P_n(\mathbb{R}) : the set \{1, x, x^2, \dots x^n\} .
These are linearly independent because: is true only for zero polynomial, in which all the coefficients a_i=0 . This comes from the very definition of polynomials. Moreover, this set spans the entire P_n(\mathbb{R}) because every polynomial can be (by definition) expressed as a linear combination of \{1, x, x^2, \dots x^n\} . Since we’ve shown these basic polynomials are linearly independent and span the entire vector space, they are a basis for the space. In fact, this set has a special name: the monomial basis (because a monomial is a polynomial with a single term). Suppose we have some set polynomials, and we want to know if these form a basis for P_n(\mathbb{R}) . How do we go about it? The idea is using linear algebra the same way we do for any other vector space. Let’s use a concrete example to demonstrate: Is the set Q a basis for P_n(\mathbb{R}) ? We’ll start by checking whether the members of Q are linearly independent. Write: By regrouping, we can turn this into: For this to be true, the coefficient of each monomial has to be zero; mathematically: In matrix form: We know how to solve this, by reducing the matrix into row-echelon form . It’s easy to see that the reduced row-echelon form of this specific matrix is I , the identity matrix. Therefore, this set of equations has a single solution: a_i=0 \quad \forall i [2] . We’ve shown that the set Q is linearly independent. Now let’s show that it spans the space P_n(\mathbb{R}) . We want to analyze: And find the coefficients a_i that satisfy this for any arbitrary , and \gamma . We proceed just as before, by regrouping on the left side: and equating the coefficient of each power of separately: If we turn this into matrix form, the matrix of coefficients is exactly the same as before. So we know there’s a single solution, and by rearranging the matrix into I , the solution will appear on the right hand side. It doesn’t matter for the moment what the actual solution is, as long as it exists and is unique. 
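The matrix check used above (reduce the coefficient matrix and see whether only the trivial solution remains) can be sketched in Python with exact rational arithmetic. This is a minimal sketch under assumptions: the function names `rank` and `is_basis` are hypothetical, and the example set \{1+x, x+x^2, 1+x^2\} in P_2(\mathbb{R}) is an illustration, not the set Q from the text. Each polynomial is given by its coefficient vector (a_0, \dots, a_n) .

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix given as a list of rows, via exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0  # number of pivots found so far
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col] / m[r][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def is_basis(polys, n):
    """True if the coefficient vectors (a_0..a_n) form a basis of P_n(R)."""
    return len(polys) == n + 1 and rank(polys) == n + 1


# Hypothetical example set: 1 + x, x + x^2, 1 + x^2
example = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
```

Full rank means the homogeneous system has only the trivial solution (linear independence) and that every right-hand side has a unique solution (spanning), which are the two facts the argument relies on.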
We’ve shown that Q spans the space! Since the set Q is linearly independent and spans P_n(\mathbb{R}) , it is a basis for the space. I’ve discussed inner products for functions in the post about Hilbert space . Well, polynomials are functions , so we can define an inner product using integrals as follows [3] : \langle p,q\rangle=\int_a^b p(x)q(x)w(x)\,dx . Here the bounds a and b are arbitrary, and could be infinite, and w(x) is a weight function. Whenever we deal with integrals we worry about convergence; in my post on Hilbert spaces, we only talked about L^2 - the square integrable functions. Most polynomials are not square integrable, however. Therefore, we can restrict this using either: (1) a special weight function w(x) to make sure the inner product integral converges, or (2) finite bounds on the integral, in which case we can just set w(x)=1 . Let’s use the latter, and restrict the bounds into the range [-1,1] , setting w(x)=1 . We have the following inner product: \langle p,q\rangle=\int_{-1}^{1}p(x)q(x)\,dx . Let’s check that this satisfies the inner product space conditions. Conjugate symmetry : Since real multiplication is commutative, we can write: \langle p,q\rangle=\int_{-1}^{1}p(x)q(x)\,dx=\int_{-1}^{1}q(x)p(x)\,dx=\langle q,p\rangle . We deal in the reals here, so we can safely ignore complex conjugation. Linearity in the first argument : Let p_1,p_2,q\in P_n(\mathbb{R}) and a,b\in \mathbb{R} . We want to show that \langle ap_1+bp_2,q\rangle=a\langle p_1,q\rangle+b\langle p_2,q\rangle . Expand the left-hand side using our definition of inner product: \int_{-1}^{1}(ap_1(x)+bp_2(x))q(x)\,dx=a\int_{-1}^{1}p_1(x)q(x)\,dx+b\int_{-1}^{1}p_2(x)q(x)\,dx . The result is equivalent to a\langle p_1,q\rangle +b\langle p_2,q\rangle . Positive-definiteness : We want to show that for nonzero p\in P_n(\mathbb{R}) , we have \langle p, p\rangle > 0 . First of all, since p(x)^2\geq0 for all x , it’s true that: \langle p,p\rangle=\int_{-1}^{1}p(x)^2\,dx\geq 0 . What about the result 0 though? Well, let’s say that \int_{-1}^{1}p(x)^2\,dx=0 . Since p(x)^2 is a non-negative function, this means that the integral of a non-negative function ends up being 0. But p(x) is a polynomial, so it’s continuous , and so is p(x)^2 . If the integral of a continuous non-negative function is 0, it means the function itself is 0 on the whole interval [-1,1] . Had it been non-zero in any place, the integral would necessarily have to be positive as well. And a polynomial of degree \leq n that vanishes on an entire interval has more than n roots, so it must be the zero polynomial. We’ve proven that \langle p, p\rangle=0 only when p is the zero polynomial. The positive-definiteness condition is satisfied. 
In conclusion, P_n(\mathbb{R}) along with the inner product we’ve defined forms an inner product space . Now that we have an inner product, we can define orthogonality on polynomials: two polynomials p,q are orthogonal (w.r.t. our inner product) iff \langle p,q\rangle=0 . Contrary to expectation [4] , the monomial basis polynomials are not orthogonal using our definition of inner product. For example, calculating the inner product for 1 and x^2 : \langle 1,x^2\rangle=\int_{-1}^{1}x^2\,dx=\frac{2}{3}\neq 0 . There are other sets of polynomials that are orthogonal using our inner product. For example, the Legendre polynomials ; but this is a topic for another post.
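The non-orthogonality of the monomial basis is easy to check mechanically. Here is a minimal sketch, assuming polynomials are stored as coefficient lists [a_0, a_1, ...]; that representation and the helper names `poly_mul` and `inner` are chosen for illustration only. It computes the inner product on [-1,1] exactly using fractions:

```python
from fractions import Fraction


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists [a_0, a_1, ...]."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += Fraction(a) * Fraction(b)
    return out


def inner(p, q):
    """<p, q> = integral of p(x) q(x) over [-1, 1], computed exactly."""
    total = Fraction(0)
    for k, c in enumerate(poly_mul(p, q)):
        # The integral of x^k over [-1, 1] is 2/(k+1) for even k, 0 for odd k.
        if k % 2 == 0:
            total += c * Fraction(2, k + 1)
    return total


one, x, x2 = [1], [0, 1], [0, 0, 1]
legendre2 = [Fraction(-1, 2), 0, Fraction(3, 2)]  # (3x^2 - 1) / 2

print(inner(one, x))          # 0   -> 1 and x are orthogonal
print(inner(one, x2))         # 2/3 -> 1 and x^2 are not
print(inner(one, legendre2))  # 0   -> the degree-2 Legendre polynomial is orthogonal to 1
```

Exact rational arithmetic is used here so that a result of 0 really means orthogonal, rather than "small up to floating-point error".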

Evan Hahn 3 days ago

Introducing gzpeek, a tool to parse gzip metadata

In short: gzip streams contain metadata, like the operating system that did the compression. I built a tool to read this metadata. I love reading specifications for file formats. They always have little surprises. I had assumed that the gzip format was strictly used for compression. My guess was: a few bytes of bookkeeping, the compressed data, and maybe a checksum. But then I read the spec . The gzip header holds more than I expected! In addition to two bytes identifying the data as gzip, there’s also: The operating system that did the compression. This was super surprising to me! There’s a single byte that identifies the compressor’s OS: for Windows, for the Amiga, for Unix, and many others I’d never heard of. Compressors can also set for an “unknown” OS. Different tools set this value differently. zlib, the most popular gzip library, changes the flag based on the operating system . (It even defines some OSes that aren’t in the spec, like for BeOS.) Many other libraries build atop zlib and inherit this behavior, such as .NET’s , Ruby’s , and PHP’s . Java’s , JavaScript’s , and Go’s set the OS to “unknown” regardless of operating system. Some, like Zopfli and Apache’s , hard-code it to “Unix” no matter what. All that to say: in practice, you can’t rely on this flag to determine the source OS, but it can give you a hint. Modification time for the data. This can be the time that compression started or the modification time of the file. It can also be set to if you don’t want to communicate a time. This is represented as an unsigned 32-bit integer in the Unix format. That means it can represent any moment between January 1, 1970 and February 7, 2106. I hope we devise a better compression format in the next ~80 years, because we can only represent dates in that range. In my testing, many implementations set this to . A few set it to the current time or the file’s modification time—the command is one of these. 
FTEXT , a boolean flag vaguely indicating that the data is “probably ASCII text”. When I say vaguely, I mean it: the spec “deliberately [does] not specify the algorithm used to set this”. This is apparently for systems which have different storage formats for ASCII and binary data. In all my testing, nobody sets this flag to anything but . An extra flag indicating how hard the compressor worked. signals that it was compressed with max compression (e.g., ), for the fastest algorithm, and for everything else. In practice, zlib and many others set this correctly per the spec, but some tools hard-code it to . And as far as I can tell, this byte is not used during decompression, so it doesn’t really matter. The original file name . For example, when I run , the name is set to . This field is optional, so many tools don’t set it, but the command line tool does. You can disable that with . A comment . This optional field is seldom used, and many decompressors ignore it. But you could add a little comment if you want. Extra arbitrary data . If the other metadata wasn’t enough, you can stuff whatever you want into arbitrary subfields. Each subfield has a two-byte identifier and then 0 or more bytes of additional info. That’s way more info than I expected! I was intrigued by this metadata and I’ve been wanting to learn Zig , so I wrote gzpeek . gzpeek is a command-line tool that lets you inspect the metadata of gzip streams. Here’s how to read metadata from a gzipped file: It extracts everything I listed above: the operating system, original file name, modification time, and more. I used it a bunch when surveying different gzip implementations. Give it a try, and let me know what gzip metadata you find.
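As a rough illustration of where these fields live, here is a minimal Python sketch that reads the fixed 10-byte header and the optional fields that follow it, using the layout in RFC 1952. It is not gzpeek itself, which is written in Zig. The function name `read_gzip_header` and the dictionary keys are made up for this example, and the OS table lists only a handful of the codes the spec defines.

```python
import struct

# Flag bits and a few OS codes as defined in RFC 1952 (the gzip spec).
FTEXT, FHCRC, FEXTRA, FNAME, FCOMMENT = 0x01, 0x02, 0x04, 0x08, 0x10
OS_NAMES = {0: "FAT (MS-DOS, Windows)", 1: "Amiga", 3: "Unix", 255: "unknown"}


def read_gzip_header(data: bytes) -> dict:
    """Parse the metadata at the start of a gzip member (hypothetical helper)."""
    if len(data) < 10 or data[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    # After the 2 magic bytes: method, flags, mtime (LE uint32), XFL, OS.
    _method, flags, mtime, xfl, os_code = struct.unpack_from("<BBIBB", data, 2)
    pos = 10
    info = {
        "ftext": bool(flags & FTEXT),
        "mtime": mtime,  # seconds since the Unix epoch; 0 means "no timestamp"
        "xfl": xfl,      # 2 = maximum compression, 4 = fastest, per the spec
        "os": OS_NAMES.get(os_code, f"other ({os_code})"),
        "extra": None,
        "name": None,
        "comment": None,
    }
    if flags & FEXTRA:
        # Two-byte little-endian length, then that many bytes of subfields.
        (xlen,) = struct.unpack_from("<H", data, pos)
        info["extra"] = data[pos + 2 : pos + 2 + xlen]
        pos += 2 + xlen
    # File name and comment are zero-terminated Latin-1 strings, in this order.
    for key, bit in (("name", FNAME), ("comment", FCOMMENT)):
        if flags & bit:
            end = data.index(b"\x00", pos)
            info[key] = data[pos:end].decode("latin-1")
            pos = end + 1
    return info
```

Calling `read_gzip_header(open(path, "rb").read(512))` on a typical file is usually enough, since the header sits at the very start of the stream. A long extra field or name would need more bytes.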

Hugo 4 days ago

The B2BigB Syndrome: How Large Corporations Quietly Kill Startups

In the late 2000s, I worked at a software publisher and one of my colleagues started a company. It was a kind of corporate Second Life , where an avatar could move around and trigger discussions with other people. I don't remember the details anymore, but with hindsight and probably lots of exaggeration, I'd say it was like Gather but 15 years ahead of its time. The application seemed to work well and the company was lining up meetings with major corporations that seemed super interested in rolling it out across their enterprise. We're talking about big banks, major energy suppliers, really serious companies. Except it dragged on. A month. A quarter. A year. Then two. And eventually the company died waiting for an actual signature and, incidentally, some cash. My friend unfortunately ran into the infamous B2BigB syndrome, this curse (a French one?) that tends to kill a lot of companies every year. So if you're starting a company today or thinking about it, I invite you to think twice before prioritizing this segment, and that's what we're going to talk about today. First, I need to define this acronym. In the business world, we tend to segment companies based on the customers they target: For example, Netflix is B2C and Jira is B2B. Among all this you have plenty of nuances. Microsoft sells in both B2C and B2B, for example. You have C2C platforms (exchanges between individuals). But let's keep it simple and just talk about B2C and B2B. Except "B" is broad. Between a 5-person company and a 40,000-person conglomerate, the way you sell to the two is very different. And in this category, there's a category of death: large corporations. It's hard to really say when a large corporation begins, but you recognize them easily. A large corporation starts when a decision requires a ton of meetings, a quarter, a steering committee and board approval or a purchasing department sign-off. 
In practice, you can even have 500-person companies that behave this way, even if it's more common starting at 1,000. But in any case, it gets worse with size. A quarter can become a year, or even 2, or even 5 (and I swear I've seen sales cycles that long). Anyway, that's what I call the BigB (the big B's). The big advantage of BigB's is, in theory, their ability to pay high prices: we're talking about deployment across an entire large corporation, so volumes that make most startups' eyes light up. Except that it's often a mirage once you start looking at costs and margins, not to mention all the associated risks. Working with a large corporation is often synonymous with complexity, and handling that complexity means paying for specialists. You have to respond to costly processes (a 200-page security questionnaire, legal questionnaires, framework contracts, ISO certification this and that) that often require a lot of specialists (lawyers, security experts, finance people, etc.). And that's just to get through the first step of the sales cycle. To sell to a large corporation, you need to be prepared to spend a fortune. By the way, it's worth noting that this doesn't prevent these large corporations from regularly appearing on the monthly data breach list. Because no, churning out Excel questionnaires is not synonymous with security quality. After that, you're quickly going to fall into the spiral of quarterly meetings with a bunch of people you'll only see once in your life, some of whom will take advantage of their temporary power to take out their frustrations and pet peeves on you. And since you'll be in a weak position, well... This time is time not spent on the product. Of course it's normal to spend time on sales, but we're talking about quarterly meetings to prepare, with McKinsey-style PowerPoints (you sometimes even see scale-ups calling in consulting firms to fill out these documents) that will require weeks of preparation. 
Again, to sell to a large corporation, you need to already be prepared to spend a fortune and wait ages. But let's imagine you've finally got the green light to deploy in a large corporation. The contract is signed. Now it's up to you to figure out adoption. Actually, this is the beginning of a second nightmare. A year has passed since the beginning of the sales cycle. All your previous contacts are gone. They might have been contractors who left the company. Or executives who got transferred to other branches of the group. And now you have to find the people capable of helping you deploy your software because without a doubt your revenue depends on how much the software is actually used. No deployment, no money. So you're going to need a dedicated team of salespeople capable of navigating complex bureaucracy to find the right contacts, and maybe even a dedicated implementation team. Your costs are going to explode and you still won't have made anything at this stage. With a bit of luck, and because you were smart enough to get a payment at signature, you'll eventually issue your first invoice. That will be paid 8 months later, end of month . The first 3 months having caused countless incidents because a purchase order needed to be signed and you had to go through 3 different departments for that. Bad luck, your cash flow is starting to choke. You reach the end of the first year and then the purchasing department will come see you to renegotiate the contract, knowing full well that, in theory, they're your biggest client so it would be natural to do them a favor. In short, 2 years later, you've spent a fortune, your cash flow is negative, and your margin has melted like snow during a World Cup ski race in Saudi Arabia. OK, let's say I'm exaggerating and that despite everything, this contract allowed you to instead cross a threshold, to have an impressive signature to put forward and life continues for your startup/scaleup. 
Actually, you don't know it yet, but you've invited a Trojan horse into your company. Working with a large corporation means accepting the complexity inherent to that business. If it took you 2 years to sign a contract with them, imagine that everything else takes the same time. Your product has to evolve to fit their way of working. You'll be asked for 12-level approval workflows, software integrations with ERPs, broken enterprise SSO, integrations with legacy systems from the 90s. And every company has its own internal jargon that you'll be asked to force into your software. You'll invoice in units of work, have a "purchasing" role in your RBAC schemas (authorization systems), in short, in reality, you're going to develop an extension of your first client's IT infrastructure with all its constraints, its complexity, its slow onboarding, and its costs. And when you have a client representing 80% of your revenue (and even from 20% onwards it really starts to matter), you can hardly say no. So your roadmap is regularly hijacked by salespeople dedicated to this client, and globally a product that drifts away from the mass market. And that's normal, hey, I'm not throwing stones at that team. If you've dedicated people to a client, it's normal they try to influence how you build the product and even if the requests are absurd. Because that team doesn't have the perspective needed to judge. And when the roadmap is regularly sidetracked, it's also a huge amount of customization debt that will end up slowing the entire product. This big client may have allowed you to double your headcount. But 3/4 of the company will end up working for them, and will develop their own software culture, less UX sense, less sensitivity to product performance (no point working on acquisition or conversion, for example). 
All enterprise software has terrible UX, because first, that's not what drives sales, and second, because after burning money in the sales process, certification and onboarding, you have to make savings somewhere, often on the product which is no longer really central to the relationship with this client. They'll try to reassure you by saying no, it's important, but actually, the product at that point has become a cost center that needs to be optimized to not lose more margin. Margin eaten by the consulting firm that helped you determine your deployment strategy and pricing... But even when you "improve" your product for this client, you're going to continuously degrade it for all the others you thought you'd attract next by showcasing this win on your beautiful landing page. Because again, you're going to impose their complexity on all the other companies that could have been interested in your services. I'm obviously painting a dark picture. And there are companies that specialized FROM DAY 1 in large corporations, that tailored their commercial offering taking into account all the associated costs. Deployments are priced at 100k, contracts impose minimum usage, everything was framed from the start because the strategy was always to expand exclusively here. But for all the companies that think of doing "just" one BigB to get a validation badge, while actually targeting the entire SMB market and looking for volume, it's rarely a good plan. At the beginning I said: "this curse (a French one?)". Why do I say it's a great French curse? Actually it's probably a magnifying glass effect and I'd certainly see the same thing in every country. But every year, I see companies that die after quarters of waiting for that famous contract with a large corporation (just yesterday I was talking to someone who told me the exact same story). So I think there's something a bit different about us. We like to be different. 
Partly, I get the sense it's related to the size of our SMB market which is less important than in Germany (the German Mittelstand seems bigger). We go faster from SMB to large corporations. Obviously, then, in terms of credibility, it's easier to sell a product once you have the logo of a large corporation than a bunch of logos of unknown companies. What's certain is that culturally, there's the CAC 40 and everything else. The CAC 40 has been basically the same companies for 30 or 40 years. By contrast, look at the S&P 500, in 1990 it was Exxon, GE, Philip Morris, IBM. They've all given way to Apple, Nvidia, Amazon, Google. In France, the large corporations in the CAC are structurally stable and dominant, which makes them all the more attractive as clients for startups. They have budgets, longevity, legitimacy. But these same large corporations aren't springboards to a global market — they're markets closed in on themselves. And conversely the SMB market can work. If I look at Pennylane, Qonto, Indy, Payfit, Spendesk, Livestorm, it's precisely by targeting this market that they've managed to go far. By contrast, I have real questions about the strategy of a company like Mistral which seems to position itself only on large corporations (on-premise deployment, Azure partnerships, etc.) and seems to be neglecting the mass market. I hope it won't be the future DailyMotion, which favored big media and telecom operators while missing the opportunity to become the B2C media platform that YouTube managed to become. You'll have gathered, if you're starting a company today, I'd tend to advise you to not see "B2B" as a single big playground. I'd tend to tell you to avoid B2BigB which is often destructive for startups and often ends up leading to a dead end. It's still possible, but you need to be armed for it. And if that's your choice, I'll only say one thing. Good luck :) Targeting large corporations (and the public sector) obviously gives you access to larger markets. 
But I'd tend to recommend tackling that step later, when the company is already solid. When DJI (Chinese drones) attacked the professional market, they already had a huge foothold in the B2C market. They came with an expertise and know-how that allowed them to be sovereign over their decisions. Now if you're tempted anyway, the recipe for having a chance is above all a question of seniority of leadership: you need to know how to say no firmly, you need to stop chasing every rabbit that passes by when you see a so-called "low hanging fruit", the expression that has replaced "quick win" as one of my most hated expressions. There's no such thing as effortless gain. Everything has a cost, even when it's hidden. And you need a good financial and reputational foundation to impose these conditions, hence the advice to already have a good base on the other segments. It's easier to say no when a client represents 2% than when they represent 20%. One strategy I've seen work several times is to create software with great UX, get adopted by the teams, then go see the purchasing departments of the companies in question and put the usage figures under their nose: "See, you already have 300 people using it, wouldn't you like to set up a framework contract and better understand usage at your company?" That's interesting because you've created a product whose adoption happened from the teams, you didn't modify your roadmap, and you're in a strong position with procurement to improve your presence without being pressured on everything else. In short, make a good product, track usage, wait until you have enough footprint, and then go negotiate. Anthropic (Claude Code) by first targeting individual developers (indie hackers, side projects) and small teams was pushed to constantly improve its product which became number 1 in its category (at the time of writing, this passage might age poorly :)). Today, they're selling enterprise licenses. 
Good companies are able to do volume and then move up the chain: small companies first, then large companies. I've rarely (never?) seen the reverse. When you do large corporations, you don't know how to come back to the rest of the segments. For reference: B2C (Business to Consumer) means selling to the general public, and B2B (Business to Business) means selling to companies.

<antirez> 5 days ago

Implementing a clean room Z80 / ZX Spectrum emulator with Claude Code

Anthropic recently released a blog post with the description of an experiment in which the latest version of Opus, 4.6, was instructed to write a C compiler in Rust, in a “clean room” setup. The experiment methodology left me dubious about the kind of point they wanted to make. Why not provide the agent with the ISA documentation? Why Rust? Writing a C compiler is exactly a giant graph manipulation exercise: the kind of program that is harder to write in Rust. Also, in a clean room experiment, the agent should have access to all the information about well-established computer science advances related to optimizing compilers: there are a number of papers that could be easily synthesized in a number of markdown files. SSA, register allocation, instruction selection and scheduling. Those things needed to be researched *first*, as a prerequisite, and the implementation would still be “clean room”. Not allowing the agent to access the Internet, nor any other compiler source code, was certainly the right call. Less understandable is the almost-zero steering principle, but this is coherent with a certain kind of experiment, if the goal was showcasing the completely autonomous writing of a large project. Yet, we all know that this is not how coding agents are used in practice, most of the time. Anyone who uses coding agents extensively knows very well how, even without ever touching the code, a few hints here and there completely change the quality of the result. # The Z80 experiment I thought it was time to try a similar experiment myself, one that would take one or two hours at most, and that was compatible with my Claude Code Max plan: I decided to write a Z80 emulator, and then a ZX Spectrum emulator (and even more, a CP/M emulator, see later) under conditions that I believe make more sense as a “clean room” setup. The result can be found here: https://github.com/antirez/ZOT. # The process I used 1. I wrote a markdown file with the specification of what I wanted to do. 
Just English, high level ideas about the scope of the Z80 emulator to implement. I said things like: it should execute a whole instruction at a time, not a single clock step, since this emulator must be runnable on things like an RP2350 or similarly limited hardware. The emulator should correctly track the clock cycles elapsed (and I specified we could use this feature later in order to implement the ZX Spectrum contention with ULA during memory accesses), provide memory access callbacks, and should emulate all the known official and unofficial instructions of the Z80. For the Spectrum implementation, performed as a successive step, I provided much more information in the markdown file, like, the kind of rendering I wanted in the RGB buffer, and how it needed to be optional so that embedded devices could render the scanlines directly as they transferred them to the ST77xx display (or similar), how it should be possible to interact with the I/O port to set the EAR bit to simulate cassette loading in a very authentic way, and many other desiderata I had about the emulator. This file also included the rules that the agent needed to follow, like: * Accessing the internet is prohibited, but you can use the specification and test vectors files I added inside ./z80-specs. * Code should be simple and clean, never over-complicate things. * Each solid progress should be committed in the git repository. * Before committing, you should test that what you produced is high quality and that it works. * Write a detailed test suite as you add more features. The test must be re-executed at every major change. * Code should be very well commented: things must be explained in terms that even people not well versed with certain Z80 or Spectrum internals details should understand. * Never stop for prompting, the user is away from the keyboard. * At the end of this file, create a work in progress log, where you note what you already did, what is missing. Always update this log. 
* Read this file again after each context compaction. 2. Then, I started a Claude Code session, and asked it to fetch all the useful documentation on the internet about the Z80 (later I did this for the Spectrum as well), and to extract only the useful factual information into markdown files. I also provided the binary files for the most ambitious test vectors for the Z80, the ZX Spectrum ROM, and a few other binaries that could be used to test if the emulator actually executed the code correctly. Once all this information was collected (it is part of the repository, so you can inspect what was produced) I completely removed the Claude Code session in order to make sure that no contamination with source code seen during the search was possible. 3. I started a new session, and asked it to check the specification markdown file, and to check all the documentation available, and start implementing the Z80 emulator. The rules were to never access the Internet for any reason (I supervised the agent while it was implementing the code, to make sure this didn’t happen), to never search the disk for similar source code, as this was a “clean room” implementation. 4. For the Z80 implementation, I did zero steering. For the Spectrum implementation I used extensive steering for implementing the TAP loading. More about my feedback to the agent later in this post. 5. As a final step, I copied the repository in /tmp, removed the “.git” repository files completely, started a new Claude Code (and Codex) session and claimed that the implementation was likely stolen or too strongly inspired from somebody else's work. The task was to check with all the major Z80 implementations if there was evidence of theft. The agents (both Codex and Claude Code), after extensive search, were not able to find any evidence of copyright issues. 
The only similar parts were about well established emulation patterns and things that are Z80 specific and can’t be made differently, the implementation looked distinct from all the other implementations in a significant way. # Results Claude Code worked for 20 or 30 minutes in total, and produced a Z80 emulator that was able to pass ZEXDOC and ZEXALL, in 1200 lines of very readable and well commented C code (1800 lines with comments and blank spaces). The agent was prompted zero times during the implementation, it acted absolutely alone. It never accessed the internet, and the process it used to implement the emulator was of continuous testing, interacting with the CP/M binaries implementing the ZEXDOC and ZEXALL, writing just the CP/M syscalls needed to produce the output on the screen. Multiple times it also used the Spectrum ROM and other binaries that were available, or binaries it created from scratch to see if the emulator was working correctly. In short: the implementation was performed in a very similar way to how a human programmer would do it, and not outputting a complete implementation from scratch “uncompressing” it from the weights. Instead, different classes of instructions were implemented incrementally, and there were bugs that were fixed via integration tests, debugging sessions, dumps, printf calls, and so forth. # Next step: the ZX Spectrum I repeated the process again. I instructed the documentation gathering session very accurately about the kind of details I wanted it to search on the internet, especially the ULA interactions with RAM access, the keyboard mapping, the I/O port, how the cassette tape worked and the kind of PWM encoding used, and how it was encoded into TAP or TZX files. 
As I said, this time the design notes were extensive since I wanted this emulator to be specifically designed for embedded systems, so only 48k emulation, optional framebuffer rendering, very little additional memory used (no big lookup tables for ULA/Z80 access contention), ROM not copied in the RAM to avoid using additional 16k of memory, but just referenced during the initialization (so we have just a copy in the executable), and so forth. The agent was able to create a very detailed documentation about the ZX Spectrum internals. I provided a few .z80 images of games, so that it could test the emulator in a real setup with real software. Again, I removed the session and started fresh. The agent started working and ended 10 minutes later, following a process that really fascinates me, and that probably you know very well: the fact is, you see the agent working using a number of diverse skills. It is expert in everything programming related, so as it was implementing the emulator, it could immediately write a detailed instrumentation code to “look” at what the Z80 was doing step by step, and how this changed the Spectrum emulation state. In this respect, I believe automatic programming to be already super-human, not in the sense it is currently capable of producing code that humans can’t produce, but in the concurrent usage of different programming languages, system programming techniques, DSP stuff, operating system tricks, math, and everything needed to reach the result in the most immediate way. When it was done, I asked it to write a simple SDL based integration example. The emulator was immediately able to run the Jetpac game without issues, with working sound, and very little CPU usage even on my slow Dell Linux machine (8% usage of a single core, including SDL rendering). Once the basic stuff was working, I wanted to load TAP files directly, simulating cassette loading. 
This was the first time the agent missed a few things, specifically about the timing the Spectrum loading routines expected, and here we are in the territory where LLMs start to perform less efficiently: they can’t easily run the SDL emulator and see the border changing as data is received and so forth. I asked Claude Code to do a refactoring so that zx_tick() could be called directly and was not part of zx_frame(), and to make zx_frame() a trivial wrapper. This way it was much simpler to sync EAR with what it expected, without callbacks or the wrong abstractions that it had implemented. After this change, a few minutes later the emulator could load a TAP file emulating the cassette without problems. This is how it works now:

    do {
        zx_set_ear(zx, tzx_update(&tape, zx->cpu.clocks));
    } while (!zx_tick(zx, 0));

I continued prompting Claude Code in order to make the key bindings more useful, and for a few more things.

# CP/M

One thing that I found really interesting was the ability of the LLM to inspect the COM files of the ZEXDOC / ZEXALL tests for the Z80, easily spot the CP/M syscalls that were used (a total of three), and implement them for the extended Z80 test (executed by make fulltest). So, at this point, why not implement a full CP/M environment? Same process again, same good result in a matter of minutes. This time I interacted with it a bit more for the VT100 / ADM3 terminal escape conversions, reported things not working in WordStar initially, and in a few minutes everything I tested was working well enough (but there are fixes to do, like simulating a 2 MHz clock: right now it runs at full speed, making CP/M games impossible to use).

# What is the lesson here?

The obvious lesson is: always provide your agents with design hints and extensive documentation about what they are going to do. Such documentation can be obtained by the agent itself.
And, also, make sure the agent has a markdown file with the rules of how to perform the coding tasks, and a trace of what it is doing, that is updated and read again quite often. But those tricks, I believe, are quite clear to everybody that has worked extensively with automatic programming in recent months. To think in terms of “what a human would need” is often the best bet, plus a few LLM-specific things, like the forgetting issue after context compaction, the continuous ability to verify it is on the right track, and so forth.

Returning to the Anthropic compiler attempt: one of the steps that the agent failed was the one most strongly related to the idea of memorization of what is in the pretraining set: the assembler. With extensive documentation, I can’t see any way Claude Code (and, even more, GPT5.3-codex, which is in my experience, for complex stuff, more capable) could fail at producing a working assembler, since it is quite a mechanical process. This is, I think, in contradiction with the idea that LLMs are memorizing the whole training set and uncompressing what they have seen. LLMs can memorize certain over-represented documents and code, but while they can extract such verbatim parts of the code if prompted to do so, they don’t have a copy of everything they saw during training, nor do they spontaneously emit copies of already seen code in their normal operation. We mostly ask LLMs to create work that requires assembling different knowledge they possess, and the result is normally something that uses known techniques and patterns, but that is new code, not constituting a copy of some pre-existing code.
It is worth noting, too, that humans often follow a less rigorous process compared to the clean room rules detailed in this blog post, that is: humans often download the code of different implementations related to what they are trying to accomplish, read them carefully, then try to avoid copying stuff verbatim, but they often take strong inspiration. This is a process that I find perfectly acceptable, but it is important to keep in mind what happens in the reality of code written by humans. After all, information technology evolved so fast also thanks to this massive cross pollination effect.

For all the above reasons, when I implement code using automatic programming, I don’t have problems releasing it MIT licensed, like I did with this Z80 project. In turn, this code base will constitute quality input for the next LLMs training, including open weights ones.

# Next steps

To make my experiment more compelling, one should try to implement a Z80 and ZX Spectrum emulator without providing any documentation to the agent, and then compare the result of the implementation. I didn’t find the time to do it, but it could be quite informative.

Martin Fowler 5 days ago

Knowledge Priming

Rahul Garg has observed a frustration loop when working with AI coding assistants - lots of code generated, but needs lots of fixing. He's noticed five patterns that help improve the interaction with the LLM, and describes the first of these : priming the LLM with knowledge about the codebase and preferred coding patterns.


A 1.27 fJ/B/transition Digital Compute-in-Memory Architecture for Non-Deterministic Finite Automata Evaluation

Christian Lanius, Florian Freye, and Tobias Gemmeke GLVLSI'25

This paper ostensibly describes an ASIC accelerator for NFA evaluation (e.g., regex matching), but it also describes two orthogonal techniques for optimizing NFA evaluation which are applicable to more than just this ASIC.

Any regular expression can be converted to a non-deterministic finite automaton (NFA). Think of an NFA like a state machine where some inputs can trigger multiple transitions. The state machine is defined by a set of transitions. A transition is a (current state, input symbol, next state) tuple. The non-deterministic naming comes from the fact that multiple tuples may exist with identical (current state, input symbol) values; they only differ in their next state values. This means that an NFA can be in multiple states at once.

One way to evaluate an NFA is to use a bitmap to track the set of active states. For each new input symbol, the set of active states in the bitmap is used to determine which transitions apply. Each activated transition sets one bit in the bitmap used to represent the active states for the next input symbol.

The hardware described in this paper uses a compute-in-memory (CIM) microarchitecture. A set of columns stores the state machine, with each column storing one transition. This assumes that the transition function is sparse (i.e., the number of transitions used is much lower than the maximum possible). During initialization, the transitions are written into the CIM hardware. An input symbol is processed by broadcasting it and the current state bitmap to all columns. All columns evaluate whether their transition should be activated. The hardware then iterates (over multiple clock cycles) over all activated transitions and updates the state bitmap for the next input symbol. The left side of Fig.
5 illustrates the hardware in each column, which compares the input symbol and current state against the stored tuple:

Source: https://dl.acm.org/doi/10.1145/3716368.3735157

The algorithm described above processes at most one input symbol per cycle (and it is slower for inputs that activate multiple transitions). The paper contains two tricks for overcoming this limitation.

Fig. 4 illustrates how an NFA that accepts one symbol per cycle can be converted into an NFA which accepts two symbols per cycle. For example, rather than consider two consecutive symbols to be separate, put them together into one mega-symbol. This is feasible as long as your NFA implementation isn’t too sensitive to the number of bits per symbol.

Source: https://dl.acm.org/doi/10.1145/3716368.3735157

Cool Trick #2 - Bloom Filter

The target application for this hardware is monitoring network traffic for threats (e.g., Snort). A key observation is that most inputs (network packets) do not produce a match, so it is reasonable to assume that most of the time the NFA will be in the initial state, and most input symbols will not trigger any transitions. If that assumption holds, then a bloom filter can be used to quickly skip many input symbols before they even reach the core NFA evaluation hardware.

The bloom filter is built when the NFA transition function changes. To build the bloom filter, iterate over each transition whose current state is the initial state. For each such transition, compute a hash of the input symbol, decompose the hashed value into indices, and set the corresponding bits in the bloom filter. To test an input symbol against the bloom filter, hash the input symbol, decompose the hashed value into indices, and check to see if all of the corresponding bits are set in the bloom filter. If any bit is not set, then the input symbol does not trigger a transition from the initial state. When that symbol finally arrives at the NFA hardware, it can be dropped if the NFA is in the initial state.
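The bitmap evaluation and the bloom filter pre-check described in this post can be sketched in software. This is a minimal illustration, not the paper's hardware design: the toy NFA (matching "ab" anywhere in the input), the 64-bit filter size, the three hash indices, the use of SHA-256, and all function names are assumptions made for the example. It also assumes the initial state stays active on every step so that a match can begin at any position.

```python
import hashlib

# Hypothetical toy NFA that matches "ab" anywhere in the input.
# Each transition is a (current_state, symbol, next_state) tuple.
INITIAL, ACCEPT = 0, 2
TRANSITIONS = [(0, "a", 1), (1, "b", 2)]

BLOOM_BITS = 64    # assumed filter size
NUM_INDICES = 3    # assumed number of indices taken from each hash


def step(active, symbol, transitions):
    """Compute the next active-state bitmap for one input symbol."""
    nxt = 1 << INITIAL  # assumption: the initial state is always active
    for cur, sym, dst in transitions:
        if sym == symbol and (active >> cur) & 1:
            nxt |= 1 << dst
    return nxt


def bloom_indices(symbol):
    """Hash the symbol and decompose the digest into bit indices."""
    digest = hashlib.sha256(symbol.encode()).digest()
    return [digest[i] % BLOOM_BITS for i in range(NUM_INDICES)]


def build_bloom(transitions):
    """Set bits for every symbol that can leave the initial state."""
    bloom = 0
    for cur, sym, _ in transitions:
        if cur == INITIAL:
            for idx in bloom_indices(sym):
                bloom |= 1 << idx
    return bloom


def may_leave_initial(bloom, symbol):
    """False means the symbol cannot trigger a transition from INITIAL."""
    return all((bloom >> idx) & 1 for idx in bloom_indices(symbol))


def run(text, transitions):
    """Return (match end positions, number of symbols skipped by the filter)."""
    bloom = build_bloom(transitions)
    initial_only = 1 << INITIAL
    active, matches, skipped = initial_only, [], 0
    for pos, symbol in enumerate(text):
        # Drop the symbol only when the NFA is in the initial state alone.
        if active == initial_only and not may_leave_initial(bloom, symbol):
            skipped += 1
            continue
        active = step(active, symbol, transitions)
        if (active >> ACCEPT) & 1:
            matches.append(pos)
    return matches, skipped
```

Calling `run("xxabxxab", TRANSITIONS)` reports matches ending at positions 3 and 7. How many symbols the filter skips depends on hash collisions, since a bloom filter can give false positives but never false negatives; a false positive only costs a wasted evaluation step, it does not change the matches.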
Table 1 compares PPA results against other published NFA accelerators. It is a bit apples-to-oranges as the various designs target different technology nodes. The metric that stands out is the low power consumption of this design.

Source: https://dl.acm.org/doi/10.1145/3716368.3735157

Dangling Pointers

I wonder if the bloom filter trick can be extended. For example, rather than assuming the NFA will always be in the initial state, the hardware could dynamically compute which states are the most frequent and then use bloom filters to drop input symbols which cannot trigger any transitions from those states.

daniel.haxx.se 5 days ago

decomplexification continued

Last spring I wrote a blog post about our ongoing work in the background to gradually simplify the curl source code over time. This is a follow-up: a status update of what we have done since then and what comes next.

In May 2025 I had just managed to get the worst function in curl down to complexity 100, and the average score of all curl production source code (179,000 lines of code) was at 20.8. We had 15 functions still scoring over 70. Almost ten months later we have reduced the most complex function in curl from 100 to 59, meaning that we have simplified a vast number of functions. This was done by splitting them up into smaller pieces and by refactoring logic, reviewed by humans, verified by lots of test cases, and checked by analyzers and fuzzers. The current 171,000 lines of code now have an average complexity of 15.9.

The complexity score in this case is just the cold and raw metric reported by the pmccabe tool. I decided to use that as the absolute truth, even if of course a human could at times debate and argue about its claims. It makes it easier to just obey the tool, and it is quite frankly doing a decent job at this so it’s not a problem.

In almost all cases the main problem with complex functions is that they do a lot of things in a single function (too many), where the functionality performed could or should rather be split into several smaller sub functions. In almost every case it is also immediately obvious that when splitting a function into two, three or more sub functions with smaller and more specific scopes, the code gets easier to understand and each smaller function is subsequently easier to debug and improve.

I don’t know how far we can take the simplification and what the ideal average complexity score of the curl code base might be. At some point it becomes counterproductive, and making functions even smaller then just makes it harder to follow code flows and absorb the proper context into your head.
To illustrate our simplification journey, I decided to render graphs with a date axis starting at 2022-01-01 and ending today. Slightly over four years, representing a little under 10,000 git commits. First, a look at the complexity of the worst scored function in curl production code over the last four years, compared with P90 and P99.

Figure: The most complex function in curl over time

Identifying the worst function might not say too much about the code in general, so another check is to see how the average complexity has changed. This is calculated like this: for all functions, add its function-score x function-length to a total complexity score, and in the end, divide that total complexity score by the total number of lines used for all functions. Also do the same for a median score.

Figure: Average and median complexity per source code line in curl, over time.

When 2022 started, the average was about 46 and as can be seen, it has been dwindling ever since, with a few steep drops when we have merged dedicated improvement work.

One way to complement the average and median lines, to offer us a better picture of the state, is to investigate the complexity distribution throughout the source code.

Figure: How big a portion of the curl source code is how complex

This reveals that the most complex quarter of the code in 2022 has since been simplified. Back then 25% of the code scored above 60, and now all of the code is below 60. It also shows that during 2025 we managed to clean up all the dark functions, meaning the end of 100+ complexity functions. Never to return, as the plan is at least.

We don’t really know. We believe less complex code is generally good for security and code readability, but it is probably still too early for us to be able to actually measure any particular positive outcome of this work (apart from fancy graphs). Also, there are many more ways to judge code than by this complexity score alone.
Like having sensible APIs both internal and external and making sure that they are properly and correctly documented etc. The fact that they all interact together and they all keep changing makes it really hard to isolate a single factor like complexity and say that changing this alone is what makes an impact. Additionally: maybe just the refactor itself and the attention to the functions when doing so either fix problems or introduce new problems, which is then not actually because of the change of complexity but just the mere result of eyes paying attention to that code and changing it right then. Maybe we just need to allow several more years to pass before any change from this can be measured?

All functions get a complexity score by pmccabe. Each function has a number of lines.
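The per-line average described in this post (each function's score weighted by its length, divided by the total line count) can be sketched as below. This is an illustration only, not the script used for the curl graphs: the function name and the sample numbers are made up, and computing the median by repeating each function's score once per line is an assumption about what "do the same for a median score" means.

```python
from statistics import median


def per_line_complexity(functions):
    """Return (average, median) complexity per source line.

    `functions` is a list of (score, length_in_lines) pairs, for example
    parsed from pmccabe output (the parsing is not shown here).
    """
    total_lines = sum(length for _, length in functions)
    # Weight each function's score by the number of lines it occupies.
    weighted_total = sum(score * length for score, length in functions)
    average = weighted_total / total_lines
    # Assumed median definition: every line carries its function's score.
    per_line_scores = [score for score, length in functions
                       for _ in range(length)]
    return average, median(per_line_scores)


# Hypothetical sample: three functions as (score, lines) pairs.
sample = [(10, 50), (40, 30), (5, 20)]
average, middle = per_line_complexity(sample)
```

For the sample above the weighted total is 10·50 + 40·30 + 5·20 = 1800 over 100 lines, so the average is 18.0, while the median is 10 because most lines live in the function scoring 10. Weighting by length means one long, complex function pulls the average up more than many short ones do.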

Andre Garzia 5 days ago

Building your own blogging tools is a fun journey

I read a very interesting blog post today: ["So I've Been Thinking About Static Site Generators" by PolyWolf](https://wolfgirl.dev/blog/2026-02-23-so-ive-been-thinking-about-static-site-generators/) in which she goes in depth about her quest to create a 🚀BLAZING🔥 fast [static site generator](https://en.wikipedia.org/wiki/Static_site_generator). It was a very good read and I'm amazed at how fast she got things running. The [conversation about the post on Lobste.rs](https://lobste.rs/s/pgh4ss/so_i_ve_been_thinking_about_static_site) is also full of gems.

Seeing so many people pouring energy into the specific problem of making SSGs very fast feels to me pretty much like modders getting the utmost performance out of their CPUs or car engines. It is fun to see how they are doing and how fast they can make clean and incremental builds go.

> No one will ever complain about their SSG being too fast.

As someone who used [a very slow SSG](https://docs.racket-lang.org/pollen/) for years and eventually migrated to [my own homegrown dynamic site](/2025/03/why-i-choose-lua-for-this-blog.html), I understand how frustrating slow site generation can be. In my own personal case, I decided to go with an old-school dynamic website using old 90s tech such as *cgi-bin* scripts in [Lua](https://lua.org). That eliminates the need for rebuilds of the site as it is generated at runtime.

One criticism I keep hearing is about the scalability of my approach. People say: *"what if one of your posts goes viral and the site crashes?"* Well, that is OK for me, because if I get a cold or flu I crash too; why would I demand of my site something I don't demand of myself? Jokes aside, the problem of scalability can be dealt with by having some heuristic figuring out when a post is getting hot and then generating a static version of that post while keeping posts that are not hot dynamic. I'm not worried about it.
Instead of devoting my time to the engineering problem of making my SSG fast, I decided to put my energy elsewhere. A point that is often overlooked by many people developing blogging systems is the editing and posting workflow. They'll have really fast SSGs and then let the user figure out how to write the source files using whatever tool they want. Nothing wrong with that, but I want something better than launching $EDITOR to write my posts.

In my case, what prevented me from posting more was not how long my SSG took to rebuild my site, but the friction between wanting to post and having the post written. What tools to use, how to handle file uploads, etc. So I began optimising and developing tools to help me with that.

First, I [made a simple posting interface](/2025/01/creating-a-simple-posting-interface.html). This is not a part of the blogging system; it is an independent tool that shares the code base with the rest of the blog (just so I have my own CGI routines available). Internally it uses [micropub](https://www.w3.org/TR/micropub/) to publish.

After that, I made it into a Firefox Add-on. The add-on is built for ad-hoc distribution and not shared on the store; it is just for me. Once installed, I get a sidebar that allows me to edit or post.

![Editor](/2026/02/img/c6e6afba-3141-4eca-a0f4-3425f7bea0d8.png)

This is part of making my web browser of choice not only a web browser but a web making tool. I'm integrating all I need to write posts into the browser itself and thus diminishing the distance between browsing the web and making the web. I added features to the add-on to help me quote posts, get addresses as I browse them, and edit my own posts. It is all there right in the browser.

![quoting a post](/2026/02/img/b6d2abc1-c4ae-42c5-931f-d390fc9b793f.png)

Like PolyWolf, I am passionate about my tools and blogging. I think we should take it upon ourselves to build the tools we need if they're not available already (or just for the fun of it).
Even though I'm no longer on the SSG bandwagon, I'm deeply interested in blogging and would like to see more people experimenting with building their own tools, especially if their focus is on interesting UX and writing workflows.
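The "hot post" heuristic mentioned in this post (serve posts dynamically, but switch a post to a static copy once it starts getting a lot of traffic) could look roughly like the sketch below. It is written in Python for illustration, not in the Lua the blog actually uses, and everything in it is hypothetical: the class name, the threshold of 100 hits per 60 seconds, and the `render` callback standing in for the dynamic page generation.

```python
import time
from collections import defaultdict, deque

HOT_THRESHOLD = 100     # hypothetical: hits inside the window that make a post "hot"
WINDOW_SECONDS = 60.0   # hypothetical sliding window length


class HotPostCache:
    """Serve posts dynamically until they get hot, then serve a static copy."""

    def __init__(self, render, threshold=HOT_THRESHOLD,
                 window=WINDOW_SECONDS, clock=time.monotonic):
        self.render = render              # callback: slug -> HTML (dynamic path)
        self.threshold = threshold
        self.window = window
        self.clock = clock
        self.hits = defaultdict(deque)    # slug -> timestamps of recent hits
        self.static = {}                  # slug -> frozen HTML for hot posts

    def get(self, slug):
        if slug in self.static:
            return self.static[slug]      # hot post: no rendering needed
        now = self.clock()
        hits = self.hits[slug]
        hits.append(now)
        # Forget hits that fell out of the sliding window.
        while hits and now - hits[0] > self.window:
            hits.popleft()
        html = self.render(slug)
        if len(hits) >= self.threshold:
            self.static[slug] = html      # freeze a static version
        return html
```

A real setup would also need a way to drop the static copy when the post is edited or cools down; that is left out to keep the sketch short.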

Rik Huijzer 5 days ago

Setup a Syncthing service on Debian

Install via the APT instructions. Next (source):

```
useradd -u 1010 -c "Syncthing Service" -d /var/syncthing -s /usr/sbin/nologin syncthing
mkdir /var/syncthing
chown -R syncthing:syncthing /var/syncthing
chmod 700 /var/syncthing
systemctl enable [email protected]
systemctl start [email protected]
systemctl status [email protected]
```

Then you should be able to connect to the web GUI at `localhost:8385`. To allow this user to read files outside its own directories, use

```
getfacl /some/other/dir
```

from `acl` (`apt-get install acl`) to view the permission...
