Latest Posts (16 found)
<antirez> 2 weeks ago

Scaling HNSWs

I’m taking a few weeks of pause on my HNSWs developments (now working on some other data structure, news soon). At this point, the new type I added to Redis is stable and complete enough, it’s the perfect moment to reason about what I learned about HNSWs, and turn it into a blog post. That kind of brain dump that was so common pre-AI era, and now has become, maybe, a bit more rare. Well, after almost one year of thinking and implementing HNSWs and vector similarity stuff, it is time for some writing. However this is not going to be an intro on HNSWs: too many are present already. This is the “extra mile” instead. If you know HNSWs, I want to share with you my more “advanced” findings, especially in the context of making them fast enough to allow for a “Redis” experience: you know, Redis is designed for low latency and high performance, and HNSWs are kinda resistant to that, so there were challenges to expose HNSWs as an abstract data structure. This blog post will be split into several sections. Think of them as pages of the same book, different chapters of the same experience. Oh and, by the way, I already wrote and subsequently lost this blog post :D [long, sad story about MacOS and bad habits – I hadn’t lost something like that since the 90s, during blackouts], so here most of the problem will be to recall what I wrote a few days ago and, while I’m at it, to better rephrase what I didn’t like very much. ## A few words about the state of HNSW Before digging into the HNSWs internals and optimizations, I want to say a few things about HNSWs. The original paper introducing HNSWs is a great piece of computer science literature, and HNSWs are amazing data structures, but: I don’t believe they are the last word for searching, in a greedy way, for nearby vectors according to a distance function. The paper gives the feeling it lacks some “pieces”, almost like if the researchers, given six months more, had a lot more to explore and say. For instance, I modified the paper myself, extending it in order to support removal of entries, actual removals, not just tombstone deletions where the element is marked as gone and collected later: deleting items is totally missing from the paper. Similarly, there are, right now, efforts in order to really check if the “H” in the HNSWs is really needed, and if instead a flat data structure with just one layer would perform more or less the same (I hope I’ll cover more about this in the future: my feeling is that the truth is in the middle, and that it makes sense to modify the level selection function to just have levels greater than a given threshold). All this to say that, if you are into data structures research, I believe that a great area is to imagine evolutions and refinements of HNSWs, without getting trapped within the idea that the evolutions are only in the sense of: let’s do it, but for disk (see Microsoft efforts), or the like. Ok, enough with the premise, let’s go to the actual low level stuff :) ## Scaling memory Redis is an in-memory system, and both HNSWs and vectors have the unfortunate quality of being very space-hungry. There are three reasons for this: 1. HNSWs have a lot of pointers, like 16, 32 or more pointers (this is a tunable parameter of HNSWs) to neighbor nodes. 2. HNSWs have many levels, being a skiplist-alike data structure. This exacerbates the first problem. 3. HNSW’s satellite data is a vector of floating point numbers, so, in the vanilla case, 4 bytes per component, and normally you can have 300-3000 components, this is the usual range. So, what are the lessons learned here? There are folks that compress pointers, since it is very likely that many pointers (8 bytes in 64 bit systems) will have the highest four bytes all the same. This is smart, I didn’t implement it yet, because in Redis I need to go fast, and this is a tradeoff between space and time: but maybe it is worth it, maybe not. I’ll dig more. However, if you do the math, the fact that there are many layers is not *so* terrible as it looks. On average, the multiple layers per node make the situation worse by just ~1.3x (if the probability of level increase is 0.25 in the level selection function), since many nodes will be just at layer 0. But still 1.3 is more than 1, and if that “H” in HNSWs really is not *so* useful… [Spoiler, what I found is that the seek time if you have everything at layer 0 is greater, the main loop for the greedy search will start from less optimal places and it will eventually reach the right cluster, but will take more computation time. However this is just early results.] So here the *real* low hanging fruit is: vector quantization. What I found is that if you use 8 bit quantization what you get is an almost 4x speedup, a 4x reduction of your vectors (but not a 4x reduction of the whole node: the pointers are still there, and they take a lot of space), and a recall that is virtually the same in real world use cases. This is the reason why Redis Vector Sets use 8 bit quantization by default. You can specify, via VADD options, that you want full precision vectors or binary quantized vectors, where we just take the sign, but I’m skeptical about using both full size vectors and binary quantized vectors. Before talking about them, let’s see what kind of quantization I used for 8 bit. What I do is to compute the maximum absolute value of the component of each vector (so quantization is per-vector), then I use signed 8 bit values to represent the quant from -127 to 127. This is not as good as storing both min and max value, but it is faster when computing cosine similarity, since I can do this: /* Each vector is quantized from [-max_abs, +max_abs] to [-127, 127] * where range = 2*max_abs. */ const float scale_product = (range_a/127) * (range_b/127); Then I multiply things together in the integer domain with (actually in the code the main loop is unrolled and uses multiple accumulators, to make modern CPUs more busy) for (; i And finally we can return back to the floating point distance with: float dotf = dot0 * scale_product; Check the vectors_distance_q8() for more information, but I believe you got the idea: it is very simple to go from the integer quants domain to the unquantized dotproduct with trivial operations. So, 8 bit quantization is a great deal, and full precision was a *needed* feature, because there will be people doing things with vectors generated in a way where each small amount makes a difference (no, with learned vectors this is not the case…) but, why binary quantization? Because I wanted users to have a simple way to not waste space when their *original* information is already binary. Imagine you have a set of users and they have yes/no properties, and you want to find similar users, items, whatever. Well: this is where binary quantization should be used, it’s just, again, an option of the VADD command. ## Scaling speed: threading and locality Oh, you know, I have to tell you something about myself: I’m not a fan of threaded systems when it is possible to do a lot with a single core, and then use multiple cores in a shared-nothing architecture. But HNSWs are different. They are *slow*, and they are accessed almost always in read-only ways, at least in most use cases. For this reason, my Vector Sets implementation is fully threaded. Not just reads, even writes are partially threaded, and you may wonder how this is possible without it resulting in a mess, especially in a system like Redis, where keys can be accessed in different ways by the background saving process, the clients, and so forth. Well, to start, let’s focus on reads. What happens is that as long as nobody is writing in the data structure, we can spawn threads that do the greedy collection of near vectors and return back the results to the blocked client. However, my implementation of HNSWs was written from scratch, I mean, from the empty C file opened with vim, it has 0% of shared code with the two implementations most other systems use, so there are a few “novelties”. One of such different things is that in order to avoid re-visiting already visited nodes, I use an integer stored in each node that is called “epoch”, instead of using another data structure to mark (like, in a hash table) nodes already visited. This is quite slow, I believe. The epoch instead is local to the node, and the global data structure increments the epoch for each search. So in the context of each search, we are sure that we can find epochs that are just But with threads, there are multiple searches occurring at the same time! And, yep, what I needed was an array of epochs: typedef struct hnswNode { uint32_t level; /* Node's maximum level */ … many other stuff … uint64_t visited_epoch[HNSW_MAX_THREADS]; } That’s what you can read in hnsw.h. This is, again, a space-time tradeoff, and again time won against space. So, how was it possible to have threaded writes? The trick is that in HNSW inserts, a lot of time is spent looking for neighbors candidates. So writes are split into a reading-half and commit-half, only the second needs a write lock, and there are a few tricks to make sure that the candidates we accumulated during the first part are discarded if the HNSW changed in the meantime, and some nodes may no longer be valid. There is, however, another problem. What about the user deleting the key, while background threads are working on the value? For this scenario, we have a function that waits for background operations to return before actually reclaiming the object. With these tricks, it is easy to get 50k ops/sec on real world vector workloads, and these are numbers I got from redis-benchmark itself, with all the overhead involved. The raw numbers of the flat HNSW library itself are much higher. ## Scaling memory: reclaiming it properly Before talking about how to scale HNSWs into big use cases with multiple instances involved, and why Redis Vector Sets expose the actual data structure in the face of the user (I believe programmers are smart and don’t need babysitting, but it’s not *just* that), I want to go back and talk again about memory, because there is an interesting story to tell about this specific aspect. Most HNSWs implementations are not able to reclaim memory directly when you delete a node from the graph. I believe there are two main reasons for that: 1. People misunderstand the original HNSW paper in a specific way: they believe links can be NOT reciprocal among neighbors. And there is a specific reason why they think so. 2. The paper does not say anything about deletion of nodes and how to fix the graph after nodes go away and we get missing links in the “web” of connections. The first problem is a combination (I believe) of lack of clarity in the paper and the fact that, while implementing HNSWs, people face a specific problem: when inserting a new node, and good neighbors are searched among existing nodes, often the candidates already have the maximum number of outgoing links. What to do, in this case? The issue is often resolved by linking unidirectionally from the new node we are inserting to the candidates that are already “full” of outgoing links. However, when you need to delete a node, you can no longer resolve all its incoming links, so you can’t really reclaim memory. You mark it as deleted with a flag, and later sometimes there is some rebuilding of the graph to “garbage collect” stale nodes, sometimes memory is just leaked. So, to start, my implementation in Redis does things differently by forcing links to be bidirectional. If A links to B, B links to A. But, how to do so, given that A may be busy? Well, this gets into complicated territory but what happens is that heuristics are used in order to drop links from existing nodes, with other neighbors that are well connected, and if our node is a better candidate even for the target node, and if this is not true there are other ways to force a new node to have at least a minimal number of links, always trying to satisfy the small world property of the graph. This way, when Redis deletes a node from a Vector Set, it always has a way to remove all the pointers to it. However, what to do with the remaining nodes that now are missing a link? What I do is to create a distance matrix among them, in order to try to link the old node neighbors among them, trying to minimize the average distance. Basically for each pair of i,j nodes in our matrix, we calculate how good is their connection (how similar their vectors are) and how badly linking them affects the *remaining* possible pairs (since there could be elements left without good pairs, if we link two specific nodes). After we build this matrix of scores, we then proceed with a greedy pairing step. This works so well that you can build a large HNSW with millions of elements, later delete 95% of all your elements, and the remaining graph still has good recall and no isolated nodes and so forth. That is what I mean when I say that there is space in HNSWs for new papers to continue the work. ## Scaling HNSWs to multiple processes When I started to work at Redis Vector Sets, there was already a vector similarity implementation in Redis-land, specifically as an index type of RediSearch, and this is how most people think at HNSWs: a form of indexing of existing data. Yet I wanted to provide Redis with a new HNSW implementation exposed in a completely different way. Guess how? As a data structure, of course. And this tells a story about how Redis-shaped is my head after so many years, or maybe it was Redis-shaped since the start, and it is Redis that is shaped after my head, since I immediately envisioned how to design a Redis data structure that exposed HNSWs to the users, directly, and I was puzzled that the work with vectors in Redis was not performed exactly like that. At the same time, when I handed my design document to my colleagues at Redis, I can’t say that they immediately “saw” it as an obvious thing. My reasoning was: vectors are like scores in Redis Sorted Sets, except they are not scalar scores where you have a total order. Yet you can VADD, VREM, elements, and then you can call VSIM instead of ZRANGE in order to have *similar* elements. This made sense not just as an API, but I thought of HNSWs as strongly composable, and not linked to a specific use case (not specific to text embeddings, or image embeddings, or even *learned* embeddings necessarily). You do: VADD my_vector_set VALUES [… components …] my_element_string So whatever is in your components, Redis doesn't care, when you call VSIM it will report similar elements. But this also means that, if you have different vectors about the same use case split in different instances / keys, you can ask VSIM for the same query vector into all the instances, and add the WITHSCORES option (that returns the cosine distance) and merge the results client-side, and you have magically scaled your hundred of millions of vectors into multiple instances, splitting your dataset N times [One interesting thing about such a use case is that you can query the N instances in parallel using multiplexing, if your client library is smart enough]. Another very notable thing about HNSWs exposed in this raw way, is that you can finally scale writes very easily. Just hash your element modulo N, and target the resulting Redis key/instance. Multiple instances can absorb the (slow, but still fast for HNSW standards) writes at the same time, parallelizing an otherwise very slow process. This way of exposing HNSWs also scales down in a very significant way: sometimes you want an HNSW for each user / item / product / whatever you are working with. This is very hard to model if you have an index on top of something, but it is trivial if your HNSWs are data structures. You just can have a Vector Set key for each of your items, with just a handful of elements. And of course, like with any other Redis key, you can set an expiration time on the key, so that it will be removed automatically later. All this can be condensed into a rule that I believe should be more present in our industry: many programmers are smart, and if instead of creating a magic system they have no access to, you show them the data structure, the tradeoffs, they can build more things, and model their use cases in specific ways. And your system will be simpler, too. ## Scaling loading times If I don’t use threading, my HNSW library can add word2vec (300 components for each vector) into an HNSW at 5000 elements/second if I use a single thread, and can query the resulting HNSW at 90k queries per second. As you can see there is a large gap. This means that loading back an HNSW with many millions of elements from a Redis dump file into memory would take a lot of time. And this time would impact replication as well. Not great. But, this is true only if we add elements from the disk to the memory in the most trivial way, that is storing “element,vector” on disk and then trying to rebuild the HNSW in memory. There is another lesson to learn here. When you use HNSWs, you need to serialize the nodes and the neighbors as they are, so you can rebuild everything in memory just allocating stuff and turning neighbors IDs into pointers. This resulted in a 100x speedup. But do you really believe the story ends here? Hehe. Recently Redis has stronger security features and avoids doing bad things even when the RDB file is corrupted by an attacker. So what I needed to do was to make sure the HNSW is valid after loading, regardless of the errors and corruption in the serialized data structure. This involved many tricks, but I want to take the freedom to just dump one comment I wrote here, as I believe the reciprocal check is particularly cool: /* Second pass: fix pointers of all the neighbors links. * As we scan and fix the links, we also compute the accumulator * register "reciprocal", that is used in order to guarantee that all * the links are reciprocal. * * This is how it works, we hash (using a strong hash function) the * following key for each link that we see from A to B (or vice versa): * * hash(salt || A || B || link-level) * * We always sort A and B, so the same link from A to B and from B to A * will hash the same. Then we xor the result into the 128 bit accumulator. * If each link has its own backlink, the accumulator is guaranteed to * be zero at the end. * * Collisions are extremely unlikely to happen, and an external attacker * can't easily control the hash function output, since the salt is * unknown, and also there would be to control the pointers. * * This algorithm is O(1) for each node so it is basically free for * us, as we scan the list of nodes, and runs on constant and very * small memory. */ ## Scaling use cases: JSON filters I remember the day when the first working implementation of Vector Sets felt complete. Everything worked as expected and it was the starting point to start with the refinements and the extra features. However in the past weeks and months I internally received the feedback that most use cases need some form of mixed search: you want near vectors to a given query vector (like most similar movies to something) but also with some kind of filtering (only released between 2000 and 2010). My feeling is that you need to query for different parameters less often than product people believe, and that most of the time you can obtain this more efficiently by adding, in this specific case, each year to a different vector set key (this is another instance of the composability of HNSWs expressed as data structures versus a kind of index). However I was thinking about the main loop of the HNSW greedy search, that is something like this: // Simplified HNSW greedy search algorithm. Don’t trust it too much. while(candidates.len() > 0) { c = candidates.pop_nearest(query); worst_distance = results.get_worst_dist(query); if (distance(query,c) > worst_distance) break; foreach (neighbor from c) { if (neighbor.already_visited()) continue; neighbor.mark_as_visited(); if (results.has_space() OR neighbor.distance(query) candidates.add(neighbor); results.add(neighbor); } } } return results; So I started to play with the idea of adding a JSON set of metadata for each node. What if, once I have things like {“year”: 1999}, this was enough to filter while I perform the greedy search? Sure, the search needed to be bound, but there is a key insight here: I want, to start, elements that are *near* to the query vector, so I don’t really need to explore the whole graph if the condition on the JSON attributes is not satisfied by many nodes. I’ll let the user specify the effort, and anyway very far away results that match the filter are useless. So that’s yet another way how my HNSW differs: it supports filtering by expressions similar to the ones you could write inside an “if” statement of a programming language. And your elements in the Vector Set can be associated with JSON blobs, expressing their properties. Then you can do things like: VSIM movies VALUES … your vector components here… FILTER '.year >= 1980 and .year ## A few words on memory usage HNSW’s fatal issue is — in theory — that they are normally served from memory. Actually, you can implement HNSWs on disk, even if there are better data structures from the point of view of disk access latencies. However, in the specific case of Redis and Vector Sets the idea is to provide something that is very fast, easy to work with: the flexibility of in-memory data structures help with that. So the question boils down to: is the memory usage really so bad? Loading the 3 million Word2Vec entries into Redis with the default int8 quantization takes 3GB of RAM, 1kb for each entry. Many use cases have just a few tens of million of entries, or a lot less. And what you get back from HNSWs, if well implemented, and in memory, is very good performance, which is crucial in a data structure and in a workload that is in itself slow by definition. In my MacBook I get 48k ops per second with redis-benchmark and VSIM against this key (holding the word2vec dataset). My feeling is that the memory usage of in-memory HNSWs is very acceptable for many use cases. And even in the use cases where you want the bulk of your vectors on disk, even if there is to pay for slower performance, your hot set should likely be served from RAM. This is one of the reasons why I believe that, to be active in HNSW research is a good idea: I don’t think they will be replaced anytime soon for most use cases. It seems more likely that we will continue to have different data structures that are ideal for RAM and for disk depending on the use cases and data size. Moreover, what I saw recently, even just scanning the Hacker News front page, is people with a few millions of items fighting with systems that are slower or more complicated than needed. HNSWs and carefully exposing them in the right way can avoid all that. ## Conclusions I like HNSWs, and working and implementing them was a real pleasure. I believe vectors are a great fit for Redis, even in an AI-less world (for instance, a few months ago I used them in order to fingerprint Hacker News users, replicating an old work published on HN in the past). HNSWs are simply too cool and powerful for a number of use cases, and with AI, and learned embeddings, all this escalates to a myriad of potential use cases. However, like most features in Redis, I expect that a lot of time will pass before people realize they are useful and powerful and how to use them (no, it’s not just a matter of RAG). This happened also with Streams: finally there is mass adoption, after so many years. If instead you are more interested in HNSW and the implementation I wrote, I believe the code is quite accessible, and heavily commented: https://github.com/redis/redis/blob/unstable/modules/vector-sets/hnsw.c If you want to learn more about Redis Vector Sets, please feel free to read the README file I wrote myself. There is also the official Redis documentation, but I suggest you start from here: https://github.com/redis/redis/tree/unstable/modules/vector-sets Thanks for reading such a long blog post! And have a nice day. References. This is the paper about the "H" in HNSW and how useful it is -> https://arxiv.org/abs/2412.01940 Comments

0 views
<antirez> 3 months ago

AI is different

Regardless of their flaws, AI systems continue to impress with their ability to replicate certain human skills. Even if imperfect, such systems were a few years ago science fiction. It was not even clear that we were so near to create machines that could understand the human language, write programs, and find bugs in a complex code base: bugs that escaped the code review of a competent programmer. Since LLMs and in general deep models are poorly understood, and even the most prominent experts in the field failed miserably again and again to modulate the expectations (with incredible errors on both sides: of reducing or magnifying what was near to come), it is hard to tell what will come next. But even before the Transformer architecture, we were seeing incredible progress for many years, and so far there is no clear sign that the future will not hold more. After all, a plateau of the current systems is possible and very credible, but it would likely stimulate, at this point, massive research efforts in the next step of architectures. However, if AI avoids plateauing long enough to become significantly more useful and independent of humans, this revolution is going to be very unlike the past ones. Yet the economic markets are reacting as if they were governed by stochastic parrots. Their pattern matching wants that previous technologies booms created more business opportunities, so investors are polarized to think the same will happen with AI. But this is not the only possible outcome. We are not there, yet, but if AI could replace a sizable amount of workers, the economic system will be put to a very hard test. Moreover, companies could be less willing to pay for services that their internal AIs can handle or build from scratch

1 views
<antirez> 4 months ago

Coding with LLMs in the summer of 2025 (an update)

Frontier LLMs such as Gemini 2. 5 PRO, with their vast understanding of many topics and their ability to grasp thousands of lines of code in a few seconds, are able to extend and amplify the programmer capabilities. If you are able to describe problems in a clear way and, if you are able to accept the back and forth needed in order to work with LLMs, you can reach incredible results such as: 1. Eliminating bugs you introduced in your code before it ever hits any user: I experienced this with Vector Sets implementation of Redis. I would end eliminating all the bugs eventually, but many were just removed immediately by Gemini / Claude code reviews. 2. Explore faster how a given idea could work, by letting the LLM write the throw away code to test ASAP in order to see if a given solution is actually more performant, if it is good enough, and so forth. 3. Engage in pair-design activities where your instinct, experience, design taste can be mixed with the PhD-level knowledge encoded inside the LLM. In this activity, the LLM will sometimes propose stupid paths, other times incredibly bright ideas: you, the human, are there in order to escape local minimal and mistakes, and exploit the fact your digital friend knows of certain and various things more than any human can. 4. Accelerate your work by writing part of the code under your clear specifications. 5. Work with technologies far from your expertise but contiguous with what you can do (for instance: coding in 68000 assembly for an Amiga demo. ) using LLMs as an extension of specific parts of your mind, for the knowledge you don't have. One and half years ago I wrote a blog post called “LLMs and programming in the first days of 2024”. There, I found LLMs to be already useful, but during these 1. 5 years, the progresses they made completely changed the game. However, in order to leverage their capabilities, humans interacting with LLMs must have certain qualities and follow certain practices

0 views
<antirez> 6 months ago

Human coders are still better than LLMs

This is a short story of how humans are still so much more capable of LLMs. Note that I'm not anti-AI or alike, you know it if you know me / follow me somewhere. I use LLMs routinely, like I did today, when I want to test my ideas, for code reviews, to understand if there are better approaches than what I had in mind, to explore stuff at the limit of my expertise, and so forth (I wrote a blog post about coding with LLMs almost two years, when it was not exactly cool: I was already using LLMs for coding and never stopped, I'll have to write an update, but that's not the topic of this post). But, still: the current level of AI is useful, great too, but so incredibly behind human intelligence, and I want to remark this as lately it is impossible to have balanced conversations. So, today I was working to Vector Sets for Redis, to fix a complicated bug: during the time I stopped working at Redis my colleagues introduced resistance against corruption RDB and RESTORE payloads, even when the checksum of the data passes. This feature is disabled by default, but provides an enhanced layer of safety for people wanting it. But… there is a but as big as an elephant: In order to make HNSWs fast to save into Redis RDBs and to load back, I serialized the *graph* representation, and not the element-vector pairs, otherwise I would have to re-insert back data into HNSWs, and that would be, like, 100 times slower (. ). So I store all the links the nodes have with other nodes, as integers, and then I resolve them into pointers, it’s a nice trick and works great. But if you mix this and random corruptions of the representation, and the fact that my own twist on HNSWs enforce reciprocal links between nodes (I wrote my own implementation of HNSWs with many useful features, but reciprocal links are needed to enable many of them) then this could happen: 1. We load corrupted data that says A links to B, but B no longer links to A (corrupted node IDs). 2

0 views
<antirez> 7 months ago

What I learned during the license switch

Yesterday, it was a very intense day. In Italy it was 1st of May, the workers holiday, so in the morning I went for a 4h walk in the Etna with friends Then at 6PM I was at home to release my blog post about the AGPL license switch, and I started following the comments, feedbacks, private messages, and I learned a few things in the process. 1. Regardless of the different few clauses, that IMHO make a difference, the AGPL vs SSPL main difference is that AGPL is "understood". In general, yesterday for the first time I realized that in licensing there is not just what you can do and can't do, but the degree a given license is understood, tested, adopted, . 2. I was very touched by the words of Simon Willison on the matter (https://simonwillison. net/2025/May/1/redis-is-open-source-again/) because it is very peculiar that different persons, living in different parts of the world, but with a similar age and background in software, feel *so similar* about things. I, too, when was writing Vector Sets, was thinking: I would never use it if it wasn't going to be released under the AGPL (or other open source license I understand). This sentiment, multiplied by a non trivial fraction of the community, makes open source eventually win even in the complex software landscape that there is today. 3. People still care a lot about software distributions. Not that I didn't care, but in the past I burned my fingers with it. I was a very initial Linux user, with SlackWare 3. 1 or something like that. During the years I wrote my device drivers, contributed a few patches to the kernel, during the years Debian had maybe ~10 packages of stuff that I wrote, from hping, to the Visitors web log analyzer, dump1090, Redis, and a few more

0 views
<antirez> 7 months ago

Redis is open source again

Five months ago, I rejoined Redis and quickly started to talk with my colleagues about a possible switch to the AGPL license, only to discover that there was already an ongoing discussion, a very old one, too. Many people, within the company, had the feeling that the AGPL was a better pick than SSPL, and while eventually Redis switched to the SSPL license, the internal discussion continued. I tried to give more strength to the ongoing pro-AGPL license side. My feeling was that the SSPL, in practical terms, failed to be accepted by the community. The OSI wouldn’t accept it, nor would the software community regard the SSPL as an open license. In little time, I saw the hypothesis getting more and more traction, at all levels within the company hierarchy. I’ll be honest: I truly wanted the code I wrote for the new Vector Sets data type to be released under an open source license. Writing open source software is too rooted in me: I rarely wrote anything else in my career. I’m too old to start now. This may be childish, but I wrote Vector Sets with a huge amount of enthusiasm exactly because I knew Redis (and my new work) was going to be open source again. I understand that the core of our work is to improve Redis, to continue building a good system, useful, simple, able to change with the requirements of the software stack. Yet, returning back to an open source license is the basis for such efforts to be coherent with the Redis project, to be accepted by the user base, and to contribute to a human collective effort that is larger than any single company. So, honestly, while I can’t take credit for the license switch, I hope I contributed a little bit to it, because today I’m happy. I’m happy that Redis is open source software again, under the terms of the AGPLv3 license

0 views
<antirez> 7 months ago

Reproducing Hacker News writing style fingerprinting

} At this point, the final script, insert. py, could do all the real work: to apply the Borrows method for each user, create the user style vector, and insert it into Redis. The advantage of pre-processing the files (a slow operation) is that the insertion script could be called more easily with different parameters (especially the number of top words to use) in order to see the different results more promptly, without the need to re-process the Parquet files each time. # How the Burrow method works. In the original post, Christopher wrote that you just need to normalize the frequency of the words usage and apply cosine similarity. Actually the process is a bit more involved. First, let’s ask ourselves, how this method actually works, in its essence. Well, it wants to capture words that each specific user over-uses or under-uses compared to the expected “average” language. To do so, we actually use the following steps (from the Python code). That’s what we do for each of the top words: # Convert to relative frequency rel_freq = frequency / total_words # Standardize using z-score: z = (freq - mean) / stddev mean = word_means. get(word, 0. 0) stddev = word_stddevs. get(word, 1. 0) # Default to 1. 0 to avoid division by zero z_score = (rel_freq - mean) / stddev # Set the z-score directly in the vector at the word's index vector[word_to_index[word]] = z_score So we start by “centering” the frequency the user used a given word, by subtracting the *global* usage frequency for that word. This way, we have a number that describes how much the user under (negative) or over (positive) used such word. But, if you think at it, words that have a much higher variance among usage of different writers are less important, when they change. We want to amplify the signal of words that are under of over used by this user in a much greater way compared to the normal variance of the word

0 views
<antirez> 8 months ago

Vector Sets are part of Redis

So basically with VSETATTR / VGETATTR (and equivalent option to set the JSON attribute directly when adding the item in VADD) you can associate a string to the items you want. Then you can do things like that: > VSIM word_embeddings_int8 ele "banana" FILTER ".len == 3" 1) "yam" 2) "pea" 3) "fig" 4) "rum" 5) "ube" 6) "oat" 7) "nut" 8) "gum" 9) "soy" 10) "pua" The filter expression is not a programming language, is what you could write inside the if() statement of high level programming languages, with &&, ||, all the obvious operators and so forth (but I bet we will add a few more). Well, the details are into the doc, there are also memory usage examples, in depth discussions about specific features, and so forth. I will extend the documentation soon, I hope. For now I really (really!) hope you’ll enjoy Vector Sets. Please, ping me if you find bugs :) Comments

0 views
<antirez> 8 months ago

AI is useless, but it is our best bet for the future

I used AI with success 5 minutes ago. Just five minutes ago, I was writing a piece of software and relied on AI for assistance. Yet, here I am, starting this blog post by telling you that artificial intelligence, so far, has proven somewhat useless. How can I make such a statement if AI was just so helpful a moment ago. Actually, there's no contradiction here if we clarify exactly what we mean. Here’s the thing: at this very moment, artificial intelligence can support me significantly. If I'm struggling with complicated code or need to understand an advanced scientific paper on math, I can turn to AI for clarity. It can help me generate an image for a project, make a translation, clean my YouTube transcript. Clearly, it’s practical and beneficial in these everyday tasks. However, except for rare, groundbreaking examples like AlphaFold — Google's AI that significantly advanced our understanding of protein folding — AI has yet to genuinely push forward human knowledge in a fundamental way. Aside from these few exceptional results, AI hasn’t (obviously) yet matched the capabilities of the very best human minds. If an AI system were at the same level as the brightest humans (and not better than that: it's not needed for a first humanity jump) we could deploy millions of such systems to accelerate research dramatically, transforming progress expected to take centuries into developments happening within decades, or decades into years. Yet, if artificial intelligence remains stuck at its current level of development indefinitely (even if with small incremental improvements, enough to fire many translators, programmers, drivers, actors, . ), perhaps it might have been better not to have it at all. I mentioned this during a conference here in Sicily. The thought hadn't crossed my mind until I was asked on stage

0 views
<antirez> 8 months ago

Big LLMs weights are a piece of history

By multiple accounts, the web is losing pieces: every year a fraction of old web pages disappear, lost forever. We should regard the Internet Archive as one of the most valuable pieces of modern history; instead, many companies and entities make the chances of the Archive to survive, and accumulate what otherwise will be lost, harder and harder. I understand that the Archive headquarters are located in what used to be a church: well, there is no better way to think of it than as a sacred place. Imagine the long hours spent by old programmers hacking with the Z80 assembly on their Spectrums. All the discussions about the first generation of the Internet. The subcultures that appeared during the 90s. All things that are getting lost, piece by piece. And what about the personal blogs. Pieces of life of single individuals that dumped part of their consciousness on the Internet. Scientific papers and processes that are lost forever as publishers fail, their websites shut down. Early digital art, video games, climate data once published on the Internet and now lost, and many sources of news, as well. This is a known issue and I believe that the obvious approach of trying to preserve everything is going to fail, for practical reasons: a lot of efforts for zero economic gains: the current version of the world is not exactly the best place to make efforts that cost a lot of money and don't pay money. This is why I believe that the LLMs' ability to compress information, even if imprecise, hallucinated, lacking, is better than nothing. DeepSeek V3 is already an available, public lossy compressed view of the Internet, as other very large state of-art models are. This will not bring back all the things we are losing, and we should try hard supporting The Internet Archive and other similar institutions and efforts

0 views
<antirez> 9 months ago

Reasoning models are just LLMs

It’s not new, but it’s accelerating. People that used to say that LLMs were a fundamentally flawed way to reach any useful reasoning and, in general, to develop any useful tool with some degree of generality, are starting to shuffle the deck, in the hope to look less wrong. They say: “the progresses we are seeing are due to the fact that models like OpenAI o1 or DeepSeek R1 are not just LLMs”. This is false, and it is important to show their mystification as soon as possible. First, DeepSeek R1 (don’t want to talk about o1 / o3, since it’s a private thing we don’t have access to, but it’s very likely the same) is a pure decoder only autoregressive model. It’s the same next token prediction that was so strongly criticized. There isn’t, in any place of the model, any explicit symbolic reasoning or representation. Moreover, R1 Zero has similar reasoning capabilities of R1 without requiring *any* supervised fine tuning, just generating chain of thoughts, and improving it with a reward function, using reinforcement learning, was enough to learn a stronger form of reasoning. Interestingly enough, part of these capabilities were easily distilled into smaller models via SFT, which brings me to the next point. The other fundamental observation is that the S1 paper shows that you need very few examples (as little as 1000) in order for the model to start being able to build complex reasoning steps and solve non trivial mathematical problems. S1, and R1 Zero, hint that in some way in the pre-training step the models already learned the representations needed in order to perform reasoning, just with the unsupervised next word prediction training target

0 views
<antirez> 9 months ago

We are destroying software

We are destroying software by no longer taking complexity into account when adding features or optimizing some dimension. We are destroying software with complex build systems. We are destroying software with an absurd chain of dependencies, making everything bloated and fragile. We are destroying software telling new programmers: “Don’t reinvent the wheel!”. But, reinventing the wheel is how you learn how things work, and is the first step to make new, different wheels. We are destroying software by no longer caring about backward APIs compatibility. We are destroying software pushing for rewrites of things that work. We are destroying software by jumping on every new language, paradigm, and framework. We are destroying software by always underestimating how hard it is to work with existing complex libraries VS creating our stuff. We are destroying software by always thinking that the de-facto standard for XYZ is better than what we can do, tailored specifically for our use case. We are destroying software claiming that code comments are useless. We are destroying software mistaking it for a purely engineering discipline. We are destroying software by making systems that no longer scale down: simple things should be simple to accomplish, in any system. We are destroying software trying to produce code as fast as possible, not as well designed as possible. We are destroying software, and what will be left will no longer give us the joy of hacking. Comments

0 views
<antirez> 11 months ago

From where I left

I’m not the kind of person that develops a strong attachment to their own work. When I decided to leave Redis, about 1620 days ago (~ 4. 44 years), I never looked at the source code, commit messages, or anything related to Redis again. From time to time, when I needed Redis, I just downloaded it and compiled it. I just typed “make” and I was very happy to see that, after many years, building Redis was still so simple. My detachment was not the result of me hating my past work. While in the long run my creative work was less and less important and the “handling the project” activities became more and more substantial — a shift that many programmers are able to do, but that’s not my bread and butter — well, I still enjoyed doing Redis stuff when I left. However, I don’t share the vision that most people at my age (I’m 47 now) have: that they are still young. I wanted to do new stuff, especially writing. I wanted to stay more with my family and help my relatives. I definitely needed a break. However, during the “writing years” (I’m still writing, by the way), I often returned to coding, as a way to take breaks from intense writing sessions (writing is the only mental activity I found to be a great deal more taxing than coding): I did a few embedded projects; played more with neural networks; built Telegram bots: a bit of everything. Hacking randomly was cool but, in the long run, my feeling was that I was lacking a real purpose, and every day I started to feel a bigger urgency to be part of the tech world again. At the same time, I saw the Redis community fragmenting, something that was a bit concerning to me, even as an outsider. So I started to think that maybe, after all, I could have a role back in the Redis ecosystem. Perhaps I could be able to reshape the company's attitude towards the community. Maybe I could even help to take back the role of the Redis core as the primary focus of new developments

0 views
<antirez> 1 years ago

Playing audio files in a Pi Pico without a DAC

The Raspberry Pico is suddenly becoming my preferred chip for embedded development. It is well made, durable hardware, with a ton of features that appear designed with smartness and passion (the state machines driving the GPIOs are a killer feature. ). Its main weakness, the lack of connectivity, is now resolved by the W variant. The data sheet is excellent and documents every aspect of the chip. Moreover, it is well supported by MicroPython (which I’m using a lot), and the C SDK environment is decent, even if full of useless complexities like today fashion demands: a cmake build system that in turn generates a Makefile, files to define this and that (used libraries, debug outputs, …), and in general a huge overkill for the goal of compiling tiny programs for tiny devices. No, it’s worse than that: all this complexity to generate programs for a FIXED hardware with a fixed set of features (if not for the W / non-W variant). Enough with the rant about how much today software sucks, but it must be remembered. One of the cool things one wants to do with an MCU like that, is generating some sound. The most obvious way to do this is using the built-in PWM feature of the chip. The GPIOs can be configured to just alterante between zero and one at the desired frequency, like that: from machine import Pin, PWM pwm = PWM(Pin(1)) pwm. freq(400) pwm. duty_u16(1000) Assuming you connected a piezo to GND and pin 1 of your Pico, you will hear a square wave sound at 400hz of frequency. Now, there are little sounds as terrible to hear as square waves. Maybe we can do better. I’ll skip all the intermediate steps here, like producing a sin wave, and directly jump to playing a wav file. Once you see how to do that, you can easily generate your own other waves (sin, noise, envelops for such waveforms and so forth). Now you are likely asking yourself: how can I generate the complex wave forms to play a wav file, if the Pico can only switch the pin high or low

0 views
<antirez> 1 years ago

First Token Cutoff LLM sampling

From a theoretical standpoint, the best reply provided by an LLM is obtained by always picking the token associated with the highest probability. This approach makes the LLM output deterministic, which is not a good property for a number of applications. For this reason, in order to balance LLMs creativity while preserving adherence to the context, different sampling algorithms have been proposed in recent years. Today one of the most used ones, more or less the default, is called top-p: it is a form of nucleus sampling where top-scoring tokens are collected up to a total probability sum of “p”, then random weighted sampling is performed. In this blog post I’ll examine why I believe nucleus sampling may not be the best approach, and will show a simple and understandable alternative in order to avoid the issues of nucleus sampling. The algorithm is yet a work in progress, but by publishing it now I hope to stimulate some discussion / hacking. ## There is some gold in the logits Despite the fact that LLM logits are one of the few completely understandable parts of the LLM inner working, I generally see very little interest in studying their features, investigating more advanced sampling methods, detecting and signaling users uncertainty and likely hallucination. Visualizing the probabilities distribution for successive tokens is a simple and practical exercise in order to gain some insights: . ~. In the image we can see the top 32 candidate tokens colored by probability (white = 0, blue = 1), the selected token and the rank of the selected token (highest probability = 0, the previous one = 1, and so forth). In the above example, the Mistral base model knows the birth and death dates of Umberto Eco, so it confidently signals the most likely token with most of the total probability. Other times the model is more perplexed because either there are multiple ways to express the continuation of the text, or because it is not certain about certain facts

0 views
<antirez> 1 years ago

Translating blog posts with GPT-4, or: on hope and fear

My usual process for writing blog posts is more or less in two steps: 1. Think about what I want to say for weeks or months. No, I don’t spend weeks focusing on a blog post, the process is exactly reversed: I write blog posts about things that are so important to me to be in my mind for weeks. 2. Then, once enough ideas collapsed together in a decent form, I write the blog post in 30 minutes, often without caring much about the form, and I hit “publish”. This process usually works writing the titles of the sections as I initially just got the big picture of what I want to say, and then filling the empty paragraphs with text. Why I take step 2 so lightly. Because I got other stuff to do, and if blogging would take more than 30/60 minutes I would rather not blog at all, or blog less, or suffer doing it: all things I want to avoid at all costs. Blogging is too important to let it go. It’s better, for me, to give up on the form. At the same time, this is why many of my blog posts, regardless of the content that may be more or less informative, more or less useful, are generally badly written. I hope that the fact I can write well enough in my mother language in some way it is still visibile in my English posts, but I have the feeling that the extremely limited vocabulary I possess, the grammar errors, the sentence construction that oftentimes I just take from Italian and turn into English, all those limits irremediably damage the reading experience. This is why, for the first time, to write my blog post about LLMs and programming I tried a different approach: I wrote the post in Italian and I translated it to English using GPT-4

0 views