Posts in Hardware (20 found)
Chris Coyier Yesterday

Oregon Rocketry

My co-worker Robert is into model rocketry. I made a few rockets in my day, but the hobby stopped at Estes. I didn't really realize people take rocketry much further until getting to know Robert. His partner Michelle produced a short video piece for OPB on the community around it here. I'd embed the video here, but it looks like OPB hosts their own video and doesn't offer an embeddable format. A move I think is probably pretty smart for an independent, nonprofit media organization these days.


JIT: so you want to be faster than an interpreter on modern CPUs…

Since my previous blog entry about my JIT compiler for PostgreSQL, sadly not much has happened due to a lack of time, but some things were still done (the biggest improvement was the port to ARM64, plus a few optimizations and more opcodes implemented…). But I often ask myself how to really beat the interpreter… And on "modern" CPUs, with a well-written interpreter, that's far more complicated than many would imagine. So in order to explain all this and show how I am planning to improve performance (possibly of the interpreter itself too, thus making this endeavor self-defeating), let's first talk about… If you already know about all the topics mentioned in this title, feel free to jump to the next section. Note that the following section is over-simplified to make the concepts more accessible.

I am writing this blog post on a Zen 2+ CPU. If I upgraded to a Zen 3 CPU, same motherboard, same memory, I would get an advertised 25% performance jump in single-thread benchmarks while the CPU frequency would be only 2% higher. Why such a discrepancy? Since the 90s and the Pentium-class CPUs, x86 has followed RISC CPUs into the superscalar era. Instead of running one instruction per cycle, when conditions are right, several instructions can be executed at the same time. Let's consider the following pseudo-code (something like X = A + B; Y = C + D; Z = X + Y): X and Y can be calculated at the same time. The CPU can execute these on two integer units, fetch the results and store them. The only issue is the computation of Z: everything must be done before this step, making it impossible for the CPU to go further without waiting for the previous results. But now, what if the code was written as follows (say, X = A + B; Z1 = X + C; Y = D + E; Z = Z1 + Y)? Every step would require waiting for the previous one, slowing down the CPU terribly. Hence the most important technique used to implement superscalar CPUs: out-of-order execution. The CPU will fetch the instructions, dispatch them into several instruction queues, and resolve dependencies to compute Y before computing Z1 in order to have it ready sooner. The CPU spends less time idling, thus the whole thing is faster. But, alas, what would happen with a branching function (something like: if (X > Y) Z = X - Y; else Z = Y - X)? Should the CPU wait for X and Y before deciding which Z to compute? Here is the biggest trick: it will try its luck and compute something anyway. This way, if its bet was right, a lot of time will be saved; otherwise the mistaken result will be dropped and the proper computation will be done instead. This is called branch prediction; it has been the source of many fun security issues (hello, Meltdown), but the performance benefits are so huge that one would never consider disabling it.

Most interpreters will operate on an intermediate representation, using opcodes instead of directly executing from an AST or similar. So you could use a main loop like the one sketched below for an interpreter. This is how many, many interpreters were written. But this has a terrible drawback, at least when compiled that way: it has branches all over the place from a single starting point (most if not all optimizing compilers will generate a jump table to optimize the dispatch, but this will still jump from the same point). The CPU will have a hard time predicting the right jump, and thus loses a lot of performance. If this was the only way an interpreter could be written, generating a function by stitching the code together would save a lot of time, likely giving a more than 10% performance improvement. If one looks at Python, removing this switch made the interpreter 15 to 20% faster.
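To make that concrete, here is a minimal sketch of such a switch-based main loop, with made-up opcodes of my own; it illustrates the technique, it is not PostgreSQL's actual executor:

    /* Toy opcode set for illustration only. */
    typedef enum { OP_LOAD_CONST, OP_ADD, OP_DONE } OpCode;
    typedef struct { OpCode op; int arg; } Instr;

    /* Switch-based dispatch: every opcode is reached through the
       single indirect jump the compiler generates for the switch. */
    static int run(const Instr *prog)
    {
        int stack[64], sp = 0;
        for (int pc = 0;; pc++) {
            switch (prog[pc].op) {
            case OP_LOAD_CONST:
                stack[sp++] = prog[pc].arg;
                break;
            case OP_ADD:
                sp--;
                stack[sp - 1] += stack[sp];
                break;
            case OP_DONE:
                return stack[sp - 1];
            }
        }
    }

    /* run((Instr[]){ {OP_LOAD_CONST, 2}, {OP_LOAD_CONST, 3},
                      {OP_ADD, 0}, {OP_DONE, 0} }) returns 5. */

Every iteration funnels through that one dispatch point, which is exactly the hard-to-predict indirect jump described above.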
Many projects, including PostgreSQL, use this same technique, called "computed gotos". After a first pass to fill in "label" targets in each step, execution jumps directly from the end of one opcode's handler to the next one (sketched below). When running a short sequence of operations in a loop, the jumps will be far more predictable, making the branch predictor's job easier, and thus improving the speed of the interpreter.

Now that we have a very basic understanding of modern CPUs and the insane level of optimization they reach, let's talk about fighting the PostgreSQL interpreter on performance. I will not discuss optimizing the tuple-deforming part (a.k.a. going from the on-disk structure to the "C" structure used by the code); this will be a topic for a future blog post when I implement it in my compiler. As you may know, PostgreSQL has a very complete type system with operator overloading. Even a simple query like the one in the title ends up being a call to int4eq, a strict function that will perform the comparison. Since it is a strict function, PostgreSQL must check that the arguments are not null; if any is null, the function is not called and the result will be null. Among the opcodes PostgreSQL generates for such a basic query, EEOP_FUNCEXPR_STRICT_2 is the one that performs the null check and then calls the function. If we unroll all the opcodes into real C code, we can already spot one optimization: why do we check both arguments, including our constant, against null? The constant will never change for the entire run of this query, and yet each comparison is going to use an ALU and branch depending on that comparison. But of course the CPU will notice the corresponding branch pattern, and will thus be able to remain active and feed its other units. What is the real cost of such a pointless comparison? For this purpose, I've broken a PostgreSQL instance and replaced all FUNCEXPR_STRICT with a version checking one argument only, and another with no STRICT check at all (do not try this at home!). Running a simple SELECT * FROM demo WHERE a = 42 ten times on a 100-million-row table, with no index, the two perf results differ by around 5% (a number I will come back to below). So, even if this is not the optimization of the century, it's not that expensive to make, so… why not do it? (Patch coming to pgsql-hackers soon.)

But a better optimization is to go all-in on inlining. Indeed, instead of jumping through a pointer to the int4eq code (again, something that the CPU will optimize a lot), one could have a special opcode for this quite common operation. With this change alone (but keeping the two null checks, so there are still optimizations possible), the perf results improve much more noticeably. Let's sum up these results. The biggest change comes, quite obviously, from inlining the int4eq call. Why is it that much better? Because it greatly reduces the number of instructions to run, and it removes a call through an address stored in memory. And this is again an optimization I could do in my JIT compiler that can also be done in the interpreter with the same benefits. The biggest issue here is that you must keep the number of opcodes within (unspecified) limits: too many opcodes could make the compiler's job far harder. Well. At first, I thought the elimination of null checks could not be implemented easily in the interpreter. The first draft in my compiler was certainly invalid, but gave me interesting numbers (around 5%, as seen above) and made me want to go ahead.
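For comparison, here is the same toy interpreter with computed-goto dispatch, using the GCC/Clang "labels as values" extension; again my own sketch, not PostgreSQL's executor code:

    /* Same toy opcodes, dispatched with computed gotos (GNU C). */
    typedef enum { OP_LOAD_CONST, OP_ADD, OP_DONE } OpCode;
    typedef struct { OpCode op; int arg; } Instr;

    static int run_cgoto(const Instr *prog)
    {
        /* The "first pass" role: map each opcode to a label address. */
        static const void *dispatch[] = {
            [OP_LOAD_CONST] = &&do_load_const,
            [OP_ADD]        = &&do_add,
            [OP_DONE]       = &&do_done,
        };
        int stack[64], sp = 0, pc = 0;

    #define NEXT() goto *dispatch[prog[pc].op]
        NEXT();
    do_load_const:
        stack[sp++] = prog[pc].arg;
        pc++; NEXT();
    do_add:
        sp--; stack[sp - 1] += stack[sp];
        pc++; NEXT();
    do_done:
        return stack[sp - 1];
    #undef NEXT
    }

Each opcode handler ends with its own indirect jump, so the branch predictor gets a separate history per jump site instead of one shared, hard-to-predict dispatch point.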
And I realized that implementing it cleanly in the interpreter was far easier than implementing it in my JIT compiler… Then I went with optimizing another common case, the call to int4eq, and, well… one could also add an opcode for that in the interpreter, and thus the performance gains of the JIT compiler are going to be minimal compared to the interpreter. Modern CPUs don't make my job easy here. Most of the cost of an interpreter is taken away by the branch predictor and the other optimizations implemented in silicon. So is all hope lost? Am I to declare the interpreter the winner against the limitations of the copy-patch method I have available for my JIT? Of course not; see you in the next post to discuss the biggest interpreter bottleneck!

PS: help welcome. Last year I managed to spend some time working on this during my work time. Since then I've changed jobs, and can hardly get any time for this. I also tried to get some sponsoring to work on this and present at future PostgreSQL conferences, with no luck :/ If you can help in any way on this project, feel free to reach out to me (code contributions, sponsoring, missions, job offers, nudge nudge wink wink). Since I've been alone on this, a lot of things are scribbles on scratch paper; I benchmark code and stuff in my head when life gives me some boring time, but testing it for real is of course far better. I have some travel planned soon, so I hope the next part will be released before next year, with interesting results, since my experiments have been as successful as anticipated.


Extended User Interrupts (xUI): Fast and Flexible Notification without Polling

Extended User Interrupts (xUI): Fast and Flexible Notification without Polling
Berk Aydogmus, Linsong Guo, Danial Zuberi, Tal Garfinkel, Dean Tullsen, Amy Ousterhout, and Kazem Taram
ASPLOS'25

This paper describes existing hardware support for userspace interrupts, and extensions to make it more efficient. The kernel is powerful, and yet slow. You only want to call on it when necessary. Kernel bypass technologies like DPDK and io_uring exist to allow applications to reduce the frequency of kernel calls. In addition to I/O, applications frequently call the kernel to communicate between threads. For example, in a producer/consumer design, the producer could use a signal to tell the consumer that more data is ready. Section 2 of the paper reminds readers that each of these signals costs about 2.4 microseconds. The idea behind userspace interrupts is to get the kernel out of the way and instead have dedicated hardware to support cheap signaling between threads.

UIPI is a hardware feature introduced by Intel with Sapphire Rapids. Section 3 of the paper describes how UIPI works, including reverse engineering some of the Sapphire Rapids microarchitecture. As with other kernel bypass technologies, the kernel is still heavily involved in the control path. When a process requests that the kernel configure UIPI, the kernel responds by creating or modifying the user interrupt target table (UITT) for the process. This per-process table has one entry per thread. The kernel thread scheduler updates this table so that the UIPI hardware can determine which core a thread is currently running on. Once the control path is set up, the data path runs without kernel involvement. Userspace code which wants to send an interrupt to another thread can execute the senduipi instruction. This instruction has one operand, which is an index into the UITT (the index of the destination thread). The hardware then consults the UITT and sends an inter-processor interrupt (IPI) to the core on which the destination thread is running. Userspace code in the destination thread then jumps to a pre-registered interrupt handler, which runs arbitrary code in userspace. Hardware has long supported IPIs, but typically only the kernel has had the ability to invoke them. The hardware has the ability to coordinate with the OS to handle the case where the destination thread is not currently running on any core. These cases do involve running kernel code, but they are the slow path. The fast path is handled without any kernel involvement.

The authors measure an end-to-end latency of 1360 clock cycles for a userspace interrupt on Sapphire Rapids. Fig. 2 illustrates where that time goes. Source: https://dl.acm.org/doi/abs/10.1145/3676641.3716028 The largest cost is the pipeline flush when the receiving core receives the IPI. Section 3.4 describes experiments the authors performed to determine these numbers, including how they determined that the receiving processor pipeline is flushed. Note that "flush" here means that in-flight instructions are squashed (i.e., what happens when a branch misprediction is detected). An alternative strategy would be to drain the pipeline, which would let outstanding instructions commit before handling the interrupt. This would avoid duplicate work but would increase the latency of handling an interrupt. The authors propose tracked interrupts to solve the performance problem associated with pipeline flushes.
When an interrupt is received, the receiving core immediately injects the micro-ops needed to handle the interrupt into the pipeline. Outstanding instructions are not squashed. This may sound similar to draining, but there is a key difference. With draining, the processor waits for all in-flight instructions to commit before injecting the interrupt-handling micro-ops. With tracked interrupts, micro-ops enter the pipeline immediately, and the out-of-order machinery of the processor will not see any dependencies between the interrupt-handling micro-ops and the in-flight instructions. This means that the interrupt-handling micro-ops can execute quickly and typically do not need to wait behind all in-flight instructions. The paper has simulation results which show that tracked interrupts save 414 clock cycles on the receiver side. The paper also discusses some related improvements to UIPI: userspace timer support, userspace interrupts from I/O devices, and safepoints, to allow userspace interrupts to play nice with garbage collection. I wonder how much of the Sapphire Rapids design was dictated by simplicity (to reduce the risk associated with this feature)? A frequent theme in papers reviewed on this blog is how difficult it is to implement fine-grained multithreading on general-purpose multi-core CPUs. I wonder how much userspace interrupt support can help with that problem?
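To make the data path concrete, here is a rough sketch of the producer/consumer flow from the application's point of view. The _stui/_senduipi intrinsics and the handler attribute are real GCC features (compile with -muintr -mgeneral-regs-only), but uintr_register_handler is modeled on Intel's proposed, never-mainlined Linux syscalls, so treat it as an assumption rather than a shipped API:

    #include <x86gprintrin.h>

    /* Hypothetical wrapper for Intel's proposed uintr syscall. */
    extern int uintr_register_handler(void *handler, unsigned int flags);

    volatile int data_ready = 0;

    /* Pre-registered handler: runs in userspace when the IPI arrives. */
    void __attribute__((interrupt))
    ui_handler(struct __uintr_frame *frame, unsigned long long vector)
    {
        data_ready = 1;
    }

    void consumer_setup(void)
    {
        /* Control path (kernel involved): create this thread's UITT
           state and register the handler. */
        uintr_register_handler(ui_handler, 0);
        _stui();                    /* enable receiving user interrupts */
    }

    void producer_notify(int uitt_index)
    {
        /* Data path (no kernel on the fast path): hardware walks the
           UITT and sends an IPI to the destination thread's core. */
        _senduipi(uitt_index);
    }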

Jeff Geerling 5 days ago

Resizeable BAR support on the Raspberry Pi

While not an absolute requirement for modern graphics card support on Linux, Resizable BAR support makes GPUs go faster by allowing them to throw data back and forth on the PCIe bus in chunks larger than 256 MB. In January, I opened an issue in the Raspberry Pi Linux repo, Resizable BAR support on Pi 5.

Daniel Mangum 6 days ago

Using a Laptop as an HDMI Monitor for an SBC

Though I spend the majority of my time working with microcontroller-class devices, I also have an embarrassingly robust collection of single-board computers (SBCs), including a few different Raspberry Pi models, the BeagleV Starlight Beta (RIP), and more. Typically, when setting up these devices for whatever automation task I have planned for them, I'll use "headless mode" and configure initial user and network credentials when writing the operating system to the storage device using a tool like Raspberry Pi's Imager.

Jeff Geerling 6 days ago

How much radiation can a Pi handle in space?

Late in the cycle while researching CubeSats using Pis in space, I got in touch with Ian Charnas, the chief engineer for the Mark Rober YouTube channel. Earlier this year, Crunchlabs launched SatGus, which is currently orbiting Earth taking 'space selfies'.

Brain Baking 6 days ago

I Made My Own Fountain Pen!

Those of you who know me also know that I love writing with a fountain pen. My late father-in-law had been pushing me for years to buy a small lathe and try my hand at some simple shapes—including a fountain pen barrel, of course. Being quite the capable woodworking autodidact, he taught me how to construct a few rudimentary things. Together, we created the stone oven cabinet on wheels I still use on a weekly basis. To this day, I regret not buying a lathe to create more things together. The idea of following a woodworking workshop or a pen creation workshop stuck in the back of my mind but never quite managed to materialize. In May 2024, when I visited the Dutch Pen Show, a few artisans who presented their home-made pens there also offered workshops, but they lived far away in the northern part of Germany, out of reach for a quick "let's go there and do that" excursion. Until last month, when my wife somehow found out about Eddy Nijsen's Wood Blanks & Penkits company and neglected to tell me. Instead, she organized a secret birthday present, invited two more friends over to accompany me, and booked a "mystery event" in the calendar. That morning, when I heard one of my friends' voices coming to pick me up, I expected us to go to some kind of board game convention. An hour later, we pulled over in a rather anonymous-looking street in Weert, The Netherlands, and I had no clue what we were doing there. Boy, was I in for a pleasant surprise! We spent the entire day doing this: me working on a lathe, carefully shaving off wooden clippings to create a pen barrel from a blank. Note the enormous variety of wooden pen blanks available on the shelves in the back. It was quite possibly the best day I've had in months. The hours flew by, and at the end of the day we had all made two pens: one regular ball pen with a typical Parker refill that twists to open, and one fountain pen. Both pens turned out to be remarkable for different reasons. The ball pen is not one I will be using regularly, but that doesn't mean it wasn't worth creating. The wooden blank we used for this pen is unique, to say the least. The black splintery wood almost smelled and felt like charcoal. Eddy, our instructor, managed to salvage it during a local archaeological dig that excavated a medieval oak water well shaft. Experts estimate that the ancient oak was felled around 1250. Decades of exposure to groundwater penetrating the oak cells permanently deformed and coloured the wood. After years of drying in Eddy's workspace, it was sawn into smaller rectangular blocks called "pen blanks", in which we proceeded to drill a hole, attach to the lathe, and rework into a cylindrical shape that can be pressed onto other components called a "pen kit". The metal components we worked with that day were high-quality Beaufort Ink pen kits. After sanding, multiple waxing steps, the involvement of glue and a dedicated pen press, it was ready to write with! The pen has a mechanical twist mechanism on top that's part of the kit. Therefore, we only needed to finish one pen blank. For the fountain pen, which has a screw cap, we'd need to up our game, as not only do we have to work two barrels, but the dimensions and particular shapes differ: the pen is thinner on the bottom and thicker near the grip. A blurry photo of the result: walnut fountain pen (left) and medieval oak ball pen (right). For the fountain pen, we could choose whatever wood we wanted.
My friends chose different kinds of bright-looking exotic wood while I went for the dark brown-grey walnut. My parents had multiple walnut trees when we were kids, and I loved climbing in them and helping with the harvest. Selecting a type of wood closer to home seemed like the obvious choice for me. I carefully recorded all the specific steps we took that day—with the home-made pen, of course—in case I accidentally buy a lathe and want to get in some more practice. It felt amazing to work with my hands instead of staring at a screen all day long. Eddy's mastery of his woodworking felt magical. He said that there's only one way to achieve this: practice, fail, practice, fail, practice, fail some more. I doubt I'll be able to finish one pen on my own without his guidance. I wish my father-in-law was still alive. It gradually dawned on me that I wasn't really making a fountain pen. I was just creating a beautiful hull. Woodworking is not enough: you also need to be an expert jeweller to craft a great nib that writes like a dream. The stock nib that came with the Beaufort Ink pen kit unfortunately didn't: it felt scratchy and dry. I anticipated this and have since replaced it with a fine platinum Bock nib that writes great, although I'm still struggling with the ink flow going from the converter to the feed. The platinum nib was expensive (excluding shipping), but it would be a shame never to use the pen. While the Beaufort Ink material is indeed of very high quality, this particular pen kit model is not the most well-balanced: posting the cap is entirely useless as it makes the pen much too heavy. Also, the metal grip is much thinner than the wooden body that we created. Compared to a Lamy 2000 or a kit-less pen, finding the right grip for writing takes a while to get used to. But who cares, I made my own fountain pen! The second mod I'm planning is to laser-engrave the Brain Baking logo on top of the cap. I love the way the pen and walnut wood feel, and the subtle colour differences that neatly line up when you screw the cap back on are beautiful (but difficult to catch on camera). I do wonder what else you can do with a lathe if you do not limit yourself to just using a pen kit… By Wouter Groeneveld on 8 October 2025. Reply via email.


No Cap, This Memory Slaps: Breaking Through the Memory Wall of Transactional Database Systems with Processing-in-Memory

No Cap, This Memory Slaps: Breaking Through the Memory Wall of Transactional Database Systems with Processing-in-Memory
Hyoungjoo Kim, Yiwei Zhao, Andrew Pavlo, Phillip B. Gibbons
VLDB'25

This paper describes how processing-in-memory (PIM) hardware can be used to improve OLTP performance. Here is a prior paper summary from me on a similar topic, but that one is focused on OLAP rather than OLTP. UPMEM is a specific PIM product (also used in the prior paper on this blog). A UPMEM DIMM is like a DRAM DIMM, but each DRAM bank is extended with a simple processor which can run user code. That processor has access to a small local memory and the DRAM associated with the bank. This paper calls each processor a PIM module. There is no direct communication between PIM modules. Fig. 2 illustrates the system architecture used by this paper. A traditional CPU is connected to a set of boring old DRAM DIMMs and is also connected to a set of UPMEM DIMMs. Source: https://vldb.org/pvldb/volumes/18/paper/No%20Cap%2C%20This%20Memory%20Slaps%3A%20Breaking%20Through%20the%20Memory%20Wall%20of%20Transactional%20Database%20Systems%20with%20Processing-in-Memory

Four Challenges. The paper identifies the following difficulties associated with using UPMEM to accelerate an OLTP workload: PIM modules can only access their local memory; PIM modules do not have the typical niceties associated with x64 CPUs (high clock frequency, caches, SIMD); there is a non-trivial cost for the CPU to send data to UPMEM DIMMs (similar to the CPU writing data to regular DRAM); and OLTP workloads have tight latency constraints.

The authors arrived at a solution that both provides a good speedup and doesn't require boiling the ocean. The database code and architecture remain largely unchanged. Much of the data remains in standard DRAM DIMMs, and the database operates on it as it always has. In section 3.2 the authors identify a handful of data structures and operations with near-memory affinity which are offloaded. These data structures are stored in UPMEM DIMMs, and the algorithms which access them are offloaded to the PIM modules. The key feature that these algorithms have in common is pointer chasing. The sweet spots the authors identify involve a small number of parameters sent from the CPU to a PIM module, then the PIM module performing multiple roundtrips to its local DRAM bank, followed by the CPU reading back a small amount of response data. The roundtrips to PIM-local DRAM have lower latency than accesses from a traditional CPU core.

One operation which involves a lot of pointer chasing is B+ tree traversal. Thus, the system described in this paper moves B+ tree indexes into UPMEM DIMMs and uses PIM modules to search for values in an index. Note that the actual tuples that hold row data stay in plain old DRAM. The tricky part is handling range queries while distributing an index across many banks. The solution described in this paper is to partition the set of keys into 2^R partitions (the lower R bits of a key define the index of the partition which holds that key). Each partition is thus responsible for a contiguous array of keys. For a range query, the lower bits of the lower and upper bounds of the range can be used to determine which partitions must be searched. Each PIM module is responsible for multiple partitions, and a hash function is used to convert a partition index into a PIM module index.
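Here is a sketch of how the key-to-partition-to-module routing could look, reconstructed from the description above; R, the module count, and the hash are assumed values of mine, not the paper's:

    #include <stdint.h>
    #include <string.h>

    #define R            8                     /* 2^R partitions (assumed) */
    #define NUM_PARTS    (1u << R)
    #define PART_MASK    (NUM_PARTS - 1)
    #define NUM_MODULES  128u                  /* PIM module count (assumed) */

    /* The lower R bits of a key select its partition. */
    static uint32_t partition_of(uint64_t key) { return (uint32_t)(key & PART_MASK); }

    /* A hash converts a partition index into a PIM module index. */
    static uint32_t module_of(uint32_t part)
    {
        return (part * 2654435761u) % NUM_MODULES;
    }

    /* Mark which partitions a range query [lo, hi] must search. */
    static void partitions_for_range(uint64_t lo, uint64_t hi, uint8_t out[NUM_PARTS])
    {
        memset(out, 0, NUM_PARTS);
        if (hi - lo >= NUM_PARTS - 1) {
            memset(out, 1, NUM_PARTS);         /* range touches every partition */
        } else {
            for (uint64_t k = lo;; k++) {      /* at most NUM_PARTS iterations */
                out[partition_of(k)] = 1;
                if (k == hi) break;
            }
        }
    }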
MVCC is a concurrency control method which requires the database to keep around old versions of a given row (to allow older in-flight queries to access them). The set of versions associated with a row is typically stored in a linked list (yet another pointer traversal). Again, the actual tuple contents are stored in regular DRAM, but the list links are stored in UPMEM DIMMs, with the PIM modules traversing the links. Section 4.3 has more information about how old versions are eventually reclaimed with garbage collection.

Fig. 7 has the headline results, comparing the baseline system against the work described by this paper. It is interesting that the paper's system only beats the baseline for read-only workloads. Source: https://vldb.org/pvldb/volumes/18/paper/No%20Cap%2C%20This%20Memory%20Slaps%3A%20Breaking%20Through%20the%20Memory%20Wall%20of%20Transactional%20Database%20Systems%20with%20Processing-in-Memory

Processing-in-memory can help with memory bandwidth and memory latency. It seems like this work is primarily focused on memory latency. I suppose this indicates that OLTP workloads are fundamentally latency-bound, because there is not enough potential concurrency between transactions to hide that latency. Is there no way to structure a database such that OLTP workloads are not bound by memory latency? It would be interesting to see if these tricks could work in a distributed system, where the PIM modules are replaced by separate nodes in the system.
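Returning to the version chains: here is my illustration of the pointer chase a PIM module performs, with field names of my own invention (the paper's actual layout surely differs):

    /* Version-chain walk as a PIM module might run it: the links live in
       PIM-local DRAM, the tuple payloads stay in regular host DRAM. */
    #include <stddef.h>

    typedef struct Version {
        unsigned long long begin_ts, end_ts;   /* visibility window */
        struct Version    *older;              /* next-older version */
        void              *tuple_in_host_dram; /* payload, not chased here */
    } Version;

    /* Return the payload of the first version visible to a snapshot. */
    void *find_visible(const Version *v, unsigned long long snapshot_ts)
    {
        for (; v != NULL; v = v->older)        /* low-latency local roundtrips */
            if (v->begin_ts <= snapshot_ts && snapshot_ts < v->end_ts)
                return v->tuple_in_host_dram;
        return NULL;                           /* no visible version */
    }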

Stavros' Stuff 1 week ago

I made a really small LED panel

I bought a really small 8x8 LED panel a while ago because I have a problem. I just can't resist a nice WS2812 LED panel, much like I can't resist an e-ink display. These days I manage to stay sober, but once in a while I'll see a nice cheap LED panel and fall off the wagon. It has now been thirteen minutes that I have gone without buying LED panels, and this is my story. This isn't really going to be super interesting, but there are some good lessons, so I thought I'd write it up anyway. On the right you can see the LED panel I used: it's a bare PCB with a bunch of WS2812 (Neopixel) addressable LEDs soldered onto it. It was the perfect excuse for trying out WLED, which I've wanted to take a look at for ages, and which turned out to be absolutely fantastic. As with every light-based project, one of the big issues is proper diffusion. You don't want your LEDs to show up as the points of light they are; we really like nice, big, diffuse lights, so you need a way to do that. My idea was to print a two-layer white square out of PLA (which would be translucent enough to show the light, but not so translucent that you could see the LEDs behind it). I also printed a box for the square to go in front of. I printed the diffuser (the white square) first, held it over the LED panel and increased or decreased the distance of the square from the LEDs until the LEDs didn't look like points, but the colors also didn't blend into the neighboring squares' colors. This turned out to be around 10mm, so that's how thick I made the box. The eagle-eyed among you may want to seek medical assistance, but if you have normal human eyes, you may have noticed that there's nowhere in the box for the microcontroller to go, and you would be correct. For this build, I decided to use an ESP8266 (specifically, a WeMos dev board), but I didn't want to make the whole box chunky just to fit a small microcontroller in there, so I did the next best thing: I designed a hole in the back of the box for the cables that connect to the LED panel, and I glued the ESP8266 to the back of the box. YOLO. Look, it works great, ok? The cables are nice and shortish, even though they go to the entirely wrong side of the thing, the USB connector is at a very weird place, and the ESP8266 is exposed to the elements and the evil eye. It's perfect. Here's the top side, with the diffuser: And here's the whole mini tiny cute little panel showing some patterns from WLED (did I mention it's excellent? It is). That's it! I learned a few things and made a cute box of lights. I encourage you to make your own: it's extremely fun and mesmerizing and I love it and gave it to a friend because I never used it and it just took up space and then made a massive 32x32 version that I also never use and hung it on my wall. Please feel free to Tweet or toot at me, or email me directly.

Jeff Geerling 1 week ago

Qualcomm's buying Arduino – what it means for makers

Qualcomm just announced they're acquiring Arduino, the company that introduced a whole generation of tinkerers to microcontrollers and embedded electronics. The Uno R3 was the first microcontroller board I owned. Over a decade ago, I blinked my first LED with an Uno; the code for that is actually still up on my GitHub.

Chris Coyier 1 week ago

Local by Flywheel was Ultra Slow Because I Had The Wrong Version

Maybe a few months ago on my Mac Studio, Local by Flywheel became incredibly slow: clicking any button in the UI would take literally minutes. I was gonna just give up and switch to Studio, but I had tried that once and it had enough rough edges that I gave up (can't remember why now, but I'll give it a shot again one of these days). There were a few releases of Local since my problem appeared, and each time I'd upgrade I'd cross my fingers it would fix it, and it never did. But then they released a version that now does some kind of "are you running the correct version?" check, and it warned me that I wasn't. I must have been (somehow, someway) running the "Mac Intel" version on my "Mac Apple Silicon" machine. So anyway, if Local by Flywheel is ultra mega slow for you, make sure to check you're running the appropriate version for your machine. Derp.


Parendi: Thousand-Way Parallel RTL Simulation

Parendi: Thousand-Way Parallel RTL Simulation
Mahyar Emami, Thomas Bourgeat, and James R. Larus
ASPLOS'25

This paper describes an RTL simulator running on (one or more) Graphcore IPUs. One nice side benefit of this paper is the quantitative comparison of IPU synchronization performance vs traditional CPUs. Here is another paper summary which describes some challenges with RTL simulation. The Graphcore IPU used in this paper is a chip with 1472 cores, operating with a MIMD architecture. A 1U server can contain 4 IPUs. It is interesting to see a chip that was designed for DNN workloads adapted to the domain of RTL simulation.

Similar to other papers on RTL simulation, a fundamental step of the Parendi simulator is partitioning the circuit to be simulated. Parendi partitions the circuit into fibers. A fiber comprises a single (word-wide) register and all of the combinational logic which feeds it. Note that some combinational logic may be present in multiple fibers. Fig. 3 contains an example: node a3 is present in multiple fibers. As far as I can tell, Parendi does not try to deduplicate this work (extra computation to save synchronization). Source: https://dl.acm.org/doi/10.1145/3676641.3716010

The driving factor in the design of this fiber-specific partitioning system is scalability. Each register has storage to hold the value of the register at the beginning and end of the current clock cycle (i.e., the current and next values). I think of the logic to simulate a single clock cycle with the pseudo-code sketched at the end of this summary (where R_f is the register rooted at fiber f). Scalability comes from the fact that there are only two barriers per simulated clock cycle. This is an instance of the bulk synchronous parallel (BSP) model.

In many cases, there are more fibers than CPU/IPU cores. Parendi addresses this by distributing the simulation across chips and scheduling multiple fibers to run on the same core. If the simulation is distributed across multiple chips, then a min-cut algorithm is used to partition the fibers across chips while minimizing communication. The Parendi compiler statically groups multiple fibers together into a single process. A core simulates all fibers within a process. The merging process primarily seeks to minimize inter-core communication. First, a special-case merging algorithm merges multiple fibers which reference the same large array. This is to avoid communicating the contents of such an array across cores. I imagine this is primarily for simulation of on-chip memories. Secondly, a general-purpose merging algorithm merges fibers which each have low compute cost and high data sharing with each other.

Fig. 7 compares Parendi vs Verilator simulation on two x86 servers: one is a 2-socket server with 28 Intel cores per socket, the other a 2-socket server with 64 AMD cores per socket. Source: https://dl.acm.org/doi/10.1145/3676641.3716010 Section 6.4 claims a roughly 2x improvement in cost per simulation using cloud pricing. As far as I can tell, this system doesn't have optimizations for the case where some or all of a fiber's inputs do not change between clock cycles. It seems tricky to optimize for this case while maintaining a static assignment of fibers to cores.

Fig. 4 has a fascinating comparison of synchronization costs between an IPU and a traditional x64 CPU. This microbenchmark loads up the system with simple fibers (roughly 6 instructions per fiber). Note that the curves represent different fiber counts (e.g., the red dotted line represents 7 fibers on the IPU graph, vs 736 fibers on the x64 graph).
The paper claims that a barrier between 56 x64 threads implemented with atomic memory accesses consumes thousands of cycles, whereas the IPU has dedicated hardware barrier support. Source: https://dl.acm.org/doi/10.1145/3676641.3716010 This seems to be one of many examples of how generic multi-core CPUs do not perform well with fine-grained multi-threading. We've seen it with pipeline parallelism, and now with the BSP model. Interestingly, both cases seem to work better with specialized multi-core chips (pipeline parallelism works with CPU-based SmartNICs, BSP works with IPUs). I'm not convinced this is a fundamental hardware problem that cannot be addressed with better software.
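Filling in the pseudo-code referenced earlier, here is the per-cycle loop as I understand it, written as a compilable sketch; the OpenMP parallel-for barriers stand in for the IPU's hardware barriers (build with -fopenmp), and eval_fiber is a stand-in for fiber f's combinational cone over the current register values:

    #include <stdint.h>

    #define NUM_FIBERS 1024                    /* illustrative size */

    static uint64_t cur[NUM_FIBERS], next_v[NUM_FIBERS];

    /* Combinational logic feeding register R_f, over current values. */
    extern uint64_t eval_fiber(int f, const uint64_t *cur_vals);

    void simulate_cycle(void)
    {
        /* Phase 1: every fiber computes its register's next value. */
        #pragma omp parallel for
        for (int f = 0; f < NUM_FIBERS; f++)
            next_v[f] = eval_fiber(f, cur);
        /* Barrier 1: implicit at the end of the parallel for. */

        /* Phase 2: commit next values as the new current values. */
        #pragma omp parallel for
        for (int f = 0; f < NUM_FIBERS; f++)
            cur[f] = next_v[f];
        /* Barrier 2: safe to start the next simulated cycle. */
    }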

Preah's Website 1 week ago

Emulation on a MacBook Pro M4 Pro

I bought a 14-inch MacBook Pro (November 2024, M4 Pro, 24GB RAM) in June of this year. It is my first and only Mac. I found out pretty fast that this thing is pretty powerful. It can run VMs (I use Parallels) smoothly, play games in said VM, play YouTube while playing modded Minecraft, and, most important to why I got it, it's great for emulation. I wanted a laptop that would be able to handle emulation of Gamecube, Wii, and Switch games, and be portable to bring to people's houses, and that's what I got. Here is my line-up of emulators: Azahar for 3DS games, Cemu for Wii U, Dolphin for Wii and Gamecube games, OpenEmu for various (mainly retro) platforms, and Ryujinx for Switch. Note: Ryujinx does not have a safe official website anymore; it seems Nintendo has taken over the domain. Do not blindly trust any random site that shows up offering a download of the software when you search for it. This meets all of my emulation needs. I've noticed I haven't been able to connect real Wii remotes to Dolphin because of Bluetooth changes on modern Macs, but Nintendo Switch Pro controllers with motion control seem to connect and work excellently in my experience thus far. A while ago, I ran a Mario Party get-together with friends; we played Mario Party... 7, I think, off of my laptop connected via HDMI to the TV. I told people to bring controllers since I only had two that I knew worked. I basically brute-forced the controller compatibility, testing several ones brought and barely meeting the requirement of four working. The ones that worked were: an Xbox Series X controller (my preferred one), two Switch Pros, and a DualShock 4 (PS4). Once these were connected via Bluetooth, it was easy to configure their settings and map buttons to how players want them in the emulator. Fun gaming and snacks ensued. Dolphin in particular is amazing in terms of settings. You can configure controllers however you want very easily, and use a pass-through Bluetooth adapter to connect real Wii remotes!! One major thing that annoyed me was that I didn't get a 1TB MacBook. I regret it because ROMs can get large. I ended up offloading them onto a personal cloud drive, which the emulators can still see and read normally. The actual data (saves, preferences) is still on my Mac. I think overall, this MacBook does an excellent job of emulating. However, if you want to play Wii games specifically, a computer running Windows of similar power would be more suitable because of Bluetooth and emulator compatibility. Keep in mind Ryujinx is no longer developed, and there are better alternatives now, though as far as I can tell they don't run on Mac. Try it out yourself on a computer you have laying around, and get into some retro games, or hold a Mario Party with your friends!

Christian Jauvin 1 week ago

Ignore Your Check Engine Light at Your Own Peril

Currently the "check engine" light in my car is on, but it's OK, because I know what the problem is: my mechanic told me, and even though it's not ideal, it can wait while he finds the part to repair it (apparently it's not so easy to find). There is something I don't like about this though: if there is a NEW problem appearing, I won't know about it, because this check engine light has only one state, and now it's being used.

Uros Popovic 1 week ago

PC cooler control with a $2 microcontroller, no development board

Walkthrough for a small embedded system based on ATmega328p which controls a PC cooler fan. The focus of this experiment is on using only open-source CLI software tooling for the solution. Additionally, no development boards were used, just the bare microcontroller, which should be helpful in transitioning to building your own boards.
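As a taste of what driving a fan from the bare chip involves, here is a minimal sketch of generating the 25 kHz control PWM that 4-pin PC fans expect, using the ATmega328p's Timer1; this assumes a 16 MHz clock and the fan's PWM wire on OC1A/PB1, and is my illustration rather than the post's exact circuit:

    /* 25 kHz fan PWM on ATmega328p Timer1 (mode 14: fast PWM, TOP=ICR1). */
    #include <avr/io.h>
    #include <stdint.h>

    #define PWM_TOP 639   /* 16 MHz / (1 * (639 + 1)) = 25 kHz */

    static void fan_pwm_init(void)
    {
        DDRB  |= _BV(PB1);                       /* OC1A pin as output */
        TCCR1A = _BV(COM1A1) | _BV(WGM11);       /* non-inverting, mode 14 low bits */
        TCCR1B = _BV(WGM13) | _BV(WGM12) | _BV(CS10);  /* mode 14, no prescaler */
        ICR1   = PWM_TOP;
    }

    static void fan_set_duty(uint8_t percent)
    {
        OCR1A = (uint32_t)PWM_TOP * percent / 100;
    }

    int main(void)
    {
        fan_pwm_init();
        fan_set_duty(40);        /* run the fan at 40% */
        for (;;) { }             /* real firmware would adjust duty here */
    }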


Skia: Exposing Shadow Branches

Skia: Exposing Shadow Branches
Chrysanthos Pepi, Bhargav Reddy Godala, Krishnam Tibrewala, Gino A. Chacon, Paul V. Gratz, Daniel A. Jiménez, Gilles A. Pokam, and David I. August
ASPLOS'25

This paper starts with your yearly reminder of the high cost of the Turing Tax: "Recent works demonstrate that the front-end is a considerable source of performance loss [16], with upwards of 53% of performance [23] bounded by the front-end." Everyone knows that the front-end runs ahead of the back-end of a processor. If you want to think of it in AI terms, imagine a model that is told the current value and recent history of the program counter, and asked to predict future values of the program counter. The accuracy of these predictions determines how utilized the processor pipeline is. What I did not know is that in a modern processor, the front-end itself is divided into two decoupled components, one of which runs ahead of the other. Fig. 4 illustrates this Fetch Directed Instruction Prefetching (FDIP) microarchitecture. Source: https://dl.acm.org/doi/10.1145/3676641.3716273

The Instruction Address Generator (IAG) runs the furthest ahead and uses tables (e.g., the Branch Target Buffer (BTB)) in the Branch Prediction Unit (BPU) to predict the sequence of basic blocks which will be executed. Information about each predicted basic block is stored in the Fetch Target Queue (FTQ). The Instruction Fetch Unit (IFU) uses the control flow predictions from the FTQ to actually read instructions from the instruction cache. Some mispredictions can be detected after an instruction has been read and decoded. These result in an early re-steer (i.e., informing the IAG about the misprediction immediately after decode). When a basic block is placed into the FTQ, the associated instructions are prefetched into the IFU (to reduce the impact of instruction cache misses).

This paper introduces the term "shadow branch". A shadow branch is a (static) branch instruction which is currently stored in the instruction cache but is not present in any BPU tables. The top of fig. 5 illustrates a head shadow branch. A branch instruction caused execution to jump to byte 24 and execute the non-shaded instructions. This causes an entire cache line to be pulled into the instruction cache, including the branch instruction starting at byte 19. The bottom of fig. 5 shows a tail shadow branch. In this case, the instruction at byte 12 jumped away from the cache line, causing the red branch instruction at byte 16 to not be executed (even though it is present in the instruction cache). Source: https://dl.acm.org/doi/10.1145/3676641.3716273

Skia. The proposed design (Skia) allows the IAG to make accurate predictions for a subset of shadow branches, thus improving pipeline utilization and reducing instruction cache misses. The types of shadow branches which Skia supports are: direct unconditional branches (whose target PC can be determined without looking at backend state), function calls, and returns. As shown in Fig. 6, these three categories of branches (purple, red, orange) account for a significant fraction of all BTB misses. Source: https://dl.acm.org/doi/10.1145/3676641.3716273

When a cache line enters the instruction cache, the Shadow Branch Decoder (SBD) decodes just enough information to locate shadow branches in the cache line and determine the target PC (for direct unconditional branches and function calls).
Metadata from the SBD is placed into two new branch prediction tables in the BPU: the U-SBB holds information about direct unconditional branches and function calls, and the R-SBB holds information about returns. When the BPU encounters a BTB miss, it can fall back to the U-SBB or R-SBB for a prediction. Fig. 11 illustrates the microarchitectural changes proposed by Skia. Source: https://dl.acm.org/doi/10.1145/3676641.3716273 Section 4 goes into more details about these structures, including the replacement policy, how a shadow branch is upgraded into a first-class branch in the BTB, and handling variable-length instructions. Fig. 14 has (simulated) IPC improvements across a variety of benchmarks. Source: https://dl.acm.org/doi/10.1145/3676641.3716273

Dangling Pointers. A common problem that HW and SW architects solve is getting teams out of a local minimum caused by fixed interfaces. The failure mode is when two groups of engineers agree on a static interface, and then each optimize their component as best they can without changing the interface. In this paper, the interface is the ISA, and Skia is a clever optimization inside of the CPU front-end. Skia shows that there is fruit to be picked here. It would be interesting to examine potential performance gains from architectural (i.e., ISA) changes to pick the same fruit.
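As an aside, my mental model of the lookup order Skia adds is something like the following; the types and helper functions are mine, purely illustrative of the fallback described above:

    /* BPU lookup with Skia's shadow-branch tables as a fallback. */
    typedef struct { int hit; unsigned long long target; } Pred;

    extern Pred btb_lookup(unsigned long long pc);   /* first-class branches */
    extern Pred usbb_lookup(unsigned long long pc);  /* shadow direct jumps + calls */
    extern Pred rsbb_lookup(unsigned long long pc);  /* shadow returns */

    Pred predict_next(unsigned long long pc)
    {
        Pred p = btb_lookup(pc);       /* normal path */
        if (p.hit) return p;
        p = usbb_lookup(pc);           /* BTB miss: try the U-SBB */
        if (p.hit) return p;
        return rsbb_lookup(pc);        /* then the R-SBB, else fall through */
    }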

ava's blog 1 week ago

amdgpu is borked for me

Currently not having a good time with my AMD system. :( Started out with infrequent random kernel panics out of nowhere, doing nothing specific, sometimes even after a while of sitting at the login screen. I updated, no change. Still happened. Unfortunately, the QR codes for the kernel panics had no log entries, no info at all aside from what kernel version it was. I reinstalled kernels, but afterwards, I had full system freezes instead. Not even TTY or SysRq worked. Had to shut down hard, physically. I read through all kinds of logs, outputs, reports, and monitored system performance. Couldn't find anything; it seemed like nothing relevant was written to the logs at that point. Decided to install the LTS kernel. Lasted for almost two hours until the laptop screen completely froze while the external screen worked just fine. Was finally able to find stuff in the logs:

kernel: amdgpu 0000:07:00.0: [drm] *ERROR* flip_done timed out
0010:amdgpu_dm_atomic_commit_tail+0x3934/0x3a10 [amdgpu]

I found a lot about this online from every other year; it seems to be a kernel bug, specifically in how amdgpu handles the atomic commit, that pops up every couple of years. It causes KWin pageflip issues, freezes the system, and can also cause a panic. The latest issue thread activity was from 3 weeks ago, some even more recent, so I'm not the only one. I don't feel confident in any temp solutions provided online, as they just throw stuff at the wall to see what sticks and none work for everyone. Also, I would like to understand what they do before I apply anything, and I don't. So I guess I'll live with a freeze every couple of hours and just use my other laptop more until that is fixed again. I may try to see if switching between Wayland and Xorg changes anything. I have always been lucky with updates on that machine, so I guess I was due for this tax. Reply via email. Published 02 Oct, 2025

マリウス 2 weeks ago

Updates 2025/Q3

This post includes personal updates and some open source project updates. Q3 has been somewhat turbulent, marked by a few unexpected turns. Chief among them: changes to my home base. In mid-Q3, the owner of the apartment I was renting abruptly decided to sell the property, effectively giving me two weeks to vacate. Thanks to my lightweight lifestyle, moving out wasn't a major ordeal. However, in a global housing landscape distorted by corporations, wealthy boomers, and trust-fund heirs, securing a new place on such short notice proved nearly impossible. Embracing the fact that I am the master of my own time and destiny, and guided by the Taoist principle of Wu Wei, I chose not to force a solution. Instead, I placed all my belongings (including my beloved desk) into storage and set off for a well-earned break from both the chaos and the gloom of the wet season. Note: If you ever feel you're not being treated with the respect you deserve, the wisest response is often to simply walk away. Go where you're treated best. Give yourself the space to reflect, to regain clarity, and most importantly, to reconnect with your sense of self. Resist the urge to look back or second-guess your instincts. Never make life-altering decisions in haste; make them on your terms, not someone else's. And remember, when life gives you onions, make onionade.

On a different note, my coffee equipment has been extended by a new addition, the Bookoo Themis Mini Coffee Scale, a super lightweight (~140g) and portable (8cm x 8cm x 1.5cm) coffee scale that allows me to precisely measure the amount of coffee that I'm grinding and brewing. So far I'm very happy with the device. I don't use its Bluetooth features at all, but when I initially tried them out of curiosity, their Android app didn't really work. Speaking of brewing: unfortunately, at the end of Q3 my 9barista espresso maker seemingly broke down. While there are no electronics or mechanics that can actually break, I suspect that during my last descaling procedure enough limescale was removed for the boiler O-ring to no longer properly seal the water chamber. I took the 9barista apart and couldn't visually see anything else that could make it misbehave. I have hence ordered a repair kit from the manufacturer's online store and am waiting for it to be delivered before I can continue enjoying self-made, awesome cups of coffee.

Europe is continuing to build its surveillance machinery under claims of online safety, with the UK enforcing online age verification for major platforms, followed by the EU piloting similar acts in several member states. Even though the changes don't affect me, I find this trend frightening, especially considering the looming threat to online privacy that is Chat Control. Even presumed safe havens from censorship and surveillance like Matrix have rolled over and implemented age verification on the Matrix.org homeserver. The CEO of Element (New Vector Limited) gave the following explanation for it: Now, the Online Safety Act is a very sore point. The fact is that Matrix.org or Element is based in the UK, but even if we weren't we would still be required to comply to the Online Safety Act for users who are in the UK, because it is British law. That statement is not quite accurate, however. If the Matrix.org homeserver were run by an entity in a non-cooperative jurisdiction, they wouldn't need to implement any of this.
This is important, because people need to understand that despite all the globalism that is being talked about, not every place on earth partakes in implementing these mindless laws, even if your government would like you to think that it's the norm. Obviously it's not exactly easy to run a platform from within an otherwise (at least partially) sanctioned country, especially when user data is at stake. However, with regulations like these becoming more and more widespread, my pirate mind imagines a future where such setups become viable options, given that the countries in question are scrambling for income streams and would be more than happy to gain leverage over other countries. We've already seen this in several instances (e.g. Yandex, Alibaba, ByteDance (TikTok), Telegram, DiDi, Tencent (WeChat), …) and given the global political climate I can imagine more services heading towards jurisdictions that allow them to avoid requesting IDs from their users or breaking security measures only so government agencies can siphon out data at will. However, a different future outcome might be an increased focus on decentralization (or at least federation), which would be a welcome change as well. As Matrix correctly pointed out, individually run homeservers are not affected by any of this. Similarly, I haven't heard of any instances of XMPP service operators being approached by UK officials. Unlike on centralized platforms like Discord, and wannabe-decentralized platforms like Bluesky, enforcing something like age verification on an actually federated/decentralized network is near impossible, especially with services that are being hosted outside of the jurisdiction's reach. In the future, federated protocols, as well as peer-to-peer projects, are going to become more important than ever to counter the mindless policies enacted by the people in power. Looking at this mess from the bright side, with major big tech platforms requiring users to present IDs, we can hope for a growing number of people to cut ties with those platforms, driving them, and their perpetrators, into the ground in the long run. If you are looking for decentralized alternatives to centralized services, here is a non-exhaustive list:

Since I began publishing code online, I've typically used the GPL or MIT license for my projects. However, given the current global climate and the direction the world seems to be heading, I've grown increasingly skeptical of these traditional licenses. While I still want to offer free and open source software to people, I find myself more and more reluctant to grant unrestricted use, particularly to organizations whose values or missions I fundamentally disagree with. Unfortunately, exclusions or prohibitions were never part of the vision behind the GNU or OSI frameworks, making most conventional open source licenses unsuitable for this kind of selective restriction. Recently, however, I came across the Hippocratic License, which is designed to address exactly these concerns. In fact, the HL3 already includes three of the four exclusions I would like to enforce: mass surveillance, military activities, and law enforcement. The fourth, government revenue services, could likely be added in a similar manner. That said, HL3 does overreach in some areas, extending into domains where I don't believe a software license should have jurisdiction, such as: 3.2. The Licensee SHALL: 3.2.1.
3.2.1. Provide equal pay for equal work where the performance of such work requires equal skill, effort, and responsibility, and which are performed under similar working conditions, except where such payment is made pursuant to:
3.2.1.1. A seniority system;
3.2.1.2. A merit system;
3.2.1.3. A system which measures earnings by quantity or quality of production; or
3.2.1.4. A differential based on any other factor other than sex, gender, sexual orientation, race, ethnicity, nationality, religion, caste, age, medical disability or impairment, and/or any other like circumstances (See 29 U.S.C.A. § 206(d)(1); Article 23, United Nations Universal Declaration of Human Rights; Article 7, International Covenant on Economic, Social and Cultural Rights; Article 26, International Covenant on Civil and Political Rights); and
3.2.2. Allow for reasonable limitation of working hours and periodic holidays with pay (See Article 24, United Nations Universal Declaration of Human Rights; Article 7, International Covenant on Economic, Social and Cultural Rights).

These aspects of the Hippocratic License have already drawn significant criticism, and I would personally remove them in any variation I choose to adopt. However, a far greater concern lies with the license's stewardship, the Organization for Ethical Source (OES). While supporting a good cause is typically straightforward, the organization's founder and current president has unfortunately earned a reputation for unprofessional conduct, particularly in addressing the very issues the organization was created to confront. I'm reluctant to have my projects associated with the kind of "drama" that seems to follow the organization's leadership. For this reason, I would likely need to distance any variation of the license as far as possible from its heritage, to avoid direct association with the OES and its leadership's behavior. Hence, I'm still on the lookout for alternative licenses, specifically ones that maintain the permissiveness of something like the GPL but allow for clearly defined, legally enforceable exceptions. If you have experience working with such licenses, I would very much appreciate your input.

PS: I'm fully aware that adopting such a license would render my software non-free in the eyes of organizations like GNU or the OSI. However, those organizations were founded in a different era and have, in my view, failed to adapt to the realities of today's world. It's curious how many advocates of GNU/OSI philosophies call for limitations on freedom of speech, yet insist on software being usable without restriction in order to qualify as free and open source.

This site has received what some might consider a useless or even comical update, which is, however, meant to further the goal of raising awareness about the role JavaScript plays in browsers. I got the inspiration for this from this post by sizeof.cat, a site I discovered thanks to the friendly folks in the VT100 community room. While sizeof.cat uses this feature purely for the lulz, I believe it can serve as an effective way to encourage people to disable JavaScript in their browsers by default, and to be very selective about which websites they enable it for. As a result, this website now features a similar (but edgier) option, which you can test by enabling JavaScript for this domain and then sending this tab to the background. Go ahead, I'll wait. :-) Like sizeof.cat's original implementation, this site will randomly alternate the tab title between different services. However, unlike the original, you'll see an overlay when you return to the site, explicitly encouraging you to disable JavaScript in your browser.
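For the technically curious, the whole gag boils down to the Page Visibility API. Below is a minimal sketch of the idea; the fake titles and the overlay wording are made up for illustration and are not this site's actual implementation:

```typescript
// Tab-cloaking sketch using the Page Visibility API (browser TypeScript).
// The titles below are placeholders, not what this site actually shows.
const fakeTitles: string[] = [
  "Inbox (3) - Mail",
  "(1) Some Video Platform",
  "Team Chat - #general",
];

const realTitle = document.title;

document.addEventListener("visibilitychange", () => {
  if (document.hidden) {
    // Tab moved to the background: disguise it as another service.
    document.title = fakeTitles[Math.floor(Math.random() * fakeTitles.length)];
  } else {
    // Tab is visible again: restore the real title and show the nag overlay.
    document.title = realTitle;
    showOverlay();
  }
});

function showOverlay(): void {
  const overlay = document.createElement("div");
  overlay.textContent =
    "This stunt ran because JavaScript is enabled. Consider disabling it by default.";
  overlay.style.cssText =
    "position:fixed;inset:0;display:flex;align-items:center;justify-content:center;" +
    "background:rgba(0,0,0,.85);color:#fff;text-align:center;z-index:9999;cursor:pointer";
  overlay.addEventListener("click", () => overlay.remove());
  document.body.appendChild(overlay);
}
```

A site that additionally swaps the favicon makes the disguise far more convincing, which is exactly why the trick works so well as a reminder of how much a backgrounded tab is still allowed to do.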
After having used neomutt for several years, I grew tired of the many cogs (notmuch, mbsync, w3m, reader, etc.) I had to maintain for the setup to function the way I expected it to, especially since my primary requirement is to not leave e-mails on the server for longer than really needed. Eventually I got fed up with my e-mail client breaking whenever I needed it most, and with having to deal with HTML e-mail on the command line, thinking that if I used an actual GUI, things would be much simpler. Little did I know.

I moved to Mozilla Thunderbird as my primary e-mail client a while ago. I set up all my IMAP accounts and created a "Local Folder" that Mozilla sold me as maildir. Fast-forward to today and I'm stuck with a setup where I cannot access my "Local Folder" maildir with any other maildir-compliant software besides Thunderbird, because even though Mozilla called it maildir, it is not an actual maildir format:

"Note this is NOT full maildir in the sense that most people, particularly linux users or mail administrators, know as maildir."

On top of that, my OpenSnitch database is overflowing with deny rules for Mozilla's supposedly "privacy respecting" software. At this point I'm not even wondering what the heck is wrong with this company anymore. Mozilla has lost it, with Firefox, and seemingly also with the other software they maintain.

With my e-mails now locked into something that Mozilla titles maildir even though it is not, I am looking forward to going back to where I came from. I might, however, replace the overly complex neomutt setup with a more modern and hopefully lightweight aerc configuration. Unfortunately, I have used Thunderbird's "Local Folder" account for too long, and I'll have to figure out a way to get those e-mails into an actual maildir format before I can leave Mozilla's ecosystem once and for all.
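In case it helps anyone stuck in the same boat: since Thunderbird's pseudo-maildir does at least seem to store one complete message per file under cur/, getting out might be as simple as re-filing those messages under proper maildir names. Here is a rough, unverified sketch of that idea; the paths are placeholders, and every message simply gets re-flagged as "seen", since Thunderbird seemingly keeps flag state in its .msf summary files rather than in the filenames:

```typescript
// Rough sketch: copy messages from Thunderbird's pseudo-maildir (one
// message per file under cur/) into a real maildir with proper
// "<unique-name>:2,<flags>" filenames. Unverified; paths are placeholders.
import { mkdirSync, readdirSync, copyFileSync } from "node:fs";
import { join } from "node:path";
import { hostname } from "node:os";

const src = "/path/to/thunderbird/profile/Mail/Local Folders/INBOX/cur"; // placeholder
const dst = "/path/to/real/Maildir";                                     // placeholder

// A real maildir needs the cur/new/tmp trio.
for (const dir of ["cur", "new", "tmp"]) {
  mkdirSync(join(dst, dir), { recursive: true });
}

let seq = 0;
for (const file of readdirSync(src)) {
  // Maildir unique names are conventionally <epoch>.<counter>.<host>;
  // ":2,S" marks every message as seen, losing Thunderbird's flag state.
  const unique = `${Math.floor(Date.now() / 1000)}.${seq++}.${hostname()}`;
  copyFileSync(join(src, file), join(dst, "cur", `${unique}:2,S`));
}
```

Whether that covers every corner case (compacted folders, detached attachments, and so on) is something I'd verify on a copy of the data before deleting anything.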
Note on Firefox: I don't care what your benchmark says; in everyday use Firefox is annoyingly slow despite all its wRiTtEn In RuSt components. For reasons I didn't care to investigate, it also seemingly hangs and waits for connections made by its extensions (e.g. password managers), meanwhile preventing websites from loading. The amount of obscure problems I've encountered with Firefox over the past years is unmatched by any other browser. Not to mention the effort that goes into checking the configuration editor with every new release and disabling all the privacy-invasive bs that Mozilla keeps adding. At this point I'm not supporting Firefox any longer, despite the world's need for a Chromium alternative. Firefox is sucking the air out of the room, and with it dead, hopefully more effort will be put into alternatives.

I had to bring my Anker A1257 power bank to a "recycling" facility, due to it being recalled by Anker. There's an interesting post by lumafield if you want to know the details. However, what Anker calls a recall is effectively a throw it away and we give you a voucher, because apparently we're too stupid as societies to demand that manufacturers take back their broken junk and recycle it properly. I tried to be better by not tossing the device into the trash but bringing it to a dedicated "recycling" facility, even though I know for sure that they won't actually recycle it or even dispose of it in a proper way. But that's pretty much all I, as a consumer, can do in this case.

While I, too, got a lousy voucher from Anker, none of their current options fit the specifications of the A1257. I therefore decided to look for alternatives and found the Sharge Pouch Mini P2. I needed something that is lightweight, has a relatively small form factor, and doesn't rely on a single integrated charging cable, which would render the device useless the moment it breaks. Given how bad Anker USB-C cables usually are in terms of longevity, I would never buy a power bank from Anker that comes with an integrated USB cable, especially when it's the only way to charge the power bank. While the Sharge also has a fixed USB cable, it is nevertheless possible to use the USB-C port for re-charging the device. If the integrated red cable ever breaks, I can still continue using it. As I have zero experience with this manufacturer, it remains to be seen how this 213 g power bank will perform long-term.

So far the power bank appears sufficient. It barely gets warm while charging, and even though the device lacks a display for indicating the charge level, the LED ring around the power button is sliced into four segments that make it easy to guesstimate the remaining charge. A full charge takes around an hour. One thing that is slightly annoying is the USB-C port, which won't fit significantly thicker cable heads. The widest I could fit were my Cable Matters USB4 cables.

The situation with GrapheneOS devices (and Android in general) mentioned in my previous updates has prompted me to revive my dormant Pinephone Pro. Trying to do so, however, I found that the original Pinephone battery was pretty much dead. Hence, I ordered a new battery that is compatible with the Samsung Galaxy J7 (models / ) – primarily because Pine64 doesn't appear to be selling batteries for the Pinephone Pro anymore (update: Pine64 has since officially discontinued the Pinephone Pro) – and gave the latest version of postmarketOS (with KDE Plasma Mobile) a go.

While Pinephone Pro support has become better over the years, with at least the selfie camera finally "working", the Pinephone hardware unfortunately remains a dead end. Even with a new battery, the phone discharges within a few hours (with sleep enabled). In fact, it even discharges over night when turned off completely. I don't know whether newer revisions of the Pine64 hardware have fixed the hardware bugs, but judging by the search results I'm getting, I doubt it. The UI has certainly become more usable, with hardware acceleration seemingly working fine now, but the Pinephone is still too weak for most use cases.

Retrospectively, the Pinephone Pro was a bad investment, as it's effectively a wire-bound device with an integrated UPS at most, and I would argue it isn't even suitable as a development platform, with all its hardware glitches (hello 0% battery boot loop, just to name one). It is in fact so bad that you cannot even boot the device without a battery in it, to use it as a regular SBC with an integrated display. This is sad, because the Pinephone hardware tarnishes the reputation of Linux on mobile, given that it is one of its most prominent options. If you're considering giving Linux on mobile a try, I do not recommend the Pinephone, and I am somewhat happy that Pine64 decided to discontinue it. They have not discontinued the original Pinephone yet, however.

That said, I have been following the progress Luca Weiss (Z3ntu) has made with running pmOS on the Fairphone 6, and I have to admit that I'm intrigued.
While there's still a long way to go, it is nice to see a Fairphone engineer actively working on bringing mobile Linux to the device. I don't know whether his efforts are partially funded by his employer or whether it's a personal interest, but I truly hope for the former. The Fairphone is an odd value proposition for the average Android user: The native Fairphone Android experience seems average, and their Murena /e/OS partnership is highly questionable at best and might tarnish their reputation in the long run. However, I feel like they could gain a pretty large nerd following by officially supporting mobile Linux and actively pushing for it. At least in my book, having full-fledged postmarketOS support on their phones would be an instant money-magnet for the tech sphere, especially with the current bar being as low as the Pinephone. I will keep an eye on the progress, because I would be more than happy to give it a go once UFS support, 3D acceleration, and WiFi connectivity issues are fixed.

Alternatively, it appears that the OnePlus 6T is among the best-supported postmarketOS devices at this point, and from several videos I came across on YouTube, its performance seems significantly better than the Pinephone's. However, a 7-year-old phone's battery is probably cooked, and replacing it requires removing the glued-on back cover. At an average price (on eBay) of around $100, plus another $30 for a replacement battery, the phone is not a particularly attractive option from a hardware standpoint.

I invested quite some time in my open source projects over the past quarter, hence there are a few updates to share.

With 📨🚕 going live, Overpush has received a lot of updates over the past months, most of which are as beneficial for self-hosted versions as they are for the hosted service. You can find an overview of the changes on the releases page.

zpoweralertd 0.0.2 was released with compatibility for Zig 0.15.1. Apart from the adjustments needed to compile with the latest Zig release, no new features were introduced.

Nearly five years after its initial release, zeit has weathered the test of time (hah) relatively well and continues to grow in popularity on GitHub. What started as a minimal command-line time-tracking utility has evolved into a program packed with a wide range of features and options; depending on your preferences, you might say it has one too many these days.

zeit began as a personal pet project, with no clear long-term plan. Whenever users requested a new feature or option, I either implemented it myself or accepted their pull requests without much second thought. My mantra was simple: If a small enhancement made the software more useful to even one other person, I was happy to introduce it. Fast-forward to today, and the very first version of zeit (dubbed zeit v0) has strayed far from its roots as a minimal and clean command-line tool. Instead, it has grown into a somewhat unwieldy user experience, cluttered with features that are neither intuitive nor well thought out. From a code perspective, some of the decisions that made sense a few years ago now seem less ideal, particularly looking ahead. While I could have sifted through the original v0 codebase to clean it up and remove features that were added by contributors who ultimately didn't maintain them long-term, I chose instead to rewrite zeit from the ground up.
This new version will be based on more modern dependencies and, hopefully, will be cleaner, more streamlined, and free of the "one-off" features that were added for single users who eventually stopped using zeit altogether. That said, I've learned a lot from the feature requests submitted over the past five years. With this new version, I'm working to implement the most useful and practical requests in a way that feels more cohesive and polished from a UX perspective, and less like an afterthought.

I'm nearing the release of the first version of this complete rewrite, which will be called zeit v1 and carry the version number v1.0.0. This new version will not be compatible with your existing zeit v0 database. However, if you're currently using zeit v0, you can export your entries using , and then import them into v1 with the new command.

If you're interested in a command-line utility for time tracking, especially if you're already using a different tracker, I'd love to hear from you. Let me know your top three feature requests for a tool like zeit and which platform(s) you currently use or would like to switch from.

Footnote: The artwork was generated using AI and further botched by me using the greatest image manipulation program.
