Posts in Hardware (20 found)
Daniel Mangum Yesterday

Interesting SPI Routing with iCE40 FPGAs

A few weeks ago I posted about how much fun I was having with the Fomu FPGA development board while travelling. This project from Tim ‘mithro’ Ansell and Sean ‘xobs’ Cross is not new, but remains a favorite of mine because of how portable it is — the entire board fits inside your USB port! The Fomu includes a Lattice Semiconductor iCE40 UltraPlus 5K, which has been a popular FPGA option over the past few years due to its reverse-engineered bitstream format and the ability to program it with a fully open source toolchain (see updated repository here).

0 views
Brain Baking 2 days ago

The 1994 IBM PC Invoice

In 1994, my late father-in-law bought a new computer. That then brand-new, sparkling piece of hardware is now my 31-year-old 80486 retro PC. When he gifted it to me in 2020, he also handed over the original invoice, as if the warranty were still valid. Then again, who saves a twenty-something-year-old piece of paper that became obsolete after two years? I’m glad that he did, otherwise I wouldn’t be able to write this. Below is the scanned version of the invoice printed out by Veldeman Office Supplies in Hasselt. According to the KBO public search, the company went bankrupt in 2013 after 28 years of faithful service, even though their head offices moved a couple of times. My father got his original 486 somewhere in Brussels, and after that, I remember we always went to Bells Computercenter in Diest, a specialized hardware store that still exists today! When the first Voodoo cards dropped, Bells was the place we ran to. It was that kind of place with the cool-looking big Creative sound card boxes in the front windows to attract attention. It seems like a strange choice to buy a PC at Veldeman, a store that mostly sells general office supplies. The invoice details the exact purchase. I received the computer with a different amount of RAM installed than the invoice lists, but perhaps my father-in-law upgraded it later in the nineties. See my Reviving a 80486 post for photos: the CPU was stamped with an early version of the Microsoft Windows logo, and below it, it proudly states “MICROSOFT WINDOWS COMPATIBLE”. That must have been the main reason for the purchase, as my father-in-law mainly used it in conjunction with Windows 3.x spreadsheet tooling for keeping track of expenses and general calculations as part of his job as a mechanical engineer. Buying a new PC in 1994—on the 16th of May, to be more precise—turned out to be a very risky business. In the nineties, technology moved at a dizzying speed. Windows 95 was just around the corner, Intel’s Pentium became more and more affordable, the AT system got replaced by ATX, the motherboard layout changed, AGP got introduced pushing VLB into obscurity, … In less than a year, the above purchase would become obsolete. That’s quite painful for such a hefty price. The invoice totalled a hefty sum 1 . Taking inflation into account, the 2025 equivalent is more expensive than the most beefed-out 15" MacBook Air you can get right now boasting the M4 CPU technology with 10 cores, 24 GB of RAM, and 512 GB SSD storage. That MacBook will stay relevant for more than six years—my last one managed to keep it together for eight, and the one I’m typing this on is almost six years old. The 486DX Mini Tower sold by Veldeman lasted less than a year. To be fair, it wasn’t exactly the most performant machine you could get your hands on in 1994. It didn’t even properly run 1993’s DOOM: you’d need more raw CPU power (and preferably more RAM) to push beyond ten to fifteen frames per second. But if that PC already cost that much in current EURs, you can imagine that a true high-end machine was reserved for the wealthy. According to DOS Days UK, in 1994, a mid-range PC typically came with a DX2-66 with more RAM, so technically speaking, this invoice here is for a low-end PC… As a result, my father-in-law faithfully clung on to Windows 3.1(1) while others moved on to Windows 95.
My wife recalls they didn’t buy a new one (or upgrade the existing one, besides the RAM) for quite a few years, while my father bought a new machine in early 1996 that was capable of rendering Quake. Keen observers will notice that the Veldeman PC Mini Tower did not come with a sound card. Popular Creative Sound Blaster cards were sold in big bright boxes for hefty prices even without adjusting for inflation: needless to say, the good ones were crazy expensive. Nowadays, people don’t even care any more, and the built-in sound chip that comes with the motherboard is usually good enough. It’s remarkably difficult to get hold of historical price data on 1994 PC hardware. The Computer Paper Vol. 7 No. 7, from the archives, contains an interesting “Grand Opening” advertisement from 3A COMPUTER WAREHOUSE in Markham, Ontario, Canada, listing similar hardware: An excerpt from computer hardware ads. Copyright The Computer Paper magazine publisher. A “basic” OEM Sound Blaster alone would have set you back a considerable amount, and even more in 2025 money. Note that only the PCS 486DX Multimedia CD on the bottom left comes with what seems to be a generic “sound card”. IBM PCs simply didn’t come equipped with decent sound capabilities: many of us Apogee game fans have the iconic speaker sounds permanently burned into our brains. The IBM PC advertised at the top left most closely matches the hardware from my invoice and came in at a lower price, even converted to 2025 money. That’s quite a bit less, but hardware was (and is) more expensive in Europe, and I’m probably comparing apples with oranges here. Besides, the Canadian ad didn’t state it comes with a free mouse mat! Other magazines closer to home are MSX Computer Magazine (no ads containing prices), Computer! Totaal (vol. 3 is from 1994 but I can’t find a scanned version), and the one I remember my grandfather buying, PC-Active. Unfortunately, my parents threw out all copies when cleaning out their old house years ago. I’ll try to be on the lookout for copies, or might pay a visit to the Dutch Home Computer Museum, which also collects old computer magazines. Luckily, my Dutch retro blogging liaison Diederick de Vries managed to procure the following scan of PC-Active issue 49 from May 1993 containing ads of 486 PCs: AMBRA PERSONAL COMPUTERS: gun je verstand de vrijheid (give your mind freedom). Copyright the PC-Active magazine publisher. The mid-range PC advertised is a 486 SX (25 MHz, 100 MB disk, 4 MB RAM), while the high-end one, decked out with a 486 DX2 (66 MHz, 200 MB disk, 4 MB RAM), was for sale for a staggering amount—wowza in today’s money. Can you imagine spending that much on a computer? Of course, in 1993, the DX2 was brand new and within a year it became much more affordable. And in another year it was rendered irrelevant by the Pentium… In a way, I consider myself lucky to have grown up in that golden age of molten silicon. Hopefully today’s Ryzen CPUs will be remembered as fondly by my kids as I remember the 486 and early Pentium/Celeron/Athlon era. I highly doubt it. In case you hadn’t noticed, we sensible Belgians use the period as the thousand separator and the comma as a, well, decimal comma.  ↩︎

0 views
Jeff Geerling 3 days ago

It's not that hard to stop a Trane

Six years ago, I replaced the old HVAC system that came with our house, a central forced air system installed in 1995. The new system is a Trane XR AC paired with an S9V2 96% efficiency forced-air gas furnace. And it ran great! Better efficiency, quieter, multiple fan speeds so I can circulate air and prevent stale air in some parts of the house... what's not to love? Well, apparently the engineering:

0 views

High-Performance Query Processing with NVMe Arrays: Spilling without Killing Performance

High-Performance Query Processing with NVMe Arrays: Spilling without Killing Performance Maximilian Kuschewski, Jana Giceva, Thomas Neumann, and Viktor Leis SIGMOD'25 In database vernacular, spilling is the process of writing intermediate data to disk in order to evaluate a query with a finite amount of main memory. It goes without saying that database folks are control freaks (read: performance conscious) and don’t want to rely on generic OS paging mechanisms to handle working sets which are larger than main memory. This tidbit about Snowflake is fascinating: Only 5% of analytical queries in Snowflake’s 2018 workload trace spill data, but those 5% contribute 45% of the overall CPU time and 29% of the total execution time [77]. One fundamental problem in this area is that it is very hard to predict up-front exactly how much working memory will be needed to efficiently execute a query. Databases have to estimate based on statistics of the relations used in a query, or assume no spilling will be needed, and then gracefully fall back to spilling if necessary. The two operators which this paper deals with are joins and aggregations. Both involve key columns, and the cardinality of the key columns is critical in determining if spilling is necessary. One obvious spilling mechanism is to use a partitioning approach for joins and aggregations. I’ve described partitioned joins in summaries of these papers: Efficiently Processing Joins and Grouped Aggregations on GPUs. The reason why partitioning nicely solves the problem is that the working set requirements for both steps of a partitioned join are modest. The partitioning step only requires a small amount of memory per partition (e.g., accumulate 64KiB per partition before appending partitioned tuples to on-disk storage). The join step only needs enough working memory to join a single partition. Section 4.1 of the paper claims that partitioning slows down TPC-H queries by 2-5x. My spider sense is tingling, but let’s take this as an axiom for now. Here is the premise of the paper: partitioning is better for queries that must spill, but worse for queries that can be completed without spilling. What is an efficient and simple design given that a database cannot perfectly predict up-front if it will need to spill? Prior work along similar lines introduced the hybrid hash join. A hybrid hash join partitions the left (build) input of the join and dynamically decides what percentage of build partitions must be spilled. A hash table is built containing all non-spilled build partitions. Next, the right (probe) input to the join is processed. For each probe tuple, the database determines if the associated partition was spilled. If the partition was spilled, then the probe tuple is spilled. If not, then the probe tuple is processed immediately via a lookup in the hash table. Finally, all spilled partitions are processed. The downside of this approach is that it always partitions the build side, even when that is unnecessary. This paper proposes a join implementation that only pays the cost of partitioning when spilling is required. It is a two-phase process. In the materialization phase, the build table is scanned (and pushed-down filters are applied). The resulting tuples are stored in a list of pages. At first, the system optimistically assumes that no spilling is necessary and appends each tuple to the current page. If a memory limit is reached, then partitioning is enabled.
Each tuple processed after that point is partitioned, and per-partition lists of pages are allocated. If a further memory limit is reached, then some partitions are spilled to disk. Next, the build and probe phase executes in a manner similar to hybrid hash join. However, there is a fast path for the case where no spilling occurred. In this case, the tuples produced by the materialization phase are inserted into one large hash table, and then the probe tuples are streamed, with one hash table lookup per tuple. If partitioning (but no spilling) occurred, then the hash table inserts will have high locality (assuming a chained hash table). If spilling did occur, then the build and probe phase operates like a hybrid hash join, spilling probe tuples if and only if the associated build partition was spilled. The paper isn’t clear on what happens to non-partitioned build tuples once the system decides to start spilling build partitions. My assumption is that in this case, probe tuples must probe both the spilled build tuples and these build tuples that were processed before partitioning was enabled. The implementation of aggregation described in this paper follows a similar 2-phase approach. The materialization phase performs pre-aggregation: the system starts by aggregating into an in-memory hash table. If the hash table grows too large, then partitioning kicks in. Tuples from in-memory hash tables are evicted into per-partition page lists, and subsequent tuples are directly stored in these per-partition page lists. These per-partition pages can be spilled if further memory limits are reached. The subsequent phase then performs any necessary aggregation. If no eviction occurred, then this phase has no work to do. The paper describes an adaptive compression algorithm that improves spilling performance. Some interesting numbers from the paper: The number of CPU cycles per input byte for executing in-memory TPC-H queries ranges from 3.3 to 60.3. The number of CPU cycles required to write and read a byte to/from SSD is 11.1. The adaptive nature of this scheme is driven by the fact that queries are diverse, and compressors have knobs which trade off speed for compression ratio, as illustrated by Fig. 3: Source: https://dl.acm.org/doi/10.1145/3698813 When spilling occurs, the system dynamically balances CPU usage and IO bandwidth by adjusting these knobs. Fig. 5 shows in-memory throughput while Fig. 6 shows throughput when spilling occurs: Source: https://dl.acm.org/doi/10.1145/3698813 Dangling Pointers The design of the unified hash join hinges on the fact that partitioning is a bad idea for the in-memory case. This is in contrast to papers describing in-memory partitioning join implementations on other types of chips, like SPID-Join and Efficiently Processing Joins and Grouped Aggregations on GPUs. I imagine there is a lot of literature about this topic that I haven’t read. Leave a comment with your experience.
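To make the two-phase idea concrete, here is a minimal Python sketch of the optimistic materialization phase as I read it from the paper's description; the thresholds, partition count, and page handling are illustrative stand-ins of my own, not the authors' implementation:

```python
# Sketch: materialize build tuples optimistically, switch to partitioning
# when a memory limit is hit, and spill whole partitions under further
# pressure. All limits here are placeholder tuple counts, not byte budgets.

NUM_PARTITIONS = 64
PARTITION_LIMIT = 1_000   # tuples held before partitioning starts
SPILL_LIMIT = 2_000       # tuples held before partitions get spilled

def materialize_build(build_tuples, key):
    unpartitioned = []                   # tuples stored before partitioning began
    partitions = [[] for _ in range(NUM_PARTITIONS)]
    spilled = [False] * NUM_PARTITIONS   # stand-in for per-partition disk files
    partitioning = False
    in_memory = 0

    for t in build_tuples:
        if not partitioning and in_memory >= PARTITION_LIMIT:
            partitioning = True          # optimism failed; partition from now on
        if not partitioning:
            unpartitioned.append(t)
            in_memory += 1
        else:
            p = hash(key(t)) % NUM_PARTITIONS
            partitions[p].append(t)      # a real system appends to e.g. 64KiB pages
            if not spilled[p]:
                in_memory += 1
                if in_memory >= SPILL_LIMIT:
                    spilled[p] = True    # pretend partition p moved to disk
                    in_memory -= len(partitions[p])
    return unpartitioned, partitions, spilled

rows = [(i % 500, f"payload-{i}") for i in range(3_000)]
unpart, parts, spill_flags = materialize_build(rows, key=lambda r: r[0])
```

If nothing was spilled, the build and probe phase can dump everything into one big hash table (the fast path); otherwise it degrades into the hybrid-hash-join-style per-partition processing described above.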

0 views
Chris Coyier 4 days ago

ToiletTree Fogless Shower Mirror

I know this is a weird product recommendation, but I’ve just thought about it too long and it needs to come out. I’ve used the ToiletTree Fogless Shower Mirror for like 15 years at least. See you’ve got this problem with shower mirrors where they like instantly fog up with the steam. This mirror solves the problem with science. It’s got a narrow cavity behind the mirror you fill with hot water, and the hotness makes the mirror not fog up at all. It’s either that or magic that takes hot water as magic fuel. The design of it makes it easy to slide off the mirror and fill up the cavity. But then maybe it’s already fogged up or is covered in little water dots. So it comes with a squeegee to clean it off after you’ve filled the cavity. The squeegee is a weird touch, but it never seems to get stiff or turn to crap, so I have some affinity for the little guy. The installation is also great. The original design I linked to above comes with an adhesive gel that you squirt onto the mirror and stick it into place. Sticks to anything. It’s not so strong that it’s hard to get off or damages the surface; you just pull decently hard and it pops right off (like when you move or whatever). It leaves no residue. I’m all hyped on this damn mirror again because I’ve just needed a new one for a new shower and noticed they have a newer nicer model. Apparently it’s the Deluxe Fogless Shower Mirror. I liked the old model so much I barely considered how it could be better, but this one is better in all ways! I just love that they cared enough to take a great product already and make it a ton better. It’s 10 bucks more than the original model, but they still sell the original, so it’s a choice, not a squeeze. The mirror is a bit bigger and portrait shaped. About the size of, ya know, your face. The adhesive gel is now an adhesive strip, which is just a little easier to work with. The cavity to fill is thinner, so it fills up faster, despite the mirror being bigger. The mirror adjusts on a ball joint making it easy to move to any angle. It has a hanger for multiple razors, instead of just a shelf for one.

0 views
The Tymscar Blog 1 week ago

From 400 Mbps to 1.7 Gbps: A WiFi 7 Debugging Journey

I recently upgraded from a UniFi Dream Machine to a UniFi Dream Router 7 because I’m getting 2.5 Gbps internet in two weeks and figured I’d jump on the WiFi 7 bandwagon while I’m at it. My iPhone 17 Pro Max supports it, so why not? After setting everything up, I was getting nowhere near the speeds I expected. Time for some debugging. My wired connection was pulling 950 Mbps through a 1 Gbps switch, and iperf3 directly to the UDR7’s 2.5 Gbps port showed around 2.3 Gbps. The backbone was solid. But on WiFi 7 (6 GHz, 160 MHz width), standing literally a foot from the router, I was getting around 400 Mbps with iperf3. With 10 concurrent streams it went up to 650 Mbps, but that’s still pathetic.

0 views
Jeff Geerling 1 week ago

The Arduino Uno Q is a weird hybrid SBC

The Arduino Uno Q is... a weird board. It's the first product born out of Qualcomm's buyout of Arduino. It's like if you married an Intel CPU and a Raspberry Pi RP2040 microcontroller—oh wait, Radxa's X4 did that. Arduino even tried it before with their old Yún board, which had Linux running on a MIPS CPU, married to an ATmega microcontroller.

0 views
Stone Tools 1 week ago

CAD-3D on the Atari ST

There are wizards among us who can bend hardware like a Uri Geller spoon to perform tricks thought impossible. Bill Budge springs to mind, with Steve Wozniak calling Pinball Construction Set the "greatest program ever written for an 8-bit machine." A pinball physics simulation, table builder, paint program, software distribution system, and more, driven by one of the first point-and-click GUIs, all in 48K on an Apple 2. Likewise, Bill Atkinson seemed able to produce literal magic on the Macintosh's 68000 processor. Even when he felt a task was impossible, when pushed he'd regroup, rethink, and come back with an elegant solution. QuickDraw and HyperCard are legendary, not just in what they could do, but in how they did it. Meanwhile, over on the Atari ST, Tom Hudson was producing a steady string of minor miracles. With CAD-3D, he pushed the machine beyond what many thought possible, while also creating something that had its users drooling at the prospect of advanced systems yet to come. For the most part, the ST crowd had to wait essentially forever for machines that were up to the mathematical task. Hudson, frustrated with Atari's broken promises, and anxious to continue pushing the limits of 3D modeling and rendering, defected to DOS. Atari would die, but Hudson's work would live and grow. You know it today as 3ds Max. Let's see how it started. This quite literally marks my first time using an Atari ST GEM environment and software. I don't anticipate any serious transition pains coming from an Amiga/Mac background. The desktop has a cursor and a trashcan; I should be fine. My first 3D software experience was generating magazine illustrations in Infini-D on the Macintosh around 1996. Since then, form-Z, Poser, Bryce, Strata Studio Pro, and Cinema 4D came and went; these days it's just Blender. It's fair to say I have a healthier-than-average amount of experience going into CAD-3D. I found two tutorials worth looking into. The first is in the manual; always a good starting point. The second is a mini-tutorial which ran in Atari ST Review, issue 8, December 1992. The cover disk, a literal 3.5" floppy disk glued to the cover, included CAD-3D 1.0 and 2.0 on it. As far as I can tell, it was the full featured software, not stripped-down "demos." A special offer in the magazine gave readers a discount on ordering the manual for £34 (about $50 US, $115 adjusted for 2025). Those suckers could have saved a bunch of money if they'd waited 30 years like I did. The Atari ST emulator STEEM comes bundled with a free operating system ROM replacement called EmuTOS. At first, it seems like a nice alternative to GEM/TOS until I try to run CAD-3D. I have obtained a TOS which works. I am accepting NO follow-up questions. Upon boot, I get a desktop icon of a filing cabinet (?) for floppy disk B (??) even though I don't have a disk in that drive (???). Do I want a "blitter?" CAD-3D wants "medium resolution" (640x200), which forces GEM to draw icons and cursors at half-width, giving the interface an elongated look. Click-and-drag operations in GEM need a half-second pause to "grab" a tool, lest the cursor slide off the intended drag target. Coming from the Mac and Amiga, I get distinct parallel universe vibes. The aspect of GEM driving me most crazy is the menu bar. It is summoned by mere cursor proximity, no click required, but requires a click outside the menu to dismiss.
Inadvertent menus which cover the tool I want to interact with require an extra dismissal step so I can continue with the interface element it obscured. It's maddening. Launching the app is relatively quick and I can kick the emulator into overdrive, which exhibits rreeppeeaattiinngg kkeeyyss. But this will be necessary from time to time, as it would for any older system trying to do the math this program demands. Once launched, I'm hit with a pretty intense set of tools and windows. The left 1/3 of the screen is a fixed set of iconography for the tools. These were hidden away under menus in v1.0, but now they're front and left-of-center. The four-way split view on the right 2/3 of the screen is par for the modeling course. What's different here is the Camera view is "look, but don't touch." I spent a lot of time trying to move things around in there before I remembered "RTFM". There was a moment early on with Electric Pencil when I felt attuned to its way of thinking, summoning menu commands without reading the manual. Deluxe Paint was the same, the tools doing intuitively what I expected. I really enjoy such moments when an affinity between the interface metaphor and my exploration is rewarded. CAD-3D is resisting this, preferring to remain inscrutable for now. Screenshots in the manual contain a lot more fine detail than I see while using the program. In the GEM desktop preferences there was a greyed out "high resolution" option, unlocked by setting the emulator itself to a beefier 4MB MegaSTE. This hardware upgrade brings the display up to Macintosh-style high quality B&W, which feels very nice to use, but for this I want color so back to medium-res for me. While the top/right/front views function similarly to modern modelers, these have one notable difference. They are not "cameras" into those views, they are showing you the full, literal totality of a cube which contains your entire 3D world. Need more space? Make your objects very small. Objects are too small to see? Well, that's just how we do things here in 1986. It feels claustrophobic, but is also a simple mental model for comprehending "the universe" of your scene. What I'm quickly learning is how the program is configured to conserve processing time and memory at all times. Changing a value, say camera zoom, doesn't change anything until you "apply" it. Shades of VisiCalc's "recalculate" function. The low memory inherent to the hardware is subverted by a "split the advanced functions out into separate products" approach. Complex extrusions are relegated to Cyber Sculpt. Model texturing is available in Cyber Texture. Advanced compositing can be done in Cyber Paint. Memory budget is accessed by the big button just left of the center of the screen. This will show how close you are to maximizing the vertex/face count of the scene. Again, shades of VisiCalc's free-memory counter ticking down to zero as you work. Adjusting objects, by scale or position, requires selecting them. Despite the mouse and pointer GUI interface, there is no direct manipulation of objects to be found here. If you enjoy grabbing handles on objects and dragging them into scale and position, you're going to have some hard habits to break. Object selection is convoluted; the basic tenet is "visible = selected." The "Objects" modal window provides a button for every object. Each button toggles its associated object's visibility after you click "OK." Selections here will modify the currently active group, designated by the tools mid-screen.
Scaling sliders (left side of screen) affect every visible object along the active view's axes. So horizontal/vertical in "Top" view scales different axes than "Front" view. Per-object scaling is possible, but only if you deselect all but one object. Switch from "Scale" to "Rotate" and the scaling sliders switch their function accordingly. Selections can be rotated around one of three pivot points: view center, selection center, or an arbitrary point within a given view. Sliders control horizontal and vertical rotation, but the third axis can also be rotated upon, though for some reason it was decided this should be via a pop-up window pie chart. The ST in medium resolution can display up to 16 colors. Applying a color to an object is setting the "brightest" color for rendering that object. Any other colors within that color's (user-definable) group will fill in the mid and darker tones. Simple palette gradients make for "realistic" lighting, but bright orange highlights with fluorescent green shading is also possible, if you want to fight the power. In the toolbox, at the bottom, is an unassuming button. This is a boolean tool, presented as a kind of equation builder called "Object Join Control." Choose an action: Add, Subtract, And, or Stamp. Select the first object, then the second object, and name the resulting object. The dialog spells out the resulting equation for each action, "Add" reading one way, "Subtract" another, and so on. With this tool we have what we need to sculpt complex, asymmetric figures. To paraphrase Michelangelo, the shape we want is already inside the primitive volume. We just have to remove the superfluous parts and set it free. If only I were so talented. Like sculptors of yore, setting shapes free can take a lot of time. Click "OK" on a join operation and prepare to wait, depending on the number of faces involved. I put down a default sphere "3" and a default torus, shrank the torus a bit to intersect with the sphere's circumference, and requested a subtraction. Now I've done it. Even at top emulator speed in overdrive I've been waiting well over 20 minutes for the subtraction to complete. Did I kill it? Is this like in The Prisoner when #6 made the computer self-destruct? Despite the computation time, there are very good reasons for performing booleans beyond the sculpting power they impart. Earlier I noted that there is a vertex/face count limit on the scene. Intersecting objects added together can reduce their vertex count by eliminating interior and unused faces. It also turns out there is a 40 object limit to our scenes. Adding two objects together reduces them to one, deleting the originals. Understand that there is no undo function. Only a rigorous discipline of saving before performing such destructive functions will save you from yourself if you needed the original objects. This will become the mantra for this blog, "Save often, kids." Boolean functions respect object colors, which makes for neat effects. Subtract a yellow sphere from a purple cube and the scoop taken out of the cube will be yellow while the rest of the cube stays purple. A pleasant surprise! The "Stamp" option tattoos the target surface with the shape and color of the second object. Stencil text onto surfaces (provided you have 3D text objects), add decorative filigree, generate multi-color objects, and so on. It kind of depends on how well you can master the stiff, limited extrude tools to generate surfaces worth stamping.
OK, this torus/sphere boolean operation still isn't done, so I'm chalking it up as, "This is a thing CAD-3D cannot do." While waiting for the numbers to crunch, I realized I could create the intended shape manually with a custom lathe. Only while experiencing the computational friction did the second method occur to me. That reminds me of something I've been thinking about since starting this site. Working with retro-computing means choosing to accept the boundaries of a given machine or piece of software. Working within these boundaries takes real, conscious effort; nothing comes easy. Meanwhile, technology in 2025 is designed to make the journey from "I want something" to "I have something" instantaneous and frictionless. It is a monkey's paw fulfilling wishes, and like a monkey's paw, it can go wrong. Not just "turkey's a little dry" wrong, but "it obscures objective truths" wrong. The first is seen with (say it with me now) the spread of AI into everything. You want something? Prompt and get it. Our every whim or half-considered idea must be rewarded, nay PRAISED! We needn't even prompt; services will prompt on our behalf. Every search delivers what you asked for, even if it delivers lies to do so. There are plenty of others more qualified to discuss the ramifications of AI on the arts, society, and our very minds. For this article I want to use it to illustrate what much of tech has become: an unchallenging baby's toy. A pacifier. Another way "friction" is stigmatized as detrimental is, admittedly, a personal bias, but I know I'm not alone. UI density is typically considered "friction" and it's "bad" because a user may disengage from a piece of software. To keep engagement up, interfaces simplify, slowly conflating "user-friendly" with "childlike." The net result is a trend toward UIs with scant few pieces of real information distributed over vast plains of pure white or oversaturated swoops of color. UI/UX professionals like to call it "playful" or "delightful." I don't want to come off as a killjoy against "fun" user interfaces, but I'm an adult. I eat vegetables as well as candy. Where are the vegetables in the modern tech landscape? Where is the roughage which requires me to chew on its ideas? The industry wants to eliminate friction, but without friction there can be no spark. "Spark" is what I felt struggling against a hyper-strict budget during my publishing days. I found it when examining the depth of Deluxe Paint in the animation controls. It is what I felt when I overcame the Y2K bug in Superbase. I felt it again just now as I realized the lathe solution while waiting for the boolean to finish. Each little struggle forced me to shift my frame of mind, which revealed new opportunities. If my very first thought is brought to life instantly, with no artistic struggle (one hour of prompting is not a struggle), then why ever "waste time" thinking of second options? Or alternate directions? Even, heaven forbid, throwing ideas away? These common creative pathways are discouraged in a modern computing landscape. Put another way, I can't think of a time when my first idea was my best idea. Given the protective bubble-wrap our software tends to wrap itself in, perhaps it will not surprise you that computer literacy and computational thinking scores have dropped in the US over the past five years. Some readers may be thinking, "In the Deluxe Paint article you picked on Adobe for airplane cockpit UIs. Isn't that the "friction" you're describing?"
That is complexity, which can cause a type of friction, true. But it is the friction of a rug-burn, not a spark. Back in the program: I've only touched on it so far, but that "Superview" button is far more important than its obfuscated name suggests. That is the renderer. Double-click it for options, like rendering style and quality, including stereoscopic if you're lucky enough to own the Stereotek 3D glasses. Images, for example previous renders, can be brought in as a background for a new render. All drawing is restricted to the same 16-color palette though, so plan accordingly. The basic wireframe renderer is quite interesting because it provides real-time interactive view manipulation, like a video game. That makes sense, because the algorithm that drives this view came from a video game. It was purchased by Hudson from Jez San, the creator of the real-time 3D game Starglider. Even if you never touched Starglider, you know San's work today. He was a developer of the Super FX chip for Nintendo, which made Starfox on the SNES possible. CAD-3D has one more significant trick up its sleeve. Using a simple, clever method of XOR compression, animations can be generated. Turn "ON" animation with the clapboard icon. Set up the scene and camera position for a frame. Capture a render with "Superview." Commit that frame to the sequence with the frame counter icon. Repeat until you're done. It's stop-motion animation, essentially. This is time-consuming and requires some blind trust, as there is no previewing your work. Luckily, a more elegant, and far more complex, option comes bundled on disk in the form of an entire movie-making scripting language. I tried to understand it, but my utter lack of ability to make movies was exposed. I wanted to at least try the on-disk animation tutorial. Unfortunately, the program which can play back the XOR compression method is nowhere to be found. No disk image I could find contained it. Regardless, the scope of what was being attempted with this product, and really the entire suite, is clear. ANTIC Software wanted your Atari ST to be nothing short of a full movie-production studio. If there were some way to calculate price per unit of coolness, an ST paired with CAD-3D may be quite high up that chart. So, could CAD-3D serve in a modern workflow? Put simply, no. On its own CAD-3D lacks modeling and rendering tools which many would consider absolute basics in a modern workflow. Lighting control is restrictive, there are no cast shadows, no dithering to make up for the limited rendering palette, extrusion along splines isn't possible, views into the world are rigid and hard to work in, and basic object selection requires a clunky series of menus and button presses. A theoretical v3.0 could have been amazing. But I must concede a few points here. First, this is really just one part of a larger package. It's the main part, but not the only part. The bundled scripting was expanded into Cyber Control and the bundled "super extruder" was expanded into Cyber Sculpt, for example. There was a VCR controller, genlock support, stereoscopic 3D glasses support, multiple paint programs, a sound controller, and more. Certain deficiencies are more than adequately compensated for if we take the full suite into account. Second, there's something to be said about the simple aesthetic of CAD-3D. There is absolutely a subset of people out there who just want to play with 3D like a toy, not a career.
I think the success of PicoCAD speaks to this; just look at the fun things people are creating in a 128x128, 16-color modeler in 2025. Third, working within limits is, paradoxically (and also well-acknowledged), creatively freeing. The human need to bend our tools beyond their design is a powerful force. We see it when a working Atari 2600 is built within Minecraft. We see it in the PicoCAD projects I linked to above. Full-screen, full-motion video on a TRS-80 Model III? Hey, why not? In that sense, I can feel a pull toward CAD-3D. I managed to model a none-too-shabby CX40 joystick, and I catch myself wondering now what more I could do. I started feeling a groove with its tools, so how far could I push it? How far could I push myself? I hope you'll understand my positive take on CAD-3D when I say, there is friction to be found here. What follows are ways to improve the experience, notable deficiencies, workarounds, and notes about incorporating the software into modern workflows (if possible). As you may expect, enable the best version of the system you can. TOS 4.x seems to be incompatible with CAD-3D, so keep the OS side simple. In fact, it's better to crank up the virtual CPU speed than to fumble around with toggling on/off the warp mode. It's less fiddly and doesn't suffer from key repeat troubles. Steem SSE 4.2.0 R3 64bit on Windows 11. Emulating a MegaSTE 4MB RAM. TOS v.???? (it doesn't report its version number?). Stereo CAD-3D v2.02. Neither the application nor OS ever crashed on me even once. There's a lot to be said for that stability. Converting the 3D data into a format that could be brought into Blender, for example, seems like a bespoke conversion tool would be needed. I've not found such a thing yet. .PI1 render files can be converted into .pngs thanks to XnConvert. I don't know of a way to convert animations, though a capture system like OBS Studio would work in a pinch. You could also render each frame out separately and stack them together into a movie file. ImageMagick can take a folder of sequentially numbered images and stitch them together into a movie. Rendering quality: the engine can't do dithering, and the flat colors of the limited palette can visually flatten a render that isn't very carefully lit. Getting around object limitations means locking yourself out of scene design flexibility as you "Add" multiple objects together to collapse them into single objects and reduce the object count. The object selection process really hurts.
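Since the XOR animation trick mentioned above is such a neat bit of engineering, here is a minimal Python illustration of the idea (my own sketch of the general technique; it is not the actual CAD-3D/Cyber file format):

```python
# Delta-encoding animation frames with XOR: identical pixels cancel to
# zero, so frames that barely change produce deltas that are mostly
# zero bytes and compress extremely well with simple run-length encoding.

def xor_delta(prev_frame: bytes, next_frame: bytes) -> bytes:
    """XOR two equal-sized frames; unchanged bytes become zero."""
    return bytes(a ^ b for a, b in zip(prev_frame, next_frame))

def apply_delta(prev_frame: bytes, delta: bytes) -> bytes:
    """XOR is its own inverse, so reapplying the delta rebuilds the frame."""
    return bytes(a ^ b for a, b in zip(prev_frame, delta))

frame1 = bytes([0, 1, 2, 3, 4, 5, 6, 7])
frame2 = bytes([0, 1, 2, 9, 4, 5, 6, 7])  # one "pixel" changed
delta = xor_delta(frame1, frame2)          # mostly zeros: cheap to store
assert apply_delta(frame1, delta) == frame2
```

The same property makes playback cheap on a 68000: the player just XORs each stored delta onto the current screen buffer to advance a frame.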

0 views

My impressions of the MacBook Pro M4

I have been using a MacBook Pro M4 as my portable computer for the last half year and wanted to share a few short impressions. As always, I am not a professional laptop reviewer, so in this article you won’t find benchmarks, just subjective thoughts! Back in 2021, I wrote about the MacBook Air M1, which was the first computer I used that contained Apple’s own ARM-based CPU. Having a silent laptop with long battery life was a game-changer, so I wanted to keep those properties. When the US government announced tariffs, I figured I would replace my 4-year-old MacBook Air M1 with a more recent model that should last a few more years. Ultimately, Apple’s prices remained stable, so, in retrospect, I could have stayed with the M1 for a few more years. Oh well. I went to the Apple Store to compare the different options in person. Specifically, I was curious about the display and whether the increased weight and form factor of the MacBook Pro (compared to a MacBook Air) would be acceptable. Another downside of the Pro model is that it comes with a fan, and I really like absolutely quiet computers. Online, I read from other MacBook Pro owners that the fan mostly stays off. In general, I would have preferred to go with a MacBook Air because it has enough compute power for my needs and I like the case better (no ventilation slots), but unfortunately only the MacBook Pro line has the better displays. Why aren’t all displays nano-textured? The employee at the Apple Store presented the trade-off as follows: The nano texture display is great at reducing reflections, at the expense of also making the picture slightly less vibrant. I could immediately see the difference when placing two laptops side by side: The bright Apple Store lights showed up very prominently on the normal display (left), and were almost not visible at all on the nano texture display (right). Personally, I did not perceive a big difference in “vibrancy”, so my choice was clear: I’ll pick the MacBook Pro over the MacBook Air (despite the weight) for the nano texture display! After using the laptop in a number of situations, I am very happy with this choice. In normal scenarios, I notice no reflections at all (where my previous laptop did show reflections!). This includes using the laptop on a train (next to the window), or using the laptop outside in daylight. (When I chose the new laptop, Apple’s M4 chips were current. By now, they have released the first devices with M5 chips.) I decided to go with the MacBook Pro with M4 chip instead of the M4 Pro chip because I don’t need the extra compute, and the M4 needs less cooling — the M4 Pro apparently runs hotter. This increases the chance of the fan staying off. Here are the specs I ended up with: a 14" Liquid Retina XDR display with nano texture, an Apple M4 chip (10-core CPU, 10-core GPU), 32 GB RAM (this is the maximum!), and a 2 TB SSD (enough for this computer). One thing I noticed is that the MacBook Pro M4 sometimes gets warm, even when it is connected to power, but is suspended to RAM (and has been fully charged for hours). I’m not sure why. Luckily, the fan indeed stays silent. I think I might have heard it spin up once in half a year or so? The battery life is amazing! The previous MacBook Air M1 had amazing all-day battery life already, and this MacBook Pro M4 lasts even longer. For example, watching videos on a train ride (with VLC) for 3 hours consumed only 10% of battery life. I generally never even carry the charger. Because of that, Apple’s re-introduction of MagSafe, a magnetic power connector (so you don’t damage the laptop when you trip over it), is nice-to-have but doesn’t really make much of a difference anymore.
In fact, it might be better to pack a USB-C cable when traveling, as that makes you more flexible in how you use the charger. I was curious whether the 120 Hz display would make a difference in practice. I mostly notice the increased refresh rate when there are animations, but not, for example, when scrolling. One surprising discovery (but obvious in retrospect) is that even non-animations can become faster. For example, when running a Go web server locally, I noticed that navigating between pages by clicking links felt faster on the 120 Hz display! The following illustration shows why that is, using a page load that takes 6ms of processing time. There are three cases (the illustration shows an average case and the worst case): best case, the page load finishes just before the next frame is displayed (no delay); worst case, the page load finishes just after a frame is displayed (one frame of delay); and most page loads land somewhere in between, giving 0.x to 1.0 frames of delay. As you can see, the waiting time becomes shorter when going from 60 Hz (one frame every 16.6ms) to 120 Hz (one frame every 8.3ms). So if you’re working with a system that has <8ms response times, you might observe actions completing (up to) twice as fast! I don’t really notice going back to 60 Hz displays on computers. However, on phones, where a lot more animations are a key part of the user experience, I think 120 Hz displays are more interesting. My ideal MacBook would probably be a MacBook Air, but with the nano-texture display! :) I still don’t like macOS and would prefer to run Linux on this laptop. But Asahi Linux still needs some work before it’s usable for me (I need external display output, and M4 support). This doesn’t bother me too much, though, as I don’t use this computer for serious work.
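Returning to the 120 Hz observation: the arithmetic is easy to check yourself. Here is a quick Python sanity check of the click-to-pixels latency (the 6 ms processing time and uniformly random click timing are the illustration's assumptions, not measurements):

```python
# Average time from click until the first display refresh after a page
# load completes, for a given processing time and refresh rate.
import math
import random

def click_to_pixels_ms(processing_ms: float, refresh_hz: float, click_offset_ms: float) -> float:
    frame_ms = 1000.0 / refresh_hz
    done = click_offset_ms + processing_ms
    # Result appears at the next refresh boundary at or after completion.
    return math.ceil(done / frame_ms) * frame_ms - click_offset_ms

# Averaged over random click times within a frame, a 6 ms page load shows
# roughly 14.3 ms at 60 Hz versus roughly 10.2 ms at 120 Hz.
for hz in (60, 120):
    frame = 1000.0 / hz
    samples = [click_to_pixels_ms(6.0, hz, random.uniform(0.0, frame)) for _ in range(100_000)]
    print(f"{hz} Hz: {sum(samples) / len(samples):.1f} ms average click-to-pixels")
```

The gap shrinks as processing time grows past one frame, which matches the post's point that the effect is most visible for sub-8ms response times.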

0 views
Chris Coyier 1 week ago

Microsoft™ Ergonomic Keyboard (now sold by Incase)

For my own long-term reference. My favorite keyboard is the Microsoft Ergonomic Keyboard. But Microsoft is out of the keyboard hardware game. So apparently they sold the design to Incase, who now continues to sell it at a perfectly fair price.

0 views
Simon Willison 1 week ago

Hacking the WiFi-enabled color screen GitHub Universe conference badge

I'm at GitHub Universe this week (thanks to a free ticket from Microsoft). Yesterday I picked up my conference badge... which incorporates a full Raspberry Pi Pico microcontroller with a battery, color screen, WiFi and bluetooth. GitHub Universe has a tradition of hackable conference badges - the badge last year had an eInk display. This year's is a huge upgrade though - a color screen and WiFi connection makes this thing a genuinely useful little computer! The only thing it's missing is a keyboard - the device instead provides five buttons total - Up, Down, A, B, C. It might be possible to get a bluetooth keyboard to work though I'll believe that when I see it - there's not a lot of space on this device for a keyboard driver. Everything is written using MicroPython, and the device is designed to be hackable: connect it to a laptop with a USB-C cable and you can start modifying the code directly on the device. Out of the box the badge will play an opening animation (implemented as a sequence of PNG image frames) and then show a home screen with six app icons. The default apps are mostly neat Octocat-themed demos: a flappy-bird clone, a tamagotchi-style pet, a drawing app that works like an etch-a-sketch, an IR scavenger hunt for the conference venue itself (this thing has an IR sensor too!), and a gallery app showing some images. The sixth app is a badge app. This will show your GitHub profile image and some basic stats, but will only work if you dig out a USB-C cable and make some edits to the files on the badge directly. I did this on a Mac. I plugged a USB-C cable into the badge which caused MacOS to treat it as an attached drive volume. In that drive are several files, including a configuration file. Open that up, confirm the WiFi details are correct and add your GitHub username. The badge comes with the SSID and password for the GitHub Universe WiFi network pre-populated. That's it! Unmount the disk, hit the reboot button on the back of the badge and when it comes back up again the badge app should look something like this: Here's the official documentation for building software for the badge. When I got mine yesterday the official repo had not yet been updated, so I had to figure this out myself. I copied all of the code across to my laptop, added it to a Git repo and then fired up Claude Code and told it what I wanted. Here's the result, which was really useful for getting a start on understanding how it all worked. Each of the six default apps lives in a folder, for example apps/sketch/ for the sketching app. There's also a menu app which powers the home screen. That lives in apps/menu/ . You can edit code in here to add new apps that you create to that screen. I told Claude what to build. This was a bit of a long-shot, but it totally worked! The first version had an error: I OCRd that photo (with the Apple Photos app) and pasted the message into Claude Code and it fixed the problem. This almost worked... but the addition of a seventh icon to the 2x3 grid meant that you could select the icon but it didn't scroll into view. I had Claude fix that for me too. Here's the code for apps/debug/__init__.py , and the full Claude Code transcript created using my terminal-to-HTML app described here. Here are the four screens of the debug app: The icons used on the app are 24x24 pixels. I decided it would be neat to have a web app that helps build those icons, including the ability to start by creating an icon from an emoji. I built this one using Claude Artifacts.
Here's the result, now available at tools.simonwillison.net/icon-editor : I noticed that last year's badge configuration app (which I can't find in github.com/badger/badger.github.io any more, I think they reset the history on that repo?) worked by talking to MicroPython over the Web Serial API from Chrome. Here's my archived copy of that code . Wouldn't it be useful to have a REPL in a web UI that you could use to interact with the badge directly over USB? I pointed Claude Code at a copy of that repo and told it what I was after. It took a bit of poking (here's the transcript ) but the result is now live at tools.simonwillison.net/badge-repl . It only works in Chrome - you'll need to plug the badge in with a USB-C cable and then click "Connect to Badge". If you're a GitHub Universe attendee I hope this is useful. The official badger.github.io site has plenty more details to help you get started. There isn't yet a way to get hold of this hardware outside of GitHub Universe - I know they had some supply chain challenges just getting enough badges for the conference attendees! It's a very neat device, built for GitHub by Pimoroni in Sheffield, UK. A version of this should become generally available in the future under the name "Pimoroni Tufty 2350".
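Back to the icon editor for a moment: as a rough sketch of the kind of conversion such a tool performs, here is a Pillow-based stand-in of my own (not the code behind Simon's browser tool; the file names are hypothetical):

```python
# Hypothetical helper that downsizes any source image to the badge's
# 24x24 app-icon format. Pillow is my assumption here; the actual tool
# at tools.simonwillison.net/icon-editor runs entirely in the browser.
from PIL import Image

def make_badge_icon(src_path: str, dst_path: str, size: int = 24) -> None:
    img = Image.open(src_path).convert("RGB")
    img = img.resize((size, size), Image.NEAREST)  # NEAREST keeps crisp pixel-art edges
    img.save(dst_path)

make_badge_icon("emoji_source.png", "app_icon.png")
```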

1 view
Ash's Blog 1 week ago

Scaling Elections with GPUs and Mojo 🔥

Last summer, Chris Lattner, a bunch of other people from across the industry, and I gathered for a GPU-programming hackathon at the AGI House in San Francisco. After one too many LLM optimizations, I decided to accelerate something nobody asked for! Most elections use simple plurality voting — whoever gets the most votes wins. But there are “fairer” methods that consider ranked preferences, like the Schulze method used by the Wikimedia Foundation, Debian, and pirate parties worldwide. The catch? It scales $O(n^3)$ with the number of candidates.
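For readers unfamiliar with it, the $O(n^3)$ core of the Schulze method is a Floyd-Warshall-style widest-path computation. A small pure-Python sketch of it follows (my own illustration of the standard algorithm; the post's actual implementation is in Mojo on GPUs):

```python
# Schulze method core: compute strongest-path strengths p[i][j] from the
# pairwise preference matrix d, where d[i][j] = voters preferring i over j.
# The triple loop is the O(n^3) part a GPU implementation parallelizes.

def schulze_winners(d: list[list[int]]) -> list[int]:
    n = len(d)
    p = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and d[i][j] > d[j][i]:
                p[i][j] = d[i][j]
    for k in range(n):                      # widest-path relaxation
        for i in range(n):
            if i == k:
                continue
            for j in range(n):
                if j != i and j != k:
                    p[i][j] = max(p[i][j], min(p[i][k], p[k][j]))
    # A winner beats (or ties) every other candidate on path strength.
    return [i for i in range(n) if all(p[i][j] >= p[j][i] for j in range(n) if j != i)]

# Example: 3 candidates with a preference cycle; candidate 0 wins on paths.
print(schulze_winners([[0, 20, 26], [25, 0, 16], [19, 29, 0]]))  # -> [0]
```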

0 views
Ruslan Osipov 1 week ago

The Yamaha moment

There’s this old joke: Me: I’d like to buy a piano. Yamaha: We got you! Me: I’m also looking for a motorcycle, where could I get one? Yamaha: You’re not gonna believe this… I just had my own Yamaha moment. I was looking for a good pepper grinder, and I just found that one of the best pepper grinders on the market is made by… Peugeot. Yup, apparently the car company produced great pepper grinders, bicycles, and cars, in that order. Live and learn. And yeah, the pepper mill is sturdy, feels and looks great, and the grinding mechanism comes with a lifetime warranty.

0 views
neilzone 2 weeks ago

Is now the best time ever for Linux laptops?

As I’ve said, ad nauseam probably, I like my secondhand ThinkPads. But I’m not immune to the charms of other machines and, as far as I can tell, now is an amazing time for Linux laptops. By which I mean, companies selling laptops with Linux pre-installed or no OS preinstalled, or aimed at Linux users. Yes, it’s a bit subjective. There seems to be quite a range of machines, at quite a range of prices, with quite a range of Linux and other non-Windows/macOS operating systems available. This isn’t meant to be a comprehensive list, but just some thoughts on a few of them that have crossed my timeline recently. All have points that I really like but, right now at least, if my current ThinkPad died, I’d probably just buy another eBay ThinkPad… Update 2025-10-25: This is a list, not recommendations, but personally I won’t be buying a Framework machine: “Framework flame war erupts over support of politically polarizing Linux projects” I love the idea of the Framework laptops, which a user can repair and upgrade with ease. Moving away from “disposable” IT, into well-built systems which can be updated in line with user needs, and readily repaired, is fantastic. Plus, they have physical switches to disconnect microphone and camera, which I like. I’ve seen more people posting about Framework machines than I have about pretty much all of the others here put together, so my guess is that these are some of the more popular Linux-first machines at the moment. I know a few people who have, or had, one of these. Most seem quite happy. One… not so much. But the fact that multiple people I know have them means, perhaps, sooner rather than later, I’ll get my hands on one temporarily, to see what it is like. I only heard about Malibal while seeing if there was anything obvious that I’d missed from this post. Their machines appear to start at $4197, based on what they displayed when I clicked on the link to Linux machines, which felt noteworthy. And some of the stuff on their website seems surprising. Update 2025-10-25: The link about their reasons for not shipping to Colorado no longer works, nor is it available via archive.org (“This URL has been excluded from the Wayback Machine.”). Again, this is a list, not recommendations, but this thread on Reddit does not make for good reading. I’m slipping this in because I have a soft spot for Leah’s Minifree range of machines even though, strictly, they are not “Linux-first” laptops, but rather Libreboot machines, which can come with a Linux installation. I massively admire what Leah is doing here, both in terms of funding their software development work, and also helping reduce electronic waste through revitalising used equipment. Of all the machines and companies in this blog post, Minifree’s are, I think, the ones which tempt me the most. I think the MNT Pocket Reform is a beautiful device, in a sort-of-quirky kind of way. In my head, these are hand-crafted, artisan laptops. Could I see myself using it every day? Honestly, no. The keyboard would concern me, and I am not sure I see the attraction of a trackball. (I’d happily try one though!) But I love the idea of a 7" laptop, and this, for me, is one of its key selling points. If I saw one in person, could I be tempted? Perhaps… The Pinebook Pro is a cheap ARM laptop. I had one of these, and it has gone to someone who could make better use of it than I could.
Even its low price - I paid about £150 for it, I think, because it was sold as “broken” (which it was not) - could not really make up for the fact that I found it underpowered for my needs. This is probably a “me” thing, and perhaps my expectations were simply misaligned. The Pine64 store certainly hints in this direction: Please do not order the Pinebook Pro if you’re seeking a substitute for your X86 laptop, or are just curious. Purism makes a laptop, a tablet, and a mini desktop PC. I love their hardware kill switches for camera and microphone. A camera cover is all well and good, but I’d really like to have a way of physically disconnecting the microphone on my machines. Again, I don’t think I know anyone who has one. Were it not for a friend of mine, I wouldn’t even be aware of Slimbook. Matija, who wrote up his experiences setting up a Slimbook Pro X 14, is the only person I’ve seen mention them. But there they are, with a range of Linux-centric laptops, at a range of prices. I could be tempted by a Linux-first tablet, and StarLabs’ StarLite looks much the best of the bunch… But, at £540 + VAT, or thereabouts, with a keyboard, it is far from cheap for something that I don’t think would replace my actual laptop. I’m aware of System 76, but I’m not sure I know anyone who has one of their machines. As with System 76, I’m aware of Tuxedo, which certainly appears to have an impressive range of machines. But I don’t think I’ve heard or seen of anyone using one.

0 views
Jeff Geerling 2 weeks ago

Why do some radio towers blink?

One day on my drive home, I saw three towers. One of them had a bunch of blinking white lights, another one had red lights that kind of faded in and out, and the third one, well, it wasn't doing anything. I'm lucky to have a radio engineer for a dad, so: Dad, why do some towers blink? Joe: Well, blinking I would call like the way you described it, "flashing", "white light", or "strobe". All these lights are to aid pilots and air traffic: helicopters, fighter planes, regular jets. So that's the purpose of it. Jeff: Well that one tower that I saw had red lights that faded in and out, but I even think there's a freestanding tower just north of here that has red and white on top.

1 view

Falcon: A Reliable, Low Latency Hardware Transport

Falcon: A Reliable, Low Latency Hardware Transport Arjun Singhvi, Nandita Dukkipati, Prashant Chandra, Hassan M. G. Wassel, Naveen Kr. Sharma, Anthony Rebello, Henry Schuh, Praveen Kumar, Behnam Montazeri, Neelesh Bansod, Sarin Thomas, Inho Cho, Hyojeong Lee Seibert, Baijun Wu, Rui Yang, Yuliang Li, Kai Huang, Qianwen Yin, Abhishek Agarwal, Srinivas Vaduvatha, Weihuang Wang, Masoud Moshref, Tao Ji, David Wetherall, and Amin Vahdat SIGCOMM'25 Falcon is an IP block which can be integrated into a 3rd-party NIC. Fig. 7 shows an example integration of Falcon into a NIC. Blue components are part of Falcon: Source: https://dl.acm.org/doi/abs/10.1145/3718958.3754353 Multiple Upper Layer Protocols (ULPs, e.g., NVMe and RDMA) are implemented on top of Falcon. Other protocols (e.g., Ethernet) can bypass Falcon and go straight to the standard NIC hardware. Falcon provides reliability and ordering via a connection-oriented interface to the ULPs. Multipathing is the ability for a single connection to use multiple network paths from the sender to the receiver. This improves throughput by allowing use of aggregate bandwidth and allows Falcon to quickly react to transient congestion on a subset of paths. The paper uses the term flow for a single path from sender to receiver. A single connection is associated with many flows. There are two parts to implementing multipathing, one easy and one not-so-easy. The easy task is to use the IPv6 Flow Label field. When the sending NIC chooses a flow for a particular packet, it sets the index of the flow in the flow label field. When a switch determines that there are multiple valid output ports for a packet, it hashes various fields from the packet (including the flow label) to determine which port to use. The switches are doing the hard work here. A Falcon NIC doesn’t need to maintain a local view of the network topology between the sender and receiver, nor does it have to pre-plan the exact set of switches a packet will traverse. The NIC simply sets the flow label field. The hard part is handling out-of-order packets. If the sending NIC is interleaving between flows at a fine granularity, then the receiving NIC will commonly receive packets out of order. Falcon burns 1-2 mm² of silicon on a packet buffer which holds received packets until they can be delivered to a ULP in order. ACK packets contain a packet sequence number and a 128-bit wide bitmap which represents a window of 128 recent packets that have been received. The sender uses these bitmaps to determine when to retransmit. The NIC maintains an estimate of the round-trip latency on each flow. If the most recent bitmap indicates that a packet has not been received, and a period of time longer than the round-trip latency has elapsed, then the packet is retransmitted. Falcon attempts to be a good citizen and minimize bufferbloat by estimating per-flow round-trip latency. These estimates are gathered via hardware near the edge of the NIC which records timestamps as packets (including ACKs) are sent and received. When Falcon is processing a packet to be sent for a given connection, it computes the open window associated with each flow. The open window is the difference between the round-trip latency and the number of unacknowledged packets. The flow with the largest open window is selected. You can think of the open window like a per-flow credit scheme, where the total credits available are determined from round-trip latency, sending a packet consumes a credit, and receiving an ACK produces credits.
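That flow-selection scheme is easy to picture in code. A tiny sketch of how I read it (the units and the RTT-to-credit conversion factor below are my stand-ins, not values from the paper):

```python
# Sketch of Falcon-style flow selection: each flow's "open window" is its
# RTT-derived credit budget minus packets already in flight; the sender
# picks the flow with the most open window for the next packet.
from dataclasses import dataclass

@dataclass
class Flow:
    rtt_us: float   # per-flow round-trip estimate, continuously updated
    unacked: int    # packets sent but not yet ACKed

def open_window(flow: Flow, credits_per_us: float = 0.1) -> float:
    # credits_per_us is a placeholder bandwidth-delay-product conversion.
    return flow.rtt_us * credits_per_us - flow.unacked

def pick_flow(flows: list[Flow]) -> Flow:
    return max(flows, key=open_window)

flows = [Flow(rtt_us=50.0, unacked=3), Flow(rtt_us=80.0, unacked=7), Flow(rtt_us=40.0, unacked=0)]
print(flows.index(pick_flow(flows)))  # flow 2: windows are 2.0, 1.0, 4.0
```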
The hard part is handling out-of-order packets. If the sending NIC interleaves between flows at a fine granularity, then the receiving NIC will commonly receive packets out of order. Falcon burns 1-2 mm² of silicon on a packet buffer which holds received packets until they can be delivered to a ULP in order.

ACK packets contain a packet sequence number and a 128-bit-wide bitmap which represents a window of 128 recent packets that have been received. The sender uses these bitmaps to determine when to retransmit. The NIC maintains an estimate of the round-trip latency on each flow. If the most recent bitmap indicates that a packet has not been received, and a period of time longer than the round-trip latency has elapsed, then the packet is retransmitted.

Congestion Control

Falcon attempts to be a good citizen and minimize bufferbloat by estimating per-flow round-trip latency. These estimates are gathered via hardware near the edge of the NIC which records timestamps as packets (including ACKs) are sent and received.

When Falcon is processing a packet to be sent for a given connection, it computes the open window associated with each flow. The open window is the difference between the round-trip latency and the number of unacknowledged packets, and the flow with the largest open window is selected. You can think of the open window like a per-flow credit scheme, where the total credits available are determined from the round-trip latency, sending a packet consumes a credit, and receiving an ACK produces credits. The trick here is that the round-trip latency associated with each flow is constantly changing.
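Below is a minimal software sketch of the per-flow bookkeeping described above: the 128-bit ACK bitmap check that gates retransmission, and the open-window comparison that picks which flow sends next. The field names, units, and the mapping from round-trip latency to a packet-count window are assumptions for illustration; the real logic is fixed-function hardware in Falcon.

```c
#include <stdbool.h>
#include <stdint.h>

/* Per-flow state as we understand the paper's description; widths and
 * units are our assumptions, not Falcon's actual layout. */
struct flow {
    uint64_t rtt_ns;        /* estimated round-trip latency */
    int64_t  window_pkts;   /* credits derived from the RTT estimate */
    int64_t  unacked_pkts;  /* packets in flight on this flow */
    uint64_t base_seq;      /* sequence number of bitmap bit 0 */
    uint64_t ack_bitmap[2]; /* 128-bit receive bitmap from latest ACK */
    uint64_t sent_ns[128];  /* send timestamp per outstanding packet */
};

/* Retransmit when the newest ACK bitmap says the packet is missing and
 * more than one estimated RTT has passed since it was sent. */
bool should_retransmit(const struct flow *f, uint64_t seq, uint64_t now_ns)
{
    uint64_t off = seq - f->base_seq;
    if (off >= 128)
        return false;       /* outside the 128-packet bitmap window */
    bool received = (f->ack_bitmap[off / 64] >> (off % 64)) & 1;
    return !received && (now_ns - f->sent_ns[off]) > f->rtt_ns;
}

/* Pick the flow with the largest open window: credits minus packets in
 * flight. Sending consumes a credit; an arriving ACK returns one. */
int pick_flow(const struct flow *flows, int nflows)
{
    int best = 0;
    int64_t best_open = flows[0].window_pkts - flows[0].unacked_pkts;
    for (int i = 1; i < nflows; i++) {
        int64_t open = flows[i].window_pkts - flows[i].unacked_pkts;
        if (open > best_open) { best_open = open; best = i; }
    }
    return best;
}
```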
Notable Hardware Details

Section 5.2 of the paper describes three details which the authors felt were worth mentioning. The unspoken assumption is that these are non-standard design choices:

- As mentioned before, Falcon dedicates a non-trivial amount of on-chip resources to SRAM buffers which hold received packets before they are reassembled into the correct order. The paper says 1.2 MB is required for 200 Gbps, and the buffer size grows linearly with throughput. One interesting fact is that the buffer size is independent of latency, because throughput decreases with latency. For example, the paper mentions the same size works well with "inter-metro use-cases" which have 5-10x higher latency, but also 5-10x lower bandwidth.
- Falcon has an on-chip cache to hold mutable connection state, but the paper says a high miss rate in this cache is very common. The solution is to provision enough memory bandwidth to maintain good performance when most accesses to connection state must go off chip. Reading between the lines, it seems two scenarios matter: a small number of connections, each with a high packet rate; and a large number of connections, each with a low packet rate.
- Falcon has hardware support for somewhat rare events (errors, timeouts) rather than letting software on the host handle them.

Fig. 10 compares Falcon against RoCE for various RDMA verbs and drop rates. Note that the drop rate maxes out at 1%.

Source: https://dl.acm.org/doi/abs/10.1145/3718958.3754353

Dangling Pointers

Falcon contains a lot of great optimizations. I wonder how many of them are local optimizations, and how much more performance is on the table if global optimization is allowed. In particular, Falcon works with standard ULPs (RDMA, NVMe) and standard Ethernet switches. At some scale, maybe extending the scope of allowable optimizations to those components would make sense?

0 views

Bounding Speculative Execution of Atomic Regions to a Single Retry

Bounding Speculative Execution of Atomic Regions to a Single Retry
Eduardo José Gómez-Hernández, Juan M. Cebrian, Stefanos Kaxiras, and Alberto Ros
ASPLOS'24

This paper proposes adding hardware support for a specific subset of atomic regions. An atomic region can be either a regular old critical section or a transaction in a system which supports transactional memory.

Speculative lock elision is a microarchitectural optimization which speculatively removes synchronization between cores. A mis-speculation results in a finite number of retries, followed by falling back to locking as specified by the program. Hardware transactional memory (HTM) requires programmers to specify transactions at the language level. Again, conflicts are handled with a bounded number of retries, followed by executing user-specified "fallback" code.

The key insight in this paper is that many atomic regions can be implemented with at most a single retry. These atomic regions have an immutable set of memory addresses which they access; in other words, they access the same set of cache lines on each retry. Table 1 shows statistics for each atomic region in a set of benchmarks analyzed by the paper:

Source: https://dl.acm.org/doi/10.1145/3622781.3674176

Cacheline-locked executed Atomic Region

The paper proposes hardware support for cacheline-locked executed atomic regions (CLEAR) to optimize execution of most atomic regions. The first invocation of an atomic region is part of the discovery phase: the processor keeps track of properties of the region, e.g., the set of cache lines accessed. If no conflicts occur with other transactions, then this first invocation of the atomic region can commit.

If a conflict occurs, hardware finishes executing the atomic region anyway, so that it gets a full picture of the set of cache lines touched by the region. The hardware then retries the atomic region, this time locking each cache line which will be accessed (locking in sorted order to avoid deadlocks with other cores that are also locking cache lines). In most cases this does the trick, and the single retry succeeds.

A few things can go wrong, but the paper claims they are not too common:

- The atomic region can be too large for the core to keep track of all relevant metadata during discovery.
- The atomic region could contain indirections, which means the set of cache lines accessed could change from run to run. The processor optimistically assumes the set of cache lines won't change and detects if this assumption was incorrect.

For these reasons, the hardware must still have a fallback path (e.g., coarse-grained locking), but this is not the common case.
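Here is a software analogy of CLEAR's two-phase flow, a sketch under stated assumptions rather than the paper's mechanism: CLEAR does this inside the coherence protocol, so the per-line spinlocks, the tracking cap, and every name below are inventions for illustration. The shape is the point: run once optimistically while recording the touched lines, then on conflict lock the recorded lines in sorted address order and retry exactly once.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_LINES 64   /* assumed hardware tracking capacity */

/* One software lock standing in for a locked cache line. */
typedef atomic_flag line_lock;

struct region {
    /* Run the region once, recording (up to MAX_LINES) the cache-line
     * locks it touches in lines[] and the true count in *n. Returns
     * true if it committed without a conflict. */
    bool (*run)(line_lock **lines, size_t *n, void *arg);
    void (*fallback)(void *arg);  /* e.g., take a coarse-grained lock */
};

static int by_address(const void *a, const void *b)
{
    line_lock *x = *(line_lock *const *)a, *y = *(line_lock *const *)b;
    return (x > y) - (x < y);     /* sorted lock order: no deadlock */
}

void clear_execute(const struct region *r, void *arg)
{
    line_lock *lines[MAX_LINES];
    size_t n = 0;

    /* Discovery phase: even on conflict, the region runs to the end so
     * the full set of touched cache lines is known. */
    if (r->run(lines, &n, arg))
        return;                   /* committed on the first try */
    if (n > MAX_LINES) {          /* too large to track: give up early */
        r->fallback(arg);
        return;
    }

    /* The single retry: lock every recorded line, run again, unlock. */
    qsort(lines, n, sizeof lines[0], by_address);
    for (size_t i = 0; i < n; i++)
        while (atomic_flag_test_and_set(lines[i]))
            ;                     /* spin until the line is ours */

    line_lock *lines2[MAX_LINES];
    size_t n2 = 0;
    bool ok = r->run(lines2, &n2, arg);

    for (size_t i = n; i-- > 0; )
        atomic_flag_clear(lines[i]);

    /* If an indirection changed the set of lines, the retry can still
     * fail; fall back to the program's own synchronization. */
    if (!ok)
        r->fallback(arg);
}
```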
Fig. 8 has the headline results. B is a simplistic implementation of HTM (requester-wins). P is a more advanced implementation of HTM (PowerTM). C is CLEAR built on top of the requester-wins HTM design. PW is CLEAR built on top of the PowerTM implementation of HTM.

Source: https://dl.acm.org/doi/10.1145/3622781.3674176

Dangling Pointers

Similar patterns appear in other domains. Ripple atomics require the read and write set of each atomic block to be known at compile time. Calvin runs OLTP transactions first as reconnaissance queries to determine the read/write sets of a transaction. From a language design point of view, it seems worth considering a special syntax for the subset of transactions which have immutable read/write sets.

0 views