Latest Posts (20 found)
The Coder Cafe 4 days ago

Linus Torvalds vs. Ambiguous Abstractions

🎄 If you’re planning to do Advent of Code this year, join The Coder Cafe leaderboard: . I’ll find a few prizes for the winner(s). If you’re new to Advent of Code, I wrote a short introduction last year, and I also wrote a blog post called I Completed All 8 Advents of Code in One Go: Here Are the Lessons I Learned if you’re interested. I’ve also created a dedicated channel on the Discord server. Join the Discord

☕ Welcome to The Coder Cafe! Today, we discuss a recent comment from Linus Torvalds about the use of a helper function. Get cozy, grab a coffee, and let’s begin!

In August 2025, there was (yet another) drama involving Linus Torvalds replying to a pull request:

No. This is garbage and it came in too late. I asked for early pull requests because I’m traveling, and if you can’t follow that rule, at least make the pull requests good. This adds various garbage that isn’t RISC-V specific to generic header files. And by “garbage” I really mean it. This is stuff that nobody should ever send me, never mind late in a merge window.

Like this crazy and pointless make_u32_from_two_u16() “helper”. That thing makes the world actively a worse place to live. It’s useless garbage that makes any user incomprehensible, and actively WORSE than not using that stupid “helper”. If you write the code out as “(a << 16) + b”, you know what it does and which is the high word. Maybe you need to add a cast to make sure that ‘b’ doesn’t have high bits that pollutes the end result, so maybe it’s not going to be exactly pretty, but it’s not going to be wrong and incomprehensible either.

In contrast, if you write make_u32_from_two_u16(a,b) you have not a f^%$ing clue what the word order is. IOW, you just made things WORSE, and you added that “helper” to a generic non-RISC-V file where people are apparently supposed to use it to make other code worse too. So no. Things like this need to get bent.
It does not go into generic header files, and it damn well does not happen late in the merge window.

Let’s not discuss the rudeness of this comment (it’s atrocious). Instead, let’s focus on the content itself. , a popular newsletter, wrote a post about it: the main point Linus makes here is that good code optimizes for reducing cognitive load. […] Humans have limited working memory capacity - let’s say the human brain can only store 4-7 “chunks” at a time. Each abstraction or helper function costs a chunk slot. Each abstraction costs more tokens.

I share the view that good code optimizes for reducing cognitive load 1, but I don’t understand Linus’s comment in exactly the same way. Yes, Linus is virulent about the helper function, but in my opinion, his main argument isn’t simply that an abstraction costs a “chunk slot” as mentioned; it’s rather that this isn’t the right abstraction.

Here is the code added in the pull request: This macro builds a 32-bit integer by putting one 16-bit value in the high half and the other in the low half. For example:

The main problem with this macro isn’t necessarily that it exists. It’s that its intent (meaning what it tries to accomplish) could have been clearer. Indeed, the helper’s name doesn’t tell which word is high and which one is low, and that’s exactly what Linus is calling out with “you have not a f^%$ing clue what the word order is”. Because we can’t get the intent from the name ( ), we have to open the macro to understand the order. That’s precisely why it costs a “chunk slot”: not because the abstraction exists, but because it’s an ambiguous one.

If we wanted to keep using a macro, a better approach, in my opinion 2, would be to encode the word order in the name itself ( = most significant word, = least significant word): In this case, the word order is carried by the macro name, which makes it a clearer abstraction.
Reading the call site doesn’t require opening the macro to understand the word order:

Such an abstraction doesn’t cost a “chunk slot” in terms of cognitive load. Its intent is clear from the name, so we don’t need to load an extra piece of information into our working memory to understand it.

In summary, if we want to optimize for cognitive load, there’s not necessarily an issue with using helper functions. But if we do, we should make the abstraction as explicit as possible, and that starts with a clear function name that conveys what it tries to accomplish.

Missing direction in your tech career? At The Coder Cafe, we serve timeless concepts with your coffee to help you master the fundamentals. Written by a Google SWE and trusted by thousands of readers, we support your growth as an engineer, one coffee at a time.

More From the Programming Category
Readability
Cognitive Load
Nested Code

Resources
Re: [GIT PULL] RISC-V Patches for the 6.17 Merge Window, Part 1 - Linus Torvalds // The discussion.
GitHub // The code proposed in the pull request.
Linus and the two youts // Interestingly, the macro was plain wrong when the second word was negative. The full explanation is here.

❤️ If you enjoyed this post, please hit the like button.
💬 Where do you draw the line between “helpful” and “harmful” abstraction? Leave a comment

1 At least most of the time. Sometimes we must optimize for performance at the expense of cognitive load.
2 Mr Torvalds, if you see this and you disagree, please do not insult me.
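The snippets from the pull request are missing from this capture, but the naming argument itself is easy to demonstrate. Here is a small sketch in Python for brevity; the msw/lsw helper name is my own illustration, not the name proposed in the discussion:

```python
def make_u32_from_two_u16(a, b):
    # Ambiguous: nothing in the name says whether a or b is the high word.
    return ((a & 0xFFFF) << 16) | (b & 0xFFFF)

def make_u32_from_msw_lsw(msw, lsw):
    # The word order is carried by the name: msw = most significant word,
    # lsw = least significant word. The call site needs no lookup.
    # Masking also keeps a negative second argument from polluting the result.
    return ((msw & 0xFFFF) << 16) | (lsw & 0xFFFF)

print(hex(make_u32_from_msw_lsw(0x1234, 0x5678)))  # 0x12345678
```

Both functions compute the same value; only the second one tells you the word order without opening its body, which is exactly the difference between an abstraction that costs a “chunk slot” and one that doesn’t.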

The Coder Cafe 1 week ago

Build Your Own Key-Value Storage Engine—Week 2

Curious how leading engineers tackle extreme scale challenges with data-intensive applications? Join Monster Scale Summit (free + virtual). It’s hosted by ScyllaDB, the monstrously fast and scalable database.

Agenda
Week 0: Introduction
Week 1: In-Memory Store
Week 2: LSM Tree Foundations

Before delving into this week’s tasks, it’s important to understand what you will implement. This week, you will implement a basic log-structured merge-tree (LSM tree). At its core, an LSM tree is a data structure that prioritizes write efficiency by trading off some read complexity. It buffers writes in memory and uses append-only files on disk, then rewrites data during compaction. It consists of two main components:
- A mutable in-memory data structure called a memtable, used to store recent writes.
- A set of immutable SSTables (Sorted String Tables) stored on disk.

Regularly, the current memtable is snapshotted, its entries are sorted by key, and a new immutable SSTable file is written. In addition, a MANIFEST file is an append-only list of SSTable filenames. It tells the engine which SSTable files exist and in which order to read them, newest to oldest.

Why LSM trees shine for write-heavy workloads:
- Fast writes with sequential I/O: New updates are buffered in memory (memtable) and later written sequentially to disk during a flush (SSTable), which is faster than the random I/O patterns common with B-trees, for example.
- Decoupled writes from read optimization: Writes complete against the memtable, while compaction work runs later (you will tackle that in a future week).
- Space and long-term efficiency: Compaction processes remove dead data and merge many small files into larger sorted files, which keeps space usage in check and sustains read performance over time.

For the memtable, you will start with a hashtable. In a future week, you will learn why a hashtable is not the most efficient data structure for an LSM tree, but it is a simple starting point.
For the SSTables, you will use JSON as the data format. Get comfortable with a JSON parser if you are not already.

💬 If you want to share your progress, discuss solutions, or collaborate with other coders, join the community Discord server ( channel): Join the Discord

This week’s implementation is single-threaded. You will revisit that assumption later.

Implement a hashtable to store requests (create or update). You can probably reuse a lot of code from Week 1. When your memtable contains 2,000 entries:
- Flush the memtable as a new immutable JSON SSTable file with keys sorted. The SSTable file is a JSON array of objects, each with two fields, and . Keys are unique within a file. For example, if your memtable contains the following entries: You need to create the following SSTable:
- Use a counter for the filename prefix, for example , , .
- After writing the new SSTable, append its filename to the MANIFEST (append only), then clear the memtable:

For now, the flush is a stop-the-world operation. While the file is being written, do not serve reads or writes. You will revisit that later. Create an empty file if it doesn’t exist. Derive the next SSTable ID from the MANIFEST so you don’t reuse the same filename.

Check the memtable:
- If found, return the corresponding value.
- If not found, read the MANIFEST to list SSTable filenames: Scan SSTables from newest to oldest (for example , then , then ). Use a simple linear scan inside each file for now. Stop at the first hit and return the corresponding value.
- If still not found, return .

There are no changes to the client you built in week 1. Run it against the same file ( put.txt ) to validate that your changes are correct.

Keep a small LRU cache of known-absent keys (negative cache) between the memtable and SSTables. This avoids repeated disk scans for hot misses: after the first miss, subsequent lookups are O(1). Implementation details are up to you.
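The negative cache described above can be sketched with an LRU eviction policy. This is one possible shape, in Python; the class name, capacity, and method names are illustrative, and the details are indeed up to you:

```python
from collections import OrderedDict

class NegativeCache:
    """Small LRU cache of keys known to be absent from all SSTables."""

    def __init__(self, capacity=1024):
        self.capacity = capacity
        self._entries = OrderedDict()  # insertion order tracks recency

    def add(self, key):
        """Record a confirmed miss so the next lookup skips the disk scan."""
        self._entries[key] = True
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used

    def contains(self, key):
        if key in self._entries:
            self._entries.move_to_end(key)  # refresh recency on a hit
            return True
        return False

    def invalidate(self, key):
        # Must run on every write, or we would serve a stale "absent" answer.
        self._entries.pop(key, None)
```

One design point worth noting: invalidating on every PUT is what keeps the cache correct; the LRU bound only controls memory, not correctness.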
Instead of parsing the MANIFEST file for each request, you can cache the content in-memory.

That’s it for this week! You have built the first version of an LSM tree: a memtable in memory, SSTable files written by regular flushes, and a MANIFEST that lists those SSTables. For now, durability isn’t guaranteed. Data already flushed to SSTables will be read after a restart, but anything still in the memtable during a crash is lost. In two weeks, you will make sure that any request acknowledged to a client remains in your storage engine, even after a restart.

The flush trigger you used was pretty simple: once the memtable contains 2,000 entries. In real systems, flushes can be triggered by various factors, for example:
- Some databases flush when the memtable reaches a target size in bytes, ensuring predictable memory usage.
- A flush can also occur after a period of time has passed. This occurs because the database eventually needs to release commit log segments. For tables with very low write activity, this can sometimes lead to data resurrection scenarios. Here’s an old issue from the ScyllaDB codebase that illustrates this behavior.

Regarding the model, this series assumes a simple key–value one: every PUT stores the whole value, so a GET just finds the newest entry and returns it. If you need a richer model (e.g., rows with many fields or collections), writes are often partial (patches) rather than full replacements. Therefore, reads must reconstruct the result by scanning newest to oldest and merging changes until all required fields are found or a full-write record is encountered.

Last but not least, in this series, you implicitly rely on client-side ordering: the validation client issues requests sequentially. Production KV databases typically attach a sequence number or a logical timestamp to each write to handle out-of-order arrivals, merging, and reconciling results.
Pure wall-clock timestamps are convenient but brittle; see Kyle Kingsbury’s notes on clock pitfalls for a deeper dive.

Missing direction in your tech career? At The Coder Cafe, we serve timeless concepts with your coffee to help you master the fundamentals. Written by a Google SWE and trusted by thousands of readers, we support your growth as an engineer, one coffee at a time.

The Log-Structured Merge-Tree (LSM-Tree) // The original LSM tree whitepaper.
Log Structured Merge Tree - ScyllaDB // LSM tree definition from the ScyllaDB technical glossary.

❤️ If you enjoyed this post, please hit the like button.
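To make the week’s tasks concrete, here is a minimal sketch of the flush and read paths in Python. The JSON field names ("key"/"value") and the filename scheme (0-sstable.json, 1-sstable.json, …) are assumptions, since the exact names are elided in this capture:

```python
import json
import os

MEMTABLE_LIMIT = 2000  # flush threshold from the post

def flush(memtable, data_dir):
    """Write the memtable as a sorted JSON SSTable and record it in the MANIFEST."""
    manifest = os.path.join(data_dir, "MANIFEST")
    # Derive the next SSTable ID from the MANIFEST so filenames are never reused.
    next_id = 0
    if os.path.exists(manifest):
        ids = [int(line.split("-")[0]) for line in open(manifest) if line.strip()]
        next_id = max(ids) + 1 if ids else 0
    name = f"{next_id}-sstable.json"
    # Entries sorted by key; keys are unique because the memtable is a dict.
    entries = [{"key": k, "value": v} for k, v in sorted(memtable.items())]
    with open(os.path.join(data_dir, name), "w") as f:
        json.dump(entries, f)
    with open(manifest, "a") as f:  # append-only
        f.write(name + "\n")
    memtable.clear()
    return name

def get(key, memtable, data_dir):
    """Check the memtable first, then scan SSTables newest to oldest."""
    if key in memtable:
        return memtable[key]
    manifest = os.path.join(data_dir, "MANIFEST")
    if not os.path.exists(manifest):
        return None
    names = [line.strip() for line in open(manifest) if line.strip()]
    for name in reversed(names):  # newest file is appended last
        for entry in json.load(open(os.path.join(data_dir, name))):
            if entry["key"] == key:  # simple linear scan for now
                return entry["value"]
    return None
```

Note that this sketch is stop-the-world by construction: `flush` runs inline, so nothing else is served while the file is being written, matching the week’s simplifying assumption.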

The Coder Cafe 2 weeks ago

Nothing Beats Kindness

☕ Welcome to The Coder Cafe! Today, November 13, is World Kindness Day. For this special occasion, we discuss how kindness matters at work. Get cozy, grab a coffee, and let’s begin!

We’re in 2022. It’s Saturday evening, and I’m about to go to bed. I’m on-call that night. I haven’t been paged, but just to make sure everything is OK, I logged in and checked Slack. An incident was going on, and a colleague was already on it. I DMed him: “Why didn’t you contact me?” He replied: “It’s late and I thought you might be sleeping. I was awake, so I looked to see if there’s something I could do.”

My first reaction was: I’m on-call. I’m paid for it. I’ll take care of it. Go to bed. But here’s the thing: on a Saturday evening, he chose to help because he thought I might be sleeping, even though I was the one on-call, the one paid to handle it. That was a pure act of kindness. No points. No credit. Just care. And after that? Honestly, I would have done anything for that person.

Why Kindness Wins

At work, we work with people long before we work with code. There’s always a little distance between us: roles and power dynamics, deadlines and pressure, different cultures, communication styles, sometimes different time zones. Kindness is the fastest bridge across that distance. Kindness is about being generous, considerate, and having concern for others without expecting praise or reward in return. It’s a voluntary act that creates psychological safety among team members. When people feel safe, they surface risks earlier, ask the “naive” questions, and move faster together. Kind people make work better day in, day out. Kindness boosts trust, speeds decisions, reduces stress, and quietly raises the bar for everyone.

Let’s look at a few places where kindness matters in our daily jobs:
- Code review: When we’re assigned a review, we’re not there to rate someone’s code. We’re there to merge the best possible change together. Be respectful and stay factual. Favor questions over pronouncements: “What scenarios does this handle? I’m worried about X; would Y cover Z?” Point out what’s good, suggest concrete fixes, and link to standards or examples. If there’s confusion, offer help.
- Meetings: Make space so everyone can be heard. Don’t interrupt. Invite quieter people in: “Ben, anything you would add?” It’s not because someone is more vocal that they’re more right.
- Mentoring: People make mistakes. Don’t jump to blame or perform expertise. The goal is to protect in public and correct in private. Give clear, kind feedback, focus on the next step, and share your own past mistakes to lower the temperature.
- Random thank you: When you receive help or just enjoy working with someone, say thank you. Recognition matters, and doing it publicly multiplies the effect. For example, at Google, there’s a program called gThanks that lets you thank someone publicly so others can see it too.
- Make time to listen: Being kind also means making time to listen. I remember going through a difficult period of my life, and a former manager just took time to talk, without judging. That mattered more than any advice.
- Self-compassion: Kindness also applies to yourself. Give yourself the same understanding you would give a teammate. Take breaks, ask for help, forgive your own mistakes, and learn from them.

Being kind is a bridge to people, and even in a professional context, as Aesop wrote, no act of kindness, no matter how small, is ever wasted.

Missing direction in your tech career? At The Coder Cafe, we serve timeless concepts with your coffee to help you master the fundamentals. Written by a Google SWE and trusted by thousands of readers, we support your growth as an engineer, one coffee at a time.

Don’t Forget About Your Mental Health
Keeping a Mistake Journal
The XY Problem
Why Kindness at Work Pays Off
Random Acts of Kindness Foundation

❤️ If you enjoyed this post, please hit the like button.
💬 What’s one act of kindness that changed your workday? Leave a comment

The Coder Cafe 3 weeks ago

Build Your Own Key-Value Storage Engine—Week 1

Curious how leading engineers tackle extreme scale challenges with data-intensive applications? Join Monster Scale Summit (free + virtual). It’s hosted by ScyllaDB, the monstrously fast and scalable database.

Agenda
Week 0: Introduction
Week 1: In-Memory Store

Welcome to week 1 of Build Your Own Key-Value Storage Engine! Let’s start by making sure what you’re about to build in this series makes complete sense: what’s a storage engine? A storage engine is the part of a database that actually stores, indexes, and retrieves data, whether on disk or in memory. Think of the database as the restaurant, and the storage engine as the kitchen that decides how food is prepared and stored. Some databases let you choose the storage engine. For example, MySQL uses InnoDB by default (based on B+-trees). Through plugins, you can switch to RocksDB, which is based on LSM trees.

This week, you will build an in-memory storage engine and the first version of the validation client that you will reuse throughout the series.

Your Tasks

💬 If you want to share your progress, discuss solutions, or collaborate with other coders, join the community Discord server ( channel): Join the Discord

Assumptions
- Keys are lowercase ASCII strings.
- Values are ASCII strings.

NOTE: Assumptions persist for the rest of the series unless explicitly discarded.

- The request body contains the value. If the key exists, update its value and return success. If the key doesn’t exist, create it and return success. Keep all data in memory.
- If the key exists, return 200 OK with the value in the body. If the key does not exist, return .

Implement a client to validate your server:
- Read the testing scenario from this file: put.txt .
- Run an HTTP request for each line: → Send a to with body . → Send a to . Confirm that is returned. If not, something is wrong with your implementation. → Send a GET to . Confirm that is returned. If not, something is wrong with your implementation.
Each request must be executed sequentially, one line at a time; otherwise, out-of-order responses may fail the client’s assertions.

If you want to generate an input file with a different number of lines, you can use this Go generator: is the format to generate. is the number of lines. At this stage, you need a -type file, so for example, if you need one million lines:

Add basic metrics for latency:
- Record start and end time for each request.
- Keep a small histogram of latencies in milliseconds.
- At the end, print , , and .

This work is optional as there is no latency target in this series. However, it can be an interesting point of comparison across weeks to see how your changes affect latency.

That’s it for this week! You have built a simple storage engine that keeps everything in memory. In two weeks, we will level up. You will delve into a data structure widely used in key-value databases: LSM trees.

Missing direction in your tech career? At The Coder Cafe, we serve timeless concepts with your coffee to help you master the fundamentals. Written by a Google SWE and trusted by thousands of readers, we support your growth as an engineer, one coffee at a time.

❤️ If you enjoyed this post, please hit the like button.
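As a rough idea of the week-1 server, here is a minimal in-memory sketch using Python’s http.server. The URL scheme (/key/<name>) and the 404 status for a missing key are my assumptions; the post leaves the exact API and language up to you:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

store = {}  # week 1: all data lives in memory

class KVHandler(BaseHTTPRequestHandler):
    def do_PUT(self):
        key = self.path.rsplit("/", 1)[-1]
        length = int(self.headers.get("Content-Length", 0))
        store[key] = self.rfile.read(length).decode("ascii")
        self.send_response(200)  # success for both create and update
        self.end_headers()

    def do_GET(self):
        key = self.path.rsplit("/", 1)[-1]
        if key in store:
            body = store[key].encode("ascii")
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)  # assumed status for a missing key
            self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the console quiet during validation runs

def run(port=8080):
    HTTPServer(("localhost", port), KVHandler).serve_forever()
```

The validation client then just issues sequential PUT and GET requests against this server, one line of put.txt at a time.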

The Coder Cafe 1 month ago

Horror Coding Stories: Therac-25

📅 Last updated: March 9, 2025

🎃 Welcome to The Coder Cafe! Today, we examine the Therac-25 accidents, where design and software failures resulted in multiple radiation overdoses and deaths. Make sure to check the Explore Further section to see if you’re able to reproduce the deadly issue. Get cozy, grab a pumpkin spice latte, and let’s begin!

Therac-25

Treating cancers used to require a mix of machines, depending on tumor depth: shallow or deep. In the early 1980s, a new generation promised both from a single system. That was a big deal for hospitals: one machine instead of several meant lower maintenance and fewer systems to manage. That was the case with the Therac-25. The Therac-25 offered two therapies with selectable modes:
- Electron beam: Low-energy electrons for shallow tumors (e.g., skin cancer).
- X-ray photons: High-energy radiation for deep tumors (e.g., lung cancer).

Earlier Therac models allowed switching modes with hardware circuits and physical interlocks. The new version was smaller, cheaper, and computer-controlled. Less hardware and fewer parts meant lower costs. However, what no one realized soon enough: it also removed an independent safety net.

On a routine day, a radiology technologist sat at the console and began entering a plan. By habit, she selected X-ray (deep mode). Then she immediately corrected it to Electron (shallow mode) and hit start. The machine halted with a message. The operator’s manual didn’t explain the code. Service materials listed the number but gave no useful guidance, so she resumed and triggered the radiation.

The patient was receiving his ninth treatment. Immediately, he knew something was different. He reported a buzzing sound, later recognized as the accelerator pouring out radiation at maximum. The pain came fast; paralysis followed. He later died from radiation injury. Weeks later, a second patient endured the same incident on the same model.
Initially, the radiology technologist entered ‘ ’ for X-ray (‘▮’ is the cursor and ‘ ’ are other fields): She immediately hit Cursor Up to go back and correct the field to ‘ ’: After a rapid sequence of Return presses, she moved back down to the command area: From her perspective, the screen showed the corrected mode, so she hit return and started the treatment:

Behind the scenes, the Therac-25 software ran several concurrent tasks:

Data-entry task: Monitored operator inputs and edited a shared treatment-setup structure.
Hardware-control task: On a periodic loop, snapshotted that same structure and positioned the turntable and magnets based on user input.

Because both tasks read the same memory with no mutual exclusion, there was a short window (on the order of seconds) in which the hardware-control task used a different value than the one displayed on the screen. As a result:

The UI showed Electron mode, which looked correct to the operator.
The hardware-control task had snapshotted stale data and marked the system as ready even though critical elements (e.g., turntable position, scanning magnets/accessories) were not yet aligned with electron mode.
When treatment was started, the machine delivered an effectively unscanned, high-intensity electron beam, causing a massive overdose.

This is a race condition example: the outcome depends on the timing of events, here, the input cadence of the technologist. Depending on the timing, the system could enter a fatal state, with one process seeing ‘ ’ while another saw ‘ ’.

The manufacturer later confirmed the error could not be reproduced reliably in testing. The timing had to line up just right, which made the bug elusive. They initially misdiagnosed it as a hardware fault and applied only minor fixes. Unfortunately, the speed of operator editing was the key trigger that exposed this software race.

The problem could have stopped here, but it didn’t.
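The stale-snapshot race described above can be sketched deterministically, without real threads. This is an illustrative Python model with made-up names, not the actual Therac-25 code: the "hardware task" copies the shared structure, the operator edits it, and the copy silently goes stale while the screen looks correct.

```python
# Illustrative model of the shared-state race: two "tasks" touch the same
# treatment-setup structure with no mutual exclusion.

treatment_setup = {"mode": "xray"}        # shared structure, no lock

hardware_view = dict(treatment_setup)     # hardware-control task snapshots it

treatment_setup["mode"] = "electron"      # operator corrects the field

display = treatment_setup["mode"]         # UI reads the live structure
assert display == "electron"              # the screen shows the corrected mode...
assert hardware_view["mode"] == "xray"    # ...but the hardware acts on stale data
```

With real concurrency the same divergence happens only when the operator's edit lands inside the hardware task's snapshot window, which is why the bug was timing-dependent and hard to reproduce.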
Months later, another fatal overdose occurred, this time caused by a different software defect. It wasn’t a timing race. This time, the issue was a counter overflow within the control program. The software used an internal counter to track how many times certain setup operations ran. After the counter exceeded its maximum value, it wrapped back to zero. That arithmetic overflow created a window where a critical safety check was bypassed, allowing the beam to turn on without the proper accessories in place. Again, the Therac-25 fired a high-intensity beam without the proper hardware configuration.

Both the race condition and the counter overflow stemmed from the same design flaw: the belief that software alone could enforce safety. The Therac-25 showed, in tragic terms, that without independent safeguards, small coding errors can have catastrophic consequences. We should know that whether it’s software, hardware, or a human process, every single safeguard has inherent flaws. Therefore, in complex systems, safety should be layered, as illustrated by the Swiss cheese model:

Credits

In total, there were six known radiation overdoses involving the Therac-25, and at least three were fatal.

Missing direction in your tech career? At The Coder Cafe, we serve timeless concepts with your coffee to help you master the fundamentals. Written by a Google SWE and trusted by thousands of readers, we support your growth as an engineer, one coffee at a time.

Adaptive LIFO
Resilient, Fault-tolerant, Robust, or Reliable?
Lurking Variables
The Worst Computer Bugs in History: Race conditions in Therac-25
Killed By A Machine: The Therac-25
An Investigation of the Therac-25 Accidents

Explore Further

I created a Docker image based on a C implementation from an MIT course simulating the operator console of the Therac-25 interface: You can run the UI using Docker: Simulator commands: Beam Type: ‘ ’ or ‘ ’ Command: ‘ ’ for beam on, or ‘ ’ to quit the simulator.
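Before you try the simulator, the counter-overflow failure discussed earlier can be sketched in a few lines. This is an illustrative Python model with hypothetical names, not the original code: an 8-bit counter gates a safety check, and when it wraps back to zero the check is silently skipped.

```python
# Illustrative model of the wraparound bug: the (flawed) rule only runs the
# full safety check when the counter is nonzero, so a wrap to zero bypasses it.

def safety_check_runs(counter: int) -> bool:
    return counter != 0

counter = 0
for _ in range(256):               # 256 increments of an 8-bit counter...
    counter = (counter + 1) % 256
assert counter == 0                # ...wrap it back to zero
assert not safety_check_runs(counter)   # and the safety check is bypassed
assert safety_check_runs(1)             # any nonzero value would have run it
```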
👉 Try to trigger the error based on the scenario discussed.

❤️ If you enjoyed this post, please hit the like button.

💬 Any other horror coding stories you want to share? Leave a comment

The Coder Cafe 1 month ago

Build Your Own Key-Value Storage Engine

Welcome to The Coding Corner! This is our new section at The Coder Cafe, where we build real-world systems together, one step at a time. Next week, we will launch the first post series: Build Your Own Key-Value Storage Engine.

Are you interested in understanding how key-value databases work? Tackling challenges like durability, partitioning, and compaction? Exploring data structures like LSM trees, Bloom filters, and tries? Then this series is for you.

Build Your Own Key-Value Storage Engine focuses on the storage engine itself; we will stay single-node. Topics such as replication and consensus are out of scope. Yet, if this format works, we may cover them in a future series.

The structure of each post will be as follows:

Introduction: The theory for what you are about to build that week.
Your tasks: A list of tasks to complete the week’s challenges. Note that you can complete the series in any programming language you want.
Further notes: Additional perspective on how things work in real systems.

If you’re not going to implement things yourself but are interested in databases, you may still want to read sections 1 and 3 at least. Every two weeks, a new post in the series will be released.

Last but not least, I’m delighted to share that this series was written in collaboration with ScyllaDB. They reviewed the content for accuracy and shared practical context from real systems, providing a clearer view of how production databases behave and the problems they solve. Huge thanks to , Felipe Cardeneti Mendes, and ScyllaDB.

By the way, they host a free virtual conference called Monster Scale Summit, and the content is always excellent. If you care about scaling challenges, it’s absolutely worth registering! Also, if you’re interested in giving a talk, the CFP closes in two days. Curious how leading engineers tackle extreme scale challenges with data-intensive applications? Join Monster Scale Summit (free + virtual).
It’s hosted by ScyllaDB, the monstrously fast and scalable database.

On a personal note, this has been the most time-consuming project I have done for The Coder Cafe. I really hope you will enjoy it! See you this Friday for a special post for Halloween and next Wednesday for the first post of the series.

❤️ If you enjoyed this post, please hit the like button.

The Coder Cafe 1 month ago

Speed vs. Velocity

☕ Welcome to The Coder Cafe! Today, we discuss the difference between speed and velocity in team productivity, illustrating that tracking speed alone can be misleading. Get cozy, grab a coffee, and let’s begin!

We often celebrate teams for moving fast. But speed alone can be a trap. A rush of fast changes that barely move the product toward the real goal shouldn’t count as a win. When we talk about team productivity, we should understand that speed ≠ velocity:

Speed is how quickly a team ships changes.
Velocity is speed with direction, the movement toward a defined goal.

Let’s look at three teams to illustrate these definitions.

Team A has ideal velocity. Each iteration is represented by an arrow: its length shows speed, and its angle shows direction. Every iteration moves the team consistently closer to the goal. Team A shows ideal velocity: steady speed and consistent direction.

Team B has a speed problem. Each iteration is well-aligned (correct angle), but the team delivers too slowly (small length), resulting in a lower velocity. They finish only halfway to the goal. Team B has the right direction but low speed, so they only reach halfway to the goal.

Team C ships as rapidly as team A (same arrow length). However, various factors such as frequent bug fixes and changing targets make their direction inconsistent. Despite high speed, their velocity is low, and they end only halfway to the goal, just like team B. Team C moves fast but changes direction often, so velocity stays low and progress stops halfway.

As we can see, velocity requires both speed and direction. A team moving too slowly or in inconsistent directions will make little progress, even if they’re busy. Only when speed is high and direction is aligned do teams reach their goals efficiently.

Measuring team speed isn’t useless, though. We can track speed by considering various metrics such as deployment frequency, average time in code review, or mean time to recovery (MTTR) following a production bug.
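The three teams above can be put into a toy model (hypothetical numbers, not from the post): each iteration is a 2D step, speed is total path length, and velocity shows up as net progress projected toward the goal.

```python
import math

def speed(steps):
    # Total path length: how much ground was covered, regardless of direction.
    return sum(math.hypot(dx, dy) for dx, dy in steps)

def progress(steps, goal=(10.0, 0.0)):
    # Net displacement projected onto the goal direction.
    x, y = sum(dx for dx, _ in steps), sum(dy for _, dy in steps)
    gx, gy = goal
    return (x * gx + y * gy) / math.hypot(gx, gy)

team_a = [(2, 0)] * 5                          # fast, consistently aimed at the goal
team_b = [(1, 0)] * 5                          # right direction, half the speed
team_c = [(2 * math.cos(a), 2 * math.sin(a))   # fast, but zigzagging
          for a in (math.pi/3, -math.pi/3, math.pi/3, -math.pi/3, math.pi/3)]

assert math.isclose(speed(team_a), speed(team_c))         # same speed...
assert math.isclose(progress(team_c), progress(team_b))   # ...but only halfway, like team B
assert progress(team_a) > progress(team_c)                # velocity differs
```

Teams A and C cover the same distance, yet C's inconsistent direction leaves it at roughly the same point as slow-but-aligned team B.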
These metrics are interesting to track and provide a certain perspective on a team’s productivity. Yet, speed shouldn’t be the sole dimension to track. The danger of tracking only speed is that a team might organize itself to optimize short-term delivery. The team might focus on delivering many changes that, together, do not move the product in a meaningful way.

Instead, teams should track speed and velocity. As we said, velocity is speed with direction. We already discussed metrics to track speed; what is missing is monitoring direction. Setting clear and factual objectives that align with the business strategy helps us track direction. For example:

Payment success rate above 99 percent.
Signup to activation above 50 percent within 7 days.
Retention after week 4 above 40 percent.

The easier a metric is to measure, the easier it is to track the direction over time.

One caveat is how to report progress. In a past team, we used to set OKRs (Objectives and Key Results) every semester. Some objectives were difficult to measure, so we tracked progress differently. Say we created 50 tickets and closed 45. In that case, we reported that we reached 90% of the OKR. That number said nothing about real progress toward the objective. Key results should be outcomes, not ticket counts.

Something else worth mentioning: I read here and there that we should track velocity by counting the number of story points delivered within a timeframe (e.g., during a sprint). I strongly disagree with this. Let me give an example:

A team ships a first change ← 3 story points
The team finds a bug and ships a fix ← 2 story points
Later, the team learns the initial approach is not the best, so it ships another change ← 5 story points

On paper, the team delivered 10 story points. That is only speed, though, not velocity. If we care about direction as well, the team only delivered a single feature. Story points measure effort; they don’t measure progress toward the objective.
Speed is how fast we ship. Velocity is speed with direction toward a goal. The goal of a team shouldn’t be to reach high speed; it should be to reach high velocity, where rapid iterations translate into real system-level improvements. Tracking story points, for example, is a measure of speed, not velocity. Objectives tracking should use outcomes, not ticket counts. Reporting 90 percent of tickets done is not a good measure of progress toward the objective.

Keeping a Mistake Journal
Streetlight Effect
Survivor Bias
Sprint Velocity in Scrum: How to Measure and Improve Performance // The velocity definition I disagree with.

❤️ If you enjoyed this post, please hit the like button.

💬 What’s your take on speed vs. velocity? How do you measure velocity in your team? Leave a comment

The Coder Cafe 1 month ago

Conflict-Free Replicated Data Types (CRDTs)

☕ Welcome to The Coder Cafe! Today, we will explore CRDTs, why they matter in distributed systems, and how they keep nodes in sync. Get cozy, grab a coffee, and let’s begin!

CRDTs, short for Conflict-Free Replicated Data Types, are a family of data structures built for distributed systems. At first sight, CRDTs may look intimidating. Yet at their core, the idea is not that complex. What makes them special is that they allow updates to happen independently on different nodes while still guaranteeing that all replicas eventually converge to the same state.

To understand how CRDTs achieve this, we first need to step back. We need to talk about concurrent operations and what coordination means in a distributed system. Let’s take it step by step.

What does it mean for operations to be concurrent? Our first intuition might be to say they happen at the same time. That’s not quite right. Here’s a counterargument based on a collaborative editing example:

1. While on a plane, Alice connects to a document and makes an offline change to a sentence.
2. An hour later, Bob connects to the same document and edits the very same sentence, but online.

Later, when Alice lands, both versions have to sync. The two edits (1. and 2.) were separated by an hour. They didn’t happen at the same time, yet they are concurrent.

So what’s a better definition for concurrent operations? Two operations that are not causally related. In the previous example, neither operation was made with knowledge of the other. They are not causally related, which makes them concurrent. Yet, if Bob had first seen Alice’s update and then made his own, his edit would depend on hers. In that case, the two operations wouldn’t be concurrent anymore.

We should also understand concurrent ≠ conflict:

If Alice fixes a missing letter in a word while Bob removes the whole word, that’s a conflict.
If Alice edits one sentence while Bob edits another, that’s not a conflict.

Concurrency is about independence in knowledge.
Conflict is about whether the effects of operations collide.

Now, let’s talk about coordination in distributed systems. Imagine a database with two nodes, node 1 and node 2. A bunch of clients connect to it. Sometimes requests go to node 1, sometimes to node 2. Let’s say two clients send concurrent and conflicting operations: In this case, we can’t have node 1 storing $200 while node 2 stores -$100. That would be a consistency violation with the two nodes disagreeing on Alice’s balance. Instead, both nodes need to agree on a shared value. To do that, they have to communicate and decide on one of the following:

Reject both operations
Accept client A’s update and set the balance to $200
Accept client B’s update and set the balance to -$100

The very action of nodes communicating and, if needed, waiting to agree on a single outcome is called coordination. Coordination is one way to keep replicas consistent under concurrent operations. But coordination is not the only way. That’s where CRDTs come in.

CRDT stands for Conflict-Free Replicated Data Types. In short, CRDTs are data structures built so that nodes can accept local updates independently and concurrently, without the need for coordination. If you read our recent post on availability models, you might notice we’re now in the territory of total availability: a system is totally available if every non-faulty node can execute any operation. Total availability comes with weaker consistency. For CRDTs, the consistency guarantee is called Strong Eventual Consistency (SEC). For that, CRDTs rely on a deterministic conflict resolution algorithm. Because every node applies the same rules, all replicas are guaranteed to eventually converge to the same state.

Let’s make this more concrete with a classic CRDT: the G-Counter (Grow-Only Counter). Imagine a database with two nodes tracking the number of likes on a post.
Node 1 receives a new like, increments its counter, and replies success to the client: Then, node 1 communicates with node 2 to send this update: Ultimately, both nodes converge to the same value: 6.

How does the conflict resolution work for a G-Counter? Each replica keeps a vector of counters, with one slot per node. In our example, the total number of likes is 5. Let’s say node 1 has seen 2 likes and node 2 has seen 3 likes. So the initial state is the following: When node 1 receives a new like, it only increments its own slot. Node 2 is now temporarily out of sync: During synchronization, both nodes merge their vectors by taking the element-wise maximum: Now both replicas converge to the same state: The beauty of this algorithm is that it’s deterministic and order-independent. No matter when or how often the nodes sync, they always end up with the same state.

NOTE: Do you know Gossip Glomers? It’s a series of distributed systems challenges we briefly introduced in an earlier post. Challenge 4 is to build a Grow-Only Counter. It’s worth checking out if you haven’t already.

CRDTs can also be combined to make a more complex CRDT. For example, if we want to track both likes and dislikes, we can use two G-Counters together. This data type is called a PN-Counter (Positive-Negative Counter). Imagine two clients act concurrently on the same post: one likes it, another dislikes it. The nodes exchange their updates and converge to the same value: In the case of a PN-Counter, the conflict resolution algorithm is similar to the G-Counter. The difference is that it involves not one but two vectors: one for increases and one for decreases. Assume an initial state where node 1 has received 2 likes and 0 dislikes, and node 2 has received 3 likes and 0 dislikes: Now, suppose node 1 receives a new like and node 2 receives a dislike.
Before the sync, the state is the following: When the replicas exchange their state, the merge rule is element-wise maximum for each vector: After sync, both nodes converge to: The final counter of likes is:

Let’s pause for a second. Based on what we’ve discussed, can you think of some use cases for CRDTs? A data structure where nodes are updated independently, concurrently, without coordination, and still guarantees that they converge to the same state?

One main use case is collaborative and offline-first systems. For example, Notion, a collaborative workspace, recently introduced a feature that lets people edit the same content offline. They rely on CRDTs, and more specifically on Peritext, a CRDT for rich-text collaboration co-authored by multiple people, including .

Another big use case is totally available systems that put availability ahead of strong consistency. As we’ve seen, nodes don’t need to coordinate before acknowledging a client request, which makes the system more highly available. Take Redis, for example. It can be configured in an active-active architecture with geographically distributed datacenters. Clients connect to their closest cluster and get local latencies without waiting for coordination across distant regions. And yes, this setup is built on CRDTs.

We could also think about other applications for CRDTs, like:

Edge & IoT: Devices update offline and merge later without a central server.
Peer-to-peer: Peers share changes directly and match up when they reconnect.
CDN/edge state: Keep preferences, drafts, or counters near users and sync to the origin later.

There are two main types of CRDTs:

State-based CRDTs: Convergence happens by propagating the full state.
Operation-based CRDTs: Convergence happens by propagating the update operations.

In the previous examples, we looked at two state-based CRDTs: the G-Counter (Grow-Only Counter) and the PN-Counter (Positive-Negative Counter).
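The G-Counter and PN-Counter walkthroughs above can be sketched in a few lines of Python. This is a minimal illustration with assumed class and method names, not a production CRDT library; the merge is element-wise max, which is commutative, associative, and idempotent.

```python
class GCounter:
    """Grow-Only Counter: one slot per node, merge by element-wise max."""
    def __init__(self, node_id: int, n_nodes: int):
        self.node_id = node_id
        self.slots = [0] * n_nodes

    def increment(self):
        self.slots[self.node_id] += 1   # a node only ever touches its own slot

    def merge(self, other: "GCounter"):
        self.slots = [max(a, b) for a, b in zip(self.slots, other.slots)]

    def value(self) -> int:
        return sum(self.slots)

class PNCounter:
    """Two G-Counters: one for increments (likes), one for decrements (dislikes)."""
    def __init__(self, node_id: int, n_nodes: int):
        self.p = GCounter(node_id, n_nodes)
        self.n = GCounter(node_id, n_nodes)

    def like(self): self.p.increment()
    def dislike(self): self.n.increment()

    def merge(self, other: "PNCounter"):
        self.p.merge(other.p)
        self.n.merge(other.n)

    def value(self) -> int:
        return self.p.value() - self.n.value()

# G-Counter walkthrough: node 1 has seen 2 likes, node 2 has seen 3 (total 5).
n1, n2 = GCounter(0, 2), GCounter(1, 2)
n1.slots, n2.slots = [2, 3], [2, 3]
n1.increment()                  # node 1 receives a new like -> [3, 3]
n2.merge(n1); n1.merge(n2)      # sync in any order
assert n1.slots == n2.slots == [3, 3]
assert n1.value() == 6

# PN-Counter walkthrough: a concurrent like on node 1 and dislike on node 2.
p1, p2 = PNCounter(0, 2), PNCounter(1, 2)
p1.p.slots, p2.p.slots = [2, 3], [2, 3]
p1.like(); p2.dislike()
p1.merge(p2); p2.merge(p1)
assert p1.value() == p2.value() == 5    # 6 likes - 1 dislike
```

Because merging is idempotent and order-independent, replicas can exchange state as often as they like, in any order, and still land on the same totals.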
In both cases, what was exchanged between the nodes was the entire state. For example, node 1 could tell node 2 that its total number of likes is 3. With state-based CRDTs, states are merged with a function that must be:

Commutative: We can merge in any order and get the same result.
Idempotent: Merging something with itself doesn’t change it.
Associative: We can merge in any grouping and get the same result.

Each synchronization monotonically increases the internal state. In other words, when two replicas sync, the state can only move forward, never backward. This is enforced by a simple “can’t-go-backwards” rule (a partial order), where merges use operations like max for numbers (as we’ve seen) or union for sets.

In operation-based CRDTs, nodes share the operations rather than the full state. Convergence relies on three properties:

Commutativity of concurrent operations
Causality: Either carried in the operations’ metadata (for example, vector clocks) or guaranteed by the transport layer through causal delivery
Duplicate tolerance: Handled by idempotent operations, unique operation IDs with deduplication, or a transport layer that guarantees no duplicates

One example of an operation-based CRDT is the LWW-Register (Last-Writer-Wins Register), which stores a single value. Updates are resolved using a logical timestamp (such as Lamport clocks) along with a tie-breaker like the node ID. When a node writes a value, it broadcasts an operation . On receiving it, a node applies the update if the pair is greater than the one it currently holds.

To summarize:

State-based CRDTs:
Convergence is guaranteed because merging states is associative, commutative, and idempotent.
Don’t require assumptions on the delivery layer beyond eventual delivery.
Simpler to reason about.
Exchanging full states can be more bandwidth-intensive.

Operation-based CRDTs:
More bandwidth-efficient; we only send the operations, not the whole state.
Correctness usually depends on having causal order (or encoding causality in the ops) and tolerating duplicates via idempotence/dedup.
More complex to implement (causal broadcast, vector clocks, or equivalent).

For completeness, there’s also a third type we should be aware of: delta-based CRDTs. Here, convergence is achieved by sending and merging fragments of state (deltas) rather than the entire state. A quick analogy to picture the differences:

State-based CRDT: “From time to time, send me the whole document.”
Operation-based CRDT: “When you make a change, tell me exactly what you did.” → “Adding word `miles` at position 42.”
Delta-based CRDT: “When you make a change, send me just the delta that reflects it (for example, the updated sentence).” → “And miles to go before I sleep.”

We talked about collaborative document editing. So you might assume a system like Google Docs is based on CRDTs, right? Well, that’s not the case. Google Docs is based on another concept called OT (Operational Transformation). The goal of OT and CRDT is the same: convergence among all nodes in a collaborative system. The main difference is that OT requires all communication to go through the same server: We haven’t mentioned it until now (on purpose), but with CRDTs, there’s no need for a central server to achieve convergence.

Back to our collaborative editing tool: if Alice and Bob are both offline but manage to connect their laptops directly, they could still achieve convergence without talking to a central server: As we saw earlier, CRDTs embed a deterministic conflict resolution algorithm. The data type itself ensures convergence. That’s the key difference: CRDTs don’t need to make any assumptions about the network topology or about a central server. considers CRDT to be the natural successor of OT.

NOTE: So, why is Google Docs still based on OT? Historical reasons. Google Docs was launched before CRDTs existed, and it still works really well.
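The LWW-Register described earlier can also be sketched concretely. This is a minimal Python illustration with assumed names: each write carries a (timestamp, node_id) pair, and a replica applies an incoming operation only if that pair is greater than the one it currently holds, which makes delivery order and duplicates harmless.

```python
class LWWRegister:
    """Last-Writer-Wins Register: operation-based, (timestamp, node_id) tie-break."""
    def __init__(self, node_id: int):
        self.node_id = node_id
        self.value = None
        self.stamp = (0, node_id)    # (logical timestamp, node id)

    def write(self, value, timestamp: int):
        op = (value, (timestamp, self.node_id))
        self.apply(op)
        return op                    # this operation is what gets broadcast

    def apply(self, op):
        value, stamp = op
        if stamp > self.stamp:       # tuples compare lexicographically in Python
            self.value, self.stamp = value, stamp

a, b = LWWRegister(node_id=1), LWWRegister(node_id=2)
op1 = a.write("draft", timestamp=1)
op2 = b.write("final", timestamp=2)

# Deliver the operations in different orders; both replicas still converge.
b.apply(op1); a.apply(op2)
assert a.value == b.value == "final"

# Duplicate delivery is harmless: applying a stale or repeated op is a no-op.
a.apply(op1); a.apply(op2)
assert a.value == "final"
```

The strict greater-than comparison is what provides duplicate tolerance here; causal delivery would still be needed for registers whose operations don't commute this cleanly.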
There’s no practical reason for Google to migrate from OT to CRDT, despite some discussions about it in the past.

Operations are concurrent when they aren’t causally related; concurrency doesn’t automatically mean conflict. Coordination is when replicas communicate and, if needed, wait to agree on a single outcome for concurrent updates before acknowledging clients, so they don’t diverge. CRDTs accept independent updates on each replica and still converge via deterministic merge rules. Three types: state-based (share full state), operation-based (share operations), delta-based (share just the changed parts). CRDTs are a great fit for systems like offline-first collaboration and highly available systems. Unlike OT, CRDTs don’t rely on a central server to reach the same result everywhere.

Exploring Database Isolation Levels
Safety and Liveness
Ivan Zhao (Notion’s CEO) tweet on the new Notion offline collaboration feature
Diving into Conflict-Free Replicated Data Types (CRDTs) - Redis
CRDTs: The Hard Parts by Hacker News discussion
Peritext - A CRDT for Rich-Text Collaboration
Active-Active geo-distribution (CRDTS-based) - Redis
Bartosz Sypytkowski’s 12-part blog series on CRDT

❤️ If you enjoyed this post, please hit the like button.

💬 Have you worked with CRDTs before, or do you see another use case where they shine? Share your thoughts in the comments! Leave a comment
What makes them special is that they allow updates to happen independently on different nodes while still guaranteeing that all replicas eventually converge to the same state. To understand how CRDTs achieve this, we first need to step back. We need to talk about concurrent operations and what coordination means in a distributed system. Let’s take it step by step. Concurrent Operations What does concurrent operations mean? Our first intuition might be to say they happen at the same time. That’s not quite right. Here’s a counterargument based on a collaborative editing example. While on a plane, Alice connects to a document and makes an offline change to a sentence. An hour later, Bob connects to the same document and edits the very same sentence, but online. Later, when Alice lands, both versions have to sync. If Alice fixes a missing letter in a word while Bob removes the whole word, that’s a conflict. If Alice edits one sentence while Bob edits another, that’s not a conflict. In this case, we can’t have node 1 storing $200 while node 2 stores -$100. That would be a consistency violation with the two nodes disagreeing on Alice’s balance. Instead, both nodes need to agree on a shared value. To do that, they have to communicate and decide on one of the following: Reject both operations Accept client A’s update and set the balance to $200 Accept client B’s update and set the balance to -$100 Then, node 1 communicates with node 2 to send this update: Ultimately, both nodes converge to the same value: 6. How does the conflict resolution work for a G-Counter? Each replica keeps a vector of counters, with one slot per node. In our example, the total number of likes is 5. Let’s say node 1 has seen 2 likes and node 2 has seen 3 likes. So the initial state is the following: When node 1 receives a new like, it only increments its own slot. 
Node 2 is now temporarily out of sync: node 1 holds [3, 3] while node 2 still holds [2, 3]. During synchronization, both nodes merge their vectors by taking the element-wise maximum. Both replicas then converge to the same state, [3, 3], for a total of 6 likes. The beauty of this algorithm is that it’s deterministic and order-independent. No matter when or how often the nodes sync, they always end up with the same state.

NOTE: Do you know Gossip Glomers? It’s a series of distributed systems challenges we briefly introduced in an earlier post. Challenge 4 is to build a Grow-Only Counter. It’s worth checking out if you haven’t already.

PN-Counter

Simple CRDTs can also be combined to build more complex ones. For example, if we want to track both likes and dislikes, we can use two G-Counters together. This data type is called a PN-Counter (Positive-Negative Counter). Imagine two clients act concurrently on the same post: one likes it, another dislikes it. The nodes exchange their updates and converge to the same value.

In the case of a PN-Counter, the conflict resolution is similar to the G-Counter’s. The difference is that it involves not one but two vectors: one for increases and one for decreases. Assume an initial state where node 1 has received 2 likes and 0 dislikes, and node 2 has received 3 likes and 0 dislikes: on both replicas, the likes vector is [2, 3] and the dislikes vector is [0, 0]. Now, suppose node 1 receives a new like and node 2 receives a dislike. Before the sync, node 1 holds likes [3, 3] with dislikes [0, 0], while node 2 holds likes [2, 3] with dislikes [0, 1]. When the replicas exchange their state, the merge rule is element-wise maximum for each vector. After the sync, both nodes converge to likes [3, 3] and dislikes [0, 1]. The final counter of likes is 6 − 1 = 5.

Use Cases

Let’s pause for a second. Based on what we’ve discussed, can you think of some use cases for CRDTs? A data structure where nodes are updated independently, concurrently, without coordination, and that still guarantees they converge to the same state?

One main use case is collaborative and offline-first systems. For example, Notion, a collaborative workspace, recently introduced a feature that lets people edit the same content offline.
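To make the two counters concrete, here is a minimal Python sketch (the class names, node IDs, and dict-of-slots representation are illustrative choices for this post, not a reference implementation):

```python
class GCounter:
    """Grow-only counter: one slot per node, merge = element-wise maximum."""

    def __init__(self, node_id, nodes):
        self.node_id = node_id
        self.slots = {n: 0 for n in nodes}

    def increment(self):
        # A node only ever increments its own slot.
        self.slots[self.node_id] += 1

    def merge(self, other):
        # Deterministic and order-independent: element-wise maximum.
        for n, v in other.slots.items():
            self.slots[n] = max(self.slots.get(n, 0), v)

    def value(self):
        return sum(self.slots.values())


class PNCounter:
    """Positive-Negative counter: two G-Counters, one for increments, one for decrements."""

    def __init__(self, node_id, nodes):
        self.inc = GCounter(node_id, nodes)
        self.dec = GCounter(node_id, nodes)

    def increment(self):
        self.inc.increment()

    def decrement(self):
        self.dec.increment()

    def merge(self, other):
        self.inc.merge(other.inc)
        self.dec.merge(other.dec)

    def value(self):
        return self.inc.value() - self.dec.value()


# The likes example from the post: node 1 has seen 2 likes, node 2 has seen 3.
n1 = GCounter("n1", ["n1", "n2"])
n2 = GCounter("n2", ["n1", "n2"])
n1.slots = {"n1": 2, "n2": 3}  # state already replicated on both nodes
n2.slots = {"n1": 2, "n2": 3}

n1.increment()  # node 1 receives a new like: its vector becomes {"n1": 3, "n2": 3}
n1.merge(n2)    # sync in both directions
n2.merge(n1)
print(n1.value(), n2.value())  # → 6 6
```

Because the merge is an element-wise maximum, applying it twice, or in either direction first, yields the same final state, which is exactly why the nodes can sync whenever they happen to reconnect.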
They rely on CRDTs, and more specifically on Peritext, a CRDT for rich-text collaboration co-authored by multiple people.

Another big use case is highly available systems that put availability ahead of strong consistency. As we’ve seen, nodes don’t need to coordinate before acknowledging a client request, which makes the system more highly available. Take Redis, for example. It can be configured in an active-active architecture with geographically distributed datacenters. Clients connect to their closest cluster and get local latencies without waiting for coordination across distant regions. And yes, this setup is built on CRDTs.

We could also think about other applications for CRDTs, like:

- Edge & IoT: Devices update offline and merge later without a central server.
- Peer-to-peer: Peers share changes directly and sync up when they reconnect.
- CDN/edge state: Keep preferences, drafts, or counters near users and sync to the origin later.

How do CRDTs guarantee convergence? There are two main families:

- State-based CRDTs: Convergence happens by propagating the full state.
- Operation-based CRDTs: Convergence happens by propagating the update operations.

For state-based CRDTs, the merge function must satisfy three properties:

- Commutative: We can merge in any order and get the same result.
- Idempotent: Merging something with itself doesn’t change it.
- Associative: We can merge in any grouping and get the same result.

Operation-based CRDTs rely instead on:

- Commutativity of concurrent operations
- Causality: Either carried in the operations’ metadata (for example, vector clocks) or guaranteed by the transport layer through causal delivery
- Duplicate tolerance: Handled by idempotent operations, unique operation IDs with deduplication, or a transport layer that guarantees no duplicates

To summarize state-based CRDTs:

- Convergence is guaranteed because merging states is associative, commutative, and idempotent.
- They don’t require assumptions on the delivery layer beyond eventual delivery.
- Simpler to reason about.
- Exchanging full states can be more bandwidth-intensive.
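The three algebraic properties of the state-based merge can be checked directly on a toy example. The sketch below uses plain dicts standing in for G-Counter state (an illustrative representation, assuming the element-wise-maximum merge described above):

```python
def merge(a, b):
    """Merge two counter states (dicts of node -> count) by element-wise maximum."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}

a = {"n1": 3, "n2": 1}
b = {"n1": 2, "n2": 4}
c = {"n1": 5, "n2": 0}

# Commutative: merging in any order gives the same result.
assert merge(a, b) == merge(b, a)

# Associative: merging in any grouping gives the same result.
assert merge(merge(a, b), c) == merge(a, merge(b, c))

# Idempotent: merging a state with itself changes nothing,
# so re-delivered or duplicated states are harmless.
assert merge(a, a) == a

print(sorted(merge(a, b).items()))  # → [('n1', 3), ('n2', 4)]
```

Together, these properties mean replicas can exchange states in any order, in any grouping, and any number of times, and still converge.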
To summarize operation-based CRDTs:

- More bandwidth-efficient; we only send the operations, not the whole state.
- Correctness usually depends on having causal order (or encoding causality in the ops) and tolerating duplicates via idempotence/dedup.
- More complex to implement (causal broadcast, vector clocks, or equivalent).

There is also a third variant, delta-based CRDTs. An analogy with a collaborative document sums up the three families:

- State-based CRDT: “From time to time, send me the whole document.”
- Operation-based CRDT: “When you make a change, tell me exactly what you did.” → “Adding word `miles` at position 42.”
- Delta-based CRDT: “When you make a change, send me just the delta that reflects it (for example, the updated sentence).” → “And miles to go before I sleep.”

We haven’t mentioned it until now (on purpose), but with CRDTs, there’s no need for a central server to achieve convergence. Back to our collaborative editing tool: if Alice and Bob are both offline but manage to connect their laptops directly, they could still achieve convergence without talking to a central server. As we saw earlier, CRDTs embed a deterministic conflict resolution algorithm. The data type itself ensures convergence. That’s the key difference with OT (Operational Transformation), the approach Google Docs is built on: CRDTs don’t need to make any assumptions about the network topology or about a central server. Some even consider CRDTs to be the natural successor of OT.

NOTE: So, why is Google Docs still based on OT? Historical reasons. Google Docs was launched before CRDTs existed, and it still works really well. There’s no practical reason for Google to migrate from OT to CRDT, despite some discussions about it in the past.

Conclusion

- Operations are concurrent when they aren’t causally related; concurrency doesn’t automatically mean conflict.
- Coordination is when replicas communicate and, if needed, wait to agree on a single outcome for concurrent updates before acknowledging clients, so they don’t diverge.
- CRDTs accept independent updates on each replica and still converge via deterministic merge rules.
- Three types: state-based (share full state), operation-based (share operations), delta-based (share just the changed parts).
- CRDTs are a great fit for systems like offline-first collaboration and highly available systems. Unlike OT, CRDTs don’t rely on a central server to reach the same result everywhere.

Resources:

- Exploring Database Isolation Levels
- Safety and Liveness
- Ivan Zhao (Notion’s CEO) tweet on the new Notion offline collaboration feature
- Diving into Conflict-Free Replicated Data Types (CRDTs) - Redis
- CRDTs: The Hard Parts
- Hacker News discussion
- Peritext - A CRDT for Rich-Text Collaboration
- Active-Active geo-distribution (CRDTs-based) - Redis
- Bartosz Sypytkowski’s 12-part blog series on CRDT

The Coder Cafe 1 month ago

The Story of The Coder Cafe

☕ Welcome to The Coder Cafe! This week marks the first anniversary of the newsletter. To celebrate, I will share its story. Get cozy, grab a coffee, and let’s begin!

Origins

We were in July 2024. It was a warm weekend, and I was lying in bed thinking about my next “big thing”. A few years earlier, I had finished writing my book. It was an exhausting experience, but I finally felt ready for another challenge. All of a sudden, I got an idea. What if I launched… a podcast?

I jumped out of bed, searched for a book on podcasts, bought one, and started reading about all that needs to be known: the different formats, how to find an audience, whether to invite guests, and so on. I even had the perfect name. As a lover of the cozy atmosphere of coffee shops, I wanted my podcast to reflect that same warm ambience. The name would be The Coder Cafe.

But a few days later, the excitement faded. Did I really want to make a podcast after all? I had a few concepts in mind, but none of them truly clicked. Eventually, I decided to drop the idea. Yet the name The Coder Cafe stuck with me. I checked the thecoder.cafe domain, and it was available. Every great story starts with a domain name. Let’s buy it!

Now I had a domain name, but I still didn’t know what to do with it. Around that time, I started reading a lot of newsletters. One in particular inspired me with the quality and regularity of its content. I even launched one called Go Engineer, which I quickly stopped. Thanks to my book, I already had a Go audience. But deep down, I didn’t want to write only about Go anymore. I didn’t want to be tied to a single language when there are so many areas I’m passionate about: code health, testing, distributed systems, reliability, observability, performance, and more.

At some point, the two ideas converged. I would create a newsletter, and it would be called The Coder Cafe. It wouldn’t be tied to one language.
It would be a place where any software engineer could find something useful.

I already had experience with online writing. My Medium blog had more than 4k followers. But since my book, I hadn’t really written there. I needed a fresh start and a chance to relearn how to write online. Writing a book and writing online are two very different activities. So I ditched my book on podcasts and bought another one: The Art and Business of Online Writing. If you don’t know this book, I really recommend it. Two principles in particular stuck with me:

- Volume matters. The most popular blogs and newsletters publish often.
- Timeless topics matter. Daily news about which stock to buy has volume but no staying power. Timeless content always wins.

These two ideas would shape my newsletter: I wanted to write daily (volume) and focus on fundamental concepts (timeless). I started drafting the newsletter description: Feeling overwhelmed by the endless stream of tech content? At The Coder Cafe, we serve timeless concepts with your coffee, every day.

At first, I planned to publish five posts per week on different topics. But after talking with my girlfriend, she suggested I group posts by theme. For example, one week on caching, another on testing, etc. I loved the idea. She also recommended writing a recap post at the end of the week as a way to reinforce learning. I loved this idea even more. I wasn’t sure people wanted to read my content during the weekend, though, so I refined the plan:

- Four posts from Monday to Thursday, each on one core concept.
- A recap on Friday, to reinforce the week’s lessons.

By mid-August, the concept was finalized. It was time to write some posts and find a good logo! I bought a paid subscription to Canva and spent my whole weekend creating different logo variations. None of them really convinced me, though.
When I was about to design yet another version, this time a logo that repeats inside itself a few times to hint at recursion, I suddenly became aware of my lack of artistic skills. I decided to delegate the work, heading to Fiverr to find a freelancer. That’s how I met a very talented artist, Eli Huynh. She even worked on the Attack on Titan anime, can you believe it? I asked her to craft a coffee shop logo, and I slipped in some personal touches I wanted to see hidden in the design:

- An illustration of Designing Data-Intensive Applications, my favorite computer science book.
- Moby Dock, the Docker mascot (I worked at Docker and loved my time there!)
- A Docker command on the coffee machine.

She produced a beautiful masterpiece. I shared it with different people, and the feedback was unanimous: the drawing was stunning, but… it was not really a logo. Nothing against the artist, obviously; I was the one giving the requirements, but indeed, it wasn’t a logo. So I moved this illustration to the About page and kept searching for another freelancer.

After a few failures, I started working with someone who felt promising. After many back-and-forths on his first attempt, we converged on a version that I loved instantly when I received it. The colors felt warm and cozy. It captured the atmosphere I had in mind. Sometimes you don’t need outside validation, you just know. This logo would be The Coder Cafe’s identity.

In the end, I spent dozens of hours on this logo quest. You might think it was absurd, especially since I still hadn’t written a single post. But for me, visual identity had to come first. It’s like opening a restaurant: before designing the menu, you work on the atmosphere and decoration. Is it really absurd? I don’t think so.

The same applied to the tagline. I tried dozens of variations:

- One daily concept.
- Learn daily, grow deeply.
- A timeless concept with your coffee.
- Brewed daily.
Eventually, I chose this one: One concept with your coffee. It captures exactly what I want: concepts, a cozy coffee-shop vibe, and a short, memorable line that feels right at home in The Coder Cafe 1.

Yes, initially, my idea was to create a paid newsletter. I even shared it with an ex-colleague who gave me blunt feedback: I won’t give you my money; you already work at Google. From the outside, it may sound harsh, but I didn’t take it negatively. It actually made me reflect. Why did I want to create a paid newsletter in the first place?

One of the things I enjoy most in life is learning and teaching (whether through writing, speaking, or any other format). If I could one day make a living from these two activities, that would be a dream job. It’s that simple. I didn’t create The Coder Cafe because I needed more money (of course, Google pays well). I made it because, in my dreams, it could eventually become my main activity. So, starting as a paid newsletter felt natural at the time. It was my way of committing fully to the project and testing whether The Coder Cafe could be more than just a side experiment.

By the end of August, it was time to start creating content. Publishing one post a day is a lot. I did not want to feel pressured every evening to write for the next day, so I decided to build a buffer of posts. Throughout September, I wrote 25 posts plus their recaps, enough for six weeks of content. That felt safe.

On October 7, 2024, The Coder Cafe newsletter was ready for launch, and its first post went live 🎉. To organize free and paid content, my idea was simple. All posts during the first four weeks would be free. The goal was to build an audience and show that my writing was worth paying for. After that period, I would keep one post free per week and place the others behind a paywall. The free post would act as a sample for new readers.

The first week showed decent traction. I reached 120 free subscribers and 8 paid subscribers.
The number of free subs was modest, but the paid-to-free ratio was good. Yet, to be honest, many of the early paid subscribers were friends or people who wanted to support me. Very few were convinced by the paid value proposition yet.

The second week was better. One post reached the front page of Hacker News and passed 25k views. In a single day, subscriptions jumped to 291 free and 11 paid. That is when I saw the real impact of Hacker News.

However, barely two weeks after launch, my girlfriend and I learned something that would have a massive impact on my daily newsletter: we were going to have a baby. Balancing my job at Google, my girlfriend, some social life, and a daily newsletter was already challenging. With a newborn on the way, it became absolutely impossible. I was the happiest man alive, but my daily newsletter concept was already dead.

Marketing

A quick pause to talk about marketing. I’m not going to overwhelm you with my “massive” marketing campaign (aka one post on LinkedIn and one post on X). But one day, I came up with a fun idea. What if I hid a coupon for lifetime access to the newsletter somewhere on the thecoder.cafe website? Developers love puzzles. Maybe this could go viral?

I started by hiding a tiny URL in the illustration designed by Eli Huynh. When pasted in a browser, this URL pointed to a Gist containing some JavaScript code. Running the code displayed a Base64 string. Decoding it three times gave another URL. Visiting that link opened a blank page that printed yet another URL in the browser console. That new URL pointed to an SVG file of a blue circle. Opening the SVG’s source revealed a hidden message with a free coupon.

So, how many people found it? How many even reached the very first URL hidden in the image? Get ready for my next book on marketing! (Credits to the original book)

Stopping The Coder Cafe?

So, I knew I would eventually have to stop publishing daily because I would run out of time.
It broke my heart for one particular reason. With free subscribers, I can explore topics. If they do not like a post, they skip it. With paying subscribers, though, I felt accountable. People spend their own money on my content, and I cannot disappoint them. The daily newsletter was a contract between them and me, and I was about to break it.

In December, my buffer was shrinking fast, and I started to think seriously about stopping everything and refunding all paid subscriptions. Then something unexpected happened. It was close to Christmas, and the post of the day was on TDD. I was at my parents’ home when I received an email notification: a new paid subscriber had joined. And not just anyone: Kent Beck himself. Funny enough, I was not even praising TDD. I said that while it makes sense in some contexts, I do not really use it.

A few days later, Kent left a comment on one of my posts. Looking back, that comment changed everything for me. It gave me confidence in myself and in my writing. If someone like Kent Beck found value in what I was doing, then maybe there was something worth continuing.

Right after that, I took the bull by the horns and published a post titled Stepping Back to Move Forward. I explained that I would stop paid subscriptions, and I also pointed out a deeper issue with my daily format. As I explained, each week had a theme, for example, unit tests. Due to limited writing time, the daily posts were not very long. Instead of writing one in-depth article, I split the topic into four posts. Since only one of them was free, that was the one I shared on platforms like Reddit. The feedback I got was often that it lacked depth.

I could have argued online: “But the depth is spread across four posts. If you read the whole series, you’ll see it. You just have to become a paid subscriber. Blah blah blah.” Let’s be honest. Nobody would have cared. Readers judged the content they saw. If it looked shallow, they were right, period.
This made it hard to attract new people, and apart from the post that went viral, most of my content stayed relatively anonymous.

One week later, I sent a private email to all paid subscribers, explaining that I was stopping paid subscriptions and issuing refunds. On January 3rd, I had 747 free subscribers and went from 15 to 0 paid. From that moment on, The Coder Cafe became about enjoying the writing without pressure to deliver and taking the time to delve more into each topic.

What Followed

In March, I decided to stop blogging on Medium and fully switch to Substack. I feel at home here. I also wondered what would happen if I wrote more personal reflections and stories rather than concepts. So, I created a new section called Lattes & Stories (this post is in this section). I really enjoy writing in a more storytelling style.

In April, I reached the 1,000 subscriber mark. To celebrate, I organized a coding challenge and gathered around $1,000 worth of prizes from sponsors including Keychron, JetBrains, and O’Reilly. Thanks again to the sponsors! It was a lot of fun to run. Maybe I will do another one later. $10,000 of prizes for 10,000 subs? We can still dream.

Also in April, the idea of building a community around The Coder Cafe started to emerge. I wanted to create a sense of belonging, where members feel that they matter to one another and to the group. So, I created a Discord server. I’m still exploring options to spark engagement. Join the Discord community

In May, something surreal happened: the very newsletter that inspired me recommended The Coder Cafe 2.

In September, I converted the content of year one into a 260-page book, available on Leanpub. Also in September, I enabled sponsorships to explore partnerships with companies interested in supporting The Coder Cafe. If you would like to partner on a post, you can learn more here.

Some stats after one year: The Coder Cafe reached more than 3,600 subscribers across 119 countries. I wrote 78 posts and collected 309,432 views.
Two posts stood out and together represent about one-third of total views:

- So, I Wrote a Book: The Story Behind 100 Go Mistakes and How to Avoid Them. I started writing this as soon as I finished my book, but it took three years to get the perspective to complete it. This post has a special meaning for me.
- Working on Complex Systems: What I Learned Working at Google. A deep dive into complex systems, written during a holiday with my brother in a city life retreat in the middle of Sweden. That trip also gave the post a special meaning. Writing from there with that view was unforgettable.

Conclusion and Future

So, is The Coder Cafe a success story? Financially, no. Because Stripe took fees on subscriptions, I refunded more than I received, so the balance is even negative 😅. But in the end, after one year, money wasn’t the measure of success. Positive feedback, helping readers, and rediscovering the joy of sharing stuff mattered more. It also pushed me to explore many topics across tech and non-tech. I haven’t let go of the dream of making this my living, just not yet. My priority now is reaching more people and building a stronger community. We will see where that journey leads.

A glimpse into the future. I will continue to write about concepts and occasionally share stories. I will also launch a new section where we will pick a system and build it from scratch, step by step, week after week. I am currently collaborating with a company to craft the content of the first series. I am looking forward to releasing it. Speaking of collaboration, I would love to explore this aspect, whether it’s partnering with companies, inviting other writers 3, or featuring people with a worthy story. We will see.

🫶 I love sharing, and you give me an audience. Thank you for that. See you next week for a (long) post on CRDTs.
So, I Wrote a Book
Why I Switched to Vim Keybindings
What I Learned During My Paternity Leave
The Art and Business of Online Writing

1. I will revisit this tagline months later to Learn one concept with your coffee.
2. Substack recommendations are a feature where one newsletter can endorse another, so its readers get suggested to subscribe.
3. I haven’t yet formalized the process, but if you’re interested in contributing as a guest writer, let’s discuss.

The Coder Cafe 1 month ago

Announcing The Coder Cafe Season 1 (Book)

🔔 This post is in the Announcements section, where we share news and updates related to The Coder Cafe. Notifications for each section can be configured in your settings . TL;DR We turned year one of The Coder Cafe into a 260-page book. Published on Leanpub: DRM-free EPUB/PDF. Works on Kindle, Kobo, iPad: read it anywhere. Pay what you want, min $4.90. Buying a copy is a way to support The Coder Cafe . Get the book Behind the Book Today marks the first anniversary of The Coder Cafe newsletter 🥳. To celebrate, I gathered the core concepts we explored this year into one book. I just published The Coder Cafe Season 1: Timeless Concepts for Software Engineers on Leanpub . If you’re unfamiliar with Leanpub, it’s a platform for DRM-free EPUB/PDF books with pay-what-you-want pricing and free updates. You can set your price (min $4.90) and read it on Kindle, Kobo, iPad, or e-reader/app. Drawn from the first year of the newsletter, it’s a single, carefully sequenced journey. Read sequentially, or jump to the concept you need. I’ve also included a special bonus with the book: my personal algorithms & data structures Anki deck, the main resource I used to prepare for the Google SWE interviews. Buying the book helps support The Coder Cafe into year two and some even more ambitious projects. Get the book

The Coder Cafe 2 months ago

Organic Growth vs. Controlled Growth

☕ Welcome to The Coder Cafe! Today, we will discuss the concept of organic growth vs. controlled growth. Get cozy, grab a coffee, and let’s begin! A few months ago, I was in a meeting discussing the state of a codebase and heard myself say: This codebase has grown organically. After reflecting on my own words, I began questioning myself: I don’t even know what organic growth really means and whether it is ultimately positive or negative. Discussing the concept with various colleagues, I noticed we had different interpretations. I also created a poll on X that went viral (# irony ), showing that most people see organic growth as something positive. If Perplexity is correct, the term organic growth was first used in Bushido: The Soul of Japan , published in 1899. The author, Inazo Nitobe, describes Bushido (aka the way of the samurai) as something that developed through organic growth rather than being the invention of a single person or the result of a single event. In other words, Bushido emerged gradually over decades and centuries as a collective experience, not the product of one individual. It’s easy to draw parallels with the many codebases we maintain at work. The older a codebase is, the more developers it has had, each with their own vision of how the code should look. Differences in perception arise from factors like developers’ varied mental models based on the parts of the code they know best. Developers come and go, making changes, introducing biases about what the future should be, and after years, the codebase reflects hundreds or even thousands of such incremental changes. This is organic growth in software: incremental delivery driven by agile methodology that shapes codebases, often in a bottom-up manner. So, is organic growth good or bad? Let’s use the garden metaphor: imagine a public garden where anyone can plant whatever they want and wherever they want.
Over time, sure, it’s colorful, but there’s little harmony, with a lack of overall guidance or direction: Now imagine we control what can be planted and where, and periodically review and rearrange what’s growing. Our garden will likely look harmonious and well-maintained: In my opinion, organic growth carries a somewhat negative connotation, implying a lack of direction, with systems growing out of ad hoc solutions and quick fixes rather than intentional design. So, how can we move from a messy garden to a harmonious one? By transitioning from organic to controlled growth. Controlled growth is the deliberate and reflective process of evolving a codebase. With controlled growth, we should also embrace progress that is not always perfect but consistently guided by shared patterns and standards, fostering sustainable, manageable, and harmonious development over time. Here are five main rules for controlled growth: Be consistent . Follow agreed conventions and standards to build maintainable and predictable codebases. Many standards aren’t explicit, so strive to follow even the implicit ones. For example, if an entity is consistently named X, don’t be a cowboy and name it Y; stick to X. Plan ahead for large changes. Think strategically and collectively about big transitions such as significant new features. Avoid chaos and technical debt by breaking ambitious changes into smaller phases, communicating plans early, and leveraging documentation assets such as design docs (see Explore Further section). Hold regular retrospectives. After major development milestones, gather the team to review what went well and what could improve. Retrospectives foster a culture of learning and continuous improvement, a strategic pillar of controlled growth. Apply the Boy Scout rule. Frequent changes are inevitable and often rushed under pressure. 
Remember: “Always leave the campground cleaner than you found it.” When making a pull request, spend a little time refactoring small bits, cleaning legacy code, removing dead code, or renaming variables. This keeps the garden harmonious over time. Build a team where everyone shares responsibility and speaks openly. Controlled growth shouldn’t fall only on tech leads; it should be everyone’s job. When all team members feel heard and share responsibility for the health and quality of the codebase, the team grows on a stronger, more sustainable foundation. Tidy First? // We already discussed organic growth in this post. Focus on Product Ideas, Not Requirements Cognitive Load Bushido: The Soul of Japan Design Docs at Google ❤️ If you enjoyed this post, please hit the like button. 💬 When you say a codebase grew organically, what do you mean? Leave a comment

The Coder Cafe 2 months ago

What I Learned During My Paternity Leave

🔕 This post is part of the Lattes & Stories section, where I share personal reflections and stories (not the regular Concepts section). If you want, you can turn off notifications for this section here : Notifications → Disable “Lattes & Stories“. ☕ Welcome to The Coder Cafe! Today is a recap of my paternity leave, focusing on the things I read and learned. Get cozy, grab a coffee, and let’s begin! At Google, we get 18 weeks of paternity leave (yes, that’s amazing). Since the birth of my baby at the end of May, I’ve been out of the office and will be back at work next week. These months were the best time to spend with my newborn, but also a nice chance to read, learn, and try new things. Here are some of the technical and non-technical things I got into. I started reading Code Health Guardian . Let me get straight to the point: it’s one of the best books I’ve ever read on software engineering . Period. Don’t be fooled by the use of “AI“ in the subtitle (“ The Old-New Role of a Human Programmer in the AI Era ”). The focus is on code health. It covers various topics such as complexity, causes of complexity, documentation, interfaces, code discoverability, and functional programming. To me, it felt like a more modern (and better) version of Clean Code. A quote that perfectly summarizes my love for this book: Clever in programming is a compliment, clever in software engineering is an accusation. I strongly recommend this book. Learning Systems Thinking is a book about the concept of systems thinking. What’s systems thinking? Modern software is no longer just isolated applications; it is becoming systems of software. Systems thinking invites us to shift our perspective from focusing on a single piece of software to looking at the larger system it belongs to, and how the parts interact. One interesting idea in the book is the Iceberg model. It suggests that what we see (the events) is only the tip of the iceberg.
Beneath the surface, there are patterns of behavior, deeper systemic structures, and even mental models that shape how the system works: The lesson is that when working with systems, we should move from reacting to events to understanding the patterns and structures that create them, so we can design better long-term solutions . The book was a good introduction. Yet, in retrospect, I would have liked it to give more concrete actions or applied examples. I will need to follow up with another resource to go deeper into practical applications of systems thinking. NOTE : Have you read my post on complex systems? It’s the most-read post of The Coder Cafe. During my leave, I started learning C++. Why? At Google, many systems are developed in C++. One I’ve been involved with is Borg . Because I hadn’t done C/C++ since my studies, every pull request I made was painful. I wanted to improve that. I started with A Tour of C++ by Bjarne Stroustrup. It’s refreshingly short for a C++ book (about 300 pages). That was my entry point. Let’s see what will come next 1 . NOTE : Did you know that Google uses C++ without exceptions? Mostly for performance and maintainability reasons, functions return an 2 , which is similar to Rust’s . At some point, I wanted to delve deeper into distributed systems, but I felt like I had already read most of the well-known books on the topic. What I had completely missed until then was the angle of technical whitepapers. Most of them are more challenging to read than blog posts, but they offer a depth that can’t be matched . I read a few during my leave, including F1: A Distributed SQL Database That Scales and Amazon DynamoDB: A Scalable, Predictably Performant, and Fully Managed NoSQL Database Service . After my paternity leave, I plan to continue exploring more of them as they have become one of my favorite sources of technical insight. For discovery, I used ‘s papershelf . 
The Mom Test is a great book that explores patterns and anti-patterns when discussing business or product ideas with customers. The core idea is this: if we pitch an idea to our mom, she will tell us it’s great, even if it’s not . Customers often do the same. They don’t want to hurt our feelings, so they give polite feedback instead of useful feedback. The solution is to avoid asking opinion-based questions like “ Do you think it’s a good idea? ” Instead, we should ask about real experiences and behaviors, questions that even our mom couldn’t fake. For example: Bad: “ Would you buy a product which did X? ” “ How much would you pay for X? ” Good: “ How are you dealing with it now? ” “ Why do you bother? ” “ Talk me through the last time that happened. ” I haven’t run customer interviews myself, but I’ve participated in some. Interestingly enough, many of the anti-patterns from the book showed up in those meetings. This wasn’t a book with an immediate outcome for me, but it broadened my perspective, and that’s always valuable. The Art of Explanation is a book written by a BBC presenter and journalist. The author describes how to clearly explain any topic, focusing on ten main attributes: Simplicity: Is this the simplest way we can say this? Essential detail: What detail is essential to this explanation? Complexity: If a topic is complex, we can’t dodge the complexities and hope to explain something well. It reminded me of this illustration: Efficiency: Is this the most succinct way we can explain this? Precision: Are we saying exactly what we want to communicate? Context: Are we providing all the necessary context for people to understand the topic? No distractions: Are there any verbal, written or visual distractions? (It reminded me of rule #9 here .) Engaging: Are there times when it’s easy to lose focus? Lots of good things on how to maintain a good flow, such as making sure we move from one sentence to another logically.
Useful: Have we answered the questions that people may have? Clarity of purpose: Above all else, what are we trying to explain? The book gave me a practical checklist I can return to whenever I need to convey something complex . It was also a reminder that being clear isn’t a gift, it’s a skill worth practicing regularly. Made to Stick is a book that focuses on the question of why some ideas have a lasting impact while others don’t. The authors introduce their SUCCESs framework to make ideas stick : Simple: Find the core of an idea. The more we reduce the amount of information in an idea, the stickier it will be. I loved this line: When you say three things, you say nothing. Unexpected: A great way to catch attention is with a surprise. We can’t demand attention; we must attract it, and one of the easiest ways is to convey something unexpected by breaking an existing pattern. Concrete: Our brains are wired to remember concrete data. Credible: The more credible we are, the more an idea will stick. Emotions: A great way to make people care is to convey emotion, to make them feel something. Story: How can we make them feel something? Via stories and great storytelling. Made to Stick helped me learn that ideas don’t just succeed because they are true; they also succeed because they are communicated in a way that people remember . I strongly recommend this book as well. Building a Second Brain was one of my favorite reads during this time. I loved it so much that I even wrote a dedicated post about it: For me, this book was a real game-changer in how I capture and reuse what I learn. Steal Like an Artist is a book on creativity built around a simple idea: nothing is completely original. As creators (writers, musicians, or anyone producing work), the author suggests we should “steal” from anything that inspires us and sparks our imagination. I didn’t find this book as compelling as others, but it does offer a few useful insights. 
One that stuck with me is the difference between copying and creating: copying a single person is plagiarism, but drawing from many influences is what makes something feel original. The trick isn’t to imitate, it’s to transform and remix ideas until they become your own. Four Thousand Weeks is a productivity book, but with a very different take than most books on the topic. The title comes from the fact that our lifespan is roughly 4,000 weeks. During my paternity leave, I felt overwhelmed at times by everything I wanted to read and do, and this book helped. Its message is simple: instead of trying to produce more and more, we should accept that our time is limited. We will never get everything done. What we can do is focus on the few things that matter most and give them our best attention. It helped me find some peace and reminded me to focus on what matters most to me. Last but not least, I couldn’t spend this paternity leave without some science fiction and fantasy reading. On the sci-fi side, I started exploring the Warhammer 40k universe by picking up a few books about the Night Lords faction. If all you know about Warhammer 40k is the painting and miniatures, you should also know there’s a massive, surprisingly coherent lore built around it. I always assumed it would be some cheap sci-fi, but I was wrong; it’s much richer than I expected. On the fantasy side, I usually enjoy epic fantasy such as The Lord of the Rings or The Realm of the Elderlings . Yet, I wasn’t in the mood for long and heavy stories. I wanted something lighter and more relaxing. That’s how I found the “cozy fantasy” genre. I read Legends & Lattes and The Spellshop . Nothing really epic happens in these books, but the atmosphere is calm and comforting. Perfect for a quiet morning coffee (after a tough night). Above all else, these four months were a great opportunity to spend a lot of time with my kid (thanks, Google, for that).
I’m not going to elaborate too much on how it went, as it’s very personal. The only thing I want to emphasize is how having a baby is one of those rare experiences in life. Let me explain: If we traveled to Japan and loved it, we could still say next year, " I’m going back to Japan again.” If we enjoyed a tlayuda , we could say this weekend, “ I’m going to make some tlayudas again.” If we enjoyed that Coldplay concert, we could say, “ Next album, I’m going to see them again.” With a baby, things feel different. Experiences are truly unique because of how quickly they grow. Especially in the beginning, a baby changes day in, day out: a new way of looking at things, better head control, a new way to grab an object, etc. Because babies change so quickly, every experience becomes one of a kind. Tomorrow is already going to be a different day, a different experience. But if we look beyond, we might even say that most experiences are truly unique: We might return to Japan, but we will notice new things, meet new people, or experience it differently. We might make tlayudas again, but they won’t taste exactly the same. We might see Coldplay again, but maybe we’re with a different person, in a different city, at a different point in our lives. Being a dad brought me this truth: life itself is unrepeatable in its details. It is not inherently good or bad; it is simply what makes so many experiences unique and fatherhood so magical. ❤️ If you enjoyed the post, please consider giving it a like. It’s a helpful signal to decide what to write next. 💬 What have you learned in these past months? Anything you’d like to share? Leave a comment Effective Modern C++ perhaps? Let me know what you would recommend as a follow-up. https://abseil.io/docs/cpp/guides/status

The Coder Cafe 2 months ago

Second Brain

☕ Welcome to The Coder Cafe! Today, I wanted to share a book that had a huge impact on how I organize my personal and professional knowledge: Building a Second Brain by Tiago Forte. Get cozy, grab a coffee, and let’s begin! A Decade of Failures For over a decade, I failed miserably at keeping an effective note-taking system, mostly for two reasons: I never had a single, centralized place. I used to rely on a combination of Notion + Apple Notes + a physical notebook + Kindle highlights + Anki . I didn’t have a system that was generic enough. I had a different format pretty much every time: book summaries, posts, computer science notes, mistake journal , personal growth notes, and so on. The result was a messy system that didn’t scale. I kept losing knowledge, and it made learning inefficient . When we read or watch a great resource but don’t capture what we learned from it, chances are high that within a few months or even weeks, that knowledge fades away. That’s what kept happening to me. In a world full of content—social media, work, books, courses, podcasts—being able to extract and retain working knowledge isn’t optional; it’s necessary if we want to keep growing as an engineer and as a person . Let’s discuss an approach to solve this problem. One of the first sentences in the book hooked me: Your mind is for having ideas, not holding them. Our brain is great for processing ongoing tasks, but it’s not meant to retain everything. Over time, it lets go of unused ideas to make room for more relevant ones. This process, referred to as synaptic pruning, helps us adapt, but it also means we lose what we don’t externalize. That’s the promise of a second brain: a place to offload notes and thoughts, avoid losing knowledge, and build on it over time with new ideas. Before switching to the second brain approach, I also struggled with my notes because I wanted everything to be perfect. That drained a lot of time and energy. 
But one paragraph from the book really shifted my perspective: We have to remember that we are not building an encyclopaedia of immaculately organized knowledge . We are building a working system . […] For that reason, you should prefer a system that is imperfect, but that continues to be useful in the real conditions of your life. That changed everything for me. I stopped chasing perfection and started building something that simply works, something that supports me every day, both at work and in my personal life. What I found enriching is that the book presents not just a system, but also a mindset . Let’s start with the latter. Knowledge work is about taking information and turning it into results, for example, delivering on a project. All day, we consume and then produce: The problem with this approach is that most of the information we gain will eventually be lost. Sure, we might remember the most important parts for a while. But what about the rest? It will fade, eventually. What we miss is a feedback loop, a way to recycle information into knowledge that we can reuse later : That’s what a second brain gives us: a way to turn information into assets we can reinvest in the future. Another interesting idea in the book is to stop seeing notes as a flat list of things we’ve saved. Our notes can become building blocks , meaning pieces that help us create new ideas later on. That’s a powerful shift: notes aren’t just storage, they’re raw material for thinking. They can connect and evolve into new ideas. Being able to track down all your notes in a single place makes those connections easier to spot and build on. One last point I wanted to discuss: creativity. Here’s the definition given by neuroscientist Nancy C. Andreasen: Creative people are better at recognizing relationships, making associations, and connections. A second brain is not only a memory tool, but also a thinking tool.
Keeping an effective system to track down notes and ideas becomes a great tool to improve our creativity. Remember: creativity is not a talent, it’s a way of operating . NOTE: Did you enjoy this punchline? It’s coming from a note I hold on creativity in my second brain 😎. In this section, we’ll delve into what a note really is, how to take notes effectively, and then go over the PARA system presented in the book. So, what’s a note? The author defines it this way: A piece of content, interpreted through your lens, curated according to your taste, translated into your own words, or drawn from life experience, and stored in a secure place. Let’s go over the different parts of that definition. First, a note is something to use, not just to collect. Again, we’re not building an encyclopaedia, we’re building something that works for us. For example, if we’re interested in public speaking, we don’t need a note on every single resource we read or watch. Instead, the author suggests asking a few questions to decide what’s worth capturing: Does it inspire us? Is it useful? Is this personal? Is it surprising? Ultimately, we should capture what resonates. For example, we may have read this amazing book that everyone is talking about and discover that it doesn’t resonate with us at all 1 . Conversely, we may have watched a 30-second video that had a profound effect on us and triggered emotions. In this case, it’s better to spend some time creating a note on this video rather than the book. When something resonates with us, it’s our emotion-based, intuitive mind telling us it’s interesting before our logical mind can explain why. Every time we take a note, we should ask: “ How can I make this as useful as possible for my future self? ” One way to do that is to be mindful of our future limited time. Instead of tracking dozens, if not hundreds, of lines, we should focus on finding the essence , meaning the heart and soul of what a resource is trying to communicate. 
But in some cases, that's barely possible. For example, I recently read the F1: A distributed SQL database that scales whitepaper. How can we capture the essence of such a dense technical document containing so much valuable information? The solution, again from the author, is to use the progressive summarization technique. In short, it's about layering our notes:
Layer 1 - Captured notes: Either a copy-and-paste or, even better, writing down in our own words what we understood from it.
Layer 2 - Bolded passages: Go over the captured notes and mark in bold the most important pieces.
Layer 3 - Highlighted passages: Go over the bolded passages and highlight the most important pieces.
Layer 4 - Summary: Write down a summary.

Here's an example from my F1 whitepaper notes. I created two main sections: Summary and Highlights.
Layer 1: In Highlights, I captured all the raw content, for example, on locking.
Layer 2: I bolded the passages that were most interesting to me.
Layer 3: I highlighted the most important parts.
Layer 4: I wrote a summary section with the ideas that were the most interesting to me.

In the end, the note looks like this: This approach allows me to come back later and start with a quick summary to refresh the main ideas. If I need a bit more, I can scan the bolded and highlighted passages. And if I really want to dive deeper, I can go through all the highlights. Of course, every layer is optional. If we consider that a resource only needs layers 1 and 4 of summarization, or just 1 and 2, that's perfectly fine. Again, our second brain should be something that works for us. The PARA system is at the core of the book. It's a proposal for organizing our notes, designed for actionability. To make things clear, a note can be assigned to one of the following domains:
Project: A short-term effort with a possible due date and a clear outcome that needs to happen in order to mark the project as complete.
For example, publish a blog post about the second brain.
Area: Ongoing responsibilities, what we are committed to, and what requires constant attention.
Resource: A catchall for anything that doesn't belong to a project or an area.
Archive: When a note becomes inactive or outdated, we can move it to the archive.

NOTE: The distinction between area and resource wasn't immediately clear to me, so here's how I think about it. I enjoy both fitness and climbing. But I'm only committed to fitness. I try to eat healthy, work out regularly, and so on. Climbing, on the other hand, is something I enjoy, but I only go from time to time. So in my system, fitness is an area, and climbing is a resource.

The PARA system has two main benefits.
Clear focus: We're not mixing short-term efforts with long-term maintenance. It helps us focus on outcomes and next steps rather than just piles of information.
Genericity: PARA can handle all kinds of notes. It organizes information based on how actionable it is, not what kind of information it is.

As I said, I used to lack a system that was generic enough to track all my notes. Now, whether it's a book I read, a course I followed, or a post I came across, I capture everything that resonates with me using the PARA system. To implement the second brain, I used Notion, which I think is a fantastic tool with a lot of flexibility and configuration options. If you don't want to build your own second brain from scratch, you can check out this online tutorial, or use my personal Notion template. This is the setup I use every day to track notes, growth tasks, my to-do list, areas, resources, and more.

In summary, the second brain method presented in the book follows the CODE system:
Capture: Keep what resonates, leave the rest aside.
Organize: Save for actionability (project, area, or resource).
Distill: Find the essence of what a resource communicates.
Express: Show your work based on the knowledge you gained.
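The PARA categories combined with the summarization layers map naturally onto a tiny data model. Here's a toy sketch of that idea; the names (`Para`, `Note`, `review_order`) are my own illustration, not from the book:

```python
from dataclasses import dataclass, field
from enum import Enum

# PARA categories, ordered from most to least actionable.
class Para(Enum):
    PROJECT = 1   # short-term effort with a clear outcome
    AREA = 2      # ongoing responsibility, constant attention
    RESOURCE = 3  # catchall interest
    ARCHIVE = 4   # inactive or outdated

@dataclass
class Note:
    title: str
    category: Para
    summary: str = ""                                    # Layer 4: summary
    highlights: list = field(default_factory=list)       # Layers 1-3: captured/bolded/highlighted

def review_order(notes):
    """Surface the most actionable notes first."""
    return sorted(notes, key=lambda n: n.category.value)

notes = [
    Note("Climbing techniques", Para.RESOURCE),
    Note("Publish a blog post about the second brain", Para.PROJECT),
    Note("Fitness", Para.AREA),
]
# The project surfaces first: organize by actionability, not by topic.
print([n.title for n in review_order(notes)])
```

The point of the sketch is the ordering key: PARA sorts by how actionable a note is, which is exactly what makes the system generic across note types.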
Unless you’re a super memory genius, I strongly recommend looking into building a second brain. As we discussed, our brain should be for having ideas, not holding them. Maybe the ideas in this post won’t fit you exactly, and you’ll come up with your own way of tracking notes. However you approach it, building a second brain has been incredibly important for me, and it might be for you as well. Don’t Forget About Your Mental Health Survivor Bias The XY Problem Building a Second Brain Create Your Own Second Brain Your Resource Guide to Building a Second Brain ❤️ If you enjoyed the post, please consider giving it a like. It’s a helpful signal to decide what to write next. 💬 Are you using a second brain system? If not, how do you keep track of your notes and ideas? Leave a comment The Alchemist , I’m looking at you.

The Coder Cafe 3 months ago

Availability Models

Hello! Last week, we reached 3,000 subscribers, that’s awesome, thank you all! Recently, I was going through the distributed systems reliability glossary brought by Antithesis & Jepsen, and I found their approach to availability models particularly interesting. Let’s dive into that. Introduction Here are two database whitepapers: Megastore, a Google storage system, and Dynamo, an Amazon key-value store. Both of these whitepapers use the term highly available. Of course, as readers, we may expect these two whitepapers to mean the same thing when discussing high availability. After all, an orange is an orange, an apple is an apple, and a highly available system should be a highly available system. Yet, that’s not the case, and each paper means something different. Let’s first look at what availability means, then discuss why high availability is a vague concept, and finally explore the different availability models. In The CAP Theorem, we already discussed availability as: Every request receives a non-error response, even if it may not contain the most up-to-date data. However, this definition misses a crucial dimension: response time. Technically, a system could be available even if it responds after an hour. While such a system technically provides a non-error response, it fails to deliver a usable experience, severely compromising its practical availability from a user perspective. This is where The PACELC Theorem offers a more practical perspective on availability. PACELC highlights that: In the presence of a partition, a system must choose between availability and consistency. In the absence of a partition, a system must choose between latency (the upper bound within which a request should receive a non-error response) and consistency. So latency becomes part of availability. And that makes sense, right? If a system is too slow, it’s effectively down for the user.
Availability isn’t just about uptime; it's also about whether the system is responsive in a meaningful timeframe. The term high availability is vague. Does it mean 99.9% uptime? 99.999%? ScyllaDB, in their technical glossary, says it’s about maintaining levels of uptime that exceed normal SLAs. That’s fine, but it’s also easy to game. Say we define an SLA at 50%, then run at 80%. Technically, we exceeded it. Yet, does that mean we’re offering a highly available database? Probably not. The Antithesis reliability glossary defines high availability as a system that is available more often than a single node. I quite like that one. If we take the availability of our best node and our system does worse than that, it’s not really highly available. Simple and practical. Still, it’s not perfect. Let’s say we have five nodes, each available 50% of the time. With a write quorum of two, our write availability might still reach ~80%. But we’d still be down one request out of five. Hard to sell that as high availability. To bring more clarity to the conversation, Antithesis introduced a set of availability models. An availability model helps us define when an operation should succeed. What do we mean by operation? It’s simply a request made to the system. That could be a read, a write, a ping, whatever the system is supposed to handle. Instead of thinking in terms of the whole system being up or down, we look at whether a specific request can succeed, even when parts of the system are failing. Let’s explore three availability models. Definition: A system is majority available if, whenever a majority of non-faulty nodes can communicate with one another, those nodes can execute some operations. Consider a database composed of five nodes: In the nominal case, everything works: a client connects, and the database can process all operations. Now, imagine two nodes go down.
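As a quick aside, the ~80% figure above can be checked with a short binomial calculation. This snippet is my own sketch, not from the Antithesis glossary:

```python
from math import comb

def quorum_availability(n, quorum, p):
    """Probability that at least `quorum` of `n` nodes are up,
    each node independently available with probability `p`."""
    return sum(comb(n, k) * (p ** k) * ((1 - p) ** (n - k))
               for k in range(quorum, n + 1))

# Five nodes, each available 50% of the time, write quorum of two:
print(quorum_availability(5, 2, 0.5))  # 0.8125, i.e., down roughly one request in five
```

With that sanity check done, back to the five-node majority example.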
Maybe they crash, maybe there’s a network issue causing a partition: We’re left with three nodes that can still talk to each other. That’s a majority. If the database can still perform operations in this situation, we say it’s majority available . That’s how the Megastore whitepaper defined highly available : being majority available. This model is often used when consistency matters. For example, when a write or a leader election requires a majority to agree before responding to the client. So even if some nodes are unreachable, the system can still make safe progress as long as the majority is intact. Definition : A system is totally available if every non-faulty node can execute any operation. Let’s take the same setup: a database with five nodes. This time, three of them are faulty: In a majority available model, we can’t do much as we don’t have a majority. But in a totally available model, the system can still handle operations. Indeed, in this model, each non-faulty node can act on its own. It doesn’t need to coordinate with others or wait for a network round-trip. This model favors latency. Just handle the request and move on. That’s how the Dynamo whitepaper defined highly available : being totally available. The tradeoff is consistency. Totally available systems can’t enforce strong guarantees because the nodes don’t necessarily sync before responding. That’s why this model typically goes hand in hand with weaker consistency models . Definition : A system is sticky available if whenever a client’s transactions are executed against a copy of database state that reflects all of the client’s prior operations, it eventually receives a response, even in the presence of indefinitely long partitions. Let’s look at an example. Two clients, A and B, are connected to a database and make updates over time: Blue updates are made by client A, and green updates are made by client B. 
Sticky available means: After update 5, client A will eventually get a response that reflects at least updates 1, 4, and 5. After update 3, client B will eventually get a response that reflects at least updates 2 and 3. How can we achieve that? It depends on how replication is handled by the system. In a fully replicated system, all nodes store the full dataset. Sticky availability can be achieved by making sure a client always talks to the same node. Here’s the same example again, but now with client A always connected to node 1, and client B to node 2: Blue updates are made by client A, and green updates are made by client B. Here, node 2 hasn’t yet replicated update 5, and node 1 hasn’t yet replicated update 3. Still, since each client sticks to one node, they eventually see a consistent view of their own operations , despite possible failures such as a partition between the nodes. Now, let’s discuss a partially replicated system where nodes are replicas for subsets of data items. Here’s a (dummy) partitioning system where even-numbered updates go to node 1 and odd-numbered ones to node 2: Blue updates are made by client A, and green updates are made by client B. Here, clients can’t just stick to a single node. Instead, they must maintain stickiness with a single logical copy of the database, which may consist of multiple nodes. Clients can also help implement this model by acting as servers themselves. For example, a client could cache its own reads and writes, allowing it to return responses even during indefinitely long partitions. Highly available is too vague; watch out when you read or hear it. It might mean different things depending on the system or author. Majority available means a majority of nodes can still perform some operations. This model supports stronger consistency. Totally available means each non-faulty node can handle requests independently. It favors latency, but usually comes with weaker consistency. 
Sticky available means clients can make progress as long as they keep talking to a replica that reflects their own history. Availability models help us reason at the operation level, not just the system level. What matters is which operation can succeed, and under what conditions. ❤️ If you enjoyed the post, please consider giving it a like. It’s a helpful signal to decide what to write next. 💬 When someone says their system is “highly available,” what do you assume they mean? Leave a comment The PACELC Theorem Exploring Database Isolation Levels Latency and User Experience A distributed systems reliability glossary - Antithesis High Availability Database Definition - ScyllaDB Consistency Models - Jepsen
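To make the sticky idea concrete, here's a minimal sketch of a client pinned to one node, with a local cache providing read-your-writes even while replication lags. The names and structure (`Node`, `StickyClient`) are my own illustration, not from the glossary:

```python
class Node:
    """A trivially simple in-memory replica."""
    def __init__(self):
        self.data = {}

class StickyClient:
    def __init__(self, nodes, client_id):
        # Pin the client deterministically to a single node, so it always
        # talks to a copy of state that reflects its own prior operations.
        self.node = nodes[sum(map(ord, client_id)) % len(nodes)]
        self.cache = {}  # our own writes, usable even during a partition

    def write(self, key, value):
        self.cache[key] = value        # remember our own write
        self.node.data[key] = value    # send it to the pinned node

    def read(self, key):
        # Our own history wins; otherwise fall back to the pinned node.
        if key in self.cache:
            return self.cache[key]
        return self.node.data.get(key)

nodes = [Node(), Node()]
a = StickyClient(nodes, "client-a")
a.write("x", 1)
print(a.read("x"))  # 1: client A always sees its own writes
```

The client-side cache is the "client acting as a server" trick mentioned above: it lets the client answer its own reads during an indefinitely long partition.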

The Coder Cafe 4 months ago

Why I Switched to Vim Keybindings

🔕 This post is part of the Lattes & Stories section, where I share personal reflections and stories (not the regular Concepts section). If you want, you can turn off notifications for this section here : Notifications → Disable “Lattes & Stories“. Hello! Today, let’s talk about why I switched to Vim, or more precisely, to Vim keybindings. From the early days of my career, I used various IDEs but never spent much time memorizing keybindings. Yet, in 2016, I switched to IntelliJ, and with that move, I decided that for once, I would work on my productivity and make sure to touch the mouse as little as possible. Good decision or not, I also decided to customize nearly 90% of the keybindings. It got to a point where someone familiar with IntelliJ wouldn’t even recognize my setup. If you’ve ever gone through something like that, you know it takes time to be productive. At first, your brain melts for every single action. But after a few weeks, you start getting used to your config and can be really productive. Everything was perfect for seven years as I kept working with IntelliJ until… I joined Google. Indeed, it was quite a shock for me to see that Google has its own IDE 1 . There was a plugin to import IntelliJ’s default config, but since I had overridden almost everything, I had to reconfigure a ton of shortcuts manually. And even then, some keybindings I was using just didn’t exist in Google's IDE. At this point, I decided to make one of the moves I’m happiest about: switching to Vim keybindings . Let me just clarify what I mean by Vim keybindings. I didn’t switch to the Vim editor itself; I switched to Vim shortcuts. Indeed, many IDEs like IntelliJ or VS Code allow you to use Vim keybindings. For example, if you open this sandbox , it’s not the Vim editor, but it’s an editor based on Vim keybindings. The learning curve was pretty steep, at least for me.
Also, it’s worth noting that using Vim with a QWERTY keyboard layout is much more efficient than with other layouts. If you’re using a layout that’s quite different from QWERTY, you may also need to switch layouts. NOTE : When I joined Google, I switched to Vim keybindings and a new keyboard layout at the same time. I remember sharing my IDE with a colleague during one of my first days, and judging by how slowly I was typing, I’m pretty sure he thought I was dumb. So what are the benefits? First, default Vim keybindings give you almost everything you need to be productive . Sure, there are still some IDE-specific features (like refactoring functions automatically), but most of what you need to navigate and manipulate code quickly is right there. It took me some time to adapt, but today, I can say I’m even faster than with my old customized IntelliJ setup. Second, and probably the essence of this post: switching to Vim keybindings gave me a consistent editing experience . Whether I’m coding in Google IDE, on Google Colab , at home on IntelliJ (with the IdeaVim plugin ), or editing a remote file through a terminal with Neovim, the experience is the same. I don’t need to adapt to a different setup every time. And if one day I switch to another IDE, chances are high that it will support Vim keybindings too, as the Vim community is really active. Being efficient and keeping a consistent editing style across tools is why I would strongly recommend having a look at Vim keybindings. Maybe it’s not for you, and that’s perfectly fine. But if it clicks, it might give you the same productivity boost it gave me. The Coder Cafe: Missing direction in your tech career? At The Coder Cafe, we serve timeless concepts with your coffee to master the fundamentals. Written by a Google SWE and trusted by 3,000+ readers, we help you grow as an engineer, one coffee at a time. By Teiva Harsanyi 💬 Do you use Vim keybindings? If not, have you ever considered trying them?
Leave a comment ❤️ If you made it this far and enjoyed the post, please consider giving it a like. So, I Wrote a Book Practical Vim // The book I used to learn Vim. Most of what I know comes from it, and I highly recommend it if you decide to switch to Vim. My own Vim cheat sheet Vim Tips You Probably Never Heard of: IntelliJ 25% discount code on all product pack: . // This post is not sponsored by IntelliJ, but since they supported 1,000 Subscribers, 1 Coding Challenge! I thought it was fair to mention it since I talked about IntelliJ in this post. “Google has its own X”. Replace X with almost anything, and you pretty much get how things work at Google.

The Coder Cafe 4 months ago

What Makes Code Beautiful

Hello! Today we will discuss the concept of beauty in programming. Let me ask you a simple question: what makes code beautiful? Try thinking about this question (at least a little) before scrolling down. You probably came up with a few characteristics: It’s readable and modular. Maybe it’s well tested. Or maybe it follows clean code principles or other familiar practices. What if I told you that beautiful code is just… average code? That might sound silly, but let’s walk through it. Attractive Faces Are Only Average is a foundational study by Judith H. Langlois and Lori A. Roggman in the psychology of facial attractiveness. They created composite images by mathematically averaging multiple individual faces: noses, eyes, cheekbones. These composites were consistently rated as more attractive than the individual faces used to make them. In fact, the more individual faces were averaged into a composite, the more attractive the resulting face became. This suggests that facial features close to the population mean are generally preferred. Why? The main theory is processing fluency: the easier something is for our brain to process, the more we like it. Average faces are easier to recognize and interpret, so they feel more pleasant. This principle doesn’t stop at faces. It applies to advertising, art, or even decision-making. When something is easy to process, brain imaging studies show that it activates reward-related regions. We enjoy it more, without even realizing why. So what happens when we apply this principle to code? It means that the more average the code is, the more it aligns with what we’ve seen before, and the more beautiful it feels. The less surprising it is, the more pleasurable it is to read, understand, and maintain. How can we write average code? Use standard naming conventions: Stick to familiar names so others instantly know what things are.
Follow language idioms and common patterns: Write code the way others expect it to be written. Favor clarity over cleverness: The easier it is to read, the more enjoyable it is to work with. Be consistent: Predictability makes everything easier to understand. Minimize unnecessary novelty: The more our code looks like what we’ve seen before, the faster it gets processed and the more it’s liked. Beautiful code is the kind that conforms to established patterns, idioms, and norms. It’s not flashy, novel, or overly clever. It’s clear, familiar, and quietly elegant. Just average. What if, next time, the best compliment your pull request could receive wasn’t that it was “brilliant” but simply “average”?

💬 Does this idea resonate with you, or does it feel wrong? Curious to hear your perspective.

Cognitive Load: Reducing Mental Overhead in Software Design
Readability: Understanding the What, the Why, and the How
Premature Abstractions: Avoiding Unnecessary Complexity in Software Design
Attractive Faces Are Only Average // The study mentioned in the post.
Your Code as a Crime Scene // I read about this study from this book.
Processing fluency
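To make the “favor clarity over cleverness” advice above concrete, here is a small, hypothetical Python example (the function names and the bit trick are mine, not from the post): both versions are correct, but only one is “average” in the sense described.

```python
# Two ways to keep the even numbers from a list (hypothetical example).

# "Clever": compact, but the reader must decode a bit trick.
def evens_clever(xs):
    return [x for x in xs if ~x & 1]  # ~x & 1 is 1 exactly when x is even

# "Average": boring, familiar, processed at a glance.
def evens_average(numbers):
    return [n for n in numbers if n % 2 == 0]

print(evens_clever([1, 2, 3, 4]))   # [2, 4]
print(evens_average([1, 2, 3, 4]))  # [2, 4]
```

Both produce the same result, but the second matches what readers have seen a thousand times before, so it is processed fluently.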

The Coder Cafe 5 months ago

Don’t Be Ashamed to Say "I Don’t Know"

Hello! Today, let’s discuss the power of “I don’t know” with a personal story. Last month, I was at the hospital with my partner for the birth of our newborn. During our stay, my partner experienced a specific symptom, and we wanted to understand what could be causing it. So we asked the midwife. We immediately noticed the hesitation in her eyes. When she finally gave an answer, it came with a kind of forced confidence, and we both felt she wasn’t sure about it. At our hospital, midwives do 12-hour shifts. So a few hours later, we asked the exact same question to the next midwife. Same hesitation, but this time, a different answer. And so it went on. Every shift, we asked again. Every time, a different answer. Eventually, it even became a game between my partner and me: trying to guess what the next answer would be. Until… The one. The one who broke this cycle. We asked her the same question. She paused. Thought about it. And then said something unexpected: I don’t know. It might seem counterintuitive, but these three words made us immediately trust her more than anyone else before 1. Twenty minutes later, she even came back to our room and said: I asked the doctor, the answer is because of [X]. Thanks for asking, I learned something. That brief exchange resonated with me. In our field, we often put a lot of weight on posture. We build up our position as the go-to person for a codebase, a data model, or a framework. The more we know, the more we are seen as the one to consult or include in any related discussion. But from that posture, admitting we don’t know something can feel like pulling out the bottom card in a house of cards. Suddenly, it feels like everything we built to earn that status might collapse. Yet, if we take a step back, admitting we don’t know shouldn’t be seen as something shameful or embarrassing. In fact, it’s often the most responsible thing we can do. Pretending to know can lead to bad decisions.
It creates false confidence and can steer a team in the wrong direction, possibly leading to terrible outcomes. Authority isn’t built on knowing everything. It’s about being reliable and honest, someone others trust and can talk to with confidence. Teams work better when people feel safe to admit uncertainty. It fosters psychological safety among team members, which Google identified as the top trait of effective teams. Curiosity + humility = real learning. Admitting we don’t know something is what keeps us learning and growing, no matter how experienced we are. Whether it’s for us or others, next time we don’t know something, let’s be like that midwife: let’s just admit it. Without shame.

💬 How comfortable are you with saying “I don’t know”?

Lateral Thinking
Streetlight Effect
10 Rules I Learned About Technical Writing
Understand team effectiveness

1. Turns out, our gut feeling was right. She ended up being better than most of the other midwives we met.

The Coder Cafe 5 months ago

Soft vs. Hard Dependency

Hello! Today, we’re exploring a key aspect of distributed systems: how to think about dependencies between components and why it matters for reliability. Introduction When we build a system composed of multiple components (e.g., databases, services, caches), it’s important to understand the dependency graph. For example, a service might depend on: a database to store data, a messaging layer to exchange information, and a cache to reduce latency. Having a clear understanding of the dependencies in a system helps us maintain it more efficiently. But there's one question we often overlook: are these dependencies soft or hard? Soft dependency: One that is non-critical for the service to operate properly. Hard dependency: One that is critical for the service to operate properly. “Operate properly” in this context means, for example, that a service responds to requests, doesn’t lose data, and maintains an acceptable level of performance. In short, the service works reliably. Two examples to illustrate the concept of soft and hard dependencies: A recommendation service is a soft dependency for a video platform. If it’s down, users can still watch videos, just without personalized suggestions. An authentication service is a hard dependency for a system that requires users to log in. If authentication is down, users can’t access the system. Understanding the type of dependency helps us make the right decisions: Reliability expectations: Soft: A high reliability expectation may not be necessary. Back to the example of a recommendation service for a streaming system: this service doesn’t need five nines of availability (99.999%) if it isn’t on the critical user journey. Hard: A hard dependency must match or even exceed the reliability of the dependent service. If a critical backend is only available 99.5% of the time but our own SLO is 99.9%, we have a structural problem. Setting the right expectation for a hard dependency is critical.
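The reliability-expectations point can be sketched with back-of-the-envelope arithmetic. The figures below are assumptions for illustration, and the model assumes independent failures and no fallback:

```python
# With a hard dependency and no fallback, the service is up only when
# both it and the dependency are up (assuming independent failures).
service_alone = 0.999    # what our service could achieve in isolation (assumed)
hard_dependency = 0.995  # the hard dependency's availability (assumed)

combined = service_alone * hard_dependency
print(f"{combined:.4%}")  # roughly 99.40%: already below a 99.9% SLO
```

This is why a hard dependency that is less reliable than our own SLO is a structural problem: no amount of effort on our side alone can close the gap.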
Fault-tolerance strategy: Soft: If the dependency is unavailable, we are not obliged to build a proper fault-tolerance strategy. We can let it degrade gracefully and wait for it to come back. Hard: If the dependency is unavailable, we need to work on a strategy, such as establishing an efficient fallback, to keep our service running. Observability and alerting: Soft: Observability is still important, but alerts can often have a lower priority or be routed differently. Hard: The dependency must be tightly monitored. Failures or even minor degradations, such as latency spikes, error rates, or availability dips, must be tracked continuously. Rollout and change management: Soft: Changes can be managed with more flexibility. Rollouts may not require tight coordination or strict sequencing, and temporary failures might be acceptable. Hard: Rollouts become delicate operations. We often need tight orchestration between teams, version compatibility checks, gradual rollouts with validation at each step, and well-tested rollback mechanisms. Any mistake could trigger a production incident. Classifying a dependency isn’t always obvious. In some cases, it’s fairly straightforward. For example, if a REST endpoint requires a database query, that database is a hard dependency. But gray areas are fairly common, for example: A service can run without a certain dependency at runtime, but it still needs that dependency at startup to initialize. In this case, the dependency is hard from an operational point of view. If it’s down during a deploy or a scale-out, we can’t even get the service running. A service calls a soft dependency, but the RPC call has no timeout or fallback. If the dependency becomes unresponsive, the latency of our service spikes, possibly exhausting thread pools or request queues. What was supposed to be a soft dependency now puts the entire system at risk. These are examples of soft dependencies not handled correctly, turning into hard ones in practice.
Whether a dependency is technically optional doesn't matter if the failure of this dependency ends up blocking our service. In many systems, identifying these cases is not trivial. Approaches like deliberately breaking dependencies or introducing hazardous conditions (e.g., random network delays) can help reveal which dependencies are truly non-critical and which ones only appear to be. To make things even more complex, we need to keep in mind that the type of a dependency is not set in stone. A dependency that starts as soft can easily turn into a hard one over time. Let’s consider a service that reads data from a database. We introduce a cache to reduce latency. Initially, this cache is a soft dependency. If it goes down, we fall back to the database, which results in an acceptable latency increase. Yet, as traffic grows, the service begins to rely on the cache not just for latency but for throughput. At some point, if the cache becomes cold and every request hits the database, the database may no longer be able to handle the load. In this example, the cache was a soft dependency, but it became a hard one due to changes in system conditions (more traffic). This evolution (from soft to hard) is, unfortunately, much more common than the reverse. Without active effort on efficient maintenance and continuous testing, it’s fairly common for a soft dependency to turn silently into a hard one. On the other hand, with active and continuous effort, it’s possible to turn a hard dependency into a soft one. One effective approach is to design a fallback strategy that makes the dependency’s downtime essentially invisible. Designing a solid fallback is anything but simple (we’ll explore this in a future post). However, one principle stands out: fallbacks need to be tested, and they need to be tested continuously. A fallback that hasn’t been exercised in months isn’t a fallback. It’s dead code.
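As a minimal sketch of the timeout-plus-fallback idea, here is what treating a cache as a genuinely soft dependency might look like. All names (`cache_get`, `db_get`) are hypothetical, and the cache is simulated as being down:

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Hypothetical clients; here the cache is simulated as unavailable.
def cache_get(key):
    raise ConnectionError("cache is down")

def db_get(key):
    return f"value-for-{key}"  # slower, but authoritative

_pool = ThreadPoolExecutor(max_workers=4)

def get(key, cache_timeout=0.05):
    """Treat the cache as a soft dependency: bounded wait, then fall back."""
    future = _pool.submit(cache_get, key)
    try:
        return future.result(timeout=cache_timeout)
    except (FutureTimeout, ConnectionError):
        # Cache slow or down: degrade gracefully to the database.
        return db_get(key)

print(get("user:42"))  # value-for-user:42
```

The bounded wait is the key design choice: without the timeout, an unresponsive cache would stall every request and silently turn this soft dependency into a hard one.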
Once we’ve reached a point where the dependency can go down and users don’t notice, then the dependency is soft. Turning hard dependencies into soft ones is one of the most effective ways to improve the reliability of a system. To manage dependencies effectively, we need to classify them as either soft or hard. To avoid surprises, we must understand that soft dependencies can turn hard without warning, especially as systems scale. To improve reliability, we should actively turn hard dependencies into soft ones using strategies like efficient fallbacks.

💬 Have you seen a soft dependency quietly become critical over time?

Reliability: The Most Important Feature a System Can Have
Resilient, Fault-tolerant, Robust, or Reliable? The Key Differences Explained
Graceful Degradation: Preventing Complete System Failures
Defining SLOs for services with dependencies - Google Cloud

The Coder Cafe 6 months ago

Keeping a Mistake Journal

Hello! Today, I wanted to share with you a method I’ve been personally using to learn from my mistakes. Why Track Mistakes? Making mistakes is inevitable. Personally, I even wrote a book about mistakes, and, as I mentioned in it, I was a great source of inspiration for the content. But what really matters is not making the same mistakes over and over. Ideally, we make a mistake once, extract a lesson, and ensure we never repeat it. Did you know that making mistakes can actually help us grow? Research by cognitive scientist Janet Metcalfe suggests that reflecting on errors is one of the most effective ways to reinforce learning. Indeed, when we consciously analyze our mistakes, they don’t just fade away; instead, they reshape our thinking, helping us adapt and avoid repeating them in the future. That’s why, for the past two years, I’ve been following a simple yet effective system to track, reflect on, and learn from my own mistakes. Of course, I don’t log every tiny one. Yet, my rule is simple: if a mistake makes me frustrated with myself, it’s worth tracking. Each mistake I log follows a structured format: Name: A short, descriptive name. Tags: Relevant categories. Context: The situation in which the mistake happened. Problem: A description of the mistake itself. Impacts: The possible consequences of making this mistake. Lessons learned: What I can take away from it. Correction plan: What I will do to prevent this mistake in the future. Latest occurrence: When I last made this mistake. Repetition: How many times I made this mistake. Here’s an example: Name: Skimming instead of reading carefully. Tags: Critical thinking, Attention to detail. Context: When reviewing an email, document, or technical spec. Problem: To save time, I sometimes skim through important content instead of reading it properly. Impacts: Leads to a partial understanding of a context. Causes misunderstandings, requiring additional clarifications.
It can result in wrong decisions based on incomplete information, and/or I can look like an idiot. Correction plan: Better assessment of whether full attention is required. If it is, commit to reading properly. Personally, I keep my mistake journal in Notion, but I guess any tool or even a physical journal could work. Since implementing this system, I have noticed: Fewer repeated mistakes: I make the same mistakes less often. Pattern detection: It helped me recognize patterns in my own mistakes and better understand my own biases. Better decision-making: I catch potential mistakes earlier. Mistakes are part of learning; they don’t have to be failures if we learn from them. By tracking them, we can turn them into lessons that help us grow and improve over time.

💬 Do you also track your mistakes? If yes, what’s your approach?

Confirmation Bias
Survivor Bias
10 Rules I Learned About Technical Writing
Learning from Errors - Janet Metcalfe
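The entry format described above maps naturally onto a small data structure. Here is a minimal sketch; the class name, the Python representation, and the example date are my own (the post itself uses Notion):

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class MistakeEntry:
    # Fields mirror the journal format described in the post.
    name: str
    tags: list[str]
    context: str
    problem: str
    impacts: list[str]
    lessons_learned: str
    correction_plan: str
    latest_occurrence: date
    repetition: int = 1  # how many times this mistake was made

entry = MistakeEntry(
    name="Skimming instead of reading carefully",
    tags=["Critical thinking", "Attention to detail"],
    context="When reviewing an email, document, or technical spec",
    problem="Skimming important content instead of reading it properly",
    impacts=["Partial understanding", "Misunderstandings needing clarification"],
    lessons_learned="Assess whether full attention is required before starting",
    correction_plan="If full attention is required, commit to reading properly",
    latest_occurrence=date(2025, 1, 15),  # hypothetical date
    repetition=3,
)
print(entry.name)
```

Keeping `latest_occurrence` and `repetition` as first-class fields is what makes pattern detection possible: recurring entries stand out immediately.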

The Coder Cafe 6 months ago

Property-Based Testing

Hello! We recently hit 2,000 subscribers, so thank you very much for the support! 🎉 Today, let’s dive into the world of testing and explore Property-Based Testing. In this post, we will discuss traditional tests, explore their limitations, and see how Property-Based Testing can help us improve our testing strategy by focusing on the fundamental concept of properties. Traditional Tests Imagine this white box contains all the behaviors implemented in a piece of software: Of course, like with any codebase, our software contains bugs (imagine how boring bug-free software would be!). Let’s make the green part represent all the valid behaviors, while the red part represents the invalid ones (the bugs): Now, let’s write some tests to validate our code. For the sake of visualization, we will partition the tests written into three sets: A, B, and C; all of them will be passing: Test set A covers 50% of the valid behaviors. Test set B covers 75% of the invalid behaviors. Test set C covers nonexistent behaviors. Test set A: The true positive tests that correctly verify valid behaviors exhibited by the software. The remaining visible green part represents valid behaviors not covered by tests. Test set B: The false negative tests that should catch bugs, but they don’t because they are flawed. The remaining visible red part represents invalid behaviors not covered by tests. Test set C: The invalid tests that check for behaviors that… don’t even exist. They typically arise from misunderstandings of how a feature is supposed to work, and because the tests themselves are flawed, they fail to reveal this misalignment. Let’s break down the four main problems in this example, sorted in increasing order of severity: 1. Invalid tests covering nonexistent behavior: These tests are fundamentally misaligned with reality. They cause unnecessary maintenance overhead and can mislead developers if tests serve as living documentation (see Unit Tests As Documentation ). 2.
Test coverage gaps: For now, this is okay. Yet, if something changes later in the uncovered area, regressions may go unnoticed. 3. Missing tests lead to undetected bugs: The consequences start to escalate. Here, bugs exist, but as there’s no test for them, they remain invisible. 4. Flawed tests hiding bugs: Even worse than 3., the tests here should detect bugs, but they don’t. This is arguably worse than missing tests because it gives us a false sense of security: we think our software works as intended when it doesn’t. In this example, we rely on traditional tests. Regardless of whether we’re talking about unit tests, integration tests, or else, traditional tests follow a common structure: Define a starting state. Apply specific inputs. Assert that the output matches expectations. This method has been proven effective for decades, but it has limitations, including: Tests are manually designed, so some scenarios may be forgotten. Indeed, since traditional tests are written by humans, they are limited by our own assumptions and understanding of the software. Therefore, we can miss edge cases or unexpected situations. Passing tests don’t guarantee a bug-free system. A passing test only confirms that the tested scenario behaves as expected, not that the software works as expected. If the test is flawed, bugs can remain unnoticed. Test maintenance may become a burden over time. In contexts where high coverage is enforced, the more tests and edge cases we add, the more tightly coupled our tests become with the implementation, leading to increasing maintenance effort. These limitations highlight a key challenge: traditional tests are only as good as the cases we manually define. They check specific scenarios but don’t ensure broader correctness across all possible inputs. This is where a different approach comes in, one that shifts the focus away from manually defining test cases and instead focuses on the fundamental concept of properties.
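The "starting state, specific inputs, assert" structure can be illustrated with a tiny example-based test (`slugify` is a hypothetical function invented for this sketch, not something from the post):

```python
# A traditional test: one hand-picked input, one hand-picked expectation.
def slugify(title: str) -> str:
    return "-".join(title.lower().split())

def test_slugify():
    # 1. Define a starting state (none needed for a pure function).
    # 2. Apply a specific input.
    result = slugify("Property Based Testing")
    # 3. Assert that the output matches the expectation.
    assert result == "property-based-testing"

test_slugify()  # passes, but only this one scenario is covered
```

Notice the limitation: the test says nothing about empty strings, tabs, or unicode input, because nobody thought to write those cases down.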
Instead of defining specific inputs and expected outputs, Property-Based Testing (PBT) focuses on properties, meaning rules that must always hold, regardless of the input. Rather than asking: “For this input, do I get this expected output?” we’re asking: “Regardless of the input, does this property always hold?” To give a concrete example, let’s discuss a common type of PBT test known as fuzzing. In general, fuzzing tests focus on the property: “My software shouldn’t crash”. For example, imagine we wrote a function that manipulates a string. Instead of manually testing edge cases, we can use a fuzzing library to generate random inputs (including unexpected formats or extreme values) to ensure our function doesn’t crash, without requiring us to think of every possible edge case. So, what’s the difference between fuzzing and property-based testing? To be honest, the boundary isn’t always crystal clear. My working understanding is based on blog posts from Hypothesis, a property-based testing library for Python (the post is referenced in the Sources section). In a nutshell, we should see fuzzing as a subset of PBT: Fuzzing: Primarily used to find crashes or unexpected behaviors by feeding random inputs. It doesn’t typically require deep knowledge of the software’s expected properties. Property-based testing: Instead of just detecting crashes, PBT defines general properties that the software must always satisfy and generates test cases to verify them. This approach goes beyond failure detection and requires more structured thinking about correctness. Both fuzzing and PBT start with randomized inputs (often generated by a fuzzer), but while fuzzing primarily focuses on crashes, PBT goes beyond that to formally validate expected system behavior. So, what kind of properties can we define in property-based tests? Here are some examples: Structural properties: Ensure an operation preserves certain structural characteristics of the data, such as its length.
- Idempotency properties: Ensure that an idempotent function produces the same output when applied multiple times.

- Commutativity properties: Ensure that a function is commutative, meaning changing the order of its arguments gives the same result.

- Roundtrip properties: For serialization or encoding functions, encoding followed by decoding should return the original value.

These are some classic properties we can think of in the context of PBT. Ultimately, PBT is about viewing code as a whole and identifying the fundamental properties it should always preserve. Instead of verifying isolated cases, we validate broader correctness guarantees.

Going Beyond

So far, the properties discussed in this post share similarities with unit tests, as they focus on local functions. However, nothing in the concept of PBT prevents us from taking a step back and applying it at a broader level, such as at the API level or even at the system level, where multiple applications interact. Once we extend this perspective, we can consider properties such as:

- Temporal properties: “99% of API calls must complete in less than 30ms”.

- Consistency properties: “The database must respect causal consistency”.

- Application invariants: “Financial transactions must be balanced (debits = credits)”.

- Reliability properties: “Retrying failed requests must not result in duplicate transactions”.

PBT is not limited to low-level function testing; it can be applied at multiple levels, from local functions to entire systems, to assess properties that must always hold for a group of applications. With PBT, we can take any system, inject randomized inputs, and ensure that it behaves correctly, not by verifying specific test cases, but by enforcing fundamental properties that must always hold, regardless of the input. This approach moves beyond individual test cases and shifts the focus to system-wide correctness. PBT changes how we think about testing.
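The crash property from the fuzzing discussion and the idempotency and roundtrip properties above can be sketched with nothing but the standard library. This is a minimal sketch, not a real PBT setup: a library like Hypothesis would generate inputs more cleverly and automatically shrink failing cases, and the function under test, normalize_whitespace, is a hypothetical example:

```python
import random
import string

# Hypothetical function under test, used purely for illustration.
def normalize_whitespace(s: str) -> str:
    """Collapse whitespace runs into single spaces and trim the ends."""
    return " ".join(s.split())

rng = random.Random(42)  # fixed seed so any failure is reproducible

for _ in range(1000):
    # Generate a random string, fuzzer-style: any printable characters.
    s = "".join(rng.choice(string.printable) for _ in range(rng.randint(0, 50)))

    # Fuzzing property: the function must not crash on any input.
    out = normalize_whitespace(s)

    # Idempotency property: normalizing twice equals normalizing once.
    assert normalize_whitespace(out) == out

    # Roundtrip property (on a different pair of operations):
    # decoding an encoded value must return the original string.
    assert s.encode("utf-8").decode("utf-8") == s

print("all properties held for 1000 random inputs")
```

Notice that none of the assertions mentions a specific expected output; each one states a rule that must hold for every input the loop generates.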
Instead of manually defining scenarios, which may lead to issues like false negatives, PBT takes a step back and focuses on defining the fundamental properties or invariants that must always hold. Beyond improving test coverage, PBT also serves as living documentation, as the properties capture the essential rules that govern our software or systems.

That said, PBT shouldn’t be seen as a replacement for traditional tests; it’s a complementary approach. The real challenge is recognizing that some parts of a system may be too complex to assess manually. In such cases, relying on randomized inputs and predefined properties can be a more effective and simpler way to catch bugs.

💬 I purposely avoided discussing library-specific features because I wanted this post to encourage you to think about whether property-based tests make sense for the systems you manage. Let me know what you think about PBT in the comments!

❤️ If you made it this far and enjoyed the post, please consider giving it a like.

📣 This post is part of a series written in collaboration with Antithesis, an autonomous testing platform. They are not sponsoring this post; I reached out to them because I was genuinely intrigued by what they were building and ended up falling in love with their solution. We will dive deeper into it in an upcoming post titled Deterministic Simulation Testing. In the meantime, feel free to check out their website or their great blog.

Sources:
- Code Coverage
- Test Behavior, Not Implementation
- Avoiding Logic in Tests
- What is Property Based Testing? - Hypothesis Blog (the source referenced in the post)
- In praise of property-based testing
- Software reliability, part 1: What is property-based testing?
  (Antithesis blog)
- Property based testing: let your testing library work for you, by Magda Stożek (Devoxx)
- Proper and Basic Property-Based Testing
