Latest Posts (20 found)
Phil Eaton 1 month ago

In response to a developer asking about systems

Sometimes I get asked questions that would be more fun to answer in public. All letters are treated as anonymous unless permission is otherwise granted.

Hey [Redacted]! It's great to hear from you. I'm very glad you joined the coffee club and met some good folks. :)

You asked how to learn about systems. A great question! I think I need to start with what I mean when I say systems. My definition of systems is all of the underlying software we developers use but are taught not to think about because it is so solid: our compilers and interpreters, our databases, our operating system, our browser, and so on. We think of them as basically not having bugs; we just count on them to be correct and fast enough so we can build the applications that really matter to users.

But 1) some developers do actually have to work on these fundamental blocks (compilers, databases, operating systems, browsers, etc.), 2) it's not thaaaat hard to get into this development professionally, and 3) even if you don't get into it professionally, having a better understanding of these fundamental blocks will make you a better application developer. At least I think so.

To get into systems, I think it starts by questioning how each layer you build on works. Try building that layer yourself. For example, you've probably used a web framework like Rails or Next.js. But you can just go and write that layer yourself too (for education). And you've probably used Postgres or SQLite or DynamoDB. But you can also just go and write that layer yourself (for education). It's this habit of thinking and digging into the next lower layer that will get you into systems. Basically, not being satisfied with the black box.

I do not think there are many good books on programming in general, and very very few must-read ones, but one that I recommend to everybody is Designing Data Intensive Applications. I think it's best if you read it with a group of people. (My book club will read it in December when the 2nd edition comes out; you should join.) But this book is specific to data, obviously, and not interested in the fundamentals of other systems topics like compilers or operating systems or browsers and so on.

Also, I see getting into this as a long-term thing. Throughout my whole career (almost 11 years now) I always tried to dig into compilers and interpreters; I wrote and blogged about toy implementations a lot. And then 5 years ago I started digging into databases and saw that there was more career potential there. But it still took 4 years until I got my first job as a developer working on a database (the job I currently have). Things take time to learn and that's ok! You have a long career to look forward to.

And if you end up not wanting to dig into this stuff, that's totally fine too. I think very few developers actually do. And they still have fine careers.

Anyway, I hope this is at least mildly useful. I hope you join the Software Internals Discord and nycsystems.xyz as well, and I look forward to seeing you at future coffee clubs!

Cheers,
Phil

I wrote a letter in response to a developer asking about how to learn systems.

Phil Eaton 1 month ago

A simple clustering and replication solution for Postgres

This is an external post of mine.

Phil Eaton 1 month ago

Analytics query goes 6x faster with EDB Postgres Distributed's new analytics engine

This is an external post of mine.

Phil Eaton 2 months ago

Set up a single-node EDB Postgres Distributed cluster on Ubuntu

This is an external post of mine.

Phil Eaton 2 months ago

What even is distributed systems

Distributed systems is simply the study of interactions between processes. Every two interacting processes form a distributed system, whether they are on the same host or not. Distributed systems create new challenges (compared to single-process systems) in terms of correctness (i.e. consistency), reliability, and performance (i.e. latency and throughput).

The best way to learn about the principles and fundamentals of distributed systems is to 1) read Designing Data Intensive Applications and 2) read through the papers and follow the notes in the MIT Distributed Systems course.

For Designing Data Intensive Applications (DDIA), I strongly encourage you to find buddies at work or online who will read it through with you. You can also always join the Software Internals Discord's #distsys channel to ask questions as you go. But it's still best if you have some partners to go through the book with, even if they are as new to it as you. I also used to think that you might want to wait a few years into your career before reading DDIA, but when you have friends to read it with I think you need not wait. If you have only skimmed the book you should definitely go back and give it a thorough read. I have read it three times already and I will read it again as part of the Software Internals Book Club next year after the 2nd Edition is published. Keep in mind that every chapter of DDIA provides references to papers you can keep reading should you end up memorizing DDIA itself.

When you've read parts of DDIA or the MIT Distributed Systems course and you want practice, the Fly.io x Jepsen Distributed Systems Challenge is one guided option. Other options might include simply implementing (getting progressively more complex down the list):

- two-phase commit (see the sketch below)
- three-phase commit
- single-decree Paxos
- a highly available key-value store on top of a 3rd-party consensus library
- chain replication (or CRAQ), using a 3rd-party consensus library

And if you get bored there you can see Alex Miller's Data Replication Design Spectrum for more ideas and variants. And if you want more people to follow, check out the Distributed Systems section of my favorite blogs page.

If these projects and papers sound arcane or intimidating, know that you will see the problems these projects/papers solve whether or not you know and understand these solutions. Developers often end up reinventing hacky versions of these, which are more likely to have subtle bugs, when instead they could recognize and use one of these well-known building blocks. Or at least have the background to better reason about correctness should they be in a situation where they must work with a novel distributed system or end up designing a new one themselves.

And again, if you want folks to bounce ideas off of or ask questions to, I strongly encourage you to join the Software Internals Discord and ask there!

I wrote a short post on learning the fundamentals of distributed systems, with a few suggested resources to read and a few suggested projects to try.
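To give a flavor of the simplest project on that list, here is a minimal, in-process sketch of two-phase commit in Go. Everything here (the participant type, the prepare/commit/abort methods) is invented for illustration; a real implementation would send these messages over a network and durably log state so participants can recover after a crash.

```go
package main

import "fmt"

// participant is one process in the distributed transaction.
// A real participant would durably log its vote before replying.
type participant struct {
	name   string
	data   map[string]string
	staged map[string]string // writes staged during the prepare phase
}

// prepare asks the participant to stage the writes and vote.
// Returning true means "I promise I can commit if told to."
func (p *participant) prepare(writes map[string]string) bool {
	p.staged = writes
	return true // always vote yes here; a real participant might vote no
}

// commit makes the staged writes visible.
func (p *participant) commit() {
	for k, v := range p.staged {
		p.data[k] = v
	}
	p.staged = nil
}

// abort throws the staged writes away.
func (p *participant) abort() {
	p.staged = nil
}

// twoPhaseCommit is the coordinator: phase 1 collects votes, phase 2
// broadcasts the outcome. If any participant votes no, everyone
// aborts, so the write is all-or-nothing across participants.
func twoPhaseCommit(ps []*participant, writes map[string]string) bool {
	for _, p := range ps {
		if !p.prepare(writes) {
			for _, q := range ps {
				q.abort()
			}
			return false
		}
	}
	for _, p := range ps {
		p.commit()
	}
	return true
}

func main() {
	a := &participant{name: "a", data: map[string]string{}}
	b := &participant{name: "b", data: map[string]string{}}
	ok := twoPhaseCommit([]*participant{a, b}, map[string]string{"x": "1"})
	fmt.Println(ok, a.data, b.data) // true map[x:1] map[x:1]
}
```

The interesting part, and what the papers cover, is what happens when the coordinator or a participant crashes between the two phases; that is where durable logging and recovery come in.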

Phil Eaton 2 months ago

Stack traces for Postgres errors with backtrace_functions

This is an external post of mine. Click here if you are not redirected.

Phil Eaton 3 months ago

Want to meet people, try charging them for it?

I have been blogging consistently since 2017. And one of my goals in speaking publicly was always to connect with like-minded people. I always left my email and hoped people would get in touch. Even while my blog and twitter became popular, passing 1M views and 20k followers, I basically never had people get in touch to chat or meet up.

So it felt kind of ridiculous when last November I started charging people $100 to chat. I mean, who am I? But people started showing up almost immediately. Now granted, the money did not go to me. It went to an education non-profit and I merely received the receipt.

And at this point I've met a number of interesting people, from VCs to business professors to undergraduate students to founders and everyone in between. People wanting to talk about trends in databases, about how to succeed as a programmer, about marketing for developers, and so on. Women and men throughout North America, Europe, Africa, New Zealand, India, Nepal, and so on. And I've raised nearly $6,000 for educational non-profits.

How is it that you go from giving away your time for free and getting no hits to charging and almost immediately getting results? For one, every person responded very positively to it being a fundraiser. It also helps me be entirely shameless about sharing on social media every single time someone donates, because it's such a positive thing. But also I think that "charging" for my time helps people feel more comfortable about actually taking my time, especially when we have never met. It gives you a reasonable excuse to take time from an internet rando.

On the other hand, a lot of people come for advice and I think giving advice is pretty dangerous, especially since my background is not super conventional. I try to always frame things as just sharing my opinion and my perspective, and that they should talk with many others and not take my suggestions without consideration.

And there's also the problem that by charging everyone for my time now, I'm no longer available to people who could maybe use it the most. I do mention on my page that I will still take calls from people who don't donate, as my schedule allows. But to be honest I feel less incentivized to spend time when people do not donate. So I guess this is an issue with the program. But I mitigated even this slightly, and significantly jump-started the program, during my 30th birthday when I took calls with any person who donated at least $30.

Anyway, I picked this path because I have wanted to get involved with helping students figure out their lives and careers. But without a degree I am literally unqualified for many volunteering programs. And I always found the time commitments for non-profits painful. So until starting this I figured it wouldn't be until I retire that I'd find some way to make a difference. But I kept meeting people who were starting their own non-profits now, or who donated significantly to help students. Peer pressure. I wanted to do my part now. And 30 minutes of my time in return for a donation receipt has been an easy trade.

While only raising a humble $6,000 to date, the Chat for Education program has been more successful than I imagined. I've met many amazing people through it. And it's something that should be easy to keep up indefinitely. I hope to meet you through it too!

I wrote about trying to meet like-minded people and fundraising for educational non-profits.

Phil Eaton 3 months ago

Debugging memory leaks in Postgres, jemalloc edition

This is an external post of mine.

Phil Eaton 4 months ago

Cheerleading

At work we're so absorbed in the difficulties we face that it becomes easy to forget what we appreciate and value in our coworkers. On social media we can be so focused on our own work that we forget to interact with others'.

I have heard so many times from founders that they switch between heads-down building, disappearing from the community, and then popping up to talk about what they've built. Engineers often say the exact same thing. This kills me. Though I probably still do it to a degree too. The easiest way for no one to give a shit about what you've done is for you to only interact at periodic intervals that are only convenient for you.

With networking in general, the most important time to interact and build relationships is when you do not need anything. By the time you want someone's support, or you want someone to care about what you've done, it is way too late to start building a relationship.

One of the cheapest and most effective avenues to build genuine relationships is to be the biggest cheerleader you can be. At work this means when someone writes a fantastic blog post, you call it out in a public channel. It means when a coworker's work is featured somewhere, you call this out in a public channel. At least, this is what I do at work. I don't care that I have no leadership role or haven't met these people before. If I see something amazing that I'd like to see more of, I mention it in public and praise the person. Or when someone shares work they've done in a public channel, you hit all the emojis and you reply "that's awesome".

On social media this means engaging seriously and (more) deeply with what people post. If the default behavior on social media is to passively observe, engaging more deeply can be liking a post. If you already show support through liking, engaging more deeply can be commenting or asking questions or (kindly) pointing out mistakes. Engagement is a spectrum. Genuine engagement consistently over time builds genuine relationships.

I would not suggest that you do this without feeling it. Or rather, I'd only encourage you to respond to the degree that you feel. But I personally do feel so strongly when I cheer people on. So much of life and work is drudgery that when you see something positive, someone taking initiative, someone with talent or potential doing something with their skills, how can you not feel an overwhelming urge to cheer them on and hope to see more of it? Hope to see it develop?

What's more, I want to be around people who are trying new things and improving themselves. I want to be around people who celebrate. So I in turn try new things and work to improve myself, and I celebrate the people around me. This energy is infectious. And I genuinely think even a single person in a group celebrating publicly changes the group dynamic.

And I don't expect people to reciprocate in the same way or to the same degree that I show support. I'm particularly weird and confident and vocal. But if my cheerleading for some person seems like a complete sink then I'll not continue it, and I'll invest my time and energy elsewhere. Not everybody needs my support and that's ok! It is better spent where it's most needed. There were people who used to cheer me on who don't much anymore, and that's totally fine; I still have positive feelings for them for the time they did cheer me on.

Cheerleading electrifies the work atmosphere. It is the proper use of social media. Cheerleading is the imperative of caring individuals as they become more experienced, gain confidence, command a wider audience, and want to continue their own growth through the development of genuine and supportive networks. And an ambitious, high-achieving, celebratory culture is simply the most fun to be a part of.

I wrote a post on being a cheerleader.

Phil Eaton 4 months ago

Debugging memory leaks in Postgres, heaptrack edition

This is an external post of mine.

Phil Eaton 5 months ago

Burn your title

I've been a developer, a manager, a cofounder, and now I'm a developer again. I ran away from each position until being a founder because I felt like I was limited by what I was allowed to do. But I reached an enlightenment of sorts during my career progression: everyone around me was dying for someone to pick things up, for employees to show engagement and agency.

We think of our titles as our limits. We're quick to say and believe, "that isn't my job". While in reality titles reflect the minimum expected of us, not the maximum that is open to us. Trying to figure out what (new minimum) you must do to get promoted seems kind of backwards to me, reinforcing our sense of our own limits. Instead, at every stage in your career, focus on doing the intersection of:

- what you see needs to be done (that isn't being done)
- what you are capable of doing
- what you have the desire/energy for (or would find fulfilling)

And this is the path to promotion and a successful and interesting career.

Burn your title. Burn your job description. I mean, keep your boss happy for sure. Keep your teammates happy by supporting them and building them up and communicating well. But don't wait to be officially made a lead or given a new title to do what otherwise fits into that intersection above. And if after doing this for some time, demonstrating this level of agency, you are not promoted, it just means you're not at the right company or the right organization within your company, and you should look elsewhere. What's more, this work you did (at a company that doesn't appreciate your agency, if that happens to be the case) merely makes the case stronger for your successful interview at the next company. There's no downside.

The cynical, and perhaps realistic, alternative to this is to do politics to get promoted. Or to not do politics but to do things that don't align with your long-term goals. I'm not personally interested in either path so I'm not covering them here. I'm interested in the intersection of things that move me in the direction I want, things that are useful to the company, and things that I am capable of doing (in addition to whatever minimum work I must actually do).

Here's a peek at what this looks like for me as an individual contributor, a programmer, at EnterpriseDB. I started the EDB Engineering Newsletter because it seemed like we needed to do a better job telling the world the awesome things our engineering team is doing. (You know we're one of the biggest contributors to Postgres? Bruce Momjian, Robert Haas, Peter Eisentraut, etc. work here? The guy who implemented the WAL and MVCC in Postgres is my teammate?) Nobody asked me to do that. I started publishing blog views for the entire company once a month internally. Nobody asked me to do that. I wrote a number of internal docs and tutorials on the product because we were just obviously missing them. Nobody asked me to do that. I started a fortnightly incident review meeting for my team because it seemed like we were missing chances to update docs and teach each other. Nobody asked me to do that. I write the odd post for the company blog on what I've learned. Nobody asked me to do that.

These are just a few of the random things that seemed like a good idea for me to do on top of my Actual Work as a developer, which I think I do a decent job of on its own.

Don't burn out. Don't do things you aren't asked for and don't find rewarding. Or that won't pave the way toward the career you want. I'm trying to be very careful not to advocate anything along those lines. But also don't wait to be asked to do something. Do what is interesting and obvious and rewarding to you. Interesting opportunities seem to come most reliably when you make them for yourself.

Phil Eaton 5 months ago

Transactions are a protocol

Transactions are not an intrinsic part of a storage system. Any storage system can be made transactional: Redis, S3, the filesystem, etc. Delta Lake and Orleans demonstrated techniques to make S3 (or cloud storage in general) transactional. Epoxy demonstrated techniques to make Redis (and any other system) transactional. And of course there's always good old Two-Phase Commit.

If you don't want to read those papers, I wrote about a simplified implementation of Delta Lake and also wrote about a simplified MVCC implementation over a generic key-value storage layer.

It is both the beauty and the burden of transactions that they are not intrinsic to a storage system. Postgres and MySQL and SQLite have transactions. But you don't need to use them. It isn't possible to require you to use transactions. Many developers, myself a few years ago included, do not know why you should use them. (Hint: read Designing Data Intensive Applications.)

And you can take it even further by ignoring the transaction layer of an existing transactional database and implementing your own transaction layer, as Convex has done (the Epoxy paper above also does this). It isn't entirely clear that you have a lot to lose by implementing your own transaction layer, since the indexes you'd want on the version field of a value would only be as expensive or slow as any other secondary index in a transactional database. Though why you'd do this isn't entirely clear either (I'd like to read about this from Convex some time).

It's useful to see transaction protocols as another tool in your system design tool chest when you care about consistency, atomicity, and isolation. Especially as you build systems that span data systems. Maybe, as Ben Hindman hinted at the last NYC Systems, even proprietary APIs will eventually provide something like two-phase commit so physical systems outside our control can become transactional too.
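To make the "any storage system can be made transactional" claim concrete, here is a minimal sketch in Go of a versioned transaction layer over a plain in-memory key-value map. This is not the design from the Delta Lake or Epoxy papers, just an illustration that the protocol lives above the storage; the names (store, txn, versioned) are invented for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// versioned is a value plus a version in the underlying
// (non-transactional) store.
type versioned struct {
	value   string
	version int
}

// store is any dumb key-value layer; here, just a map.
type store struct {
	data map[string]versioned
	next int // global version counter
}

// txn buffers writes and remembers the version of everything it read.
type txn struct {
	s      *store
	reads  map[string]int
	writes map[string]string
}

func (s *store) begin() *txn {
	return &txn{s: s, reads: map[string]int{}, writes: map[string]string{}}
}

func (t *txn) get(key string) string {
	if v, ok := t.writes[key]; ok {
		return v // read your own writes
	}
	v := t.s.data[key]
	t.reads[key] = v.version // remember what we saw
	return v.value
}

func (t *txn) set(key, value string) {
	t.writes[key] = value
}

// commit applies all buffered writes atomically, but only if nothing
// this transaction read has changed since it read it (optimistic
// concurrency control).
func (t *txn) commit() error {
	for key, seen := range t.reads {
		if t.s.data[key].version != seen {
			return errors.New("conflict: retry the transaction")
		}
	}
	t.s.next++
	for key, value := range t.writes {
		t.s.data[key] = versioned{value: value, version: t.s.next}
	}
	return nil
}

func main() {
	s := &store{data: map[string]versioned{}}

	t1 := s.begin()
	t1.set("balance", "100")
	fmt.Println(t1.commit()) // <nil>

	t2 := s.begin()
	_ = t2.get("balance")

	t3 := s.begin()
	t3.set("balance", "50")
	_ = t3.commit() // a concurrent write sneaks in

	t2.set("balance", "75")
	fmt.Println(t2.commit()) // conflict: retry the transaction
}
```

A real layer would also need durability and would have to handle commits racing each other (everything here is single-threaded), but the shape is the same: versions plus a commit protocol, on top of any store.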

Phil Eaton 6 months ago

Things that go wrong with disk IO

There are a few interesting scenarios to keep in mind when writing applications (not just databases!) that read and write files, particularly in transactional contexts where you actually care about the integrity of the data and when you are editing data in place (versus copy-on-write, for example). We'll go into a few scenarios where the following can happen:

- Data you write never actually makes it to disk
- Data you write gets sent to the wrong location on disk
- Data you read is read from the wrong location on disk
- Data gets corrupted on disk

And how real-world data systems think about these scenarios. (They don't always think of them at all!) If I don't say otherwise, I'm talking about behavior on Linux.

The post is largely a review of two papers: Parity Lost and Parity Regained and Characteristics, Impact, and Tolerance of Partial Disk Failures. These two papers also go into the frequency of some of the issues discussed here. These behaviors actually happen in real life!

Thank you to Alex Miller and George Xanthakis for reviewing a draft of this post.

Some of these terms are reused in different contexts, and sometimes they are reused because they effectively mean the same thing in a certain configuration. But I'll try to be explicit to avoid confusion.

Sector: the smallest amount of data that can be read and written atomically by hardware. It used to be 512 bytes, but on modern disks it is often 4KiB. There doesn't seem to be any safe assumption you can make about sector size, despite filesystem defaults (see below). You must check your disks to know.

Block (filesystem/kernel): typically set to the sector size, since only this block size is atomic. The default in ext4 is 4KiB.

Page (kernel): a disk block that is in memory. Any reads/writes less than the size of a block will read the entire block into kernel memory, even if less than that amount is sent back to userland.

Page (application): the smallest amount of data the system (database, application, etc.) chooses to act on when it's read or written or held in memory. The page size is some multiple of the filesystem/kernel block size (including the multiple being 1). SQLite's default page size is 4KiB. MySQL's default page size is 16KiB. Postgres's default page size is 8KiB.

By default, file writes succeed when the data is copied into kernel memory (buffered IO). The man page for write(2) says:

A successful return from write() does not make any guarantee that data has been committed to disk. On some filesystems, including NFS, it does not even guarantee that space has successfully been reserved for the data. In this case, some errors might be delayed until a future write(), fsync(2), or even close(2). The only way to be sure is to call fsync(2) after you are done writing all your data.

If you don't call fsync on Linux, the data isn't necessarily durably on disk, and if the system crashes or restarts before the disk writes the data to non-volatile storage, you may lose data. With O_DIRECT, file writes succeed when the data is copied to at least the disk cache. Alternatively you could open the file with O_DSYNC (or O_SYNC) and forgo fsync calls. fsync on macOS is a no-op. If you're confused, read Userland Disk I/O.

Postgres, SQLite, MongoDB, and MySQL fsync data before considering a transaction successful by default. RocksDB does not.

fsync isn't guaranteed to succeed. And when it fails you can't tell which write failed. It may not even be a failure of a write to a file that your process opened:

Ideally, the kernel would report errors only on file descriptions on which writes were done that subsequently failed to be written back. The generic pagecache infrastructure does not track the file descriptions that have dirtied each individual page however, so determining which file descriptors should get back an error is not possible. Instead, the generic writeback error tracking infrastructure in the kernel settles for reporting errors to fsync on all file descriptions that were open at the time that the error occurred. In a situation with multiple writers, all of them will get back an error on a subsequent fsync, even if all of the writes done through that particular file descriptor succeeded (or even if there were no writes on that file descriptor at all).

Don't be 2018-era Postgres. The only way to have known which exact write failed would be to open the file with O_SYNC (or O_DSYNC), though this is not the only way to handle fsync failures.

If you don't checksum your data on write and check the checksum on read (as well as doing periodic scrubbing a la ZFS), you will never be aware if and when the data gets corrupted, and you will have to restore (who knows how far back in time) from backups if and when you notice. ZFS, MongoDB (WiredTiger), MySQL (InnoDB), and RocksDB checksum data by default. Postgres and SQLite do not (though databases created from Postgres 18+ will). You should probably turn on checksums on any system that supports it, regardless of the default.

Only when the page size you write = block size of your filesystem = sector size of your disk is a write guaranteed to be atomic. If you need to write multiple sectors of data atomically, there is the risk that some sectors are written and then the system crashes or restarts. This behavior is called torn writes or torn pages. Postgres, SQLite, and MySQL (InnoDB) handle torn writes. Torn writes are by definition not relevant to immutable storage systems like RocksDB (and other LSM-tree or copy-on-write systems like MongoDB (WiredTiger)) unless writes (that update metadata) span sectors. If your system duplicates all writes, like MySQL (InnoDB) does (and like you can with data=journal in ext4), you may also not have to worry about torn writes. On the other hand, this amplifies writes 2x.

Sometimes fsync succeeds but the data isn't actually on disk because the disk is lying. This behavior is called lost writes or phantom writes. You can be resilient to phantom writes by always reading back what you wrote (expensive) or by versioning what you wrote. Databases and filesystems generally do not seem to handle this situation.

If you aren't including where data is supposed to be on disk as part of the checksum or page itself, you risk being unaware that you wrote data to the wrong place or that you read from the wrong place. This is called misdirected writes/reads. Databases and filesystems generally do not seem to handle this situation either.

In increasing levels of paranoia (laudatory), follow ZFS, Andrea and Remzi Arpaci-Dusseau, and TigerBeetle.

I wrote a post covering some of the scenarios you might want to be aware of, and resilient to, when you write systems that read and write files.
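To make the buffered-IO point above concrete, here is a minimal sketch in Go of a durable append (the file name wal.log is made up for the example). Write alone only hands data to the kernel; Sync (fsync under the hood) is what requests durability, and its error must be checked.

```go
package main

import (
	"log"
	"os"
)

func main() {
	// Append one record to a log file.
	f, err := os.OpenFile("wal.log", os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		log.Fatal(err)
	}

	// Write only copies the data into kernel memory (the page cache).
	// If the machine loses power right now, the record may be gone.
	if _, err := f.Write([]byte("record-1\n")); err != nil {
		log.Fatal(err)
	}

	// Sync calls fsync(2): only after this returns successfully may we
	// consider the record durable. Per the 2018-era-Postgres lesson
	// above, if Sync fails we must not retry it and assume the earlier
	// write made it; the safe move is to treat the data as lost and
	// recover (e.g. crash and replay the log).
	if err := f.Sync(); err != nil {
		log.Fatal(err)
	}

	if err := f.Close(); err != nil {
		log.Fatal(err)
	}
}
```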
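And here is a sketch of the checksum-plus-location idea from the corruption and misdirected-write sections: compute the checksum over the page's own location as well as its contents, and verify it on read. The page layout is invented for illustration; real systems place checksums in page headers with more metadata.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/crc32"
)

const pageSize = 4096

// encodePage lays out a page as [crc32 (4 bytes) | payload], where the
// checksum covers the page number as well as the payload. Mixing in the
// page number means a page read back from the wrong offset (a
// misdirected read/write) fails verification, not just a corrupted one.
func encodePage(pageNo uint64, payload []byte) []byte {
	page := make([]byte, pageSize)
	copy(page[4:], payload)

	h := crc32.NewIEEE()
	binary.Write(h, binary.LittleEndian, pageNo) // mix in the location
	h.Write(page[4:])
	binary.LittleEndian.PutUint32(page[:4], h.Sum32())
	return page
}

// verifyPage recomputes the checksum for the offset we *believe* we
// read from and compares it to the stored one.
func verifyPage(pageNo uint64, page []byte) bool {
	h := crc32.NewIEEE()
	binary.Write(h, binary.LittleEndian, pageNo)
	h.Write(page[4:])
	return binary.LittleEndian.Uint32(page[:4]) == h.Sum32()
}

func main() {
	page := encodePage(7, []byte("hello"))
	fmt.Println(verifyPage(7, page)) // true
	fmt.Println(verifyPage(8, page)) // false: looks misdirected
	page[100] ^= 1                   // flip one bit of the payload
	fmt.Println(verifyPage(7, page)) // false: looks corrupted
}
```

Note this only detects the problem; what you do next (restore from a replica, a backup, or parity) is the part the two papers above spend their time on.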

Phil Eaton 6 months ago

Phil Eaton on Technical Blogging

This is an external post of mine.

Phil Eaton 7 months ago

Minimal downtime Postgres major version upgrades with EDB Postgres Distributed

This is an external post of mine.

Phil Eaton 8 months ago

From web developer to database developer in 10 years

Last month I completed my first year at EnterpriseDB. I'm on the team that built and maintains pglogical and that, over the years, contributed a good chunk of the logical replication functionality that exists in community Postgres. Most of my work, our work, is in C and Rust with tests in Perl and Python. Our focus these days is a descendant of pglogical called Postgres Distributed, which supports replicating DDL, tunable consistency across the cluster, etc. This post is about how I got here.

I was a web developer from 2014-2021†. I wrote JavaScript and HTML and CSS and whatever server-side language: Python or Go or PHP. I was a hands-on engineering manager from 2017-2021. I was pretty clueless about databases, and indeed database knowledge was not a serious part of any interview I did.

Throughout that time (2014-2021) I wanted to move my career forward as quickly as possible, so I spent much of my free time doing educational projects and writing about them on this blog (or previous incarnations of it). I learned how to write primitive HTTP servers, how to write little parsers and interpreters and compilers. It was a virtuous cycle because the internet (Hacker News, anyway) liked reading these posts and I wanted to learn how the black boxes worked. But I shied away from data structures and algorithms (DSA) because they seemed complicated and useless to the work that I did.

That is, until 2020, when an inbox page I built started loading more and more slowly as the inbox grew. My coworker pointed me at Use The Index, Luke and the DSA scales fell from my eyes. I wanted to understand this new black box, so I built a little in-memory SQL database with support for indexes.

I'm a college dropout, so even while I was interested in compilers and interpreters earlier in my career I never dreamed I could get a job working on them. Only geniuses and PhDs did that work, and I was neither. The idea of working on a database felt the same. However, I could work on little database side projects like I had done before on other topics, so I did. That included a series of explorations of Raft implementations, others' and my own.

From 2021-2023 I tried to start a company, and when that didn't pan out I joined TigerBeetle as a cofounder to work on marketing and community. It was during this time I started the Software Internals Discord and /r/databasedevelopment, which have since kind of exploded in popularity among professionals and academics in database and distributed systems. TigerBeetle was my first job at a database company, and while I contributed bits of code I was not a developer there. It was a way into the space. And indeed it was an incredible learning experience, both on the cofounder side and on the database side. I wrote articles with King and Joran that helped teach and affirm for myself the basics of databases and consensus-based distributed systems.

When I left TigerBeetle in 2023 I was still not sure if I could get a job as an actual database developer. My network had exploded since 2021 (when I started my own company that didn't pan out), so I had no trouble getting referrals at database companies. But my background kept leading hiring managers to suggest putting me on cloud teams doing orchestration in Go around a database rather than working on the database itself. I was unhappy with this type-casting, so I held out while unemployed and continued to write posts and host virtual hackweeks messing with Postgres and MySQL. I started the first incarnation of the Software Internals Book Club during this time, reading Designing Data Intensive Applications with 5-10 other developers in Bryant Park. During this time I also started the NYC Systems Coffee Club.

After about four months of searching I ended up with three good offers, all to do C and Rust development on Postgres (extensions) as an individual contributor. Working on extensions might sound like the definition of not-sexy, but Postgres APIs are so loosely abstracted it's really as if you're working on Postgres itself. You can mess with almost anything in Postgres, so you have to be very aware of what you're doing. And when you can't mess with something in Postgres because an API doesn't yet exist, companies have the tendency to just fork Postgres so they can. (This tendency isn't specific to Postgres; almost every open-source database company seems to have a long-running internal fork or two of the database.)

Two of the three offers were from early-stage startups, and after more than 3 years being part of the earliest stages of startups I was happy for a break. But the third offer was from one of the biggest contributors to Postgres, a 20-year-old company called EnterpriseDB. (You can probably come up with different rankings of companies using different metrics, so I'm only saying EnterpriseDB is one of the biggest contributors.) It seemed like the best place to be to learn a lot and contribute something meaningful.

My coworkers are a mix of Postgres veterans (people who contributed the WAL to Postgres, who contributed MVCC to Postgres, who contributed logical decoding and logical replication, who contributed parallel queries; the list goes on and on), but my developer-coworkers also include people who started at EnterpriseDB in technical support, or who were previously Postgres administrators. It's quite a mix. Relatively few geniuses or PhDs, despite what I used to think, but they certainly work hard and have hard-earned experience.

Anyway, I've now been working at EnterpriseDB for over a year, so I wanted to share this retrospective. I also wanted to cover what it's like coming from engineering management and founding companies to going back to being an individual contributor. (Spoiler: incredibly enjoyable.) But it has been hard enough to make myself write this much, so I'm calling it a day. :)

† From 2011-2014 I also did contract web development, but this was part-time while I was in school.

Phil Eaton 8 months ago

Edit for clarity

I have the fortune to review a few important blog posts every year and the biggest value I add is to call out sentences or sections that make no sense. It is quite simple and you can do it too.

Without clarity only those at your company in marketing and sales (whose job it is to work with what they get) will give you the courtesy of a cursory read and a like on LinkedIn. This is all that most corporate writing achieves. It is the norm and it is understandable. But if you want to reach an audience beyond those folks, you have to make sure you're not writing nonsense. And you, as reviewer and editor, have the chance to call out nonsense if you can get yourself to recognize it.

But especially when editing blog posts at work, it is easy to gloss over things that make no sense because we are so constantly bombarded by things that make no sense. Maybe it's buzzwords or cliches, or simply lack of rapport. We become immune to nonsense. And even worse, without care, as we become more experienced, we become more fearful to say "I have no idea what you are talking about". We're afraid to look incompetent by admitting our confusion. This fear is understandable, but is itself stupid. And I will trust you to deal with this on your own.

So as you review a post, read it out loud to yourself. And if you find yourself saying "what on earth are you talking about", add that as a comment, as gently as you feel you should. It is not offensive to say this (depending on how you say it). It is surely the case that the author did not know they were making no sense. It is worse to not mention your confusion and allow the author to look like an idiot or a bore.

Once you can call out what does not make sense to you, then read the post again and consider what would not make sense to someone without the context you have. Someone outside your company. Of course you need to make assumptions about the audience to a degree. It is likely your customers or prospects you have in mind. Not your friends or family. With the audience you have in mind, would what you're reading make any sense? Has the author given sufficient background or introduced relevant concepts before bringing up something new?

Again, this is a second step. The first step is to make sure that the post makes sense to you. In almost every draft I read, at my company or not, there is something that does not make sense to me. Do two paragraphs need to be reordered because the first one accidentally depended on information mentioned in the second? Are you making ambiguous use of pronouns? And so on.

Clarity on its own will put you in the 99th percentile of writing. Beyond that it definitely still matters if you are compelling and original and whatnot. But too often it seems we focus on being exciting rather than being clear. And it doesn't matter if you've got something exciting if it makes no sense to your reader.

This sounds like mundane guidance, but I have reviewed many posts that were reviewed by other people and no one else called out nonsense. I feel compelled to mention how important it is.

Wrote a new post on the most important, and perhaps least done, thing you can do while reviewing a blog post: edit for clarity.

Phil Eaton 8 months ago

An explosion of transitive dependencies

A small standard library means an explosion in transitive dependencies. A more comprehensive standard library helps you minimize dependencies. Don't misunderstand me: in a real-world project, it is practically impossible to have zero dependencies.

Armin Ronacher called for a vibe shift among programmers and I think that this actually exists already. Everyone I speak to on this topic has agreed that minimizing dependencies is ideal. Rust and JavaScript, with their incredibly minimal standard libraries, work against this ideal. Go, Python, Java, and C#, in contrast, have a decent standard library, which helps minimize the explosion of transitive dependencies.

I think the standard library should reasonably include:

- JSON, CSV, and Parquet support
- HTTP/2 support (which includes TLS, compression, random number generation, etc.)
- Support for asynchronous IO
- A logging abstraction
- A SQL client abstraction
- Key abstract data types (BTrees, hashmaps, sets, and growable arrays)
- Utilities for working with Unicode, time and timezones

But I don't think it needs to include:

- Excel support
- PostgreSQL or Oracle clients
- Flatbuffers support
- Niche data structures

Neither of these is intended to be a complete list, just examples.

Minimal standard libraries force growing companies to build out their own internal collection of "standard libraries". As one example, Bloomberg did this with C++. And I've heard of companies doing this already with Rust. This allows larger companies to manage and minimize the explosion of transitive dependencies over time. All growing companies likely do something like this eventually. But again, smaller standard libraries incentivize companies to build this internal standard library earlier on. And the community benefits relatively little from these internal standard libraries. The community would benefit more if large organizations contributed back to an actual standard library.

Smaller organizations do not have the capacity to build these internal standard libraries. Maybe the situation will lead to libraries like Boost for JavaScript and Rust programmers. That could be fine.

A comprehensive standard library does not prevent the language developers from releasing new versions of the standard library. It is trivial to do this with naming, as Go has done with the v2 pattern; math/rand/v2 is an example (see the sketch below).

I'm primarily thinking about maintainability, not security. You can read about the security risks of using a language with an ecosystem like Rust from someone who is an expert on the matter.

My concern about the standard library does not stop me from using Rust and JavaScript. They could choose to invest in the standard library at any time. We have already begun to see Bun and Deno do exactly this. But it is clearly an area for improvement in Rust and JavaScript. And a mistake for other languages to avoid repeating.

While zero dependencies is practically impossible, everyone I've spoken to agrees that minimizing dependencies is ideal. Rust and JavaScript work against this ideal. But they could change at any time. And Bun and Deno are already examples of this.
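For what the v2 pattern looks like in practice, here is a small Go example: the old and new versions of a standard library package live side by side under different import paths, so existing code keeps compiling while new code opts in.

```go
package main

import (
	"fmt"

	randv1 "math/rand"    // original API, still available
	randv2 "math/rand/v2" // revised API, opt-in by import path
)

func main() {
	// v1 historically required explicit seeding for varied results;
	// here we build a local generator from a fixed seed.
	r := randv1.New(randv1.NewSource(42))
	fmt.Println(r.Intn(10))

	// v2 is automatically seeded and renames Intn to IntN.
	fmt.Println(randv2.IntN(10))
}
```

Because the import path carries the version, the two packages can even be used in the same file, which is exactly what makes evolving a comprehensive standard library tractable.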
