GreatReads - Blog Aggregator · Phoenix Framework

Let Your Teams Own Their Processes

When it comes to team operating principles, I'm sure we've all encountered the leader who values uniformity seemingly for the sake of it. Every team has the same ceremonies, on the same days. Status reports are via a templated slide deck with the same format and the same information as all the other teams. I get it, this can be comforting. I worked at a place like that. Every team used Scrum, estimated with story points, sprints all started and ended on the same day, and the last day of the sprint featured a full day of sprint-reviews presented via PowerPoint decks that all basically looked the same save for maybe some screenshots of whatever feature the team was working on. One of my teams, however, was really struggling. Because of the nature of their scope of ownership and who their stakeholders were, they could rarely get through a sprint without some new information or requirements coming to light that either injected new, urgent work or made some of the work they had planned pointless. Going by the book, this should have caused us to cancel nearly every sprint (we didn't, we just soldiered on). During a retro, we all concluded that this team would be better served by using Kanban and tracking Cycle Time as the primary metric for doing estimations. The story points method was proving pretty wasteful because we'd often break down and estimate work that we didn't end up doing, then have to point some new work on the fly mostly because we needed the stats for our sprint review deck. I could talk all day about how we could have handled things differently, but that's not the point of this post. The upshot is, the proposal to switch to Kanban was shot down by the execs since it proposed reporting cycle time and throughput metrics rather than the points-based velocity they were used to. (We agreed to keep doing "Sprint" reviews on the same cadence even though we weren't planning to do sprints). 
The perceived control they got from seeing a graph showing whether velocity was going up or down was too valuable to them to give up, and having two systems of measurement was a bridge too far, I guess. Ultimately, a path that could have improved the satisfaction of both the team and its stakeholders was cut off to them. This is the failure mode that uniform process creates.

A concept I first heard from Jean-Michel Lemieux when he was leading engineering at Shopify was creating an "API For Teams". Like an API in software, the API For Teams defines a contract for interfacing with the rest of the business, but leaves the implementation up to the implementors. A good basic interface would be something like:

Project
- Current project or deliverable
- Health of the project
- Phase of the project (preview, beta, GA, etc.)
- Current estimate for next delivery

People
- Open roles on the team
- Who is on-call and/or handling support escalations?

Risk
- Current level of planned vs. unplanned work
- Any escalations from their last retrospective that need addressing

This information should be available any time, with reasonable expectations around staleness for things like estimates and health. (Put it in a Slack integration or some other info delivery system and people can even query it without interrupting anyone.)

In an organization with good, aligned incentives, something like that can give stakeholders the information they need to make decisions without over-specifying the processes in a way that removes so much autonomy from the team that they cease taking any interest in finding ways to improve. The less you demonstrate that you trust your team, the less they'll try.

The API concept lets teams decide for themselves what processes best match their skills, the type of work they do, and the way they deploy software to customers. I've had teams choose to switch to Kanban from Scrum because in the world of continuous delivery they found the starting and stopping of sprints was a weird artificial construct that caused things like code reviews to bunch up until later in the sprint. I've had other teams fully embrace sprints because they could divide up a couple of weeks' work among themselves and mostly stay heads down without a lot of coordination overhead. No one system or way of working ever emerged as "the best" in a way that I would have imposed it on every other team.

Keep the API minimal and be mindful that you're not adding things on to the point where you've dictated the implementation. This is where the API concept shines. The information that's expected should be clear, non-negotiable, and where metrics are involved, not easily gamed. The incentives should be aligned around extracting truthful, honest data. Don't give teams the incentive to lie, and don't punish them if they report out data you don't like. Try to help when teams are struggling, and encourage a culture of sharing so that when one team finds something that works, they can share that with other teams who might also benefit.

For their part, teams need to be accountable for the outcomes, not just for providing the data. If they keep missing their target dates, that's a contract violation and they probably need to change their implementation. They should hear that.

Again, I understand the comfort that uniformity can bring, but taking it too far will be stifling and counterproductive. On the flip side, getting too into the details is a bad use of a leader's time, and they'll never have the amount of information needed to make decisions that the team doing the work has. The chaotic-good manager embraces some ambiguity and loss of control, and in exchange they'll often find they get a better outcome.

"Tangye instrument panel" by Elsie esq. is licensed under CC BY 2.0.
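
The "API For Teams" contract described above maps fairly directly onto an interface in code. What follows is only a minimal Python sketch under stated assumptions: the names `TeamStatus`, `TeamAPI`, `KanbanTeam`, `ScrumTeam` and `summarize`, the field names, and every sample value are invented for illustration and do not come from Lemieux or Shopify. It shows two teams with different internal processes satisfying the same contract.

```python
from dataclasses import dataclass, field
from math import ceil
from statistics import median
from typing import List, Protocol


@dataclass
class TeamStatus:
    """What every team publishes, whatever process it runs (hypothetical shape)."""
    deliverable: str        # Project: current project or deliverable
    health: str             # Project: e.g. "green", "yellow", "red"
    phase: str              # Project: preview, beta, GA, etc.
    next_delivery: str      # Project: current estimate for next delivery
    open_roles: int         # People: open roles on the team
    on_call: str            # People: who handles support escalations
    unplanned_share: float  # Risk: fraction of recent work that was unplanned
    retro_escalations: List[str] = field(default_factory=list)  # Risk


class TeamAPI(Protocol):
    """The contract. Stakeholders depend on this, not on the team's process."""

    def status(self) -> TeamStatus: ...


class KanbanTeam:
    """One implementation: the estimate comes from median cycle time."""

    def __init__(self, cycle_times_days: List[float], items_left: int):
        self.cycle_times_days = cycle_times_days
        self.items_left = items_left

    def status(self) -> TeamStatus:
        # Crude assumption for the sketch: items are finished one at a time.
        days = median(self.cycle_times_days) * self.items_left
        return TeamStatus("Billing export", "yellow", "beta",
                          f"~{days:.0f} working days", 1, "sam", 0.4,
                          ["Too many mid-week requirement changes"])


class ScrumTeam:
    """Another implementation: the estimate comes from points-based velocity."""

    def __init__(self, velocity: int, points_left: int):
        self.velocity = velocity
        self.points_left = points_left

    def status(self) -> TeamStatus:
        sprints = ceil(self.points_left / self.velocity)
        return TeamStatus("Search revamp", "green", "preview",
                          f"~{sprints} sprints", 0, "priya", 0.1)


def summarize(team: TeamAPI) -> str:
    """A stakeholder-facing view that only touches the contract."""
    s = team.status()
    return f"{s.deliverable}: {s.health}, {s.phase}, next delivery {s.next_delivery}"


if __name__ == "__main__":
    for team in (KanbanTeam([2, 3, 4], items_left=5), ScrumTeam(20, 50)):
        print(summarize(team))
```

`summarize` never asks whether a team uses sprints or a Kanban board. That is the point of the contract: the reported fields are fixed, and how each team arrives at its estimate is its own business.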

Agile

Chaotic-Good Engineering Management 1 month ago

How to Be Better at Talking About 'Tech Debt'

Tech Debt started out as a useful metaphor. You 'borrow' from the future to achieve an outcome today, much in the same way you can achieve home ownership earlier by taking out a mortgage after saving enough for a down payment. However, the way it's used now is long past its usefulness as an explanation for anything. You might as well blame the phase of the moon. It's about as edifying. You wouldn't tell the doctor your problem is "pain" without giving more details, yet I hear 'tech debt' as the entire explanation way too often.

Part of your job as a leader is to make technical problems understandable by non-technical stakeholders. Analogies can serve you well here:

- We hung up a sheet of plastic instead of a door when we built the garage so we could start parking the car inside. We need to install the door now.
- We need to change the oil on the car now so the engine doesn't seize later.
- We should patch the leaky basement so that we don't need to deal with the foundation collapsing.

I've said it before, but at its core, management is the art of efficiently deploying capital for the highest return on investment. In a software company most of that capital is in the form of engineering time. To make the best decisions we need to be more precise than just saying we want to use it to "pay down tech debt". That doesn't sell because it is a statement devoid of useful information. It doesn't say anything about what kind of return on investment you might be getting. While specifically quantifying the ROI of preventative maintenance or restorative work in pure dollar terms can be hard or even impossible, using an analogy that most people understand can help make your case.

First things first, you need to understand what kind of debt you're dealing with.

When you're talking about intentional tech debt, the shortcuts you took to get a product out the door, talk about the debt in concrete terms. Discuss the specific shortcuts you took and what the 'interest payments' are on them. For example:

- "We shipped this in June to get fast customer feedback knowing it would need to be hardened before Black Friday in November. We need to make these specific improvements before then or we won't make it through the holidays without major downtime."
- "We knew this code wasn't optimized and we deployed it on bigger servers to make up for it. If we spend a couple weeks on performance improvements we can save X thousand dollars over the next year by redeploying it on smaller, less expensive machines."

This is the kind of debt you sometimes should take on, but when you're deliberate about it and clear about why you're doing it and what you get out of it, you can also be clear about what's needed to pay it off and what the costs of not doing so are. You can even do all of that without saying the words "technical debt", which is becoming a term that makes people stop listening. This kind of debt doesn't really need metaphors to explain, as long as you're clear about what the trade-offs and ongoing costs are.

Unintentional debt is probably the most common form, and it's where the debt metaphor is often the least useful. You can build the best system you possibly could with the information available and the world can change in ways that your previously well-designed system simply can't handle. Many seemingly good decisions can prove to have been the wrong ones in retrospect:

- Your traffic or account sizes could vastly exceed your expectations.
- A library, component, or framework you depended on can become abandoned.
- You made a trade-off without realizing it.
- You didn't fully know what you were doing when you wrote it in the first place.

This has a lot of the same characteristics as deliberate debt. You've got ongoing costs related to previous decisions where a project to 'pay down' the debt would remedy the situation. However, because it was not taken on deliberately it can be harder to nail down the source of the pain and make a clear case for the remedy. This is especially true when the decisions were made long ago and nobody knows or remembers why. To make matters worse, when there's a lot of it, it can just sort of feel like a "big blob of yuck". This is the kind of situation that leads to people saying "we should just rewrite it" (Don't).

Now, the inability to clearly articulate the problem can be a problem in and of itself. It's entirely possible that the understanding of the full shape of the system is lacking to the point where nobody can effectively diagnose things in specific enough terms to be useful. This is common in high-turnover organizations or ones that never developed good habits of documenting their decisions. Whatever the cause, it should go on your list of things to remedy.

When you've got these kinds of problems, the age-old technique of elephant-eating (one bite at a time) is the solution. Try to categorize, break down, and understand in more detail the types of problems you have. If you must stick with the debt metaphor you can categorize it by 'interest rate':

- Payday Loan Debt (emergency): This is the stuff that's hurting you daily and costing you the most. You're coping, not solving anything. Pay it off ASAP.
- Credit Card Debt (not good): This stuff is a major drag on velocity and there's a clear case to be made for paying it off. Make that case, focus on the ROI.
- Mortgage Debt (might be ok): This stuff might or might not have a business case for paying it off. It might bug you a little but it might be worth putting up with. But only by understanding it can you make that call.

Doing this breakdown and classification is probably best done as a group effort. One person's payday loan might be another person's mortgage debt. It's best to understand it in the broadest business context you can and try to be as scientific as you can be. There's bound to be some subjectivity to it, and biases and preferences can skew the perception of how bad a problem truly is. Many eyes make this kind of planning more effective.

If your product is built on a framework that's still maintained, and the version you're on is going end-of-life because you haven't kept up, that's not tech debt. You've neglected basic maintenance. If your porch is falling apart because the stain protecting the wood has been gone for years and water is rotting the boards, that's not increasing the size of your mortgage. If it is debt, it's money you borrowed from the mafia and now they're on their way to break your legs.

As the saying goes, code doesn't age like wine, it ages like milk. With that comes risks that need to be managed. There is often pressure to ignore those risks in favour of new features or other, more exciting things. Who wouldn't rather buy a flashy new electronic gadget than paint the porch again? You know that's irresponsible, and so is neglecting basic maintenance of your software systems. This is one of the reasons it's critically important to align the incentives of the EM and PM partners. If only the EM has the incentive to keep the porch in good shape and only the PM has the incentive to buy new toys, you're gonna have a bad time.

The best way to deal with this type of problem is to not get into it in the first place. The 'good news' is that if you leave it long enough the problem will become so obvious that you won't have to justify fixing it. The bad news is that the fix is going to be way more expensive than if you'd done the basic maintenance that could have prevented it. Don't borrow money from loan sharks!

As engineering leaders, we have the responsibility to frame the necessity to do maintenance work and to pay down both deliberate and unintentional technical debt in business terms. Often the problem is that nobody is able to do that successfully until it becomes a crisis. Taking on deliberate debt with a full understanding and a plan to pay it off can get you the software company equivalent of home ownership sooner with manageable interest payments.

Being able to make a business case for 'engineering work' or things that aren't just new features is a critical skill for engineering leaders to develop. And whenever you can, make sure your Product counterpart shares both the pain and the glory of having a well-functioning product. Once you develop that skill, you can use it to avoid taking on those payday loans with the highest interest payments and spend your capital on what's best for the business without saddling your future self or your successor with big bills.

"Debt Consolidation, Circa 1948" by Orin Zebest is licensed under CC BY 2.0.

Career

Business

Chaotic-Good Engineering Management 2 months ago

Quarter one retrospective

I registered this domain about 3 months ago and started posting shortly after, so it seems like a good time to do a quarterly retrospective. I've always thought the retrospective, done well, is one of the most powerful tools in a team's arsenal for making their lives better and improving their processes, so I'm going to try to do one periodically with myself for this blog.

My aim when I set out was to publish about 1 post per week in the Mon-Wed time frame. I didn't want to be super-strict about the schedule in case other things came up or a post took longer than I thought to get together.

- 10 posts by me.
- 1 guest post.
- 1 'meta' post to kick things off and an 'about' page.

Mostly got there. The longest gap between posts was 12 days but that was a job-interview heavy week so I'll give myself a little grace there. I haven't been super strict on achieving a Mon-Wed delivery either.

Takeaway: Setting the one-post-per-week goal at the beginning was beneficial. I'd like to tighten it up for the next quarter and try to hit a more predictable schedule and post every Wednesday.

The "How to be a leader when the vibes are off" post did the best numbers and it's not even close. Ironically, it was the first one I didn't post to HackerNews because the previous ones got basically zero attention. Someone else posted it (thank you, whoever you are) and it got some serious traction. It's sitting around 40,000 views right now whereas my next top performer (unsurprisingly: "Why AI Probably Won't Help Your Team Ship More Product") is in second place with 3,500. That one got picked up by a newsletter that drove some traffic. The other posts were much more modest, bordering on disappointing. I'll attribute that to my own discomfort with self-promotion, which is something I'm probably going to have to get over if I expect that to change meaningfully.

Other than HackerNews and that newsletter, Reddit and LinkedIn have been the best drivers of traffic - probably because those are the only places I promote them. Mastodon, similarly, doesn't seem to drive much traffic but I don't have a huge following there. I started writing some promotional blurbs when I post to LinkedIn and Reddit. It makes me uncomfortable, I'm not going to lie. There's so much cringey content on LinkedIn especially that I hate to be one of those people, but it does seem to help and I'm trying to keep it from sounding too ridiculous or clickbaity.

Takeaway: I should probably work a bit harder on threading the needle between doing next-to-nothing to promote this blog and becoming a shameless shill for my own brand without going full-influencer.

Thankfully, no internet randos have come to yell at me about anything I've written or not written (I didn't read the HN comments). Most of the feedback has been from friends and former colleagues and has been basically all positive. My favourite one: That part is important to me. For what it's worth I don't use AI to write them. I write them in markdown using vim without any kind of plugins or assistants. I'm not anti-AI as a rule. I do use ChatGPT and Claude pretty much every day. In the context of writing I definitely use it for proofreading since I don't even have a spell-checker in vim 😄 and I sometimes use it for helping me organize my thoughts around an idea, but every word I've published has been written by me (or is an attributed quote from somewhere I consider trustworthy).

I did also have one company looking for someone to lead their engineering team reach out, and I ended up interviewing with them. That was always kind of a secondary goal of this endeavour so I'm happy that worked out (though the job didn't - which is fine - I had a good experience with them).

Takeaway: The feedback has been encouraging enough that I think I'm on the right track. Stay the course. The main thing to change, I think, is doing a bit more promotion and seeing if I can grow the audience for this. That said, it's not political hot-takes so there's a long-tail opportunity to let it just grow over time organically, which is also fine.

I have no direct plans to 'monetize' this content. I publish with Ghost and it does have a paywall/subscription feature but at this point it doesn't make sense to try to turn this into cash-flow. At the very least I don't think 12 posts are enough to say I've demonstrated enough value that anyone should be ready to pull out their wallet. In any case, it's probably better suited to just be a capsule summary of my management style and can serve as a long-form advertisement for hiring me to do manager-y things rather than writing for a living. That said, I am living off my savings right now so I reserve the right to change my mind if I need income more urgently. 😆

I've been coding! While we're in the midst of all of this AI stuff I've been exploring that space, primarily the tooling and code generation aspects, while working on a real product that might be in shape to launch soonish. The product itself isn't in the AI arena, but I've been curious about the tools and how well they do. My approach has been not to 'vibe-code' but to find the way to use them while still being aware of what they're up to enough that if they poofed out of existence I would not be left with an unmaintainable mess.

The code base I'm working on was initially built by someone else over the course of a year or so and was built in a stack that was pretty much all stuff I'd never worked with professionally. So, steep learning curve, plus I've only really dabbled in hobby projects for the past 15 years since my day jobs have all been manager roles. I'll maybe save the details for another post since my first AI one did so well, but I'll summarize my findings this way:

- Helping me find my way around an unfamiliar code base: 👍
- Explaining what some bit of code is doing: 👍
- Writing some tedious/boring code with a small scope: 👍
- Writing anything with any amount of complexity: 😵

It's really tempting to just let it run sometimes, but it becomes counter-productive at some point if it takes me longer to figure out what it did than it would to have just written it myself. I think this is where some of the "feels faster but it's actually slower" measurements that are filtering in from out there are coming from. It still feels like early days for this tech though. It's already changed so much since the first time I tried it. Lots left to be discovered, good and bad. Anyways, I've used ChatGPT Codex, Cursor, and Claude Code and right now I'm mostly using GPT Codex but I keep it on a short leash. After all, the humans are still the ones on-call.

As mentioned, I've taken a few interviews here and there. One of the reasons I wanted to write this is to get some more of my management philosophy out there so any prospective employer can see what they're getting. I really want there to be a 'love match' between me and any future employers, with strong alignment on philosophy and approach. When that's not there it's never a good time. Work is enjoyable when you're doing what you love in the way you feel is best and it's no fun at all when you feel like you have to compromise on that.

Thank you again for being a subscriber. I really appreciate it. Please feel free to get in touch if you have something you feel like I should write about or if I could be a help to you or your organization in any way. No plans to stop at the moment. Hopefully I don't run out of ideas!

"Driver's side rear view mirror on 1955 Carpenter / Steelcraft school bus (built on 1954 GMC chassis)" by Darron Birgenheier is licensed under CC BY-SA 2.0.

Career

Writing

Chaotic-Good Engineering Management 2 months ago

The Trust Triangle

A theme I find myself coming back to over and over again is the three-way relationship between Autonomy, Accountability, and Alignment and how those interact to produce high-performing teams. In case I need to define the terms:

- Autonomy: The team or individual has some agency over how they accomplish their goals. They are not micro-managed and are given some decision-making authority.
- Accountability: The team or individual takes responsibility for the outcomes. They celebrate their wins and learn from their mistakes when they miss the mark without pointing fingers or scapegoating their own team members or other people in the organization.
- Alignment: The team or individual understands the goals clearly and shares this understanding with their peers and stakeholders.

These three aspects form a Triangle of success, and the thing that gives it structural integrity, its load-bearing element, is trust. Before getting into each leg of the triangle, it's worth looking into what happens when one or more sides weaken.

When teams don't have healthy levels of all three aspects of this Triangle, they stop thinking for themselves and wait to be told what to do because that's the safest course of action. This presents as rigidity and unresponsiveness. They become unwilling or unable to adapt or compromise because there is too much inherent risk. They will seek permission before taking any steps and will resist beneficial changes that have any aspect of uncertainty in them for fear of being punished if things don't work out. If people don't believe they'll be supported, even in failure, they'll stop trying. They'll outsource the risk to the higher-ups by letting them make all the decisions and hand them out in the form of dictums. This is a mode that doesn't scale up well and one that is all but guaranteed to result in a slower team producing suboptimal results.

Once a team is in this state, it's hard to get them out. This is learned behaviour that came from many interactions over time. The way to turn it around is to reverse the process. Give the team some agency over something that should be in their control, celebrate wins, learn from failures, and repeat this as many times as you can while trying to increase the impact potential of each iteration. When you have good outcomes, share them with other leaders in the org.

Removing nitpicky oversight and second-guessing reduces escalations and creates focus and efficiency. An individual or team empowered to make decisions that further their goals without having to constantly convince the bosses that they're making the right decisions is going to be faster than one who is constantly being second-guessed or overruled. Additionally, people will work harder on a solution of their own devising than one that was imposed upon them that they don't necessarily agree with or believe in, so you typically will get better results as well as faster results if you give people freedom to run.

To put it another way, if the job is cleaning the dishes, I don't care if you use a sponge, a brush, or a cloth as long as the dishes are clean. Use whatever works. In software terms, I've worked with people who like a fully-tricked-out IDE with all kinds of autocomplete and magic refactoring tools (and now AI code generation) and people who use minimal Vim setups. This is something where I would never dictate a choice, even though I might have a strong opinion about what I would use. Whatever gets the job done!

When trust is low, people stop using the autonomy they have for the good of the project and instead use it to make sure they're not the ones who will get the blame if things go wrong. In low-trust, highly regimented environments people will use their agency and initiative not for the good of the project, but to manipulate the system in their favour. When trust is high, people don't have to dedicate brain cycles to staying out of trouble or appeasing the bosses and can devote their energies to the team's goals instead. Your job is to give them the room to innovate on the work the business expects from them, not to innovate on ways to work around you.

I used to dislike the term 'alignment', thinking it was an unnecessary business-speak substitute for 'agreement'. Now I think there's a subtle but important difference that becomes clear when you realize that alignment is what supports and enables autonomy. It really is about direction: everyone pointing their efforts at a common target. It's the shared understanding of why you're doing what you're doing. Too little of it and you get entropy. People can be working hard and doing great work but if it's not directed it quickly becomes chaotic. It's broader and less specific than simple agreement and takes more effort to create and sustain.

Alignment is more of a process than an event. You have to treat it as a living conversation and make tweaks and corrections as you go. At the same time, focusing on it too much as a manager can make it indistinguishable from 'control'. I like to think of myself as a participant in the alignment process first and foremost. Yes, you do have a responsibility to create and maintain that alignment, but don't force it. When done right there should be push and pull from all participants with the general feeling that everyone is seeking both to be understood and to understand. Alignment is what turns autonomy into progress and ensures it's aimed at the right target.

Without accountability, things can drift. The 'mass' of accountability provides stability to the team and keeps them focused on the results. If they know they will have to stand up in front of their stakeholders and peers and report out the results of their efforts, they're going to do their best to achieve those results in a way they can be proud of.

Of the three aspects of the Triangle, this one is the hardest for people who tend to default to low-trust to get right. It does not mean that "successes are rewarded and failures are punished". Good, healthy accountability is self-imposed. People want to uphold the standards, they want to achieve the objectives, and when they miss the mark they are forthcoming about that fact and take steps to learn from their mistakes. They are not sent to the Gulag or ordered to "commit seppuku by sunset tomorrow" but neither should they simply shrug their shoulders and move on after something goes wrong.

If one of your SaaS vendors has a production incident that impacts you, think about the kind of accountability you'd want to see from them in a post-incident report. A good one probably has:

- A reasonably detailed description of what went wrong.
- What the contributing factors were.
- What the remediation steps were.
- What is being done to prevent it happening again.

What you would probably not want to see is something like "Ricky the intern pushed the wrong button so we fired him. Problem solved." That's blame, not accountability, and it would harm rather than increase my trust of that vendor. Even a less extreme example that feels evasive or tries to deflect responsibility onto the vendor's vendors or something can erode trust. It should be the same for teams.

Nothing goes right 100% of the time and no strictures or processes you put in place can guarantee good outcomes. With that in mind it becomes super important to have a clear understanding of how you'll behave when things miss the mark. Admittedly, this is tricky if you aren't wired to be trusting by default. Not everyone is, and our human cognitive biases tend to steer us towards wanting to place blame. But it's worth trying to build that muscle because the payoffs can be enormous. Plenty has been written on how to build a blameless culture, and modern adaptations of the idea do take into account our natural tendencies and reframe the idea of blameless culture as 'blame-aware'.

A few practical ideas for building your team's Triangle of Trust. I try to go into every new team situation with the following axioms:

- The people I'm working with are skilled adults.
- They are intrinsically motivated to do their best work.
- When we don't get the outcomes we want, look at the system first, individuals second.

Occasionally I'm wrong about some aspect of that and I need to make adjustments, but as a starting point it's worked really well for me. When I do need to make adjustments I also try to work back to a state where those things are true again. If someone needs upskilling or they're maybe in the wrong role, make the necessary moves and revisit your assessment after a little while.

- Are people getting aligned and staying aligned?
- Are they accepting the autonomy you're trying to give them or do you need to give it in smaller doses?
- Are they holding each other accountable in healthy ways or pointing fingers?
- And most importantly: Are we getting the outcomes we want?

Be careful about over-correcting so that you don't get into learned helplessness territory. If your team does a retrospective process, this is a good opportunity to work some of these assessments into your schedule and have the team participate. You don't have to keep this your secret rubric; you can be straight up with your team and tell them you're trying to build this Triangle and get them involved in helping. If you don't have a retrospective process, this is a good reason to introduce one!

If you don't invest in building trust in both directions, your team will become an inefficient sclerotic mass that requires constant input and energy from you to move. Projects run on plans, and teams run on trust.

"Old railway bridges" by radkuch.13 is licensed under CC BY 2.0.

Career

Business

Chaotic-Good Engineering Management 2 months ago

Gatekeepers vs. Matchmakers

I estimate I’ve conducted well over 1,000 job interviews for developers and managers in my career. This has caused me to form opinions about what makes a good interview. I’ve spent the majority of my career in fast-growing companies and, with the exception of occasional pauses here and there, we were always hiring. I’ve interviewed at every level from intern (marathon sessions at the University of Waterloo campus interviewing dozens of candidates over a couple of days) to VP and CTO level (my future bosses in some cases, my successor in roles I was departing in others). Probably the strongest opinion that I hold after all that is: adopting a Matchmaker approach builds much better teams than falling into Gatekeeper mode.

Once the candidate has passed some sort of initial screen, either with a recruiter, the hiring manager, or both, most “primary” interviews are conducted with two to three employees interviewing the candidate, often the hiring manager and an individual contributor. (Of course, there are innumerable ways you can structure this, but that structure is what I’ve seen to be the most common.) Interviewers usually start with one of two postures when interviewing, the Gatekeeper or the Matchmaker:

- I don’t want to hire this person unless they prove themselves worthy. It’s my job to keep the bad and fake programmers out. (Gatekeeper)
- I want to hire this person unless they disqualify themselves somehow. It’s my job to find a good match between our needs and the candidate’s skills and interests. (Matchmaker)

The former, the Gatekeeper, I would say is more common overall and certainly more common among individual contributors and people earlier in their career. It’s also a big driver of why a lot of interview processes include some sort of coding “test” meant to expose the fraudulent scammers pretending to be “real” programmers. All of that dates back to the early 2000s and the post-dotcom crash. Pre-crash, anyone with a pulse who could string together some HTML could get a “software developer” job, so there were a lot of people with limited experience and skills on the job market. Nowadays, aside from outright fraudsters (which are rare), I haven’t observed many wholly unqualified people getting past the résumé review or initial screen.

If you let Gatekeepers design your interview process, you’ll often get something that I refer to as “programmer Jeopardy!” The candidate is peppered with what amount to trivia questions:

- What’s a deadlock? How do you resolve it?
- Oh, you know Java? Explain how the garbage collector works!
- What’s the CAP theorem?

…and so on. For most jobs where you’re building commercial software by gluing frameworks and APIs together, having vague (or even no) knowledge of those concepts is going to be plenty. Most devs can go a long time using Java or C# before getting into some sort of jam where learning intimate details of the garbage collector’s operation gets them out of it. (This wasn’t always true, but things have improved.) Of course, if the job you’re hiring for is some sort of specialist role around your databases, queuing systems, or infrastructure in general, you absolutely should probe for specialist knowledge of those things. But if the job is “full stack web developer,” where they’re mostly going to be writing business logic and user interface code, they may have plenty of experience and be very good at those things without ever having needed to learn about consensus algorithms and the like.

Then, of course, there’s the much-discussed “coding challenge,” the worst versions of which involve springing a problem on someone, giving them a generous four or five minutes to read it, then expecting them to code a solution with a countdown timer running and multiple people watching them. Not everyone can put their best foot forward in those conditions, and after you’ve run the same exercise more than a few times, it’s easy to forget what the “first-look” experience is like for candidates. Maybe I’ll write a full post about it someday, but it’s my firm conviction that these types of tests have a false-negative rate so high that they’re counterproductive.

Gatekeeper types often over-rotate on the fear of “fake” programmers getting hired and use these trivia-type questions and high-pressure exercises to disqualify people who would be perfectly capable of doing the job you need them to do and perfectly capable of learning any of that other stuff quickly, on an as-needed basis. If your interview process feels a bit like an elimination game show, you can probably do better.

You, as a manager, are judged both on the quality of your hires and your ability to fill open roles. When the business budgets for a role to be filled, they do so because they expect a business outcome from hiring that person. Individual contributors are not generally rewarded or punished for hiring decisions, so their incentive is to avoid bringing in people who make extra work for them. Hiring an underskilled person onto the team is a good way to drag down productivity rather than improve it, as everyone has to spend some of their time carrying that person.

Additionally, the absence of any kind of licensing or credentialing structure in programming creates a vacuum that the elimination game show tries to fill. In medicine, law, aviation, or the trades, there’s an external gatekeeper that ensures a baseline level of competence before anyone can even apply for a job. In software, there’s no equivalent, so it makes sense that some interviewers take a “prove to me you can do this job” approach out of the gate.

But there’s a better way. “Matchmaking” in the romantic sense tries to pair up people with mutual compatibilities in the hopes that their relationship will be beneficial to both parties: a real “whole is greater than the sum of its parts” scenario. This should also be true of hiring. You have a need for some skills that will elevate and augment your team; candidates have a desire to do work that means something to them with people they like being around (and yes, money to pay the bills, of course).

When people date each other, they’re usually not looking to reject someone based on a box-checking exercise. Obviously, some don’t make it past the initial screen for various reasons, but if you’re going on a second date and looking for love, you’re probably doing it because you want it to work out. Same goes for hiring. If you take the optimistic route, you can let go of some of the pass/fail and one-size-fits-all approaches to candidate evaluation and spend more time trying to find a love match.

For all but the most junior roles, I’m confident you can get a strong handle on a candidate’s technical skills by exploring their work history in depth. I’m a big fan of “behavioural” interviewing, where you ask about specific things the candidate has done. I start with a broad opening question and then use ad hoc follow-ups in as conversational a manner as I can muster. I want to have a discussion, not an interrogation. Start with questions like:

- What’s the work you’ve done that you’re the most proud of?
- What’s the hardest technical problem you’ve encountered? What was the resolution?
- Tell me about your favourite team member to work with. Least favourite?

If you practice, or watch someone who is good at this type of interview, you can easily fill a 45–60 minute interview slot with a couple of those top-level questions and some ad hoc follow-ups based on their answers. Of the three examples I gave, two are a good starting place for assessing technical skills. Most developers will give you a software project as the work they’re the most proud of (if they say “I raised hamsters as a kid,” feel free to ask them to limit it to the realm of software development and try again). This is your opportunity to dig in on the technical details:

- What language or frameworks did they use? Did they choose, or did someone else?
- How was it deployed?
- What sort of testing strategy did they use?
- What databases were involved?
- How did their stuff fit into the architecture of the broader system?

Questions like that will give you a much stronger signal on their technical skills and, importantly, experience. You should be able to easily tell how much the candidate was a driver vs. a passenger on the project, whether or not they thought about the bigger picture, and how deep or shallow their knowledge was. And, of course, you can keep asking follow-up questions to the follow-up questions until you’ve got a good sense.

Interviewing and taking a job does have a lot of parallels to dating and getting married. There are emotional and financial implications for both parties if it doesn’t work out, there’s always a degree of risk involved, and there’s sometimes a degree of asymmetry between the parties. In the job market, the asymmetry is nearly always in favour of the employer. They hold most of the cards and can dictate the terms of the process completely. You have a choice as a leader how much you want to wield that power. My advice is to wield it sparingly: try to give candidates the kind of experience where even if you don’t hire them, they had a good enough time in the process that they’d still recommend you to a friend.

Taking an interest in their experience, understanding what motivates them, and fitting candidates to the role that maximizes their existing skills, challenges them in the right ways, and takes maximum advantage of their intrinsic motivations will produce much better results than making them run through the gauntlet like a contestant on Survivor.

"Old Royal Naval College, Greenwich - King William Court and Queen Mary Court - gate" by ell brown is licensed under CC BY 2.0.

Business

Career

HTML

Java

Chaotic-Good Engineering Management 2 months ago

A Fire is Not an Emergency for the Fire Department

Yesterday (Oct 20, 2025), AWS had a major incident. A lot of companies had a really bad day. I’ve been chatting with friends at impacted companies and a few had similar stories. Their teams were calm while their execs were not. There was a lot of “DO SOMETHING!” or “WHY AREN’T YOU FREAKING OUT?!”

Incidents happen, but panic doesn’t do anything to solve them. Fires aren’t emergencies for the fire department and incidents shouldn’t be for engineering responders either. In this case, very little was likely in their control. For all but the most trivial of systems, you’re not going to migrate off of AWS in the middle of an incident and expect a good outcome. But the principle stands: the worst thing you can do is freak out, especially when the incident is self-inflicted. Calm leadership is going to be better than panic every time.

Imagine you lit your stove on fire and called the fire department to put it out. Which of these two responses would you prefer after the fire chief hops out of the truck:

- [Screaming] “Oh dear! A fire! Somebody get the hose! Agggh!”
- “We see the problem, we know how to deal with it, we’ll handle it and give you an update as soon as we can.”

I’m sure the first response would have you questioning who hired this person to be the fire chief and would not give you a lot of confidence that your whole house wasn’t going to burn down. Similarly, the second response might let you be able to exhale and leave things to the professionals.

Like fires, not all incidents are created equally. A pan on fire on the stove is much more routine than a large wildfire threatening a major city. I would argue the need for calm leadership gets MORE important the bigger the impact of the incident. A small incident might not attract much attention, while a big incident certainly will. In yesterday’s AWS incident it would most definitely not be reassuring to see whoever was in charge of the response making an announcement about how bad it was and how they were on the verge of a panic attack. Same goes for wildfires, and same goes for your incidents.

When you’re in a leadership position during an incident, it serves no useful purpose to yell and scream or jump up and down or tear your hair out. Your responders want normal operations to be restored as much as you do, especially if it’s off-hours and they’ve had their free time or sleep interrupted. Shouting isn’t going to make them solve the problems any faster.

Now and then someone else will react negatively to someone in the Incident Commander or the engineering management seat behaving in a calm and collected way during an incident. Usually it’s someone who is in the position to get yelled at by a customer, like an account rep or a non-technical executive. They’re under a lot of pressure because incidents can cost your customers money, can result in churn or lost sales opportunities, and all of those things can affect the bottom line or career prospects of people who have no control over the actual technical issue. So they’re stressed about a lot of factors that are not the incident per se, but might be downstream consequences of it, much in the same way you would be after setting your kitchen on fire. Sometimes this comes out as anger or frustration that you, the technology person responsible for this, are not similarly stressed.

This is understandable. Keep in mind too that to them, they are utterly dependent on you and the tech team in this moment. They don’t know how to fix whatever’s broken, but they’re still going to hear about it from customers and prospects who might not be nice about it. This can cause a feeling of powerlessness that results in expressions of frustration. This is not a reason to change your demeanour.

The best thing you can do is keep the updates flowing, even if there’s no new news. People find reassurance in knowing the team is still working the problem. Keep the stakeholder updates jargon-free and high-level. Show empathy that you understand the customer impact and the potential consequences of that, but reassure them that the focus right now is restoring normal operations and, if appropriate, you’ll be available to talk through what happened after the dust settles.

A negative reaction to the calm approach is usually born out of a fear that ‘nothing is happening’. The truth is often a big part of incident response is waiting. Without the technical context to understand what’s really going on, some people just want to see something happening to reassure themselves. You can mitigate this with updates:

- Here’s what we’ve done.
- Here’s what we’re doing (or waiting on the results of).
- Here’s what we’ll likely do next.

Use your tools to broadcast this as often as you can and you’ll tamp down on a lot of that anxiety.

Big cloud provider incidents are fairly rare. Self-inflicted incidents are not. The responders trying to fix a self-inflicted problem are very likely to be the ones who took an action that caused or contributed to the outage. Even in a healthy, blameless culture this is going to cause some stress and shame, and in a blameful, scapegoating culture it’s going to be incredibly difficult. Having their manager, manager’s manager, or department head freaking out is only going to compound the problem. Highly stressed people can’t think straight and make mistakes. Set a tone of calm, methodical response and you’ll get out from under the problem a lot faster.

This might be culturally hard to achieve if you have very hands-on execs who want to be very involved, even when their involvement is counterproductive (hint: they don’t see it that way even if you do). But to the extent you can, have one communication channel for the responders and a separate one for updates for stakeholders. Mixing the two just distracts the responders and lengthens the time-to-resolution. The key to this being successful is not neglecting the updates. I recommend using a timer and giving an update every 10 minutes whether there’s news or not. People will appreciate it and your responders can focus on responding.

I don’t want this to be a “how to run a good incident” post. There are plenty of comprehensive materials out there that do a better job than I would here. I do want to focus on how to be a good leader in the context of incidents. Like many things, it comes down to culture and practice. Firefighters train to stay calm when responding to fires. They’re methodical and follow well-known good practices. To make this as easy as possible on yourself:

- Set a good example when you’re in the incident commander seat (whether in an official or ad-hoc capacity).
- Run practice incidents where the stakes are low.
- Build up runbooks and checklists to make response activities automatic.
- Define your response process and roles.
- Define your communication channels so everyone knows where to find updates.

If you can, work “fires aren’t emergencies for the fire department” into your company lexicon and remind people that in the context of incidents, the engineering team are the fire department.

"Wildfire" by USFWS/Southeast is marked with Public Domain Mark 1.0.

DevOps

Cloud

Chaotic-Good Engineering Management 3 months ago

Co-Pilots, Not Competitors: PM/EM Alignment Done Right

In commercial aviation, most people know there's a "Pilot" and a "Co-Pilot" up front (also known as the "Captain" and "First Officer"). The ranks of Captain and First Officer denote seniority but both pilots, in practice, take turns filling one of two roles: "pilot flying" and "pilot monitoring". The pilot flying is at the controls actually operating the plane. The pilot monitoring is usually the one talking to Air Traffic Control, running checklists, and watching over the various aircraft systems to make sure they're healthy throughout the flight.

Obviously, both pilots share the goal of getting the passengers to the destination safely and on time while managing the costs of fuel etc. secondarily. They have the same engines, fuel tank and aircraft systems and no way to alter that once airborne. They succeed or fail together.

It would be a hilariously bad idea to give one pilot the goal of getting to the destination as quickly as possible while giving the other pilot the goal of limiting fuel use as much as possible and making sure the wings stay on. Or, to take an even stupider example, give one pilot the goal of landing in Los Angeles and the other the goal of landing in San Francisco. Obviously that wouldn't work at all because those goals are in opposition to each other. You can get there fast if you don't care about fuel or the long-term health of the aircraft, or you can optimize for fuel use and aircraft stress if you don't need to worry about the travel time as much. It sounds ridiculous! And yet, this is how a lot of organizations structure EM and PM goals.

An EM and PM have to use the same pool of available capacity to achieve their objectives. They are assigned a team of developers and what that team of developers is able to do in a given period is a zero-sum game. In a modern structure, following the kinds of practices that lead to good DORA metrics, their ownership is probably going to be a mix of building new features and care and feeding of what they've shipped before. Like a plane's finite fuel tank, the capacity of the team is fixed and exhaustible. Boiled down to its essence, the job of leaders is managing that capacity to produce the best outcomes for the business. Problems arise, however, in determining what those outcomes are because they're going to be filtered through the lens of each individual's incentive structure.

Once fuel is spent, it’s spent. Add more destinations without adjusting the plan, and you’re guaranteeing failure.

It would not be controversial to say that a typical Product Manager is accountable and incentivized to define and deliver new features that increase user adoption and drive revenue growth. Nor would it be controversial to say that a typical Engineering Manager is accountable and incentivized to make sure that the code that is shipped is relatively error-free, doesn't crash, treats customer data appropriately, scales to meet demand, etc. You know the drill. I think this is fine and reflects the reality that EMs and PMs aren't like pilots in that their roles are not fully interchangeable and there is some specialization in those areas they're accountable for. But if you just stop there, you have a problem. You've created the scenario where one pilot is trying to get to the destination as quickly as possible and one is trying to save fuel and keep the airplane from falling apart. You need to go a step further and coalesce all of those things into shared goals for the team.

An interview question I like to ask both prospective EM and PM candidates is "How do you balance the need to ship new stuff with the need to take care of the stuff you've already shipped?" The most common answer is something like "We reserve X% of our story points for 'Engineering Work' and the rest is for 'Product Work'." This way of doing things is a coping strategy disguised as a solution. It's the wrong framing entirely because everything is product work. Performance is a feature, reliability is a feature, scalability is a feature. A product that doesn't work isn't a product, and so a product manager needs to care about that sort of thing too. Pretending there's a clean line between "Product Work" and "Engineering Work" is how teams can quietly drift off-course.

On the flip-side, I'm not letting engineering managers off the hook either. New features and product improvements are usually key to maintaining business growth and keeping existing customers happy. All the scalability and performance in the world don't matter if you don't have users. Over-rotating on trying to get to "five nines" of uptime when you're not making any revenue is a waste of your precious capacity that could be spent on things that grow the business, and that growth will bring more opportunities for solving interesting technical challenges and more personal growth opportunities for everyone who is there. EMs shouldn't ignore the health of the business any more than PMs should ignore the health of the systems their teams own.

Different hats are fine, different destinations are not.

If you use OKRs or some other cascading system of goals where the C-Suite sets some broad goals for the company and those cascade into ever more specific versions as you walk down the org chart hierarchy, what happens when you get to the team level? Do the EM and PM have separate OKRs they're trying to achieve? Put them side-by-side and ask yourself the following questions:

- If we had infinite capacity, are these all achievable? Or are some mutually exclusive?
  Example: Launch a new feature with expensive storage requirements (PM) vs. Cut infra spend in half (EM)
- Does achieving any of these objectives make another one harder?
  Example: Increase the number of shipped experiments and MVPs (PM) vs. Cut the inbound rate of customer support tickets escalated to the team (EM)
- Do we actually have the capacity to do all of this?
  Example: Ship a major feature in time for the annual customer conference (PM) vs. Upgrade our key framework which is 2 major versions behind and is going out of support (EM)

If the answers reveal goals that are mutually exclusive, that work against each other, or that exceed your capacity, congratulations, you've just planned your team's failure to achieve their goals. To put it another way, your planning is not finished yet! You need to keep going and narrow down the objectives to something that both leaders can get behind and do their best to commit to.

You've already got your goals side-by-side. After you've identified the ones that are in conflict, have a conversation about business value. Don't bring in the whole team yet; they're engineers and the PM might feel outnumbered. Just have the conversation between the two of you and see if you can come to some kind of consensus about which of the items in conflict are more valuable. Prioritize that one and deprioritize the one that it's in opposition to. Use "what's best for the business" as your tie-breaker. That's not always going to be perfectly quantifiable so there's a lot of subjectivity and bias that's going to enter into it. This is where alignment up the org chart is very beneficial. These concepts don't just apply at the team level.

If you find you can't agree and aren't making progress, try to bring in a neutral third party to facilitate the discussion. If you have scrum-masters, agile coaches, program managers or the like they can often fill this role nicely. To add more context to the discussion, you could also consult a partner from Customer Support, who will have a strong incentive to advocate for the customer above all. Don't outsource the decision though; both the EM and PM need to ultimately agree (even reluctantly) on what's right for the team.

If your org, like many, has parallel product and engineering hierarchies, alignment around goals at EACH level is critical. The VP Eng and VP Product should aim to have shared goals for the entire team. Same thing at the Director level, and every other level. That way if each team succeeds, each leader up the org chart succeeds, and ultimately the customer and the business are the beneficiaries. If you don't have that, doing it at the team level alone is going to approach impossible and you should send this post to your bosses. 😉 But in seriousness, you should spend some energy trying to get alignment up and down the org chart.

If your VP Eng and VP Product don’t agree on direction, your team is flying two different flight plans. No amount of team-level cleverness can fully fix that. Your job is to surface it, not absorb it.

Things you can do:

- If your original set of goals conflicted, the goals at the next level up probably do too. Call that out and ask for it to be reconciled.
- Escalate when structural problems get in the way of the team achieving their goals.
- Only the most dysfunctional organizations would create systems that guarantee failure on purpose.

You likely set goals on a schedule. Every n months you're expected to produce a new set and report on the outcomes from the last set. That's all well and good but I think most people who've done it a few times recognize that the act of doing it is more valuable than the artifacts the exercise produces. (Often described as "Plans are useless, planning is critical.") The world changes constantly, and the knowledge you and your team have about what you're doing increases constantly. All of these things can impact your EM/PM alignment, so it's also important to stay close, keep the lines of communication very open, and make adjustments whenever needed.

The greatest predictor of a team's success, health, and happiness is the quality of the relationship between the EM and PM. Keeping that relationship healthy and fixing it if it starts to break down will keep the team on the right flight path with plenty of spare fuel for surprise diversions if they're needed. Regardless of what framework you're using, or even if you're not using one at all, the team should have one set of goals they're working toward and the EM and PM should succeed or fail together on achieving those goals.

Shared goals don’t guarantee a smooth landing. Misaligned ones guarantee a crash.

"A380 Cockpit" by Naddsy is licensed under CC BY 2.0.

DevOps

Business

Chaotic-Good Engineering Management 3 months ago

Am I solving the problem or just coping with it?

Solving problems and putting in processes that eliminate them is a core part of the job of a manager. Knowing ahead of time whether or not your solution is going to work can be tricky, and time pressures and the urgency that pervades startups can make quick solutions seem really attractive. However, those are often a trap that papers over the real issue enough to look like it’s solved, only to have it come roaring back worse later. If you’ve been around the world of incident response and the subsequent activity of retrospectives/post-mortems, you’ve probably heard of the “ 5 whys ” method of root-cause analysis. In the world of complex systems, attributing failures to a single root cause has fallen out of favour and 5 whys has gone with it. Nevertheless, there is value in digging deeper than what initially presents itself as the challenge to uncover deeper, more systemic issues that might be contributing factors. Rather than “why,” I like to ask myself if I’m really solving this dysfunction or just coping with it. There’s a lot of temptation to cope. Coping strategies have some features that make them attractive. Imagine your problem is that your team is “always behind schedule.” That’s a problem I’m sure most people are familiar with. There are plenty of easy-to-grab band-aid strategies for that pattern that are very attractive: All of these can make you feel like you’re solving the problem but: This dichotomy is probably more familiar and easy to recognize in the technical realm. If your code has a memory leak you can either proactively restart it every now and then (coping) or you can find the leak and fix it (actual solution). The reasons you should prefer the actual solution are the same in both scenarios. The 'strategy' of restarting the service will work for a while, but you know in the back of your mind that eventually it’s going to manage to crash itself between restarts. Same goes for your people processes. 
At one of my jobs we had a fairly chaotic release process. This was a while ago and “Continuous Delivery” was still a fairly newfangled idea that wasn’t widely adopted, so we did what most companies did and released on a schedule. It was a SaaS product, so releasing was really just pushing a batch of changes to production. The changes had ostensibly been tested (we still had human QA people then) and everything looked good, but nevertheless nearly every release had some catastrophic issues that needed immediate hotfixes. We didn’t have set working hours, except on release day. We expected all of the developers to be in the office when the code went out in case they were needed for fixes. We released around 9am and firefighting often continued throughout the morning and into the afternoon. When that happened, the office manager would usually order some pizzas so people could have something to eat while fixing prod. Eventually this happened so often that it just became routine. Release day meant lunch was proactively ordered in for the dev team, then chaos ensued and was fixed. Of course, we did eventually tackle the real pain points by increasing the frequency of releases (when something hurts, do it more, it’ll give you incentives to fix the real problems), adding more automated testing, and generally just getting better at it all. The lunches continued well past the point where the majority of the people on the team remembered or ever knew why they were there. Some other practical questions you can ask yourself to try to recognize stopgap strategies that aren’t addressing the real problems. As mentioned before, anything that stops working when you stop doing it is likely not a good solution. If you’ve got a system where someone has to push the “don’t crash production” button every day or else production crashes, eventually someone’s going to fail to push the button and production is going to crash. The button is not a solution; it’s a coping strategy. 
If you’re having a lot of off-hours incidents and your incident response team is complaining about the burden, you could train more people in incident response to reduce their burden (and maybe you should), but that’s not solving the problem, which is really that you’re having too many incidents. A true solution would be understanding why and addressing that. Code not sufficiently tested? Production infra not sized appropriately? Who knows, but the answer is not simply throwing more bodies at it. That will reduce the acute problem of burnout in your responders (symptom) but not the chronic issues that are causing the incidents in the first place. If you’re not changing the underlying conditions you’re probably not fixing the problem. If all your meetings are running over time you could appoint someone to act as the timing police and give warnings and scold people who talk too long, but that’s just adding work for someone and not touching the real reasons, which could be unclear agendas, poor facilitation, or the fact that the VP can’t manage to stay on topic for any amount of time. The solution is to fix your meeting culture. Require agendas, limit the number of topics, train people on facilitation, give some candid feedback to the VP. These solutions actually remove the failure mode that causes meetings to run long. The timekeeper “strategy” doesn’t do anything about that. This is similar to the first question, but it’s something you can think of like a metric, even though it’s probably more a heuristic than something directly measurable. Treating your releases like pre-scheduled incidents like we did would be a good warning sign that you’re coping, but if each one is less dramatic than the last and wraps up sooner, those are signs that you’re on the right track. Getting to the point where releases are unremarkable and you don’t need the whole team on standby would be a good indicator that you’ve got solid solutions in place. 
When production is down, anything you can do to get it back up again is the right move. If you’re bleeding and you have to make a tourniquet with your own shirt, you do it. There are lots of scenarios where a short-term fix is the best thing for the situation. But keep in mind, this is effectively debt . Like technical debt, it should be repaid — ideally sooner rather than later. A good incident process will recover production by any means necessary. Once, during an incident related to a global DNS provider’s outage, we were literally uploading /etc/hosts files to our infrastructure to keep the part of our product that relied on some 3rd parties working. Needless to say, once the DNS incident was resolved, we went back and cleaned those up. You can do the same with processes. When there’s immediate pain that can be relieved with a quick patch-up, you should do it, and use the fact that the pain has abated to fix the problem permanently. You also can’t fix them all. You might not have enough influence or political capital to make the kinds of changes that real solutions require. In that case, your job is to advocate for the right things to happen, point out the ways that coping is hurting the organization, and exert influence over the parts that are in your control. I’ve tried to give you some strategies for recognizing when you’re just coping with a situation rather than fixing it. The differences can be subtle, but once you start to spot them it gets easier and you’ll find things start to get better in more permanent, sustainable ways. " Temporary Fix " by reader of the pack is licensed under CC BY-ND 2.0 . Like this? Please feel free to share it on your favourite social media or link site! Share it with friends! Hit subscribe to get new posts delivered to your inbox automatically. Feedback? Get in touch ! They can be contained within your team ; you don’t need to influence people outside your sphere. 
(We’ll just work through the weekend and get the project back on track.) They’re highly visible. (Look at Jason’s team putting in the extra effort! What a good manager!) They can move vanity metrics in the short term, which is often something rewarded. (We increased our velocity and did 20% more story points!) They come with a cost . (The team will burn out and quit if they have to work through too many weekends.) If you stop doing them, the situation comes right back . (The next project wasn’t different and we’re back behind schedule again.) They only address the symptoms , not the causes. (Improving your culture around estimations and deadlines would be a better fix.)

DevOps

Business

Career

0 views

Chaotic-Good Engineering Management 3 months ago

How to Be a Leader When the Vibes Are Off

It feels different in tech right now. We’re coming off a long era where optimism carried the industry. Something has curdled. AI hype, return-to-office mandates, and continued layoffs have shifted the mood. Managers are quicker to fire, and existential dread has replaced the confidence that a tight job market for developers provided for decades. The vibes are for sure off. (What follows are generalizations. If your company is escaping some or all of these, I applaud you. I’m sure there are exceptions.) AI has injected some destabilization. “I don’t need junior devs when I can just pay $20/month for Cursor” has an effect on everyone even if this turns out to be silly down the road. I see lots of people worried that the aim of all of this is to ultimately have a robot do their entire job. Whether or not this is possible, people are going to try. And it’s the trying that raises people’s anxiety. On top of that, we’ve also got “AI Workslop” to contend with, which is making work harder for the diligent among us. Return to Office feels like trust has been broken. Teams that continued to work well (or in some cases, better) after everyone in the industry went remote are now being told to come back to desks in offices. I’ve even heard tales of this happening despite there not being enough office space for everyone, which seems very silly. Also, for the first time in my nearly 30-year career, I’ve even heard of people being told they need to be “at their desks at 9am” and “expected to stay until 5pm at a minimum.” Even before COVID-19 and the mass move to remote work, most companies were flexible on start and stop times. I almost never heard of set hours for software developers until recently. Rules like that scream “we don’t trust you unless we can see you,” even if that’s not really the reason for the mandates

Career

2 views

Chaotic-Good Engineering Management 4 months ago

Stop Hunting for Heroes and Villains: Start Thinking in Systems

I was listening to a podcast the other day (I'm not going to link to the specific episode because it's about current events and there are plenty of places to discuss those. I don't want this to become another one but I'll credit the creator, Andrew Heaton). It broke people down into two categories: "Agentic" thinkers and "Systems" thinkers. The thesis was basically that agentic thinkers blame or credit the performance and actions of individual actors, e.g. "Goods are expensive because merchants are greedy", and systems thinkers take a broader view of all the inputs and incentives that lead to outcomes, e.g. "Goods are expensive because of supply and demand issues, interest rates, taxes, and the general state of the economy". Of course, any classification with two categories is going to be a vast oversimplification, but it got me thinking about how it might apply to managing software teams and some of the underlying causes of scapegoating and blame culture that were discussed in last week's guest post from Lin Byrne. Managers are often tasked with identifying the 'source' of problems on their teams. This is even more prevalent now that there exists some standardization of metrics that we can use to measure our teams (DORA, SPACE, DevEx) and available tools that monitor ticket systems and source control to try to quantify some aspects of the development process. These metrics and tools have increased the information available and thus increased the ability of leadership to draw conclusions about team performance. Importantly (and unfortunately) there's no guarantee that those conclusions will be sound or that the decisions they lead to will be good ones. I don't want to make the whole post about the dangers of over-rotating on metrics. That's a pretty well covered topic elsewhere

Business

0 views

Chaotic-Good Engineering Management 4 months ago

Scapegoating Only Worked in Ancient Times

Scapegoating comes from an ancient ritual where a community would place its sins onto a goat and send it away. The act served as a form of relief, but it offered no real reflection or learning. Problems weren’t examined, and behaviors weren’t adjusted. It was a way to move on without improvement. Applied to modern teams, the same instinct to “blame and move forward” keeps organizations from breaking out of recurring mistakes. In today’s work culture, mistakes are often treated as something to be hidden or pinned on someone else, rather than acknowledged as a natural part of building complex systems. But mistakes will happen; the question is how an organization responds to them. When teams discuss what went wrong, the goal shouldn’t stop at identifying the cause. The real value comes from asking what can be done differently next time: introducing the right process, improving communication, or adding tooling that reduces the chance of a repeat. Without that step, conversations about failure become empty postmortems instead of opportunities for meaningful change. Accountability means more than admitting something went wrong. It’s about owning the responsibility to make improvements. In a strong organization, that responsibility is clear at every level. The engineering team is accountable for the quality of the code they ship and the reliability of the systems they run. The product manager is accountable for shaping features that deliver real value. The engineering manager is accountable for the processes that keep the team effective and prevent repeat failures. And at the highest level, the director is accountable for ensuring the product holds together as a whole. When any one of these layers avoids accountability, cracks appear: bugs keep resurfacing, features miss the mark, teams stumble, and the product suffers. But when accountability is embraced, issues are confronted directly, and the path to improvement becomes obvious

Culture

0 views

Chaotic-Good Engineering Management 4 months ago

‘Labs’ teams, Skunkworks, and Special Projects: Beware

In a previous post, I talked about balancing ‘creating work’ and ‘destroying work’ such that the backlog does not become a huge mental burden on everyone to the point that it gives the impression that “the dev team is slow” or “we’re not making enough progress”. These are common themes at most of the places I’ve worked. One typical reaction to that is to pick a particular project and try to run it differently than “business as usual”. The projects chosen are often something other than just an incremental feature for the existing project. It might be an idea for an entirely new product or business unit or some grand re-imagining of the existing product. Often a founder is nostalgic for the days when they were coding entire features in a matter of days. Unencumbered by meetings, customers, existing code, internal stakeholders, compliance concerns, and everything else that comes along for the ride, they truly could crank out product in a way they’ve never seen all these expensive devs do since. To try to recapture some of that velocity that was ‘lost’ and get back to ‘the scrappy start-up days’, someone will inevitably propose cordoning off a special squad that can behave like they used to ‘back in the basement/garage’. They go by many names: Skunkworks, Tiger-Team, Startup-within-a-Startup, “Project Mayhem” and the like. I’m here to rain on the parade and tell you why this is almost always going to go poorly. If you’re an engineering manager or product manager asked to participate you should almost always push back against this type of idea. There is one exception though, which I will get to. The typical setup for a team like this:

Agile

0 views

Chaotic-Good Engineering Management 4 months ago

Your Backlog is not a Hoarder House

A theme I return to a lot in my head is the difference in friction between "creating" and "destroying" work to be done. It's much, much easier to dream up another thing to build, a new feature, heck, even a new product, than it is to actually make the thing. As such, most backlogs tend to be out of balance and hopelessly overloaded. This goes for both net-new development work and bugs. How bad this is can range from subtle (there's just a big list of ideas we probably won't get to) to insidious (actual planning and design have gone into a project that you have no free capacity to staff, or worse, it's been promised to some important stakeholder before you figured out who was actually going to do the work). The cheaper and in some ways braver way to destroy some of that work is to admit you're not going to do it. Not just to yourself, but to everyone in the organization. Over the years I've participated in approximately 1 billion meetings with the goal of deciding what the dev team should work on over the next month/quarter/year. This usually starts with the making of a list. The list is all the things the team could possibly do according to customer support, advisory boards, the execs or anyone else in the company with an idea. Then you try to rank the list with your favourite scheme:

Career

1 views

Chaotic-Good Engineering Management 4 months ago

What's this all about then?

I've always had a background thread running in my head about ideas that would make good blog post topics but never acted on it. I wrote a bit on my personal site back when I was doing open-source development work, which ended with a whimper of a post about "committing to focus". I guess I committed pretty hard since that was 12 years ago. I decided to start another site specifically on topics related to Software Engineering Management to share my thoughts and experience. This is that. In keeping with the times, it's available online or you can subscribe and get it in your email. It also publishes to RSS if that's more your jam. I've spent my career in growing companies. I've watched software organizations grow from a single, small, easily coordinated team led by a single founder to a multi-product, multi-department organization with VPs and Directors and matrixed org charts. When companies are growing, it almost becomes a cliché to say "what got us here won't get us there". In other words, it's important to throw out things that stop working or aren't going to work. Recognizing what those are is a bit of an art but it gets easier with experience.

Career

Writing

Open Source

0 views

Chaotic-Good Engineering Management 5 months ago

Why AI Probably Won't Help Your Team Ship More Product

So… how’d we do? You still arrive at 9:00. You optimized the longest step but ignored the real constraint: the train only comes at :05 and :35. Arriving earlier doesn’t help. To get ahead, you’d need to adjust other parts of the routine to catch the 8:05 train instead. Every product development team I've been a part of has one thing in common: more product ideas than capacity to build them. Leaders often think: if we could just build faster we could ship more of the massive backlog. AI coding tools now have Product Managers, Engineering Managers, and Execs salivating at the promise of a step-change in developer productivity. The assumption is that faster coding ➡️ faster backlog burn-down ➡️ more product shipped. But chances are that's just the same as the breakfast robot. You're making one step faster out of many, but you may not be addressing the true constraints of the system. I think it's fair to say that at this point we don't really know if these tools are going to have a positive impact on productivity or not. Early signs are saying 'no' but I think the jury is still out on what the full impact will be and this will be true for a while still. Let's be generous and optimistic and assume that some gains will be available or will become available in the near-ish term as the tools get better and we get better at understanding what they're good at and how to use them.

Business

0 views