Latest Posts (13 found)
Jampa.dev 1 month ago

The rise of one-pizza engineering teams

It is undeniable that AI tools like Claude Code let us write code faster now. But how does that impact everything else? In most teams, coding - reading, writing, and debugging code - used to be the part that took engineers the most time, but it is no longer the bottleneck. The Theory of Constraints states that every system has a bottleneck; without one, it would operate infinitely fast, which is impossible. Let’s look at the new bottlenecks, their effect on the size and roles of engineering teams, and why Amazon’s two-pizza-team rule - “teams should be small enough to be fed by two large pizzas, ideally comprising 5-8 people” - is being phased out.

[Chart: Designers and PMs will stay, but won’t be part of a specific team. We still need designers.]

Product and design are the new bottlenecks

Currently, LLMs are less useful for product managers and designers than for engineers. On the designer side, they struggle to create great prototypes. Notice how every AI product homepage looks the same. I believe this is a limitation of LLMs: they tend to generate ideas near the middle of their training data’s bell curve, which prevents bad design but also limits truly innovative concepts. On the product manager (PM) side, LLMs can gather data and insights, but the most time-consuming part of the job is communicating and talking with clients, which can’t be automated as effectively.

This creates a new bottleneck: project output starts to depend on the delivery speed of product specs and wireframes. And it gets worse: a team generally has 4-7 engineers but a single (sometimes shared) PM and designer, creating an imbalance. AI coding caught every PM by surprise - “We’re gonna need a bigger roadmap.”

Sharing the Legos

Some companies recognized this imbalance and asked, “What if the engineers were involved in Product and Design instead of just receiving the product specs?” Then they started hiring Product Engineers.
This is not a new concept; the role has existed for over 16 years. But I am highlighting it because product engineers are now more relevant than ever.

[Chart: There is already a rise in hiring “Product Engineers.” Source: Hacker News BigQuery table]

Okay, so what are product engineers? They are software engineers empowered to handle some responsibilities of PMs and designers, balancing the roles. Product engineers assume traditional PM duties, including owning the roadmap, engaging with users, analyzing data, framing opportunities, and determining what to build. However, they do not replace a PM. The PM still provides context but is no longer the main driver of implementation. On the design side, product engineers also assemble the building blocks of a design system. The designer still creates those blocks and collaborates on the UX flow, but is no longer responsible for producing “pixel-perfect prototypes” that engineers must follow.

The rise of specialists

Not everyone should or will be a product engineer. But the traditional software engineer who is “a jack of all trades, master of none” will not surpass an engineer who is a “master of one.” AI is good at producing code of acceptable quality, but it is rarely excellent. You can’t simply prompt and merge the result into a mature codebase without in-depth human review. There are many problems with AI coding:

AI often approaches coding carelessly, neglecting second-order effects: it modifies or removes essential code without considering the consequences.

For an LLM to fix a bug, the bug needs to make sense (and bugs that pierce through the abstraction layers rarely do), so you can’t have a team relying entirely on AI without a solid understanding of the tools they use.

AI will replicate destructive patterns in your codebase, causing a decline in code quality over time, because no one will detect and fix them.

So, we need specialists to manage the platform code.
It’s not that they won’t use AI, but they will act as gatekeepers during reviews and prevent bad patterns from being merged into the codebase. We will probably see fewer full-stack engineer openings and more roles for back-end and front-end engineers. This doesn’t mean they will do only one or the other, but they will be expected to be experts in one area.

Besides the imbalance mentioned earlier, large teams face two main challenges: communication overhead and the need to divide work so team members can work independently toward a shared goal. The second problem used to be manageable: we could divide the work into epics that would take an engineer two weeks to finish, and they could work on them independently. But now two weeks feels like a long time. Another issue is that AI performs best when given wide context for a problem, which makes dividing work even more difficult.

The ideal team size now appears to be 2-3 engineers per project. Even with a larger team, you can split it into groups of two engineers for a set period and observe how quickly they progress. However, avoid assigning large projects to a single person. Working alone for an extended period, without anyone to brainstorm with, review code, or collaborate, is hard: in my experience, an individual contributor’s frustration tends to grow over time when they are on their own. Developers also need to maintain and improve their communication skills, which are essential for growth.

I get many DMs on Reddit and LinkedIn pitching AI manager tools that track “productivity” and use AI to “evaluate” an engineer’s performance. Most fail at the concept stage. Manager performance tools aren’t new, but they all fail by trying to outsmart the manager. AI will never have even 40% of the context a manager needs; it can only evaluate quantitative metrics.
But managers who don’t code will be rare: with smaller teams, their responsibility in the People pillar decreases, freeing up time for the Programming pillar. I have already explained that the “well-defined engineering manager role” is a myth - engineering management always involves adapting to a team, and that will remain true. AI makes it easier for engineering managers to participate in coding. They already know how to break a larger problem into smaller, reviewable goals, just like any senior engineer would. So you prompt Claude, attend a meeting, and then review the code when you return!

But the role of a manager will differ from that of an engineer. They will not be assigned the same tasks, because situations that demand a manager’s attention will always take priority. So it’s crucial for managers to avoid work that could block the team if halted.

I believe this is just the first wave of changes. My post assumes that AI won’t improve much beyond its current state, which seems like a safe prediction: recent progress has come less from better models than from how we use them (tools, thinking capacity). We’re also seeing significant investment in designer and PM tooling where AI is not the centerpiece but a complement, and I wonder how much more those can improve. There are many unknowns I haven’t addressed because they’re still uncertain, such as QA: how much can AI take over QA, and what will the role of QA engineers be?

What else do you think is going to change? If you know, drop a comment.

Thanks for reading Jampa.dev! Subscribe for free to receive new posts about AI and Engineering.

0 views
Jampa.dev 1 month ago

Lessons learned after 10 years as an engineering manager

It’s been 10 years since my boss told me we needed to start hiring. And since I was responsible for hiring, I should handle onboarding too… And since I knew the roadmap, I could own that… And since I knew the people, I could coach them in their careers. I didn’t know it at the time, but he was dooming me to be an engineering manager. Since then, I’ve worked across four companies as a manager and met some amazing people. I will skip the standard advice on engineering management and focus on the non-obvious lessons.

There is no standardized definition of an engineering manager. Pick two random managers and they can do wildly different things, even at the same company. In every company I’ve worked at, my job has never been the same. The only constant is that the role is defined by the team’s needs, requiring you to balance four pillars: Product, Process, People, and Programming. Some examples:

Large team? Say goodbye to programming. You’ll focus on careers, coordination, and navigating the org to get resources for your team.

Small team? You’ll manage scope to match reality, and with less communication overhead, you might actually code.

No PM? You own the product entirely: validating features, prioritizing the roadmap, and talking to clients. This dominates your time, because shipping features with no user value makes everything else pointless.

Reporting to the CEO? You’re now the bridge to sales, operations, and client communications.

The key is identifying where your team’s bottleneck lies. Examine your software development lifecycle. You’ll likely shift between pillars as circumstances change, and that’s the point: the role demands flexibility.

Interview tip: Don’t ask what a manager is expected to do. Some managers assume their experience is industry standard and will look at you funny. Instead, ask about their daily life and what challenges consume most of their time.
A few times in my career as a developer, I wondered, “Who is this feature even for? Who will use it?” No one on my team knew. We were doing it because we were told to. Morale was low. We felt we were working on things that didn’t matter - and we were. Every time, the team eventually disbanded and the engineers scattered across other projects. The most common reason companies fail is building things that don’t provide value to users, who then don’t pay.

“Oh, but I have a PM for that,” you might say. But having a PM is not enough. Everyone needs to care about the product. Your team isn’t paid to deliver code but to use it to solve problems. Code only has value when it affects the end user. Sometimes a no-code integration beats a custom solution. Sometimes it’s better not to do the work at all, to avoid maintaining a system. Teams that understand the problem, not just the spec, can pivot when needed rather than clinging to a bad solution just because the code is already written.

Every process trades time and attention in exchange for reliability or quality. The problem is when teams stop questioning whether the trade is still worth it. Ceremonies become rituals. Metrics become goals. Nobody remembers why we do the things we do. Process bloat creeps in slowly: an engineer ships broken UI to production, designers complain, managers panic, and suddenly every PR needs designer approval. The whole team pays a tax for a single isolated incident.

Good process serves you so you can serve customers. But if you’re not watchful, the process can become the thing. You stop looking at outcomes and just make sure you’re doing the process right. “The process is not the thing. It’s always worth asking, do we own the process or does the process own us?” — Jeff Bezos, 2016 Letter to Shareholders

The right process depends on context: team size, experience levels, and deadline pressure. What works for a mature team won’t work for a new one. Keep questioning, keep iterating.
If a process isn’t making delivery better, cut it.

Your direct reports are the people who interact with you the most. They look to you for leadership and clarity, and trust that you’ll tell them what they need to know. That’s why lying, or withholding information that affects them, causes irreparable damage. They might not leave right away, but they’ll resent you. I have a friend who still resents a manager for a lie told three years ago. They found another company, but they’re still angry about it. “Trust arrives on foot and leaves by horseback.” - Old Dutch saying

I’ve seen managers describe the role as “a shield that blocks everything from above,” and I disagree. A good manager is more like a transparent umbrella: they protect the team from unnecessary stress and pressure but don’t hide reality from them. Telling the team, “Our users aren’t thrilled so far. We need to find ways to better serve them. The project risks cancellation if we don’t” - that’s fair game. They deserve to know. When you do deliver hard news, state it plainly and focus on what the team will do about it. If you act scared, they’ll be scared too. Your job is to get them thinking about the path forward.

I see managers walk into exec meetings with “we’re not sure what to do - maybe X, maybe Y?” and walk out told to do Z, which serves neither the team nor the project. Execs can’t ponder every decision. When a problem reaches them, it’s because a decision needs to be made, and they’ll make one. People above you have limited time to think about your specific problems. You can’t info-dump on them. If they take a misguided action based on what you told them, that’s on you. If you believe in something, state your case clearly, outlining the advantages and drawbacks. Don’t expect higher-ups to think for you. It’s fine to bounce half-formed ideas off your direct manager, but beyond that, cook a bit more - no one will think harder about your problems than you.
The further up you go, the more this matters. Structure your message plainly: context → problem → plan → what support you need. By the time you reach a skip-level, the options should be clear. They’re used to situations requiring immediate action, and that’s the response you’ll get.

Player (10%): Yes, only 10%. You might take on work your team isn’t excited about, but that matters: CI/CD improvements, flaky tests, process tooling. But you need to stay off the critical path. The moment you start taking essential tickets, you’ll block your team when managerial work pulls you away. There are many engineers, but only one manager. Keep yourself available for work that only you can do.

Coach (30%): Your output as a manager is the sum of your team’s output. Coaching means ensuring problematic behavior doesn’t become normalized: toxicity, repeated mistakes, consistent under-delivery. It also means helping engineers grow: stretching them with the right challenges, giving the right feedback, and building skills they’ll carry forward.

Cheerleader (60%): Praise people more than you think you should. Validation matters. Most engineers prefer feeling celebrated to having a ping-pong table. But praise genuinely, not reflexively. I once joined a team where retros had 30 minutes of mutual praise - n-squared compliments every week. It felt hollow. Not every week has something grand, and when praise becomes expected, it loses meaning. The hedonic treadmill is real. Make your engineers’ wins visible beyond your squad. Encourage them to aim for impact outside the team, and celebrate them when they do. Every squad is like a small company within the larger one - its morale often runs independent of the company’s.

Most managers don’t plan to become bottlenecks. It happens gradually. A critical tool needs an owner, and you think, “I’ll handle this for now.” Someone needs to be the point of contact for another team, and it’s easiest if it’s you.
Technical decisions keep landing on your desk because you’re the one with context. Before you know it, work stops without you. If you can’t take a month off and return to a well-functioning team, you need to work towards making that possible. You’re too busy to be the bottleneck. If people keep reaching you for recurring tasks, delegate: teach someone else. Point people directly to each other, or better yet, create group chats and let discussions happen naturally. Don’t become a bus factor of 1. Teach others to do what you do, so things keep moving even when you’re overwhelmed or unavailable.

Avoid making people feel they need your permission for small decisions, especially reversible ones. You want them to have agency. Ask to be kept in the loop on whatever they decide, but let them make the technical decisions. Micromanagers micromanage because they don’t trust. Ask yourself: can you trust every engineer on your team to do their best to complete something without you looking over their shoulder? If not, something needs to change - either in you or in them.

Trust isn’t about technical proficiency. If I told my current engineers (mobile and web devs) to build a Game Boy emulator from scratch, they wouldn’t know where to start. They’d probably take months (some just weeks). But I’m certain they’d try their best to find a way to run Pokémon Gold. You need to trust both their skills and their honesty. If you can’t trust their skills at their level of seniority, it’s your job to help them improve. If you can’t trust their honesty, and you have clear reasons not to, then you need to part ways.

Even great engineers get stuck without realizing it. Keeping an eye on progress helps you catch when they need support before others see them as underperforming. Processes like sprints and OKRs are mostly about the “verify” part (see, your manager is doing this with you too). They’re a shared interface to ensure things get done.
This doesn’t mean a lack of trust, but accountability. Verification means using metrics and evidence, and there are two kinds: quantitative and qualitative. Quantitative is easy: PRs merged, points completed, code reviewed. You can glance at these, but never base decisions on them alone. If you could derive engineer performance from numbers, managers wouldn’t be necessary.

Qualitative metrics are where you prove you’re worth your salt. “This engineer has fewer PRs, but they’re always watching Slack and hopping into calls to help others.” “This engineer always discusses tickets with product first - their output ends up far better than our original specs.” “This engineer explains complex concepts in ways everyone can understand and helps other teams use our tool better.” These observations require knowing your team. This is why most “management AI tools” are set up for failure: they only see quantitative metrics. They don’t sit in your standups, don’t watch the Slack channels, don’t know who’s quietly holding the team together. A good manager does.

Stop having pet projects; that’s a Staff Engineer’s domain. For a manager, every project is cattle: it needs to be completed, automated, delegated, or cancelled. Managers cling to projects for many reasons. Sometimes it’s comfort - you know this system, you built it, it feels good to stay close to it. Sometimes it’s identity - you want to stay “technical” and not lose your edge. Sometimes it’s fear - you don’t trust it’ll be done right without you. None of these is a good reason to hold on. The “I can do it faster myself” thinking might be correct, but in the long term it’s not sustainable. Every time you do it yourself, you rob someone of the chance to learn, and you guarantee you’ll be doing it forever.

Be risk-averse, not risk-paranoid. You can’t account for every variable. Some things you can’t anticipate, and overcorrecting can be worse than the original problem. Hiring is where I see this most often.
After a bad hire, managers start requiring referrals, but almost anyone, no matter how unskilled or dishonest, can find someone to vouch for them. Others add more interviewers to the panel, thinking more eyes mean better vetting. The opposite happens: each interviewer becomes more lax, expecting someone else to be “the bad guy.” Responsibility gets diluted. Three great interviews beat seven mediocre ones. There’s a worse second-order effect too: while you’re scheduling that seventh round, good candidates are accepting offers elsewhere. The best talent moves fast. A slow, risk-averse process filters out exactly the people you wanted to hire.

If any of this resonated, my free online book goes deeper. If you’re a manager too, I’d love to hear what you’ve learned - drop it in the comments. Thanks for reading Jampa.dev! Subscribe for free to receive new posts and support my work.

1 view
Jampa.dev 4 months ago

Writing with AI without the Slop

I suck at writing. I open too many parentheses, and my thoughts scatter (everywhere). So when ChatGPT launched, I thought it would finally replace Grammarly. But LLMs have their own problems:

“It’s not just x—it’s y.”
Rhetorical questions? Affirmative answers!
“Here’s the kicker”: that preface was entirely unnecessary.
And in the end, it ends with recaps — that repeat everything already said, now with bullet points.

The problem with AI text is that when you read it, your first thought is: “Did this person actually invest time in this, or did they write a two-line prompt and expect me to read something they never even thought about?” As some people put it: “I’d rather just read the prompt.” The current state of Reddit, basically.

LLMs can’t be genuine because they don’t know how to be a person. They read text from multiple public sources and average it out. They weren’t trained by eavesdropping on authentic conversations or messages. (At least I hope not.) The more the AI creates for you, the worse the output becomes. That’s why when you ask it to keep it casual, it turns into “How do you do, fellow kids?”, and when you ask for a professional tone, it becomes “Alas, who’d’ve done this?”. If you want LLMs to cook, you need to provide the ingredients.

As a general writing (and cooking) tip, start it raw. Don’t use autocorrect. In fact, don’t even look at what you’re typing. Close your eyes and let raw ideas flow, along with grammatical mistakes and misconstrued sentences. Just make it coherent enough. Make bullet points to answer: “What’s the point of me writing this?” Connect those bullet points with your personality, which dictates how you link sentences. A serious person uses serious connectors; a casual person throws in verbal expressions (and memes).

How LLMs can help

When you have a first draft, the key is using the right edits. The biggest mistake people make is in how they prompt. If you prompt like a casual writer, the AI treats you like one.
Saying “Improve the text below for my email” makes the AI slopify everything: it accesses the neural latent space of “This person needs my help immensely.” You need to signal, “Hey, I know what I’m writing. I just need help improving the flow while keeping my own words.” You can do this by using the verbiage editors and publishers use during the different editing phases, from solidifying the overall scope to minor edits like correcting grammar. While the LLM won’t write for you, it can help immensely, because writing words is not the hard part once you get the hang of it. For me, editing takes 80% of the overall time. Most people start as slow writers because they try to write and edit simultaneously. With chain-of-thought in newer models, you don’t need much prompt engineering anymore. You just need to know the right words so the LLM’s thinking can go into the embeddings.

Content editing improves flow and structure at the sentence level. It is useful when you know what you want to say but are unsure how to connect thoughts. It’s the most destructive, so it’s better to use it only once. Example prompt: “You are a content editor. Improve the flow of the sentences and make the text stronger and more structured.” The AI will make many edits to make your text make sense, and the places where it misunderstood your intentions will stick out like sore thumbs. You will need to adjust them and add points that solidify your premise. As you add (and cut) content for a second draft, it’s time to move to line editing.

Line editing is where AI shines, especially for short texts like announcements. Use this when you know what and how you want to say something, but specific words escape you or the phrasing could be simpler. I spend most of my time here, line editing multiple times until nothing stands out badly. Example prompt: “Line edit this (Slack message / blog post).”

Proofreading happens when you’ve “mastered” the copy.
It’s always safe to run it multiple times without fearing the AI will destroy your voice, because you will be tempted to write small additional bits here and there. Example prompt: “You’re Grammarly, fix the mistakes in the text:” This is basically a cheap Grammarly (but better).

Writing text is not magic, and you must put in effort. Even with better AI, I don’t think we will ever remove the AI scent from generated text. So we as humans will need to write until we get tired and don’t even want to finis-

Thanks for reading Jampa.dev! Subscribe for free to receive new posts. (And avoid getting shot by a snip-

Note: I’ve added all the editing phases of this article here. You can see how the content changed from draft to final edit. I used Claude Sonnet for the editing. Overall, I did one content edit and 18 line edits (on different snippets), and I lost count of how much proofreading I used.
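For anyone who wants to script this workflow, the three editing phases described above can be wrapped in a small helper. This is a hypothetical sketch: the prompt wording follows the article’s examples, but `build_edit_request` and the chat-message shape are illustrative assumptions, not any specific vendor’s API.

```python
# Hypothetical sketch: the three editing phases as reusable prompt templates.
# The prompts mirror the article's examples; everything else is illustrative.

EDIT_PROMPTS = {
    "content": ("You are a content editor. Improve the flow of the sentences "
                "and make the text stronger and more structured."),
    "line": "Line edit this (Slack message / blog post).",
    "proofread": "You're Grammarly, fix the mistakes in the text:",
}

def build_edit_request(phase: str, draft: str) -> list[dict]:
    """Compose a chat-style message list for the given editing phase."""
    if phase not in EDIT_PROMPTS:
        raise ValueError(f"unknown editing phase: {phase!r}")
    return [
        {"role": "system", "content": EDIT_PROMPTS[phase]},
        {"role": "user", "content": draft},
    ]

messages = build_edit_request("line", "Me and him goes to the store yesterday.")
print(messages[0]["content"])  # the line-editing instruction
```

You would pass `messages` to whatever chat-completion client you use, once per phase: one content-edit pass, repeated line edits, then proofreading as often as you like.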

0 views
Jampa.dev 4 months ago

Things I’ve learned in my 7 Years implementing AI

Even though the impact of LLMs is unprecedented, it feels familiar from earlier waves of assumptions. For context: I wasn’t the “PhD scientist” working on models. I was the guy who productionized their proof-of-concept code and turned it into something people could actually use. I worked in industries ranging from software/hardware automated testing at Motorola to small startups dealing with accessibility and education. So here is what I’ve learned:

This AI hype cycle is missing the mark by building ChatGPT-like bots and “✨” buttons that perform single OpenAI API calls. For example, Notion, Slack, and Airtable now lead with “AI” in their page titles instead of the core value they provide. Slack calls itself “AI Work Management & Productivity Tools,” but has anyone chosen Slack for its AI features? Most of these companies seem lost on how to implement AI. A simple vector semantic search on Slack would outperform what they’ve shipped as “AI” so far. People don’t use these products because of these “✨” AI solutions.

The best AI applications work beneath the surface to empower users. Jeff Bezos commented on this (in 2016!). You don’t see AI as a chatbot on the Amazon homepage. You see it in “demand forecasting, product search ranking, product and deals recommendations, merchandising placements, fraud detection, translations.” That’s where AI comes in: not as “the thing” but as “the tool that gets you to the thing.”

[Image: Relevant XKCD, which is not relevant anymore…]

What if a problem that took a team of PhDs one year to solve could be solved better in four hours? That’s where LLMs shine. When I worked on accessibility for nonverbal people, one of our projects aimed to make communication cards (“I want,” “Eat,” “Yes,” “No”) context-aware, to let nonverbal users express their desires faster, similar to an autocomplete. For example, the user is at home at 7 AM and taps the “I want to eat” card.
The next cards should anticipate their needs (more likely breakfast items), but there are caveats: what a person typically eats for breakfast depends on their country, the type of establishment they are in (home, hotel, restaurant), the day of the week, and, of course, personal preferences, which also change over time. After a year of work, our team of researchers from two universities achieved a 55% accuracy rate (the desired card appearing among the suggested options). It was a massive success at the time; we even won an award for best accessibility solution. When ChatGPT 3.5 was released, I replicated a solution for this project and, after hacking over a weekend, got an 82% accuracy rate against the same test database.

AI skeptics ask, “If AI is so good, why don’t we see a lot of new startups?” Ask any founder: coding isn’t even close to the most challenging part of creating a startup. What I do see is a boom in internal tools. This year alone, I shipped projects that would never have been viable before. As an engineering manager, spending weeks coding means neglecting the team. The “nice to have” bucket is where a project dies: it means there is no engineering capacity to tackle it, so it goes into backlog limbo — until now. Now I can build these projects with Claude, running prompts and reviewing the output between meetings. I see many people releasing new things that are incredibly helpful and productive, which would not have happened without Claude or Cursor.

Like with all tools before it, we’re coming closer to the top of the S-curve for LLMs. (Take this graph with a grain of salt: it is hard to compare earlier models because most benchmarks came much later.) The last releases were unimpressive. Does anyone know a real application where ChatGPT 5 can do something that o3 could not? The good news is that what we have is enough for most people. AI tools like KNNs are very limited but still valuable today.
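As a reminder of how approachable those limited-but-valuable classical tools are, here is a k-nearest-neighbors classifier in a few lines of plain Python. This is an illustrative sketch with made-up data; in practice you would reach for something like scikit-learn’s `KNeighborsClassifier` rather than rolling your own.

```python
# Minimal k-nearest-neighbors: classify a point by majority vote among the
# k closest labeled training points. Data below is invented for illustration.
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """train: list of (features, label) pairs; returns the majority label
    among the k training points nearest to `query` (Euclidean distance)."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
         ((1.0, 1.0), "b"), ((0.9, 1.1), "b"), ((1.1, 0.9), "b")]
print(knn_predict(train, (0.15, 0.1)))  # prints "a": its 3 nearest neighbors are all "a"
```

The whole technique fits in a dozen lines, which is exactly the point: you can apply these tools without understanding sigmoids or backpropagation first.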
This also kills the reverse FOMO. “If I wait for the technology to mature, I won’t have to deal with its earlier quirks” is less relevant now. But AI research is definitely not over: we will still see cheaper, faster, and open models, like ones that can run on a mobile device while being as capable as ChatGPT 4o.

Creating AI models is hard, but working with them is simple. I put off implementing earlier AI tools because I couldn’t grasp how neural networks, sigmoids, and all that worked. Then someone said, “What are you doing? If you want to apply the technology, just use Scikit-learn.” If you’ve never used AI for coding, install Claude Code and start using it for small tasks. That gets you 70% of AI’s current benefits without diving into prompt optimization or chain-of-thought mechanics. Eventually, you’ll need to learn to leverage LLMs better when you hit bottlenecks. You will realize that you still need to review code and CLI commands. You will naturally get better at prompting. You will learn when, and when not, to use it.

AI is the new Agile: something simple that makes you faster but has limits, yet people will position it as the solution for every problem, preaching: “Oh, you’re using (AI / Agile) wrong. In fact, it seems like what you need is even more of (AI / Agile).”

The tool has limits, especially when breaking new ground. LLMs are limited by their training data. For example, when I tried to vibecode a mod for a recently released Unity game, the AI failed to complete even a basic hook.

Automatic railway gates replaced crossing attendants. But if those gates worked only 99% of the time (or even 99.99%), would that be good enough? LLMs are very far from being 99% accurate. They fix problems, but they tend to miss the root cause. I see many cases where the LLM suggested a fix by adding multiple lines, while an experienced engineer fixed it by removing one.
Recognizing this requires senior-level skills, such as valuing simplicity over complexity and the knowledge gained from dealing with similar bugs in the past. This creates a problem for juniors who use LLMs: the problem-solving is done for them, so they never develop this skill, which hurts their code-reviewing abilities. I see many companies that have stopped hiring juniors altogether.

The Internet was a bubble in 1999, and you know the result. The Internet died completely, but it was good for a while. Man, I miss the Internet. But seriously, we are seeing great tools arrive to boost productivity (and a new era of AI memes) while VCs and Big Tech pay for most of them. It’s a win-win.

Thanks for reading Jampa.dev! Subscribe for free to receive new posts and support my work.

Also, here is my current favorite SORA video: (Warning: LOUD) (I had to remove the video because a bug in Substack causes the space bar to play the video instead of scrolling down—sorry for the jumpscare. Here’s the Reddit link instead: https://www.reddit.com/r/SoraAi/comments/1nwcx9e/some_body_cam_footage/ )

Jampa.dev 5 months ago

Using Claude Code SDK to Reduce E2E Test Time by 84%

End-to-end (E2E) tests sit at the top of the test pyramid because they're slow, fragile, and expensive. But they're also the only tests that verify that complete user workflows actually work across systems. Due to time constraints, most teams run E2E nightly to avoid CI bottlenecks. However, this means bugs can slip through to production and become harder to fix, because there are so many changes to sift through when isolating the root cause.

But what if we could run only the E2E tests relevant to the specific code changes of a PR? Instead of waiting hours for the entire suite, we could get the results in under 10 minutes, catch bugs before they ship, and keep our master branch always clean.

The first logical step toward running only relevant tests would be using glob patterns: we tell the system what to test by matching file paths against the changes, with a mapping from path patterns to the test suites they should trigger. But globs are very limited. They require constant maintenance as the codebase evolves: every new feature would require updating the glob patterns file. More importantly, they cast too wide a net. A change to a shared component might need to trigger every E2E test that involves any page with a button interaction, depending on how deep the change is.

So, how can we determine which E2E tests should run for a given PR with both coverage and precision? We need coverage because missing a critical test could let bugs slip through to production. But we also need precision, because running tests that will obviously pass just wastes time and resources.

The naive approach might be to dump the entire repository and the changes into an LLM and ask it to figure out which tests are relevant. But this completely falls apart in practice. Repositories can easily contain millions of tokens' worth of code, which puts them beyond the reach of any AI model. Claude Code takes a fundamentally different approach because of one key differentiator: tool calls.
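To picture the glob approach described above concretely, here is a hypothetical mapping (the paths and suite names are invented for illustration):

```python
from fnmatch import fnmatch

# Hypothetical glob -> E2E suite mapping; every new feature means a new row.
GLOB_TO_SUITES = {
    "src/auth/**": ["login.e2e", "signup.e2e"],
    "src/checkout/**": ["checkout.e2e", "payments.e2e"],
    # Shared UI casts the widest net: any change fans out everywhere.
    "src/components/**": ["login.e2e", "signup.e2e", "checkout.e2e",
                          "payments.e2e", "search.e2e"],
}

def suites_for(changed_files: list[str]) -> list[str]:
    """Collect every suite whose pattern matches a changed file."""
    picked: set[str] = set()
    for path in changed_files:
        for pattern, suites in GLOB_TO_SUITES.items():
            if fnmatch(path, pattern):
                picked.update(suites)
    return sorted(picked)

print(suites_for(["src/components/Button.tsx"]))  # all five suites
```

A one-line change to a shared component fans out to every suite: exactly the "too wide a net" problem, on top of the maintenance burden of keeping the table current.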
Instead of trying to process your entire codebase, Claude Code strategically examines specific files, searches for patterns, traces dependencies, and incrementally builds up an understanding of your changes. So here's the hypothesis: if I see a PR, I know which E2E tests it should run, because I know the codebase. The question is: can Claude Code replicate that human intuition by searching for it? Let's build and find out.

For the E2E selection to be successful, Claude needs to know what I know: the PR modifications, the E2E tests, and the codebase structure. We need to glue all three together in a well-crafted prompt.

This is perhaps the easiest piece - we can leverage git to get exactly what we need. We start with the basic git diff command, which gives us the changes of a branch, but we can do much better. First, we want git to be less verbose, so we add a flag to ignore whitespace and focus on the actual code changes rather than whitespace noise. We also don't care about deleted files, since we'll need to remove references to them in existing files anyway (unless we don't care about those tests), so we add a diff filter to exclude deleted files and focus on (A)dded, (C)opied, (M)odified, and (R)enamed files. Finally, we need some strategic excludes, because PRs often contain large generated files that would blow up our token count, so we add path exclusions to keep things manageable. Putting it all together, the result is a clean diff showing the actual code modifications.

We could hardcode a list of test files in our prompt, but that violates the single-source-of-truth principle. We already maintain this list for our daily benchmarks, so let's reuse it. For example, if the test configuration lives in a WebdriverIO config file, we can extract it programmatically with a script that dynamically reads the file and outputs our exact test suite configuration.

The prompt needs to be precise about what we want. We start by setting clear expectations. The key phrase here is "think deep".
This tells Claude Code not to be lazy with its analysis (while spending more thinking tokens). Without it, the output was very inconsistent. I used to joke that without it, Claude runs in “engineering manager mode” by delegating the work.

Next, we set boundaries. The "only run tests listed" constraint was added because Claude was being "too smart," finding work-in-progress spec files and scheduling them to run. We added the last piece because it is better to run a few extra specs than to leave a relevant test out.

I initially asked for JSON output, and since I didn't want Claude's judgment to be a black box, I requested two keys: the list of tests to run and an explanation. This makes it easy to benchmark whether the reasoning is sound. I first tried using JSON mode and asking Claude to output only JSON, but Claude has strong internal system instructions and couldn't stop adding commentary. I fixed this with a regex JSON parser to strip the commentary, but when you use regex to solve a problem, you get two problems. Then I realized: Claude Code is used to writing files, duh. So instead of fighting with JSON mode and regex, I asked it to write the JSON to a file. Works every time!

The final pipeline combines everything with what might be the ugliest bash command known to humankind, and the resulting prompt is piped to Claude. We grant a write permission so it can write our output file. By the way, you should never use the flag that grants all permissions, including the dangerous ones. I am surprised by how many people are taught to do this. If we did add that flag, someone could write instructions in the PR and have Claude read our environment variables and send them to a URL using Fetch(). Since the CI runs when a PR is opened, not merged, this would be similar to a “0-click” exploit.

I won't lie - this exceeded my expectations. We used to run all core tests, which took 44 minutes (and today it would take more than 2 hours, since we keep adding tests). Most PRs now complete E2E testing in less than 7 minutes, even for larger changes.
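For reference, here is a rough Python sketch of the two ends of the pipeline: collecting the diff and reading back the verdict Claude writes to a file. The flag choices, base branch, file name, and JSON keys are my reconstruction from the description above, not the project's actual code.

```python
import json

def diff_command(base: str = "origin/main") -> list[str]:
    """Approximate diff-collection step. Per the write-up: ignore
    whitespace noise, keep only (A)dded/(C)opied/(M)odified/(R)enamed
    files, and exclude bulky generated files (the lock-file name here
    is a guess)."""
    return [
        "git", "diff", base,
        "-w",                  # drop whitespace-only changes
        "--diff-filter=ACMR",  # skip deleted files
        "--", ".", ":(exclude)package-lock.json",
    ]

def read_decision(path: str = "tests_to_run.json") -> tuple[list[str], str]:
    """Claude Code is asked to WRITE its decision to a file instead of
    printing JSON (which it wraps in commentary); we just read it back."""
    with open(path) as f:
        decision = json.load(f)
    return decision["tests"], decision["explanation"]

# In CI, roughly: run diff_command() via subprocess, feed the diff plus
# the suite list into the prompt, give Claude Code write access to only
# the output file, then call read_decision() to schedule the suites.
```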
Even if it performed worse, it would still be an incredible success, because our system has so many complexities that other types of tests (unit and integration) are nowhere near as effective as E2E.

The solution scales well because adding E2E test names consumes few tokens, and the size of PR changes stays mostly constant. Claude doesn't read all the test files: it focuses on the ones with semantic naming and explores the patterns of the modified files, which is surprisingly effective.

Did Claude catch all the edge cases? Yes, and I'm not exaggerating: Claude never missed a relevant E2E test. But it tends to run more tests than needed, which is fine - better safe than sorry.

How much does it cost? Without getting into sensitive details, the solution costs about $30 per contributor per month. Despite the steep price, it actually saves money on mobile device farm runners, and I expect these costs will drop as models become cheaper. Overall, we're saving money, saving developer time, and preventing bugs that would make it to production. So it's a win-win-win!

Thanks for reading Jampa.dev! Subscribe for free to receive new posts!

Jampa.dev 6 months ago

Why AI for coding is so polarizing

If you spend any time online, you've probably seen the wildly different opinions on using LLMs for coding. On one side, Twitter bros brag about how they built “a $1k revenue app in just 10 days using AI”. On the other, engineers refuse to use any LLM tool at all. You'll find them in every thread, insisting that AI sucks, produces garbage code, and only adds to technical debt.

Alt text: The most civilized Anti-AI vs Pro-AI conversation on Twitter.

Joking aside, some people use AI to do great things daily, while others have problems with it and have given up. The difference is context.

An LLM has no sapience. Everything the AI cooks up is a product of its training corpus, fine-tuning, and a system + user prompt (with a bit of randomness for seasoning). No matter how clever your prompt is, the training data is its foundation. This is why companies are so aggressively scraping the web. If you create a new language tomorrow called FunkyScript, the AI will be terrible at it, regardless of your prompt.

This explains the different experiences of AI detractors and champions. On one hand, you have people new to coding working on greenfield projects with popular tools like Tailwind and React (which have a massive training corpus). On the other hand, you have engineers working with more niche tools. A great example is CircleCI’s YAML configuration. Since CircleCI's documentation is difficult for an AI to ingest (because it sucks), the AI starts hallucinating and spitting out code for GitHub Actions instead.

Then there's the context window, the "short-term memory" of the AI. It's a known issue that the more context you stuff into a prompt, the "dumber" the model can get. When you're working on a greenfield project, there are no existing files or dependencies, so you don't need to provide much context, which saves you from spending tokens on it. But greenfield projects aren't the norm.
The norm is a legacy codebase built by multiple people who changed many parts and then left the company. Some of it doesn't make sense even to a human, much less to an LLM. All this extra context weighs down the LLM's token budget.

Consider the same prompt: "Change all the colors to blue on my Auth page." In a new project, the AI can probably find and handle the relevant files. But in a mature codebase, that auth page is tied to a color system, which is part of a larger design system. Now the AI is in trouble. Throw in some unit tests that will inevitably break, and the AI is completely lost.

"Hey AI, you broke this stuff" — you say, thinking you are not using AI enough. Then the AI sycophantically replies: "You are absolutely right! Let me try another approach!" Now you're the one in trouble. It's time to shut the AI down and salvage what you can from the wreckage.

This isn't a perfect fix, but there is a strategy to make the AI less destructive and, eventually, genuinely helpful. You'll have to decide if the upfront effort is worth it compared to manually coding. It won't be worth it for the FunkyScript codebase, but I have succeeded on niche stacks, like mobile E2E.

In complex codebases, an AI must learn your project's unique patterns with every prompt. The solution is to give it that knowledge upfront, rather than making it rediscover everything at "runtime." Having a good rules file, for example, which an LLM can read before performing a task, helps the AI understand what makes your project different from its base model. That file is not for you to say “do it right, stop making it wrong,” like a lot of people do.

We can even use the AI itself to help. Here is an example prompt (you should provide more high-level context for a real project, especially if your README.md sucks):

You are a senior engineer onboarding another senior engineer to our codebase. Analyze the provided files at a high level. Study its structure and patterns, then write a document explaining how to work on it.
Highlight the parts that differ from common industry patterns for this language and framework. For example, do you use Bun instead of npm? Inline styles instead of CSS? These are crucial details the model needs to know; otherwise, it will default to the most common patterns in its training data.

So, the next time someone gives an opinion on AI that differs from yours, maybe don't immediately jump to arguing. They aren't necessarily doomers who will be replaced, nor grifters selling snake oil. Consider that not every engineer works on your stack or codebase.

… or maybe they are all koopas:

Thanks for reading Jampa.dev! Subscribe for free to receive my shitposts and Goomba fallacies.

Jampa.dev 6 months ago

My advice to the new generation of software engineers

The job market is tough for junior engineers right now, and many companies have drastically reduced hiring for these roles. Some claim this is due to AI, which still needs someone to operate it. Others blame outsourcing, a practice that's been part of the industry since, well, forever. But the truth is, junior engineers have never had it easy. When I first started applying, I had seven years of experience writing software as a hobby and still struggled to get an interview. It wasn't until I was on the other side of the table, hiring juniors myself, that I finally understood what I had been doing wrong.

Thanks for reading Jampa.dev! Subscribe for free to receive new posts and support my work.

Ironically, the current situation is worse for juniors because, just a few years ago, things were too easy for everyone. Let's rewind to the COVID era. Governments initiated a ZIRP (Zero-Interest-Rate Policy), meaning "safe investments" like bonds became less attractive. Investors with a lot of cash lying around needed another avenue to generate returns, and tech startups became a huge target for investment. Also, during COVID, people needed to digitize their processes. Tools like Zoom, rarely needed when everyone met in person, suddenly became multibillion-dollar companies. And with more people at home, industries like advertising, entertainment, and gaming also saw a massive infusion of cash.

These factors caused companies to hire like crazy, causing a shortage of software engineers. With few senior engineers available, the high tide raised all the boats, and companies started poaching engineers from each other, including juniors. Recruiters were reaching out even to boot camp graduates. I had a friend who was paid to attend a boot camp, with a job offer lined up when he finished. However, the hiring frenzy created a long-term problem for juniors. Everyone realized they were losing their junior engineers to other companies.
Since juniors require training and would quickly get an offer elsewhere as “Senior” and leave anyway, companies stopped hiring them. With interest rates up, our industry is in a downturn (unless you slap “AI” on your product). I still believe juniors will make a comeback. The core principles of breaking into this industry haven't changed, and the demand for software engineers still exists. So, how do you do that when the odds are stacked against you?

Even before COVID, many computer science graduates left the industry after getting their degrees. The problem was that many of them focused only on getting good grades and forgot about the actual craft of programming. Imagine yourself as an employer reading resumes. What would set a candidate apart: their GPA, or the fact that they worked on live projects that people are actually using?

I worked at a company that hired many juniors. When we reviewed their resumes, they mostly fell into two buckets, with a very imbalanced split: 95% of resumes just listed a bootcamp certificate or a college GPA, with the rest of the space filled with fun facts, big headshots, and flashy modern designs. The other 5% listed side projects, college research assistant work, public GitHub repos, Jupyter notebooks, or personal websites. We only interviewed candidates from that 5% bucket. We knew there were capable people with high potential in the 95% pile, but like most companies, we didn't have the budget or time to interview everyone.

It used to be that you got the job, and then you got the experience. Now, it's the other way around. If you're in college, you're in the perfect place to build connections and find opportunities, but very few students take advantage of them. Don't wait until your final semester, or until you desperately need an internship, to start thinking about your career. Unless you plan to be a researcher and pursue a master's degree, college is primarily a launchpad for your employment prospects.
It's also a chance to taste different areas of computer science. Most bootcamp grads end up as mobile, front-end, or back-end engineers, but in college, you can explore other segments, like working with embedded devices, firmware, or even game development. While you're there, find other students interested in building cool stuff. Many of the biggest tech unicorns were founded by people who met in college and shared a passion. You can also pursue research opportunities with professors. You'll learn a ton, and some have valuable industry contacts who can provide strong referrals.

If you're not in college, you can start your career by doing "odd jobs" instead of only pursuing full-time employment. For example, try creating a startup. Even if the idea is bad, it's the fastest way to learn. Freelancing is another possibility, like building a product for that friend-of-a-friend who owns a business. This path isn't for everyone, though. After a while, you might spend more time on business and negotiation than on becoming a better coder. Those are important skills, but they can get boring if your passion is the code itself. Another option, if you're in a country where college is cheap or free, is to go to college and apply the advice I mentioned earlier. That's what I did: I knew how to code but wanted to learn more.

The most important thing is to keep making things and sharing them. A project doesn't need to be a viable business or make a single cent. It can just be something you find helpful that might be useful to others, too. Even when you "fail," you meet many new people. Even silly projects can lead to amazing things. I enjoy making scrapers, so back when Pokémon Go was at its peak, I built a map for my city. One of my first users was the CTO of one of the biggest companies in my city, who encouraged me to apply to his company. If you do enough of that, your resume will eventually cross from the 95% pile into the 5%, and people will start calling you for interviews.
These "odd jobs" are career-defining. You will be forced to learn about optimization, caching, Redis, N+1 queries, microservices, and DevOps. You can also drop their links in your resume. So, after a while, you will start to get interviews! Which is only half the battle.

Interviewing is a skill that has almost nothing to do with your actual skill as a programmer, but LeetCode-style problems aren't going away. You should read at least *Cracking the Coding Interview*, even if you aren't aiming for a FAANG job. And even after you've read it, prepare to flop a few interviews. Remember that CTO who invited me to interview? I totally blew it. They asked how I would design a database system using the Windows filesystem and folders. I basically told them the idea was silly: "Why would you create a production database with TXT files on Windows? If you need a NoSQL-style system, why not use an actual database and avoid the Windows overhead?" That's exactly how their system was built, and they didn't appreciate my candidness.

You will make mistakes in your first interviews. That's fine. It's how you learn to navigate the corporate world. You'll learn what you can say and, more importantly, what you can't. In the end, despite no one saying so, most interviewers aren't looking for the best candidate. They're more concerned with avoiding hiring the worst ones.

Once you land that first job offer and accumulate years of real-world experience, finding the next job gets easier. (It's never easy, of course, unless you have great connections.) One final piece of advice: don't focus on money too early in your career. Career growth is way more important. Joining a large enterprise might offer more job security, but a startup often gives you more opportunities to shine and get promoted faster.

Thanks for reading Jampa.dev! Subscribe for free to receive new posts and support my work.

Jampa.dev 11 months ago

Testing the Big Five LLMs: Which AI Can Better Redesign My Landing Page?

The best thing about AI is that it can code the snippets I am not passionate about. I am glad that I no longer need to think hard to write a JavaScript .reduce() or any Swift code. With the new flagship models coming in hot this month, like Gemini 2.5 and o1-pro, I thought it would be the perfect time to try them out in the category I suck at most: visual design.

This is the perfect opportunity to replace the designer in me who is terrible with hand-eye coordination and always got bad grades in art classes because my teacher thought I was “not taking it seriously enough.” However, it is cheaper to benchmark LLMs than to go to therapy. I am writing a FOSS book about Engineering Management, and I have created a monstrosity of a home page below—as you can clearly see, I should get a designer.

Job to be done: We will take a screenshot and the code of the current homepage and feed them to multiple flagship AI models, asking them to make it less horrible. The most popular reasoning models currently are Google Gemini 2.5, OpenAI o1 Pro High, xAI Grok 3 Think, DeepSeek R1, and Claude 3.7 Sonnet, so we are going with those.

I tweaked the system prompt in some preliminary benchmarks to address common confusion points and things I missed. I find this prompt good enough, even for non-reasoning models. One thing I am learning about system prompts is that if you ask an LLM for too much, it starts failing catastrophically. For example, it is okay to ask an LLM agent to create a unit test for a component, but if you ask it to “create all missing unit tests in my codebase,” it shits the bed terribly. So I am going to keep it simple.

Let's establish a system to evaluate the AI results.
I have created a scorecard to evaluate what we care about most:

Visual Design (50 points) - This is what we came for, so it should be half of the overall score.
Interactivity (25 points) - Mouse hovers and scroll animations; basically, “making it pop.”
Code quality (15 points) - We should judge the code, since visual improvements shouldn't come at the cost of code maintainability.
Dark mode compatibility (10 points) - A “nice to have”: our prompt doesn't even mention it, so that we can focus on the above. If the AI messes this up, it is a quick fix.

I ran my current code against all the models and will share the code and the prompt where possible. Here is the original branch from which the code is taken.

Let's start with the oldest one (I cannot believe I am calling an LLM released in January old). DeepSeek produced a great concept with a few caveats. The art doesn't mean much. The underlined links are fugly, and, most importantly, the shadows are terrible; the shadow-on-hover effect is dated. But at least it works with dark mode! The code is also not bad; it adds a lot of SCSS, but that is expected.

Result: Visual Design: 30 | Interactivity: 15 | Dark mode: 10 | Code Quality: 10

Google's new LLM has performed very well in many of my benchmarks. It makes the best of even the worst prompts. This is very good. I have a few complaints about this design: I don't like the double columns of the chapters, but that is about it. The hovering is great and presents the chapters in a very solid way! In terms of code, it did surprisingly well, and it even fixed my bad dark mode logic; on the other hand, it added a lot of unnecessary comments, which would not be ideal for pushing to main as is.

Visual Design: 40 | Interactivity: 25 | Dark mode: 10 | Code Quality: 10

Grok is the first one to disappoint me so far, but at least it made me feel better about my own design.
It repeated the book icon, “locked” the chapters as if I were selling a SaaS plan, and visually nested the cards too much. Overall, there were some small changes, but none were positive. It's not stellar on the code side either, but at least I don't see any outright downsides.

Visual Design: 10 | Interactivity: 15 | Dark mode: 5 | Code Quality: 10

o1 knows the key to my heart: I love blue and gradients. Maybe the fact that it did exactly what I like without me asking creeps me out. I really like the header—it draws more attention than the others. And instead of adding a generic SVG, it went with the best approach given current LLM capabilities. The chapters menu is also not bad; I like that it starts over the banner, so it flows better. The padding around the chapter cards could be better, and the broken white padding around the whole site might have been a mistake. The dark mode is utterly broken. On the code side, there is nothing bad; I could merge this as is.

Visual Design: 40 | Interactivity: 25 | Dark mode: 0 | Code Quality: 15

I almost forgot Claude, to be honest. I always hear compliments on its code quality and how people prefer it in Cursor and Windsurf. I am glad to see its greatness in UI design as well. That gradient is super sleek, and the subtle blue backdrop is also cool. The SVG, while not the most incredible one, was at least the most relevant. I just didn't like that it added the famous “3 boxes with icons” that you see in every landing page template. Also, the bullet points and double columns are not visually pleasing. Ironically, since Claude LLMs are praised for their code quality, I expected more; it tried to import a new font and tried to make the book's pages flip, which clearly did not work.
Visual Design: 35 | Interactivity: 20 | Dark mode: 15 | Code Quality: 5

The results are in:

Gemini 2.5: 40 + 25 + 10 + 10 = 85 points
o1 Pro High: 40 + 25 + 0 + 15 = 80 points
Claude 3.7 Sonnet: 35 + 20 + 15 + 5 = 75 points
Deepseek R1: 30 + 15 + 10 + 10 = 65 points
Grok 3: 10 + 15 + 5 + 10 = 40 points

One thing I missed from all the LLMs was an improved header text. I know my callout is terrible because I am very bad at selling things. The system prompt even mentioned this, but all the LLMs ignored it. At least they all fixed two grammatical errors!

It is impressive that the AI “knows” how to design a better website without any visual aid to validate it afterward. Sure, it's not the Linear home page, and it won't win any Awwwards, but in the end, I think that is the current LLM limitation: it is a blender of text absorbed from the corpus, resulting in an average of all the designs in the world.

I am also impressed by how fast AI has advanced. If I had tried this a few months ago, the results wouldn't have been ready-to-run code. There would be just a few improvements in the code, but mostly not visual ones.

In the end, yes, any of these LLMs except for Grok would improve the current landing page, and I should have just applied the improvements instead of writing an article. However, I don't want an average book page. Even if I don't get paid for it, I want an excellent one. So, for that reason, I am still getting a designer.

Thanks for reading my blog! Subscribe for free to receive new posts about AI and Tech Careers content.
The Problem I am writing a FOSS book about Engineering Management , and I have created a monstrosity of a home page below—as you can clearly see, I should get a designer . Job to be done We will take a screenshot and the code of the current homepage and feed it to multiple flagship AI models to ask them to make it less horrible. The most popular reasoning models currently are Google Gemini 2.5, OpenAI o1 Pro High, xAI Grok3 Think, DeepSeek R1 , and Claude 3.7 Sonnet , so we are going with those. System Prompt I tweaked the system prompt in some preemptive benchmarks to improve common confusion points and things I missed. I find this prompt good enough, even for non-reasoning models: One thing I am learning with system prompts is that if you ask LLM too much, they start failing catastrophically . For example, it is okay to ask an LLM agent to create a unit test for a component, but if you ask: “create all missing units tests in my codebase,” it shits the bed terribly. So I am going to keep it simple. Scorecard Let's establish a system to evaluate the AI results. I have created a scorecard system to evaluate what we want the most: Visual Design (50 points) - This is what we came for, so it should be half of the overall score Interactivity (25 points) - Relates to mouse button hovers and scroll animations, basically “making it pop.” Code quality (15 Points) - We should judge the code since having visual improvements is good and shouldn't come at the cost of code maintainability Dark mode compatibility (10 points) - A “nice to have”: Our prompt doesn't even mention it so that we can focus on the above. If the AI messes this up, it is a quick fix. Let's go! I ran my current code against all the models and will share the code and the prompt when possible. Here is the original branch from which the code is used . Deepseek R1 Let's start with the oldest one (I cannot believe I am calling an LLM released in January old). Deepseek produced a great concept with a few caveats. 
The art doesn't mean much. The underlined links are fugly, and, most importantly, the shadows are terrible; the shadow-on-hover effect is dated. But at least it works with dark mode! The code is also not bad; it adds a lot of SCSS, but that is expected.

Result: Visual Design: 30 | Interactivity: 15 | Dark Mode: 10 | Code Quality: 10

Gemini 2.5

Google's new LLM has performed very well in many of my benchmarks. It makes the best of even the worst prompts. This is very good. I have few complaints about this design: I don't like the double columns of the chapters, but that is about it. The hovering is great and presents the chapters in a very solid way! In terms of code, it did surprisingly well, and it even fixed my bad dark mode logic; on the other hand, it left a lot of unnecessary comments, which would not be ideal for pushing to main as is.

Result: Visual Design: 40 | Interactivity: 25 | Dark Mode: 10 | Code Quality: 10

Grok 3

Grok is the first one to disappoint me so far, but at least it made me feel better about my own design. It repeated the book icon, "locked" the chapters as if I were selling a SaaS plan, and visually nested the cards too much. Overall, there were some small changes, but none were positive. It's not stellar on the code side either, but at least I don't see anything actively harmful.

Result: Visual Design: 10 | Interactivity: 15 | Dark Mode: 5 | Code Quality: 10

o1 Pro High

o1 knows the key to my heart: I love blue and gradients. Maybe the fact that it did exactly what I like without me asking creeps me out. I really like the header; it draws more attention than the others. And instead of adding a generic SVG, it went with the best approach given current LLM capabilities. The chapters menu is also not bad. I like that it starts over the banner, so it flows better. The padding around the chapter cards could be better, and the broken white padding around the whole site might have been a mistake.
The dark mode is utterly broken. On the code side, there is nothing bad; I could merge this as is.

Result: Visual Design: 40 | Interactivity: 25 | Dark Mode: 0 | Code Quality: 15

Con… Oh wait, it's Claude Sonnet 3.7, with a steel chair!

I almost forgot Claude, to be honest. I always hear compliments on its code quality and how people prefer to use it in Cursor and Windsurf. I am glad to see its greatness in UI design as well. That gradient is super sleek, and the subtle blue backdrop is also cool. The SVG, while not the most incredible one, was at least the most relevant. I just didn't like that it added the famous "3 boxes with icons" that you see in every landing page template. Also, the bullet points and double columns are not visually pleasing. Ironically, since Claude LLMs are praised for their code quality, I expected more: it tried to import a new font and to make the book's pages flip, which clearly did not work.

Result: Visual Design: 35 | Interactivity: 20 | Dark Mode: 15 | Code Quality: 5

Conclusion

The results are in:

Gemini 2.5: 40 + 25 + 10 + 10 = 85 points
o1 Pro High: 40 + 25 + 0 + 15 = 80 points
Claude 3.7 Sonnet: 35 + 20 + 15 + 5 = 75 points
Deepseek R1: 30 + 15 + 10 + 10 = 65 points
Grok 3: 10 + 15 + 5 + 10 = 40 points
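For the record, the tallying above can be reproduced with a short script. The model names and per-category scores are copied straight from the reviews; nothing here is new data:

```python
# Tally the review scorecards. Scores are taken directly from the post;
# the rubric caps are Visual 50, Interactivity 25, Code 15, Dark Mode 10.
scores = {
    "Gemini 2.5":        {"visual": 40, "interactivity": 25, "dark_mode": 10, "code": 10},
    "o1 Pro High":       {"visual": 40, "interactivity": 25, "dark_mode": 0,  "code": 15},
    "Claude 3.7 Sonnet": {"visual": 35, "interactivity": 20, "dark_mode": 15, "code": 5},
    "Deepseek R1":       {"visual": 30, "interactivity": 15, "dark_mode": 10, "code": 10},
    "Grok 3":            {"visual": 10, "interactivity": 15, "dark_mode": 5,  "code": 10},
}

# Sum each model's categories and sort from best to worst.
totals = {model: sum(parts.values()) for model, parts in scores.items()}
ranking = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
for model, total in ranking:
    print(f"{model}: {total} points")
```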

Jampa.dev 11 months ago

The Battle for Attention

LinkedIn shows that I have six notifications. I know that none are interesting, but I click on them anyway; like an empty fridge, I always fall for it. If I don't read them, it will start sending me emails. Speaking of emails, I am unsure when I last had a real conversation over one. Most of them just want to tell me that their company exists. "We are updating our privacy policy." — We both know that's not why you are emailing me. No one, not even you, cares about your privacy policy. Email shadow advertising also takes many forms, like "How did we do? Give us feedback that we won't ever read" and "Your package has updates! (It is in a transit city that doesn't matter.)" When we talk about attention, we always think about social media, which tries to get you addicted somehow. If you want to test this, create a new account on Twitter / TikTok / Instagram and see what the algorithm feed's "seed" posts are. They are always quasi-nudity, religion, public fights, hustlebros, or politics. It is bad even when there is no algorithmic feed. Some years ago, Reddit's "all" page was entertaining and full of interesting news, but not anymore. If I open my Reddit /r/all page, all the posts are about <x> destroying the world, but what can I do? None of those posts offer anything interesting or actionable. Most are rage-bait relationship stories written by AI and "clever comebacks" at a boomer politician who doesn't even know what an "ecks" is. Source: ExtraFabulousComics

It is bad, even for creators

It didn't even click with me until I started a startup: attention is more valuable than short-term financial success. Views and signups are cool, but it is better if the user returns to you daily and spends a lot of time in your app. Later, you can figure out how to turn this attention into money. There is no limit to wealth, but there is a limit to people's lives and how much of them they spend on your app.
If you want to show the world something you made, you need to gather that attention—"Build it and they will come" is BS for anyone who has ever created something. And if you want that attention, you need to piggyback on existing platforms and try to syndicate some of theirs. But they will always get the last laugh when they rug-pull you, and "your attention" is stolen back into the "platform's attention." For example, if I post on LinkedIn with a link attached, no one will ever see the post, because LinkedIn doesn't want people to click links that take you off the website. If you are a YouTube creator, you know that people subscribing is not enough: you have to ask them to hit the bell. The algorithm sometimes won't prioritize your videos to your subscribers and will prefer random videos instead. Ironically, even Substack (this platform) does this. When I go to Substack.com, my home is not the dashboard but other unrelated people's blogs, which are nothing close to things I care about. Why, Substack, why are you not opening my dashboard by default? The only way out of this hell is to treat your time as a currency. "Time is money" takes on a whole new meaning: now, your time is someone else's money. I have the same disregard for people who want to steal my time as I do for people trying to scam me out of my money. Send me an AI cold email, and I report it as spam. Have fun getting deservedly banned by Gmail. Mobile notifications are a privilege not granted to any social media app. Attention-seeking spam keeps coming in, but I slowly push it out. Nowadays, I limit my social media to subscribed subreddits and Hacker News, but honestly, I sometimes can't resist going to Reddit's /r/all page and my local news websites and regretting it. But like a diet, I know I might improve if I consume fewer garbage products.

Jampa.dev 11 months ago

How promotions happen after Senior

From junior to senior level, promotions depend mainly on meeting the proficiency bar for technical skills and communication abilities. But after reaching the senior level, whether your next step is toward Staff or Manager, clearing the proficiency bar alone isn't enough—there must also be a clear "business need." Positions typically become available when your company expands, someone leaves, or new initiatives emerge. The "business need" means that advancing beyond senior relies not just on you but also on your company's circumstances. This makes getting promoted much harder in companies that aren't growing, because new roles or opportunities are rarely created. And when a new senior role does open up, many internal candidates will compete for it. This is what makes growing companies attractive. They constantly need to find new ways to capture more market share, resulting in frequent hiring, the formation of new teams and initiatives, and consequently more senior+ roles. If you want to accelerate your career growth, consider joining a rapidly growing startup. You'll likely have greater opportunities to handle significant challenges and rapidly develop your skills. The trade-off is usually stability and possibly a lower salary. Before accepting a new role, take the time to research the company's growth trajectory. Publicly available metrics such as headcount growth, Glassdoor reviews, funding rounds, and media coverage are good indicators of a company's health. For larger companies, you can also assess their long-term potential by reviewing customer experiences on Reddit or other forums. Generally, satisfied customers point to a positive trajectory for the company. It's important not to take a passive approach to your career advancement. There's a well-known saying in Western culture: "The squeaky wheel gets the grease."
In conversations with friends who mentioned deep frustrations at work, I would often ask, "What does your manager say when you discuss it?" Surprisingly often, they had never even brought it up. Ideally, your manager should initiate conversations about your career development at least every six months. Unfortunately, many managers never discuss career development, meaning you'll likely need to be the one to bring up the discussion. Nobody is more invested in your career than you — be explicit about your career goals during your one-on-ones with your manager. Doing so enables more relevant feedback tailored to your desired path and helps identify skills to focus on. If your current manager understands your goals for advancement beyond senior, they can more effectively mentor you by highlighting relevant lessons from situations that arise. Without awareness of your aspirations, they may miss opportunities to provide valuable context or guidance. Proactively seek opportunities with your manager to learn essential skills relevant to your desired path—whether it's interviewing candidates, running meetings effectively, strategic thinking, or directly interacting with customers. If possible, schedule occasional skip-level meetings with your director. These conversations offer insight into the company's higher-level challenges and double as an opportunity to get to know and build rapport with someone who could become your future boss. Senior leaders generally know exactly which individual contributors (ICs) on their teams are fully prepared to step into higher-level roles if needed. Typically, these people are highly competent senior ICs whom teammates naturally approach when they need help. They're effective communicators, strong mentors, and skilled at analyzing business requirements alongside technical tradeoffs; this makes them "go-to" advisors for their colleagues.
Another invaluable trait senior leadership looks for is curiosity, which is essential when offering feedback constructively. For instance, if a colleague takes an unusual approach in their work, instead of calling them out directly, ask why they chose that method rather than the accepted "ideal" approach. Framing your concern as a question demonstrates trust in their abilities and encourages them to reflect and learn. One of the most powerful actions you can take if you're seeking career growth is to position yourself as a "force multiplier." Although this concept often goes unnoticed, it significantly impacts a project's success. Force-multiplier work is work that, once completed, enables others on the team to deliver faster and more effectively. Whether you aspire to become a Staff-level engineer or a Manager, becoming a force multiplier is an essential milestone you'll need to achieve. Staff-level leaders don't necessarily do 2-10 times the direct work of a senior individual contributor. Instead, they identify the most impactful opportunities, remove critical roadblocks, or improve infrastructure, making their teams significantly more productive. For example, if frequent production bugs constantly stall your team's progress, implementing critical automated testing can be a significant multiplier. This ensures improvements can ship safely and saves hours of debugging later. Another scenario could be a situation where shifting requirements frequently force team members to redo their work. One individual who effectively communicates with stakeholders and creates comprehensive specifications before the team's efforts begin can save many hours of redundant work. Force-multiplier efforts increase everyone's efficiency. Imagine you reduce your teammates' task times by 5 minutes each, and they complete around 3 tasks per day. Across a 5-person team, you've effectively added over 300 hours (nearly 40 working days) per year to your team's productivity!
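A quick back-of-the-envelope script for the savings described above. The 5 minutes per task, 3 tasks per day, and 5-person team come from the example; the 250 working days per year and the 8-hour day are my own assumptions for the conversion:

```python
# Back-of-the-envelope check of the force-multiplier savings.
# Assumptions from the example: 5 min saved per task, 3 tasks/person/day,
# 5-person team. My added assumptions: 250 working days/year, 8-hour days.
minutes_saved_per_task = 5
tasks_per_person_per_day = 3
team_size = 5
working_days_per_year = 250

# Total minutes the whole team saves per day, then per year.
minutes_per_day = minutes_saved_per_task * tasks_per_person_per_day * team_size
hours_per_year = minutes_per_day * working_days_per_year / 60
eight_hour_days = hours_per_year / 8

print(f"{minutes_per_day} min/day -> {hours_per_year:.1f} h/year "
      f"(~{eight_hour_days:.0f} eight-hour days)")
```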
Identifying possible multipliers can be challenging since they're often highly specific to a particular context or team dynamic. However, common areas worth exploring include:

* Overly complex processes or architecture forcing team members to duplicate effort across multiple areas, introducing errors.
* Lack of robust quality checks, which slows the team down with manual testing and debugging cycles.
* Outdated or inefficient tools complicating maintenance and hindering effective troubleshooting or productivity.
* Non-technical process inefficiencies, like unclear task definitions or ineffective planning meetings that add unnecessary overhead.

The greater the number of people impacted by your improvement, the stronger the multiplier effect. Look proactively for gaps within your team's workflow or processes. Your manager can help you spot these bottlenecks, but as someone actively involved in day-to-day work, you're perfectly placed to notice inefficiencies firsthand. Asking your teammates directly what they like or dislike about the project can also yield valuable insights. Strive to progressively take on additional responsibilities through incremental trust-building with your manager and teammates. Promotions to Staff+ roles appear to happen overnight but take months of gradual progression. Like many others, I started as an individual contributor at a growing startup, and my responsibilities gradually shifted—from technical tasks to hiring, mentoring, planning, and client interaction. Eventually, I found myself spending most of my time guiding and leading rather than executing individually. When it became clear that I was essentially doing the higher-level work without official recognition, I approached my senior leadership about it. Shortly afterward, I received the official recognition and title, along with greater strategic responsibilities.
Finally, even if promotion opportunities don't happen in your current company, the skills and experiences you develop make excellent talking points when interviewing elsewhere.

Jampa.dev 1 years ago

Becoming a force multiplier

One of the most impactful things you can do is aim to become a force multiplier. This concept is under-noticed among engineers but can make a sizable difference in a project. Force-multiplier work is work that, when delivered, allows other people to deliver faster. Whether you want to be a Staff Engineer or an Engineering Manager, this is the most crucial bar you need to clear. A staff engineer or an architect does not do 2-10x the work of senior engineers; rather, they make teams 2-10x more productive by working on areas that improve productivity. If you make an improvement that saves 5 minutes of each engineer's time per task, and they perform about 3 tasks in a working day, then in a team of 5 people you just gave your team over 300 hours (almost 40 working days) of extra capacity in a year. There is no 10x engineer, but there is work done by one that can have a 10x impact. For example, a team plagued by constantly shipped bugs leads an engineer to create critical automated testing. This results in fewer bugs breaking production and speeds up every engineer on the team by reassuring them that their tickets will not break when shipping, also saving hours of debugging. Another case would be a team that frequently has to rework things because technical requirements constantly change. One engineer who communicates effectively with stakeholders and produces excellent technical specifications before the first line of code is written can save the team hours of rewriting. The larger the number of people impacted by a positive change, the more critical the multiplier aspect is. Not every change needs to be so dramatic, but some can be even more productive, especially the ones that remove a communication overhead.

Seeking opportunities

Force multipliers are difficult to detect because they are specific to a team and codebase.
But as a general rule, good potential candidates are the bottlenecks and inefficiencies that shape the daily aspects of an engineer's work, for example:

Overengineered code architecture that requires engineers to write duplicated code across codebases, with the occasional bug.
Quality control gaps, especially in automation, that make engineers spend more time testing their tickets or shipping bugs that require new tickets to fix.
Outdated or reinvent-the-wheel code that introduces a lot of bugs, requires constant attention, and makes it hard to find solutions online because the documentation only covers newer versions.

There are also non-technical cases, such as proposing improvements to the shipping process (ticket structure, ceremonies) to avoid additional meeting overhead. Therefore, look for gaps in your team. Your manager can help you identify them, but as someone working actively in the code, you are in a privileged position to know how things really are. Asking your fellow engineers what they dislike but still have to spend their time on is also a good way to get that information. I am writing an open-source book for engineering managers (EMs) seeking to enhance their skills and senior engineers aspiring to transition into an EM role. If that is your thing, check out the GitHub.

Jampa.dev 1 years ago

Google AI tools are surprisingly underrated

Google has a problem with releases: they start by announcing the product, which generates a lot of hype, but all we get is a landing page and a paper. After a few months, when people stop caring, Google quietly releases the tool… but it is a slow rollout, US only. 🤦 Compare this with the OpenAI strategy: they create hype long before releasing, then casually drop the product in a closed beta, with a public release right after. It's also Google's fault that nobody follows its tools—the names keep changing. They invested millions in Bard just to rename it Gemini. Gemini also has multiple tiers and versions with caveats, like "1.5 Pro" being better than "1.0 Ultra", and versions like 1.5 Pro 002, where the last number being padded with two zeros means more will come to confuse everyone. A "Bard" ad I saw in Shibuya, Japan. So much money wasted on brand recognition… So, why should you care about them? Their AI is far from as capable as ChatGPT-4o's — Gemini feels more like GPT-3.5. Well, because it excels at needle-in-a-haystack problems. ChatGPT fumbles a lot of its tokens. GPT-4o theoretically gives you 16k tokens, which seems like a lot, but the more tokens you feed it, the more it tends to forget the earlier ones. This is probably because it uses something like a "rolling-window" approach, so it does not consider those earlier tokens as much as it should, but you still pay for all of them. If you want to use Gemini tools, you must also use different websites with generic names that will probably change twice before the project gets killed. The tools are NotebookLM and AI Studio. Not to be confused with LM Studio, a popular FOSS tool not made by Google — I told you it was confusing. NotebookLM sells itself short. It claims to be a tool for studying and brainstorming ideas using your documents. They also claim they can turn documents into an AI podcast(?). At first, it seems like something a student would use to cheat (I mean, help) with their exams.
But the tool shines as an excellent "Google Search for your documents." If you feed it large documents, it can retrieve information and make interesting critiques. For example, I am using this tool extensively for my new engineering manager book. Having the AI do extensive critiques, with sources to back them up, helps me write better without losing my agency as an author by turning things into slop. I don't want to use GenAI for the writing itself, since it only gives generic advice from SEO-hungry websites. Using NotebookLM means I am still in control of my writing. Unlike ChatGPT, which is always positive, feedback from NotebookLM provides a good critique that is sometimes very humbling.

AI Studio

What NotebookLM can do for large documents, AI Studio can do for large videos. It is very useful for extracting details from any media you have, and it seems to be the only tool that does this. I imagine this tool being a game changer for people who use video to document things, like scouts filming locations or real estate agents looking to write a pitch. I sent it 3 videos I had made for later reference when I visited a house and asked it to write a pitch from them. One thing that amazed me is that it saw a grapevine and used it in the pitch! Overhyped pitch aside, it was impressive how well it gathered features from the property. I've also been using this tool for transcribing and annotating video evidence. My house inspection video is 17 minutes long, and there was no way I was ever going to watch the whole thing again. The transcription helps a lot in getting to the action points. Something is very strange, though: since the Flash model worked so well, I thought the "Pro" model would work even better, but surprisingly, it was way worse. I don't have an explanation for this. Gemini Flash 1.5: it was surprisingly good despite its verbosity… so Gemini Pro 1.5 should be even better, right?
Gemini 1.5 Pro: not only did it take 1:20 minutes to run, but the output is more incomplete than the Flash version, and it triggered a content warning for no reason. Another limitation from Google is that the model is fine-tuned toward safety. Some very casual videos I've uploaded triggered the warning that I might be breaking the ToS. It also uses the word "diverse" a lot; its internal prompt probably tells it to focus on diversity, but it starts using the word for everything, from "diversity of rooms" to "diverse problems." It uses this word like ChatGPT uses "delve." Gemini is far from beating ChatGPT, and perhaps even Claude Sonnet. They are playing it safe with what they have, but we should not count them out of the game. Those tools are currently free, and NotebookLM does not train on your data. With ChatGPT, segmenting these documents/videos into tokens/images is prohibitively expensive. The thing left for us to decide is how long "free" will last… If you are interested in the engineering manager book, I will send you the first chapters for free. These chapters include the best and worst parts of the job, how to succeed in the manager interview, and how the role differs in every company. If you provide feedback, I will send you the digital version of the book for free once it is released!

Jampa.dev 1 years ago

Wardriving for a place to live

I moved into a small city where I don't know anyone, and now I want to buy a place. Problem is: there are no listings anywhere, almost no real estate agents, and no updated information on the web. In a city where you have to call to order deliveries, and Google Maps won't give you their number, how can I find a place to live? Moving to a high-quality neighborhood here would cost way less than a normal one in a big city, so searching for it would be worth it, especially for a first-time buyer. By driving around, I see there are a lot of properties "for sale" announced on a sign with a phone number and no real estate agency attached. I could find one by just driving around. But the city is big, with too many streets to cover and too many unmapped dead ends. It has expanded organically, and the high- and low-infrastructure places are just one street apart. The city is a colonial-era city whose main export was/is sugarcane, near a major highway, with a lot of dead ends and no urban planning. With Street View, you can see the "good parts" of town. But that is still too much work, and honestly, I can barely remember where I "drove" later. If there were a way to "cull" neighborhoods, I would know where the potential home places are without driving into dangerous places. Afterward, I can narrow my search to only those places. The plan(TM):

Get all streets in the city area and points equally distributed along each road.
Get all the Street View pictures.
Make GPT-4 the judge of the neighborhoods.
Plot on a map!

I did not bother with Google Maps at first because it just had too many "false" streets and would probably cost something. It would take years of my time to download the 30 MB JS bundle of the Google Cloud Console and navigate their menus. So I asked ChatGPT to generate code for me using OpenStreetMap. Not gonna lie, I didn't even want to bother reading the code it made: "I just want the .exe," I said. But ChatGPT got it horribly wrong and wouldn't fix it.
So, like a caveman, I had to read it and fix it. For the roads, I marked points every 70 m (in imperial units, that is 0.76 football fields). I had a "bug" where every intersection got a point, but that was because the intersection is the "beginning of the road". Duh! It is super duper expensive now to use Google APIs, and Playwright is easier than reading Google documentation, so I am cashing in the favor for contributing long enough to Google Maps. One huge tip: make the image dimensions a multiple of 512 for OpenAI; a 512x512 image costs 4x less than a 513x513 one. The documentation says it splits the image into 512x512 chunks of 85 tokens each; we use 1024x512, so it is wide and only 2 tiles! I made sure to capture one angle along the road and another rotated by 180° to avoid bias; evaluated separately, the first image ranks higher than the second, despite them being two angles of the same point. After the images were downloaded, I spent hours prompt-writing and got mad that the first short draft I made was the best one. GPT-4 gets biased if you give it too many details; if Brazil is mentioned, it starts to sugarcoat how bad the street is: "This precarious street without sidewalk […] is typical of a Brazilian suburb - I give 4/5" - GPT with more context. Was that a roast? Was the AI being ironic? If you add to the prompt what factors to consider, it will hyperfocus on looking for a positive aspect in the input. A street with an abandoned lot was given credit for its "greenery". ChatGPT probably has instructions to be positive about whatever you give it in the prompt, and in this case that is not ideal. Another good surprise is that the new structured output is awesome: you don't need to YELL at the AI not to over-explain things or to safeguard it, and it saved a lot of time on prompt calibration. Last thing: there are some loose spots where a few good houses sit in an overall bad street.
Those are not ideal candidates, so I used something similar to KNN to remove that "noise". Yes, I wrote O(n²) code, since my time is worth more than the energy my Mac will use. Overall it was good enough: looking at the ratings and the images in a vacuum, like the AI did, I would rate them the same. There were no disagreements between me and the AI that would invalidate the whole thing. I also removed the beginning of each street in my code and removed nearby points to save some tokens. Here is the final thing, using QGIS and TIN interpolation among the points:

Insights

The good news was that every place got at least a 1; despite telling it to rate from 0-5, no legit zeros were given, even on muddy roads with large vegetation. But that was a good thing: it independently decided to reserve 0 for "not found" images, and I could easily filter those out. The bad news was that no place got a 5, and the one that got a 4 was removed by the KNN, which is good for the quality of the rating but bad news for the city overall. I think a 4-5 would be a posh American suburbia with an HOA, lots of grass lawns, and a large asphalt road with no overhead wires, and we don't have that here. The only place that got a 4 was because someone left their garage door open: the AI saw a pool and thought, "Hmmm, yes, very upscale". A surprise was how consistently it judged according to its own parameters; all the prompting was done in isolation, but ratings remained the same across nearby road spots. There were rarely sudden jumps from 1 to 3, for example. Another surprise was that the AI loved parks. It could be the saddest playground ever, but if there was a park, the spot got at least a 1-point bump. That makes sense: a park is a "permanent view" of some sort. Codepen from another run. Ignore the Google warning modal.
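For illustration, the "KNN-like" noise removal could look something like the sketch below. This is my reconstruction, not the post's actual code; the field names, the neighbor count k, and the gap threshold are all assumptions:

```python
import math

# Sketch of KNN-like noise removal: a point whose rating diverges too much
# from its k nearest neighbors' average is treated as noise (e.g. one good
# house on an otherwise bad street) and dropped. The O(n^2) pairwise
# distance scan is fine at this scale.
def filter_outliers(points, k=5, max_gap=1.5):
    kept = []
    for p in points:
        # Find the k spatially nearest other points (brute force).
        neighbors = sorted(
            (q for q in points if q is not p),
            key=lambda q: math.hypot(p["lat"] - q["lat"], p["lon"] - q["lon"]),
        )[:k]
        avg = sum(q["rating"] for q in neighbors) / len(neighbors)
        if abs(p["rating"] - avg) <= max_gap:
            kept.append(p)
    return kept
```

For example, a single rating of 5 surrounded by 1s along the same street would be filtered out, while the 1s survive.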
What's next

Now I need to visit those places and drive around to see what is for sale. The next steps might be getting a 360° camera, putting it on the roof of the car, à la Street View, and taking pictures to look for "for sale" signs. Stay tuned!
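As a footnote, the first step of the plan (equally spaced points along each road) can be sketched roughly like this. It assumes road polylines already projected into a meter-based coordinate system; this is an illustration, not the post's actual code:

```python
import math

# Drop a sample point every `spacing` meters along a road polyline.
# Coordinates are assumed to be in a projected CRS measured in meters
# (the projection step from OpenStreetMap lat/lon is omitted here).
def sample_along(polyline, spacing=70.0):
    points = []
    dist_to_next = 0.0  # distance remaining until the next sample point
    for (x1, y1), (x2, y2) in zip(polyline, polyline[1:]):
        seg = math.hypot(x2 - x1, y2 - y1)
        if seg == 0:
            continue  # skip degenerate zero-length segments
        pos = dist_to_next
        while pos <= seg:
            t = pos / seg  # fraction along this segment
            points.append((x1 + t * (x2 - x1), y1 + t * (y2 - y1)))
            pos += spacing
        dist_to_next = pos - seg  # carry leftover distance to the next segment
    return points
```

Note that the very first point lands on the road's starting vertex, which is exactly the intersection "bug" described above.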
