Latest Posts (20 found)
Kaushik Gopal 1 week ago

We are becoming Harness Engineers

The role of a software engineer is shifting. Not toward writing more code, but toward building the environment that makes agents reliable. Think about what you actually do with Claude Code or Codex today: you configure AGENTS.md files, set up MCP servers, write skills and hooks, build feedback loops, and tune sub-agents. You’re not writing as much of the software anymore. You’re engineering the harness around the thing that writes the software.

Mitchell Hashimoto first coined the term harness engineering — the work of shaping the environment around an agent so it can act reliably. What the model sees, what tools it has, how it gets feedback, when humans step in.

We keep hearing that agents will replace engineers. That shouldn’t be the focus of the change we’re seeing. What’s actually happening is product people shipping features directly. A well-harnessed agent lets someone with product instinct but little engineering background make meaningful changes — safely. The harness engineer makes that possible. Guardrails, design choices, blast radius controls, feedback loops. The scaffolding that turns “just prompt it” into something a team can trust. I say this from first-hand experience.

If you want to go deeper, listen to the episode where my cohost and I dug into it. We landed on five pillars:

- agent legibility
- closed feedback loops
- persistent memory
- entropy control
- blast radius controls

Honestly, one of the most important episodes we’ve recorded.

Kaushik Gopal 3 weeks ago

Podsync - I finally built my podcast track syncer

I host and edit a podcast. 1 When recording remotely, we each record our own audio locally (I on my end, my co-host on his). The service we use (Adobe Podcast, Zoom, Skype-RIP) captures everyone together as a master track. But the quality doesn’t match what each person records locally with their own microphone. So we use that master as a reference point and stitch the individual local tracks together. This is what the industry calls a “double-ender”. Add a guest and it becomes a “triple-ender”.

But this gets hairy during editing. Each person starts their recording at a slightly different moment — everyone hits record at a different time. Before I can edit, I need to line everything up. Drop all the tracks into a DAW, play the master alongside each individual track, nudge by ear until the speech aligns. Add a guest and it gets tedious fast. 10–15 minutes of fiddly, ear-straining alignment before I’ve even started editing. There’s also drift. Each machine’s audio clock runs at a slightly different rate, so two tracks that are perfectly aligned at minute one might be 200 ms apart by minute sixty.

So I built Podsync. 2 I first heard of a similar technique from Marco Arment — back in ATP episode 25. He had a new app for aligning double-ender tracks and was already thinking about whether something so niche was even worth releasing publicly. I don’t think he ever released it. Being a Kotlin developer at the time, I figured I’d build my own. Java was mature; surely there were audio processing libraries that could handle this. There weren’t 😅. At least not in any clean, usable form. Getting the right signal processing pieces together in JVM-land was awkward enough that my interest fizzled, so I kept doing it by hand.

When I revamped Fragmented, I finally came back to this. I used Claude to help me build it — in Rust, no less. 3 But before you chalk this up to another vibecoded project, hear me out. The interesting part here wasn’t just that AI made it easier.
It was thinking through the actual algorithm:

- Voice activity detection (VAD) to find speech regions
- MFCC features to fingerprint the audio
- Cross-correlation to find where the tracks match

Some real signal processing techniques, not just prompt engineering. Now, could I have prompted my way to a solution? Probably. But I like to think years of manually aligning tracks — and some sound engineering intuition — helped me steer the AI toward a better solution.

Working on this felt refreshing. In an era where half the conversation is about AI replacing engineering work, here’s a problem where the hard part is still the problem itself — understanding the domain, picking the right approach, knowing what “correct” sounds like. It gives me confidence that solving real problems well still has its place. I like how Dax put it:

thdxr on twitter: I really don’t care about using AI to ship more stuff. It’s really hard to come up with stuff worth shipping.

The core idea: take a chunk of speech from a participant track, compare it against the master recording, find where they match best. That position is the time offset. The trick is picking which chunk of speech to use. Rather than betting on a single region, Podsync finds a few strong candidates per track (longer contiguous speech blocks preferred) and tries each one against the master. For long candidates, it samples from the start, middle, and end. The highest-confidence match wins; if a second independent region agrees on the same offset, that corroboration factors in as a tie-breaker. After finding the offset, Podsync pads or trims each track to align with the master and match its length (and outputs some info on the offset). Drop the output into my DAW at 0:00. Done.

I even wrote an agent skill you can just point your agent harness to, and it will take care of all the steps for you. What used to be 10–15 minutes of alignment per episode is now a single command.
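The matching core described above can be sketched roughly as follows. This is a hedged reconstruction with hypothetical names, not Podsync’s actual code, and it operates on raw samples instead of MFCC frames: slide the participant’s fingerprint window across the master and keep the lag with the highest normalized cross-correlation.

```rust
/// Find the sample offset where `window` (a chunk of a participant track)
/// best matches `master`. Returns (best_lag, normalized_score in [-1, 1]).
fn cross_correlate(master: &[f32], window: &[f32]) -> (usize, f32) {
    assert!(!window.is_empty() && window.len() <= master.len());
    let mut best_lag = 0;
    let mut best_score = f32::MIN;
    for lag in 0..=(master.len() - window.len()) {
        let seg = &master[lag..lag + window.len()];
        // Normalized cross-correlation: dot product over the product of norms.
        let dot: f32 = seg.iter().zip(window).map(|(a, b)| a * b).sum();
        let norm = (seg.iter().map(|a| a * a).sum::<f32>()
            * window.iter().map(|b| b * b).sum::<f32>())
        .sqrt();
        let score = if norm > 0.0 { dot / norm } else { 0.0 };
        if score > best_score {
            best_score = score;
            best_lag = lag;
        }
    }
    (best_lag, best_score)
}

fn main() {
    // A toy "master" with the participant's window embedded at offset 3.
    let window = [0.5f32, -1.0, 0.75, 0.25];
    let mut master = vec![0.0f32; 3];
    master.extend_from_slice(&window);
    master.extend_from_slice(&[0.0, 0.0]);
    let (lag, score) = cross_correlate(&master, &window);
    println!("offset = {lag} samples, confidence = {score:.2}");
}
```

In the real pipeline this same search would run over MFCC frames from a VAD-selected speech region, and the winning lag (in frames) converts to the time offset used to pad or trim each track.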
Marco, if you ever read this: I’d still love to see your implementation! His solution (as I understand it) is aimed more at correcting drift vs getting the offset right. In practice, I haven’t found drift to be much of a problem. It exists but stays minor, and I’m typically editing every second of the podcast anyway, so it’s easy enough to handle by hand. I even had a branch that corrected drift by splicing at silence points, but it complicated things more than it helped.

It’s a podcast on AI development, but we strive to make it high signal. None of that masturbatory AI discourse. ↩︎
See also Phone-sync. ↩︎
I chose Rust (it’s what interests me these days), and a CLI tool with no runtime dependency is more pleasant to distribute. ↩︎

Kaushik Gopal 4 weeks ago

Here’s my list of reasons for using Opencode

Here’s my list of reasons for using OpenCode.

- I’m often experimenting with the bleeding-edge models as they come out. I actively switch between models for tasks, and I use them all enough that I can tell the difference. OpenCode lets me switch between models mid-task or mid-conversation. Fluidly.
- I wrote about this and agentic fluidity in more detail, but tl;dr: OpenCode has the client/server architecture baked in. So I can just start an OpenCode server on one machine, expose it, and start using it on my phone or other machines.
- I talked about this on my podcast in some detail, but OpenCode has the best implementation of subagents and modes. You can switch to a subagent definition as your primary mode, then operate other subagents from there. It makes orchestrator-type tasks super easy.
- I love that OpenCode is opinionated about their UX. They don’t try to be Claude Code or Codex. In the process they have some really nice UX patterns, like a sidebar with ongoing file changes, context/cost, connected MCPs, etc. It’s the first time I’ve not needed to worry about a custom statusline.sh or building one.
- The plugin ecosystem is highly customizable, to the point where you can add new features, integrate with external services, or even modify OpenCode’s default behavior. The wonderful Jesse Vincent mentioned this to me when I was stupidly contemplating a fork.

It’s not all rainbows and sunshine. Anomaly — the team behind OpenCode — is small. Which sometimes shows, because there are definitely bugs and some missing features. But I will say: none of that has deterred me from using it, exclusively, for the last two months. Go give it a shot. Many of the serious AI coders I know are really liking it and switching.

Kaushik Gopal 1 month ago

Agentic Fluidity - OpenCode is OpenClaw for coding

One of the reasons OpenClaw got so popular was how fluidly you can chat with and operate your agents. Pull up your phone, send a quick message on WhatsApp, and you’re in business. As we focus more on agent orchestration 1 in 2026, I think an important aspect will be access fluidity. How do you hop into your agent’s context from any device, terminal, or IDE and just start coding? Claude Code supports this in a limited way, while others like Cursor and Codex take a cloud-based approach. The best option I’ve found for this “on-the-go” agentic coding is an open-source one — OpenCode.

OpenCode - your best “on-the-go” option for agentic coding.

OpenCode uses a native server-client architecture. You can simply spin it up in a regular terminal tab, just like any other CLI agent. But the power move is running it as a server and connecting multiple clients. A client can be your terminal tab, a mobile device, or a desktop computer. Each terminal tab becomes a new, isolated CLI session that connects to the server. Couple this with Tailscale, and you can securely connect to a dev machine running an OpenCode server from anywhere.

I’d start by using it like a regular CLI tool. Once it feels familiar, switch to server/web mode. The beauty is you can open that URL in any browser, and it’s fully synced.

Credit to my co-host Iury for tooting the OpenCode horn early, and my Instacart colleague Spencer for questioning my luddite tmux ways. 2 I’ll write a future post singing OpenCode’s other praises. For now, if you’re exploring the bleeding edge of agent access fluidity, don’t sleep on it.

See my post on AI paradigms. ↩︎
I noticed some memory leaks when using tmux sessions with OpenCode, and Spencer asked me: why not lean on the server-client model more and use regular Ghostty tabs and splits? ↩︎

Kaushik Gopal 3 months ago

AI model choices 2026-01

Which AI model do I use? This is a common question I get asked, but models evolve so rapidly that I never felt like I could give an answer that would stay relevant for more than a month or two. This year, I finally feel like I have a stable set of model choices that consistently give me good results. I’m jotting it down here to share more broadly and to trace how my own choices evolve over time.

- GPT 5.2 (High) for planning and writing, including plans
- Opus 4.5 for anything coding, task automation, and tool calling
- Gemini’s range of models for everything else:
  - Gemini 3 (Thinking) for learning and understanding concepts (underrated)
  - Gemini 3 (Flash) for quick-fire questions
  - Nano Banana (obv) for image generation
- NVIDIA’s Parakeet for voice transcription

Kaushik Gopal 3 months ago

Wi-Fi sharing is a killer Android feature

Ubiquiti announced a new travel router. Much of the internet is excited. So am I. Then I tried to remember the last time I actually needed a travel router.

You see, Android has supported a feature I’ll call Wi-Fi sharing for years. 1 Your phone connects to an existing Wi-Fi network and re-shares it as a hotspot. This might sound like a regular hotspot feature that most phones (including the iPhone) come with. But it’s not. iPhones can share mobile data; they can’t re-share a Wi-Fi connection as a hotspot.

Wi-Fi sharing: your phone connects to Wi-Fi, and then re-shares that same Wi-Fi as a hotspot. This is different from typical hotspot functionality, where the phone shares its mobile data connection (vs Wi-Fi).

Neat trick, but why bother? Can’t you just connect each device to Wi-Fi?

Captive portals are annoying when you’re carrying multiple devices. I typically travel with 3-4 devices that want internet. Signing each one in, every time, gets old fast. Some devices are worse: Chromecast and Fire TV sticks are particularly painful to get past captive portals. If everything connects to your hotspot, you only deal with the portal once. 2

On a plane, I sometimes want both my laptop and phone online. Some paid Wi-Fi plans only allow one device at a time. Unless you’re OK paying twice, Wi-Fi sharing is simpler. 3 Hotels and conference centers do the same: sign-in plus device limits. Wi-Fi sharing works around it.

This one is less obvious, but common on hotel and conference Wi-Fi: your devices have internet, but they can’t see each other locally. Chromecast (or printers) won’t show up as a cast target because it doesn’t appear on the network. That’s usually client/AP isolation. 4 Put your devices on your phone’s hotspot, and local discovery usually works again.

This is slightly advanced. With a Tailscale setup and an explicit exit node, you basically have a private VPN. 5 On phones where hotspot traffic routes through that VPN, you only have to set it up on your Android phone, and every device that connects to your phone gets the same “safe” path out. If I have to log in to bank accounts when roaming or connecting to “free” Wi-Fi, this helps me feel safer knowing the local network can’t see or tamper with the contents of my traffic. 6

I should pause my gloating over iPhones for a second: a few Android devices may not support this feature. The Android OS has Wi-Fi sharing baked in, but it still requires hardware and driver support. Notable exceptions include the Pixel 7a, the Pixel 8a, and yes, the (first-generation) Pixel Fold. Wi-Fi sharing requires Wi-Fi hardware (chipset + drivers) that can run as both a client and an access point at the same time (STA + AP). 7 Chipsets can implement this in a few ways (DBS, SBS, MCC, SCC). 8 Android doesn’t mandate one mode; it depends on the Wi-Fi chipset. DBS/SBS use multiple radios, so the phone can keep the upstream connection and hotspot truly simultaneous (for example, 5 GHz upstream and a 2.4 GHz hotspot). MCC/SCC share a radio, so the hotspot either stays on the same channel (SCC) or the radio hops channels (MCC). If a phone can’t do STA + AP concurrency well (or at all), OEMs disable Wi-Fi sharing (which is why some phones and many older devices don’t support it).

Travel routers still have their place: Ethernet ports, better radios, and an always-on box you can run a VPN on. But if you’re on Android and your phone supports Wi-Fi sharing, you already have the core trick.

To set it up: connect your Android phone to the Wi-Fi network you want to share. If it’s behind a captive portal, sign in as needed. Go to Settings → Hotspot & tethering → Wi-Fi hotspot (wording varies) and turn it on. Typically, if your phone does not support Wi-Fi sharing, it will disable Wi-Fi. Some OEMs show a separate toggle to enable Wi-Fi sharing; on Pixel phones, it’s automatic.

Android doesn’t call it this in Settings, but it’s the best term I have for “connect to Wi-Fi, then share that Wi-Fi as a hotspot”. In strict networking terms, this isn’t L2 bridging; it’s typically tethering (routing/NAT) with a Wi-Fi upstream. ↩︎
This works because the captive portal only sees your phone; everything else is NATed behind it. ↩︎
Thank you Delta for being one of the few US domestic airlines that don’t place this restriction. Looking at you, United. ↩︎
Hotel and conference Wi-Fi often blocks device-to-device traffic on purpose (“client isolation”) so guests can’t discover, scan, or connect to each other’s devices. Your phone’s hotspot creates a separate little LAN, so your devices can talk to each other again. ↩︎
I have a post in the making about this: “With Tailscale you don’t need to pay for a VPN”. ↩︎
HTTPS encrypts the bank session, but open Wi-Fi is still untrusted: a malicious access point can tamper with DNS and try to steer you into phishing. A VPN (or Tailscale exit node) reduces the surface area by encrypting your traffic to a trusted endpoint. ↩︎
Modern devices support AP (Access Point) + STA (Station) mode, letting them act as both a client to one network and a hotspot for others, allowing Wi-Fi extension or tethering. ↩︎
Definitions from Android’s Wi-Fi vendor HAL: DBS (Dual Band Simultaneous), SBS (Single Band Simultaneous), MCC (Multi Channel Concurrency), SCC (Single Channel Concurrency). ↩︎

Kaushik Gopal 4 months ago

Combating AI coding atrophy with Rust

It’s no secret that I’ve fully embraced AI for my coding. A valid concern (and one I’ve been thinking about deeply) is the atrophying of the part of my brain that helps me code. To push back on that, I’ve been learning Rust on the side for the last few months. I am absolutely loving it.

Kotlin remains my go-to language. It’s the language I know like the back of my hand. If someone sends me a swath of Kotlin code, whether handwritten or AI-generated, I can quickly grok it and form a strong opinion on how to improve it. But Kotlin is a high-level language that runs on a JVM. There are structural limits to the performance you can eke out of it, and for most of my career 1 I’ve worked with garbage-collected languages. For a change, I wanted a systems-level language, one without the training wheels of a garbage collector. I also wanted a language with a different core philosophy, something that would force me to think in new ways.

I picked up Go casually, but it didn’t feel like a big enough departure from the languages I already knew. It just felt more useful to ask AI to generate Go code than to learn it myself. With Rust, I could get code translated, but then I’d stare at the generated code and realize I was missing some core concepts and fundamentals. I loved that! The first time I hit a lifetime error, I had no mental model for it. That confusion was exactly what I was looking for. Coming from a GC world, memory management is an afterthought — if it requires any thought at all. Rust really pushes you to think through the ownership and lifespan of your data, every step of the way. In a bizarre way, AI made this gap obvious. It showed me where I didn’t understand things and pointed me toward something worth learning.

Here’s some software that’s either built entirely in Rust or uses it in fundamental ways:

- fd (my tool of choice for finding files)
- ripgrep (my tool of choice for searching files)
- Fish shell (my shell of choice, recently rewritten in Rust)
- Zed (my text/code editor of choice)
- Firefox (my browser of choice)
- Android?! That’s right: Rust now powers some of the internals of the OS, including the recent Quick Share feature.

Many of the most important tools I use daily are built with Rust. Can’t hurt to know the language they’re written in.

Rust is quite similar to Kotlin in many ways. Both use strict static typing with advanced type inference. Both support null safety and provide compile-time guarantees. The compile-time strictness and higher-level constructs made it fairly easy for me to pick up the basics. Syntactically, it feels very familiar.

I started by rewriting a couple of small CLI tools I used to keep in Bash or Go. Even in these tiny programs, the borrow checker forced me to be clear about who owns what and when data goes away. It can be quite the mental workout at times, which is perfect for keeping that atrophy from setting in. After that, I started to graduate to slightly larger programs and small services.

There are two main resources I keep coming back to:

- Fondly referred to as “The Book”. There’s also a convenient YouTube series following the book.
- Google’s Comprehensive Rust course, presumably created to ramp up their Android team. It even has a dedicated Android chapter.

There are times when the book or course mentions a concept and I want to go deeper. Typically, I’d spend time googling, searching Stack Overflow, finding references, diving into code snippets, and trying to clear up small nuances. But that’s changed dramatically with AI. One of my early aha moments with AI was how easy it made ramping up on code. The same is true for learning a new language like Rust. For example, what’s the difference 2 between these two:

Another thing I loved doing is asking AI: what are some idiomatic ways people use these concepts? Here’s a prompt I gave Gemini while learning, and an abbreviated response (the full response was incredibly useful). This worked beautifully for me.

It’s easy to be doom and gloom about AI in coding — the “we’ll all forget how to program” anxiety is real. But I hope this offers a more hopeful perspective. If you’re an experienced developer worried about skill atrophy, learn a language that forces you to think differently. AI can help you cross that gap faster. Use it as a tutor, not just a code generator.

I did a little C/C++ in high school, but nowhere close to proficiency. ↩︎
Think mutable var to a “shared reference” vs. immutable var to an “exclusive reference”. ↩︎
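The code snippets for the "what's the difference between these two" comparison above didn't survive the page extraction. Judging by the wording of footnote 2, it was presumably the classic pair below: a mutable binding to a shared reference vs. an immutable binding to an exclusive reference. This is a reconstruction, not the post's original snippet:

```rust
fn main() {
    let a = 1;
    let b = 2;

    // Mutable *binding* to a shared reference (`mut r1: &i32`):
    // you may re-point r1 at another value, but not mutate through it.
    let mut r1: &i32 = &a;
    println!("r1 -> {r1}");
    r1 = &b; // OK: the binding itself changes
    println!("r1 -> {r1}");
    // *r1 += 1; // ERROR: cannot assign through a `&` reference

    // Immutable *binding* to an exclusive reference (`r2: &mut i32`):
    // you may mutate through r2, but not re-point the binding.
    let mut c = 10;
    let r2: &mut i32 = &mut c;
    *r2 += 1; // OK: mutation through `&mut`
    // r2 = &mut b; // ERROR: the binding itself is immutable

    println!("c = {c}");
}
```

Coming from Kotlin, this is roughly the distinction between reassigning a `var` that holds a read-only view and holding a `val` that grants write access, except Rust enforces it at compile time through the borrow checker.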

Kaushik Gopal 5 months ago

Go with monthly AI subscriptions friends

Go with monthly AI subscriptions, friends. I can’t remember where I read this tip, but given how fast the AI lab models move, it’s smarter to stick with a monthly plan instead of locking into an annual one, even if the annual price looks more attractive.

I hit a DI issue on Android and was too lazy to debug it myself, so I pointed two models at it. GPT Codex gave me the cleanest, correct fix. Claude Sonnet 4.5 found a fix, but it wasn’t idiomatic and was pretty aggressive with the changes. A month ago, I wouldn’t have bothered with anything other than the Claude models for coding. Today, Codex clearly feels ahead. Google is about to ship its next Gemini model and, from what I’m hearing, it’s going to be absurdly good. In these wonderfully unstable times, monthly subscriptions are the way to go.

Kaushik Gopal 5 months ago

Firefox + UbO is still better than Brave, Edge or any Chromium-based solution

I often find myself replying to claims that Brave, Edge, or other Chromium browsers effectively achieve the same privacy standards as Firefox + uBlock Origin (uBO). This is simply not true.

Brave and other Chromium browsers are constrained by Google’s Manifest V3. Brave works around this by patching Chromium and self-hosting some MV2 extensions, but it is still swimming upstream against the underlying engine. Firefox does not have these MV3 constraints, so uBlock Origin on Firefox retains more powerful, user-controllable blocking than MV3-constrained setups like Brave + uBO Lite.

Brave is an excellent product and what I used for a long time. But the comparison often ignores structural realities. There are important nuances that make Firefox the more future-proof platform for privacy-conscious users.

The core issue is Manifest V3 (MV3). This is Google’s new extension architecture for Chromium (what Chrome, Brave, and Edge are built on). Under Manifest V2, blockers like uBO used the blocking version of the API (webRequest + webRequestBlocking) to run their own code on each network request and decide whether to cancel, redirect, or modify it. MV3 deprecates that blocking path for normal extensions and replaces it with the declarativeNetRequest (DNR) API: extensions must declare a capped set of static rules in advance, and the browser enforces those rules without running extension code per request. This preserves basic blocking but, as uBO’s developer documents, removes whole classes of filtering capabilities uBO relies on. And Google is forcing this change by deprecating MV2. Yeah, shitty.

To get around the problem, Brave is effectively swimming upstream against its own engine. It does this in two ways:

- Native patching: it implements ad-blocking (Shields) natively in C++/Rust within the browser core to bypass extension limitations.
- Manual extension hosting: Brave now has to manually host and update specific Manifest V2 extensions (like uBO and AdGuard) on its own servers to keep them alive as Google purges them from the store.

They wrote a great post about this too. Brave is doing a great job, but it is operating with a sword of Damocles hanging over it. The team must manually patch a hostile underlying engine to maintain functionality that Firefox simply provides out of the box.

A lot of people also say: wait, we now have “uBlock Origin Lite” that does the same thing and is even more lightweight! It is “lite” for a reason. You are not getting the same blocking safeguards. uBO Lite is a stripped-down version necessitated by Google’s API restrictions. As detailed in the uBlock Origin FAQ, the “Lite” version lacks in the following ways:

- No on-demand list updates: uBO Lite compiles filter lists into the extension package. The resulting declarative rulesets are refreshed only when the extension itself updates, so you cannot trigger an immediate filter-list or malware-list update from within the extension.
- No “Strict Blocking”: uBO Lite does not support uBlock Origin’s strict blocking modes or its per-site dynamic matrix. With full uBO on Firefox, my setup defines and exposes a custom, per-site rule set that ensures Facebook never sees my activity on other sites. uBO Lite does not let me express or maintain that kind of custom policy; I have to rely entirely on whatever blocking logic ships with the extension.
- No dynamic filtering: you lose the advanced matrix to block specific scripts or frames per site.
- Limited element picker: “pointing and zapping” items requires specific, permission-gated steps rather than being seamless.
- No custom filters: you cannot write your own custom rules to block nearly anything, from annoying widgets to entire domains.

uBlock Origin is widely accepted as the most effective content blocker available. Its creator, gorhill, has explicitly stated that uBlock Origin works best on Firefox. So while using a browser like Brave is better than using Chrome or other browsers that lack a comprehensive blocker, it is not equivalent to Firefox + uBlock Origin. Brave gives you strong, mostly automatic blocking on a Chromium base that is ultimately constrained by Google’s MV3 decisions. Firefox + uBlock Origin gives you a full-featured, user-controllable blocker on an engine that is not tied to MV3, which matters if you care about long-term, maximum control over what loads and who sees your traffic.

Kaushik Gopal 5 months ago

Cognitive Burden

A common argument I hear against AI tools: “It doesn’t do the job better or faster than me, so why am I using this again?” Simple answer: cognitive burden. My biggest unlock with AI was realizing I could get more done, not because I was faster , but because I wasn’t wringing my brain with needless tedium. Even if it took longer or needed more iterations, I’d finish less exhausted. That was the aha moment that sold me. Simple example: when writing a technical 1 post, I start with bullet points. Sometimes there’s a turn of phrase or a bit of humor I enjoy, and I’ll throw those in too. Then a custom agent trained on my writing generates a draft in my voice. After it drafts, I still review every single word. A naysayer might ask: “Well, if you’re reviewing every single word anyway, at that point, why not just write the post from scratch?” Because it’s dramatically easier and more enjoyable not to grind through and string together a bunch of prepositions to draft the whole post. I’ve captured the main points and added my creative touch; the AI handles the rest. With far less effort , I can publish more quickly — not due to raw speed, but because it’s low‑touch and I focus only on what makes it uniquely me. Cognitive burden ↓. About two years ago I pushed back on our CEO in a staff meeting: “Most of the time we engineers waste isn’t in writing the code. It’s the meetings, design discussions, working with PMs, fleshing out requirements — that’s where we should focus our AI efforts first.” 2 I missed the same point. Yes, I enjoy crafting every line of code and I’m not bogged down by that process per se, but there’s a cognitive tax to pay. I’d even say I could still build a feature faster than some LLMs today (accounting for quality and iterations) before needing to take a break and recharge. Now I typically have 3–4 features in flight (with requisite docs, tests, and multiple variants to boot). Yes, I’m more productive. And sure, I’m probably shipping faster. 
But that’s correlation, not causation. Speed is a byproduct. The real driver is less cognitive burden, which lets me carry more. What’s invigorated me further as a product engineer is that I’m spending a lot more time on actually building a good product. It’s not that I don’t know how to write every statement; it’s just… no longer interesting. Others feel differently. Great! To each their own. For me, that was the aha moment that sold me on AI. Reducing cognitive burden made me more effective; everything else followed.

I still craft the smaller personal posts from scratch. I do this mostly because it helps evolve my thinking as I write each word down — a sort of muscle memory formed over the years of writing here. ↩︎
In hindsight, maybe not one of my finest arguments, especially given my recent fervor. To be fair, while I concede my pushback was wrong, I don’t think leaders then had the correct reasoning fully synthesized. ↩︎

Kaushik Gopal 5 months ago

Standardize with ⌘ O ⌘ P to reduce cognitive load

There are a few apps on macOS in the text-manipulation category that I end up spending a lot of time in. For example: Obsidian (for notes), Zed (text editor + IDE lite), Android Studio & IntelliJ (IDE++), Cursor (IDE + AI), etc. All these apps have two types of commands that I frequently use:

- Open a specific file or note
- Open the command palette (or find any action menu)

But by default, these apps use ever so slightly different shortcuts. One might use ⌘ P, another might use ⌘ ⇧ P, etc. I’ve found it incredibly helpful to take a few minutes and make these specific keyboard shortcuts the same everywhere. So now I use:

- ⌘ O – Open a file/note
- ⌘ P – Open the command palette (or equivalent action menu)

This small change has reduced cognitive load significantly. I no longer have to think about which app I’m in and what the shortcut is for that specific app. Muscle memory takes over, and I can just get things done faster. Highly recommended!

Kaushik Gopal 5 months ago

Claude Skills: What's the Deal?

Anthropic announced Claude Skills and my first reaction was: “So what?” We already have , slash commands, nested instructions, or even MCPs. What’s new here? But if Simon W thinks this is a big deal , then pelicans be damned; I must be missing something. So I dissected every word of Anthropic’s eng. blog post to find what I missed. I don’t think the innovation is what Skills does or achieves, but rather how it does it that’s super interesting. This continues their push on context engineering as the next frontier.

Skills are simple markdown files with YAML frontmatter. But what makes them different is the idea of progressive disclosure :

Progressive disclosure is the core design principle that makes Agent Skills flexible and scalable. Like a well-organized manual that starts with a table of contents, then specific chapters, and finally a detailed appendix, skills let Claude load information only as needed.

So here’s how it works:

- Scan at startup : Claude scans available Skills and reads only their YAML descriptions (name, summary, when to use)
- Build lightweight index : This creates a catalog of capabilities (with minimal token cost); so think dozens of tokens per skill
- Load on demand : The full content of a Skill only gets injected into context when Claude’s reasoning determines it’s relevant to the current task

This dynamic context loading mechanism is very token efficient ; that’s the interesting development here. In this token-starved AI economy, that’s 🤑. Other solutions aren’t as good in this specific way:

- Why not throw everything into ? You could add all the info directly and agents would load it at session start. The problem: loading everything fills up your context window fast, and your model starts outputting garbage unless you adopt other strategies. Not scalable. (✓ Auto-discovered and loaded. ✗ Static: all context loaded upfront, bloats context window at scale.)
- Nested instructions: Place an AGENTS.md in each subfolder and agents read the nearest file in the tree. This splits context across folders and solves token bloat. But it’s not portable across directories and creates an override behavior instead of true composition. (✓ Scoped to directories. ✗ Not portable across folders; overrides behavior, not composition.)
- Referenced files: Place instructions in separate files and reference them in AGENTS.md. This fixes the portability problem vs the nested approach. But when referenced, the full content still loads statically. Feels closest to Skills, but lacks the JIT loading mechanism. (✓ Organized and modular. ✗ Still requires static loading when referenced.)
- Slash commands (or in Codex) let you provide organized, hyper-specific instructions to the LLM. You can even script sequences of actions, just like Skills. The problem: these aren’t auto-discovered. You must manually invoke them, which breaks agent autonomy. (✓ Powerful and procedural. ✗ Manual invocation breaks agent autonomy.)
- MCPs: Skills handle 80% of MCP use cases with 10% of the complexity. You don’t need a network protocol if you can drop a markdown file that says “to access GitHub API, use with .” To be quite honest, I’ve never been a big fan of MCPs. I think they make a lot of sense for inter-service communication but more often than not they’re overkill. (✓ Access to external data sources. ✗ Heavyweight; vendor lock-in; overkill for procedural knowledge.)

Token-efficient context loading is the innovation. Everything else you can already do with existing tools. If this gets adoption, it could replace slash commands and simplify MCP use cases. I keep forgetting, this is for the Claude product generally (not just Claude Code) which is cool. Skills is starting to solve the larger problem: “How do I give my agent deep expertise without paying the full context cost upfront?” That’s an architectural improvement definitely worth solving and Skills looks like a good attempt.
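To make the mechanics concrete, here’s a sketch of what a skill file looks like: a `SKILL.md` whose `name` and `description` frontmatter fields match Anthropic’s documented format, while the body below is entirely my own illustration, not a real skill:

```markdown
---
name: pdf-extraction
description: Extract text and tables from PDF files. Use when the user asks to read, summarize, or pull data out of a PDF.
---

# PDF Extraction

1. Convert the PDF to plain text first, then work on the text output.
2. For tables, extract page by page and validate column counts before merging.
```

At startup only the frontmatter gets indexed (dozens of tokens); everything below the divider is loaded just-in-time when Claude decides the task matches the description.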

Kaushik Gopal 5 months ago

Cargo Culting

If you’re a software engineer long enough, you will meet some gray beards that throw out-of-left-field phrases to convey software wisdom. For example, you should know if you’re yak-shaving or bike-shedding , and when that’s even a good thing. A recent HN article 1 reminded me of another nugget – Cargo Culting (or Cargo Cult Programming).

Cargo Culting : ritualizing a process without understanding it. In the context of programming: practice of applying a design pattern or coding style blindly without understanding the reasons behind it

I’m going to take this opportunity to air one of my personal cargo-culting pet peeves, sure to kick up another storm: Making everything small . When I get PR feedback saying “this class is too long, split this!”, I get ready to launch into a tirade: you’re confusing small with logically small – ritualizing line count without understanding cohesion. You can make code small by being terse: removing whitespace, cramming logic into one-liners, using clever shorthand 2 . But you’ve just made it harder to read. A function that does one cohesive thing beats multiple smaller functions scattered across files.

As the parable goes, after the end of the Second World War, indigenous tribes believed that air delivery of cargo would resume if they carried out the proper rituals, such as building runways, lighting fires next to them, and wearing headphones carved from wood while sitting in fabricated control towers. While on the surface amusing, there’s sadness if you dig into the history and contributing factors (value dominance, language & security barriers). I don’t think that’s reason to avoid the term altogether. We as humans sometimes have to embrace our dark history, acknowledge our wrongs and build kindness in our hearts. We cannot change our past, but we can change our present and future.

The next time someone on your team ritualizes a pattern without understanding it, you’ll know what to call it. Who comes up with these terms anyway?
Now that you’re aware of the term, you’ll realize that the original article’s use of the term cargo-cult is weak at best. In HN style, the comments were quick to call this out.  ↩︎ You know exactly what I’m thinking of, fellow Kotlin programmers.  ↩︎

Kaushik Gopal 6 months ago

ExecPlans – How to get your coding agent to run for hours

I’ve long maintained that the biggest unlock with AI coding agents is the planning step. In my previous post , I describe how I use a directory and ask the agent to diligently write down its tasks before and during execution. Most coding agents now include this as a feature. Cursor, for example, introduced it as an explicit feature recently. While that all felt validating, on a plane ride home I watched OpenAI’s DevDay. One of the most valuable sessions was Shipping with Codex . Aaron Friel — credited with record-long sessions and token output — walked through his process and the idea of “ExecPlans.” It felt similar at first, but I quickly realized this was some god-level planning. He said OpenAI would release his PLANS.md soon, but I couldn’t wait. On that flight, with janky wifi, I rebuilt what I could from the talk and grew my baby plan into something more mature — and I was already seeing better results. I pinged Aaron on BlueSky for the full doc, and he very kindly shared the PR that’s about to get merged with detailed information. My god, this thing is a work of art. Aaron clearly spent a lot of time honing it. I’ve tried it on two PRs so far, and it’s working fantastically. I still need to put it through its paces on some larger work projects, but I feel comfortable preemptively calling it the gold standard for planning.

I’ve made a few small tactical tweaks to how I use it:

- I instruct the agent to write plans to (works across coding agents)
- In my AGENTS.md I tell agents to put temporary plans in (which I’ve gitignored)
- I keep the master Aaron shared at

This is really a big unlock, folks. Try it now. The latest PLANS.md can be found in Aaron’s PR . Use it as a template in your folder. Then instruct your agent via AGENTS.md to always write an ExecPlan when working on complex tasks. I highly recommend you go watch Aaron’s part of the talk Shipping with Codex . I’ll update this post once it’s merged or if anything changes.

Update: I’ve been using this for the last few days (~8 PRs so far) and on average I’ve definitely gotten my agents to run for much longer successfully (longest was about ~1 hour but frequently >30 mins). This is the way.
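The wiring itself is small. Here’s a sketch of what the AGENTS.md instruction can look like, with placeholder paths (the paths are illustrative, not the actual ones from my setup):

```markdown
## Planning
For any non-trivial task, first write an ExecPlan using the template in
`docs/PLANS.md`. Save working plans to `.plans/` (gitignored) and keep the
plan updated as you execute each step.
```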

Kaushik Gopal 6 months ago

Job Displacement with AI — Software Engineers → Conductors

Engineers won’t be replaced by tools that do their tasks better; they’ll be replaced by systems that make those tasks nonessential. Sangeet Paul Choudary wrote an insightful piece on AI-driven job displacement and a more transformative way to think about it: To truly understand how AI affects jobs, we must look beyond individual tasks to comprehend AI’s impact on our workflows and organizations. The task-centric view sees AI as a tool that improves how individual tasks are performed. Work remains structurally unchanged. AI is simply layered on top to improve speed or lower costs. …In this framing, the main risk is that a smarter tool might replace the person doing the task. The system-centric view, on the other hand, looks at how AI reshapes the organization of work itself. It focuses on how tasks fit into broader workflows and how their value is determined by the logic of the overall system. In this view, even if tasks persist, the rationale for grouping them into a particular job, or even performing them within the company, may no longer hold once AI changes the system’s structure. If we adopt a system-centric view, how does the role of a software engineer evolve 1 ? I’ve had a notion for some time — the role will transform into a software “conductor”, in the sense of music conductors: conducting is the art of directing the simultaneous performance of several players or singers by the use of gesture. The tasks a software conductor must master differ from those of today’s software engineer. Here are some of the shifts I can think of: The craft is knowing exactly how much detail to provide in prompts: too little and models thrash; too much and they overfit or hallucinate constraints. You’ll need to write spec-grade prompts that define interfaces, acceptance criteria, and boundaries — chunking work into units atomic enough for clear execution yet large enough to preserve context.
Equally critical: recognizing when to interrupt and redirect — catching drift early and steering with surgical edits rather than expensive reruns or loops.

You’ll need to design systems that AI can both navigate and extend elegantly. This means clear module boundaries with explicit interfaces, descriptive naming that models can infer purpose from, and tests that double as executable specs. The goal: systems where AI agents can make surgical changes quickly and efficiently without cascading tech debt.

We’re moving from building one solution to exploring many simultaneously. This unlocks three levels of experimentation:

- Feature variants — Build competing product approaches in parallel. One agent implements phone-only authentication while another builds traditional email/password. Both ship behind feature flags. Let users decide which wins.
- Implementation variants — Build the same feature with different architectures. Redis caching on path A, SQLite on path B. Run offline benchmarks and online canaries to measure which performs better under real load.
- Personalized variants — Stop looking for a single winner. The most radical shift: each user might get their own variant. Not just enterprise vs consumer, but individual-level personalization where the system learns what works for you specifically. Power users get keyboard shortcuts and dense information; casual users get guided flows with progressive disclosure. Users who convert on social proof see testimonials; analytical users see feature comparisons. AI makes the economics work — what was prohibitively expensive (maintaining thousands of personalized codepaths manually) becomes viable when AI generates, tests, and synchronizes variants automatically.

The skill: running rigorous evals, measuring trade-offs with metrics, and orchestrating the complexity of multiple live variants. Every API call has a price, a latency budget, and quality trade-offs.
You’ll need to master arbitrage between expensive reasoning models and cheaper models, knowing when to leverage MCPs, local tools, or cloud APIs. Learn how models approach refactors differently from new features or bug fixes, then tune prompts, context windows, and routing strategies accordingly.

You’ll need to build golden test sets, trace model runs, classify failure modes, and treat evals like unit tests. Evaluation frameworks with baseline datasets, regression suites, and automated canaries that catch quality drift before production become non-negotiable. Without observability, you can’t iterate safely or validate that changes actually improve outcomes.

Framework fluency loses value when AI handles syntax. What matters is depth in three areas:

- Core computer science fundamentals — Not because AI doesn’t know them, but because you need to verify AI made the right trade-offs for your specific constraints. AI might use quicksort when your dataset is always 10 items. It might optimize a function that runs once a day while missing the N+1 query in your hot path — where you loop through 1000 users making a database call for each instead of batching. Your value is code review with context: catching when AI optimizes for the wrong thing, knowing when simple beats clever, and spotting performance cliffs before they ship.
- Product judgment — Knowing which problem to solve, not just how to solve it. AI can build any feature you describe, but it can’t tell you whether that feature matters. Understanding user needs, prioritizing ruthlessly, and recognizing when you’re overbuilding becomes the bottleneck.
- Domain expertise — Deep knowledge of your problem space — whether it’s payments, healthcare, logistics, or graphics. AI can write generic code, but it struggles with domain-specific edge cases, regulations, and the unwritten rules experts know. The more niche your expertise, the harder you are to replace.

These are the skills that matter for the next three years.
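The N+1 pattern mentioned above is worth seeing concretely. A minimal sketch (the `fetch_*` functions are hypothetical stand-ins for real database calls; the counter just makes the round trips visible):

```python
# Illustrative only: simulate DB round trips with a counter instead of a real DB.
query_count = 0

def fetch_user(user_id):
    """Hypothetical per-user lookup: one round trip per call."""
    global query_count
    query_count += 1
    return {"id": user_id}

def fetch_users_batch(user_ids):
    """Hypothetical batched lookup: one round trip for the whole set."""
    global query_count
    query_count += 1
    return [{"id": uid} for uid in user_ids]

user_ids = list(range(1000))

# N+1: a separate query per user in the hot path
query_count = 0
users = [fetch_user(uid) for uid in user_ids]
n_plus_one_queries = query_count  # 1000 round trips

# Batched: a single query
query_count = 0
users = fetch_users_batch(user_ids)
batched_queries = query_count  # 1 round trip
```

One thousand round trips versus one: exactly the kind of trade-off you catch in review with context rather than expect the model to flag on its own.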
But I don’t have a crystal ball beyond that. At the pace AI is evolving, even conductors might become a role that AI plays better. The orchestration itself could be automated, leaving us asking the same questions about the next evolution. For now, learning to conduct is how we stay relevant. Companies will change how they ship too; but the nearer shift is the individual’s role, so that’s my focus for this post.  ↩︎

Kaushik Gopal 6 months ago

Sorting Prompts - LLMs are not wrong you just caught them mid thought

Good sensemaking processes iterate. We develop initial theories, note some alternative ones. We then take those theories that we’ve seen and stack up the evidence for one against the other (or others). Even while doing that we keep an eye out for other possible explanations to test. When new explanations stop appearing and we feel that the evidence pattern increasingly favors one idea significantly over another we call it a day. LLMs are no different. What is often deemed a “wrong” response is merely a first pass at describing the beliefs out there. And the solution is the same: iterate the process. What I’ve found specifically is that pushing it to do a second pass without putting a thumb on the scale almost always leads to a better result. To do this I use what I call “sorting statements” that try to do a variety of things. Mike Caulfield is someone who cares about the veracity of information. The entire post is fascinating and has painted LLM search results in a new way for me. I now have a Raycast Snippet which expands to this: Already I’m seeing much better results.

Kaushik Gopal 6 months ago

Build your own /init command like Claude Code

Build your own command. Claude’s makes it easy to add clear repo instructions. Build your own and use it with any agent to add or improve on an existing AGENTS.md. Here’s the one I came up with .

Claude Code really nailed the onboarding experience for agentic coding. Open it, type , and you get a that delivers better results than a repo without proper system instructions (or an ). It’s a clever way to ramp a repo fast. As I wrote last time, it hits one of the three levers for successful AI coding - seeding the right context. Even Codex CLI now comes with a built-in init prompt. There’s no secret 1 sauce: is just a strong prompt that writes (or improves) an instructions file. Here’s the prompt, per folks who’ve reverse‑engineered it:

You can write your own and get the same result. I use a custom on new repos to get up and running fast 2 . I tweaked it to work across different coding agents and sprinkled in a few tips I collected along the way. It should create a relevant ; if one exists, it updates it. Save this prompt as a custom command and use it with any tool — Gemini CLI, Codex, Amp, Firebender, etc. You aren’t stuck with any single tool. One more tip: a reasoning model works best for these types of commands.

I must say: the more time I spend with these tools, the more “emperor‑has‑no‑clothes” moments I have. Some of the ways these things work are deceptively simple.

Claude does a few other things, like instructing its inner agent tools (BatchTool & GlobTool) to collect related files and existing instructions ( , , , , etc.) as context for generating or updating . But the prompt is the meat.  ↩︎ I used this prompt when I vibe‑engineered a maintainable Firefox add‑on .  ↩︎
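Mechanically, a custom command is just a markdown file dropped into your agent’s commands directory — for Claude Code that’s `.claude/commands/` (other agents have equivalents). A skeletal illustration of the shape such a file takes; this is my own sketch, not the reverse-engineered prompt:

```markdown
Analyze this repository and create an AGENTS.md (or improve the existing one).
Cover, briefly:

- what the project is and its overall architecture
- the exact build, test, and lint commands
- code conventions and how to navigate the directory layout

Keep it concise; link to deeper docs instead of inlining them.
```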

Kaushik Gopal 6 months ago

Three important things to get right for successful AI Coding

I often hear AI coding feels inconsistent or underwhelming. I’m surprised by this because more often than not, I get good results. When working with any AI agent ( or any LLM tool ), there are really just three things that drive your results:

- the context you provide
- the prompt you write
- executing in chunks

This may sound discouragingly obvious, but being deliberate about these three (every time you send a request to Claude Code, ChatGPT etc.) makes a noticeable difference. …and it’s straightforward to get 80% of this right.

LLMs are pocket‑sized world knowledge machines. Every time you work on a task, you need to trim that machine to a surgical one that’s only focused on the task at hand. You do this by seeding context. The simplest way to do this, especially for AI Coding:

- System rules & agent instructions : This is basically your file where you briefly explain what the project is, the architecture, conventions used in the repository, and navigating the project 1 .
- Tooling : Lots of folks miss this, but in your AGENTS.md, explicitly point to the commands you use yourself to build, test and verify. I’m a big fan of maintaining a single with the most important commands, that the assistant can invoke easily from the command line.
- Real‑time data ( MCP ): when you need real-time data or to connect to external tools, use MCPs. People love to go on about complex MCP setups but don’t over-index on this. For e.g. instead of a GitHub MCP, just install the cli and let the agent run commands directly. You can burn tokens if you’re not careful with MCPs. But of course, for things like Figma/JIRA where there’s no other obvious connection path, use it liberally.

There are many other ways, and engineering better context delivery is fast becoming the next frontier in AI development 2 .

Think of prompts as specs, not search queries. For example: ‘Write me a unit test for this authentication class’ 🙅‍♂️. Instead of that one‑liner, here’s how I would start that same prompt: I use a text‑expansion snippet, , almost every single time. It reminds me to structure any prompt: This structure forces you to think through the problem and gives the AI what it needs to make good decisions.

Writing detailed prompts every single time gets tedious. So you might want to create “ command ” templates. These are just markdown files that capture your detailed prompts. People don’t leverage this enough. If your team maintains a shared folder of commands that everyone iterates on, you end up with a powerful set of prompts you can quickly reuse for strong results. I have commands like , , , etc.

AI agents hit limits: context windows fill up, attention drifts, hallucinations creep in, results suffer. Newer models can run hours‑long coding sessions, but until that’s common, the simpler fix is to break work into discrete chunks and plan before coding. Many developers miss this. I can’t stress how important it is, especially when you’re working on longer tasks. My post covers this; it was the single biggest step‑function improvement in my own AI coding practice. Briefly, here’s how I go about it:

- Share the high‑level goal and iterate with the agent. Don’t write code in this session; use it to tell the agent what it’s about to do.
- Once you’re convinced, ask the agent to write the plan in detailed markdown in your folder
- Reset context before you start executing
- Spawn a fresh agent, load . Implement only that task, verify & commit.
- Reset or clear your session. Proceed to and repeat.

One‑shot requests force the agent to plan and execute simultaneously — which rarely produces great results. If you were to submit these as PRs to your colleagues for review, how would you break them up? You wouldn’t ship 10,000 lines, so don’t do that with your agents either. Plan → chunk → execute → verify.

So the next time you’re not getting good results, ask yourself these three things:

- Am I providing all the necessary context?
- Is my prompt a clear spec?
- Am I executing in small, verifiable chunks?

I wrote a post about this btw, on consolidating these instructions for various agents and tools.  ↩︎ Anthropic’s recent post on “context engineering” is a good overview of techniques.  ↩︎
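To illustrate the tooling tip above: this is the kind of section I mean in an AGENTS.md. The commands are placeholders; substitute your project’s real ones (the `gh` CLI is GitHub’s official command-line tool):

```markdown
## Tooling
- Build: `make build`
- Test: `make test` (run before every commit)
- Lint/format: `make lint`
- GitHub: use the `gh` CLI directly (`gh pr view`, `gh issue list`)
  instead of a GitHub MCP server
```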

Kaushik Gopal 6 months ago

Vibe-engineering a Firefox add-on: Container Traffic Control

I wanted to test a simple claim: you can ship maintainable software by vibe-coding end to end. I set strict constraints:

- New platform (I haven’t built a browser extension/add-on)
- Language I’m no longer proficient in (JavaScript)
- Zero manual code editing

In about a day 1 I had a working Firefox add-on I could submit for review. The code meets my bar for readability and long‑term change. Even the icon came from an image model 2 . Introducing Container Traffic Control .

Install and source • Install: Firefox add-on listing • Code: github.com/kaushikgopal/ff-container-traffic-control

It’s in vogue to share horror stories of decimated vibe-coded repos 3 . But I’m convinced that with the right fundamentals, you can vibe-code a codebase you’d comfortably hand to another engineer. This was my experiment to vet my feelings on the subject. Granted, this was a small and arguably very simple repository, but I’ve also seen success with moderately larger codebases personally. It comes down to scrupulous pruning : updating system instructions, diligent prompting, and code review. I plan to write much more about this later, but let’s talk about some of the mechanics of how it went:

I didn’t write a single line of JavaScript by hand. When I needed changes, better structure, reusable patterns, small refactors — I asked the agent. The goal throughout was simple: keep the codebase readable and maintainable. It now has a lot of the things we consider important for a decent codebase:

- Tests (for the important parts)
- Well-organized code
- Clear, useful logging
- Code comments (uses a style called space-shuttle style programming , which I think is increasingly valuable with vibe-coding)

The best part: most of this came together over two days 4 .

Some example pull requests from the repository with the exact prompt I used and the plan that was generated:

- Here’s the very first prompt I used to generate the guts of the code:
- Here’s a PR where midway I captured a major feature change: the original version of the add-on used a very different way of capturing the rules. It wasn’t as intuitive, so I decided to change it up.
- This was more a fun one where I asked it to critique the code as an HN reader would. Some good suggestions came out of it, but the explicit persona callout didn’t generate anything helpful in this specific case.

I captured my prompts but wasn’t diligent about surfacing them in pull requests; here are a few I did capture. The code is open source, so go ahead and check it out .

In my last post, How to Firefox , I covered “Privacy power-up: Containers” 5 . “Containers” let you log in to multiple Gmail accounts without separate browser profiles. Add Total Cookie Protection and you get strong isolation. That’s great, but managing it automatically gets tedious fast. Examples:

- Keep searches in one container, but open result links in my default container.
- From work Gmail, clicking a GitHub link: if it’s , open in Work; if it’s , open in Personal.
- In Google Docs (Personal), clicking a Sheets or Drive link should stay in Personal — even though my default for Sheets is Work.

Added these test cases: I realized while writing this post I should probably have these exact use cases tested, so I did just that right now… as I continued to flesh out this post.

You can’t achieve this level of control with default containers unless you micromanage every case — and even then, some are impossible. I tried various add-ons but kept hitting cases that just wouldn’t work . So I built my own. I also prefer how this add-on asks you to set up rules.

Overall, I enjoyed the experiment. I’ve been happily using my add-on , and I feel confident that if I needed to make changes, I could do it in what I consider a maintainable codebase. Stay tuned for my tips on how you can use AI coding more constructively.

vibe-coding vs vibe-engineering: Simon Willison started using the term vibe-engineering for precisely vibe-coding with this level of rigor. I’m trying to adopt this more.

The bulk took a few hours; the rest was tweaks between other work.  ↩︎ Google’s new 🍌 model .  ↩︎ which I don’t for a second deny exist.  ↩︎ Honestly, the work put together was probably a few hours. I was issuing commands mostly on the side and going about my business, coming back later when I had time to tweak and re-instruct.  ↩︎ I’ve since updated the post to point to my new Firefox add-on.  ↩︎
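The core of such an add-on is a rule matcher: given a URL and the current container, pick the container the link should open in. Here’s a language-agnostic sketch in Python — the rule format and first-match-wins precedence are my assumptions, not the add-on’s actual design:

```python
def pick_container(url, current_container, rules, default="personal"):
    """Return the container a link should open in. First matching rule wins."""
    for rule in rules:
        if rule["pattern"] in url:
            return rule["container"]
    # No rule matched: stay in the current container (or fall back to default)
    return current_container or default

# Hypothetical rules mirroring the GitHub example: org repos go to Work,
# everything else on github.com goes to Personal.
rules = [
    {"pattern": "github.com/acme-corp", "container": "work"},
    {"pattern": "github.com", "container": "personal"},
]
```

In the real add-on this decision would hook into Firefox’s `contextualIdentities` and tabs WebExtension APIs to reopen the link in the chosen container.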

Kaushik Gopal 7 months ago

A terminal command that tells you if your USB-C cable is bad

now includes macOS Tahoe support. Apple slightly altered the system command for Tahoe.

You have a drawer full of USB cables. Half are junk that barely charge your phone. The other half transfer data at full speed. But which is which? Android Studio solved this. Recent versions warn you when you connect a slow cable to your phone: I wanted this for the command line. So I “built” 1 , a script to check your USB connections. The script parses macOS’s 2 command, which produces a dense, hard-to-scan raw output: With a little bit of scripting, the output becomes much cleaner: When I connect my Pixel 3 :

The first version was a bash script I cobbled together with AI. It worked, but was a mess to maintain. Because I let AI take the wheel, even minor tweaks like changing output colors were difficult. Second time around, I decided to vibe-code again but asked AI to rewrite the entire thing in Go . I chose Go because I felt I could structure the code more legibly and tweaks would be easier to follow. Go also has the ability to compile a cross-platform binary, which I can run on any machine. But perhaps the biggest reason is, it took me a grand total of 10 minutes to have AI rewrite the entire thing. I was punching through my email actively as Claude was chugging on the side.

Two years ago, I wouldn’t have bothered with the rewrite, let alone creating the script in the first place. The friction was too high. Now, small utility scripts like this are almost free to build. That’s the real story. Not the script, but how AI changes the calculus of what’s worth our time.

yes, vibe coded. Shamelessly, I might add.  ↩︎ previous versions of macOS used   ↩︎ I had Claude pull specs for the most common Pixel phones. I’ll do the same for iPhones if I ever switch back.  ↩︎
