Latest Posts (10 found)
Kix Panganiban 6 days ago

Two things LLM coding agents are still bad at

I’ve been trying to slowly ease into using LLMs for coding help again lately (after quitting cold turkey), but something always feels off -- like we’re not quite on the same wavelength. Call it vibe coding or vibe engineering, but I think I’ve finally pinned down two big reasons why their approach to code feels so awkward. These quirks are why I contest the idea that LLMs are replacing human devs -- they’re still more like weird, overconfident interns. I can’t fully vibe with them yet.

First: LLMs don’t copy-paste (or cut and paste) code. For instance, when you ask them to refactor a big file into smaller ones, they’ll "remember" a block or slice of code, use a tool to edit the old file, and then use another tool to spit out the extracted code from memory. There are no real copy or paste tools. Every tweak is just them emitting code from memory. This feels weird because, as humans, we lean on copy-paste all the time. It’s how we know the code we moved is exactly the same as where we copied it from. (There’s a sketch of what I mean at the end of this post.) I’ve only seen Codex go against the grain here -- sometimes I’d see it issue shell commands to try and replicate that copy-paste interaction, but it doesn’t always work.

And it’s not just how they handle code movement -- their whole approach to problem-solving feels alien too. Second: LLMs are terrible at asking questions. They just make a bunch of assumptions and brute-force something based on those guesses. Good human developers always pause to ask before making big changes or when they’re unsure (hence the mantra of "there are no bad questions"). But LLMs? They keep trying to make it work until they hit a wall -- and then they just keep banging their head against it. Sure, you can overengineer your prompt to try to get them to ask more questions (Roo, for example, does a decent job at this) -- but it’s very likely they still won’t. Maybe the companies building these LLMs do their RL based on making writing code "faster".
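Here’s the sketch mentioned above -- a toy illustration (mine, not anything today’s agents actually expose) of what a real cut-and-paste primitive could look like: the moved block is copied byte-for-byte and verified, never re-emitted from memory. Paths and line ranges are illustrative.

```python
"""Toy sketch of a copy-preserving refactor step."""
import hashlib
from pathlib import Path

def cut_and_paste(src: Path, dst: Path, start: int, end: int) -> None:
    """Move lines [start, end) from src to the end of dst, byte-for-byte."""
    lines = src.read_text().splitlines(keepends=True)
    block = "".join(lines[start:end])
    before = hashlib.sha256(block.encode()).hexdigest()

    dst.write_text(dst.read_text() + block)                # paste
    src.write_text("".join(lines[:start] + lines[end:]))   # cut

    # The pasted bytes provably match what was cut -- no "memory" involved.
    moved = dst.read_text()[-len(block):]
    assert hashlib.sha256(moved.encode()).hexdigest() == before
```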

0 views
Kix Panganiban 1 week ago

Python feels sucky to use now

I've been writing software for over 15 years at this point, and most of that time has been in Python. I've always been a Python fan. When I first picked it up in uni, I felt it was fluent, easy to understand, and simple to use -- at least compared to other languages I was using at the time, like Java, PHP, and C++. I've kept myself mostly up to date with "modern" Python -- think modern packaging and tooling, type-hinted syntax, and strict type-checking almost everywhere. For the most part, I've been convinced that it's fine. But lately, I've been running into frustrations, especially with async workflows and type safety, that made me wonder if there's a better tool for some jobs.

And then I had to help rewrite a service from Python to Typescript + Bun. I'd stayed mostly detached from Typescript before, only dabbling in non-critical-path code, but oh, what a different and truly joyful world it turned out to be to write code in. Here are some of my key observations (a couple of sketches follow after them):

Bun is fast. It builds fast -- including installing new dependencies -- and runs fast, whether we're talking runtime performance or the direct loading of TS files. Bun's speed comes from its use of JavaScriptCore instead of V8, which cuts down on overhead, and its native bundler and package manager are written in Zig, making dependency resolution and builds lightning-quick compared to npm, or even Python's pip with virtual environments. When I'm iterating on a project, shaving off seconds (or minutes) on installs and builds is a game-changer -- no more waiting around for dependencies to resolve or virtual envs to spin up. And at runtime, Bun directly executes Typescript without a separate compilation step. This just feels like a breath of fresh air for developer productivity.

Type annotations and type-checking in Python still feel like mere suggestions, whereas they're fundamental in Typescript. This is especially true when defining interfaces or using inheritance -- compared to ABCs (Abstract Base Classes) and Protocols in Python, which can feel clunky. In Typescript, type definitions are baked into the language -- I can define an interface or type with precise control over the shapes of data, and the compiler catches mismatches while I'm writing (provided I've enabled it in my editor). Tools like tsc enforce this rigorously. In Python, even with a strict type-checker like mypy, type hints are optional and ignored by the runtime, leading to errors that only surface when the code runs. Plus, Python's approach to interfaces via ABCs or Protocols feels verbose and less intuitive -- Typescript's type system feels like a better mental model for reasoning about code.

About 99% of web-related code is async. Async is first-class in Typescript and Bun, while it's still a mess in Python. Sure -- Python's asyncio and the list of packages supporting it have grown, but it often feels forced and riddled with gotchas and pitfalls. In Typescript, async/await is a core language feature, seamlessly integrated with the event loop in environments like Node.js or Bun. Promises are a natural part of the ecosystem, and most libraries are built with async in mind from the ground up. Compare that to Python, where async/await was bolted on later (introduced in 3.5), and the ecosystem (in 2025!) is still only slowly catching up. I've run into issues with libraries that don't play nicely with asyncio, forcing me to mix synchronous and asynchronous code in awkward ways.

Sub-point: many Python patterns still push for workers and message queues -- think RQ and Celery -- when a simple async function in Typescript could handle the same task with less overhead. In Python, if I need to handle background tasks or I/O-bound operations, the go-to solution often involves spinning up a separate worker process with something like Celery, backed by a broker like Redis or RabbitMQ. This adds complexity -- now I'm managing infrastructure, debugging message serialization, and dealing with potential failures in the queue. In Typescript with Bun, I can often just write an async function, maybe wrap it in a Promise, or pull in a lightweight queuing library if I need one, and call it a day. For a recent project, I replaced a Celery-based task system with a simple async setup in Typescript, cutting down deployment complexity and reducing latency since there's no broker middleman. It's not that Python can't do async -- it's that the cultural and technical patterns around it often lead to over-engineering for problems that Typescript, in my opinion, solves more elegantly.
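To ground the typing comparison, here's a minimal sketch of the Python side -- a Protocol standing in for what a Typescript interface does. The names (SupportsPublish, InMemoryBus) are illustrative, not from any real codebase; the point is that only a type-checker like mypy catches the mismatch ahead of time, while the runtime happily proceeds until it crashes:

```python
from typing import Protocol

class SupportsPublish(Protocol):
    """Structural interface: anything with a matching publish() conforms."""
    def publish(self, topic: str, payload: bytes) -> None: ...

class InMemoryBus:
    # No explicit inheritance needed -- mypy checks the shape.
    def publish(self, topic: str, payload: bytes) -> None:
        print(f"publishing {len(payload)} bytes to {topic}")

def notify(sink: SupportsPublish) -> None:
    sink.publish("events", b"hello")

notify(InMemoryBus())  # OK: matches the Protocol structurally
# notify("nonsense")   # runs right up to the AttributeError at runtime;
#                      # only the type-checker flags it before then
```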
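And here's a small sketch of the asyncio friction I mean, with illustrative function names: bridging a sync-only library by hand, plus the kind of in-process background task that Python patterns often spin up a Celery worker for:

```python
import asyncio
import time

def blocking_fetch(url: str) -> str:
    time.sleep(1)  # stand-in for a sync-only HTTP library
    return f"body of {url}"

async def send_welcome_email(user: str) -> None:
    await asyncio.sleep(0.1)  # stand-in for real async I/O
    print(f"emailed {user}")

async def main() -> None:
    # Sync-only libraries must be shunted onto a thread by hand.
    body = await asyncio.to_thread(blocking_fetch, "https://example.com")
    print(body)

    # Fire-and-forget background work -- no broker, no worker process.
    task = asyncio.create_task(send_welcome_email("kix"))
    await task  # keep a reference and await it before shutdown

asyncio.run(main())
```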
This experience has me rethinking how I approach projects. While I'm not abandoning Python -- it's still my go-to for many things -- I'm excited to explore more of what Typescript and Bun have to offer. It's like discovering a new favorite tool in the shed, and I can't wait to see what I build with it next.

0 views
Kix Panganiban 1 week ago

Re: Comprehension Debt

I recently wrote about how LLMs can make you lose your grip on your codebase by eroding the mental model you built by writing code yourself. A buddy of mine shared this post with me, and I think it's the clearest, most succinct description of this phenomenon -- Jason Gorman, the author, calls it "comprehension debt". As he puts it: "It's pretty much guaranteed that there will be many times when we have to edit the code ourselves. The 'comprehension debt' is the extra time it's going to take us to understand it first."

0 views
Kix Panganiban 2 weeks ago

NAIA T3 is the world's worst airport

Here I am with my family, waiting to board a flight to Sydney -- and I'm reminded of just how bad NAIA T3 really is. The whole place is packed to the brim with travelers, and it's hellishly hot. Not just the annoying kind of hot, but the ridiculous, humid, and sticky kind of hot. It feels like being a sardine in a sauna. I honestly thought this would be one of the first things SMC's takeover would fix.

The Seating (or Lack Thereof)

There are barely any seats unless you're cool with rubbing elbows with sweaty strangers. Sure, there are paid lounges -- but with the tiny real estate allocated for them, they're almost always full. If you're traveling solo, you might luck out and snag a spot, but no dice if you're with a group like us.

The Food Choices

There are also barely any decent food options to make the wait bearable. Most stalls in T3 are pasalubong shops -- and yeah, some sell short orders and snacks, but if you're craving a real meal or a solid cup of coffee, you're out of luck. Even worse, the few spots with okay seating are just as packed as everywhere else.

A Glimmer of Hope?

After sweating it out for hours, I couldn't help but compare this mess to our last international trip from Clark International Airport earlier this year. But before I get into that, I'll admit one small win for NAIA T3: the check-in process seems smoother than the last time I was here, especially if you arrive early. The number of bag drop desks scaled pretty well with demand from what we've seen (checking into Qantas), and the immigration and security check queues were decent and well-managed. Still, these tiny improvements don't make up for the overall hassle.

In Comparison to Clark International Airport

It's no contest -- Clark wins hands down. For starters, Clark is more recently built and feels a lot more modern. The place is spacious, well laid out, and the air conditioning is what you'd expect in a hot, humid country. The food and dining options there aren't exactly world-class for an international terminal, but with plenty of seating around, that's just a minor gripe. Honestly, being there with my family felt like a breeze compared to this chaos at T3.

Tips from a Frustrated Traveler

After dragging ourselves through this sauna of an airport, I've picked up a few tricks to make the wait a bit more bearable for me and my family. Hope these help if you're stuck here too:

Look for Fly Cafe at the far end of airside. We stumbled on it after hours of misery, and its air conditioning is solid. They've got decent beverages like coffee and matcha. Plus, if you're traveling with a kid like I am, they have soft-serve and a small play area out front that was a lifesaver.

The farther you are from the security check, the better the ambiance tends to be -- less crowd, less heat. The restrooms will actually be (and smell) better too, which was a relief after the chaos near the gates.

The terminal Wi-Fi is trash. Don't even bother -- use mobile data if you can.

0 views
Kix Panganiban 2 weeks ago

We should stop pretending like LLMs are software engineers

I recently wrote about how overdoing it with AI coding tools can lead to complacency, and I wanted to dig deeper to understand why this trap exists in the first place. I think it's because of the marketing hype surrounding AI coding tools. Often, the companies that build and sell these tools present them as if they're software engineers in their own right -- but they're not. Sure, you can ask them to vibe code an idea into a functional thing, but there isn't yet a tool whose produced code is consistently good enough for humans to pick up, read, and maintain. Even with long-term steering mechanisms like CLAUDE.md and AGENTS.md, these tools often stray and write code that works but seems designed and styled in isolation.

We humans are good at building mental maps of a codebase, especially because we have the ability to -- and excel at -- organizing things spatially. We also have strong long-term memory, allowing us to learn and apply developer conventions, coding standards, and styles -- especially when they come up in PR reviews and retrospectives.

I truly think that to build good software, the best way to utilize AI coding tools is to stop anthropomorphizing them and treat them purely as tools -- just like IntelliSense and the code completion tools that existed before LLMs -- and use them surgically and with intent. This is also why I think Cursor peaked with Cursor Compose, and it all went downhill from there. A tool that can strongly follow instructions and precisely make edits on a few specific files, although it carries some form of tedium, is much, much more useful than a more autonomous agent that will likely rabbit-hole and try to brute-force the codebase into something that works based on its assumptions about your request. After all, I still believe that working software is different from software that feels good to build on regularly.

0 views
Kix Panganiban 2 weeks ago

Cutting the cord on TV and movie subscriptions

In 2025, there's no longer a single subscription you can pay for to watch any new movie or TV show that comes out. Netflix, Disney, HBO, and even Apple now push you to pay a separate subscription just to watch that one new show everyone's talking about -- and I'm sick of it.

Thanks to a friend of mine, I recently got intrigued by the idea of seedboxing again. In a nutshell, instead of paying for 5 different streaming services, you pay a single fee to have someone in an area with lax torrenting laws host a VPS for you -- where you can run a torrent client and a Plex server, download content, and stream it to your devices. I tried a few seedbox services, but the pricing didn't really work for me. And since I'm in the Philippines, many of them suffer from high latency, and even raw download speeds can be spotty.

So I put my work hat on and decided to try spinning up my own media server, and I chose this stack: https://github.com/Rick45/quick-arr-Stack

For people just getting into home media servers like myself, this stack can essentially be run with a single docker compose command, with a few modifications to the env values as necessary. (For Windows users running this on WSL like me, you'll need to switch any containers using host networking to bridge networking instead, and expose all ports one by one. Most of them only need one port, except for the Plex container, which uses several.)

Once it's up, you get:

Deluge -- the torrent client
Plex Media Server -- this should be obvious unless you don't know what Plex is; it hosts all your downloaded content and lets you access it via the Plex apps or through a web browser
Radarr -- a tool for searching for movies, presented in a much nicer interface than manually hunting for individual torrents
Sonarr -- a tool for searching for TV shows, and as I understand it, a fork of Radarr
Prowlarr -- converts search requests from Radarr and Sonarr into torrent downloads in Deluge
Bazarr -- automatically downloads subtitles for your downloaded media

The quick-arr-Stack repo has a much longer and more thorough explanation of each component, as well as how to configure them. Once it's all up and running, you have access to any TV show or movie you want, without paying ridiculous subscription fees to all those streaming apps! (There's a small reachability sketch at the end of this post if you want to sanity-check the services.)

A few caveats:

Torrenting is illegal. That should be obvious. Check your local laws to make sure you're not breaking any. The stack includes an optional VPN client, which you could use if you want to be less detectable.

You'll need to configure the right torrent trackers in Prowlarr. Some are great for movies, some for TV shows, and there are different ones for anime. There doesn't seem to be a single tracker that does it all. Even then, some trackers might not work -- for example, 1337x's Cloudflare firewall blocks Prowlarr. Not all movies and TV shows will be easy to find, so if you're looking for obscure media, you might need to go with a Usenet indexer.

This setup requires a pretty stable internet connection (with headroom for both your torrenting and your regular use), and tons of storage. Depending on how much media you're downloading, you'll probably need to delete watched series consistently or use extremely large drives.

Diagnosing issues (Prowlarr can't see Sonarr! Plex isn't updating! Downloads aren't appearing in Deluge!) requires some understanding of Docker containers, Linux, and a bit of command-line work. It's certainly not impossible, but it might be off-putting for beginners.
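Here's the reachability check mentioned above -- a sketch that assumes the services run on localhost with their usual default ports, which may differ from what your env values configure:

```python
"""Quick reachability check for the *arr stack.

Ports below are the common defaults for each service; adjust them
to match the values in your docker-compose env file.
"""
import socket

SERVICES = {
    "Deluge (web UI)": 8112,
    "Plex": 32400,
    "Radarr": 7878,
    "Sonarr": 8989,
    "Prowlarr": 9696,
    "Bazarr": 6767,
}

for name, port in SERVICES.items():
    try:
        # Just check that the port accepts TCP connections.
        with socket.create_connection(("localhost", port), timeout=2):
            print(f"[ok]   {name} on :{port}")
    except OSError:
        print(f"[down] {name} on :{port}")
```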

0 views
Kix Panganiban 2 weeks ago

Complacency is the clear and present danger

First came GitHub Copilot. It was, to me, the first real product that demonstrated how powerful LLMs could be when deployed into coding workflows. Then came Cursor, which took that up a notch with its mind-bendingly fast and powerful autocomplete, and the Cursor Compose tool (since deprecated in favor of Agents). And then Anthropic released Claude Code, which I genuinely thought would be the gold standard for agentic coding: you treat it like an obscenely well-read (but inexperienced) developer, give it instructions and guidance, and let it rip -- autonomously navigating your codebase, grokking files, writing features, and fixing bugs.

Fast forward to 2025, and a coworker just told me during a code review session: "I don't think the Kix I know would've written code like this." How did I get here?

The biggest risk -- the clear and present danger -- of doing any sort of coding with AI is complacency. And it's easier to fall into than you think. Part of onboarding a team to AI coding tools is teaching the mantra of never merging code that you otherwise wouldn't have personally written. This sounds perfectly clear and easy, but it's so easy to get lost in the vibe-coding sauce and forget about it.

Here's how the complacency trap works: when a coding agent works, you run whatever code it outputs and -- after running some tests and manual validation -- feel satisfied that you now have a working new thing that barely cost any mental overhead. The code passes all checks -- linting, type validation, test suites -- and looks fine. While it's working, you suddenly have a bunch of free time and mental space to context-switch to something else. After all, coding agents do not yet instantaneously write code (not counting those running on ultrafast inference services like Cerebras and SambaNova). Those two combined -- the instant gratification and the downtime -- create a dangerous feedback loop. You slowly begin to pay less attention to your well-read intern, and you lose that mental map of your code.

Letting the AI agent have at it almost completely autonomously erodes your grip on your codebase and makes you complacent. This leads to situations where you know the thing works, but you haven't written everything (or even most of it) yourself. Building on top of that new code becomes harder than it would have been had you written it yourself. And the cycle continues -- it's harder for you to write on top of it, so you just delegate to the AI agent again (because it feels easier) -- and so it writes even more code that you're not completely familiar with, until you end up with a mountain of slop that's deeply unpleasant to read and write.

The callout from my coworker was deeply embarrassing for me, and was pretty much the intervention I needed. I cancelled my $200 Claude Max subscription and completely rethought how I work with AI coding tools. Now I'm back to writing code myself, only delegating small chunks of work to AI. Here's what I found works great for me -- allowing me to balance mental load with productivity while staying in control:

99% of the AI I use is Cursor's fast autocomplete. To this day, I still have not found any other tool that comes close. It does the job perfectly for most of the code that I need to write: when I start writing the function signature and docstring, it usually gets most of the body right. It takes over repetitive tasks such as changing function calls, log messages, and even trickier things like adding try/except blocks, almost like mind-reading magic. And it still lets me feel like I wrote the code, because it's closely patterned after code I just wrote.

For bigger changes, I use Cursor's highlight-and-add-to-context feature and let the Auto agent do a first pass. I then review the code and revise it. I don't let it make changes to multiple files at once -- or even to multiple different places in a single file. This lets me keep my mental map of the code I'm working on.

For research tasks, I use Perplexity and Claude to start charting where I should look -- but I still pick it up like an absolute neanderthal in 2025 and read through Stack Overflow and GitHub issues myself.

If I cannot avoid letting an agent write big swaths of code, I treat it like I would an actual person submitting a PR -- I scrutinize the changes line by line, make review notes, and then perform the edits myself.
For all of this, I prepaid a year of Cursor in advance, and so far I haven't run into any rate limiting issues or quotas. I guess running into those would be another good indicator that I'm slipping down the slope.

This is exactly why I don't think I'll ever buy into fully autonomous driving myself, even as an EV fan. I enjoy driving, and I fear that delegating the entire process would make my "driving muscles" atrophy, make me complacent, and eventually put me in a situation where I'm speeding down a highway in a style my normal, fully-aware self wouldn't use. The same principle applies to coding -- the moment you stop being an active participant in the process, you lose something essential about your craft.

0 views
Kix Panganiban 1 year ago

A Brief Review of the Clicks Keyboard for iPhone

I’m a fan of Michael Fisher, aka Mr Mobile. So when he announced his new product Clicks, I knew I just had to get one. Who isn’t nostalgic for old-school physical QWERTY keyboards? I was pretty convinced that a physical keyboard would improve my typing experience on mobile, and perhaps even increase my productivity on the go (which mostly means being able to respond to Slack and emails faster).

After a week of using Clicks, my typing experience hadn’t actually improved. In fact, it got so bad that I was typing much slower and hated having to type on the keyboard. I wanted to give Clicks the benefit of the doubt, so I soldiered on for a few more days, but it was just bad. Here’s why I think we should just let this product die:

On-screen keyboards have gotten so good after generations of optimization that moving to a miniature physical keyboard is disorienting, especially when you lose the predictive abilities of touch keyboards. Because touch keyboards have a tolerance for inaccuracy and lack of precision, they can usually predict which letter you’re likely to tap next based on what you’re spelling out -- and as a result you can often type clumsily and still write pretty well. A physical keyboard has no such tolerances. I have to be precise all the time. And with the size of the keys on the Clicks (relative to the size of the keys on the touch keyboard), I have to be very precise -- and my chubby fingers just lack that kind of dexterity.

The new center of mass means I have to "cradle" the phone so it doesn’t fall while I’m using it. They even have a Getting Started guide on how you should hold the thing, because the ergonomics are extremely different and one-handed typing is no longer a thing. I find that my fingers get tired very quickly and often cramp, so I can’t use my phone long enough to actually get anything productive done.

The keyboard layout is bizarre. The return key is right where I expect delete/backspace to be on the touch keyboard, so I often accidentally send messages while I’m in the middle of typing.

Sure, I now have more screen real estate, but it also means I have to travel across the Atlantic just to swipe down and access the control center or my notifications.

MagSafe is gone. So goodbye to my convenient MagSafe car mount and chargers.

None of my USB-C accessories work anymore. The passthrough USB-C port on the Clicks seems to be implemented as a terminal device instead of a USB hub, so my portable audio dongles, drone controllers, and microphones no longer work.

I can’t really type emojis on this thing, which have become a big part of my vernacular.

Not a lot of apps actually support CMD+key shortcuts, so web browsing by spamming the space bar is only workable if you’re on Safari.

On the upside: the build quality is pretty good, the usual iOS keyboard shortcuts work really well, and all the keys (including volume switches, action button, and power) are, well, clicky. Definitely not worth the $139 asking price though. For $30, maybe.

0 views
Kix Panganiban 2 years ago

I Wrote A Summarizer For HackerNews, Here’s What I Learned

I've been a fan of HackerNews for a while now, but I've been struggling to keep up with the latest news lately. It used to be a total time-suck for me, like Facebook's Reels or YouTube's Shorts, where I could mindlessly click and consume content for hours. But after taking a break from HN for a few months, I realized catching up was way too overwhelming. There are just too many interesting links to click on, and I can't consume content as fast as I used to. I guess I'm missing my youth and my long attention span.

I had an idea to build a version of HackerNews that fetches top stories, summarizes them, and presents them in bite-sized reads. So, I created HackRecap, a quick weekend project to make consuming HN stories easier for me. I had three goals in mind:

Maintain the original spirit of HN by keeping the navigation and story browsing experience mostly the same
Provide the tl;dr of stories at a glance, while still allowing easy access to the full article
Create an easy-to-maintain and easy-to-run platform

The result is something that mostly works, and it's good enough to show me story summaries and get me interested in reading the full stories if I want to, but it still has a bunch of limitations:

Pages that require Javascript aren't fetched properly. I suppose I could run a headless browser, render the page, and fetch the text from that, but it's an additional moving part that I'd like to do away with for now
Pages that aren't necessarily stories or articles, or which display dynamic content, don't really work
Depending on the fetchable content from the main article body, the summary may be completely unrelated -- the summarization instruction I give DaVinci gets faithfully applied to whatever text was scraped, related or not
In the summarizer script itself, there are still pages that fail to be fetched or can't be parsed
Because of all the reasons above, the stories presented in HackRecap are just a subset of the actual top stories from HackerNews. So I'm probably missing out on a couple of stories every day

Here's how HackRecap works and what I learned while building it (sketches of the first and third steps follow below):

First, I fetch the top stories from the HackerNews API. That's pretty straightforward: the API first returns a set of story IDs, which I then iterate over to fetch each story's detail.

For every story fetched, I use Goose to fetch the article text and top image. This bit was surprisingly not as straightforward as I originally thought, since webpages aren't really structured the same way. As good as Goose is, it's not perfect: for paywalled articles, and for pages with a lot of sidebar or footer text, the wrong text is fetched, resulting in an incorrect summary down the line. Somewhere in there I think there's a machine learning approach to identifying the proper text, maybe in conjunction with a headless browser, but I haven't quite cracked it yet.

Next, I take the downloaded text and recursively chunk it by counting tokens using OpenAI's tiktoken. OpenAI's text completion API has a limit of 4096 tokens. Thankfully they provide a library called tiktoken, which I use to encode the text into tokens so I can chunk longer content before sending it to OpenAI's API. This was what I spent most of my time figuring out. Initially, I naively tried to just send the entire text for summarization, but that ran into the 4096-token limit quickly and often. My initial chunking approach was also naive, counting characters instead of tokens, which was mostly fine, but I wanted to use my limits a little better. It also took me a few iterations to get the recursive summarization and the final cohesion summary just right.

Then I recursively chunk and summarize the tokenized text via the API until I get a final summary that's cohesive. What surprised me was how easy this was. In fact, all I had to do was feed every chunk to OpenAI and ask it to summarize, take all the chunk summaries together, see if the result is short enough for one last "cohesion summary", and then summarize that again.

Finally, I serve everything up on a simple Flask page -- the default Python web app stack with Flask, SQLite, and Redis.
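For illustration, here's roughly what that first step looks like -- a sketch against the public HN Firebase endpoints, not the exact HackRecap code (see the repo below for that):

```python
"""Fetch HN top stories -- step 1 of the pipeline described above."""
import requests

HN = "https://hacker-news.firebaseio.com/v0"

def top_stories(limit: int = 30) -> list[dict]:
    # topstories.json returns a list of story IDs...
    ids = requests.get(f"{HN}/topstories.json", timeout=10).json()
    # ...which we iterate over to fetch each story's detail.
    return [
        requests.get(f"{HN}/item/{story_id}.json", timeout=10).json()
        for story_id in ids[:limit]
    ]
```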
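And here's a condensed sketch of the chunk-and-summarize loop -- again, not the literal repo code. It assumes the era-appropriate OpenAI completion API (openai < 1.0) and a davinci-class model; the prompt wording and token budget are mine:

```python
"""Condensed sketch of HackRecap-style recursive summarization."""
import openai
import tiktoken

ENC = tiktoken.encoding_for_model("text-davinci-003")
CHUNK_TOKENS = 3000  # leave headroom under the 4096-token limit

def summarize(text: str) -> str:
    resp = openai.Completion.create(
        model="text-davinci-003",
        prompt=f"Summarize the following text:\n\n{text}",
        max_tokens=256,
    )
    return resp.choices[0].text.strip()

def chunk(text: str) -> list[str]:
    # Chunk by token count, not characters, to respect the API limit.
    tokens = ENC.encode(text)
    return [
        ENC.decode(tokens[i : i + CHUNK_TOKENS])
        for i in range(0, len(tokens), CHUNK_TOKENS)
    ]

def recursive_summary(text: str) -> str:
    # Short enough for one final "cohesion summary"? Then we're done.
    if len(ENC.encode(text)) <= CHUNK_TOKENS:
        return summarize(text)
    # Otherwise summarize each chunk and recurse on the joined result.
    return recursive_summary(" ".join(summarize(c) for c in chunk(text)))
```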
To see how all of that is done in detail, here's the link to the Github repo: https://github.com/KixPanganiban/hackrecap -- feel free to fork it, submit Pull Requests, or just give it a spin. You'll need a working OpenAI API key to run the summarizer.

I saw this project as an opportunity to experiment with AI-assisted coding, and I don't think I'll ever go back. Having access to Github Copilot and ChatGPT feels like having a junior developer on hand who is well-read but needs some micromanaging. However, with enough direction and detail, I get amazing results at an astonishingly fast pace. For example, when I wanted to automate the deployment of HackRecap to my go-to Linode VPS, I instinctively reached for Ansible, which was my go-to tool in the past. However, I had forgotten how to write playbooks for it. Instead of searching for information online and trying to relearn everything, I simply asked ChatGPT to write the playbook for me. The result was a playbook code example that was almost good enough to run, except for a few quirks. Even then, I just had to point out the corrections, and the playbook code was regenerated with the necessary changes.

What about the costs? Prototyping HackRecap, fine-tuning the completion parameters, and re-summarizing hundreds of articles over and over again isn't cheap, but it isn't prohibitively expensive either. Since I started the project, I've spent around $88 on OpenAI. It's not cheap, but it's definitely cheaper than rolling out my own ML infrastructure, or heck, even learning how to write and run my own ML code.

Overall, this was a really enjoyable project to work on. I learned a lot about OpenAI's APIs, GPT, and AI-assisted coding. Most importantly, I discovered that almost anyone with internet access can now run powerful machine learning workloads without needing extensive coding skills. The future feels exciting!

0 views
Kix Panganiban 3 years ago

14 Days of Christmas with the M1 Pro

I was able to get my hands on a new M1 Pro Macbook last December 22. With the current supply shortage, the most reasonably priced model I found in stock was a 14-inch M1 Pro with 8 CPU cores, 14 GPU cores, 512GB SSD, and 16GB RAM. I had a lot of expectations in terms of feel and performance based on numerous reviews I'd seen online, and after spending about 14 days with it so far, here's what I found:

I'm not going to say "most powerful laptop." Power itself is subjective, and really depends on the part of the system you're measuring. For example, if your goal is to measure how well it performs on games, you'll get mixed results at best and likely a ton of disappointment. While I think the Apple SoC packs enough power to run games made for it, there aren't many mainstream games built for ARM/Apple Silicon yet.

That said, the performance is still very surprising -- especially given how energy efficient the SoC is. I ran Phasmophobia on Windows 11 ARM via Parallels -- and I was able to get a relatively stable 60 fps if I cranked the settings down to low-medium. The best bit is that the entire system stayed relatively cool at 50-60°C (measured as CPU Core Average on iStatMenus), and drew around 30-40W of power. Unplugged from the charger, I was able to game this way for literal hours -- the longest I've ever managed on a laptop. For comparison, my Intel Mac occasionally shoots up to 90°C for no reason, and the best I've ever gotten out of it was around 3 hours of very light web browsing and calls. Apple Silicon ARM-native games, on the other hand, knock it right out of the park. An early access game on Steam called Timberborn (it's great! You should definitely try it!) can do 120fps without breaking a sweat on max settings, which probably shows how powerful the GPU is for games written to utilize it.

But gaming isn't really what made this laptop the most fun I've ever owned -- it's the snappiness, responsiveness, and overall experience of using it. See, my biggest gripe with all the other Macbooks I've owned in the past was how sluggish things got once you start racking up tabs on Google Chrome and have a bunch of other apps open (in my case, usually Chrome, several instances of Visual Studio Code, Slack, Zoom, Docker, Audio Hijack, and Apple Music). Especially after waking from sleep with all that memory pressure, my previous Intel Macs would start to crawl and feel really frustrating to use. For example, swiping between desktops and using exposé would be a laggy mess, and using Spotlight to search for things would be a hair-tearing experience.

There's none of that on my M1 Pro. It almost seems too good to be true, but it's the truth. Even with a measly 16 gigs of memory (compared to the 32 gigs on my Intel i7 work Macbook), the experience remains buttery and snappy (granted, I'm not overloading it with running Docker containers -- more on that later). It's probably due to lower memory latencies and higher bandwidth since the CPU and memory are on the same SoC, but it's still pleasantly surprising. Even when memory pressure goes above 50% and Mac OS is swapping aggressively, the usability remains really good. Even x86 apps that run via Rosetta 2, like Luminar -- which is what I like to use for photo editing and comes bundled with my Setapp subscription -- feel responsive and quick. I do notice spikes in CPU usage and a touch slower load times on those apps, but it's nothing that takes away from the experience.
Finally, I'm happy to report that all my connected peripherals work perfectly. I have a Keychron K2 and an MX Master 3 connected via Bluetooth (with no issues whatsoever -- I was on the lookout for these based on reports I'd read about the first-gen M1), a Topping E30 DAC for my audio, a Behringer UMC22 for my microphone, and a Logitech Streamcam, and they all work without a hitch. I'm also using a Xiaomi ultrawide monitor connected via HDMI -- and that works even better than when connected to an x86 machine. Specifically, Mac OS rearranging windows when connecting and disconnecting the external display is super quick, and reconfiguring the monitor arrangement via System Preferences > Displays is seamless -- it doesn't even temporarily black out like Intel Macs do.

Ok, last bit, and I think this is what makes the web browsing experience on this Mac so fluid and really fun: Safari scores 241 runs/minute on Browserbench's Speedometer 2.0, while Google Chrome scores 184. That is huge! For comparison, my i7 Macbook scores a measly 94.3 on Safari, and Google Chrome scores even worse at 88. Just to throw a wildcard in there, my Ryzen 5600X PC scores 132 on Google Chrome, and at the time I thought that was the fastest I could get. In terms of real usage, this translates to web pages rendering much quicker, so navigating and interacting with web apps feels more snappy and fluid. This is certainly obvious for monstrosities like Jira and Sentry. The experience feels (though I haven't measured it empirically) just as fast in all native M1 apps, including Visual Studio Code, Zoom, and Slack.

All the good aside, there are still things holding it back from being the best work laptop I've ever owned. I'm a developer, and I work on backend and infrastructure quite a lot. One of the areas where M1 Macs haven't yet caught up is adoption from developers -- many Docker images critical for my work stack aren't ARM-compatible yet, and I found that forcing x86 emulation through Docker/QEMU doesn't perform nearly as well as native x86. Not to mention that some just outright crash or don't work when emulated. The way I work around this at the moment is to have my M1 Mac act like a thin client, connecting to my desktop -- which hosts Docker and the rest of my code -- using Visual Studio Code's Remote - SSH extension. It works amazingly well, and even forwards ports remotely, so it almost feels like things are running natively on my machine. To me, the responsiveness of the overall experience outweighs the small con of having to do this SSH dance.

Even outside of Docker, there are some Python libraries, for example, that aren't compatible with ARM yet -- and I don't want to force myself into a mishmash of x86 and ARM Python environments all over the place, which I can foresee giving me grief in the future. Finally, certain apps just refuse to launch altogether. There might be certain x86 calls that are just plain incompatible with what Rosetta is capable of translating, but apps like Plex and Jellyfin -- admittedly not for work, but I like to use them to stream content while working -- just get stuck as bouncing icons on the dock and fail to start up.

When the first-gen Apple Silicon Macs came out, I purposely held out on buying because I knew there's a certain class of headache reserved for early adopters.
And I wasn't entirely wrong: back then, there was even less software built for Apple Silicon (although Rosetta is pretty amazing), and there were first-iteration nuances and caveats I just didn't want to deal with (like memory-related crashes and random device disconnections). This second generation of M1 Macs is already pretty amazing, and I don't regret getting my Macbook, but it's still far from perfect. There's still work to be done on wider adoption from developers (of games and of developer tools), which should presumably be in better shape by the time the next generation is released, and there are arguments to be made for next-gen M1-powered desktop Macs. And while my current setup of using my Mac as a thin client to my desktop server is workable, I still wish I could run everything completely untethered.

0 views