Latest Posts (9 found)
Nick Khami 3 days ago

XGBoost Is All You Need

I spent two and a half years at a well-funded search startup building systems that used LLMs to answer questions via RAG (Retrieval Augmented Generation). We'd retrieve relevant documents, feed them to an LLM, and ask it to synthesize an answer. I came out of that experience with one overwhelming conviction: we were doing it backwards.

The problem was that we were asking LLMs "what's the answer?" instead of "what do we need to know?" LLMs are brilliant at reading and synthesizing information at massive scale. You can spawn infinite instances in parallel to process thousands of documents, extract insights, and transform unstructured text into structured data. They're like having an army of research assistants who never sleep and work for pennies.

Forecasting how many rushing yards an NFL running back will gain in their next game is a perfect example of this architecture. It's influenced by historical statistics (previous yards, carries, opponent defense), qualitative factors (recent press coverage, injury concerns, offensive line health), and game context (Vegas betting lines, projected workload).

You could ask ChatGPT's Deep Research feature to predict every game in a week. It would use web search to gather context, think about each matchup, and give you predictions. This approach is fundamentally broken. It's unscalable (each prediction requires manual prompting and waiting), the output is unstructured (you'd need to manually parse each response and log it in a spreadsheet), it's unreliable (LLMs are trained to sound plausible, not to optimize for numerical accuracy), and you can't learn from it (each prediction is independent—there's no way to improve based on what worked).

This is the "ask the LLM what's the answer" approach. It feels like you're doing AI, but you're really just creating an expensive, slow research assistant that makes gut-feel predictions.

Instead of asking "How many yards will Derrick Henry rush for?", we ask the LLM to transform unstructured information into structured features. Search for recent press coverage and rate sentiment 1-10. Analyze injury reports and rate concern level 1-5. Evaluate opponent's run defense and rate weakness 1-10. This is scalable (run 100+ feature extractions in parallel), structured (everything becomes a number XGBoost can use), and improves over time (XGBoost learns which features actually matter).

I started with basic statistical features from the NFL API: yards and carries from the previous week, 3-week rolling averages, that kind of thing. These are helpful, but they miss important context.
So I had the LLM engineer seven qualitative features: press coverage sentiment, injury concerns, opponent defense weakness, offensive line health, Vegas sentiment, projected workload share, and game script favorability. An agent loop with web search processed context about each player and game to populate these features—searching for news in the week leading up to the game and rating each factor on a numerical scale.

<LLMFeatureDemo />

Once we run this process for every running back each week, we end up with a dataset that has both statistical and LLM-engineered qualitative features.

I split the data chronologically—early weeks for training, later weeks for testing—and trained two models. A baseline using only statistical features (previous yards, carries, rolling averages), and an enhanced model using both statistical and LLM-engineered features.

<ModelComparisonDemo />

The LLM-enhanced model reduced prediction error by 22.6%. The baseline model was actually worse than just predicting the average yards (R² of -0.025), while the enhanced model explained 38.6% of the variance. But that's not the interesting part. The interesting part is what XGBoost actually learned.

<FeatureImportanceDemo />

Six of the top seven most important features are LLM-engineered. The top feature is average carries over the last 3 weeks (statistical). The second most important feature is press coverage sentiment (LLM). Then game script prediction (LLM), Vegas sentiment (LLM), projected workload share (LLM), offensive line health (LLM), and injury concern (LLM).

I didn't tell XGBoost that press sentiment matters more than injury concerns, or that game script prediction is more important than offensive line health. The model discovered these patterns on its own by analyzing which features actually correlated with rushing yards. The most predictive LLM feature, press coverage sentiment, captures momentum and narrative that doesn't show up in raw statistics. When a running back is getting positive press coverage, they tend to get more carries and perform better. XGBoost found this signal and learned to weight it heavily.

This is the power of the hybrid approach: LLMs transform messy, unstructured context into clean features. XGBoost discovers which features actually matter. Neither could do this alone.

This isn't just about NFL predictions. Email prioritization, Slack message routing, pull request quality assessment, prediction market opportunities, customer support triage—every one of these problems has the same structure. Some structured data combined with unstructured context that needs to be transformed into a prediction. The architecture is identical every time: use LLMs in parallel to extract features from unstructured data, combine with structured features, train XGBoost to find patterns, deploy and iterate. Setting this up from scratch takes way too much time.
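To give a sense of the moving parts, here's a rough sketch of the pipeline in Python. This is illustrative rather than the notebook's actual code: `ask_llm` stands in for whatever web-search agent loop you use, the feature prompts mirror the ones above, and the week-12 cutoff and hyperparameters are placeholders.

```python
# Illustrative sketch of the hybrid LLM-feature + XGBoost pipeline.
# `ask_llm` is a placeholder for an agent loop with web search; it should
# return a single number on the requested scale.
import pandas as pd
import xgboost as xgb
from sklearn.metrics import mean_absolute_error, r2_score

LLM_FEATURES = {
    "press_sentiment": "Rate recent press coverage sentiment for {player} from 1-10.",
    "injury_concern": "Rate injury concern for {player} this week from 1-5.",
    "opp_run_def_weakness": "Rate how weak {opponent}'s run defense is from 1-10.",
}

def extract_llm_features(row, ask_llm) -> dict:
    """Turn unstructured context into numeric features for one player-week."""
    return {
        name: ask_llm(prompt.format(player=row["player"], opponent=row["opponent"]))
        for name, prompt in LLM_FEATURES.items()
    }

def train_and_evaluate(df: pd.DataFrame, feature_cols: list[str]) -> dict:
    """Chronological split: early weeks train, later weeks test."""
    train, test = df[df["week"] <= 12], df[df["week"] > 12]
    model = xgb.XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.05)
    model.fit(train[feature_cols], train["rushing_yards"])
    preds = model.predict(test[feature_cols])
    return {
        "mae": mean_absolute_error(test["rushing_yards"], preds),
        "r2": r2_score(test["rushing_yards"], preds),
        "importance": dict(zip(feature_cols, model.feature_importances_)),
    }

# Baseline: statistical features only. Enhanced: statistical + LLM features.
stat_cols = ["prev_yards", "prev_carries", "avg_carries_3wk"]
# baseline = train_and_evaluate(df, stat_cols)
# enhanced = train_and_evaluate(df, stat_cols + list(LLM_FEATURES))
```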
I want tools that make this trivial—upload your data, describe what you want to predict, and get back a trained model with a deployment-ready API.

The tools I'm describing could exist today. The technology is mature and proven. So why hasn't anyone built them? Random forests don't raise $1B rounds. Founders are building pure-LLM systems because that's what gets funded. VCs get excited about foundation models and AGI, not about elegant hybrid architectures that combine 2019-era XGBoost with LLM feature engineering.

This is the real problem with modern AI development. Not that the technology isn't good enough—it's that incentives are backwards. VC-led engineering is bad engineering. The best technical solutions rarely align with what makes a compelling pitch deck. Everyone's building the wrong thing because they're building what raises money instead of what solves problems.

If you're a builder who cares more about solving real problems than raising huge rounds, there's a massive opportunity here. Build the boring, practical tools that let people deploy these hybrid systems in minutes instead of weeks. Build what actually works instead of what sounds impressive.

The future of ML isn't pure LLMs or pure classical ML—it's knowing which tool to use for which job. Don't ask LLMs "what's the answer?" Ask them "what do we need to know?" Then let XGBoost find the patterns in those answers.

Want to see the full implementation? Check out the complete Jupyter notebook walkthrough with all the code, data processing steps, training, and visualizations.

2 views
Nick Khami 2 weeks ago

Use the Accept Header to serve Markdown instead of HTML to LLMs

Agents don't need to see websites with markup and styling; anything other than plain Markdown is just wasted money spent on context tokens. I decided to make my Astro sites more accessible to LLMs by having them return Markdown versions of pages when the request's Accept header prefers Markdown over HTML. This was very heavily inspired by this post on X from bunjavascript. Hopefully this helps SEO too, since agents are a big chunk of my traffic. The Bun team reported a 10x token drop for Markdown, and frontier labs pay per token, so cheaper pages should get scraped more, be more likely to end up in training data, and give me a little extra lift from assistants and search.

Note: You can check out the feature live by requesting any page on this site from your terminal with a Markdown-preferring Accept header.

Static site generators like Astro and Gatsby already generate a big folder of HTML files through a build command. The only thing missing is a way to convert those HTML files to Markdown. It turns out there's a great CLI tool for this called html-to-markdown that can be installed and run during a build step. I had an LLM write a quick Bash script that converts every HTML file in the build output into a Markdown file in a parallel directory, preserving the directory structure.

Once you have the conversion script in place, the next step is to make it run as a post-build action in your package scripts. Moving all HTML files into a shadow directory first is only necessary if you're using Cloudflare Workers, which will serve existing static assets before falling back to your Worker. If you're using a traditional reverse proxy, you can skip that step and just convert the build output directly.

Note: I learned after I finished the project that I could have added a single field to my Wrangler config so I didn't have to move any files around. That field forces the Worker to always run first. Shoutout to the kind folks on Reddit for telling me.

I pushed myself to go out of my comfort zone and learn Cloudflare Workers for this project since my company uses them extensively. If you're using a traditional reverse proxy like Nginx or Caddy, you can skip this section (and honestly, you'll have a much easier time). If you're coming from traditional reverse proxy servers, Cloudflare Workers force you into a different paradigm. What would normally be a simple Nginx or Caddy rule becomes custom configuration, moving your entire site to a shadow directory so Cloudflare doesn't serve static assets by default, and writing JavaScript to manually check headers and serve files from the assets binding. SO MANY STEPS TO MAKE A SIMPLE FILE SERVER!

This experience finally made Next.js 'middleware' click for me. It's not actually middleware in the traditional sense of a REST API; it's more like 'use this where you would normally have a real reverse proxy.' Both Cloudflare Workers and Next.js Middleware are essentially JavaScript-based reverse proxies that intercept requests before they hit your application. While I'd personally prefer Terraform with a hyperscaler or a VPS for a more traditional setup, new startups love this pattern, so it's worth understanding.

The Wrangler configuration points at a Worker script and binds your build output directory as a static asset namespace. Below is a minimal Worker that inspects the Accept header and serves Markdown when requested, otherwise falls back to HTML.
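The original Worker isn't reproduced in this listing, so here's a minimal sketch of the idea. The `/html/` and `/markdown/` asset paths, the Accept-matching rule, and the `ASSETS` binding name are assumptions about the setup, not the post's exact code.

```js
// Illustrative Cloudflare Worker: serve Markdown to agents, HTML to browsers.
// Assumes the build step placed HTML under /html/ and Markdown under /markdown/
// in the static assets bound as env.ASSETS, preserving paths.
export default {
  async fetch(request, env) {
    const url = new URL(request.url);
    const accept = request.headers.get("Accept") || "";
    // Assumption: any explicit text/markdown preference means "wants Markdown".
    const wantsMarkdown = accept.includes("text/markdown");

    // Normalize "/" and "/about/" style paths to an index file.
    let path = url.pathname.endsWith("/") ? url.pathname + "index" : url.pathname;
    path = path.replace(/\.html$/, "");

    const assetPath = wantsMarkdown ? `/markdown${path}.md` : `/html${path}.html`;
    const assetUrl = new URL(assetPath, url.origin);
    const response = await env.ASSETS.fetch(new Request(assetUrl, request));

    if (wantsMarkdown && response.ok) {
      return new Response(response.body, {
        status: response.status,
        headers: { "Content-Type": "text/markdown; charset=utf-8" },
      });
    }
    return response;
  },
};
```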
Pro tip: make the root path serve your sitemap.xml instead of Markdown content for your homepage, so that an agent visiting your root URL can see all the links on your site.

It's likely much easier to set this system up with a traditional reverse proxy file server like Caddy or Nginx; a few lines of Caddyfile can do the same Accept-header check and rewrite. I will leave the Nginx configuration as an exercise for the reader, or perhaps the reader's LLM of choice.

By serving lean, semantic Markdown to LLM agents, you can achieve a 10x reduction in token usage while making your content more accessible and efficient for the AI systems that increasingly browse the web. This optimization isn't just about saving money; it's about GEO (Generative Engine Optimization) for a changed world where millions of users discover content through AI assistants.

Astro's flexibility made this implementation surprisingly straightforward. It only took me a couple of hours to get both the personal blog you're reading now and patron.com to support this feature. If you're ready to make your site agent-friendly, I encourage you to try this out. For a fun exercise, copy this article's URL and ask your favorite LLM to "Use the blog post to write a Cloudflare Worker for my own site." See how it does! You can also check out the source code for this feature at github.com/skeptrunedev/personal-site to get started.

I'm excited to see the impact of this change on my site's analytics and hope it inspires others. If you implement this on your own site, I'd love to hear about your experience! Connect with me on X or LinkedIn.

1 views
Nick Khami 2 months ago

VPS Evangelism and Building LLM-over-DNS

My most valuable skill as a hacker/entrepreneur is that I'm confident deploying arbitrary programs that work locally to the internet. Sounds simple, but it's really the core of what got me into Y Combinator and later helped me raise a seed round.

When I was starting out hacking as a kid, one of the first complete things I built was a weather reply bot for Twitter. It read from the firehose API, monitored for mentions and city names, then replied with current weather conditions when it got @'ed. My parents got me a Raspberry Pi for Christmas and I found a tutorial online. I got it working locally and then got completely stuck on deployment. The obvious next step was using my Pi as a server, but that was a disaster. My program had bugs and would crash while I was away. Then I couldn't SSH back in because my house didn't have a static IP and Tailscale wasn't a thing yet. It only worked on and off when I was home and could babysit it.

When I started building web applications, I somehow skipped the VPS entirely and went straight to Platform-as-a-Service solutions like Vercel and Render. I have no idea why. I was googling "how do I deploy my create react app" and somehow the top answer was to deploy to some third-party service that handled build steps, managed SSL, and was incredibly complicated and time-consuming. There was always some weird limitation: memory constraints during build, Puppeteer couldn't run because they didn't have the right apt packages. Then I was stuck configuring Docker images, and since AI wasn't a thing yet and I'd never used Docker at a real job, it was all a disaster. I wasted more time trying to deploy my crappy React slop than building it.

During college, I got lucky and met a hacky startup entrepreneur who was hiring. I decided to take a chance and join, even though the whole operation seemed barely legitimate. Going into the job, I had this assumption that the "right" way to deploy was on AWS or some other hyperscaler. But this guy's mindset was the complete opposite—he was a VPS maximalist with a beautifully simple philosophy: rent a VPS, SSH in, run the same command you ran locally, throw up a reverse proxy, and call it a day. I watched him deploy like this over and over, and eventually he walked me through it myself a few times. It was all so small and easy to learn, but it made me exponentially more confident as a builder. I never directly thought, "I can't build this because I won't be able to deploy it," but the general insecurity definitely caused a hesitancy and procrastination that immediately went away.

I've become an evangelist for this approach and wanted to write about it for a long time, but didn't know how to frame it entertainingly. Then on X, I got inspiration when levelsio posted a tweet about deploying a DNS server on Hetzner that lets you talk to an LLM. Want to see it in action? Send his server a DNS TXT query with your question and the answer comes back in the response.

Getting that setup running is probably more interesting than my rambling, so here's how to deploy your own LLM-over-DNS proxy on a VPS in less than half an hour with nothing other than a rented server.

After purchasing your VPS, you'll receive an IP address and login credentials (usually via email). SSH into the server using that IP address. Many VPS images come with a local DNS resolver already bound to port 53; to avoid conflicts, remove or disable it, then install the Python packages the DNS server needs.

The core of the project is a Python script that listens for DNS queries, treats the question as a prompt, sends it to the OpenRouter LLM API, and returns the response in a TXT record.
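The original script isn't reproduced in this listing, so here's a minimal sketch of the idea. It assumes dnslib for the DNS side, requests for the OpenRouter call, and a placeholder model id; the post's actual code may differ.

```python
# Minimal sketch (not the original script): answers any TXT query by sending
# the query name to OpenRouter and returning the reply in a TXT record.
# Assumes `pip install dnslib requests` and an OPENROUTER_API_KEY.
import os
import requests
from dnslib import RR, QTYPE, TXT
from dnslib.server import DNSServer, BaseResolver

OPENROUTER_API_KEY = os.environ.get("OPENROUTER_API_KEY", "paste-key-here")
MODEL = "openai/gpt-4o-mini"  # placeholder; any OpenRouter model id works

def ask_llm(prompt: str) -> str:
    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {OPENROUTER_API_KEY}"},
        json={"model": MODEL, "messages": [{"role": "user", "content": prompt}]},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

class LLMResolver(BaseResolver):
    def resolve(self, request, handler):
        reply = request.reply()
        # The DNS question name becomes the prompt, e.g. "what-is-dns.example".
        prompt = str(request.q.qname).rstrip(".").replace("-", " ")
        answer = ask_llm(prompt)[:255]  # TXT strings are limited to 255 bytes
        reply.add_answer(RR(request.q.qname, QTYPE.TXT, rdata=TXT(answer), ttl=0))
        return reply

if __name__ == "__main__":
    DNSServer(LLMResolver(), port=53, address="0.0.0.0").start_thread()
    input("DNS LLM proxy running on port 53, press Enter to stop\n")
```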
Save the script on your VPS. Before running, you need to set your OpenRouter API key. For this tutorial, you can paste it directly into the key variable; for anything more serious, you should use an environment variable to keep your key out of the code.

Security note: This is a proof-of-concept. For production use, you'd want proper process management (systemd), logging, rate limiting, and to avoid storing API keys in plaintext.

Start the DNS server as root (port 53 requires root privileges). From another machine, send a DNS TXT query to test your setup; the LLM's response should appear in the output.

To restrict access to your DNS-LLM proxy, use UFW (Uncomplicated Firewall) - and yes, it's literally called "uncomplicated" because that's what a VPS is, uncomplicated. Allow SSH access (so you don't lock yourself out) and DNS queries on port 53, while blocking everything else by default.

Important: This setup runs as root and stores your API key in plaintext. For anything beyond experimentation, consider using environment variables, proper user accounts, and process managers like systemd.

That's it! You now have your own LLM-over-DNS proxy running on a simple VPS. No complex infrastructure needed - just SSH, install dependencies, and run your code. This is the beauty of keeping things simple.

0 views
Nick Khami 2 months ago

I couldn't submit a PR, so I got hired and fixed it myself

For over a year, I was bugged by a search quirk on Mintlify that caused race conditions and wonky search results. Here's the fun irony: I was the founder of Trieve, the company that powered search for their 30,000+ documentation sites, yet their debounced search queries weren't being aborted as you typed. Check out this delightful chaos:

<NoAbortVideo />

I had brought this up in our shared Slack before, back when I was just a vendor to them (weird dynamic), but it wasn't a priority and never got fixed. It was extra frustrating because the race condition on the query was apparent enough that search would sometimes feel low quality, since it would return results for a query many characters before the user was done typing. Even worse, as the founder of the search company powering this experience, it felt like a poor reflection on Trieve every time someone encountered these wonky results.

Now that I'm on the team, I was able to finally fix it. I added an AbortController to the debounced search function, so that it aborts any previous queries when a new one is made. This means that the search results are always relevant to what the user is currently typing.

There's something deeply satisfying about finally being able to fix the things that bug you. It made me feel a bit like George Hotz during his brief stint at Twitter in 2022, where he joined with overambitious plans to fix Twitter search, gave up due to hubris, and settled for fixing an annoying login popup before leaving. I've always admired engineers who are part hacker, part entrepreneur - people who see a problem and just... fix it. Getting to do something similar here (minus the dramatic exit) felt like a small win in steering my career toward that kind of direct approach.

I prefer building and using open source software whenever possible, and this whole situation is a great example of why. With open source, when you encounter a bug or pain point, you can actually fix it yourself. Had this been an open source project during the year I was frustrated with the search race condition, I could have submitted a pull request with the AbortController fix and saved myself (and thousands of other users) the daily annoyance. Instead, it remained a persistent irritation until I happened to join the company and gain access to the codebase. There's something to be said for the immediate empowerment that comes with open source - though I understand why many companies choose different models for various business reasons.

If search feels just a bit crisper and more responsive on Mintlify, it's because of me! I fixed a bug that bothered me for over a year, and it feels great to have made that little improvement to the product. I can't wait to make more. Fixing small issues like this over and over again is how products become legendary. There's something deeply satisfying about finally having the power to fix the things that annoy you - even if they're tiny. Especially if they're tiny.
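(For the curious, the general shape of the fix looks something like this. It's a generic sketch of the debounce-plus-AbortController pattern, not Mintlify's actual code; the endpoint and renderResults are stand-ins.)

```ts
// Debounce the search call and abort the previous in-flight request
// whenever a new keystroke arrives, so only the latest query can render.
let debounceTimer: ReturnType<typeof setTimeout> | undefined;
let controller: AbortController | undefined;

export function onSearchInput(query: string) {
  clearTimeout(debounceTimer);
  debounceTimer = setTimeout(async () => {
    controller?.abort();               // cancel the stale request, if any
    controller = new AbortController();
    try {
      const res = await fetch(`/api/search?q=${encodeURIComponent(query)}`, {
        signal: controller.signal,
      });
      renderResults(await res.json()); // only the latest query ever lands here
    } catch (err) {
      if ((err as Error).name !== "AbortError") throw err;
    }
  }, 150);
}

declare function renderResults(results: unknown): void; // app-specific
```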

0 views
Nick Khami 3 months ago

What 7,112 Hacker News users listened to on my side project

I was burnt out from my startup and wanted to recover some of my creative energy, so I decided to build a fun side project called Jukebox. I had the idea of building a collaborative playlist app where you could queue music together with friends and family. I launched it on Hacker News, where it hit the frontpage and got a lot of traction. In total, it had 7,112 visitors who played 2,877 songs. Hacker News users are known for their eclectic tastes, so I was curious to see what kind of music they listened to. I did some data analysis on the usage patterns and music genres, and I wanted to share my findings.

Part of the fun of side projects is that you can use them as an opportunity to build your skills. Personally, one of the core skills I want to improve is marketing. Therefore, it was important to me that I actually drove traffic to the app and got people to use it. I'm happy to report that I was able to do that! Here's a full breakdown of the user engagement:

<UserEngagementSankey />

The data is reliable because each visitor to the site is assigned an anonymous user account. This allows for accurate tracking of how many unique users visited, how many created a "box" (playlist), and how many engaged with the main features. Conversion rate into the primary "Create Box" CTA was awesome! However, I was sorely disappointed to see that only 6.7% of people who created a box actually used the app to queue music together, which was the main reason why I built it in the first place. I'd call it a Pyrrhic victory. My product sense was a few rings off the bullseye, but still on the target.

I'm not going to continue working on Jukebox, but it certainly fulfilled its core purpose of helping me recover my creative energy and learn some new skills. I was originally planning to talk more about how Jukebox was built, but I think the more interesting part is the data analysis of what music Hacker News users listened to. Spotify is generous with their API, so I was able to hydrate the songs data with genres by using their data.

Hacker News users actually disappointed me with their music tastes. I expected them to be more eclectic, but classic rock and rock were two times more popular than any other genre. New wave, metal, and rap followed as the next most played genres, but there was a steep drop-off after the top three. The long tail of genres included everything from country and EDM to post-hardcore and progressive rock, but these were much less represented. One thing that surprised me was how country music edged out electronic genres in popularity. I expected a tech-focused audience to gravitate more towards electronic or EDM, but country had a stronger showing among the top genres. It's a reminder that musical preferences can defy stereotypes, even in communities you'd expect to lean a certain way.

<SongsExplorer />

When it comes to artists, the results were a mix of the expected and the surprising. Michael Jackson topped the list as the most played artist—proving that the King of Pop's appeal truly spans generations and communities, even among techies. Queen and Key Glock followed closely, showing that both classic rock and modern hip-hop have their place in the hearts (and playlists) of Hacker News users. I was surprised to see a strong showing from artists like Taylor Swift and Depeche Mode, as well as a healthy mix of rap, electronic, and indie acts.
The diversity drops off after the top few, but there's still a wide spread: from Daft Punk to Nirvana, Dua Lipa to ABBA, and even some more niche names like Wolf Parade and Day Wave. Overall, while classic rock and pop dominate, there's a clear undercurrent of variety—perhaps reflecting the broad interests of the Hacker News crowd, even if their musical tastes lean a bit more mainstream than I expected.

<ArtistAnalysis />

Dens Sumesh, a former intern at my company, originally had the idea for Jukebox and told me about it at dinner one day. I thought it was a great idea with potential, so I decided to build it. AI codegen has made me drastically more willing to build things on a whim. Typically I would have probably quit after finishing the backend, because React slop is not my favorite thing to work on. However, since the AI is good enough at React to do most of that work for me, I was mentally able to push through and finish the project.

Another side benefit of building this was that I got a better handle on when AI is an efficient tool versus when it's better to rely on my own skills. For example, highlighting a component and prompting a small, contained change is a great use of AI. However, more complex asks are more efficiently handled by a human with intuition and experience. Framing things out manually, or even prompting the frame, consistently seemed to be a more efficient strategy than trying to get the AI to one-shot entire features. Both approaches can work, but breaking things down helps you maintain control and clarity over the process. If you rely too much on one-shot prompts, you can end up in a cycle where your eyes glaze over and you're pressing the "regenerate" button like it's a Vegas slot machine. This slot machining makes launching less likely because you spend more time hoping for a perfect result rather than iterating and moving forward. It's easy to get stuck chasing the ideal output instead of shipping something real and learning from feedback.

Build stuff, share it, get feedback, and learn. Shots on goal lead to more opportunities for improvement and innovation. Even though Jukebox is now going into maintenance mode, it was everything I hoped it would be: a fun side project that people actually used. If you want the raw data, you can find it on the GitHub repository. If you want to see the source code for Jukebox, that's on GitHub at skeptrunedev/jukebox.

0 views
Nick Khami 4 months ago

Building the Server for Threshold Multisigs

You're launching a new Bitcoin ETF worth billions of dollars. You need to secure the funds backing it and are scared to build that security system yourself, so you look for a vendor. You pick Coinbase Custody, probably the most well-known and trusted company in the entire ecosystem, to custody your funds. Trying to be transparent, you publish the Bitcoin addresses holding the funds backing your ETF. Then, the whole world realizes that Coinbase has the entire amount secured behind a single private key, and that private key may or may not be an offline threshold multisig wallet.

This is the story of Bitwise, Coinbase Custody, and the accusations that they were not using multisig wallets to secure their Bitcoin ETF funds. Ultimately, the accusations were likely unfounded; Coinbase Custody most likely uses a threshold ECDSA scheme which, while much scarier to implement and maintain than Schnorr, still provides a high level of security. However, the situation is still bizarre and highlights the need for better tools and infrastructure for implementing threshold multisigs in production.

I started working on a Bitcoin bridge at ZeroDAO in 2022 and quickly realized that the state of the ecosystem was nascent. There are many libraries for implementing threshold multisigs, but that's the easy part. The hard part is building a complete server implementation that can be run in production. If you want to run a threshold multisig vault, you have to build your own server on top of these libraries. This is a lot of work and requires a deep understanding of the underlying cryptography and protocols. It's like having a powerful engine, but no car to put it in. You can build your own car, but it's a lot of work and you have to figure out how to make it safe and reliable.

Bitwise is basically forced to rely on Coinbase Custody or some other vendor to manage their $4B of Bitcoin for them based on a trust model that is not transparent to the public and maybe not even transparent to them. This is not a good situation for the ecosystem, and it's not a good situation for the users of these services.

Startups are fu**ing hard. Out of college, I have spent the past two years working 70+ hour weeks, making less than a third of what I would at a reputable tech company, trying to make something of myself and simultaneously put a dent in the world. Our first product, Trieve, is a relevancy-optimized search engine in a simple API. We built it because we were frustrated with having to go through the hassle of setting up an ingestion pipeline, indexing data, and optimizing the underlying engine every time you wanted high-quality retrieval. I'm very proud of it! Lifetime, we have supported over 300 unique paying customers, made hundreds of thousands of dollars in revenue, served 150M+ searches, and indexed over 1B documents. At this point, it's a mature product and we have fulfilled our initial ambitions for it. There's even a Shopify app that over 100 stores are using!

But I still felt like there was something missing. I wanted to build something that would have a lasting impact, and Trieve started to feel like a distraction from that. I stopped getting sparks of joy every time we onboarded a new customer or launched a new feature. AI is great, but over time I started to feel more like a glorified data janitor than a builder. Trieve isn't big enough to sell for a life-changing amount of money. Denzell and I talked through some acquihire offers, but they didn't feel right.
Denzell and I learned a ton from building Trieve, and we are both very proud of what we accomplished. Our tank of energy was still pretty full; the valve between it and Trieve's product was just closed. So, we started looking around and trying to figure out something that the valve could be opened to. In that process, we got drawn back to the startups we had worked at before Trieve, and the problems we had seen there.

When you decide to join a startup, you typically are not doing it for the money. It's a raw deal to be an early employee at a startup. You are taking on a lot of risk, working long hours, and making less money than you would at a big tech company. But you do it because you believe in the mission and you want to be part of something bigger than yourself. I was extremely lucky to have the experience of being bought into two startups doing something I believed strongly in before founding my own. I worked at ZeroDAO, Quai, Breezy, and Botanix. Three out of those four startups were in the permissionless blockchain space, and that's no coincidence. I'm easily nerd sniped by the idea of building something that is open source, permissionless, and can be used by anyone who wants to use it.

Ironically, ZeroDAO and Botanix in particular were both working specifically on applications which required managing a vault of Bitcoin assets. ZeroDAO, in typical startup fashion, used infrastructure from a vendor, specifically a company called RenVM, to manage the Bitcoin assets backing their bridge. RenVM was owned by Alameda Research, which collapsed in late 2022 along with FTX, and the RenVM team was forced to shut down the service. That was the end of ZeroDAO, which was a damn shame because, prior to that, we had built a really cool product that processed over 184 BTC in less than 7 months. As it turns out, similar to Bitwise, it was also out of scope for ZeroDAO to build their own threshold multisig vault server. Ultimately, the company was forced to shut down because we could not find a vendor to replace RenVM. Unlike Bitwise, we were in no way big enough to force a reputable company (Coinbase Custody, BitGo, Anchorage, or others) to implement the features we needed.

Coming out of that experience, I decided to join Botanix to right that wrong. Botanix is building a Bitcoin sidechain, which requires a Bitcoin vault the same way a bridge does. Ultimately, Trieve started to get traction while I was there and I left to focus on it, at the time trusting that the work would go on at Botanix and the world would get a threshold multisig vault server implementation with or without me working on it. But, two years later, I'm still waiting for that to happen.

It's a better time to bite the bullet on building this than ever before. The Zcash Foundation has done the work on their threshold signature library (FROST, if you care about the details) and has a well-documented and audited implementation in Rust. Lucky for Denzell and me, we are now proficient Rust developers after building Trieve, so we can build on top of that library. Serai also has a fantastic, complete reference server implementation in Rust. The building blocks are much more mature than they were two years ago, and it feels like the right time to build on top of them. Also, it doesn't seem like anyone else is really working on it, so perhaps we can reap a first-mover advantage in a way that we couldn't with Trieve.
Coming from the search engine space for the past two years, I like to think we are in a similar spot relative to where Shay Banon was in 2005 when he started building Compass, a server abstraction layer on top of a search engine library called Lucene. Compass was a complete server implementation that made it easy to run a search engine in production, and it was the precursor to Elasticsearch, which is now the most popular search engine in the world. Completely different industry, but I think the analogy holds.

I love startups more than anything, but we want to make this work for larger custodians and exchanges first. Ideally, we build a standard piece of infrastructure that's used by all of the largest custodians and exchanges in the Bitcoin ecosystem. We want to make it easy for them to run a threshold multisig vault, so they can focus on building their products and services instead of worrying about the underlying security infrastructure. Also, we want to help large institutions be their own custodians. Part of blockchain's promise is that it's trustless, and we want to help large institutions take advantage of that. We want to help them run their own threshold multisig vaults, so they can be their own custodians and not have to rely on third-party vendors.

However, that doesn't mean we are going to ignore the startup and individual developer use cases. People have big blockchain application dreams, from decentralized exchanges to marketplaces to lending protocols, which all require vaults to manage the assets backing them. While it's not our primary focus, I still do want to make sure we build something which would have solved the problems we faced at ZeroDAO back when RenVM shut down. It's just an open source server! Anyone is going to be able to run it.

We are not going to be shutting down Trieve! It's a mature product which keeps us profitable and default alive. We are going to keep marketing it and supporting our customers. The only difference is that we are cutting back on the ambition of our roadmap. We are going to continue fixing bugs, adding features that our customers request, and making sure the product is stable and reliable. But we are not going to be adding any new major features or trying to expand into new markets.

0 views
Nick Khami 4 months ago

Taking Dynamic Key Generation (FROST) From Papers to Production

FROST (Flexible Round-Optimized Schnorr Threshold) is a protocol for distributed key generation (DKG) and threshold signatures. It allows a group of participants to jointly generate a public/private key pair, where the private key is split among the participants. This enables secure signing without requiring all participants to be online simultaneously. Algorithmically, it is a simple 2-round protocol. Many implementation libraries exist, but they aren't shaped like other kinds of software products that developers are used to deploying. They are often just libraries that require a lot of boilerplate code to get started with, and they don't provide a clear way to run the protocol in a production environment.

We at Threshold Security have been working on a single binary Node which is designed to be a reusable piece of infrastructure for running FROST DKG in production. This Node is designed to be run by a group of participants who want to jointly generate a key pair and use it for signing. It provides simple command-line and RPC interfaces for starting the DKG process, managing participants, and generating keys. It's not completely ready for production yet, but we are making good progress. In this post, I will explain the background of FROST, how it works, and what our Node implementation currently supports.

If you are interested in the details of FROST math, I recommend reading the paper FROST: Flexible Round-Optimized Schnorr Threshold Signatures by Chelsea Komlo and Ian Goldberg. It provides a comprehensive overview of the protocol and its security properties. For our purposes, the math doesn't matter as much as the protocol structure, so I will only be providing explanations for the necessary pieces of the protocol to implement on top of the FROST libraries which already exist.

The DKG process is divided into two rounds of communication followed by a simple sum of public keys to produce a final public key. In the first round, each participant generates a private key, computes commitments to it, and sends their commitments to all other participants. The commitments are based on the private key and a random nonce, which ensures that the commitments are unique and cannot be predicted by other participants. These messages do not contain any secret information, so they can be sent over an insecure channel. The commitments are used to prove that the participants have generated their private keys correctly and to ensure that they are committed to them for the duration of the DKG process. In the second round, each participant sends a secret share to every other participant over a confidential channel, and each receiver verifies the shares it gets against the first-round commitments. The group public key then comes from combining the first-round commitments, and each participant's signing share is derived from the shares it received. A toy sketch of this round structure is included after the library list below.

There are already a number of open source FROST and FROST-DKG libraries to build on:

- mikelodder7/frost-dkg ⭐ 1 — An implementation of the FROST Distributed Key Generation.
- cmdruid/frost ⭐ 13 — Flexible, round-optimized threshold signature library for BIP340 taproot.
- bytemare/frost ⭐ 20 — Go implementation of RFC 9591: the FROST (Flexible Round-Optimized Schnorr Threshold) signing protocol.
- taurushq-io/frost-ed25519 ⭐ 68 — Implementation of the FROST protocol for threshold Ed25519 signing.
- topos-protocol/ice-frost ⭐ 18 — A modular Rust implementation of the static version of the ICE-FROST signature scheme.
- zellular-xyz/pyfrost ⭐ 2 — Python implementation of the FROST algorithm.
- LFDT-Lockness/givre ⭐ 9 — Threshold Schnorr Signatures based on FROST in Rust.
- ZcashFoundation/frost ⭐ 190 — Rust implementation of FROST (Flexible Round-Optimised Schnorr Threshold signatures) by the Zcash Foundation.
- BlockstreamResearch/bip-frost-dkg ⭐ 59 — A Bitcoin Improvement Proposal for ChillDKG, a distributed key generation protocol (DKG) for use with the FROST Schnorr threshold signature scheme.
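As promised above, here's a toy illustration of the two-round flow. To be clear, this is not FROST and not production cryptography: it uses a tiny prime-order subgroup with plain Feldman-style commitments, skips the proofs of knowledge FROST requires, and exists only to show who sends what in each round.

```python
# Toy sketch of the two-round DKG message flow. NOT FROST and NOT secure:
# real implementations use elliptic-curve groups, proofs of knowledge, and
# constant-time arithmetic. Parameters here are deliberately tiny.
import secrets

P, Q, G = 23, 11, 4    # G generates the order-Q subgroup of Z_P*
T, N = 2, 3            # threshold T-of-N

def f(coeffs, x):
    """Evaluate a polynomial with coefficients in Z_Q at x."""
    return sum(c * pow(x, k, Q) for k, c in enumerate(coeffs)) % Q

# Round 1: every participant i samples a secret polynomial and broadcasts
# Feldman-style commitments to its coefficients (no secrets revealed).
polys = {i: [secrets.randbelow(Q) for _ in range(T)] for i in range(1, N + 1)}
commitments = {i: [pow(G, c, P) for c in coeffs] for i, coeffs in polys.items()}

# Round 2: participant i sends the share f_i(j) to participant j over a
# confidential channel; j verifies it against i's round-1 commitments.
shares = {j: {i: f(polys[i], j) for i in polys} for j in range(1, N + 1)}
for j, received in shares.items():
    for i, share in received.items():
        lhs = pow(G, share, P)
        rhs = 1
        for k, c in enumerate(commitments[i]):
            rhs = rhs * pow(c, pow(j, k), P) % P
        assert lhs == rhs, "share inconsistent with commitments"

# Finalize: each participant's signing share is the sum of received shares;
# the group public key is the combination of the constant-term commitments
# (a product here because the toy group is written multiplicatively).
signing_shares = {j: sum(rcv.values()) % Q for j, rcv in shares.items()}
group_public_key = 1
for i in polys:
    group_public_key = group_public_key * commitments[i][0] % P
print("group public key:", group_public_key)
```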

0 views
Nick Khami 4 months ago

Web Developer's Guide to Midjourney

I tried to use Midjourney unsuccessfully a few times before, but decided to give it one more try after reading a good tutorial thread on X by @kubadesign. His thread got me most of the way there, but I picked up an additional trick with Midjourney's describe feature that I think is worth sharing. My initial plan was to use these images for uzi.sh, a tool for parallel LLM coding agents. While I ultimately chose a different final image for that project because I only needed one, the learning process was valuable. For this set, I aimed for a red color scheme to evoke action and speed, which you can see in the images below.

<MasonryImages />

Following Kuba's advice, I went to Pinterest and found a cool base image that I liked. I went with something that had a lot of red, but also darker colors on the border, since I knew that I would need space for text and other elements if I wanted to use these on an actual website.

<PinterestStarter />

You can't just start by describing the specific kind of image you want and expect to get good results. Style reference images are needed to give Midjourney the boundaries it needs to stick to your desired aesthetic and theme once you get more specific with your exact asks. It's unlikely that you'll be able to find multiple images on the internet which match your desired style exactly, so I recommend using neutral prompts to generate additional images in order to create a cohesive gallery. My neutral prompt here described just the general subject and its characteristics; essentially, any prompt that avoids getting too specific about style, camera angle, action, or other details should work well.

<SimilarStyleGeneration />

One of the images I wanted was an eagle diving. However, prompting for it directly wasn't getting me the results I wanted. I accidentally discovered the describe feature while dragging images around, and instantly realized that it was a cheat code for getting the images I wanted.

<DescribeFeature />

You can take any one of these descriptions and pair it with the style reference images you generated earlier to get the eagle, or any other image you described, into the style you want. I could not believe how well this worked.

<EaglePromptWithStyleReference />

Midjourney produces images which are too crisp to fit well as background images for the web. I recommend adding grain to them when you put them into your final site to create a more cohesive feel. If done correctly, you'll end up with a result that looks like the one below. CSS is great for this! See a complete guide of how I applied the filter for the uzi.sh site below; the full code is on GitHub. You first need to add an SVG filter definition to your HTML file so that the CSS can reference it later to put the grain on top of the background image. The filter uses a fractal-noise turbulence primitive to generate the grain and an alpha adjustment to control its opacity; you can experiment with the noise frequency and the alpha channel to fine-tune the grain's intensity and texture. Once you have the filter defined, you can apply it to a grain overlay element in your CSS. This element will cover the background image and apply the noise effect. Finally, you need to structure your HTML to ensure the layering works correctly. The grain overlay should be positioned above the background image but below any content you want to display.
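The exact markup from uzi.sh isn't reproduced in this listing, but a minimal version of the technique looks roughly like this. Class names, the baseFrequency, and the alpha value are illustrative.

```html
<!-- SVG filter definition: feTurbulence generates fractal noise and the
     feColorMatrix knocks its alpha down so the grain stays subtle. -->
<svg width="0" height="0" style="position:absolute" aria-hidden="true">
  <filter id="grain">
    <feTurbulence type="fractalNoise" baseFrequency="0.8" numOctaves="3" />
    <feColorMatrix type="matrix"
      values="0 0 0 0 0
              0 0 0 0 0
              0 0 0 0 0
              0 0 0 0.15 0" />
  </filter>
</svg>

<style>
  .hero { position: relative; background: url('/images/hero.png') center / cover; }
  .hero .grain {
    position: absolute;
    inset: 0;
    filter: url(#grain);    /* reference the SVG filter defined above */
    pointer-events: none;
  }
  .hero .content { position: relative; } /* keeps text above the grain layer */
</style>

<div class="hero">
  <div class="grain"></div>
  <div class="content"><h1>Your headline</h1></div>
</div>
```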
While there's ongoing debate about AI-generated art, I see Midjourney as just another tool in the toolkit. The key is using it to bring your vision to life, not to replace your creativity. Take inspiration from what you see, but make it your own. Use AI to bridge the gap between the style you have in your head and what actually shows up on screen. The techniques I've shared here are all about developing your unique voice and letting AI help you express it better. The goal isn't to generate something generic. It's to create images that actually work for your projects and feel intentional, not obviously AI-generated.

0 views
Nick Khami 4 months ago

LLM Codegen go Brrr – Parallelization with Git Worktrees and Tmux

This realization isn't unique to me; the effectiveness of using Git worktrees for simultaneous execution is gaining broader recognition, as evidenced by mentions in Claude Code's docs, discussion on Hacker News, projects like Claude Squad, and conversation on X.

<Pile />

I'm building a component library called astrobits and wanted to add a new component to it. To tackle the task, I deployed two Claude Code agents and two Codex agents, all with the same prompt, running in parallel within their own git worktrees. Worktrees are essential because they provide each agent with an isolated directory, allowing them to execute simultaneously without overwriting each other's changes. The number of agents I choose to roll out depends on the complexity of the task. Over time, you'll develop an intuition for estimating the right number based on the situation. Here, I felt 4 was appropriate.

<ImageFourSquareWorktrees />

Voila, results! Only one of the four LLMs produced a solution that actually saved me time. This validates the necessity of rolling multiple agents: if each one only has a modest chance of producing something useful, running four meaningfully raises the odds that at least one will succeed. Four agents was essentially the bare minimum to have reasonable confidence in getting a workable solution. With LLMs being so affordable, there's virtually no downside to running multiple agents. The cost difference between using one agent ($0.10) versus four ($0.40) is negligible compared to the 20 minutes of development time saved. Since the financial risk is minimal, you can afford to be aggressive with parallelization. If anything, I could have run even more agents to further increase the odds of getting a perfect solution on the first try. And yet, the process of running them is still cumbersome and manual; it's more effort to set up 8 than 4, so I'm often lazy and opt to run the minimum number of agents I think will get the job done. This is where the problem comes in, and why I'm excited to share my proposed solution.

Right now, I manually create the git worktrees, start a tmux session for each one, run Claude Code in the first pane, paste a prompt, split into a new pane, run the dev server to get a preview, switch to my browser to review, repeat if no agents succeed, then finally commit, push, and create a PR once I'm satisfied with an output. I feel like I've been through the wringer enough times with this process that I can see a solution shape which would create a smoother experience.

To address these challenges head-on, the ideal developer experience (DX) would involve a lightweight CLI that wraps tmux, automating this complex orchestration. My co-founder Denzell and I felt these pain points acutely enough that we've begun developing such a tool, which we're calling uzi. The core idea behind uzi is to abstract away the manual, repetitive tasks involved in managing multiple AI agent worktrees. Our goal is to make the workflow more seamless while sticking closely to the existing mechanics of worktrees and tmux: the commands would primarily operate by sending instructions to the appropriate tmux sessions, and uzi should feel at home alongside standard Unix tools. We don't want to reinvent the wheel; we just want to polish the existing process and make it more efficient.
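For reference, the manual loop described above looks roughly like this today. The branch names, the claude CLI invocation, and the dev-server command are placeholders for whatever your project uses.

```bash
# Spin up one isolated worktree + tmux session per agent (shown for one agent;
# repeat with agent-2, agent-3, ... for parallel runs).
git worktree add ../myrepo-agent-1 -b agent-1
tmux new-session -d -s agent-1 -c ../myrepo-agent-1

# Pane 0: launch the coding agent, then paste the prompt into it.
tmux send-keys -t agent-1:0.0 'claude' C-m

# Pane 1: run the dev server so the result can be previewed in a browser.
tmux split-window -t agent-1 -c ../myrepo-agent-1
tmux send-keys -t agent-1:0.1 'npm run dev' C-m

# Review in the browser, then keep the winner and clean up the rest.
# git -C ../myrepo-agent-1 add -A && git -C ../myrepo-agent-1 commit -m "..."
# git worktree remove ../myrepo-agent-1 && git branch -D agent-1
```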
While uzi focuses on software developers, its methodology isn't limited to tech; the principle of leveraging multiple agents running in parallel to increase the odds of finding an optimal solution applies universally. Consider a company like versionstory, which is pioneering version control for transactional lawyers. An attorney could leverage their software to run multiple instances of an agent to redline a contract. After reviewing the outputs, they could select and merge the best components to finalize the document. This approach would provide additional confidence in the quality of the final review, as it would be based on multiple independent analyses rather than a single agent's output. Similarly, a marketing team could employ this parallel strategy to perform data analysis on ad performance. By prompting multiple AI instances, they could quickly gather a range of analyses, review them, and select the most insightful ones to inform their strategy. More coverage of the solution space leads to better decision-making and more effective campaigns.

This parallel paradigm isn't just a new technique for developers; it's a glimpse into a more efficient, robust, and powerful future for AI-assisted productivity across various fields. I expect to see existing software products start to gain more powerful version control and parallel execution capabilities which emulate the workflow enabled by git worktrees for software development. My DMs are open if you want to chat about this topic or have any questions. I'm happy to discuss.

0 views