GreatReads - Blog Aggregator · Phoenix Framework

Are LLMs Plateauing? No. You Are.

“GPT-5 isn’t that impressive.” People claim the jump from GPT-4o to GPT-5 feels incremental. They’re wrong. LLM intelligence hasn’t plateaued. Their perception of intelligence has.

Let me explain with a simple example: translation from French to English. GPT-4o was already at essentially 100% accuracy for this task. Near-perfect translations, proper idioms, cultural context. Just nailed it. Now try GPT-o1, o3, GPT-5, or whatever comes next. The result? Still 100% accurate. From your perspective, nothing changed. Zero improvement. The model looks identical. It seems to have plateaued.

But here’s the thing: most people’s tasks are dead simple.
- “Do this math for me”
- “Explain this concept”
- “Translate this text”
- “Rewrite that email”

These tasks were already saturated by earlier models. People are testing intelligence on problems that have already been solved. Of course they don’t see progress. They are like someone measuring a rocket’s speed with a car speedometer. Once you hit the max reading, everything looks the same.

Intelligence is multi-dimensional. It’s a spectrum of capabilities tested against increasingly difficult tasks. Think about how we measure human intelligence:
- A 5-year-old doing addition → Smart kid
- A PhD solving differential equations → Brilliant mathematician
- A Fields Medalist proving novel theorems → Genius

Same concept, wildly different difficulty levels. You wouldn’t judge the mathematician by giving them 2+2. Yet that’s exactly what we’re doing with LLMs. We test them on tasks that earlier models already maxed out, then declare progress has stopped.

Raw LLM intelligence is exploding. But it’s happening at the frontier, on tasks that push the absolute limits of reasoning. Take GPT-5-Pro. It demonstrated the ability to produce novel mathematical proofs. Not “solve this known problem.” Not “explain this proof.” Create new mathematics.

Example: In an experiment by Sébastien Bubeck, GPT-5-Pro improved a bound in convex optimization from 1/L to 1.5/L.
It reasoned for 17 minutes to generate a correct proof for an open problem. Read that again. An LLM improved a mathematical bound. It generated original research. This isn’t just solving known problems. The AI is creating new knowledge.

We’re approaching a world where AI models will tackle the hardest unsolved problems in mathematics. The Millennium Prize Problems. P vs NP. The Riemann Hypothesis. Problems that have stumped humanity’s greatest minds for decades or centuries.

This isn’t incremental. This is a model operating at the level of professional mathematicians. And this capability emerged in the latest generation. But if you’re only asking it to “explain gradient descent” or “fix my Python bug,” you’ll never see this intelligence. You’re testing a Formula 1 car in a parking lot.

Current frontier models (GPT-5-Pro, Claude 4.5) can already outperform most humans on most intellectual tasks. Not “simple” tasks. Most tasks.
- Legal analysis? Better than most lawyers.
- Medical diagnosis? Better than most doctors.
- Code review? Better than most senior engineers.
- Financial modeling? Better than most analysts.

And they do it in seconds. No fatigue. No ego. No “I need to look that up.” (Also close to no compensation, lol!) Soon, these models will be smarter than most humans combined. The collective intelligence of humanity, accessible in a chat interface.

But here’s what’s missing today: the ability to work over time with tools.

A human doesn’t rely on raw brain power alone. You use tools:
- Reading text to gather information
- Writing to organize thoughts
- Maintaining todo lists to track objectives
- Asking for feedback to improve
- Using calculators, spreadsheets, databases, software

Your brain isn’t that powerful in isolation. Your intelligence emerges from orchestrating tools toward a goal. LLMs sucked at this. They were brilliant in a single conversation but couldn’t persist, iterate, or coordinate across time. That’s changing.
The breakthrough isn’t smarter models. It’s models that can orchestrate their intelligence over time. Software engineers experienced that firsthand with coding agents. GPT-5-Codex, an open-source coding agent, can read, edit, and execute code autonomously. For instance, to refactor a 12,000-line legacy Python project, it will:
- Address dependencies
- Add test coverage
- Fix three race conditions
- Run for 7 hours in a sandboxed environment

This isn’t “write me a function.” This is sustained, multi-step reasoning with tool use. Planning, executing, validating, iterating. The model maintained context, managed a todo list, ran tests, read errors, and adapted. Just like a human engineer would.

That’s the leap. Not raw intelligence but applied intelligence. It will take over most valuable knowledge worker jobs.

Here’s where it gets real: the AI Productivity Index (APEX), the first benchmark for assessing whether frontier AI models can perform knowledge work with high economic value. APEX addresses a massive inefficiency in AI research: outside of coding, most benchmarks test toy problems that don’t reflect real work. APEX changes that.

APEX-v1.0 contains 200 test cases across four domains:
- Investment banking
- Management consulting
- Law
- Primary medical care

How it was built:
1. Source experts with top-tier experience (e.g., Goldman Sachs investment bankers)
2. Experts create prompts reflecting high-value tasks from their day-to-day work
3. Experts create rubrics for evaluating model responses

This isn’t “explain what a stock is.” It’s “analyze this M&A deal structure and identify regulatory risks in cross-border jurisdictions.”

The results? Current models can already answer a significant portion of these questions. Not all, but enough to be economically valuable.

Take stock research for instance. A model can read a 10-K filing and answer questions about it perfectly. At my company Fintool we saturated that benchmark in 2024.
But now the challenge is for our AI to do an investor’s job:
- Monitor earnings calls across hundreds of companies
- Extract precise financial metrics and projections
- Generate comprehensive research reports
- Compare performance across competitors
- Track industry trends over time
- Identify investment opportunities autonomously

Same “intelligence,” radically different capability. The raw LLM power is enhanced with tools. When we tested Fintool-v4 against human equity analysts, we found that our agent was 25x faster and 183x cheaper, with 90% accuracy on expert-level tasks.

What Happens Next

The plateau isn’t in the model. It’s in your benchmark. The next wave isn’t smarter models, it’s models that can actually do things. Even if raw intelligence plateaued tomorrow, expanding agentic capabilities alone would trigger massive economic growth. It’s about:
- Models that can maintain todo lists and execute over weeks
- Models that can read documentation, try solutions, fail, and iterate
- Models that can coordinate with other models and humans
- Models that can ask for help when stuck

And when millions of these agents are deployed, the world changes. Not because the models got smarter. Because they got useful. Intelligence without application is just a party trick. Intelligence with tool use is the revolution. It’s accelerating. Exponentially. But the real action is happening at the edge.

Python

AI


Nicolas Bustamante 1 months ago

LLMs Eat Scaffolding for Breakfast

We just deleted thousands of lines of code. Again. Each time a new LLM comes out, it’s the same story. LLMs have limitations, so we build scaffolding around them. Each model introduces new capabilities, so old scaffolding must be deleted and new scaffolding added. But as we move closer to superintelligence, less scaffolding is needed. This post is about what it takes to build successfully in AI today.

Every line of scaffolding is a confession: the model wasn’t good enough.
- LLMs can’t read PDFs? Let’s build a complex system to convert PDF to markdown.
- LLMs can’t do math? Let’s build a compute engine to return accurate numbers.
- LLMs can’t handle structured output? Let’s build complex JSON validators and regex parsers.
- LLMs can’t read images? Let’s use a specialized image-to-text model to describe the image to the LLM.
- LLMs can’t read more than 3 pages? Let’s build a complex retrieval pipeline with a search engine to feed the best content to the LLM.
- LLMs can’t reason? Let’s build chain-of-thought logic with forced step-by-step breakdowns, verification loops, and self-consistency checks.

And so on: millions of lines of code to add external capabilities to the model.

But look at models today: GPT-5 is solving frontier mathematics, Grok-4 Fast can read 3,000+ pages with its 2M context window, Claude 4.5 Sonnet can ingest images or PDFs, and all models have native reasoning capabilities and support structured outputs. The once-essential scaffolding is now obsolete. Those tools are baked into the model’s capabilities.

It’s nearly impossible to predict what scaffolding will become obsolete and when. What appears to be essential infrastructure and industry best practice today can transform into legacy technical debt within months.

The best way to grasp how fast LLMs are eating scaffolding is to look at their system prompt (the top-level instruction that tells the AI how to behave).
Looking at how the prompt used in Codex, OpenAI’s coding agent, changed from the GPT-o3 model to GPT-5 is mind-blowing.
- GPT-o3 prompt: 310 lines
- GPT-5 prompt: 104 lines

The new prompt removed 206 lines. A 66% reduction. GPT-5 needs way less handholding. The old prompt had complex instructions on how to behave as a coding agent (personality, preambles, when to plan, how to validate). The new prompt assumes GPT-5 already knows this and only specifies the Codex-specific technical requirements (sandboxing, tool usage, output formatting).

The new prompt removed all the detailed guidance about autonomously resolving queries, coding guidelines, and git usage. It’s also less prescriptive. Instead of “do this and this” it says “here are the tools at your disposal.” As we move closer to superintelligence, the models require more freedom and leeway (scary, lol!).

Advanced models require simple instructions and tooling. Claude Code, the most sophisticated agent today, relies on a simple filesystem instead of a complex index and uses bash commands (find, read, grep, glob) instead of complex tools.

It moves so fast. Each model introduces a new paradigm shift. If you miss a paradigm shift, you’re dead. Having an edge in building AI applications requires deep technical understanding, insatiable curiosity, and low ego. By the way, because everything changes, it’s good to focus on what won’t change.

The context window is how much text you can feed the model in a single conversation. Early models could only handle a couple of pages. Now it’s thousands of pages and it’s growing fast. Dario Amodei, the co-founder and CEO of Anthropic, expects 100M+ context windows, while Sam Altman hinted at billions of context tokens. It means LLMs can see more context, so you need less scaffolding like retrieval-augmented generation.
- November 2022: GPT-3.5 could handle 4K context
- November 2023: GPT-4 Turbo with 128K context
- June 2024: Claude 3.5 Sonnet with 200K context
- June 2025: Gemini 2.5 Pro with 1M context
- September 2025: Grok-4 Fast with 2M context

Models used to stream at 30-40 tokens per second. Today’s fastest models like Gemini 2.5 Flash and Grok-4 Fast hit 200+ tokens per second. A 5x improvement. On specialized AI chips, providers like Cerebras push open-source models to 2,000 tokens per second. We’re approaching real-time LLMs: full responses on complex tasks in under a second.

LLMs are becoming exponentially smarter. With every new model, benchmarks get saturated. On the path to AGI, every benchmark will get saturated. Every job can be done and will be done by AI. As with humans, a key factor in intelligence is the ability to use tools to accomplish an objective. That is the current frontier: how well a model can use tools such as reading, writing, and searching to accomplish a task over a long period of time.

This is important to grasp. Models will not improve their language translation skills (they are already at 100%), but they will improve how they chain translation tasks over time to accomplish a goal. For example, you can say, “Translate this blog post into every language on Earth,” and the model will work for a couple of hours on its own to make it happen. Tool use and long-horizon tasks are the new frontier.

The uncomfortable truth: most engineers are maintaining infrastructure that shouldn’t exist. Models will make it obsolete, and the survival of AI apps depends on how fast you can adapt to the new paradigm. That’s why startups have an edge over big companies. Big corporations are late by at least two paradigms.
Some examples of scaffolding that are on the decline:
- Vector databases: Companies paying thousands per month for vector databases when they could now just put docs in the prompt or use agentic search instead of RAG (my article on the topic)
- LLM frameworks: These frameworks solved real problems in 2023. In 2025? They’re abstraction layers that slow you down. The best practice is now to use the model API directly.
- Prompt engineering teams: Companies hiring “prompt engineers” to craft perfect prompts when current models just need clear instructions with open tools
- Model fine-tuning: Teams spending months fine-tuning models only for the next generation of out-of-the-box models to outperform their fine-tune (cf. my 2024 article on that)
- Custom caching layers: Building Redis-backed semantic caches that add latency and complexity when prompt caching is built into the API

This cycle accelerates with every model release. The best AI teams have mastered four critical skills:
- Deep model awareness: They understand exactly what today’s models can and cannot do, building only the minimal scaffolding needed to bridge capability gaps.
- Strategic foresight: They distinguish between infrastructure that solves today’s problems and infrastructure that will survive the next model generation.
- Frontier vigilance: They treat model releases like breaking news. Missing a single capability announcement from OpenAI, Anthropic, or Google can render months of work obsolete.
- Ruthless iteration: They celebrate deleting code. When a new model makes their infrastructure redundant, they pivot in days, not months.

It’s not easy.
Teams are fighting powerful forces:
- Lack of awareness: Teams don’t realize models have improved enough to eliminate scaffolding (this is massive btw)
- Sunk cost fallacy: “We spent 3 years building this RAG pipeline!”
- Fear of regression: “What if the new approach is simple but doesn’t work as well on certain edge cases?”
- Organizational inertia: Getting approval to delete infrastructure is harder than building it
- Resume-driven development: “RAG pipeline with vector DB and reranking” looks better on a resume than “put files in prompt”

In AI, the best teams build for fast obsolescence and stay at the edge.

Software engineering sits on top of a complex stack. More layers, more abstractions, more frameworks. Complexity was a sign of sophistication. A simple web form in 2024? React for UI, Redux for state, TypeScript for types, Webpack for bundling, Jest for testing, ESLint for linting, Prettier for formatting, Docker for deployment...

AI is inverting this. The best AI code is simple and close to the model. Experienced engineers look at modern AI codebases and think: “This can’t be right. Where’s the architecture? Where’s the abstraction? Where’s the framework?” The answer: The model ate it bro, get over it.

The worst AI codebases are the ones that were best practices 12 months ago. As models improve, the scaffolding becomes technical debt. The sophisticated architecture becomes the liability. The framework becomes the bottleneck. LLMs eat scaffolding for breakfast and the trend is accelerating.

Bash

UI

AI

TypeScript

JSON


Nicolas Bustamante 1 months ago

ChatGPT Killed the Web: For the Better?

I haven’t used Google in a year. No search results, no blue links. ChatGPT became my default web browser in December 2024, and it has completely replaced the entire traditional web for me. Soon, no one will use a search engine. No one will click on 10 blue links. But there is more: No one will navigate to websites. Hell, no one will even read a website again.

The original web was simple. Static HTML pages. You could read about a restaurant: its menu, hours, location. But that was it. Pure consumption.

Then came interactivity. Databases. User accounts. Now you could *do* things like reserve a table at that restaurant, leave a review, upload photos. The web became bidirectional. Every click was an action, every form a transaction.

Now we’re entering a new evolution. You don’t navigate and read the restaurant’s website. You don’t fill out the reservation form. An LLM agent does both for you.

Look at websites today. Companies spend millions building elaborate user interfaces: frontend frameworks, component libraries, animations that delight users, complex backends orchestrating data flows. Teams obsess over pixel-perfect designs, A/B test button colors, and optimize conversion funnels. All of this sophisticated web infrastructure exists for one purpose: to present information to humans and let them take actions.

But if the information is consumed by an LLM, why does it need any of this? You don’t need a website. You need a text file. That’s it. That’s all an LLM needs to answer any question about a restaurant. No need for UI, clean UX, etc.

Here’s what nobody’s talking about: we don’t need thousands of websites anymore. Take a French boeuf bourguignon recipe. Today, there are hundreds of recipe websites, each with their own version:
- AllRecipes with its community ratings
- Serious Eats with detailed techniques
- Food Network with celebrity chef branding
- Marmiton for French speakers
- Countless food blogs with personal stories

Why do all these exist?
They differentiated through:
- Better UI design
- Fewer ads
- Faster load times
- Native language content
- Unique photography
- Personal narratives before the recipe

But LLMs don’t care about any of this. They don’t see your beautiful photos. They skip past your childhood story about grandma’s kitchen. They ignore your pop-up ads. They just need the recipe.

Language barriers? Irrelevant. The LLM translates instantly. French, Italian, Japanese. It doesn’t matter.

What this means: instead of 10,000 cooking websites, we need maybe a couple, or even a single comprehensive markdown repository of recipes. This pattern repeats everywhere:
- Travel guides
- Product reviews
- News sites
- Educational content

The web doesn’t need redundancy when machines are the readers.

Wait, there is more: LLMs can create content too. Web 2.0’s breakthrough was making everyone a writer. YouTube, Instagram, TikTok: billions of people creating content for billions of people to read. But here’s the thing: why do you need a million human creators when AI can be all of them?

Your favorite cooking influencer? Soon it’ll be an AI chef who knows exactly what’s in your fridge, your dietary restrictions, and your skill level. No more scrolling through 50 recipe videos to find one that works for you.

Your trusted news anchor? An AI that only covers YOUR interests: your stocks, your sports teams, your neighborhood. Not broadcasting to millions, but narrowcasting to one.

That fitness instructor you follow? An AI trainer that adapts to your fitness level, your injuries, your equipment. Every video made just for you, in real time.

- Web 2.0 writing: Humans create content → Millions read the same thing
- Web 3.0 writing: AI creates content → Each person reads something unique

The entire creator economy, the crown jewel of Web 2.0, collapses into infinite personalized AI agents. Social media feeds won’t be filled with human posts anymore. They’ll be generated in real time, specifically for you.
Every scroll, unique. Every video, personalized. Every post, tailored.

The paradox: We’ll have infinite content variety with zero human creators. Maximum personalization through total artificial generation. Just as 10,000 recipe websites collapse into one markdown file for LLMs to read, millions of content creators collapse into personalized AI agents. The “write” revolution of Web 2.0 is being replaced by AI that writes everything, for everyone, individually.

OK, what about taking actions like booking a restaurant? Web 2.0 gave us APIs, structured endpoints for programmatic interaction:
- `POST /api/reservations`
- Rigid schemas: exact field names, specific formats
- Documentation hell: dozens of pages explaining endpoints
- Integration nightmare: every API different, nothing interoperable

APIs assumed developers would read documentation, write integration code, and handle complex error scenarios. They were built for humans to program against, requiring manual updates whenever the API changed, breaking integrations, and forcing developers to constantly maintain compatibility.

MCP isn’t just another API. It’s designed for LLM agents:
- Dynamic discovery: Agents explore capabilities in real time through tool introspection
- Flexible schemas: Natural language understanding, not rigid fields
- Universal interoperability: One protocol, infinite services
- Context-aware: Maintains conversation state across actions

What makes MCP special technically:
- Three primitives: Tools (functions agents can call), Resources (data agents can read), and Prompts (templates for common tasks)
- Transport agnostic: Works over STDIO for local servers or HTTP/SSE for remote services
- Stateful sessions: Unlike REST APIs, MCP maintains context between calls
- Built-in tool discovery: Agents can query `listTools()` to understand capabilities dynamically, with no documentation parsing needed

Traditional APIs are like giving someone a thick manual and saying “follow these exact steps.” MCP is like having a smart assistant who can figure out what’s possible just by looking around. When you walk into that restaurant, the agent doesn’t need a 50-page guide: it instantly knows it can check tables, make reservations, or view the menu. And unlike APIs that forget everything between requests (like talking to someone with amnesia!), MCP remembers the whole conversation, so when you say “actually, make it 8pm instead,” it knows exactly what reservation you’re talking about.

With MCP, the agent handles all the complexity. No documentation needed. No rigid formats. Just natural interaction.

Even better: when the restaurant adds new capabilities, like booking the entire venue for private events, adding wine pairings, or offering chef’s table experiences, there’s no developer work required. The LLM agent automatically discovers the expanded schema and adapts. Traditional APIs would break existing integrations or require manual updates. MCP just works.

With markdown for reading and MCP for acting, the entire web infrastructure becomes invisible:
- Read: LLM ingests markdown → understands everything about your service
- Act: LLM uses MCP → performs any action a user needs

Websites become obsolete. Users never leave their chat interface.

The web started as simple text documents linked together. We spent 30 years adding complexity such as animations, interactivity, and rich media. Now we’re stripping it all away again. But this time, the simplicity isn’t for humans. It’s for machines. And that changes everything.

The web as we know it is disappearing. What replaces it will be invisible, powerful, and fundamentally different from anything we’ve built before. For someone like me who loves designing beautiful UIs, this is bittersweet. All those carefully crafted interfaces, micro-interactions, and pixel-perfect layouts will be obsolete.
But I’m genuinely excited because it’s all about the user experience, and the UX of chatting (or even calling) your agent is infinitely better than website navigation. I can’t wait.
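To make the tool-discovery idea from this post concrete, here is a minimal in-memory sketch in Python. It mimics the shape of MCP's JSON-RPC `tools/list` and `tools/call` requests, but it is not the official SDK: it skips the initialization handshake and any real transport, and the restaurant tools (`check_tables`, `make_reservation`), their schemas, and the `handle` function are all hypothetical names made up for illustration.

```python
import json

# Hypothetical tools a restaurant might expose (illustrative names and schemas).
TOOLS = {
    "check_tables": {
        "description": "List free tables for a date and party size.",
        "inputSchema": {
            "type": "object",
            "properties": {"date": {"type": "string"}, "party_size": {"type": "integer"}},
            "required": ["date", "party_size"],
        },
        "handler": lambda a: f"3 tables free on {a['date']} for {a['party_size']}",
    },
    "make_reservation": {
        "description": "Book a table.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "date": {"type": "string"},
                "time": {"type": "string"},
                "party_size": {"type": "integer"},
            },
            "required": ["date", "time", "party_size"],
        },
        "handler": lambda a: f"Booked table for {a['party_size']} on {a['date']} at {a['time']}",
    },
}


def handle(request_json: str) -> str:
    """Toy server: answer one JSON-RPC 2.0 request and return the JSON response."""
    req = json.loads(request_json)
    if req["method"] == "tools/list":
        # Discovery: describe every tool so the agent needs no separate documentation.
        tools = [
            {"name": name, "description": t["description"], "inputSchema": t["inputSchema"]}
            for name, t in TOOLS.items()
        ]
        result = {"tools": tools}
    elif req["method"] == "tools/call":
        tool = TOOLS[req["params"]["name"]]
        text = tool["handler"](req["params"]["arguments"])
        result = {"content": [{"type": "text", "text": text}]}
    else:
        error = {"code": -32601, "message": "Method not found"}
        return json.dumps({"jsonrpc": "2.0", "id": req["id"], "error": error})
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})


# The agent first discovers what it can do, then acts on what it found.
listing = json.loads(handle(json.dumps({"jsonrpc": "2.0", "id": 1, "method": "tools/list"})))
names = [t["name"] for t in listing["result"]["tools"]]
call = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "make_reservation",
        "arguments": {"date": "2025-06-01", "time": "20:00", "party_size": 2},
    },
}
reply = json.loads(handle(json.dumps(call)))
print(names)
print(reply["result"]["content"][0]["text"])
```

If the restaurant later adds a new entry to `TOOLS`, the next `tools/list` call returns it without any change on the agent side, which is the property the post is pointing at.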

HTML

AI

Web Development


Nicolas Bustamante 2 months ago

The RAG Obituary: Killed by Agents, Buried by Context Windows

I’ve been working in AI and search for a decade. First building Doctrine, the largest European legal search engine, and now building Fintool, an AI-powered financial research platform that helps institutional investors analyze companies, screen stocks, and make investment decisions.

After three years of building, optimizing, and scaling LLMs with retrieval-augmented generation (RAG) systems, I believe we’re witnessing the twilight of RAG-based architectures. As context windows explode and agent-based architectures mature, my controversial opinion is that the current RAG infrastructure we spent so much time building and optimizing is on the decline.

In late 2022, ChatGPT took the world by storm. People started endless conversations, delegating crucial work only to realize that the underlying model, GPT-3.5, could only handle 4,096 tokens... roughly six pages of text! The AI world faced a fundamental problem: how do you make an intelligent system work with knowledge bases that are orders of magnitude larger than what it can read at once? The answer became Retrieval-Augmented Generation (RAG), an architectural pattern that would dominate AI for the next three years.

GPT-3.5 could handle 4,096 tokens, and the next model, GPT-4, doubled it to 8,192 tokens, about twelve pages. This wasn’t just inconvenient; it was architecturally devastating. Consider the numbers: a single SEC 10-K filing contains approximately 51,000 tokens (130+ pages). With 8,192 tokens, you could see only about 16% of a 10-K filing. It’s like reading a financial report through a keyhole!

RAG emerged as an elegant solution borrowed directly from search engines. Just as Google displays 10 blue links with relevant snippets for your query, RAG retrieves the most pertinent document fragments and feeds them to the LLM for synthesis. The core idea is beautifully simple: if you can’t fit everything in context, find the most relevant pieces and use those. It turns LLMs into sophisticated search result summarizers.
Basically, LLMs can’t read the whole book but they can know who dies at the end; convenient!

Long documents need to be chunked into pieces, and that’s when the problems start. Those digestible pieces are typically 400-1,000 tokens each, which is roughly 300-750 words. The problem? It isn’t as simple as cutting every 500 words.

Consider chunking a typical SEC 10-K annual report. The document has a complex hierarchical structure:
- Item 1: Business Overview (10-15 pages)
- Item 1A: Risk Factors (20-30 pages)
- Item 7: Management’s Discussion and Analysis (30-40 pages)
- Item 8: Financial Statements (40-50 pages)

After naive chunking at 500 tokens, critical information gets scattered:
- Revenue recognition policies split across 3 chunks
- A risk factor explanation broken mid-sentence
- Financial table headers separated from their data
- MD&A narrative divorced from the numbers it’s discussing

If you search for “revenue growth drivers,” you might get a chunk mentioning growth but miss the actual numerical data in a different chunk, or the strategic context from MD&A in yet another chunk!
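The failure mode above is easy to reproduce. The sketch below is a minimal illustration, not Fintool's chunker: it splits on whitespace-separated words as a stand-in for tokens, and the filing excerpt, the tiny chunk size, and the `naive_chunks` helper are all made up for the example.

```python
def naive_chunks(words, size):
    """Split a list of words into fixed-size chunks, ignoring document structure."""
    return [words[i:i + size] for i in range(0, len(words), size)]


# Hypothetical excerpt: a section title, a table header row, then one data row.
text = (
    "Item 8 Financial Statements "
    "Segment Revenue2023 Revenue2024 "
    "Cloud 410 520"
)
words = text.split()

# A tiny chunk size stands in for the 500-token cut described above.
chunks = naive_chunks(words, size=6)

for i, chunk in enumerate(chunks):
    print(i, " ".join(chunk))

# The header column "Segment" and the data row "Cloud 410 520" end up in
# different chunks, so a retriever that returns only one of them loses the table.
header_chunk = next(i for i, c in enumerate(chunks) if "Segment" in c)
data_chunk = next(i for i, c in enumerate(chunks) if "Cloud" in c)
print(header_chunk, data_chunk)
```

A structure-aware chunker would instead treat the whole table as one atomic unit, which is the motivation for the strategies described next.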
At Fintool, we’ve developed sophisticated chunking strategies that go beyond naive text splitting:
- Hierarchical Structure Preservation: We maintain the nested structure from Item 1 (Business) down to sub-sections like geographic segments, creating a tree-like document representation
- Table Integrity: Financial tables are never split. Income statements, balance sheets, and cash flow statements remain atomic units with headers and data together
- Cross-Reference Preservation: We maintain links between narrative sections and their corresponding financial data, preserving the “See Note X” relationships
- Temporal Coherence: Year-over-year comparisons and multi-period analyses stay together as single chunks
- Footnote Association: Footnotes remain connected to their referenced items through metadata linking

Each chunk at Fintool is enriched with extensive metadata:
- Filing type (10-K, 10-Q, 8-K)
- Fiscal period and reporting date
- Section hierarchy (Item 7 > Liquidity > Cash Position)
- Table identifiers and types
- Cross-reference mappings
- Company identifiers (CIK, ticker)
- Industry classification codes

This allows for more accurate retrieval, but even our intelligent chunking can’t solve the fundamental problem: we’re still working with fragments instead of complete documents!

Once you have the chunks, you need a way to search them. One way is to embed your chunks. Each chunk is converted into a high-dimensional vector (typically 1,536 dimensions in most embedding models). These vectors live in a space where, theoretically, similar concepts are close together. When a user asks a question, that question also becomes a vector. The system finds the chunks whose vectors are closest to the query vector using cosine similarity. It’s elegant in theory; in practice, it’s a nightmare of edge cases.

Embedding models are trained on general text and struggle with specific terminologies.
Embedding models find similarities, but they can’t distinguish between “revenue recognition” (accounting policy) and “revenue growth” (business performance). Consider this example. Query: “What is the company’s litigation exposure?” RAG searches for “litigation” and returns 50 chunks: - Chunks 1-10: Various mentions of “litigation” in boilerplate risk factors - Chunks 11-20: Historical cases from 2019 (already settled) - Chunks 21-30: Forward-looking safe harbor statements - Chunks 31-40: Duplicate descriptions from different sections - Chunks 41-50: Generic “we may face litigation” warnings What RAG Reports: $500M in litigation (from Legal Proceedings section) What’s Actually There: - $500M in Legal Proceedings (Item 3) - $700M in Contingencies note (“not material individually”) - $1B new class action in Subsequent Events - $800M indemnification obligations (different section) - $2B probable losses in footnotes (keyword “probable” not “litigation”) The actual exposure is $5B. 10x what RAG found. Oopsie! By late 2023, most builders realized pure vector search wasn’t enough. Enter hybrid search: combine semantic search (embeddings) with the traditional keyword search (BM25). This is where things get interesting. BM25 (Best Matching 25) is a probabilistic retrieval model that excels at exact term matching. Unlike embeddings, BM25: - Rewards Exact Matches : When you search for “EBITDA,” you get documents with “EBITDA,” not “operating income” or “earnings” - Handles Rare Terms Better : Financial jargon like “CECL” (Current Expected Credit Losses) or “ASC 606” gets proper weight - Document Length Normalization : Scores are adjusted for document length, so long documents don’t win just because they contain more words - Term Frequency Saturation : Multiple mentions of “revenue” don’t overshadow other important terms At Fintool, we’ve built a sophisticated hybrid search system: 1. Parallel Processing : We run semantic and keyword searches simultaneously 2. 
Dynamic Weighting : Our system adjusts weights based on query characteristics: - Specific financial metrics? BM25 gets 70% weight - Conceptual questions? Embeddings get 60% weight - Mixed queries? 50/50 split with result analysis 3. Score Normalization : Different scoring scales are normalized using: - Min-max scaling for BM25 scores - Cosine similarity already normalized for embeddings - Z-score normalization for outlier handling So in the end, the embedding search and the keyword search each retrieve chunks, and the search engine combines them using Reciprocal Rank Fusion (RRF). RRF scores each chunk by summing 1/(k + rank) over every ranking it appears in (k is a small constant, commonly 60), so items that consistently appear near the top across systems float higher, even if no system put them at #1! So now you think it’s done, right? Hell no! Here’s what nobody talks about: even after all that retrieval work, you’re not done. You need to rerank the chunks one more time to get good retrieval, and it’s not easy. Rerankers are ML models that take the search results and reorder them by relevance to your specific query, limiting the number of chunks sent to the LLM. Not only are LLMs context poor, they also struggle when dealing with too much information. It’s vital to reduce the number of chunks sent to the LLM for the final answer. The Reranking Pipeline: 1. Initial search retrieval with embeddings + keywords gets you 100-200 chunks 2. Reranker scores them and keeps the top 10 3. Top 10 are fed to the LLM to answer the question Here is the challenge with reranking: - Latency Explosion : Reranking adds 300-2,000 ms per query. Ouch. - Cost Multiplication : It adds significant extra cost to every query. For instance, Cohere Rerank 3.5 costs $2.00 per 1,000 search units, making reranking expensive. - Context Limits : Rerankers typically handle few chunks (Cohere Rerank supports only 4096 tokens), so if you need to rerank more than that, you have to split the work into parallel API calls and merge the results! 
- Another Model to Manage : One more API, one more failure point Re-rank is one more step in a complex pipeline. What I find difficult with RAG is what I call the “cascading failure problem”. 1. Chunking can fail (split tables) or be too slow (especially when you have to ingest and chunk gigabytes of data in real-time) 2. Embedding can fail (wrong similarity) 3. BM25 can fail (term mismatch) 4. Hybrid fusion can fail (bad weights) 5. Reranking can fail (wrong priorities) Each stage compounds the errors of the previous stage. Beyond the complexity of hybrid search itself, there’s an infrastructure burden that’s rarely discussed. Running production Elasticsearch is not easy. You’re looking at maintaining TB+ of indexed data for comprehensive document coverage, which requires 128-256GB RAM minimum just to get decent performance. The real nightmare comes with re-indexing. Every schema change forces a full re-indexing that takes 48-72 hours for large datasets. On top of that, you’re constantly dealing with cluster management, sharding strategies, index optimization, cache tuning, backup and disaster recovery, and version upgrades that regularly include breaking changes. Here are some structural limitations: 1. Context Fragmentation - Long documents are interconnected webs, not independent paragraphs - A single question might require information from 20+ documents - Chunking destroys these relationships permanently 2. Semantic Search Fails on Numbers - “$45.2M” and “$45,200,000” have different embeddings - “Revenue increased 10%” and “Revenue grew by a tenth” rank differently - Tables full of numbers have poor semantic representations 3. No Causal Understanding - RAG can’t follow “See Note 12” → Note 12 → Schedule K - Can’t understand that discontinued operations affect continuing operations - Can’t trace how one financial item impacts another 4. 
The Vocabulary Mismatch Problem - Companies use different terms for the same concept - “Adjusted EBITDA” vs “Operating Income Before Special Items” - RAG retrieves based on terms, not concepts 5. Temporal Blindness - Can’t distinguish Q3 2024 from Q3 2023 reliably - Mixes current period with prior period comparisons - No understanding of fiscal year boundaries These aren’t minor issues. They’re fundamental limitations of the retrieval paradigm. Three months ago I stumbled on an innovation in retrieval that blew my mind. In May 2025, Anthropic released Claude Code, an AI coding agent that works in the terminal. At first, I was surprised by the form factor. A terminal? Are we back in 1980? No UI? Back then, I was using Cursor, a product that excelled at traditional RAG. I gave it access to my codebase to embed my files and Cursor ran a search on my codebase before answering my query. Life was good. But when testing Claude Code, one thing stood out: It was better and faster, not because their RAG was better but because there was no RAG. Instead of a complex pipeline of chunking, embedding, and searching, Claude Code uses direct filesystem tools: 1. Grep (Ripgrep) - Lightning-fast regex search through file contents - No indexing required. It searches live files instantly - Full regex support for precise pattern matching - Can filter by file type or use glob patterns - Returns exact matches with context lines 2. Glob - Direct file discovery by name patterns - Finds files like `**/*.py` or `src/**/*.ts` instantly - Returns files sorted by modification time (recency bias) - Zero overhead: just filesystem traversal 3. Task Agents - Autonomous multi-step exploration - Handle complex queries requiring investigation - Combine multiple search strategies adaptively - Build understanding incrementally - Self-correct based on findings By the way, Grep was invented in 1973. It’s so... primitive. And that’s the genius of it. Claude Code doesn’t retrieve. 
It investigates: - Runs multiple searches in parallel (Grep + Glob simultaneously) - Starts broad, then narrows based on discoveries - Follows references and dependencies naturally - No embeddings, no similarity scores, no reranking It’s simple, it’s fast and it’s based on a new assumption that LLMs will go from context poor to context rich. Claude Code proved that with sufficient context and intelligent navigation, you don’t need RAG at all. The agent can: - Load entire files or modules directly - Follow cross-references in real-time - Understand structure and relationships - Maintain complete context throughout investigation This isn’t just better than RAG—it’s a fundamentally different paradigm. And what works for code can work for any long documents that are not coding files. The context window explosion made Claude Code possible: 2022-2025 Context-Poor Era: - GPT-4: 8K tokens (~12 pages) - GPT-4-32k: 32K tokens (~50 pages) 2025 and beyond Context Revolution: - Claude Sonnet 4: 200k tokens (~700 pages) - Gemini 2.5: 1M tokens (~3,000 pages) - Grok 4-fast: 2M tokens (~6,000 pages) At 2M tokens, you can fit an entire year of SEC filings for most companies. The trajectory is even more dramatic: we’re likely heading toward 10M+ context windows by 2027, with Sam Altman hinting at billions of context tokens on the horizon. This represents a fundamental shift in how AI systems process information. Equally important, attention mechanisms are rapidly improving—LLMs are becoming far better at maintaining coherence and focus across massive context windows without getting “lost” in the noise. Claude Code demonstrated that with enough context, search becomes navigation: - No need to retrieve fragments when you can load complete files - No need for similarity when you can use exact matches - No need for reranking when you follow logical paths - No need for embeddings when you have direct access It’s mind-blowing. 
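For a feel of how little machinery the Grep and Glob steps need, here is a small sketch that uses Python's `pathlib` and `re` as stand-ins for the real tools. The helper names, the throwaway files, and the search pattern are all made up for the example; this is not how Claude Code itself is implemented.

```python
import re
import tempfile
from pathlib import Path


def glob_files(root: Path, pattern: str) -> list[Path]:
    """Find files by name pattern, most recently modified first."""
    files = [p for p in root.glob(pattern) if p.is_file()]
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)


def grep(files: list[Path], regex: str, context: int = 1) -> list[tuple[str, int, list[str]]]:
    """Return (file name, 1-based line number, surrounding lines) for each match."""
    pattern = re.compile(regex)
    hits = []
    for path in files:
        lines = path.read_text(encoding="utf-8").splitlines()
        for i, line in enumerate(lines):
            if pattern.search(line):
                start, end = max(0, i - context), min(len(lines), i + context + 1)
                hits.append((path.name, i + 1, lines[start:end]))
    return hits


# Build a tiny throwaway directory so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    (root / "src" / "leases.py").write_text(
        "def total_leases():\n    # See Note 12 for details\n    return 5_000\n",
        encoding="utf-8",
    )
    (root / "README.md").write_text("See Note 12\n", encoding="utf-8")

    python_files = glob_files(root, "**/*.py")
    results = grep(python_files, r"See Note \d+")

print(results)
```

Nothing is indexed ahead of time: the files are read at query time, which is why a newly written file is searchable immediately.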
LLMs are getting really good at agentic behaviors, meaning they can organize their work into tasks to accomplish an objective. Here’s what tools like ripgrep bring to the search table: - No Setup : No index. No overhead. Just point and search. - Instant Availability : New documents are searchable the moment they hit the filesystem (no indexing latency!) - Zero Maintenance : No clusters to manage, no indices to optimize, no RAM to provision - Blazing Fast : For a 100K line codebase, Elasticsearch needs minutes to index. Ripgrep searches it in milliseconds with zero prep. - Cost : $0 infrastructure cost vs a lot of $$$ for Elasticsearch So back to our previous example on SEC filings. An agent can understand SEC filing structure intrinsically: - Hierarchical Awareness : Knows that Item 1A (Risk Factors) relates to Item 7 (MD&A) - Cross-Reference Following : Automatically traces “See Note 12” references - Multi-Document Coordination : Connects 10-K, 10-Q, 8-K, and proxy statements - Temporal Analysis : Compares year-over-year changes systematically For searches across thousands of companies or decades of filings, it might still use hybrid search, but now as a tool for agents: - Initial broad search using hybrid retrieval - Agent loads full documents for top results - Deep analysis within full context - Iterative refinement based on findings My guess is that traditional RAG is now one search tool among others, and that agents will always prefer grep and reading the whole file because they are context rich and can handle long-running tasks. 
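The cross-reference following mentioned above can be sketched in a few lines. This is a toy under loud assumptions: the `filing` dictionary is a hypothetical, heavily simplified document, references are assumed to look like "Note N", and a real agent would decide what to read next with an LLM rather than a fixed regex.

```python
import re

# Hypothetical, heavily simplified filing: section name -> text.
filing = {
    "Balance Sheet": "Operating lease obligations: $5B. See Note 12.",
    "Note 12": "Lease amounts exclude discontinued operations (Note 23).",
    "Note 23": "Discontinued operations carry $2B of additional lease obligations.",
    "Note 30": "Unrelated pension disclosures.",
}


def follow_references(sections: dict[str, str], start: str) -> list[str]:
    """Visit `start`, then every 'Note N' it mentions, transitively, in discovery order."""
    visited: list[str] = []
    queue = [start]
    while queue:
        name = queue.pop(0)
        if name in visited or name not in sections:
            continue
        visited.append(name)
        # Queue each note referenced in the section we just read.
        queue.extend(re.findall(r"Note \d+", sections[name]))
    return visited


trail = follow_references(filing, "Balance Sheet")
print(trail)
```

The unrelated note is never loaded, because nothing on the trail points to it; the traversal reads only what the document itself says is relevant.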
Consider our $6.5B lease obligation question as an example: Step 1: Find “lease” in main financial statements → Discovers “See Note 12” Step 2: Navigate to Note 12 → Finds “excluding discontinued operations (Note 23)” Step 3: Check Note 23 → Discovers $2B additional obligations Step 4: Cross-reference with MD&A → Identifies management’s explanation and adjustments Step 5: Search for “subsequent events” → Finds post-balance sheet $500M lease termination Final answer: $5B continuing + $2B discontinued - $500M terminated = $6.5B The agent follows references like a human analyst would. No chunks. No embeddings. No reranking. Just intelligent navigation. Basically, RAG is like a research assistant with perfect memory but no understanding: - “Here are 50 passages that mention debt” - Can’t tell you if debt is increasing or why - Can’t connect debt to strategic changes - Can’t identify hidden obligations - Just retrieves text, doesn’t comprehend relationships Agentic search is like a forensic accountant: - Follows the money systematically - Understands accounting relationships (assets = liabilities + equity) - Identifies what’s missing or hidden - Connects dots across time periods and documents - Challenges management assertions with data 1. Increasing Document Complexity - Documents are becoming longer and more interconnected - Cross-references and external links are proliferating - Multiple related documents need to be understood together - Systems must follow complex trails of information 2. Structured Data Integration - More documents combine structured and unstructured data - Tables, narratives, and metadata must be understood together - Relationships matter more than isolated facts - Context determines meaning 3. Real-Time Requirements - Information needs instant processing - No time for re-indexing or embedding updates - Dynamic document structures require adaptive approaches - Live data demands live search 4. 
Cross-Document Understanding Modern analysis requires connecting multiple sources: - Primary documents - Supporting materials - Historical versions - Related filings RAG treats each document independently. Agentic search builds cumulative understanding. 5. Precision Over Similarity - Exact information matters more than similar content - Following references beats finding related text - Structure and hierarchy provide crucial context - Navigation beats retrieval The evidence is becoming clear. While RAG served us well in the context-poor era, agentic search represents a fundamental evolution. The potential benefits of agentic search are compelling: - Elimination of hallucinations from missing context - Complete answers instead of fragments - Faster insights through parallel exploration - Higher accuracy through systematic navigation - Massive infrastructure cost reduction - Zero index maintenance overhead The key insight? Complex document analysis—whether code, financial filings, or legal contracts—isn’t about finding similar text. It’s about understanding relationships, following references, and maintaining precision. The combination of large context windows and intelligent navigation delivers what retrieval alone never could. RAG was a clever workaround for a context-poor era . It helped us bridge the gap between tiny windows and massive documents, but it was always a band-aid. The future won’t be about splitting documents into fragments and juggling embeddings. It will be about agents that can navigate, reason, and hold entire corpora in working memory. We are entering the post-retrieval age. The winners will not be the ones who maintain the biggest vector databases, but the ones who design the smartest agents to traverse abundant context and connect meaning across documents. In hindsight, RAG will look like training wheels. Useful, necessary, but temporary. The next decade of AI search will belong to systems that read and reason end-to-end. 
Retrieval isn’t dead—it’s just been demoted.


Nicolas Bustamante 1 years ago

But But, You Were Supposed to Be a GPT Wrapper?!

My team and I are building Fintool, Warren Buffett as a service . It's a set of AI agents that analyze massive amounts of financial data and documents to assist institutional investors in making investment decisions. To simplify for customers, we explain Fintool as a sort of ChatGPT on SEC filings and earnings calls. We got our fair share of "yOU aRe JuST a GPT wRapPer" from people who had no clue what they were talking about but wanted to sound smart and provocative. Anyway! For more serious people I thought it would be nice to disclose our infrastructure and unique challenges. Our goal is to ingest as much financial data as possible—ranging from news, management presentations, internal notes, broker research, market data, rating agency reports, alternative data, internal data and much more. We started with SEC filings, but our infrastructure is designed to scale and adapt, with no limit to the types of data sources it can handle. Our data ingestion pipeline uses Apache Spark to efficiently process vast amounts of structured and unstructured data. The primary data source is the SEC database, which provides, on average, around 3,000 filings daily. We've built a custom Spark job to pull data from the SEC, process HTML files, and distribute the workload across our Spark cluster for real-time ingestion. With SEC filings and earnings calls alone, we manage 70 million chunks, 2 million documents, and around 5 TB of data in Databricks for every ten years of data. Many documents are unstructured and often exceed 200 pages in length. Each data source has a dedicated Spark streaming job, ensuring a continuous flow of data into our system, making Fintool one of the very few real-time systems in production in our market. We outperform nearly all incumbents in processing time, often being hours faster. Monitoring the 100% uptime of all these pipelines and catching errors early is a significant challenge. 
Any failure in these processes could lead to incomplete or delayed data, affecting the reliability of Fintool. Our customers can’t miss a company earnings or an 8-K filing announcing that an executive is departing the company. To address this, we have built robust monitoring tools that help us detect and resolve issues swiftly, ensuring the system remains operational and dependable. To make sense of the different formats, we've developed a custom parser that can handle both structured and unstructured data. This parser extracts millions of data points using a combination of unsupervised machine learning models, all optimized for financial documents. For instance, extracting tables with numerical data and footnotes accurately presents unique challenges, as it requires ensuring the numbers are correctly linked to their respective headers and that important context from footnotes is preserved. Imagine a company reports non-GAAP earnings with a footnote clarifying that $2 billion in employee stock-based compensation isn’t included; without accounting for that $2 billion, the earnings figures could be misleading! One of our goals is to handle as many complex operations offline as possible. By doing this, we save on costs and improve quality, as it allows us to thoroughly analyze the output—something that is not feasible during real-time user queries. We have recently partnered with OpenAI on a research project to use LLMs to extract every data point in SEC filings. Every week, we process 50 billion tokens, equivalent to 468,750 books of 200 pages each, or 12 times the size of Wikipedia. Accounting is exceptionally complex. SEC filings often use different terminologies or formats for similar items—terms like “Revenue,” “Net Sales,” or “Turnover” vary by company or industry—making consistent data extraction a challenge. 
Key figures like "Net Income" may come with footnotes detailing adjustments (e.g., “excluding litigation costs”), and companies frequently report figures for different time periods, such as quarterly versus year-to-date, within the same filing. Some companies don’t report in USD, and others occasionally change accounting methods (e.g., revenue recognition policies), noted in footnotes, which requires careful adjustments to make financials comparable over time. It’s complex, but Fintool is bringing order to it all. Our advanced data pipelines are engineered to locate, verify, deduplicate, and cross-compare every data point, ensuring unmatched accuracy and insight. This is how we've built the most reliable financial fundamentals database on the market! Next, we break down these documents into manageable, meaningful segments while preserving context, which is crucial for downstream tasks like search and question answering. We use a sliding window approach with a variable-sized window (typically 400 tokens) to ensure coherence between segments. We also employ hierarchical chunking to create a tree-like structure of document sections, capturing everything from top-level sections like "Financial Statements" to specific sub-sections. Our system treats tables as atomic units, keeping table headers and data cells intact for accuracy. To maintain context, each chunk is enriched with metadata (e.g., document title, section headers), and we use an overlap strategy where consecutive chunks share a small overlap (about 10%) to ensure continuity. This allows us to accurately capture the narrative, even in long documents: a 10-K annual report is between 150 and 200 pages. Those docs are then ready to be embedded! We compute embeddings for each document chunk using a fine-tuned open-source model running on our GPUs. This model was fine-tuned on hundreds of real-life examples of expert financial questions. 
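The sliding-window step with overlap described above might look roughly like the sketch below. It is a simplification under stated assumptions: the window is fixed at 400 tokens with a 40-token (about 10%) overlap, tokens are placeholder strings rather than real tokenizer output, and the hierarchy, table, and metadata handling are left out. The function name is hypothetical.

```python
def sliding_window_chunks(
    tokens: list[str], window: int = 400, overlap: int = 40
) -> list[list[str]]:
    """Split tokens into windows; consecutive windows share `overlap` tokens."""
    step = window - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


# Stand-in "document" of 1,000 numbered tokens.
tokens = [f"t{i}" for i in range(1000)]
chunks = sliding_window_chunks(tokens)

print(len(chunks), [len(c) for c in chunks])
print(chunks[0][-40:] == chunks[1][:40])
```

Each chunk repeats the tail of the previous one, so a sentence that straddles a boundary appears whole in at least one chunk, provided it is shorter than the overlap.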
These embeddings allow us to represent complex financial data in a way that captures semantic meaning. For example, if a document mentions 'net income growth' alongside 'operating cash flow trends,' the embeddings capture the relationship between these terms, allowing the system to understand the context and link related financial concepts effectively. The embedding computation pipeline processes data in batches and stores the results in Elasticsearch, which supports vector storage and search through its dense_vector field type. Elasticsearch enables k-nearest neighbor (kNN) search using similarity metrics such as cosine similarity and dot product. Since we normalize our embeddings to unit length, cosine similarity and dot product yield equivalent results, allowing us to use either for efficient similarity search. We chose not to use a dedicated vector database, as it would add complexity and reduce performance, particularly when merging results from both keyword and vector searches. Managing this combination effectively without compromising speed and accuracy is challenging, which is why we opted for this more streamlined approach. To speed up our embeddings search, we quantize the embeddings, compressing them to significantly reduce memory usage—by as much as 75%. This reduction means we can access and process data faster, allowing for quicker responses while maintaining effective search performance. Quantization not only optimizes memory but also boosts efficiency across the entire search process. Our search infrastructure integrates both keyword-based and semantic search methods to deliver accurate and comprehensive answers. For keyword search, we use an enhanced BM25 algorithm, which helps us find relevant information based on traditional keyword matching. On the semantic side, we leverage vector-based similarity search using ElasticSearch to locate information based on meaning rather than just keywords. 
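Two of the claims above, that cosine similarity and dot product agree on unit-length vectors and that quantization can cut memory by 75%, can be checked with a short sketch. The text does not say which quantization scheme is used; the example assumes simple 8-bit scalar quantization of 32-bit floats, and the vectors and function names are made up.

```python
import math
from array import array


def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def quantize_int8(unit_vec: list[float]) -> array:
    """Map each component from [-1, 1] to an integer in [-127, 127]."""
    return array("b", (round(x * 127) for x in unit_vec))


a = normalize([0.2, -1.3, 0.7, 2.1])
b = normalize([0.1, -0.9, 1.1, 1.8])

# For unit-length vectors, cosine similarity and dot product coincide.
same_score = math.isclose(cosine(a, b), dot(a, b))
print(same_score)

full = array("f", a)      # 32-bit floats: 4 bytes per dimension
small = quantize_int8(a)  # 8-bit integers: 1 byte per dimension
saving = 1 - (small.itemsize * len(small)) / (full.itemsize * len(full))
print(f"{saving:.0%}")
```

Going from four bytes to one byte per dimension is where a 75% figure would come from under this assumption; the price is a small rounding error in every component.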
Despite all the buzz around vector search, our evaluations revealed that relying on vector search alone falls short of expectations. While many startups offer vector databases combined with vector search as a service, we have more confidence in Elastic's technology. Through extensive optimizations, we’ve achieved a streamlined Elastic index of approximately 500GB, containing about 2 million documents for every 10 years of data. This combination of keyword and semantic search allows us to achieve hybrid retrieval, which significantly enhances search relevance and accuracy. For example, keyword search is ideal for finding specific financial terms like 'net income,' which require precise matching. Meanwhile, vector search helps understand broader questions, such as "companies showing signs of liquidity stress," which involves context and relationships between multiple financial metrics. We then use reranking techniques to improve retrieval performance. Our re-ranker takes a list of candidate chunks and uses a cross-encoder model to assign a relevance score, ensuring the most relevant chunks are prioritized. This cross-encoder model allows for a deeper and more precise evaluation of the relationship between the query and each document, resulting in significantly more accurate final rankings. Re-ranking can add hundreds of milliseconds of latency but, in our experience, is worth it. Speaking of improving search, we have been exploring knowledge graphs since Microsoft published the GraphRAG framework. It uses an LLM to automatically extract data points to create a rich graph from a collection of text documents. This graph represents entities as nodes, relationships as edges, and claims as covariates on edges. An example of a node in the knowledge graph could be 'Apple Inc. (AAPL)' as an entity, representing the company. Relationships (edges) might include connections like 'has CEO' linked to 'Tim Cook' or 'sold shares on [date].' 
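The node, edge, and claim structure just described could be modeled with a couple of small data classes. This is only an illustrative data model, not GraphRAG's actual API; the class names, methods, and the claim text are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Edge:
    source: str
    relation: str
    target: str
    # Supporting statements attached to the edge (placeholder text in this example).
    claims: list[str] = field(default_factory=list)


@dataclass
class KnowledgeGraph:
    nodes: set[str] = field(default_factory=set)
    edges: list[Edge] = field(default_factory=list)

    def add(self, source: str, relation: str, target: str, claim: Optional[str] = None) -> None:
        """Register both entities and the relationship between them."""
        self.nodes.update({source, target})
        self.edges.append(Edge(source, relation, target, [claim] if claim else []))

    def neighbors(self, entity: str, relation: str) -> list[str]:
        """Entities reachable from `entity` through `relation`."""
        return [e.target for e in self.edges if e.source == entity and e.relation == relation]


graph = KnowledgeGraph()
graph.add("Apple Inc. (AAPL)", "has CEO", "Tim Cook", claim="illustrative claim text")
ceo = graph.neighbors("Apple Inc. (AAPL)", "has CEO")
print(ceo)
```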
These nodes and relationships help institutional investors quickly identify key details about companies, such as executive leadership changes, important filings, or financial events. GraphRAG automatically generates summaries for these entities. When a user asks a query, we will leverage the knowledge graph and community summaries to provide more structured and contextually relevant information compared to traditional retrieval-augmented generation approaches. For example, an institutional investor might ask, "Which companies in the S&P 500 are experiencing liquidity stress and have recently made executive changes?" GraphRAG supports both global search to reason about the holistic context (e.g., liquidity stress across the market) and local search for specific entities (e.g., identifying companies with recent executive changes). This hybrid approach helps connect disparate pieces of information, providing more comprehensive and insightful answers. The challenge with GraphRAG search lies in the high cost of both building and querying the graph, as well as managing query-time latency and integrating it with our keyword + vector search. A potential solution could be an efficient, fast classifier to reserve GraphSearch for only the most complex queries. We use LLMs for a variety of tasks such as understanding the query, expanding it, and classifying its type. For each user query, we trigger multiple classifiers that help determine whether the question requires searching specific filings, calculating numerical values, or taking other specific actions. To handle these tasks, we utilize a variety of LLMs—from proprietary models to open-source Llama models, with different sizes and providers to balance speed and cost. For instance, we might use OpenAI GPT4o for complex tasks and Llama-3 8B on Groq, a specialized provider for fast inference, for simpler tasks. 
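A stripped-down version of this classify-then-route idea is sketched below. Everything in it is an assumption for illustration: the model labels are plain strings and no API is called, and the keyword-based `classify_query` is a toy stand-in for what the text describes as LLM-based classifiers.

```python
from typing import Callable

# Hypothetical model labels mirroring the examples in the text.
LARGE_MODEL = "gpt-4o"
SMALL_FAST_MODEL = "llama-3-8b-on-groq"


def classify_query(query: str) -> str:
    """Toy stand-in for an LLM classifier: keyword rules instead of a model."""
    q = query.lower()
    if any(word in q for word in ("compare", "why", "trend", "across")):
        return "complex"
    return "simple"


def route(query: str, classifier: Callable[[str], str] = classify_query) -> str:
    """Pick a model label based on the predicted query type."""
    return LARGE_MODEL if classifier(query) == "complex" else SMALL_FAST_MODEL


simple_choice = route("What was Apple's revenue in fiscal 2023?")
complex_choice = route("Compare liquidity trends across S&P 500 retailers")
print(simple_choice, complex_choice)
```

Passing the classifier in as a parameter keeps the router independent of any one model, which fits the model-agnostic goal: swapping the classifier or the model labels does not touch the routing logic.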
We created an LLM Benchmarking Service that continuously evaluates the performance of these models across numerous tasks. This service helps us dynamically route each query to the best-performing model. Having a model-agnostic interface is crucial to ensure we are not constrained by any particular model, especially with new models emerging every six months with enhanced capabilities. This flexibility allows us to always leverage the best available tools for optimal performance. We don't spend any resources training or fine-tuning our own models - we wrote about this strategy in Burning Billions: The Gamble Behind Training LLM Models. As you can see, answering a user's question is not trivial. It relies on a massive infrastructure, dozens of classifiers, and a hybrid retrieval pipeline. Additionally, we use a specialized LLM pipeline to generate accurate citations for every piece of information in the response, which also serves as a way to fact-check everything the LLM outputs. For example, if the answer references a specific SEC filing, the LLM provides an exact citation, guiding the user directly to the original document. Evaluating and monitoring an LLM-based Retrieval Augmented Generation system presents its own challenges. Any problem could originate from various components—such as data pipelines, machine learning models for structuring data, the retrieval search and vector representation, the reranker, or the LLM itself. Identifying the root cause of an issue requires a comprehensive understanding of each part of the infrastructure and its interactions, ensuring that every step contributes effectively to the overall accuracy and reliability of the system. To address these challenges, we have developed specialized monitoring tools that help us catch potential errors across the entire pipeline. We also use Datadog to store a lot of logs so we can quickly identify and fix production issues. 
Obviously, we want to catch errors early so we always benchmark our product against finance-specific benchmarks. The catch is that some changes can improve our embeddings but might deteriorate the overall performance of the product. As you see, it’s very complex! There is so much more we could talk about, and I hope this provides a broad overview of our approach. Each of these sections could easily be expanded into a dedicated blog post! In short, I believe that making LLMs work in finance is both highly challenging and immensely rewarding. We're steadily building our infrastructure piece by piece, productizing and delivering each advancement along the way. Our ultimate goal is to create an autonomous "Warren Buffett as a Service" that can handle the workload of dozens of analysts, transforming the financial research landscape. Let me finish by sharing some of the things I'm most excited about for the future. Faster inference: Many companies are working on specialized chips that are designed to deliver extremely low-latency, high-throughput processing with high parallelism. Today, we are using Groq, a provider capable of streaming at 800 tokens per second, but they are now claiming they can reach 2000 tokens per second. To put this into context, processing at several thousand tokens per second means that complex responses will be delivered almost instantaneously. I'm more excited by faster inference than by smaller models like Llama 8B or Mistral 3B. While smaller LLMs are useful because they are faster, if larger models become extremely efficient and deliver superior intelligence, there may be no need for smaller models. The power of large, smart models would make them the optimal choice for most tasks. Why does this matter? With such speed, an advanced AI agent can take control of Fintool to analyze thousands of companies simultaneously, performing billions of searches on company data in a fraction of the time. 
Imagine if Warren Buffett could read all filings, compute numbers, and analyze management teams instantly for thousands of companies. Cheaper cost per token I'm excited by the price of superintelligence getting closer to zero. The cost per GPT token has already dropped by 99%, and I'm confident it will continue to drop due to intense competition between major players like Microsoft and Meta, as well as innovations in semiconductors and economies of scale with large data centers. With costs continuing to decrease, we are approaching a future where large-scale AI computations are affordable, enabling widespread adoption and insane innovations. Autonomous AI Agents Multi-Agent Systems, which consist of AI agents that can work independently or collaborate with other agents to perform complex tasks. For example, these agents could autonomously collaborate in stress-testing scenarios or optimize complex investment strategies. Additionally, Self-Healing Systems, capable of real-time monitoring, debugging, and repairing themselves could, for instance, detect and correct discrepancies in market data or errors in algorithms, enhancing reliability and resilience. Onwards! Our data ingestion pipeline uses Apache Spark to efficiently process vast amounts of structured and unstructured data. The primary data source is the SEC database, which provides, on average, around 3,000 filings daily. We've built a custom Spark job to pull data from the SEC, process HTML files, and distribute the workload across our Spark cluster for real-time ingestion. With SEC filings and earnings calls alone, we manage 70 million chunks, 2 million documents, and around 5 TB of data in Databricks for every ten years of data. Many documents are unstructured and often exceed 200 pages in length. Each data source has a dedicated Spark streaming job, ensuring a continuous flow of data into our system, making Fintool one of the very few real-time systems in production in our market. 
We outperform nearly all incumbents in processing time, often being hours faster. Keeping all these pipelines at 100% uptime and catching errors early is a significant challenge. Any failure in these processes could lead to incomplete or delayed data, affecting the reliability of Fintool. Our customers can't miss a company's earnings release or an 8-K filing announcing that an executive is departing the company. To address this, we have built robust monitoring tools that help us detect and resolve issues swiftly, ensuring the system remains operational and dependable.

50 Billion Tokens per Week?

Parsing Complex Financial Data

To make sense of the different formats, we've developed a custom parser that can handle both structured and unstructured data. This parser extracts millions of data points using a combination of unsupervised machine learning models, all optimized for financial documents. For instance, extracting tables with numerical data and footnotes accurately presents unique challenges, as it requires ensuring the numbers are correctly linked to their respective headers and that important context from footnotes is preserved. Imagine a company reports non-GAAP earnings with a footnote clarifying that $2 billion in employee stock-based compensation isn't included; without accounting for that $2 billion, the earnings figures could be misleading!

One of our goals is to handle as many complex operations offline as possible. By doing this, we save on costs and improve quality, as it allows us to thoroughly analyze the output, something that is not feasible during real-time user queries. We have recently partnered with OpenAI on a research project to use LLMs to extract every data point in SEC filings. Every week, we process 50 billion tokens, equivalent to 468,750 books of 200 pages each, or 12 times the size of Wikipedia.

Accounting is exceptionally complex.
SEC filings often use different terminologies or formats for similar items (terms like “Revenue,” “Net Sales,” or “Turnover” vary by company or industry), making consistent data extraction a challenge. Key figures like "Net Income" may come with footnotes detailing adjustments (e.g., “excluding litigation costs”), and companies frequently report figures for different time periods, such as quarterly versus year-to-date, within the same filing. Some companies don't report in USD, and others occasionally change accounting methods (e.g., revenue recognition policies), noted in footnotes, which requires careful adjustments to make financials comparable over time.

It's complex, but Fintool is bringing order to it all. Our advanced data pipelines are engineered to locate, verify, deduplicate, and cross-compare every data point, ensuring unmatched accuracy and insight. This is how we've built the most reliable financial fundamentals database on the market!

Smart Chunking for Context-Aware Document Segmentation

Next, we break down these documents into manageable, meaningful segments while preserving context, which is crucial for downstream tasks like search and question answering. We use a sliding window approach with a variable-sized window (typically 400 tokens) to ensure coherence between segments. We also employ hierarchical chunking to create a tree-like structure of document sections, capturing everything from top-level sections like "Financial Statements" to specific sub-sections. Our system treats tables as atomic units, keeping table headers and data cells intact for accuracy. To maintain context, each chunk is enriched with metadata (e.g., document title, section headers), and we use an overlap strategy where consecutive chunks share a small overlap (about 10%) to ensure continuity. This allows us to accurately capture the narrative, even in long documents: a 10-K annual report runs between 150 and 200 pages. Those docs are then ready to be embedded!
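As a rough illustration of the sliding-window and overlap ideas above, here is a minimal sketch. It assumes whitespace-separated words stand in for model tokens, and the names `Chunk` and `sliding_window_chunks` are hypothetical, not Fintool's actual code. Hierarchical chunking and atomic table handling are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


def sliding_window_chunks(text, doc_title, section, window=400, overlap_ratio=0.10):
    """Split text into overlapping chunks of at most `window` tokens."""
    tokens = text.split()  # assumption: whitespace words approximate model tokens
    overlap = int(window * overlap_ratio)  # about 10% shared between neighbours
    step = window - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        piece = tokens[start:start + window]
        chunks.append(Chunk(
            text=" ".join(piece),
            # Metadata travels with every chunk so context survives retrieval.
            metadata={"title": doc_title, "section": section, "start_token": start},
        ))
        if start + window >= len(tokens):
            break  # this window already reached the end of the document
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(1000))
    for c in sliding_window_chunks(sample, "10-K", "Financial Statements"):
        print(c.metadata["start_token"], len(c.text.split()))
```

With a 400-token window and 10% overlap, each chunk starts 360 tokens after the previous one, so the last 40 tokens of one chunk reappear at the start of the next.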
Custom Embeddings for Semantic Representation

We compute embeddings for each document chunk using a fine-tuned open-source model running on our GPUs. This model was fine-tuned on hundreds of real-life examples from expert financial questions. These embeddings allow us to represent complex financial data in a way that captures semantic meaning. For example, if a document mentions 'net income growth' alongside 'operating cash flow trends,' the embeddings capture the relationship between these terms, allowing the system to understand the context and link related financial concepts effectively.

The embedding computation pipeline processes data in batches and stores the results in Elasticsearch, which supports vector storage and search through its dense_vector field type. Elasticsearch enables k-nearest neighbor (kNN) search using similarity metrics such as cosine similarity and dot product. Since we normalize our embeddings to unit length, cosine similarity and dot product yield equivalent results, allowing us to use either for efficient similarity search.

We chose not to use a dedicated vector database, as it would add complexity and reduce performance, particularly when merging results from both keyword and vector searches. Managing this combination effectively without compromising speed and accuracy is challenging, which is why we opted for this more streamlined approach.

To speed up our embeddings search, we quantize the embeddings, compressing them to reduce memory usage by as much as 75%. This reduction means we can access and process data faster, allowing for quicker responses while maintaining effective search performance. Quantization not only optimizes memory but also boosts efficiency across the entire search process.

Search Infra: Combining Keywords and Semantic Search

Our search infrastructure integrates both keyword-based and semantic search methods to deliver accurate and comprehensive answers.
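Two claims from the embeddings section can be checked with a few lines of arithmetic: on unit-length vectors, cosine similarity equals the dot product, and a 75% memory saving matches going from 4-byte floats to 1-byte integers. The sketch below assumes simple symmetric int8 scalar quantization. The post does not say which quantization scheme is used, so treat that choice and all function names as illustrative.

```python
import math
from array import array


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def quantize_int8(vec):
    """Map floats in [-1, 1] to signed 8-bit integers, one byte per dimension."""
    return array("b", (max(-127, min(127, round(x * 127))) for x in vec))


# Toy 4-dimensional vectors; real embeddings have hundreds of dimensions.
a = normalize([0.3, -1.2, 0.8, 2.0])
b = normalize([0.1, -0.9, 1.1, 1.7])

# On unit-length vectors the two metrics agree up to floating-point error.
assert math.isclose(dot(a, b), cosine(a, b), rel_tol=1e-9)

float32_bytes = array("f", a).itemsize * len(a)  # 4 bytes per dimension
int8_bytes = quantize_int8(a).itemsize * len(a)  # 1 byte per dimension
saving = 1 - int8_bytes / float32_bytes          # fraction of memory saved
```

The trade-off is precision: each component is rounded to one of 255 levels, which is why quantized search is usually evaluated against the full-precision baseline before rollout.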
For keyword search, we use an enhanced BM25 algorithm, which helps us find relevant information based on traditional keyword matching. On the semantic side, we leverage vector-based similarity search using Elasticsearch to locate information based on meaning rather than just keywords. Despite all the buzz around vector search, our evaluations revealed that relying on vector search alone falls short of expectations. While many startups offer vector databases combined with vector search as a service, we have more confidence in Elastic's technology. Through extensive optimizations, we've achieved a streamlined Elastic index of approximately 500GB, containing about 2 million documents for every 10 years of data.

This combination of keyword and semantic search allows us to achieve hybrid retrieval, which significantly enhances search relevance and accuracy. For example, keyword search is ideal for finding specific financial terms like 'net income,' which require precise matching. Meanwhile, vector search helps understand broader questions, such as "companies showing signs of liquidity stress," which involves context and relationships between multiple financial metrics.

We then use reranking techniques to improve retrieval performance. Our re-ranker takes a list of candidate chunks and uses a cross-encoder model to assign a relevance score, ensuring the most relevant chunks are prioritized. This cross-encoder model allows for a deeper and more precise evaluation of the relationship between the query and each document, resulting in significantly more accurate final rankings. Re-ranking can add hundreds of milliseconds of latency but, in our experience, is worth it.

Knowledge Graph, the Next Step to Connect the Dots

Speaking of improving search, we have been exploring knowledge graphs since Microsoft published the GraphRAG framework. It uses an LLM to automatically extract data points to create a rich graph from a collection of text documents.
This graph represents entities as nodes, relationships as edges, and claims as covariates on edges. An example of a node in the knowledge graph could be 'Apple Inc. (AAPL)' as an entity, representing the company. Relationships (edges) might include connections like 'has CEO' linked to 'Tim Cook' or 'sold shares on [date].' These nodes and relationships help institutional investors quickly identify key details about companies, such as executive leadership changes, important filings, or financial events.

GraphRAG automatically generates summaries for these entities. When a user submits a query, we will leverage the knowledge graph and community summaries to provide more structured and contextually relevant information compared to traditional retrieval-augmented generation approaches. For example, an institutional investor might ask, "Which companies in the S&P 500 are experiencing liquidity stress and have recently made executive changes?" GraphRAG supports both global search to reason about the holistic context (e.g., liquidity stress across the market) and local search for specific entities (e.g., identifying companies with recent executive changes). This hybrid approach helps connect disparate pieces of information, providing more comprehensive and insightful answers.

The challenge with GraphRAG search lies in the high cost of both building and querying the graph, as well as managing query-time latency and integrating it with our keyword + vector search. A potential solution could be an efficient, fast classifier to reserve GraphSearch for only the most complex queries.

LLM Benchmarking: Routing to the Best Model

We use LLMs for a variety of tasks such as understanding the query, expanding it, and classifying its type. For each user query, we trigger multiple classifiers that help determine whether the question requires searching specific filings, calculating numerical values, or taking other specific actions.
To handle these tasks, we use a variety of LLMs, from proprietary models to open-source Llama models, with different sizes and providers to balance speed and cost. For instance, we might use OpenAI GPT-4o for complex tasks and Llama-3 8B on Groq, a specialized provider for fast inference, for simpler tasks. We created an LLM Benchmarking Service that continuously evaluates the performance of these models across numerous tasks. This service helps us dynamically route each query to the best-performing model.

Having a model-agnostic interface is crucial to ensure we are not constrained by any particular model, especially with new models emerging every six months with enhanced capabilities. This flexibility allows us to always leverage the best available tools for optimal performance. We don't spend any resources training or fine-tuning our own models; we wrote about this strategy in Burning Billions: The Gamble Behind Training LLM Models.

As you can see, answering a user's question is not trivial. It relies on a massive infrastructure, dozens of classifiers, and a hybrid retrieval pipeline. Additionally, we use a specialized LLM pipeline to generate accurate citations for every piece of information in the response, which also serves as a way to fact-check everything the LLM outputs. For example, if the answer references a specific SEC filing, the LLM provides an exact citation, guiding the user directly to the original document.

LLM Evaluation and Monitoring

Evaluating and monitoring an LLM-based Retrieval Augmented Generation system presents its own challenges. Any problem could originate from various components, such as data pipelines, machine learning models for structuring data, the retrieval search and vector representation, the reranker, or the LLM itself.
Identifying the root cause of an issue requires a comprehensive understanding of each part of the infrastructure and its interactions, ensuring that every step contributes effectively to the overall accuracy and reliability of the system. To address these challenges, we have developed specialized monitoring tools that help us catch potential errors across the entire pipeline. We also use Datadog to store a lot of logs so we can quickly identify and fix production issues.
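Returning to the LLM Benchmarking Service described earlier in this post, the routing idea can be sketched as a lookup over benchmark results: for a given task, pick the most accurate model that fits a latency budget. Everything here is hypothetical (the `BenchmarkResult` fields, the model names, and the scores), and the real service is certainly richer than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkResult:
    model: str         # hypothetical model identifier
    task: str          # e.g. "classify_query" or "complex_reasoning"
    accuracy: float    # fraction of benchmark cases passed, between 0 and 1
    latency_ms: float  # median latency observed on the benchmark


def pick_model(results, task, max_latency_ms=None):
    """Return the most accurate model for `task` within an optional latency budget."""
    candidates = [r for r in results if r.task == task]
    if max_latency_ms is not None:
        candidates = [r for r in candidates if r.latency_ms <= max_latency_ms]
    if not candidates:
        raise LookupError(f"no benchmarked model satisfies task={task!r}")
    # Highest accuracy wins; ties go to the faster model.
    best = max(candidates, key=lambda r: (r.accuracy, -r.latency_ms))
    return best.model


# Illustrative numbers only, not measurements from the post.
RESULTS = [
    BenchmarkResult("large-model", "classify_query", 0.97, 900.0),
    BenchmarkResult("small-fast-model", "classify_query", 0.95, 120.0),
    BenchmarkResult("large-model", "complex_reasoning", 0.91, 2500.0),
    BenchmarkResult("small-fast-model", "complex_reasoning", 0.62, 300.0),
]

if __name__ == "__main__":
    print(pick_model(RESULTS, "classify_query", max_latency_ms=200))
    print(pick_model(RESULTS, "complex_reasoning"))
```

Because callers only name a task, swapping in a newly released model means adding its benchmark rows, which is one way to get the model-agnostic interface the post argues for.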

AI

Backend

HTML

Nicolas Bustamante 1 year ago

Fintool, Warren Buffett as a Service

As a dedicated Warren Buffett fan, I've made it a point to attend the Berkshire Hathaway Annual Meeting every year since I moved to the US. His personal values have greatly influenced my ethics in life, and I'm fascinated by his approach to business. I've written numerous blog posts over the years on investing, competitive moats, Intelligent CEOs, or whether to buy a house, all inspired by Buffett. Concepts like margin of safety and buying below intrinsic value were key to running and eventually selling my previous startup.

When I sold my previous company, a legal search engine powered by AI, I invested a portion of my gains into BRK stock, trusting in Buffett's methodology. But as someone who has spent over a decade working in AI, a question kept nagging at me: Could an advanced language model do what Warren Buffett does?

Jim Simons from Renaissance Technologies made over $100B in profits by using machine learning to analyze vast amounts of quantitative data to identify subtle patterns and anomalies that can be exploited for trading. He relied heavily on quantitative data, but what if we could do the same for qualitative textual data now that LLMs have reasoning capabilities?

Warren Buffett's letters, biographies, and investment decisions provide a wealth of knowledge about how to find, analyze, and understand companies. There are even textbooks on value investing that detail the step-by-step process. What if we could break down Buffett's process into individual tasks and use an AI agent to replicate his approach?

At Fintool, we took on that challenge. We deconstructed most of the tasks that Buffett performs to analyze a business (reading SEC filings, understanding earnings, evaluating management decisions) and we built an AI financial analyst to handle these tasks with precision and scale.

In some fields, like law, language models are already performing well.
Ask an AI to draft an NDA or a Share Purchase Agreement (SPA), and it can quickly generate a document that's almost ready to go, with minor tweaks. At worst, you might need to provide some context or feed in additional documents, but the model already knows the structure and intent.

Ask ChatGPT to generate a Non-Disclosure Agreement (NDA) for a software company and it will do great. Ask ChatGPT to analyze the owner earnings over the past 5 years of founder-led companies in the S&P 500 and it will fail.

Finance both demands the strengths of LLMs and exposes their weaknesses. Financial professionals require real-time data, but advanced LLMs like GPT-4 have a knowledge cut-off of October 2023. There is zero tolerance for errors: hallucinations simply aren't acceptable when billions of dollars are at stake. Finance involves processing vast numerical data, an area where LLMs often struggle, and requires scanning multiple companies comprehensively, while LLMs can struggle to effectively analyze even a single one. The combination of financial data complexity, the need for speed, and absolute accuracy makes it one of the toughest challenges for AI to tackle.

Let's go back to our question: Compare the owner earnings over the past 5 years of founder-led companies in the S&P 500. Our LLM Warren Buffett needs to do the following:

- Identify founder-led companies within the S&P 500 by reading at least 500 DEF14A Proxy Statements (approximately 100 pages per document).
- Understand that Owner Earnings = Net Income + Depreciation and Amortization + Non-Cash Charges - Capital Expenditures (required to maintain the business) - Changes in Working Capital.
- Extract financial data from the past 5 years (net income, CapEx, working capital changes) for the 500 companies by reading at least 2,500 annual reports.
- Compute the data by comparing year-over-year owner earnings growth or decline, looking at trends such as increasing CapEx, expanding net income, or significant working capital changes.
- Write a comprehensive, error-proof report.

This is very hard: every step has to be correct. Institutional investors ask hundreds of questions like that.

By reading Buffett's shareholder letters, biographies, and value investing textbooks, we broke down Buffett's workflow into specific tasks. Then, we started building our infrastructure piece by piece to replicate these tasks for institutional investors, allowing them to quantitatively and qualitatively analyze a business.

I won't go into the hundreds of tasks we identified, but for instance, we created a "screener API" where you can ask qualitative questions on thousands of companies, like "Which tech companies are discussing increasing Capex for AI initiatives?". With just one data type (SEC filings and earnings calls), we have 70 million chunks, 2 million documents, approximately 500GB of data in Elastic, and around 5TB of data in Databricks for every ten years of data. And that's just one part of the vast amount of data we handle!

[Figure: From Fintool company screener]

We also built another API for our agents that can retrieve any number from any filing, along with its source. Additionally, we have an API that excels at computing numbers efficiently. For that challenge, we have partnered with OpenAI on a research project to use LLMs to extract every data point in SEC filings. Every week, we process 50 billion tokens, equivalent to 468,750 books of 200 pages each, or 12 times the size of Wikipedia. Our sophisticated data pipelines are designed to locate, verify, deduplicate, and compare every data point for accuracy and insight.

[Figure: Fintool “Spreadsheet Builder” answering a question on precise data points]

We are continuously adding new capabilities to our infrastructure. Our Warren Buffett Agent will use these APIs around the clock to find investment opportunities, analyze them, and respond to customer requests. Although the final product is still in development, we already have a live version in use.
The results are promising. Fintool reaches 97% in FinanceBench, the industry-leading benchmark for financial questions for public equity analysts, far outpacing any other model.

Delivering Practical Value to Customers Today

I refuse to let our website be a placeholder with vague statements like "we are an AI lab building financial agents." Instead, every part of our growing infrastructure is put to practical use and sold to real customers, including major hedge funds like Kennedy Capital and companies like PwC. Their feedback is essential in refining our product, which we believe will be a significant advancement for the industry.

Today, customers use Fintool to ask broad questions like "List consumer staples companies in the S&P 500 that are discussing shrinkage?" or niche questions like "Break down Nvidia CEO compensation and equity package." They can also configure AI agents to scan news filings for critical information such as an executive departure or earnings restatements. This is only the beginning.

Why It Will Be Big

Institutional investors are among the most highly paid knowledge workers in the world. They make millions for their ability to sift through thousands of SEC filings, spot insights, and make calculated decisions on which companies to back. As Greylock noted in their article on vertical AI: “There are several attributes that make financial services well-suited to AI. The market is huge, with $11 trillion in market cap in the U.S. alone, and there's demonstrated demand for AI tools.” We couldn't agree more.

When you look at the daily responsibilities of these professionals, it's easy to see where AI fits in. The work requires a mix of mathematical expertise and human judgment. Yet a significant portion of their workload involves mundane, manual tasks that Fintool's AI can automate and optimize.

A Massive and Profitable Industry

The financial research industry is one of the largest and most profitable software verticals in the world, dominated by a handful of key players.
Just take a look at the numbers:

- Bloomberg: $12B in revenue
- S&P Global: $12.5B in revenue, $6.6B EBITDA
- FactSet: $1.8B in revenue, $842.5M EBITDA
- MSCI: $2.5B in revenue, $1.7B EBITDA

These companies are highly successful because financial professionals are willing to pay a premium for tools that give them an edge. Active investment managers spend more than $30B per year on data and research services.

[Figure: A Bloomberg Terminal]

The Economics of AI in Finance

Adding to that, the unit economics of using AI are vastly better than hiring human analysts. At Fintool, we're building software that can replace expensive knowledge workers, automating processes that once required teams of analysts. This matters all the more because the industry faces a talent shortage. According to the venture firm NFX, “The biggest opportunities will exist where the unit economics of hiring AI are 100x better than hiring a person to do the job.” At Fintool, we fit perfectly into that framework. Here's why:

- Automatable Processes: From screening SEC filings to running detailed financial models, a large part of an investor's workflow can be done by AI.
- Cost Savings: In an industry where top analysts are paid millions, the cost savings from using AI are astronomical.
- Hiring Challenges: Recruiting top financial analysts is a competitive and costly process, often with long onboarding periods. AI can eliminate these pain points.
- Tool Fragmentation: Today's financial professionals juggle a wide array of tools. Fintool consolidates these into one powerful platform.
- Vast Training Data: Fintool leverages proprietary data and vast amounts of public filings to create a unique advantage.

We're creating Warren Buffett as a service: a platform that uses advanced language models to find financial opportunities at scale. With the unit economics favoring AI, and the immense potential to revolutionize how institutional investors work, we believe Fintool is positioned to be the next big thing in financial analysis.
If we succeed, we won't just be building a tool to analyze businesses; we'll be building the future of how financial professionals make decisions.
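The owner-earnings definition given earlier in this post, together with the year-over-year comparison step, lends itself to a small worked sketch. The field names and all figures below are made up for illustration (in millions of dollars); only the formula itself comes from the post.

```python
from dataclasses import dataclass


@dataclass
class YearFinancials:
    year: int
    net_income: float
    depreciation_amortization: float
    other_non_cash_charges: float
    maintenance_capex: float
    change_in_working_capital: float


def owner_earnings(f):
    """Net income + D&A + non-cash charges - maintenance CapEx - change in working capital."""
    return (f.net_income + f.depreciation_amortization + f.other_non_cash_charges
            - f.maintenance_capex - f.change_in_working_capital)


def yoy_growth(series):
    """Year-over-year owner-earnings growth, keyed by the later year."""
    ordered = sorted(series, key=lambda f: f.year)
    growth = {}
    for prev, curr in zip(ordered, ordered[1:]):
        base = owner_earnings(prev)
        if base > 0:  # a growth rate on a zero or negative base is not meaningful
            growth[curr.year] = owner_earnings(curr) / base - 1
    return growth


# Hypothetical company, figures in $M.
fy22 = YearFinancials(2022, 100.0, 20.0, 5.0, 30.0, 10.0)
fy23 = YearFinancials(2023, 120.0, 22.0, 6.0, 35.0, 11.0)

if __name__ == "__main__":
    print(owner_earnings(fy22), owner_earnings(fy23))
    print(yoy_growth([fy22, fy23]))
```

The arithmetic is the easy part. The hard part the post describes is filling these fields correctly from thousands of filings, for example separating maintenance CapEx from growth CapEx, which filings rarely report directly.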

Business Analytics

AI

Nicolas Bustamante 1 year ago

How to build a shitty product

Everyone wants the recipe to build a great product. But if you take Charlie Munger's advice to "always invert," you might ask: How do you build a truly shitty product? One that's confusing, frustrating, hard to understand, and makes you want to throw your computer out the window.

Every organization sets out with the intent to build a good product. So why do so many of them end up creating something average? The answer lies in the structure and approach of the product team. A typical product team is composed of product managers, designers, and developers. Product managers (PMs) are the main touchpoint with users; they gather feedback, create specifications, and organize the roadmap. Designers create what they believe is a user-friendly UI/UX based on the PM specs and their interpretation of user needs. Developers, who may include data engineers, backend, frontend, and full-stack specialists, take these specifications and implement them in a product.

Product teams often fall into the trap of designing and building based on assumptions or abstract user personas rather than real user interaction. PMs become gatekeepers of feedback, filtering and interpreting user needs before they ever reach designers or developers. By the time insights get translated into product decisions, they've lost touch with what users actually experience. This lack of direct feedback leads to products that don't solve real problems because the team is too insulated from the people they're building for.

Too often, product specifications are shaped by internal company constraints (usually engineering limitations) rather than customer needs. As Steve Jobs famously said, "You've got to start with the customer experience and work backwards to the technology." Inverting this process, where the tech defines what's possible instead of the customer's needs, is a fast track to building something nobody wants.
Over-specifying also kills innovation because developers are reduced to coders implementing someone else's vision, without any flexibility to improve or innovate. The typical product team works sequentially: PMs specify, designers design, and developers build. This waterfall mentality feels efficient on paper but is inherently rigid. When each step is done in isolation, the process becomes fragile and slow to adapt to new information. The longer each team works in their silo without iteration, the more likely the end product will miss the mark. Who's ultimately responsible for the product's success or failure? PMs? Designers? Developers? Bureaucracy tends to dilute responsibility, and when everything is driven by consensus, mediocrity often follows. Consensus avoids disasters, but it also avoids greatness. True ownership, where someone is accountable for both success and failure, is missing. Some teams get so caught up in Agile, Scrum, or other project management frameworks that they forget the ultimate goal is building something users love. Meetings, standups, and sprint planning become bureaucratic rituals that distract from the real work. To build something truly great, you need craftsmen. Product builders who are deeply invested, who care about every detail, and who take responsibility from beginning to end. The builder has to be as close as possible to the customer. Talk to them, visit them in person, answer support queries, watch them use the product, and demo it to them. This kind of empathy—truly putting yourself in the customer's shoes—is rare. Builders also need to understand the customer’s underlying problem, not just the feature requests they articulate. Customers may ask for specific features, but often they don't know the best solution; they just know their pain points. The job of a great product builder is to uncover the real issue. 
As Paul Graham once said, " Empathy is probably the single most important difference between a good hacker and a great one. " You need to understand how little users understand, and if you assume they’ll figure it out on their own, you’re setting yourself up for failure. Builders need to use the product they create. That’s why B2C products are often better than B2B ones—builders use what they build and feel the pain of its shortcomings. Most great B2B products, like Vercel or GitHub, are made for developers by developers. It’s much harder to eat your own dog food when building vertical applications for niche users, like lawyers or doctors, but the best craftsmen find a way. The best products come from small, tight-knit teams with clear responsibility. When it’s easy to identify who’s responsible, it’s easier to make great things. Small teams can iterate quickly, and greatness comes through iterations. The boldest approach is to have the same person design, build, and refine the product. With AI coding tools, it's now possible to have a good engineer with taste and empathy that goes from listening to users to implementing a solution, without the need for PMs or designers. Instead of trying to launch a complete, polished product out of the gate, focus on building something small and functional. Once you have that, get it into the hands of users and iterate quickly based on their feedback. The magic happens in iteration, not in perfectionism. Real users will help you refine your ideas and identify what’s actually valuable. The faster you can cycle through feedback loops, the better your product becomes. Building a delightful product for a few core users is often better than trying to build something for everyone. By focusing on a specific audience, you can deeply understand their needs and create something truly valuable. A product that solves real problems for a small, dedicated group is more likely to gain traction and eventually appeal to a wider audience. 
When you build for core users, you create passionate advocates who can help drive growth organically.

Paul Graham's "taste" metaphor from Hackers and Painters applies here: you should always strive for good taste in both code and design, removing unnecessary complexity. Simplicity doesn’t mean lacking features; it means that every feature has a purpose, and every line of code serves the user. Good taste in design and code means prioritizing what truly matters to users and avoiding bloat. A simple, elegant product is not only easier to maintain but also more delightful to use. It's also essential to kill features over time: removing what is no longer needed or valuable ensures the product remains focused and effective.

Great products are built by small teams, yet this is exactly where most companies stumble. Big teams introduce layers of complexity, miscommunication, and slow decision-making. Small teams are nimble, communicate better, and move faster. When a team is small, it’s easier to stay aligned on the mission, and everyone has a clear stake in the product’s success. It also prevents diffusion of responsibility: everyone is accountable.

This sounds ideal, but it's not the default approach, especially in large companies. Why? Because big companies prefer reducing the standard deviation of outcomes. Only a small percentage of developers can design great software independently, and it’s difficult for management to hire them, since they often don't like working for bureaucratic organizations. Instead of trusting one brilliant craftsman, most companies opt for a system where design is done by committee and developers just implement the designs. This approach reduces uncertainty. Great results are traded for predictably average ones. If a brilliant craftsman leaves, the company could be in trouble, but if a committee member leaves, it doesn't matter. There’s redundancy in every role. Take Google: you could fire half the workforce, and it would barely affect product quality.
But if you fired someone like Jony Ive from Apple’s small design team, there would be no iPhone. Similarly, look at Telegram Messenger—one of the best digital products ever. They have close to 1 billion active users and yet a small team of just 30 engineers. Pavel Durov takes all the customer-facing decisions while his brother and co-founder, Nikolai, handles decisions regarding infrastructure, cryptography, and backend. They've created amazing results, but if Pavel, Nikolai, or key programmers were to leave, the product would stagnate. Big companies dampen oscillations; they avoid disaster, but they also miss the high points. And that’s fine, because their goal isn’t to make great products—it's to be slightly better than their competition. As a reminder, my new startup is called Fintool . We are building Warren Buffett as a service, leveraging large language models (LLMs) to perform the tasks of institutional investors. We follow an approach that emphasizes small teams with clear responsibilities, a lack of rigid roles like product managers, and a relentless focus on speed and iteration. We keep our team extremely lean, with each member responsible for a specific section of the product. For example, we have one team member focused on data engineering to ingest terabytes of financial documents, another on machine learning for search, retrieval, and LLMs, and a full-stack engineer working on the product interface. By assigning clear ownership to each team member, we ensure accountability and expertise in every aspect of our product. Our accountability is customer-first, with engineers often emailing and interacting directly with customers. This approach means customers know exactly who to blame if something doesn't work. We believe high-performing teams do their best work and have the most fun in person. Remote work is highly inefficient, requiring the whole team to jump on Zoom meetings, write notes to share information, and lacking serendipity. 
Serendipity is the lifeblood of startups: one good idea shared spontaneously at the coffee machine can change the destiny of the company. Additionally, we value each other's company too much to spend our days in boring Zoom calls.

We encourage every craftsman on our team to talk directly with customers, visit them in person, and implement the best solutions. We value discussions and brainstorming, but we minimize meetings to maintain fast iterations and provide high freedom for team members to choose their approach. We follow the "Maker's Schedule," as described by Paul Graham: makers need long, uninterrupted blocks of time to focus on deep work. A typical maker’s day is structured around productivity and creativity, where interruptions or frequent meetings can be disruptive. (I hate meetings.)

We value speed and push to production every day. One of our core values is to "Release early, release often, and listen to your customers." Speed matters in business, so we push better-than-perfect updates to customers as soon as possible. We believe mastery comes from repeated experiments and learning from mistakes; it's about 10,000 iterations, not 10,000 hours.

Another company value is "Clone and improve the best." We don't reinvent the wheel; we enhance proven successes. We are shameless cloners standing on the shoulders of giants. If a design or an existing pattern works well for our use case, we will copy it.

Using AI tools, like Cursor the AI code editor, is mandatory at Fintool. We believe AI provides a massive productivity advantage. Most people prefer sticking to their old ways of working, but that’s not how we operate. We won't hire or retain team members who aren't AI-first.

With the speed of AI-assisted front-end coding, we believe that traditional design tools like Figma are becoming less necessary. Anyone can create a nice-looking Figma until they start implementing and discover UX challenges.
By leveraging a standard component library like Shadcn UI and using tools that convert prompts directly into interfaces, we can iterate faster and achieve better outcomes. A skilled engineer with good taste can design efficient and visually pleasing interfaces without the need for a designer. It keeps the team smaller and increases the speed.

Our approach at Fintool focuses on leveraging the strengths of a small, empowered team, with each member deeply connected to the product's success. This method allows for rapid iteration, close customer relationships, and the ability to deliver a product that truly meets user needs. However, the main drawback is the high dependency on our people. If a key team member is on holiday or leaves the company, progress slows down significantly. We also rely heavily on hiring exceptional individuals: those who are not only talented but also open-minded, like to interact with customers, and have a craftsman's mindset and the discipline to work hard. Finding such people is extremely challenging, but it’s essential for building something truly great. It’s hard but worth it. We are hiring.

“There is no easy way. There is only hard work, late nights, early mornings, practice, rehearsal, repetition, study, sweat, blood, toil, frustration, and discipline.” - Jocko Willink

UI

Design

Business

1 views

Nicolas Bustamante 1 years ago

San Francisco Life: Insider Tips ♥️

I moved to San Francisco in August 2021, and it quickly became my favorite city. I love it so much that even when I go on vacation, I’m always excited to come back—sometimes I wish I didn’t have to leave at all. There’s so much to adore about this place: the perfect, temperate weather, the proximity to both beaches and stunning natural spots, the walkable and bike-friendly streets, the charming neighborhoods filled with colorful homes, the incredible food scene, and of course, being surrounded by some of the smartest people on the planet. The green zone is hands down the best part of San Francisco. It’s walkable, quiet, beautiful, and conveniently close to everything—grocery stores, restaurants, you name it. The blue zone is great too, though it has a more upscale feel and is a bit less walkable due to the hills. Still, it has its charm, just with a different vibe. The yellow zone is more affordable, but I wouldn’t recommend it unless you’re an avid surfer—it’s foggy for about half the year. As for the red zone, I’d advise staying away, as it’s at the heart of the city’s drug crisis. Other neighborhoods are fine, a bit more suburban and not quite as close to the action, but they offer a good balance of affordability and quality living. 
Where to eat

- French: Ardoise, Routier
- Pasta: Bella Trattoria, The Italian Homemade Company
- Pizza: Tony’s
- Steak House: House of Prime Rib
- German: Suppenküche
- Mediterranean: Beit Rima (Cole Valley), Kokkari
- Brunch: Le Marais Bakery, Wooden Spoon
- American Breakfast: Pork Store Cafe, Devil's Teeth Baking Company
- Crêpes: La Sarrasine
- Croissants: Arsicault (the one on Arguello; go during the week to avoid an hour-long line), Tartine (good, but not as good as Arsicault)
- Burrito: Underdogs, La Taqueria
- Ramen: Taishoken, Marufuku
- Sushi: Ebisu
- Ice cream: Salt and Straw, The Ice Cream Bar, Philmore Creamery, Bi-Rite Creamery
- Coffee shop: Cafe Reveille, Sightglass, The Mill
- Hot Chocolate: Dandelion
- Bread: The Mill, Jane Bakery, Thorough Bread

Start at the Baker Beach Sea Cliff Access (12 25th Ave, San Francisco, CA 94121) or park here if you have a car. Walk Baker Beach and then climb the Sand Ladder. You will then turn left and start the Batteries to Bluffs Trail till the beautiful bridge view on Battery Boutelle. The trail is amazing. Be ready to climb a lot of stairs! I’ve hiked there more times than I can count and I still love it.

Lands End Trail

I recommend starting here and walking to the Lands End Labyrinth. The views are absolutely stunning and it’s hard to think that you are still in a major city! Most of the trail is kid friendly and it works if you have a stroller.

My favorite beaches

Baker Beach

Baker Beach is where I like to fish, to picnic, and to play Spikeball with friends on a sunny afternoon. I love the incredible view of the bridge and the fact that it’s less windy than Ocean Beach.

China Beach

It’s a cozier and smaller version of Baker Beach. It’s slightly less accessible since you have to go down a hill, but there is parking at the top. I like it even if I prefer Baker because the bridge feels closer. I think what bothers me a bit with China Beach is the abandoned old lifeguard station. So much wasted potential!
Ocean Beach

Definitely my number one beach to watch the sunset and enjoy a good bonfire! My favorite thing is to bike there and stop at Fulton/Great Highway. I’ve been there so many times and it never disappoints. Please check fog.today first to verify that there is no fog at the beach.

Favorite Bike Rides

Hawk Hill

By far my favorite; I sometimes bike there twice a week. Unless you are an experienced biker, you will need an electric bike. I like to rent them from SF Wheels or Unlimited Biking for $80 for the whole day. Climbing Hawk Hill offers the best view of the bridge. The best part? Once you reach the top, the downhill is one of the most stunning rides in California.

Surfing

I’m a beginner wing foiler, and one of the best spots in the U.S. is Crissy Field. I recommend parking at Crissy Field South Beach. If you are more into regular surfing, Ocean Beach is a great spot for experienced surfers. If you are new to surfing, just drive to Pacifica, which is an easier spot!

- Self-driving car: Waymo
- Bike around neighborhoods: Castro, Duboce Triangle, Hayes Valley, Cole Valley up to Ocean Beach via Golden Gate Park
- City hikes: Mount Sutro to Twin Peaks, Baker Beach Coastal Trail, Lands End Trail
- Cable Car: map
- Sunrise: go to Corona Heights or Tank Hill
- Alcatraz Island: book a night tour
- Museums: Academy of Sciences (Thursday late-night opening, with cocktails and a DJ)
- Sunset: verify on fog.today that it’s not foggy and go to Baker Beach or Ocean Beach
- Parks: Dolores, bike through the immense Golden Gate Park, walk in Crissy Field
- Bouldering: Mission Cliffs, Movement
- Surfing: take a lesson in Pacifica or go to Ocean Beach if you are experienced
- Tennis: there are free tennis courts all over the city, like in Buena Vista, or you can book a court in Golden Gate Park
- Jiu-jitsu: Ralph Gracie

Travel

Culture

AI

0 views

Nicolas Bustamante 1 years ago

Burning Billions: The Gamble Behind Training LLM Models

Why don’t you train your own large language model? I've been frequently asked this question over the past year. I wrote this piece in September 2023 but never published it, thinking the answer was obvious and would become even more apparent with time. I was asked the same question twice last week, so here is my perspective.

As a reminder, Fintool is an AI equity research analyst for institutional investors. We leverage LLMs to discover financial insights beyond the reach of human analysis. Fintool helps summarize long annual reports, compute numbers, and find new investment opportunities. We have a front-row seat to witness how LLMs are revolutionizing the way information is organized, consumed, and created.

Training large language models is challenging. It requires billions in capital to secure GPUs, hundreds of millions to label data, access to proprietary data sets, and the ability to hire the brightest minds. Vinod Khosla, an early OpenAI investor, estimated that “a typical model in 2025 will cost $5-10b to train.” Only hyperscalers like Google, Meta, or Microsoft, who are already spending $25B+ in CAPEX per year, can afford this game. A company like Meta can increase its CAPEX guidance by 3+ billion dollars to train frontier models, and that’s not a big deal considering their $43.847B free cash flow per year. Good luck competing with those guys!

The additional challenge is the requirement to always train the next frontier model to stay in the race. If your model is not first, it might as well be last. Users and customers gravitate towards the best, leaving little market for inferior models. It’s a power law where the model with the optimal mix of intelligence, speed, and cost-effectiveness dominates. It’s a multi-billion dollar recurring expense, and the window for monetization is limited to the short time your model can stay at the top of the leaderboard before being outperformed.
Sequoia Capital recently emphasized that an estimated $600 billion in revenue would be necessary to justify the massive investments in AI data centers and GPUs. In my view, as seen in most technological booms, a large portion of the money invested will ultimately be wasted, similar to the dot-com bubble that led to excessive investment in telecom infrastructure. The telecom boom saw massive capital inflows into building out networks and laying vast amounts of fiber optic cables. Companies thrived initially, but as the bubble burst, it became evident that much of the infrastructure was redundant, leading to significant financial losses. Global Crossing filed for bankruptcy with $12.4 billion in debt, while WorldCom went bankrupt with $107 billion in largely worthless assets. Similarly, the current surge in investment for LLM infrastructure risks leading to overcapacity and inefficiencies. While a few key players may achieve significant rewards, many others will likely face considerable financial setbacks.

Most companies entering the LLM race fail despite massive investments. Bloomberg's effort, BloombergGPT, trained on 363 billion tokens, was quickly outperformed by GPT-3.5 on financial tasks. Even well-funded startups struggle: Inflection, despite raising $1.525 billion, was acqui-hired by Microsoft. Adept, with $415M in funding, is rumored to be exploring a sale, and models developed by Databricks, IBM, or Snowflake are today absent from top LLM rankings.

When I explain why Fintool doesn’t train its own LLM, the pundits always ask: “Well in that case, why don’t you fine-tune your model on your vertical?”

The reason for fine-tuning is the hope of getting better quality on a set of tasks while reducing cost and increasing speed, because fine-tuned models are smaller than generalist models. In my opinion, this approach is not yet yielding results worth the millions invested.
For instance, OpenAI developed Codex, a model fine-tuned on a large corpus of code, and that model was outperformed by GPT-4, a large generic model. The same was true for fine-tuned text-to-SQL models, which were better on some narrow benchmarks but got outclassed by the next general model release. So far, every fine-tuned model has been outclassed by the next big generic model. The rapid decline in LLM prices, coupled with significant improvements in quality and latency, makes such investments increasingly unjustifiable, in my opinion.

If you don’t like losing millions and billions of dollars, it’s better to stay away from this game. For most organizations, training or fine-tuning is driven by FOMO and a lack of understanding of technological trends. Only a few players, like B2C companies such as Character.ai, which processes 20,000 queries per second (approximately 20% of Google’s search volume), require their own models.

LLMs are such a commodity that a leaked Google memo stated, “We have no moat, and neither does OpenAI.” It’s fairly easy to switch models, and the fact that open-source models are getting better accelerates the commoditization. There is still a premium for the most intelligent model, but most tasks don’t require the best intelligence. Commoditized tasks are already worth zero, while harder tasks are worth something but not much.

Training LLMs and selling intelligence as a service is not a great business. FutureSearch estimated that OpenAI makes $2.9B a year from ChatGPT products versus $510M a year from the API. The fact that the API of the leading provider is only about 15% of their revenue exemplifies that most of the value creation and value capture happen at the application layer. Application layers like Fintool are developing model-agnostic infrastructure tailored to specific use cases, leveraging improvements in any AI model.
Just as Charlie Munger practices "sit on your ass investing," waiting for the market to recognize the intrinsic value of his investments, I practice "sit on my ass product building," where I focus on creating complex workflows that meet specific user needs, while anticipating that AI models will become better, faster, and cheaper.

When we started Fintool, the cost of analyzing an earnings call for a complex task was roughly $1 with GPT-4. A year later, the cost for GPT-4 has dropped by 79.17%, and the model is significantly smarter and faster. By running open-source models, we dropped the price to less than $0.01. So, while not wasting our time and money on training or fine-tuning, we got better quality and speed with a price drop of more than 99%. What’s not to like?
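As a sanity check on the numbers in this post, the short Python sketch below redoes the arithmetic. It assumes the two revenue figures quoted above ($2.9B for ChatGPT products and $510M for the API) make up all of the revenue being compared, and it treats the $1 and $0.01 per-call costs as round illustrative values. The function names are made up for this example.

```python
def share(part: float, *others: float) -> float:
    """Return `part` as a fraction of the total of all the figures given."""
    return part / (part + sum(others))


def price_drop(old: float, new: float) -> float:
    """Return the relative price decrease going from `old` to `new`."""
    return (old - new) / old


# Revenue figures quoted in the post, in millions of dollars.
chatgpt_revenue = 2900.0
api_revenue = 510.0
api_share = share(api_revenue, chatgpt_revenue)

# Cost of analyzing one earnings call, in dollars.
cost_at_launch = 1.00
cost_after_gpt4_cut = cost_at_launch * (1 - 0.7917)  # the 79.17% drop quoted above
cost_open_source = 0.01  # upper bound: the post says "less than $0.01"

print(f"API share of revenue: {api_share:.1%}")
print(f"GPT-4 cost a year later: ${cost_after_gpt4_cut:.2f}")
print(f"Drop with open-source models: at least {price_drop(cost_at_launch, cost_open_source):.0%}")
```

Under these assumptions the API comes out at roughly 15% of the combined figure, the GPT-4 cost a year later is about $0.21 per call, and a cost of $0.01 corresponds to a 99% drop from the original $1.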

Business

Sql

AI

Go

Machine Learning

0 views

Nicolas Bustamante 2 years ago

What We Learned Building the Largest GPT-Telegram Bot

Hello friends,

I co-founded Doctrine, one of the largest AI legal search engines, and despite working on a search product for years, ChatGPT blew my mind. The underlying technology, commonly referred to as a large language model (LLM), is as revolutionary as the printing press or the internet.

I was initially skeptical about yet another wave of AI hype, but the fusion of chat interfaces with LLMs got me excited. To understand the technology, my YC co-founder Edouard and I built Lou, the most popular GPT-4 powered chatbot on Telegram Messenger. With thousands of active users posing tens of thousands of questions daily, it became the ideal platform to understand the current state of the technology and explore potential use cases. Let me tell you what I have learned.

Chat-based interfaces are the future of the web. In most cases, it's easier to ask a question to a chat and get an answer rather than browsing the web and reading websites. It's a paradigm shift.

- Search paradigm: keywords -> click on several links -> read webpages -> answer
- Chat paradigm: question -> answer

It means most users no longer need to go on Google or visit a website. Google! Websites! It's the end of the internet as we know it. There are days when I don't search at all; I chat. I ask Lou all my questions, such as:

- Show me their popular API endpoints for the Telegram bot API
- Write a short text message to my landlord to give him my notice.
- Recommend me a good book about Charlie Munger.

Furthermore, Lou offers a more intimate experience compared to Google. We discovered that some users even refer to Lou as their "best friend." Essentially, it's like having a brilliant friend available to help you around the clock. As a result, information retrieval has become a deeply personal experience.
It wouldn't be surprising if, in the near future, people forge strong friendships or even romantic connections with their AI companions. As voice and image generation technologies advance, the possibilities are virtually limitless. Operating an LLM-powered chatbot has led me to believe that people will increasingly rely on chat interfaces rather than traditional search. Chatting effectively consolidates keyword searching, link clicking, and website browsing into a single process. This approach is faster and more personalized and delivers higher-quality results. Naturally, chat models have some limitations at present. They lack access to live data, possess no memory, exhibit poor formatting, may generate irrelevant information, and do not suggest follow-up questions. However, these issues are solvable. We plan to release an updated version of Lou that enables users to access news, make purchases, check stock prices, and explore a host of other capabilities. As a result, I foresee chat-based interfaces capturing a substantial portion of the market share from Google. This shift is already evident, as ChatGPT reached 100 million active users within a few weeks. To provide context, Bing, which launched in 2009, only achieved 100 million daily active users in the previous month. Who will become the next Google? On one side, OpenAI holds all the cards. However, they may choose to concentrate on developing an infrastructure company that enables artificial general intelligence (AGI), rather than pursuing a B2C startup. On the flip side, tech giants like MAMAA face a daunting innovator's dilemma due to their bureaucratic nature. Embracing the chat interface could significantly reduce their ad search revenue. Nevertheless, they possess a captive user base, control distribution channels, operating systems, and even produce hardware! It's hard to tell who will do it, but it will transform the web. 
The global, horizontal chat interface is poised to dominate the internet in ways Google could never have imagined. This chat will serve as a super aggregator, maintaining direct relationships with users and enjoying near-zero marginal costs for onboarding new users while commoditizing suppliers. User interactions with the internet will increasingly occur via chat, compelling suppliers (all websites) to adapt their architecture to align with chat APIs. Why would anyone visit Zillow to find an apartment, Booking to reserve a hotel, or NerdWallet to compare insurance when the super-chat can provide answers and facilitate direct purchases? Just as these services previously optimized their products to fit Google's algorithms, they will now tailor their offerings to suit the chat interface. Commoditization will reach unprecedented levels, as, in many cases, websites will no longer differentiate value propositions. The super-chat will prioritize the fastest, most affordable, and highly-rated options, driving commoditization and reducing profit margins to benefit consumers. Only the best horizontal player will withstand this shift. I also believe that AI chat solutions integrated vertically in the fields of legal, finance, and healthcare will evolve into monster businesses. I also anticipate a gradual transition from text-based to voice-based interfaces. Why type when you can converse with your AI assistant? In the long run, we may not even need phones, as earbuds and smart glasses could suffice. All right! Moving away from speculative ideas, let me share our insights from a technical perspective. The most remarkable experience is that GPT generates a significant amount of code, shortening our product development cycle. You can literally ask to describe the Telegram API and write Python code to create a bot. How wild is that? We currently dramatically underestimate the productivity boost from this technology for humanity. 
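To give a feel for the kind of Python the paragraph above refers to, here is a minimal sketch of a long-polling Telegram bot that uses only the standard library. It relies on the Bot API's `getUpdates` and `sendMessage` methods. The `answer` callback is a stand-in for the call to a language model, and the helper names (`api_url`, `call`, `reply_for`, `run_bot`) are assumptions made for this example, not code from Lou.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Optional

API_ROOT = "https://api.telegram.org"


def api_url(token: str, method: str) -> str:
    """Build the URL for a Bot API method such as getUpdates or sendMessage."""
    return f"{API_ROOT}/bot{token}/{method}"


def call(token: str, method: str, **params) -> dict:
    """POST form-encoded parameters to the Bot API and decode the JSON reply."""
    data = urllib.parse.urlencode(params).encode()
    with urllib.request.urlopen(api_url(token, method), data=data, timeout=60) as resp:
        return json.load(resp)


def reply_for(update: dict, answer: Callable[[str], str]) -> Optional[dict]:
    """Turn one update into sendMessage parameters, or None if it has no text."""
    message = update.get("message") or {}
    text = message.get("text")
    if not text:
        return None
    return {"chat_id": message["chat"]["id"], "text": answer(text)}


def run_bot(token: str, answer: Callable[[str], str]) -> None:
    """Long-poll for updates forever and answer every text message."""
    offset = 0
    while True:
        updates = call(token, "getUpdates", offset=offset, timeout=30)
        for update in updates.get("result", []):
            offset = update["update_id"] + 1  # acknowledge this update
            params = reply_for(update, answer)
            if params is not None:
                call(token, "sendMessage", **params)
```

In use, `answer` would wrap a request to whichever model backs the bot, for example `run_bot(token, ask_model)`, where `ask_model` is a hypothetical function that takes the user's question and returns the model's reply.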
Another great thing is that GPT models are excellent at various NLP tasks, from coding to translating to creating a recommendation system. Instead of using several machine learning models, we can use one API for almost everything. GPT outperforms most of the models out there, regardless of their specialization. For instance, GPT-4 outperforms Codex, an OpenAI model fine-tuned to write code.

You might think it's expensive to run all your backend tasks on GPT, and you're partially correct. Yes, it's expensive, but not for long. It's a contrarian take, but I think that LLMs will quickly be commoditized. The model's performance tends to plateau at a certain point. For tasks like finding an entity in a document or classifying questions, GPT-4 excels, but so do numerous open-source models. As time goes on, the quality and performance of these freely available open-source models keep improving, steadily narrowing the gap between them and their GPT counterparts. This progress promotes a competitive environment where cutting-edge technology becomes increasingly accessible to a wider audience.

Consequently, the cost of using such models is expected to decline over time. OpenAI's recent substantial price reduction for its GPT-3.5 API serves as an example of this trend. Moreover, each day sees the rise of open-source models achieving GPT-like performance in specialized areas. It's likely that, in the near future, most chat interfaces will employ multiple models concurrently, directing queries to those that provide the most accurate responses at the most competitive rates. I foresee that most tasks performed by large language models (LLMs) will be available at no cost except for highly complex tasks. The crucial factor will be maintaining a direct relationship with users and having access to a comprehensive, private dataset.

Ok, now, something weird! My most peculiar experience involved prompt engineering.
Giving the model guidelines, such as specifying a particular formatting type, is done not through code but with plain English instructions. You communicate with the model in the same manner you would with a human, not a machine! For example, our prompt related to our "code assistant" might be something like: "As an advanced chatbot Code Assistant, your primary goal is to assist users to write code. This may involve designing/writing/editing/describing code or providing helpful information. Where possible you should provide code examples to support your points and justify your recommendations or solutions. Make sure the code you provide is correct and can be run without errors. Be detailed and thorough in your responses. Your ultimate goal is to provide a helpful and enjoyable experience for the user. The Format output in Markdown." The paradigm shift is remarkable; the most potent coding language has now become English, not JavaScript or Python! However, I should note that I'm not entirely convinced about the long-term potential of prompt engineering in its current form. We extensively used prompt engineering with GPT-3.5 but later discovered that GPT-4 was so proficient that much of the prompt engineering proved unnecessary. In essence, the better the model, the less you need prompt engineering or even fine-tuning on specific data. What I find even more intriguing is the idea that the model could auto-correct and improve itself, much like a living organism. As LLMs evolve, they have the potential to become increasingly autonomous, enabling them to auto-correct and improve themselves over time. One way this could be achieved is through continuous learning and adaptation, where LLMs refine their responses based on user feedback and real-time data. By giving them access to APIs, they will interact with various information sources to expand their knowledge base and maintain up-to-date information. 
Over time, these advancements could result in self-sufficient AI agents capable of proactively learning from their environment and autonomously enhancing their performance, thereby transforming how we interact with technology and the digital world. Please note that this is not merely science fiction but rather an engineering challenge poised to be solved in the coming months. We live in such an exciting time!

In conclusion, building Lou, the largest GPT-4 powered chatbot on Telegram, has provided invaluable insights into the potential of large language models and chat-based interfaces. The paradigm shift from keyword-based search to chat-based interactions is imminent, and it will redefine the way users engage with the internet. So far, it has been an incredible experience from a learning perspective. We will probably switch to a vertical AI chat product in the future as it better fits our respective backgrounds.
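The multi-model routing idea from earlier in this post (send each query to the cheapest model that is good enough for it) can be sketched in a few lines of Python. Everything here is illustrative: the model names, prices, quality scores, and the keyword-based `estimate_difficulty` heuristic are invented for the example. A real router would more likely use a trained classifier and measured benchmarks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    quality: float         # hypothetical score in [0, 1]
    cost_per_query: float  # hypothetical price in dollars


# Made-up catalogue, from a cheap open-source model to an expensive frontier one.
CATALOGUE = [
    Model("small-open-model", quality=0.6, cost_per_query=0.0005),
    Model("mid-size-model", quality=0.8, cost_per_query=0.005),
    Model("frontier-model", quality=0.95, cost_per_query=0.05),
]

HARD_HINTS = ("prove", "analyze", "write code", "debug")


def estimate_difficulty(query: str) -> float:
    """Crude stand-in for a classifier: returns a required quality in [0, 1]."""
    q = query.lower()
    if any(hint in q for hint in HARD_HINTS):
        return 0.9
    if len(q.split()) > 30:
        return 0.75
    return 0.5


def route(query: str, catalogue=CATALOGUE) -> Model:
    """Pick the cheapest model whose quality meets the estimated requirement."""
    needed = estimate_difficulty(query)
    good_enough = [m for m in catalogue if m.quality >= needed]
    if not good_enough:  # nothing qualifies: fall back to the best model
        return max(catalogue, key=lambda m: m.quality)
    return min(good_enough, key=lambda m: m.cost_per_query)
```

With this catalogue, a short everyday question goes to the cheapest model, while a query containing one of the "hard" hints is sent to the most capable one. The design choice is simply to filter on quality first and then minimize cost.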

JavaScript

Python

AI


Nicolas Bustamante 3 years ago

The End of My Crypto Explorations

My crypto journey started in late 2012 when I encountered Bitcoin while reading about the free banking system for my high school thesis. As a fan of Hayek and Von Mises, I was fascinated by the idea of a currency free from government manipulation. I downloaded Bitcoin Core (the blockchain was less than 10GB!), made some transactions, and looked for things to buy. There were few people to transact with and nothing interesting to buy beyond the stuff on Silk Road. Bitcoin was volatile; its price collapsed from $1000+ in late 2013 to $200ish in August 2015. I watched the space on and off and perceived it as a gigantic casino. Remember Namecoin, MaidSafe, Bitconnect, and Bitshares? All these coins had billions in volume and later disappeared, leaving investors shirtless. I started Doctrine, an AI company operating in the legal industry (think Bloomberg for lawyers). I witnessed the 2017 crypto bubble with thousands of projects raising tens of millions for non-existing products tackling non-existing problems. I was sickened by these pumps and dumps and delighted to use AI to create value for thousands of customers! I ignored the space until 2018, when our first teammate, Antoine Riard, started to contribute to the Lightning Network, a protocol to make instant and cheap Bitcoin payments. Bitcoin had survived, and my friends kept building on Ethereum despite a 90% drop in price. Speculation had dried up, and promising use cases were emerging. I started running a Bitcoin and Lightning node on the weekend to understand the state of the technology. Fast forward: I moved to San Francisco and decided to explore the space in 2022. I was thrilled to join revolutionary young builders working on decentralizing the Internet and improving our financial system. As a Bitcoin enthusiast, I looked at infrastructure products to support the Lightning Network, like node managers or stablecoins on Lightning via RGB.
It was tough because no one had yet built a successful company on the Lightning Network. First, it will take years, if not decades, to develop the network - a thing I've learned running the SF Lightning Dev Meetup - and, second, most people don't want to pay with crypto, especially when current payment systems are improving rapidly (see UPI in India or Pix in Brazil)! Most friends were building on Ethereum and Solana, so I looked at these options. I made it clear that I wasn't interested in building for speculative use cases. In my opinion, trading is a negative-sum game in which unsophisticated market participants lose their savings while exchanges and intermediaries capture gigantic, and often hidden, fees. The unfortunate truth is that the current crypto killer feature is the creation of a global, permissionless, gigantic casino of worthless digital assets. This is quite far from the ideas of decentralization, privacy, and unstoppable digital assets we read about in The Sovereign Individual. Those ideals are worth fighting for, so I started to believe that speculative use cases were temporary anomalies. Yes, token pumps and dumps were disgusting, but tomorrow we will have equity tokens that are way better than the current paper shares. Yes, NFT collections of ugly profile pics are useless - a guy bought a picture of a rock for $1.8M lol - but they hint at the promise of NFTs as digital property rights on a decentralized and open ledger! That was my thought process for accepting today's crypto industry, but it wasn't easy. I met daily with crypto founders raving about their latest multi-million fundraising round or their secret NFT mints in which they flipped a jpeg for thousands of dollars. I asked questions about product usage, pain points solved for customers, and the business model, and I had never felt so old in my life! I was a 27-year-old tech founder, but I felt like a 70-year-old guy asking what seemed like irrelevant questions.
An avalanche of money from investors and retail traders can easily fake product-market fit. I had the great opportunity to help the founders of Nanoly, the largest data aggregator in decentralized finance. Hundreds of thousands of retail investors visited the website to find the best yields for their digital assets. Yield farming was all the rage, with juicy APYs of a couple of hundred percent. Tokens were created out of thin air to reward token liquidity providers. I met full-time yield farmers and people who worked full-time launching tokens to feed this loop. WTF... Ultimately, the high-yield farming market collapsed, leading dozens of companies and funds into bankruptcy. Most of my contacts moved to NFTs, creating several collections of profile pictures and selling them to gamblers. Again, this use case sucks, but the promise of NFTs as unique digital property rights stored on a worldwide and permissionless ledger is interesting. I dug into crypto infrastructure products but came to a harsh realization: crypto speculation is a vast and fast-growing market, while other use cases are small. I've done hundreds of customer interviews and learned that most crypto organizations weren't buying crypto software, which explains why crypto products, from analytics to dev tools, struggle to generate revenue. I understand the narrative that these startups are waiting for the market to grow, but the difference between the Internet in 1999 and crypto today is that Amazon and Netflix had viable customers and growing revenue back then. The bear market helped me have honest conversations with founders. Most of them have raised millions and enjoyed the hype but are now wondering if they will one day reach product-market fit. Speaking of fundraising, I got more offers while exploring crypto, and with better terms, than I could have dreamt of with my previous web2 startup (with tens of millions in ARR, fast-growing and profitable)!
I think there are monster businesses to create in crypto around the casino use case. Anything that reduces the cost of trading or makes trading more convenient will be a big business. Most great businesses in the space are wallets with a trading feature (MetaMask, Phantom, Fireblocks), exchanges (Binance, FTX, OpenSea), fiat on-ramps (MoonPay, Transak), etc. Many founders I've met are iterating in crypto, hoping to launch a startup unrelated to trading. I made the same mistake of finding niches only to realize that the market wasn't there. If there are no viable customers and no traction, then there is no market - even if the idea seems valuable for humanity. In short, it's a good-looking technology looking for problems to solve. Note that crypto is complex; it takes months, if not years, to get a decent understanding of the tech stack. Adding to that difficulty, it's evolving fast, so you have to keep up with the latest developments - proof of stake, sharding, zk-rollups - making developing in the crypto industry harder than in web2. Exploring crypto from a tech perspective is fascinating and takes a long time, but what's today's use case beyond speculation? Even Vitalik Buterin, in a Time interview, recognized that: "The peril is you have these $3 million monkeys, and it becomes a different kind of gambling," adding that "there definitely are lots of people that are just buying yachts and Lambos" and "those are often far from what's actually the best for the world." I had a fantastic time and met very talented builders and explorers; many of them will build great companies inside or outside this industry. Crypto combines the best dreamers pushing the frontier of a decentralized civilization and the worst snake oil scammers. The positive energy in the space and the amount of creative destruction are breathtaking. I do not doubt that the industry will mature over the following decades!
I've decided to stop exploring crypto and focus on other sectors and technologies that suit me better. It's hard to understand the stress and anxiety caused by the constant ups and downs of being a founder. I want to thank all my friends and family for being by my side in my entrepreneurial journey!

AI


Nicolas Bustamante 3 years ago

Startups Selling Sand in the Desert

Today's story is about startup guys who work extra hard, match all their competitors' features, lower their prices, increase the scope of their free plan, spend millions to generate pennies, and give everything to kill their rivals because, after all, it's a war for survival! These guys are selling sand in the desert. Most entrepreneurs compete to be the best. They think there can only be one winner, like in war or sport. To win the competition, rivals must be eradicated by relentless execution, price warfare, and constant product imitation. Those entrepreneurs live in what economists call pure and perfect competition: a competitive state where all companies sell equivalent products, driving prices down to the marginal cost of production and profits toward zero. I confess that as a consumer, I love this zero-sum game. I remember traveling for free in 2015 in San Francisco when Uber and Lyft engaged in a price war. The same happened in Paris in 2017, where I ate at no cost for weeks when food delivery companies were involved in a race to the bottom. What more do you need to be happy in life than free food and free transportation funded by VC money? Compete harder, please! The ones who compete to be the best are losers, because competition is for losers. This form of competitive convergence is the path to mutually assured destruction. Unlike in sport, there can be multiple winners in business. One should aim at being the only one selling water in the desert. The antidote to the disease of competition is a unique and singular value proposition. Michael Porter is one of the brightest minds regarding competitive analysis. His articles What Is Strategy? (1996) and The Five Competitive Forces That Shape Strategy (2008), as well as his many books, are excellent. If you don't have time to read his complex work, I recommend reading Understanding Michael Porter by Joan Magretta.
Porter's solution to the competitive dilemma is to thrive on being unique, not the best, focusing on creating value, not beating rivals. He defines strategy as: "building defenses against the competitive forces or finding a position in the industry where the forces are weakest." He identifies five forces that determine an industry's structure, indicating its competitiveness and thus its profitability:

- The intensity of rivalry among existing competitors. Sometimes, rival firms are irrationally committed to the business, and financial performance isn't the primary goal. For instance, FANG companies often provide products for free, whatever the cost, to preserve their market position. What I worry about the most is dumb guys burning millions hoping to kill competitors. Look at the scooter company Bird; they raised and spent $723M for a business that is today valued at $170M. Even if you were a reasonable entrepreneur in this market, you wouldn't have survived this mindless capital allocation. (btw, thank you for the free rides!)
- The bargaining power of buyers. Influential buyers can push prices down while demanding more product value. The buyer captures all the value created, not the company selling the product. Companies that sell to a highly concentrated industry, such as plane manufacturers or telecommunications carriers, deal with powerful buyers.
- The bargaining power of suppliers. Powerful suppliers will charge high prices and ask for favorable terms, reducing their customers' profitability. Think about companies selling semiconductors in today's shortage. They can ask for outrageous prices because buyers have no alternatives.
- The threat of substitutes. Profitability cannot stay high if it's easy to shift to a product that offers the same value proposition. Most B2B SaaS productivity software falls into this trap. These products have a lot of users but no customers paying a reasonable price because all products are the same and it's easy to switch.
- The threat of new entrants. If it is easy to enter an industry by creating a similar product, then profitability will be low. Amazon Web Services enjoys significant profits because entering its industry is very hard.

I underestimated how much the industry's structure determines business success. In short, as Marc Andreessen put it: the market always wins. The most decisive factor in a startup's success is the market. He wrote: "In a great market -- a market with lots of real potential customers -- the market pulls product out of the startup." "Conversely, in a terrible market, you can have the best product in the world and an absolutely killer team, and it doesn't matter -- you're going to fail." Andy Rachleff sums it up: When a great team meets a lousy market, market wins. When a lousy team meets a great market, market wins. When a great team meets a great market, something special happens. Entrepreneurs should aim at building unique and defensible products in a highly profitable and fast-growing industry. In short, products with significant competitive moats! My idol, Warren Buffett, wrote: "We think of every business as an economic castle. And castles are subject to marauders. And in capitalism, with any castle, you have to expect that millions of people out there are thinking about ways to take your castle away. Then the question is, What kind of moat do you have around that castle that protects it?" It's not the size of the castle that matters but how defensible it is! Buffett again: "The most important thing to me is figuring out how big a moat there is around the business. What I love, of course, is a big castle and a big moat with piranhas and crocodiles." A business protected by crocodiles, excellent! What are these moats?

- Intangible assets: benefits such as patents, brands, reputation, or proprietary processes. Think about Coca-Cola, a company that has sold the same beverage since 1886 and whose brand is a childhood symbol for billions of people. Who can compete with that?
- Scale: it allows a limited number of players to provide low-cost services while enjoying high margins. Think about Vanguard, which has $7.2 trillion of assets under management, allowing it to reduce commissions while still earning profits. Same for retail companies such as Costco or insurance businesses such as GEICO.
- High switching costs: they make it costly and risky for customers to switch providers. ERPs or CRMs such as SAP or Salesforce are so embedded in the customer's organization that it’s nearly impossible to drop this software.
- Network effect: when a service or product becomes more valuable as more people use it. By far my favorite. Consider Facebook; it's not hard to build a similar web app, but it is impossible to replicate its 2.93 billion monthly active users, who generate a great data network effect. I recommend reading the great Network Effect Bible by James Currier.
- Regulation: when the law protects incumbents with, for instance, local rules, FDA approval, or licenses. Regulation significantly increases the cost of entry and sometimes even prevents new entrants from entering the market.

I like to analyze a business from the perspective of competitive moats. From my standpoint, every business attribute is either:

- easy to replicate
- hard to replicate
- impossible to replicate

A great company has many "impossible to replicate" attributes. Teams who focus on building features similar to competitors to "match their feature sets" don't get that a great business is built on uniqueness. A good strategy requires trade-offs; it's more about what you don't do than the stuff that you do. Go unique, or go home! You will be pleased to know that not all moats are created equal. Morningstar did a study comparing competitive moats and their associated profitability. Morningstar learned that wide-moat firms are far more profitable than narrow-moat firms. These wide-moat companies benefit from multiple moat sources that defend their business.
Interestingly, network effect is rated the best moat, while scale is the least likely to drive great performance. What is wild is that only 10% of the 1,500 stocks that Morningstar tracks are considered wide-moat companies! An excellent way to know if a company has powerful moats is to consider its ability to increase prices substantially. Warren Buffett said: "The single most important decision in evaluating a business is pricing power. If you've got the power to raise prices without losing business to a competitor, you've got a very good business. And if you have to have a prayer session before raising the price 10 percent, then you've got a terrible business." My favorite burrito place in San Francisco kept raising prices, trying to keep up with inflation, so I stopped going. Restaurants are a lousy business because of the many alternatives. Sorry guys, my burrito loyalty stops at $15. The goal of a successful enterprise is to earn profits. It means capturing the value in an industry by having a better position than rivals, suppliers, new entrants, substitutes, and even customers! A good way to analyze a company’s performance and its competitive moats is to focus on return on invested capital (ROIC). In the long run, sustainable value creation is driven by the difference between the ROIC and the cost of capital. What matters is the size of that spread, how much capital the company can invest at a rate above the cost of capital, and for how long. The length of the competitive advantage period is crucial. According to Morningstar, the durability of economic profits is far more important than the magnitude. Quoting Buffett in his 1992 letter: "the best business to own is one that over an extended period can employ large amounts of incremental capital at very high rates of return."
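The ROIC argument above can be made concrete with a small sketch. It is a deliberately simplified model: capital and returns are held constant and nothing is discounted, and the function names and all numbers are hypothetical, chosen only to show why the length of the advantage period can outweigh the size of the spread.

```python
def economic_profit(invested_capital: float, roic: float, cost_of_capital: float) -> float:
    """One year of value created: the ROIC spread times the capital employed."""
    return invested_capital * (roic - cost_of_capital)


def cumulative_economic_profit(
    invested_capital: float,
    roic: float,
    cost_of_capital: float,
    years: int,
) -> float:
    """Undiscounted total, assuming capital and spread stay constant for `years`."""
    return economic_profit(invested_capital, roic, cost_of_capital) * years


if __name__ == "__main__":
    # Hypothetical company A: wide spread, but the moat lasts only 3 years.
    short_lived = cumulative_economic_profit(100.0, roic=0.30, cost_of_capital=0.10, years=3)
    # Hypothetical company B: modest spread, but the moat lasts 10 years.
    durable = cumulative_economic_profit(100.0, roic=0.18, cost_of_capital=0.10, years=10)
    print(f"Wide spread, 3 years:     {short_lived:.1f}")
    print(f"Modest spread, 10 years:  {durable:.1f}")
```

With these made-up inputs, the durable business creates more cumulative value (about 80 versus 60 per 100 of capital), despite earning a much smaller spread each year.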
Regarding capital allocation per moat type, I like Connor Leonard's framework:

- Low/No Moat: Companies that may be perfectly well run and sell good products/services, but which do not exhibit characteristics that prevent other companies from competing away their profits if they start earning attractive returns. Most companies fall into this category.
- Legacy Moat-Dividend: A company that is insulated from competition, but does not have much opportunity to grow through reinvesting cash flow. So they pay most of their cash earnings out as dividends.
- Legacy Moat-Outsider: A company that is insulated from competition, but does not have much opportunity to grow through reinvesting cash flow. So they deploy their cash flow in service of acquiring other companies as well as paying dividends and opportunistically buying back stock.
- Reinvestment Moat: A company that is insulated from competition and has the opportunity to reinvest their cash flow into growing the business.
- Capital-Light Compounder: A company that is insulated from competition and has the opportunity to grow but which doesn't need to reinvest much cash to do so and is, therefore, able to return cash to shareholders even while growing.

The stability of the moat over time is a critical factor. Economic moats are rarely stable; they get a little bit wider or narrower every day. There is a relentless regression to the mean in which companies' moats fade and returns trend towards the industry average. In this matter, not all industries are created equal. Some industries regress to the mean quickly, such as food and beverage, while others are slower, such as banking. More importantly, the long-term average differs between terrible sectors such as real estate or utilities and good ones such as software or professional services. Anyway, there are always great defensible businesses in good as well as bad industries. Michael J.
Mauboussin did the above analysis in an article I highly recommend reading: Measuring the Moat: Assessing the Magnitude and Sustainability of Value Creation (2016). I like Mauboussin's work, which showcases a framework for analyzing different industries and companies' positions in the value chain. Mauboussin starts by creating an industry map to understand the competitive landscape and, very importantly, the distribution of profits over time. Focusing on profits is crucial because there are businesses that build great products with millions of users but no ability to generate profits. Mauboussin then measures the industry stability, its attractiveness based on Porter's five forces, and tries to assess the likelihood of being disrupted by innovation. Pro tip: he provides a checklist of questions for assessing value creation on page 53. I think it's an analysis all companies should perform to understand their business. Ok, ok, it is a lot. What did we learn?

- Choose a highly profitable and fast-growing market
- Create a product well-positioned in the value chain to capture profits
- Focus on the company's uniqueness to avoid competition
- Keep reinforcing the competitive moats
- Reinvest cash at a high rate of return

The final competitive battle: the Startup guy vs the Intelligent CEO:

- When the startup guy talks about how great the team is, the Intelligent CEO focuses on the market and industry structure.
- When the startup guy talks about how disruptive the marketing is, the Intelligent CEO focuses on the position in the value chain.
- When the startup guy talks about product adoption, the Intelligent CEO focuses on the durability and the widening of competitive moats.
- When the startup guy talks about revenue growth, the Intelligent CEO focuses on profit and reinvesting opportunities.

Startup guys sell sand in the Sahara while Intelligent CEOs are the only ones selling water in the hot desert!

Business


Nicolas Bustamante 3 years ago

Panic in Startupland!

Startupland is in panic mode. Months ago, pundits lectured about the new normal; now, they are counting their losses. What happened? The new normal was a bunch of things: 100x ARR fundraising multiples, IPO shares that doubled on the first trading day, cash flows treated like a bad disease, large secondary rounds for founders and insiders, hedge funds flooding the later-stage market, meme stocks that 10x'd overnight, oversubscribed fundraising rounds every six months, acquisitions paid for with overvalued stock, and more strangeness I can't even recall. The pundits argued that it was caused by an acceleration in the use of software after COVID. Work from home, Zoom calls, and quick digital transformation supposedly unlocked trillions of value. The software industry was bigger than ever, so valuations surged, and the revenue was supposed to match... in the future! I was skeptical. I wrote in How to Beat the Market: "In a bull market, speculators believe that "this time is different" which leads to exuberance. A bubble is characterized by the fact that people believe that some new development will change the world, that patterns that have been the rule in the past, such as business cycles, will no longer occur or that the rules regarding valuation norms and standard of value and safety have changed. More often than not, the time is no different and the pendulum switches back." My opinion is the following: central banks lowered interest rates, flooding the market with money and creating unstoppable inflation and asset speculation. When the value of cash decreases fast, investors rush to find productive assets to preserve purchasing power. Too many trillions of dollars chasing too few assets sent prices to the moon. Central bankers created a gigantic misallocation of resources - as they have always done since the dawn of time. What is happening now? Central bankers are raising interest rates to fight sky-high inflation, causing the stock market to collapse.
The Federal Reserve issued the biggest rate hike in two decades, and more rate rises are expected. Accordingly, tech stocks got hammered. -80% for Zoom. Are you serious? The first metaverse company and the pillar of the remote economy! It hurts. Shopify at -80%? Aren't we supposed to shop exclusively online now? Robinhood's valuation is less than all the capital it has raised: wasn't day trading Dogecoin a sure thing? Peloton at -90%? Where are all the digital bikers? Is profitability a thing now? Did Tech Twittos lie to us, or what? :( Let's talk numbers. The attentive reader will notice a sempiternal reversion to the mean. Multiples skyrocketed and are now coming back to their historical average. Nothing more. The "new normal" was bullshit talk similar to that of Yale economist Irving Fisher, who predicted in 1929, on the verge of the Great Depression: "Stock prices have reached what looks like a permanently high plateau." Well, thanks for the tip! The multiple of enterprise value to next-twelve-months revenue (EV / NTM) went crazy. The median multiple reached 22x in late 2021, sending the cumulative market capitalization of all public SaaS companies to $2tn! The market has since cooled off, wiping out $1tn of market cap (!!) and reverting to the mean with an EV / NTM of 7.2x (source: Meritech trading data). Suppose a SaaS company trades at the median multiple: $1 of additional revenue added $22 of enterprise value back in late 2021. It meant companies were encouraged to invest or acquire up to $22 to generate $1 of revenue. Of course, 22x was the median, and some companies, such as Snowflake, traded at 93x. High burn was praised, and investors pushed for more spending in this unprecedented environment. What could go wrong? Today, $1 of revenue contributes only $7.2 of enterprise value - and it keeps falling. Most likely, companies didn't manage to adjust their burn rate to the sudden reversion to the mean. It means that they are destroying capital fast.
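The multiple arithmetic above can be sketched in a few lines. This is a minimal illustration that uses only the 22x and 7.2x median multiples quoted in the text; the function names and the $1bn target valuation are assumptions chosen for the example, not Meritech's methodology.

```python
def enterprise_value(ntm_revenue: float, ev_ntm_multiple: float) -> float:
    """Enterprise value implied by next-twelve-months revenue and an EV/NTM multiple."""
    return ntm_revenue * ev_ntm_multiple


def revenue_needed(target_ev: float, ev_ntm_multiple: float) -> float:
    """NTM revenue required to justify a target enterprise value at a given multiple."""
    if ev_ntm_multiple <= 0:
        raise ValueError("multiple must be positive")
    return target_ev / ev_ntm_multiple


PEAK_MULTIPLE = 22.0    # median EV/NTM in late 2021, per the text
CURRENT_MULTIPLE = 7.2  # median EV/NTM after the correction, per the text

if __name__ == "__main__":
    for label, multiple in (("late 2021", PEAK_MULTIPLE), ("today", CURRENT_MULTIPLE)):
        ev_per_dollar = enterprise_value(1.0, multiple)
        needed = revenue_needed(1_000_000_000, multiple)  # hypothetical $1bn target
        print(f"{label}: $1 of revenue -> ${ev_per_dollar:.1f} of EV; "
              f"$1bn valuation needs ${needed / 1e6:.1f}M of NTM revenue")

    # Same revenue, lower multiple: share of enterprise value that disappears.
    drop = 1 - CURRENT_MULTIPLE / PEAK_MULTIPLE
    print(f"EV decline at constant revenue: {drop:.0%}")
```

At constant revenue, the move from 22x to 7.2x removes roughly two thirds of enterprise value, and the revenue needed to support a $1bn valuation roughly triples, from about $45M to about $139M.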
What supposedly made sense yesterday is now totally dumb. The classic mistake is to base business assumptions on an all-time-high, speculative market. Last October, in “I Raise Therefore I am”, I wrote: "The high exit multiples of today drive higher fundraising valuation, but what if exit multiples go back to their historical average? The fundamental issue is that the company's fundraising valuation, terms - aka liquid prefs-, and burn rate won't adapt to the new exit environment. A lot of wealth will be destroyed that way." What does it mean for the immediate future? Fewer unicorns. The "new normal" unicorn was easy. Reach $10M ARR by burning tons of cash, raise at 100x ARR, and voila! A $1bn valuation in TechCrunch. Now, according to a prominent VC, Matt Turck: "to justify a $1B valuation, a cloud unicorn today would need to plan on doing $178M in revenues in the next 12 months if you apply the current median cloud software multiple (5.6x forward rev)." Well, it's not that easy to be worth $1 billion after all. By the way, you should follow Matt, who is smart and hilarious. Unit economics matter. If you have a high burn rate, low business efficiency, and a short runway... Houston, there is a problem! You just went from being the best in class to the worst in class. Pay attention; the teachers changed the rules! Gross margins, net dollar retention, EBITDA, CAC, burn multiple... all of that suddenly matters. A lot. When free money stops, businesses discover the underlying quality of their operations. Startup bankruptcies. Keith Rabois put it simply: "If you have a high burn rate and have raised money at high prices, you're going to run into a brick wall very fast." The end of free money. Who could have imagined that? Startups with a high burn rate that don't manage to become profitable will die like all bad businesses. Expect hiring freezes and mass layoffs.
It's the ruthless natural selection of capitalism. Startups got capital killed. This one is subtle. If the startup raised at a crazy valuation, the cash is still there, but the equity might be worthless. It's the common Uber, Dropbox, Oscar Health, Lemonade, Robinhood (etc etc etc) scenario. For instance, Oscar Health raised a total of $1.6bn for a current valuation of $1bn. Many startups that raised money in the last two years will suffer the same fate. Many VCs will go bust. In the investment biz, price matters. Pay too much, and you will never see your money again. Tiger Global, the daring "new normal" king, took a $17 billion hit on its investments - the biggest dollar decline for a hedge fund in history, according to the FT. Ouch. Many VC firms will struggle to raise additional funds and will close their doors. Investments in venture funds are already dropping (-19% quarter-over-quarter according to CB Insights). Wow, brutal learnings. Cash flows matter, burn matters, gross margin matters, churn matters, and valuation matters. Building a good business matters. Is your SaaS startup skewed? How can you quickly assess the damage without spending days in a war room? Introducing the hype ratio and the burn multiple. Hype Ratio = Capital Raised / Annual Recurring Revenue. Burn Multiple = Net Burn / Net New ARR. If either ratio is above 3, there is a problem. It’s not uncommon to see startups with a 10+ ratio these days. One day you were riding a unicorn only to learn, later on, that you were riding a pig! Reversion to the mean is inevitable. I would say that it's a severe crisis only when multiples go below their historical average. Most people look at returns on paper without asking if the price makes sense in the first place. Price relative to fundamentals such as revenue or earnings means a lot, especially in historical context. A few years of irrational exuberance don't make a market. The crisis is, for now, pretty light.
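The hype ratio and burn multiple defined above are easy to compute. Here is a minimal sketch, using the two formulas and the threshold of 3 from the text; the function names and the sample figures ($50M raised, $5M ARR, $12M net burn, $3M net new ARR) are hypothetical and only there to illustrate the calculation.

```python
WARNING_THRESHOLD = 3.0  # the text flags ratios above 3 as a problem


def hype_ratio(capital_raised: float, arr: float) -> float:
    """Hype Ratio = Capital Raised / Annual Recurring Revenue."""
    if arr <= 0:
        raise ValueError("ARR must be positive")
    return capital_raised / arr


def burn_multiple(net_burn: float, net_new_arr: float) -> float:
    """Burn Multiple = Net Burn / Net New ARR."""
    if net_new_arr <= 0:
        raise ValueError("Net new ARR must be positive")
    return net_burn / net_new_arr


def is_red_flag(ratio: float) -> bool:
    """True when a ratio exceeds the warning threshold."""
    return ratio > WARNING_THRESHOLD


# Hypothetical startup: $50M raised, $5M ARR, $12M net burn, $3M net new ARR.
example_hype = hype_ratio(50_000_000, 5_000_000)
example_burn = burn_multiple(12_000_000, 3_000_000)

print(f"Hype ratio:    {example_hype:.1f} (red flag: {is_red_flag(example_hype)})")
print(f"Burn multiple: {example_burn:.1f} (red flag: {is_red_flag(example_burn)})")
```

Both ratios need only numbers a founder already has at hand, which is the point: a quick health check rather than a full financial model.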
The 2008 crash sent the NASDAQ to a level last seen in July 1995! 13 years of stock appreciation were gone! Boom. Today's crash sent the NASDAQ to a level last seen... last year. Well. The market is still expensive. (Chart: NASDAQ Composite - 45 Year Historical Chart.) So, how did we get there? Why are so many people surprised (and totally broke) after a classic reversion to the mean? The irrational exuberance of a bull market and a reversion to the mean are common. What is striking each time is the willingness of market participants to bullshit themselves about a new paradigm in order to jump into the speculative market. A lot of investors and founders jump all in, right off the cliff. Why is that? Part of the answer is greed and envy. I wrote in How to Beat the Market: Greed is an extremely powerful force that overcomes common sense, prudence, and memory of painful past lessons. It's hard to stay prudent when every speculator around is enjoying significant profits. The combination of the pressure to conform and the desire to get rich causes investors to drop their independence and skepticism, which leads to their capitulation by buying into the speculative market. Greed is a drug that affects the investor's rational thinking while envy forces investors to comply with the herd. The other part is incentives, the powerful force that drives the world. Investors who gamble in the speculative market get short-term rewards such as high mark-ups thanks to later fundraising rounds, which allow them to raise more money from LPs and enjoy more fees. Founders enjoy large cash-outs and money to fuel their business regardless of the unit economics. It’s easy to compromise the long-term future to pocket a big cheque. What about buying a $133M mansion after having sold for $292M on the IPO day? Short-term payday, long-term hell. Ok, what about the silver lining? Good businesses will have the time of their lives.
Fast-growing, profitable startups will have the opportunity to buy out struggling competitors, invest in an environment where CAC will decrease, and hire people with fair compensation packages. Startups with positive cash flows are the cool kids again - until the next exuberance of course. It will be fine. Stock market crashes come and go. It's a good reminder that the value of a business is the net present value of its future cash flows. Cash flows are king (I know... such a boomer mindset!). This makes me admire people like Warren Buffett or Mark Leonard even more. Strong-willed people who think for themselves and have the guts to resist short-term temptations. They are the ones ridiculed, the ones not invited to fancy dinners, the ones not covered in magazines, but they are the ones who win. They fight the institutional imperative and peer pressure, preserve a margin of safety, and are fearful when others are greedy and greedy when others are fearful! They are the Intelligent CEOs. If you found this article valuable, please consider sharing it 🙌

Business

0 views

Nicolas Bustamante 3 years ago

Where is Your Firephone?

Hello, I am Nicolas Bustamante . I’m an entrepreneur and I write about long-term company building. Check out some of my popular posts: The Intelligent CEO The Impact of the Highly Improbable Surviving Capitalism with Competitive Moats Subscribe to receive actionable advice on company building👇 Subscribe now Have you ever heard about the Firephone? Amazon spent hundreds of millions building a revolutionary smartphone but discontinued commercialization in 2015, one year after its introduction. For most, the Firephone was a massive disappointment; for Amazon's CEO, it was a healthy failure contributing to Amazon's success. I agree, and one of my favorite questions to scale up is now: where is your Firephone? After reaching their product-market fit, early-stage startups focus on improving one product. The ambition is to add new features quickly to satisfy early customers. The constant iterations lead to fast revenue growth, high net promoter score, low churn, and high upsell. After a while, the main product isn't sufficient to drive additional growth, and it's required to develop new product lines. These new product initiatives leverage the existing technology and the current competitive advantages. These sustaining innovations are adjacent to the initial developments and sold to existing customers generating new sales and often higher margins for the company. It's relatively straightforward to develop adjacent product lines because customers often ask for the developments, and it's easy to make financial projections before investing. As time goes by, it's harder to grow through these types of sustaining innovations, and companies have to embrace radical innovations. Developing radically new products is challenging but crucial in highly competitive markets. Failure to innovate leads to irrelevance, and thus bankruptcy. Such an innovative process is challenging because there is so much uncertainty about future success. 
The investment required is more significant than for sustaining innovations, while the margins seem inferior to those of the core product. Additionally, radical innovations might not target current profitable customers but niches in a new fast-growing market. All of this leads to a substantial career risk in most companies because the team might be held accountable for the lengthy and costly failures that happen along the way. The paradox is that managers who successfully launched incremental innovations often fail to commercialize radical new product lines. Using the same processes of analyzing risk-adjusted returns and talking to customers, they weed out disruptive product initiatives that are key to tomorrow's growth. Overcoming these challenges requires leaders who understand the power-law distribution of returns. A few product successes cover the cost of many failed ones. In the words of Jeff Bezos: "a small number of winners pay for dozens, hundreds of failures. And so every single important thing that we have done has taken a lot of risk-taking, perseverance, guts, and some of them have worked out, most of them have not." There is a world of difference between understanding the power-law distribution and creating a company culture that rewards big, bold bets. It requires adopting a long-term perspective, accepting the loss of tons of money, and waiting patiently for that outlier to generate a significant outcome. Additionally, the best organizations fail early, fail often, and don't gamble the company's future on one product launch. They seek positive optionality over time with a low downside and a big upside! From my perspective, the innovative culture is one of the most impressive things about Amazon. They started as an e-commerce company but now dominate the cloud industry with AWS and push the frontiers of hardware with Kindle, Alexa, or Amazon Go. All that while losing billions of dollars due to failed product innovations! And you, where is your Firephone? 

Business

0 views

Nicolas Bustamante 3 years ago

I Joined Figures as an Advisor

Six months ago, an ex-teammate and friend, Grégoire, introduced me to Virgile to discuss his startup. Figures is software to benchmark, review, and communicate compensation plans. I happily jumped on a call and did my best to answer Virgile's questions by asking him more questions. I then sent him a follow-up email sharing my experience and adding several articles to challenge his perspective. After this initial contact, a strange thing happened. Having read all the articles, Virgile replied and asked more thoughtful questions. I answered his questions, thinking that would be the end of the conversation, as it often is. Instead, we continued our discussion, and before I knew it, we were talking about competitive moats, fundraising, OKR methodology, hiring plans, culture, and much more. That was the beginning of my role as an advisor for Figures. I like working with Virgile, an intelligent, humble, hard-working, and funny person, as well as Bastien, Figures' CTO, who independently created an outstanding product that rivals well-funded US competitors. I am grateful to work with them because I like the team and Figures' product matters. I have experienced firsthand the pain of crafting and communicating compensation plans. My company suffered from unnecessary tensions because we lacked the data on specific positions. Regrettably, our business plan second-guessed potential salaries and yearly raises, generating countless mistakes. I remember my team struggling to gather sufficient and up-to-date data per job and seniority. The manual process was painful and inaccurate. With Figures, accessing compensation market data is seamless. 
I love the dashboard that showcases a compensation index for every department, job, and seniority. Figures also emphasizes a gender-equality index that gave us a fresh perspective on this crucial topic. Today, I use Figures' explorer to browse market compensation data whenever I have doubts about a specific comp. Even better, Figures has a tool to compare a candidate's expected comp to our compensation plan and the whole market. Better information drives better decisions for companies and employees alike. Figures' product is essential, and it solves an age-old problem for companies. Figures is creating a new market category by adding review and communication features to traditional benchmarking software. Virgile and Bastien have already built solid competitive moats with a valuable data network effect, strong channel partnership distribution, and great integrations with payroll software. Working on Figures has made me a better builder. While I believe founders should focus on nothing but their early-stage business, late-stage founders should probably work with other startups. At some point, analyzing other people's businesses is essential to step up and broaden one's perspective. Helping Figures gave me great insight into how to improve my own business. I'm writing this to put my reputation at risk. Because I'm available whenever the founders want to tackle a challenge, it will be my responsibility if Figures doesn't reach its ambitious goals. Nevertheless, any success is due to the Figures team's hard work and dedication. They are the ones building, selling, and operating the business to create value for their customers, and they are crushing it! If you're working in Europe on compensation plans, do yourself a favor: schedule a demo with Virgile and buy Figures. If you're an ambitious startup builder, consider joining Figures. 
They are hiring, remotely or in Paris, an engineering team leader, a senior product designer, a CSM, a head of marketing, and country launchers for the UK, Spain, and Benelux.

Business

0 views

Nicolas Bustamante 3 years ago

Look How Big My Team is!

I'm used to getting asked how many employees work at my startup. For most people, it's a quick way to assess a startup's success. Supposedly, the bigger, the better. But does it really reflect success? To cut to the chase, headcount is a lousy metric. Paul Graham has a great tweet that says: "When people visit your startup, they should be surprised how few people you have. A visitor who walks around and is impressed by the magnitude of your operation is implicitly saying, 'Did it really take all these people to make that crappy product?'" I agree with him, and I'm often surprised by how many people and how much capital it has taken to build mediocre businesses. Headcount is a vanity metric similar to followers on social media or fundraising amounts. Plenty of examples exist where small teams outcompete large teams in the same market. Small teams leverage speed as a competitive advantage, which is key to winning over competitors. Other benefits include better communication, more engagement, more profits to reinvest, and, overall, better productivity. Small groups avoid the Ringelmann effect, which is the tendency for individuals to become increasingly less productive as the size of their group increases. More importantly, the headcount constraint drives creativity and innovative solutions. Inflating one's team is frequently a bad idea because throwing more people at a problem doesn't solve the problem faster. It often leads to what I call the "hiring death cycle." A startup faced with a problem tends to hire more people, making it harder to solve the problem and thus requiring even more hires. 
The death cycle is reinforced by the "next hire fallacy," in which, supposedly, the next recruit will suddenly solve the problem. It's common to fall into this vicious cycle, and hard to break out of it. Controlling headcount expansion when experiencing fast growth is challenging. Targeting an efficient number of teammates is tricky because it's difficult to know and test when fewer people can achieve more. Additionally, managers have an incentive to grow headcount as it means more responsibility, prestige, and better compensation. Entrepreneurs and investors often push to hire more as it gives an impression of faster progress. However, the key is to focus on the business's performance over time. An excellent way to assess the efficiency of a business is to compare its revenue to the number of employees. The best companies increase their revenue per employee as they scale, so their revenue grows faster than their costs. One of the critical metrics for SaaS startups is the annual recurring revenue (ARR) divided by full-time employees (FTE). For instance, the median ARR per FTE ratio for private SaaS startups in the $10-$20M revenue range is $138,889. The same benchmark exists for all publicly traded SaaS companies, with the median being $260,045. The benchmarks illustrate that great companies increase their ARR/FTE over time. So, one of the few reasons to ask for a startup's headcount is to compare it to its revenue and quickly evaluate its soundness. Say a fast-growing SaaS startup has 80 employees and $15M in annual recurring revenue. It implies an excellent $187,500 ARR/FTE ratio, and it's likely a fantastic business. Voila! You can now better assess the success of most startups.
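The ARR per FTE check from this article can be sketched in a few lines. The example reuses the figures quoted in the text (80 employees, $15M ARR, and the two median benchmarks); the function and constant names are illustrative.

```python
PRIVATE_SAAS_MEDIAN = 138_889  # median ARR/FTE, private SaaS at $10-$20M revenue (from the text)
PUBLIC_SAAS_MEDIAN = 260_045   # median ARR/FTE, publicly traded SaaS (from the text)


def arr_per_fte(arr: float, full_time_employees: int) -> float:
    """Annual recurring revenue divided by full-time employees."""
    if full_time_employees <= 0:
        raise ValueError("Headcount must be positive")
    return arr / full_time_employees


# The example from the article: 80 employees and $15M of ARR.
ratio = arr_per_fte(15_000_000, 80)

print(f"ARR per FTE: ${ratio:,.0f}")
print(f"Above private median: {ratio > PRIVATE_SAAS_MEDIAN}")
print(f"Above public median:  {ratio > PUBLIC_SAAS_MEDIAN}")
```

The example startup sits comfortably above the private median but still below the public one, which matches the idea that the ratio should keep rising as a company scales.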

Career

Business

0 views

Nicolas Bustamante 3 years ago

The Inflated Price of my Burrito

This is a story about inflation. Now and then, I go to the Taqueria and order a delicious burrito. A few weeks ago, I noticed the price went from $10 to $12, a whopping 20% increase. To avoid bankruptcy, the honest business owner was forced to pass the rising cost of ingredients, electricity, rent, and labor on to the customer. Knowing that the disposable income of his customers didn't rise by twenty percent overnight, the Taqueria's owner tried different solutions before raising his price. First, he reduced the size of the burrito and the quantity of food inside. Second, he decreased the quality of the ingredients by buying cheaper avocados, beans, and meat. Assume he reduced the quantity by 10% and the quality by 15%. Adding the three effects together, the burrito is now roughly 45% more expensive. After finishing up my burrito, I wondered how bureaucrats estimate inflation. The general price level is calculated using the Consumer Price Index (CPI). It's an index of a weighted average basket of consumer goods and services purchased by households. CPI is a proxy for inflation and thus is a crucial metric. People and businesses use CPI for their economic calculations. For instance, if prices rise by 6%, people need their disposable income to increase by six percent to maintain their living standards. Correspondingly, if a business targets a 3% real rate of return, with six percent inflation, the nominal return target should be about 9%. The same reasoning applies to macro indicators: for instance, real gross domestic product (GDP) is GDP adjusted for inflation. It's hard to calculate inflation. 
It requires compiling a lot of prices, adjusting for a decrease in size or quality, considering new products, and determining a relevant basket of goods for the average household. Needless to say, it's impossible to calculate inflation precisely, and over the past decades, the government has changed the way it calculates the CPI many times. Let's consider a striking example. Historically, CPI was calculated using a fixed basket of goods between two periods. If the same quality and quantity of burrito cost $10 before and $12 now, it's a 20% increase. However, nowadays, the CPI considers the change in purchases in response to price evolution. If I substitute my burrito for a "similar good" worth $10, the increase is 0%. The relevance of this change is controversial. The government has an incentive to lower the CPI. A lower CPI implies a higher real GDP, suggesting that the economy is more robust than it really is. It also reduces the state's expenditures by paying less to social security beneficiaries and civil servants. Furthermore, it allows the government to print more money and add more debt while misleading investors into thinking that bond yields are positive. Incentives drive the world, so, understandably, people doubt the government's data on inflation. Estimating inflation is challenging, and no metric is perfect. However, one thing is sure: governments always inflate the money supply to fund their expenditures. The history of money is the chronicle of inflation engineered by governments. The dollar, one of the most stable currencies on the planet, had an average inflation rate of 3.87% per year between 1973 and today, meaning that the purchasing power of one dollar in 1973 is now 0.15 dollars. Money printing shrinks the middle class, increases the gap between the rich and the poor, and distorts economic calculation, preventing society's harmonious development. Looking at the U.S. 
money stock measure, the so-called M2, the Federal Reserve has recently printed a lot of money. It appears to be in line with the financial assets bubble and the rise of consumer prices. The question is then: how to survive inflation?
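The inflation arithmetic in this article can be worked through in a short sketch. It rests on a few assumptions that are not in the original: a "15% quality reduction" is modeled as 15% less value per unit of food, the real and nominal returns are linked by the standard compounding relation (1 + nominal) = (1 + real) × (1 + inflation), and "today" is taken to be 49 years after 1973. The article adds the burrito percentages (20% + 10% + 15% = 45%) and the return percentages (3% + 6% = 9%); compounding gives slightly different figures.

```python
def effective_price_increase(price_rise: float, quantity_cut: float, quality_cut: float) -> float:
    """Price increase per unit of value, compounding the three effects.

    Assumes quality and quantity cuts each reduce the value received proportionally.
    """
    value_received = (1 - quantity_cut) * (1 - quality_cut)
    return (1 + price_rise) / value_received - 1


def nominal_return(real_return: float, inflation: float) -> float:
    """Nominal return needed to earn a given real return under a given inflation rate."""
    return (1 + real_return) * (1 + inflation) - 1


def purchasing_power(annual_inflation: float, years: int) -> float:
    """What one unit of currency is worth after `years` of constant inflation."""
    return 1 / (1 + annual_inflation) ** years


burrito = effective_price_increase(0.20, 0.10, 0.15)
target = nominal_return(0.03, 0.06)
dollar = purchasing_power(0.0387, 49)  # 49 years is an assumption for "1973 to today"

print(f"Burrito, compounded: {burrito:.1%} more expensive (additive estimate: 45%)")
print(f"Nominal return target: {target:.2%} (additive estimate: 9%)")
print(f"Purchasing power of a 1973 dollar: ${dollar:.2f}")
```

The additive shortcuts are fine for small percentages, but the gap widens as the individual changes get larger, as the burrito shows.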

Data Analysis

Data

Business

0 views

Nicolas Bustamante 4 years ago

Bitcoin is Hope for Humanity

Today, let's discuss why the government's manipulation of money is the biggest obstacle to the harmonious development of civilization and why, in the middle of the current monetary chaos, Bitcoin is hope for billions of people. Governments fund their expenditures by taxing citizens and companies. Tax as a percentage of GDP varies between, for instance, 27.1% for the United States and 46.2% for France. One might think that collecting that many private resources is enough. Taxation is, however, unpopular and has frequently precipitated revolutions, so governments had to find another way to extract more resources: printing money. It's an immutable law that governments, throughout the ages, inflate the money supply to fund their expenditures. From kings seizing the monopoly of minting coins to debase the currency to modern fractional reserve banking, the history of money is the chronicle of inflation engineered by governments. The monopoly on money production was never intended to offer people good money. Inflating the money supply means that the purchasing power of that currency declines over time. As more time passes, people holding cash are getting poorer. The dollar, one of the most stable currencies on the planet, had an average inflation rate of 3.87% per year between 1973 and today, meaning that the purchasing power of one dollar in 1973 is now 0.15 dollars. Today, 1.4bn people are living with double-digit or higher inflation per year. If you live in Argentina, for example, the yearly inflation is over 50%. Raging inflation is one of the most significant humanitarian crises of our time. 
The poor suffer the most from inflation because they don't own assets to offset the decline of the purchasing power of money. On the contrary, the rich are getting richer because the value of assets such as stocks rises faster than the consumer price index. Money printing shrinks the middle class and increases the gap between the rich and the poor. Inflation is thus a violation of property rights leading to an illicit redistribution of income from the poor to the rich. But there is something even worse: manipulating the money supply destroys the social cooperation between individuals. A free society is based on the voluntary exchange of property rights between people through money. Prices act as signals to allow economic coordination for the utilization of resources. If the price of a specific good rises, people modify their behavior to buy less or find a way to produce more of the good. Messing up the value of money creates a gigantic misallocation of productive resources because the capitalist system ceases to work. In the form of freshly printed money, the new purchasing power creates a wealth effect leading to systematic errors in both production and consumption. Entrepreneurs take on risky loans to invest in unviable projects, and consumers spend on unnecessary goods. People and businesses can't refrain from participating in the widespread misallocation of resources because they follow market incentives that are, alas, distorted. The wrong economic calculation throws society off balance and prevents its harmonious development. When the inevitable crisis hits, governments print more money to postpone the unpopular recession, encouraging the vicious cycle. Politicians don't care about the disastrous long-term effects as long as the short-term morphine shot allows them to win the next election. The boom and bust cycle led by the manipulation of money and credit seems endless. 
Nobel Prize recipient Friedrich Hayek stated that there was "no hope of ever again having decent money, unless we took from the government the monopoly of issuing money and handed it over to private industry." Many economists and experts claim that issuing private money isn't possible. Relying on the government's paycheck makes them disregard history. According to Murray Rothbard, money is a "useful commodity chosen by the free market as a medium of exchange." Many commodities have been used historically as money; for instance, tobacco in colonial Virginia, sugar in the West Indies, or copper in ancient Egypt. The free market chooses the most convenient commodity, and the law of supply and demand determines the money's purchasing power. For the past several centuries, humanity has settled on gold because of its fungibility and scarcity. The gold standard was also the only method found to place discipline on the government's money printing. Alas, over time, governments did their best to separate the name of the paper currency from the underlying commodity backing it. The US eventually devalued the dollar relative to gold in 1934 before announcing in 1971 the "temporary" suspension of the redeemability of dollars into gold. Since then, we have lived in a new monetary experiment called the fiat standard, where money is only backed by the credibility and threatening power of the issuer. Enter Bitcoin, the cryptocurrency born out of the ashes of the 2008 crisis and the infamous government bailout of banks with taxpayers' money. Bitcoin is the largest and most successful monetary experiment in human history. It's an engineered commodity designed to become the internet's native currency. With a limited and auditable supply of 21 million coins, Bitcoin is the soundest money on the planet. The best part is that there is no monetary authority to inflate its money supply, and the network's decentralization makes it censorship-resistant. 
Bitcoin is today used as a store of value. The limited supply and the rising demand for a scarce and censorship-resistant asset in an inflationary environment lead to an appreciation of the price of Bitcoin. The credibility of Bitcoin lies in its monetary policy backed by maths, not politicians. It has never been easier to be your own bank, which is good news for the 1.7bn people who are unbanked. Today the $1tn asset class is a saving technology for millions of individuals who buy and self-custody their bitcoin. The short-term volatility is not an issue when your time horizon is in decades. It is, however, a brake on the adoption of bitcoin as money because there is no incentive to spend money that appreciates over time. People use their worthless fiat to acquire sound money that they don't spend because its value rises over time. On top of that, few merchants accept Bitcoin today. I believe that the hyperbitcoinization of the world will come thanks to the usage of the bitcoin protocol as a new payment infrastructure for the internet. The underlying protocol of Bitcoin the currency might be the new payment rail of the internet. The bitcoin protocol is already a competitive way to transfer value over the internet. Today's average transaction fee is $3, meaning that the cost is 1% for $300 and only 0.1% for $3000. Even better, the lightning network, a payment protocol on top of bitcoin, allows the transfer of bitcoin instantaneously and for an average base fee per transaction of 1 satoshi, meaning $0.0006125680. It implies the bitcoin protocol combined with the lightning network is the fastest, cheapest, most resilient open monetary network on the planet. I venture to guess that people and merchants will use the lightning network to transfer value over the internet and then settle in their local fiat currencies. Consider an example with Bob, who lives in the US and wants to buy a SaaS product from Alice, who lives in Turkey. 
Bob will use an app to convert his USD to bitcoin and send them over the lightning network to Alice, who will immediately exchange the Bitcoin for Turkish lira. The lightning network outcompetes all other closed payment networks such as Visa, PayPal, or China UnionPay. Because Bitcoin is a saving technology, people won't convert everything into their fiat currency. The bigger Bitcoin grows, the more stable it will be, and the fewer people will choose fiat paper money over the soundest money. I'm excited about this ongoing process. Bitcoin has been the most critical technology protocol since the advent of the internet. It's our best chance to fix our monetary systems and thus to end the boom and bust economy from which we all suffer. There are thousands of reasons why it might not work. However, if there is one chance it might succeed, it's one of the most critical projects worth working on today. It seems hard to believe today, but Bitcoin might achieve the separation of government and money. Quoting Hayek: "Three hundred years ago, nobody would have believed that government would ever give up its control over religion, so perhaps in three hundred years we can see that government will be prepared to give up its control over money." The government's monopoly on money printing prohibits competition for issuing sound money. Bitcoin gave birth to a new industry of competing blockchains and tokens. We now have a global free market for issuing monies, which is a historic achievement. Entrepreneurs now compete fiercely to provide the most valuable currency to people. Providing money is a winner-takes-all market, and today no token comes close to Bitcoin's trustworthy monetary policy, decentralization, brand, security, and network effect. Bitcoin is hope for a better future.
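The fee figures quoted in this article can be reproduced with a small sketch. It uses the $3 average on-chain fee and the $0.0006125680 value of one satoshi from the text, plus the fixed definition of 100,000,000 satoshis per bitcoin; the function names are illustrative, and the implied bitcoin price is simply what the quoted satoshi value works out to, not a current quote.

```python
SATOSHIS_PER_BTC = 100_000_000  # protocol definition: 1 BTC = 100 million satoshis


def fee_share(fee_usd: float, amount_usd: float) -> float:
    """Flat fee expressed as a fraction of the amount transferred."""
    if amount_usd <= 0:
        raise ValueError("Amount must be positive")
    return fee_usd / amount_usd


ON_CHAIN_FEE_USD = 3.0            # average transaction fee quoted in the text
SATOSHI_VALUE_USD = 0.0006125680  # value of 1 satoshi quoted in the text

# A flat fee gets cheaper in relative terms as the transfer grows.
for amount in (300, 3_000):
    print(f"${amount:>5,} transfer: fee is {fee_share(ON_CHAIN_FEE_USD, amount):.1%}")

# Bitcoin price implied by the quoted satoshi value.
implied_btc_price = SATOSHI_VALUE_USD * SATOSHIS_PER_BTC
print(f"Implied BTC price: ${implied_btc_price:,.2f}")

# Relative cost of a 1-satoshi lightning base fee on a $300 payment.
lightning_share = fee_share(SATOSHI_VALUE_USD, 300)
print(f"Lightning base fee on $300: {lightning_share:.6%}")
```

The comparison shows why a flat on-chain fee suits large transfers while the lightning network's base fee is negligible even for small payments.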

Business Economics

0 views

Nicolas Bustamante 4 years ago

Life Update

Dear friends, I have professional and personal news to share. I co-founded Doctrine almost six years ago. Doctrine exists because democracy dies in darkness. Our mission is to make legal information accessible and understandable. With the best information at the right time, lawyers can refine their practice and seek justice for their clients. We are building a legal information platform used daily by thousands of delighted customers. By doing so, we improve our society's legal infrastructure, which is key to creating wealth. Last year, I decided to transition from CEO to Chairman of Doctrine. My long-time colleague Guillaume is now in charge of running Doctrine while I focus on strategic topics at the board level. I'm delighted with the move. Guillaume does a great job strengthening Doctrine's competitive moats and scaling Doctrine to a point where we have a fast-growing and profitable business. Taking a step back allows me to see our business from other perspectives, which is critical to success. I have also had the space to reflect on our game plan to build a long-lasting company. I've had the opportunity to discuss ideas with many of you as well as to write articles to guide Doctrine's strategy - How to Reinvest Earnings, How to Buy Back Shares, How to Survive Inflation, The Intelligent CEO, Surviving Capitalism with Competitive Moats, I Raise Therefore I am, and many more. I started Doctrine when I was 21, and it has been a wild ride. I want to thank all our customers, investors, Raphael and Antoine, my co-founders, and our fantastic team. It is also important to say that this is not goodbye; I haven't sold any of my shares over the years, and I don't intend to do so today. I'm bullish on what we are building, and I couldn't be more excited about the decades to come! On a personal note, I finally had the opportunity to marry my fiancée, Natalie, this summer. 
We organized a civil ceremony in Texas, and we are now eager to celebrate with all our friends in France in the future. We also took the opportunity to complete a long-discussed topic: moving back to the US. We have just settled in San Francisco to begin a new chapter in our life. We love the beautiful city, the incredible access to nature, and the density of smart and kind people. If you're in town, please reach out!

Career

Business

Nicolas Bustamante 4 years ago

Death by Fundraising

A guy starts a company and invests $1bn. The company is worth $1bn. A startup guy starts a company, creates an insanely great product, hires a stellar team, and seduces millions of users. The company is worth $300M. WTF happened? The startup guy got capital killed, a slow and painful death by fundraising.

It's a mega bull market; not a day passes without announcements of record fundraising coupled with skyrocketing valuations. Time to get rich quick! On the other hand, veteran investors warn how hard it is to earn returns in startup investing. Garry Tan commented, "I never really paused to consider how rare it is for venture capitalists to actually be successful at what they do." Oops, I thought it was easy!

Venture capital (VC) is a form of private equity financing. To invest in startups with high growth potential, VC firms pool money from limited partners (LPs), who are high-net-worth individuals or institutions such as pension funds and university endowments. Because of the significant risk, LPs expect a return of at least 3x, meaning a $100 million fund has to return $300 million.

VC firms' returns follow a steep power law. Andy Rachleff, the co-founder of Benchmark, a top-tier VC firm, writes that about 3 percent of the universe of venture capital firms "generate 95 percent of the industry's returns." There is also a strong path dependence where "the composition of the top 3 percent doesn't change very much over time." In short, most VC firms barely return investors' money after fees, while the best firms tend to seize all the returns. Looking at recent data, Seth Levine notes that "despite the historic market we've had in the past ten years, and the huge deals often highlighted in the press, venture capital returns haven't shifted much."

VC firms need billion-dollar acquisitions to match the expected returns. Most VC investments will come at a loss, and only billion-dollar outliers can cover the losses and, hopefully, generate returns.
Even Ron Conway's angel fund, which invested in Google at the seed stage, only broke even, meaning a 0% IRR. This business is extra tough!

The recent bull market led VCs to fund companies with a lot of money in the hope of creating monster businesses. The more capital, the better. Massive fundraising rounds are celebrated, and unicorn entrepreneurs are on magazine covers. The bigger the better, right?

Let's consider these startups. They have all raised more money than their public market valuation as of June 2022. Now, business failure happens. It's even the norm, as most startups go bankrupt. However, my point is that these "good" businesses became "terrible" businesses because of their fundraising strategy.

Breaking news: one can build a monster business without tons of capital infusion. Additionally, entrepreneurs and their teammates don't need to raise money to be successful. There are plenty of great bootstrapped startups. Some of them sell for billions of dollars, such as Mailchimp, which sold to Intuit for $12 billion.

Founders are even more likely to do better financially with reasonable funding. Quoting Eric Paley: "the Huffington Post was reportedly acquired for $314 million, and Arianna Huffington made about $18 million. Michael Arrington sold TechCrunch to the same buyer for $30 million and reportedly kept $24 million. To a VC, TechCrunch's sale would have been a "loss," and many VCs would have pushed Michael not to sell. Yet Arrington was more successful, financially, than Huffington." To sum up, it's possible to sell a startup for a billion dollars and make less than someone who sells theirs for $100 million. Alas, many $100 million companies fail because they raised capital as if they would become multi-billion-dollar unicorns.

The above startups built massive businesses without a lot of cash infusion. They focused on cash flows, not raising money. Most of today's hot Series C startups have already raised more money. What could go wrong?
Raising too much might kill your startup. While fundraising, most founders get too excited by the economics, namely the dilution, which is a function of the valuation and the amount raised. They forget something essential: terms and conditions.

One of the most important terms is the liquidation preference. It states that in the event of a sale for less than the last valuation, investors and other preferred stockholders get their money back first. A 1x liquid pref means that investors who invested $x must be paid back $x regardless of their equity ownership. Likewise, a 2x liquid pref means that investors must be paid back twice the invested amount. Liquid prefs often wipe out the entire cap table. Let's consider an extreme scenario with a startup that raises $10m at a $50m valuation with a 2x liquid pref. If the company sells for $20m, investors are owed 2 × $10m = $20m, so they take 100% of the payout. That's why terms matter more than valuation. Charles Yu wrote an excellent guide to liquidation preferences.

These situations are, unfortunately, all too common. According to a study, 66% of venture-backed startups that exited in the last decade did not return any meaningful capital to management. It's not unusual, as we have seen, for a startup to sell for less than the amount raised.

We saw in the previous examples that the exit amount is a significant factor in determining the payout. A fundraising strategy should take into account the potential enterprise value at exit. It's, of course, impossible to predict in such an uncertain game, but industry and geographical data can provide perspective. For instance, big exits are concentrated in the US, meaning raising a lot without operating in the US is hazardous.

Another essential factor to keep in mind is exit multiples and their volatility. The exit-multiple method calculates enterprise value as a multiple of revenue. These multiples are cyclical, which is dangerous.
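To make the liquidation-preference scenario above concrete, here is a minimal Python sketch. It assumes a simple non-participating preference (investors take the greater of their preference or their pro-rata share), treats the $50m valuation as post-money so that investors own 20%, and ignores option pools, debt, and stacked rounds. The function name `liquidation_payout` is hypothetical, and real term sheets vary.

```python
def liquidation_payout(exit_value, invested, pref_multiple, investor_ownership):
    """Split exit proceeds between investors and everyone else.

    Assumes one investor class with a non-participating preference:
    investors take the larger of (a) their preference, capped at the
    exit value, or (b) their pro-rata share as if converted to common.
    All amounts are in millions of dollars.
    """
    preference = min(exit_value, invested * pref_multiple)
    pro_rata = exit_value * investor_ownership
    to_investors = max(preference, pro_rata)
    return to_investors, exit_value - to_investors


# Scenario from the text: $10m raised at a $50m valuation, 2x preference.
# Assumption: the valuation is post-money, so investors own 10 / 50 = 20%.
for exit_value in (20, 50, 200):
    investors, others = liquidation_payout(exit_value, 10, 2, 10 / 50)
    print(f"exit ${exit_value}m: investors ${investors:.0f}m, others ${others:.0f}m")
```

At a $20m exit the preference absorbs the whole payout, which is the case described in the text. The larger exits are included only to show that, under these assumptions, the preference stops mattering once the investors' pro-rata share exceeds it.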
For instance, the median private B2B SaaS startup was acquired for 5x its revenue in 2016, compared with 12x today (October 2021). Today's high exit multiples drive higher fundraising valuations, but what if exit multiples go back to their historical average? The fundamental issue is that the company's fundraising valuation and terms (aka liquid prefs) won't adapt to the new exit environment. The same goes for the burn rate, which is very hard to readjust. A lot of wealth will be destroyed that way.

Suppose a company has $10M in annual recurring revenue (ARR). This startup will today raise at 50x revenue (or 100 to 200x here in San Francisco, lol), implying a $500M valuation. The VC valuation is higher than today's potential exit value of $120M (12x median revenue multiple) because the assumption is that the company will grow its revenue and thus create value for the investors. However, if exit multiples go back to, say, 5x, then even with an incredible performance of $40M in revenue, the company will be worth only $200M at exit. Oops.

This dramatic example doesn't even account for the fact that not all growth is created equal. Consuming a lot of capital to create little revenue is a recipe for disaster. As Paul Graham wrote: "If you raise money you don't need, it will turn into expenses you shouldn't have." Some startups are like dead stars: they still shine, but they are already dead. A good company doesn't need a lot of capital to expand. The never-ending fundraising rounds often mask an unsustainable business in which every stakeholder is ruined in the end.

Why are there so many fundraising rounds, even if it's a challenging endeavor? The answer lies in the incentives. Venture capitalists have an incentive to deploy their cash. VCs take home a 2% management fee on committed capital, meaning they have an urge to deploy the money and raise a larger fund. Founders, for their part, want to raise to fuel their business with cash in the hope of defeating competitors.
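The $10M ARR example earlier in this section reduces to a few multiplications. The sketch below reproduces it using only the figures quoted in the text; the helper name `enterprise_value` and the break-even calculation at the end are illustrative additions, not part of the original argument.

```python
def enterprise_value(revenue, multiple):
    """Value a company as revenue times a revenue multiple (in $M)."""
    return revenue * multiple


round_valuation = enterprise_value(10, 50)  # raise at 50x on $10M ARR
exit_today = enterprise_value(10, 12)       # exit at today's 12x median
exit_later = enterprise_value(40, 5)        # 4x the revenue, but a 5x multiple

# Revenue needed at a 5x multiple just to match the round valuation.
breakeven_revenue = round_valuation / 5

print(f"round valuation: ${round_valuation}M")
print(f"exit today:      ${exit_today}M")
print(f"exit later:      ${exit_later}M")
print(f"revenue needed at 5x to match the round: ${breakeven_revenue:.0f}M")
```

Under these figures, the company would need about $100M in revenue, ten times its current ARR, for a 5x exit to merely match the $500M round valuation.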
Additionally, employees pressure management to raise money because it means higher wages. A big fundraising round is the ultimate form of social status in the startup world. It gives an appearance of success; your partner, friends, and family think that you're successful. As if that wasn't enough, founders nowadays take significant cash out when raising money. It might kill the company tomorrow, but the millions in the bank account today are real.

It's hard to invest in or build a successful business; fundraising is a complex and dangerous art that must be thought through carefully. While first-time founders are obsessed with how much they can raise, successful second-time founders wonder "how little money we can raise." The strategy of raising as little money as possible is backed by evidence. The authors of Overdosing on VC: Lessons from 71 IPOs noted that: "by examining the technology IPOs of the past five years, we found that the enriched (well-capitalized) companies do not meaningfully outperform their efficient (lightly capitalized) peers up to the IPO event and actually underperform after the IPO. Though increasingly unfashionable in the unicorn era, it is quite possible, and perhaps even advisable, to build a billion-dollar publicly-traded company with under $50M in venture capital."

I'm not against fundraising; my own business raised two rounds led by VCs. Venture capital is a fantastic tool to kick off risky businesses and change the world. The key is to properly balance the founders' interests and the VCs' objectives, because the two are not aligned. The other balance to master is that more capital doesn't mean a better business. Sometimes, cheap capital is available, but it's better not to take it. It all depends on the context; in the end, however, nothing beats cash flows. Stay safe out there and think long-term! Leaving the last word to Warren Buffett: "If it seems too good to be true, it probably is."

If you found this article valuable, please consider sharing it 🙌

Business
