Posts in Sql (20 found)

Prolog Basics Explained with Pokémon

The project that inspired this post is a little silly—I am about to describe the mechanics of a children’s video game in great detail—but this particular problem is what finally made Prolog click for me, an epiphany I’ve been hunting for ever since reading Bruce Tate’s “Seven Languages in Seven Weeks.” This exercise has taught me a lot about the kinds of interfaces I’m trying to build in somewhat more practical domains . For certain kinds of relationships, logic programming is by far the most concise and expressive programming system I’ve ever used. To understand why, let’s talk about Pokémon. Pokémon is a video game series/multimedia franchise/lifestyle brand set in a world where humans live alongside a menagerie of colorful animal characters. “Pokémon” is both the name of the franchise and the generic term for the animal characters themselves, which all have their own individual species names. There are over a thousand distinct species of Pokémon, from Bulbasaur ( #1 ) to Pecharunt ( #1025 ). There are all sorts of Pokémon games now, but the main series has always been about catching and battling them. During a battle, your team of six Pokémon faces off against another team. Each Pokémon is equipped with four moves that it can choose to (usually) do damage to their opponent. You need to reduce the HP (Hit Points) of all your opponent’s Pokémon to zero before they are able to do so to you. Each Pokémon has unique traits that affects how it battles. They have a set of base stats, a large pool of possible moves, a handful of abilities, and a typing. As you will see in a moment, the immense number of combinations here is the motivation for trying to track this with software. Typing is especially important. Moves have a type, like Fire or Rock, and Pokémon can have up to two types. A move with a type that is Super Effective against the opposing Pokémon will do double damage; a move that is Not Very Effective will do half damage. It’s a little more intuitive with examples. The Fire-type move Flamethrower will do 2x to Grass-type Pokémon, because Grass is weak to Fire, but the Water-type move Surf will only do ½ damage to them, because Grass resists Water. Type modifiers can stack. Scizor is a Bug/Steel type, and both Bug and Steel are weak to Fire, so Fire moves will do 4x damage to Scizor. Electric is weak to Water, but Ground is immune, so if you use an Electric type move against Water/Ground Swampert , you’ll do zero damage, since 0×2 is still 0. Naturally, there is a chart to help you keep track. Those are effectively the mechanics of the Pokémon video games as I understood them when I was 8. Click moves to do damage, try to click moves with good type matchups. These games are for children and, at the surface level, they’re not very hard. Before I explain how wonky the Pokémon mechanics can get under the hood, I first need to explain how logic programming works. Pokémon is a great fit for logic programming because Pokémon battles are essentially an extremely intricate rules engine. Let’s start by creating a file with a bunch of facts. In Prolog, we declare “predicates.” Predicates define relationships: is a , is a , and so on. We refer to this predicate as , because the name of the predicate is and it has one argument. These facts are loaded into an interactive prompt called the “top-level.” You query the top-level by typing a statement into the prompt; Prolog tries to find all the ways to make that statement true. When there’s more than one possible solution, the top-level displays the first solution and then awaits user input. You can then have it display one more solution, all the solutions, or stop entirely. In this first example, we type and hit Enter. The top-level replies Squirtle is, in fact, a Pokémon. Not all things are Pokémon. Let’s add Pokémon types in there, as the predicate . Recall that some Pokémon have just one type while others have two. In the latter case, that’s modeled with two facts. Bulbasaur is a Grass type, and Bulbasaur is a Poison type; both are true. The paradigm is similar to a One-To-Many relation in a SQL database. Interactively, we can confirm whether Squirtle is a water type. Can we state that Squirtle is a Grass type? No, because Squirtle is a Water type. Suppose we didn’t know what type Squirtle was. We can ask! In Prolog, names that start with an upper-case letter are variables. Prolog tries to “unify” the predicate with all possible matches for the variable. There’s only one way to make this particular predicate true though: has to be , because Squirtle’s only type is Water. For Pokémon with two types, the predicate unifies twice. Semantically, that leading semicolon on the third line means “or.” is true when or when . Any of the terms can be be a variable, which means we can ask questions in any direction. What are all the Grass types? Just make the first argument the variable, and set the second argument to . I cut it off, but the prompt would happily would list all 164 of them. Commas can be used to list multiple predicates—Prolog will unify the variables such that all of them are true. Listing all the Water/Ice types is just a matter of asking what Pokémon exist that unify with both the Water and Ice types. Even though is a variable, in the context of the query, both instances of it have to be the same (just like in algebra). The query only unifies for values of where both those predicates hold. For instance, the Water/Ice type Dewgong is a solution because our program contains the following two facts: Therefore, subbing in for the variable satisfies the query. Squirtle, by contrast, is just a Water type: exists, but not . The query requires both to unify, so is not a possible value for . Pokémon have lots of data that you can play around with. Iron Bundle is a strong Water/Ice-type Pokémon with high Special Attack. How high exactly? With Special Attack that high, we want to make use of strong Special moves. What Special moves does Iron Bundle know? Freeze-Dry is a particularly good Special move. Here’s a query for all Ice-type Pokémon with Special Attack greater than 120 that learn Freeze-Dry . One last concept before we move on: Rules. Rules have a head and a body, and they unify if the body is true. A move is considered a damaging move if it’s either a Physical Move or a Special Move. The predicate defines all the moves that do direct damage. This will unify with any moves that do direct damage. Nothing I’ve shown so far is, logically speaking, very ambitious—just “and” and “or” statements about various facts. It’s essentially a glorified lookup table. Still, take a moment to appreciate how much nicer it is to query this database than a plausible alternative, like SQL. For the facts we’ve seen so far, I would probably set up SQL tables like this: Then query it like so: For comparison, here’s the equivalent Prolog query again: I’m not ripping on SQL—I love SQL—but that’s the best declarative query language most people interact with. It’s amazing to me how much simpler and more flexible the Prolog version is. The SQL query would become unmanageably complex if we continued to add clauses, while the Prolog query remains easy to read and edit (once you get the hang of how variables work). With the basics established, here’s some context on the project I’m working on. Pokémon battles have an outrageous number of number of mechanics that all interact in complex and probabilistic ways. Part of the appeal of these games is the futile attempt to keep them all in your head better than your opponent, using that information to out-predict and out-maneuver their plans. It’s a sort of like very silly Poker. The challenge, if you want to build software for this game, is to model all that complexity without losing your mind. Prolog is stunningly good at this, for two main reasons: To illustrate that, here’s how I implemented priority moves for my Pokémon draft league. Pokémon draft is pretty much what it sounds like. Pokémon are given a point value based on how good they are, each player is given a certain amount of points to spend, and you draft until every player has spent their points. Your team ends up with about 8-11 Pokémon and each week you go head to head against another person in the league. My friend and WMI collaborator Morry invited me to his a couple years ago and I’ve been hooked on the format ever since. The games are 6v6, so a big part of the battle is preparing for all the possible combinations of six your opponent could bring, and putting together six of your own that can handle all of them. Naturally, you can only build teams with the Pokémon you drafted. I just made that predicate my name: . What Pokémon do I have that learn Freeze-Dry ? None. Rats. One very important type of move is priority moves. Earlier I mentioned that the Speed stat controls which Pokémon moves first. Some nuance: the Pokémon that used the move with the highest priority goes first, and if they both selected a move of the same priority, then the one with the higher Speed goes first. Most moves have a priority of zero. Ah, but not all! Accelerock has a priority of 1. A Pokémon that uses Accelerock will move before any Pokémon that uses a move with priority 0 (or less), even if the latter Pokémon has a higher Speed stat. I define a predicate that unifies with a Pokémon, the priority move it learns, and what priority that move is. A simple query that asks “what priority moves does my team learn” returns a lot of answers. Although this is technically correct (the best kind), most of these answers are not actually useful. Helping Hand and Ally Switch have very high priority, but they only have a purpose in Double Battles, which isn’t the format I’m playing. To fix this, I define all the Double Battle moves and exclude them. I’m going to exclude the move Bide too, which is functionally useless. The predicate means “true if this goal fails”, and means “these two terms are different.” We get the following results: Much better, but there’s a handful of moves in there that go first because they protect the user from damage or status, like Detect . That’s not really what I mean by priority move—I’m interested in moves that will surprise my opponent with damage or an adverse side effect, like Quick Attack and Sucker Punch . With those rules in place, we arrive at a very useful answer! It’s even more useful to look up what priority moves my opponent for the week has. At this point, I showed the program to Morry and he hit me with a challenge. Pokémon with the Prankster ability get an additional +1 priority on their status moves. Could the rule be extended to note that? I happen to have one such Pokémon on my team. This took me 3 minutes, using Prolog’s if/then construct, . Now the same query includes all of Tornadus’ status moves, with their increased priority. At the top, I said that this experience had taught me about the kinds of interfaces I want to build. One of those lessons is fairly obvious: Prolog can be a little clunky, but it’s an elegant language for expressing and querying relations like the ones described here. That has implications if you, like me, are interested in the judicious use of declarative DSLs for programming. The other lesson is what kinds of tools work for non -programmers. I’m not the first person to think “it would be nice to know what priority moves my opponent’s team has.” The Pokémon community has resources like this, built in the best programming interface of all time: the humble spreadsheet. I use a copy of “Techno’s Prep Doc” , which is one of those spectacularly-advanced Google Sheets you come across in the wild sometimes. You put in the teams and it generates tons of useful information about the matchup. It has a great interface, support for a variety of formats, scannable visuals, and even auto-complete. I was curious about the formula for finding priority moves. It’s gnarly. With a little bit of clicking around, I was basically able to figure out what this does. There’s a “Backend” sheet that lists all the moves. It’s effectively a hard-coded version of my Prolog query. The lookup formula does some filtering, VLOOKUP-ing, and kinda-metaprogramming (INDIRECT returns a cell reference ) to find all the Pokémon on your team that are in that Backend list, and display them. There are a number of reasons that I, personally, would prefer to work on a version of this database implemented in Prolog instead of one implemented with spreadsheet VLOOKUPs. I plan to build webapps with this that do things the existing suite of Pokémon tooling can’t. (If I can ever get scryer-prolog to compile to WASM , that is.) Furthermore, the Prolog paradigm is clearly more extensible. The spreadsheet backend is a hard-coded list of notable moves; my database can look up any move. I still can’t really believe this query, which finds all the Special moves that Tornadus learns which are super-effective against any member of Justin’s team. Nothing like that exists in any tool that I know of—it’s the kind of thing I normally try to figure out by endlessly switching tabs. With the grammar established by my program, I put this together in like 30 seconds. I’m not interested in how structured programming is more extensible than spreadsheets, though. I already know why I don’t do all my programming in spreadsheets. A question I find very important is: What is it about this particular problem, and the kinds of people who were motivated to solve it, where the most well-maintained solution available is a spreadsheet? I believe there are a great many problems like that in the world, and a lot of improvements on that programming paradigm yet to be properly realized. Thanks to Morry Kolman for reading a draft of this blog . Some moves miss a certain percentage of the time, doing no damage. Some moves raise or lower a Pokémon's stats. Pokémon can hold items that have various effects. Damage calculations aren't constant; moves do normally-distributed damage within the calculated range. Pokémon can get frozen, burned, paralyzed, poisoned, or fall asleep; these all have various adverse effects. There are a variety of field effects (like weather, terrain, Trick Room) which alter move damage, turn order, and other things. Pokémon each have an ability that has various effects i.e Levitate makes you immune to ground moves, Drizzle turns the weather to Rain when the Pokemon switches in, Sheer Force disables a move's side effects but multiplies its damage by 1.3x. Players have points they (invisibly) allocate to each Pokémon before the game, to boost chosen stats. Depending on they built the team, each Pokemon might do more damage or take hits better than you were expecting. The challenge, if you want to build software for this game, is to model all that complexity without losing your mind. Prolog is stunningly good at this, for two main reasons: Take a look at the damage calculator to get an idea of what I mean. The query model excels at describing ad-hoc combinations. The data model is perfectly suited to layering rules in a consistent way. I joined the draft league in Season 3, lost in finals, then won Seasons 4 and 5. We just started Season 6. If you want it, you can have the crown . There are a number of coders in this draft league and I have gotten precisely zero of them to try out my Prolog program. That’s kind of the point though! It needs to be a website… The Prolog implementation I’m using is Scryer Prolog , a modern Prolog implementation that emphasizes standards and formal correctness. The creator, Markus Triska, has a terrific online book, “The Power of Prolog,” and accompanying YouTube channel that has soundtracked my breakfast for weeks. Scryer Prolog is also designed to encourage more constructs that preserve logical completeness and monotonicity , which means I’m not really supposed to use the or predicates. I couldn’t really figure out how to express what I wanted with the replacements offered, though. Happy to edit if anyone wants to help. Also, on Markus’ website : “My goal is to provide programs that work as intended, reliably and conveniently, with zero surprises. Programs that you can run for multiple decades without any issues such as crashes, resource leaks or other unexpected behaviour.” This guy and I have some similar interests! I did some fun metaprogrogramming to get all the data into Prolog predicates using the Pokémon Showdown NodeJS API. Yes, putting the accent on the “e” everywhere but the code blocks was very annoying.

0 views
Evan Schwartz 2 weeks ago

Scour Year End Update 2025

I thought about sending out a personalized "Scour Wrapped"... until I got the 7th Wrapped from some random service. So instead, I'll just say Happy New Year and thanks for your support in 2025! 🥂 These were the new features added since the last update in October. Scour now identifies articles that are paywalled and indicates them with a yellow dollar sign next to the domain. In your settings , you can opt to hide paywalled content. If you do, you can also exempt specific domains where you have a subscription so you will see their content even if it is behind the paywall. Thank you to Johnny and Allen for requesting this feature! For anyone interested in the technical details, I wrote a blog post about a neat SQL trick I came across while building this: Short-Circuiting Correlated Subqueries in SQLite . You can also now block content from specific websites. The option to block a domain can be found by clicking the "..." button below each post. You can see and manage your excluded domains in your settings . Thanks to Vahe for this suggestion! If you subscribe to specific feeds (as opposed to scouring all of them), Scour will now recommend other sources for you to follow right in your personalized feed. These recommendations are based on Scour looking for content that matches your interests that you aren't currently getting. You can find more recommendations on your Feeds page . Each feed also now displays its three most recent posts below its description to make it easier to know what you'll get if you subscribe. You can click on the feed's title to see all of the posts from that feed. Thanks to Tiago for this suggestion! By default, clicking on a link to a post will bring you to the original website where it was published. However, if you prefer to read it on Scour, you can read the Preview, which can be found in the "..." menu under each post. Thanks to Linh for this suggestion! The filter menu for your feed (accessible via the button next to where it says Your Top Finds) should be clearer and more mobile-friendly. You can filter by time range and toggle between seeing posts from feeds you’ve subscribed to or see posts from everyone’s feeds. Thanks Stefan for the feedback on this! A number of people have told me that they are confused about how the love/like/dislike reactions are used on Scour. I'll work on making this clearer in the future but in the meantime, there's now a section in the FAQs about this. The answer is: Loves and likes are saved to your Likes page, so you can use them to bookmark interesting content. Unlike most content aggregators, Scour does not use reactions to change what shows up in your feed. Instead, reactions are used to generate Interest Recommendations for you. Scour only shows content related to topics you've explicitly chosen. You can also subscribe to other users' Likes as feeds. Everyone's reactions contribute to the Popular Posts page. Here were some of my favorite posts I found on Scour in November and December: Thanks to everyone who wrote about Scour on their blog or website in 2025! This included: If you write about Scour in the future, or if you already did and I didn't include you, please let me know! Thank you to everyone who provided feedback on Scour this year! Specifically, thank you to Aaron, Alberto, Alex K, Alex W, Allen, Andrew D, Andrew M, Andy M, Andy P, Cairin, Cole, Daniel, Elyem, Hary, Imperfect, Jadi, Jeppe, Jesse, Johnny, Jon, Karit, Kilpatrj, Linh, Proudmuslim-dev, Ryan, Sarah, Stefan, Tiago, Tomáš, Tyler, and Vahe. And thank you to all of the anonymous feedback givers as well! Because you made it to the end of the post, here's a little preview of an upcoming feature for you. Let's say you want to only see posts from small websites, like individuals' blogs. You can now try filtering your feed by how many posts each website or feed publishes per month. For example, you can use these links to see only posts from quieter domains or quieter feeds . Or, you can try this one to only see articles from larger websites . Let me know what you think! UI for controlling these filters is coming soon! Happy New Year and happy Scouring! - Evan Scour scoured 9,940,460 posts from 15,608 feeds 1,013 new users signed up (welcome!!) 12,620 interests were added, with 6,688 of those from recommendations 26,702 posts were read, 3,023 were liked, and 383 were loved 55 suggestions on the feedback board were completed Paper AI Tigers Build / Buy / Bot More databases should be single-threaded Disks Lie: Building a WAL that actually survives Minsuk Kang: Scour and minifeed are 100X better than Instagram and X (January) Winther: Blog Discovery (June) Daniel Prindii: My Read it later and discoverability systems in 2025 (July) PPC Land: Developer revives RSS with AI while Google targets syndication infrastructure (August) Tomáš Burkert: RSS feeds discovery strategies (October) Alex White: Discovering the Indie Web (November) Matt Maldre: Search engine for blogs (November) Andrew Doran: Tools for discovering the IndieWeb (December)

0 views
Grumpy Gamer 2 weeks ago

Sqlite Comments

When I started using Hugu for static site generation I lost the ability to have comments and we all know now supportive the Internet can be, so why wouldn’t you have comments? I wrote a few php scripts that I added on to Hugo and I had comments again. I decided to store the comments as flat files so I didn’t complicate things by needing the bloated MySQL. I wanted to keep it as simple and fast as possible. When a comment is added, my PHP script created a directory (if needed) for the post and saves the comment out as a .json file with name as the current time to make sorting easy. When the blog page was displayed, these files (already sorted thanks to the filename) were loaded and displayed. And it all worked well until it didn’t. Flat files are simple. but they can be hard to search or maintain if they need cleaning up or dealt with after a spam attack. I figured I use commandline tools to do all of that, but it’s a lot more cumbersome than I first thought. I missed have them in a sql database. I didn’t want to install MySQL again, but my site doesn’t get a lot of commenting traffic so I could use Sqlite instead. The downside is Sqlite write-locks the database while a write is happening. In my case it’s a fraction of a second and wouldn’t be a issue. The second problem I had was the version of Ubuntu my server was using is 5 years old and some of the packages I wanted wouldn’t available for it. I tried to update Ubuntu and for reasons I don’t fully understand I couldn’t. So I spun up a new server. Since grumpygamer.com is a statics site I only had to install Apache and I was off and running. Fun times. But the comment flat files still bugged me and I thought I’d use this as an opportunity to convert over to Sqlite. PHP/Apache comes with Sqilte already installed, so that’s easy. A long weekend and I rewrote the code to save comments and everything is back and working. Given that a webserver and PHP already needed to be installed, it isn’t a big deal to use Sqlite. If you’re not comfortable with SQL, it might be harder but I like SQL.

0 views
Ludicity 2 weeks ago

Merry Christmas, Ya Filthy Animals (2025)

It’s my last day of writing for the year, so I’m going to try keep this one quick – it was knocked out over three hours, so I hope you can forgive me if it’s a bit clumsier than my usual writing. For some strange reason, one of the few clear memories I have from growing up in Malaysia is a particular moment when I was seven years old. It was the first day of school for the year, and I was studying at Sekolah Kebangsaan Batu Lanchang , which in English is “Batu Lanchang National School”. When you’re seven years old, being told that you’ve got to wait an hour to see your best friend is an insurmountable obstacle. It feels like forever . The year it would take to go up another grade is, accordingly, so long that it’s not even imaginable. I recall thinking, probably in simpler language, “I probably won’t make it to eight years old. A year is way too much time for something random to happen. I’ll get hit by a car or something.” To round out this brief moment of uncharacteristic sobriety, is it very likely that my next thought “Blastoise is obviously better Pokemon than Venusaur and Charizard because he has big cannons.” Now I’m 31 and the years are flying by so fast that I have to desperately seize their trailing collars so that I don’t suddenly find myself seventy without noticing. So, as is becoming tradition, what happened this year? But hell I'm just a blind man on the plains, I drink my water when it rains, And live by chance among the lightning strikes. – Burden of Tomorrow , Tallest Man on Earth For the first time, I have absolutely no idea what’s going to happen next year. Don’t get me wrong, I’ve never been right about what’s going to happen next year, but I’ve always thought I had an idea. I thought I was going to be a failing student in Southeast Asia because I hated mathematics, maybe become a journalist, then presumably a corpse with 2.5 children. If we run the tape back, I ended up in Australia, getting deeply into sabre fencing, and somehow became an extremely un-failed student in psychology and then statistics, before earning a bajillion dollars in software. Then I was locked in an apartment for a year by a global pandemic, somehow became at least a not-totally-unknown writer, threw away the bajillion dollars to the absolutely horror of my very conservative all-doctor family, got into improv theatre, rejected a book deal, and started a software consultancy. At the start of every year until now, I’ve had some sort of plausible social script for how the year was going to go, and it has never, ever gone that way, but I nonetheless allowed myself a fresh misconception on January 1st. This year, I really have no idea, and I’m not going to bother wasting any time trying to figure it out. Not a clue what’s going to happen next year. Maybe I’ll start a recruiting agency and make a million dollars. Maybe I’ll get to February and run into a crippling illness. It’s very freeing in a lot of ways, particularly for someone that got into the works of Taleb at a young age. I surf chaos full time now. For example, despite all the marketing work we did, our biggest contract this year happened because I got a message on LinkedIn about a data engineering job, which would normally have been totally unsuitable for a whole company to work on. Except over a year ago, I was asked to get coffee with Mel Kendell and Martin Foster , who allowed me to give a talk at a Meetup where Dan Prager and Martin Chesbrough 1 , and they both worked with the messenger. I scarcely bother to plan anymore. I’ll either earn a million dollars or become homeless next year. I don’t know, I don’t care, bring on the lightning strikes. ’…I have been "in denial" for some time, knowingly burning the candle at both ends and finding that it often gives a lovely light. But for precisely this reason, I can't see myself smiting my brow with shock or hear myself whining about how it's all so unfair: I have been taunting the Reaper into taking a free scythe in my direction and have now succumbed to something so predictable and banal that it bores even me.’ – Christopher Hitchens I’m at the age now where the older people in my life are starting to get sick. Me and my friends find ourselves in hospital lobbies more often – there are emergency flights, sterile hospital waiting rooms, and trying to figure out what it means when a doctor says that someone is “stable”. Relax, I’m not going to be a downer. I know I don’t have to explain how it goes. And if I do, you’re probably twenty, in which case hoo boy, you have got some experiences headed your way. What I’m getting at is that I’m coming to terms with is the fact that we’re all running out the clock in one way or another. We know, to some level of precision, how the story goes. The general term for this is mortality salience , i.e, the realization that there’s a hospital bed or worse at the end of the rainbow, and we’ve got to make do with the time we’ve got. Sometimes it can be a little bit confronting, but it clarifies things too. There’s a story from David Whyte that I absolutely adore, about a conversation he had with his best friend before said friend passed 2 . We were towards the end of our meal on the Saturday evening, and I was in a kind of reverie. I was realizing that I needed to help my father out. Almost to the ceiling, I spoke out: “You know, my dad’s in a bit of trouble. I’m thinking of giving him some money.” John immediately leaned across the table and said, “How much are you thinking of giving him?” My father was in Yorkshire, in England. “I dunno — one thousand pounds,” thinking I was being very generous. John looked at me and said, “Go against yourself. Give two.” I said, “Thank you, John” (laughing). “A friend in need is a friend indeed.” Then John looked at me again and said, “Go against yourself again. Give four .” I took a big gulp. Part of the spirit of our meetings — these philosophical writing meetings, walking meetings — was to push each other. In the spirit of that, I said, “I will,” and we shook hands across the table. John made sure we were committed. Sure enough, I went away and gave the four thousand to my father, and you could have knocked him down with a feather. I was often giving him money, but in drips and drabs. He was always falling down financial holes and having to climb out again, and I’d have to help him climb out. But this four thousand actually transformed his financial life, because he was able to sort himself out — and he never fell down the hole again. I said to myself, “Wasn’t that a great thing for a friend to do?” One of the qualities that lies at the heart of friendship is encouraging your friend to be the best part of themselves — to be more generous — and to be a witness to that. John had done that for me, and I thought, “Wasn’t that a marvelous thing for a friend to do?” It wasn’t nine months later that we were at dinner again. John had obviously forgotten about this conversation, because towards the end of the meal he looked towards the ceiling and said, “You know, I have a good friend in a bit of financial trouble, and I’m thinking of giving him some money.” I said, “How much are you thinking of giving, John?” He shook his head. “I dunno — one thousand euros.” He was in Ireland. I looked at him and said, “Go against yourself, John. Give two.” John looked back across the table and said, “Jesus, Holy and Saint Mary, Joseph — tonight I’m in this for four.” I’ve done this a few times this year — that is, quadrupling the money I've sent someone that needed it — which may prove to be unwise in 2026 if our revenue dries up, but for now I don’t have any regrets. I was raised up believing I was somehow unique Like a snowflake distinct among snowflakes Unique in each way you'd conceive And now after some thinking I'd say I'd rather be A functioning cog in some great machinery Serving something beyond me – Helplnessness Blues , Fleet Foxes Two years ago, I was adamant that I wanted to “make it alone” when I started a business, largely because I wanted to lay out a blueprint for anyone to succeed, not just people with well-known blogs. This has quickly turned out to be utterly ridiculous. No one makes it alone. I’ve been on the receiving end of a huge amount of generosity for the past two years, and it would be ridiculous to pretend that isn’t the case, or that it’s even possible to succeed without that being the case. I was briefly tempted to start listing all the people that have helped me out this year, from helping me keep sharp on my software skills, to editing help, to preventing me from making horrible contractual blunders, but I realized that it would take me literal hours even if all I was doing was writing down their names and pasting links to their websites in. I’m generally pretty good at accepting help, but a big lesson from this year is to lean into it entirely. It’s going to be insufferable. Every other post is going to be banging on your doors demanding help with sales, obscure programming questions, and book recommendations. I've been holding back a lot of my thoughts on things because I was experiencing the first-world problem of being self-conscious about having too much writing go viral. I'm not going to worry about this next year, and oh boy, have I got thoughts on things . I don't like being told that it's my duty to love my enemies. No, we have to hate our enemies and try to destroy them before they destroy us. – Christopher Hitchens Next year, I’m going to try and put my enemies in the dirt. Earlier this year, we had a very unpleasant run-in with a competing consultancy in Melbourne. They have far more staff than us, but were running years late on their deliverables, were putting small, greenfield clinics on SSIS in 2025 3 , and had a contract that said they owned all the SQL in client systems so the clients could never migrate away. When it happened, I really wanted to stick the knife in. It was very much everything I’m opposed to in the industry – at best, incompetents, at worse, grifters, taking advantage of medical institutions. With a bit of distance, I’ve realized that the people there didn’t even know they were doing a bad job. As far as they were concerned, SSIS is state-of-the-art, and the fact that they didn’t have to learn how to code was pure upside, and every project they had ever been on was late so they weren’t being particularly ineffective. Sure, I’ve run into people with actual monstrous views about making money – an executive told me on Christmas Eve that there’s no room for ethics in a business 4 – but my enemy is generally not individual people. It’s the ideas and systems that create people with worldviews so comprehensively myopic. I am probably not going to be able to destroy them by taking all of their business in one swoop – it’s hard to compete with people that will lie for sales, advertently or not. Nor will I be able to have an impact if I do what I’ve been doing this year – paying my team a good wage, with no intention of ever growing. So what is there to do, if I’m not happy just giving myself a ton of money and watching the world slowly erode? Going into the next year, I want to grow our team until we have enough leverage to make hires, develop our own philosophy of engineering, and lay out a blueprint for how to run an ethical, human-oriented business for other people to follow. There is some size where every business turns evil – even mine would turn evil, if we got large enough to become acquired and our founding team quit – so all I can think of is to lay out the playbook for your peers to kill you during the full moon, when you’re selling Azure consulting and GenAI SEO platforms. What does that actually mean? I have no idea, we’ll figure it out somehow. All I know right now is that the goal is to make sure that everyone on my team is compensated around their corporate salaries by the end of 2026, that we’re in a position to comfortably support a few good people who need a good place to work by 2027, and that we have enough about the process documented that anyone with a bit of fearlessness can replicate our process. Then I’m going to stick in the knife and take all their business. I have had it up to here just having to watch Musk-and-Altman-types flounce around, lying and absolutely fucking everything up, and if I need to start obtaining a huge pile of money to engage them in mortal combat on the astral plane, then fine , someone needs to get on this. The theme for next year is generosity and preparation for economic damage. In any case, I hope you all had a great Christmas, are headed into a great new year, and that you also decide to choose violence in 2026. I’ve been talking to Martin for over a year, and I swear to God, he has told me that he’s the “Principal Intern” at Everest Engineering every single time. I have no idea what he does when though my team is literally subcontracted through Everest and I am what passes for our CEO.  ↩ I had to spend ages tracking this down and transcribing it by hand – I think it’s the only full transcription of the story on the internet – so you’d better enjoy it, okay? This is, genuinely, a Christmas present from me to you. I paid a gross amount of money to even access the audio.  ↩ For the non-data-engineers in the audience, all the specialists who read this just visibly winced.  ↩ This is true if you’re incompetent and have no leverage. Sucks to suck.  ↩ I’ve been talking to Martin for over a year, and I swear to God, he has told me that he’s the “Principal Intern” at Everest Engineering every single time. I have no idea what he does when though my team is literally subcontracted through Everest and I am what passes for our CEO.  ↩ I had to spend ages tracking this down and transcribing it by hand – I think it’s the only full transcription of the story on the internet – so you’d better enjoy it, okay? This is, genuinely, a Christmas present from me to you. I paid a gross amount of money to even access the audio.  ↩ For the non-data-engineers in the audience, all the specialists who read this just visibly winced.  ↩ This is true if you’re incompetent and have no leverage. Sucks to suck.  ↩

0 views
Dan Moore! 3 weeks ago

What New Developers Need to Know About Working with AI

It’s been a few years since I wrote Letters to a New Developer , about what I wish I’d known when I was starting out. The industry has changed with the advent and acceleration of generative AI and the implications of these tools on coding and software creation. So I wanted to write a quick update to give advice to developers who are entering this world with AI. It’s important to understand what developers actually do. They are not responsible for writing code. They are not responsible for liking technology. Like other employees, developers are responsible for  taking their particular skill set and using it to solve problems that a business or organization needs solved. Whether that’s a one-person shop organizing their customers’ data or a large organization like Walmart, Amazon or the US military trying to improve their logistics, there are goals to achieve. For a developer building, maintaining and improving software is the main means to achieve those goals. This does not change in the world of AI. The role of a developer is still to understand technology, how it can be applied, where its limits are and to build it with the quality and appropriate flexibility for the business situation. What do I mean by the last part? If you’re building a script to transfer data from one system to another one time, then a throwaway script that doesn’t have error checking, that doesn’t have proper variable names, that doesn’t have abstraction is appropriate. If, on the other hand, you’re creating foundational architectural components of a long-lived system, you need to think about all the things that make software more maintainable . In either case as a developer your role is clear. It’s not to code the software. It’s to take the business requirements, understand the domain and build a solution that meets the business’s requirements for a given level of flexibility, complexity and completeness. That job doesn’t change whether you’re using: As a dev, your job is to understand the technical trade-offs, use the right tools and meet the business or organization’s needs. Now as a new developer, how do you learn to leverage genAI in a way that is going to help your career rather than hurt it? It’s tough out there to get your job as a new dev and ignoring AI is going to make it even tougher. It’s important that you learn how to use this tool and use it well. But AI as a tool is much more like Google search results than it is like a compiler error. A compiler error is deterministic and will give you the same message each time you compile the code. The output of an LLM is not deterministic, just as when you search for guidance for building software on Stack Overflow or your team. With these sources of knowledge, you as a developer need to learn judgment. You need to learn when to trust genAI and when not to trust it. Do this by starting small, asking for links, and checking the output of an AI against other sources. These include other AIs and published documentation. You’re building your sense of judgment and intuition about the system you are improving. Use it to augment your understanding, not replace it . When an AI hallucinates, don’t throw the baby out with the bathwater and never touch genAI again. Instead learn to sniff out when an AI is generating garbage and when it is generating helpful code that will accelerate things. A good course of action is to use AI to generate easily verifiable code where errors are low impact. An example is writing tests with AI, especially unit tests, especially in a statically typed language. It’s very easy to tell if the tests that are written are working or not. Don’t forget to instruct the AI to fully flesh out the tests, you don’t want any “return true” nonsense. Another example is read-only queries. If you have an understanding of the data you can verify whether or not the SQL the LLM creates gives you the correct answer. Write multiple queries because they are so low effort, and use them to double check answers. If you were looking for a count of a particular set of customers, ask it for multiple different ways, including a count of one particular kind of customer and a count of all customers grouped by type. This lets you see if things match up. The goal is not trusting blindly but instead using the tool to accelerate delivery of solutions to the problems the business wants you to solve. But you want to do so in a way that is going to give you confidence that the solutions you deliver are real. By the way, the value of such intuition and judgement is high for all developers. I think that it’s even more valuable for newer developers. If you would like to purchase my book, “Letters To a New Developer” for more of this kind of insight, there’s a sale going on right now through the end of the year. You can use this link to buy my book for just $24 . (This is a better deal than I get with my author’s discount.) machine code assembly a non-memory managed language like C a memory managed language like Java or spec-driven development.

1 views
Armin Ronacher 3 weeks ago

A Year Of Vibes

2025 draws to a close and it’s been quite a year. Around this time last year, I wrote a post that reflected on my life . Had I written about programming, it might have aged badly, as 2025 has been a year like no other for my profession. 2025 was the year of changes. Not only did I leave Sentry and start my new company, it was also the year I stopped programming the way I did before. In June I finally felt confident enough to share that my way of working was different: Where I used to spend most of my time in Cursor, I now mostly use Claude Code, almost entirely hands-off. […] If you would have told me even just six months ago that I’d prefer being an engineering lead to a virtual programmer intern over hitting the keys myself, I would not have believed it. While I set out last year wanting to write more, that desire had nothing to do with agentic coding. Yet I published 36 posts — almost 18% of all posts on this blog since 2007. I also had around a hundred conversations with programmers, founders, and others about AI because I was fired up with curiosity after falling into the agent rabbit hole. 2025 was also a not so great year for the world. To make my peace with it, I started a separate blog to separate out my thoughts from here. It started with a growing obsession with Claude Code in April or May, resulting in months of building my own agents and using others’. Social media exploded with opinions on AI: some good, some bad. Now I feel I have found a new stable status quo for how I reason about where we are and where we are going. I’m doubling down on code generation, file systems, programmatic tool invocation via an interpreter glue, and skill-based learning. Basically: what Claude Code innovated is still state of the art for me. That has worked very well over the last few months, and seeing foundation model providers double down on skills reinforces my belief in this approach. I’m still perplexed by how TUIs made such a strong comeback. At the moment I’m using Amp , Claude Code , and Pi , all from the command line. Amp feels like the Apple or Porsche of agentic coding tools, Claude Code is the affordable Volkswagen, and Pi is the Hacker’s Open Source choice for me. They all feel like projects built by people who, like me, use them to an unhealthy degree to build their own products, but with different trade-offs. I continue to be blown away by what LLMs paired with tool execution can do. At the beginning of the year I mostly used them for code generation, but now a big number of my agentic uses are day-to-day things. I’m sure we will see some exciting pushes towards consumer products in 2026. LLMs are now helping me with organizing my life, and I expect that to grow further. Because LLMs now not only help me program, I’m starting to rethink my relationship to those machines. I increasingly find it harder not to create parasocial bonds with some of the tools I use. I find this odd and discomforting. Most agents we use today do not have much of a memory and have little personality but it’s easy to build yourself one that does. An LLM with memory is an experience that is hard to shake off. It’s both fascinating and questionable. I have tried to train myself for two years, to think of these models as mere token tumblers, but that reductive view does not work for me any longer. These systems we now create have human tendencies, but elevating them to a human level would be a mistake. I increasingly take issue with calling these machines “agents,” yet I have no better word for it. I take issue with “agent” as a term because agency and responsibility should remain with humans. Whatever they are becoming, they can trigger emotional responses in us that can be detrimental if we are not careful. Our inability to properly name and place these creations in relation to us is a challenge I believe we need to solve. Because of all this unintentional anthropomorphization, I’m really struggling at times to find the right words for how I’m working with these machines. I know that this is not just me; it’s others too. It creates even more discomfort when working with people who currently reject these systems outright. One of the most common comments I read in response to agentic coding tool articles is this rejection of giving the machine personality. An unexpected aspect of using AI so much is that we talk far more about vibes than anything else. This way of working is less than a year old, yet it challenges half a century of software engineering experience. So there are many opinions, and it’s hard to say which will stand the test of time. I found a lot of conventional wisdom I don’t agree with, but I have nothing to back up my opinions. How would I? I quite vocally shared my lack of success with MCP throughout the year, but I had little to back it up beyond “does not work for me.” Others swore by it. Similar with model selection. Peter , who got me hooked on Claude early in the year, moved to Codex and is happy with it. I don’t enjoy that experience nearly as much, though I started using it more. I have nothing beyond vibes to back up my preference for Claude. It’s also important to know that some of the vibes come with intentional signalling. Plenty of people whose views you can find online have a financial interest in one product over another, for instance because they are investors in it or they are paid influencers. They might have become investors because they liked the product, but it’s also possible that their views are affected and shaped by that relationship. Pick up a library from any AI company today and you’ll notice they’re built with Stainless or Fern. The docs use Mintlify, the site’s authentication system might be Clerk. Companies now sell services you would have built yourself previously. This increase in outsourcing of core services to companies specializing in it meant that the bar for some aspects of the user experience has risen. But with our newfound power from agentic coding tools, you can build much of this yourself. I had Claude build me an SDK generator for Python and TypeScript — partly out of curiosity, partly because it felt easy enough. As you might know, I’m a proponent of simple code and building it yourself . This makes me somewhat optimistic that AI has the potential to encourage building on fewer dependencies. At the same time, it’s not clear to me that we’re moving that way given the current trends of outsourcing everything. This brings me not to predictions but to wishes for where we could put our energy next. I don’t really know what I’m looking for here, but I want to point at my pain points and give some context and food for thought. My biggest unexpected finding: we’re hitting limits of traditional tools for sharing code. The pull request model on GitHub doesn’t carry enough information to review AI generated code properly — I wish I could see the prompts that led to changes. It’s not just GitHub, it’s also git that is lacking. With agentic coding, part of what makes the models work today is knowing the mistakes. If you steer it back to an earlier state, you want the tool to remember what went wrong. There is, for lack of a better word, value in failures. As humans we might also benefit from knowing the paths that did not lead us anywhere, but for machines this is critical information. You notice this when you are trying to compress the conversation history. Discarding the paths that led you astray means that the model will try the same mistakes again. Some agentic coding tools have begun spinning up worktrees or creating checkpoints in git for restore, in-conversation branch and undo features. There’s room for UX innovation that could make these tools easier to work with. This is probably why we’re seeing discussions about stacked diffs and alternative version control systems like Jujutsu . Will this change GitHub or will it create space for some new competition? I hope so. I increasingly want to better understand genuine human input and tell it apart from machine output. I want to see the prompts and the attempts that failed along the way. And then somehow I want to squash and compress it all on merge, but with a way to retrieve the full history if needed. This is related to the version control piece: current code review tools assign strict role definitions that just don’t work with AI. Take the GitHub code review UI: I regularly want to use comments on the PR view to leave notes for my own agents, but there is no guided way to do that. The review interface refuses to let me review my own code, I can only comment, but that does not have quite the same intention. There is also the problem that an increased amount of code review now happens between me and my agents locally. For instance, the Codex code review feature on GitHub stopped working for me because it can only be bound to one organization at a time. So I now use Codex on the command line to do reviews, but that means a whole part of my iteration cycles is invisible to other engineers on the team. That doesn’t work for me. Code review to me feels like it needs to become part of the VCS. I also believe that observability is up for grabs again. We now have both the need and opportunity to take advantage of it on a whole new level. Most people were not in a position where they could build their own eBPF programs, but LLMs can. Likewise, many observability tools shied away from SQL because of its complexity, but LLMs are better at it than any proprietary query language. They can write queries, they can grep, they can map-reduce, they remote-control LLDB. Anything that has some structure and text is suddenly fertile ground for agentic coding tools to succeed. I don’t know what the observability of the future looks like, but my strong hunch is that we will see plenty of innovation here. The better the feedback loop to the machine, the better the results. I’m not even sure what I’m asking for here, but I think that one of the challenges in the past was that many cool ideas for better observability — specifically dynamic reconfiguration of services for more targeted filtering — were user-unfriendly because they were complex and hard to use. But now those might be the right solutions in light of LLMs because of their increased capabilities for doing this grunt work. For instance Python 3.14 landed an external debugger interface which is an amazing capability for an agentic coding tool. This may be a little more controversial, but what I haven’t managed this year is to give in to the machine. I still treat it like regular software engineering and review a lot. I also recognize that an increasing number of people are not working with this model of engineering but instead completely given in to the machine. As crazy as that sounds, I have seen some people be quite successful with this. I don’t yet know how to reason about this, but it is clear to me that even though code is being generated in the end, the way of working in that new world is very different from the world that I’m comfortable with. And my suspicion is that because that world is here to stay, we might need some new social contracts to separate these out. The most obvious version of this is the increased amount of these types of contributions to Open Source projects, which are quite frankly an insult to anyone who is not working in that model. I find reading such pull requests quite rage-inducing. Personally, I’ve tried to attack this problem with contribution guidelines and pull request templates. But this seems a little like a fight against windmills. This might be something where the solution will not come from changing what we’re doing. Instead, it might come from vocal people who are also pro-AI engineering speaking out on what good behavior in an agentic codebase looks like. And it is not just to throw up unreviewed code and then have another person figure the shit out.

0 views
Evan Schwartz 4 weeks ago

Short-Circuiting Correlated Subqueries in SQLite

I recently added domain exclusion lists and paywalled content filtering to Scour . This blog post describes a small but useful SQL(ite) query optimization I came across between the first and final drafts of these features: using an uncorrelated scalar subquery to skip a correlated subquery (if you don't know what that means, I'll explain it below). Scour searches noisy sources for content related to users' interests. At the time of writing, it ingests between 1 and 3 million pieces of content from over 15,000 sources each month. For better and for worse, Scour does ranking on the fly, so the performance of the ranking database query directly translates to page load time. The main SQL query Scour uses for ranking applies a number of filters and streams the item embeddings through the application code for scoring. Scour uses brute force search rather than a vector database, which works well enough for now because of three factors: A simplified version of the query looks something like: The query plan shows that this makes good use of indexes: To add user-specified domain blocklists, I created the table and added this filter clause to the main ranking query: The domain exclusion table uses as a primary key, so the lookup is efficient. However, this lookup is done for every row returned from the first part of the query. This is a correlated subquery : A problem with the way we just added this feature is that most users don't exclude any domains, but we've added a check that is run for every row anyway. To speed up the queries for users who aren't using the feature, we could first check the user's settings and then dynamically build the query. But we don't have to, because we can accomplish the same effect within one static query. We can change our domain exclusion filter to first check whether the user has any excluded domains: Since the short-circuits, if the first returns (when the user has no excluded domains), SQLite never evaluates the correlated subquery at all. The first clause does not reference any column in , so SQLite can evaluate it once and reuse the boolean result for all of the rows. This "uncorrelated scalar subquery" is extremely cheap to evaluate and, when it returns , lets us short-circuit and skip the more expensive correlated subquery that checks each item's domain against the exclusion list. Here is the query plan for this updated query. Note how the second subquery says , whereas the third one is a . The latter is the per-row check, but it can be skipped by the second subquery. To test the performance of each of these queries, I replaced the with and used a simple bash script to invoke the binary 100 times for each query on my laptop. Starting up the process each time adds overhead, but we're comparing relative differences. At the time of this benchmark, the last week had 235,975 items, 144,229 of which were in English. The two example users I tested this for below only look for English content. This test represents most users, who have not configured any excluded domains: This shows that the short-circuit query adds practically no overhead for users without excluded domains, whereas the correlated subquery alone makes queries 17% slower for these users. This test uses an example user that has excluded content from 2 domains: In this case, we do need to check each row against the domain filter. But this shows that the short-circuit still adds no overhead on top of the query. When using SQL subqueries to filter down result sets, it's worth thinking about whether each subquery is really needed for most users or most queries. If the check is needed most of the time, this approach won't help. However if the per-row check isn't always needed, using an uncorrelated scalar subquery to short-circuit a condition can dramatically speed up the average case with practically zero overhead. This is extra important because the slow-down from each additional subquery compounds. In this blog post, I described and benchmarked a single additional filter. But this is only one of multiple subquery filters. Earlier, I also mentioned that users had asked for a way to filter out paywalled content. This works similarly to filtering out content from excluded domains. Some users opt-in to hiding paywalled content. For those users, we check if each item is paywalled. If so, we check if it comes from a site the user has specifically allowed paywalled content from (because they have a subscription). I used the same uncorrelated subquery approach to first check if the feature is enabled for the user and, only then, does SQLite need to check each row. Concretely, the paywalled content filter subquery looks like: In short, a trivial uncorrelated scalar subquery can help us short-circuit and avoid a more expensive per-row check when we don't need it. There are multiple ways to exclude rows from an SQL query. Here are the results from the same benchmark I ran above, but with two other ways of checking for whether an item comes from an excluded domain. The version of the query uses the subquery: The variation joins with and then checks for : And here are the full benchmarks: For users without excluded domains, we can see that the query using the short-circuit wins and adds no overhead. For users who do have excluded domains, the is faster than the version. However, this version raises the exact problem this whole blog post is designed to address. Since joins happen no matter what, we cannot use the short-circuit to avoid the overhead for users without excluded domains. At least for now, this is why I've gone with the subquery using the short-circuit. Discuss on Hacker News , Lobsters , r/programming , r/sqlite . Scour uses SQLite, so the data is colocated with the application code. It uses binary-quantized vector embeddings with Hamming Distance comparisons, which only take ~5 nanoseconds each . We care most about recent posts so we can significantly narrow the search set by publish date.

0 views
Marc Brooker 1 months ago

What Does a Database for SSDs Look Like?

Maybe not what you think. Over on X, Ben Dicken asked : What does a relational database designed specifically for local SSDs look like? Postgres, MySQL, SQLite and many others were invented in the 90s and 00s, the era of spinning disks. A local NVMe SSD has ~1000x improvement in both throughput and latency. Design decisions like write-ahead logs, large page sizes, and buffering table writes in bulk were built around disks where I/O was SLOW, and where sequential I/O was order(s)-of-magnitude faster than random. If we had to throw these databases away and begin from scratch in 2025, what would change and what would remain? How might we tackle this question quantitatively for the modern transaction-orientated database? Approach One: The Five Minute Rule Perhaps my single favorite systems paper, The 5 Minute Rule… by Jim Gray and Franco Putzolu gives us a very simple way to answer one of the most important questions in systems: how big should caches be? The five minute rule is that, back in 1986, if you expected to read a page again within five minutes you should keep in in RAM. If not, you should keep it on disk. The basic logic is that you look at the page that’s least likely to be re-used. If it’s cheaper to keep around until it’s next expected re-use, then you should keep more. If it’s cheaper to reload from storage than keep around, then you should keep less 1 . Let’s update the numbers for 2025, assuming that pages are around 32kB 2 (this becomes important later). The EC2 delivers about 1.8 million read iops of this size, at a price of around $0.004576 per second, or \(10^{-9}\) dollars per transfer (assuming we’re allocating about 40% of the instance price to storage). About one dollar per billion reads. It also has enough RAM for about 50 million pages of this size, costing around \(3 \times 10^{-11}\) dollars to storage a page for one second. So, on this instance type, we should size our RAM cache to store pages for about 30 seconds. Not too different from Gray and Putzolu’s result 40 years ago! That’s answer number one: the database should have a cache sized so that the hot set contains pages expected to be accessed in the next 30 seconds, for optimal cost. For optimal latency, however, the cache may want to be considerably bigger. Approach Two: The Throughput/IOPS Breakeven Point The next question is what size accesses we want to send to our storage devices to take best advantage of their performance. In the days of spinning media, the answer to this was surprisingly big: a 100MB/s disk could generally do around 100 seeks a second, so if your transfers were less than around 1MB you were walking away from throughput. Give or take a factor of 2. What does it look like for modern SSDs? SSDs are much faster on both throughput and iops. They’re less sensitive than spinning drives to workload patterns, but read/write ratios and the fullness of the drives still matter. Absent benchmarking on the actual hardware with the real workload, my rule of thumb is that SSDs are throughput limited for transfers bigger than 32kB, and iops limited for transfers smaller than 32kB. Making transfers bigger than 32kB doesn’t help throughput much, reduces IOPS, and probably makes the cache less effective because of false sharing and related effects. This is especially important for workloads with poor spatial locality . So that’s answer number two: we want our transfers to disk not to be much smaller than 32kB on average, or we’re walking away from throughput. Approach Three: Durability and Replication Building reads on local SSDs is great: tons of throughput, tons of iops. Writes on local SSDs, on the other hand, have the distinct problem of only being durable on the local box, which is unacceptable for most workloads. Modern hardware is very reliable, but thinking through the business risks of losing data on failover isn’t very fun at all, so let’s assume that our modern database is going to replicate off-box, making at least one more synchronous copy. Ideally in a different availability zone (AZ). That we were using for our comparison earlier has 100Gb/s (or around 12GB/s) of network bandwidth. That puts a cap on how much write throughput we can have for a single-leader database. Cross-AZ latency in EC2 varies from a couple hundred microseconds to a millisecond or two, which puts a minimum on our commit latency. That gives us answer number three: we want to incur cross-AZ latency only at commit time, and not during writes. Which is where we run into one of my favorite topics: isolation. The I in ACID . A modern database design will avoid read-time coordination using multiversioning, but to offer isolation stronger than will need to coordinate either on each write or at commit time. It can do that like, say, Aurora Postgres does, having a single leader at a time running in a single AZ. This means great latency for clients in that zone, and higher latency for clients in different AZs. Given that most applications are hosted in multiple AZs, this can add up for latency-sensitive applications which makes a lot of round trips to the database. The alternative approach is the one Aurora DSQL takes, doing the cross-AZ round trip only at time, saving round-trips. Here’s me talking about the shape of that trade-off at re:Invent this year: There’s no clear answer here, because there are real trade-offs between the two approaches. But do make sure to ask your database vendor whether those impressive latency benchmarks are running where you application actually runs. In the spirit of the original question, though, the incredible bandwidth and latency availability in modern datacenter networks is as transformative as SSDs in database designs. Or should be. While we’re incurring the latency cost of synchronous replication, we may as well get strongly consistent scale-out reads for free. In DSQL, we do this using high-quality hardware clocks that you can use too . Another nice win from modern hardware. There are other approaches too. That’s answer number four for me: The modern database uses high-quality clocks and knowledge of actual application architectures to optimize for real-world performance (like latency in multiple availability zones or regions) without compromising on strong consistency. Approach Four: What about that WAL? Design decisions like write-ahead logs, large page sizes, and buffering table writes in bulk were built around disks where I/O was SLOW, and where sequential I/O was order(s)-of-magnitude faster than random. WALs, and related low-level logging details, are critical for database systems that care deeply about durability on a single system. But the modern database isn’t like that: it doesn’t depend on commit-to-disk on a single system for its durability story. Commit-to-disk on a single system is both unnecessary (because we can replicate across storage on multiple systems) and inadequate (because we don’t want to lose writes even if a single system fails). That’s answer number five: the modern database commits transactions to a distributed log, which provides multi-machine multi-AZ durability, and might provide other services like atomicity. Recovery is a replay from the distributed log, on any one of a number of peer replicas. What About Data Structures? B-Trees versus LSM-trees vs B-Tree variants versus LSM variants versus other data structures are trade-offs that have a lot to do with access patterns and workload patterns. Picking a winner would be a whole series of blog posts, so I’m going to chicken out and say its complicated . If we had to throw these databases away and begin from scratch in 2025, what would change and what would remain? I’d keep the relational model, atomicity, isolation (but would probably pick as a default), strong consistency, SQL, interactive transactions, and the other core design decisions of relational databases. But I’d move durability, read and write scale, and high availability into being distributed rather than single system concerns. I think that helps with performance and cost, while making these properties easier to achieve. I’d mostly toss out local durability and recovery, and all the huge history of optimizations and data structures around that 3 , in favor of getting better properties in the distributed setting. I’d pay more attention to internal strong isolation (in the security sense) between clients and workloads. I’d size caches for a working set of between 30 seconds and 5 minutes of accesses. I’d optimize for read transfers around that 32kB sweet spot from local SSD, and the around 8kB sweet spot for networks. Probably more stuff too, but this is long enough as-is. Other topics worth covering include avoiding copies on IO, co-design with virtualization (e.g. see our Aurora Serverless paper ), trade-offs of batching, how the relative performance of different isolation levels changes, what promises to give clients, encryption and authorization of data at rest and in motion, dealing with very hot single items, new workloads like vector, verifiable replication journals, handing off changes to analytics systems, access control, multi-tenancy, forking and merging, and even locales. The reasoning is slightly smarter, thinking about the marginal page and marginal cost of memory, but this simplification works for our purposes here. The marginal cost of memory is particularly interesting in a provisioned system, because it varies between zero (you’ve paid for it already) and huge (you need a bigger instance size). One of the really nice things about serverless (like DSQL) and dynamic scaling (like Aurora Serverless) is that it makes the marginal cost constant, greatly simplifying the task of reasoning about cache size. Yes, I know that pages are typically 4kB or 2MB, but bear with me here. Sorry ARIES .

0 views
Blog System/5 1 months ago

From Azure Functions to FreeBSD

On Thanksgiving morning, I woke up to one of my web services being unavailable. All HTTP requests failed with a “503 Service unavailable” error. I logged into the console, saw a simplistic “Runtime version: Error” message, and was not able to diagnose the problem. I did not spend a lot of time trying to figure the issue out and I didn’t even want to contact the support black hole. Because… there was something else hidden behind an innocent little yellow warning at the top of the dashboard: Migrate your app to Flex Consumption as Linux Consumption will reach EOL on September 30 2028 and will no longer be supported. I had known for a few weeks now, while trying to set up a new app, that all of my Azure Functions apps were on death row. The free plan I was using was going to be decommissioned and the alternatives I tried didn’t seem to support custom handlers written in Rust. I still had three years to deal with this, but hitting a showstopper error pushed me to take action. All of my web services are now hosted by the FreeBSD server in my garage with just a few tweaks to their codebase. This is their migration story. Blog System/5 and the open source projects described below are all made in my limited free time. Subscribe now to show your support; it goes a long way! Back in 2021, I had been developing my EndBASIC language for over a year and I wanted to create a file sharing service for it. Part of this was to satisfy my users, but another part was to force myself into the web services world as I felt “behind”. At that time, I had also been at Microsoft for a few months already working on Azure Storage. One of the perks of the job was something like $300 of yearly credit to deploy stuff on Azure for learning purposes. It was only “natural” that I’d pick Azure for what I wanted to do with EndBASIC. Now… $300 can be plentiful for a simple app, but it can also be paltry. Running a dedicated VM would eat through this in a couple of months, but the serverless model offered by Azure Functions with its “infinite” free tier would go a long way. I looked at their online documentation, found a very good guide on how to deploy Rust-native functions onto a Linux runtime , and… I was sold. I quickly got a bare bones service up and running on Azure Functions and I built it up from there. Based on these foundations, I later developed a separate service for my own site analytics (poorly named EndTRACKER ), and I recently started working on a new service to provide secure auto-unlock of encrypted ZFS volumes (stay tuned!). And, for the most part, the experience with Azure has been neat. I learned a bunch and I got to a point where I had set up “push on green” via GitHub Actions and dual staging vs. prod deployments. The apps ran completely on their own for the last three years, a testament to the stability of the platform and to the value of designing for testability . Until now that is. Compute-wise, I was set: Azure Functions worked fine as the runtime for my apps’ logic and it cost pennies to run, so the $300 was almost untouched. But web services aren’t made of compute alone: they need to store data, which means they need a database. My initial research in 2021 concluded that the only option for a database instance with a free plan was to go with, no surprise, serverless Microsoft SQL Server (MSSQL). I had never used Microsoft’s offering but it couldn’t be that different from PostgreSQL or MySQL, could it? Maybe so, but I didn’t get very far in that line of research. The very first blocker I hit was that the MSSQL connection required TLS and this hadn’t been implemented in the connector I chose to use for my Rust-based functions. I wasted two weeks implementing TLS support in (see PR #1200 and PR #1203 ) and got it to work, but that code was not accepted upstream because it conflicted with their business strategy. Needless to say, this was disappointing because getting that to work was a frigging nightmare. In any case, once I passed that point, I started discovering more missing features and bugs in the MSSQL connector, and then I also found some really weird surprises in MSSQL’s dialect of SQL. TL;DR, this turned into a dead end. On the left, the default instance and cost selected by Azure when choosing to create a managed PostgreSQL server today. On the right, minimum possible cost after dialing down CPU, RAM, disk, and availability requirements. I had no choice other than to provision a full PostgreSQL server on Azure. Their onboarding wizard tried to push me towards a pretty beefy and redundant instance that would cost over $600 per month when all I needed was the lowest machine you could get for the amount of traffic I expected. Those options were hidden under a “for development only” panel and riddled with warnings about no redundancy, but after I dialed all the settings down and accepted the “serious risks”, I was left with an instance that’d cost $15 per month or so. This fit well well within the free yearly credit I had access to, so that was it. About two months ago, I started working on a new service to securely auto-unlock ZFS encrypted volumes (more details coming). For this, I had to create a new Azure Functions deployment… and I started seeing the writing on the wall. I don’t remember the exact details, but it was really difficult to get the creation wizard to provision me the same flex plan I had used for my other services, and it was warning me that the selected plan was going to be axed in 2028. At the time of this writing, 2028 is still three years out and this warning was for a new service I was creating. I didn’t want to consider migrating neither EndBASIC nor EndTRACKER to something else just yet. Until Thanksgiving, that was. On Thanksgiving morning, I noticed that my web analytics had stopped working. All HTTP API requests failed with a “503 Service unavailable.” error but, interestingly, the cron-triggered APIs were still running in the background just fine and the staging deployment slot of the same app worked fine end-to-end as well. I tried redeploying the app with a fresh binary, thinking that a refresh would fix the problem, but that was of no use. I also poked through the dashboard trying to figure out what “Runtime version: Error” would be about, making sure the version spec in was up-to-date, and couldn’t figure it out either. Summary state of my problematic Azure Functions deployment. Note the cryptic runtime error along with the subtle warning at the top about upcoming deprecations. So… I had to get out of Azure Functions, quick. Not accidentally, I had bought a second-hand, over-provisioned ThinkStation (2x36-core Xeon E5-2697, 64 GB of RAM, a 2 TB NVMe drive, and a 4x4 TB HDD array) just two years back. The justification I gave myself was to use it as my development server, but I had this idea in the back of my mind to use it to host my own services at some point. The time to put it to serving real-world traffic with FreeBSD 14.x had come. The way you run a serverless Rust (or Go) service on Azure Functions is by creating a binary that exposes an HTTP server on the port provided to it by the environment variable. Then, you package the binary along a set of metadata JSON files that tell the runtime what HTTP routes the binary serves and push the packaged ZIP file to Azure. From there on, the Azure Functions runtime handles TLS termination for those routes, spawns your binary server on a micro VM on demand, and redirects the requests to it. By removing the Azure Functions runtime from the picture, I had to make my server binary stand alone. This was actually pretty simple because the binary was already an HTTP server: it just had to be coerced into playing nicely with FreeBSD’s approach to running services. In particular, I had to: Inject configuration variables into the server process at startup time. These used to come from the Azure Functions configuration page, and are necessary to tell the server where the database lives and what credentials to use. Make the service run as an unprivileged user, easily. Create a PID file to track the execution of the process so that the framework could handle restarts and stop requests. Store the logs that the service emits via stderr to a log file, and rotate the log to prevent local disk overruns. Most daemons implement all of the above as features in their own code, but I did not want to have to retrofit all of these into my existing HTTP service in a rush. Fortunately, FreeBSD provides this little tool, daemon(8) , which wraps an existing binary and offers all of the above. This incantation was enough to get me going: I won’t dive into the details of each flag, but to note: specifies which PID file to create; specifies where to store the stdout and stderr of the process; is required for log rotation (much more below); drops privileges to the given user; and specifies the “title” of the process to display in output. The trick was sufficient to inject configuration variables upon process startup, simulating the same environment that my server used to see when spawned by the Azure Functions runtime. Hooking that up into an service script was then trivial: And with that: Ta-da! I had the service running locally and listening to a local port determined in the configuration file. As part of the migration out of Azure Functions, I switched to self-hosting PostgreSQL as well. This was straightforward but required a couple of extra improvements to my web framework: one to stop using a remote PostgreSQL instance for tests (something I should have done eons ago), and another to support local peer authentication to avoid unnecessary passwords. In the call to above, I briefly mentioned the need for the flag to support log rotation. What’s that about? You see, in Unix-like systems, when a process opens a file, the process holds a handle to the open file. If you delete or rename the file, the handle continues to exist exactly as it was . This has two consequences: If you rename the file, all subsequent reads and writes go to the new file location, not the old one. If you delete the file, all subsequent reads and writes continue to go to disk but to a file you cannot reference anymore. You can run out of disk space and, while will confirm the fact, will not let you find what file is actually consuming it! For a long-running daemon that spits out verbose logs, writing them to a file can become problematic because you can end up running out of disk space. To solve this problem, daemons typically implement log rotation : a mechanism to keep log sizes in check by moving them aside when a certain period of time passes or when they cross a size threshold, and then only keeping the last N files around. Peeking into one of the many examples in my server, note how is the “live” log where writes go to but there is a daily archive for up to a week: Having all daemons implement log rotation logic on their own would be suboptimal because you’d have duplicate logic throughout the system and you would not be able to configure policy easily for them all. This is where newsyslog(8) on FreeBSD (or on Linux) comes into play. is a tool that rotates log files based on criteria such as size or time and optionally compresses them. But remember: the semantics of open file handles mean that simply renaming log files is insufficient! Once takes action and moves a log file aside, it must ensure that the process that was writing to that file closes the file handle and reopens it so that writes start going to the new place. This is typically done via sending a to the daemon, and is why we need to pass to the call. To illustrate the sequence: The system starts a service via and redirects logs to . runs and determines that needs to be rotated because a day has passed. renames to and creates a new and empty . At this point is still writing to ! sends a to the process. The process closes its file handle for the log, reopens (which is the fresh new log file), and resumes writing. compresses the file for archival now that it’s quiesced. Configuring is easy, but cryptic. We can create a service-specific configuration file under that provides entries for our service, such as: I’ll leave you to the manpage to figure out what the magic is (but in short, it controls retention count, rotation schedule, and compression). As I briefly mentioned earlier, the Azure Functions runtime was responsible for TLS termination in my previous setup. Without such a runtime in place, I had to configure TLS on my own in my HTTP server… or did I? I had been meaning to play with Cloudflare Tunnels for a while given that I already use Cloudflare for DNS. Zero Trust Tunnels allow you to expose a service without opening inbound ports in your firewall. The way this works is by installing the tunnel daemon on your machine and configuring the tunnel to redirect certain URL routes to an internal address (typically ). Cloudflare then acts as the frontend for the requests, handles TLS termination and DDOS protection, and then redirects the request to your local service. Interactions between client machines, Cloudflare servers, the cloudflared tunnel agent, and the actual HTTP servers I wrote. The obvious downside of relying on someone else to do TLS termination instead of doing it yourself on your own server is that they can intercept and modify your traffic. For the kinds of services I run this isn’t a big deal for me, and the simplicity of others dealing with certificates is well welcome. Note that I was already offloading TLS termination to Azure Functions anyway, so this isn’t a downgrade in security or privacy. But using Cloudflare as the frontend came with a little annoyance: CORS handling. You see: the services I run require configuring extra allowed origins, and as soon as I tried to connect to them via the Cloudflare tunnel, I’d get the dreaded “405 Method not allowed” error in the requests. Before, I used to configure CORS orgins from the Azure Functions console, but no amount of peeking through the Cloudflare console showed me how to do this for my tunneled routes. At some point during the investigation, I assumed that I had to configure CORS on my own server. I’m not sure how I reached that bogus conclusion, but I ended up wasting a few hours implementing a configuration system for CORS in my web framework . Nice addition… but ultimately useless. I had not accounted for the fact that because Cloudflare acts as the frontend for the services, it is the one responsible for handling the pre-flight HTTP requests necessary for CORS. In turn, this means that Cloudflare is where CORS needs to be configured but there is nothing “obvious” about configuring CORS in the Cloudflare portal. AI to the rescue! As skeptical as I am of these tools, it’s true that they work well to get answers to common problems—and figuring out how to deal with CORS in Cloudflare was no exception. They told me to configure a transformation rule that explicitly sets CORS response headers for specific subdomains, and that did the trick: Sample rule configuration on the Cloudflare portal to rewrite CORS response headers. Even though AI was correct in this case, the whole thing looked fishy to me, so I did spend time reading about the inner workings of CORS to make sure I understood what this proposed solution was about and to gain my own confidence that it was correct. By now, my web services are now fully running on my FreeBSD machine. The above may have seemed complicated, but in reality it was all just a few hours of work on Thanksgiving morning. Let’s conclude by analyzing the results of the transition. On the plus side, here is what I’ve gained: Predictability: Running in the cloud puts you at the mercy of the upgrade and product discontinuation treadmill of big cloud providers. It’s no fun to have to be paying attention to deprecation messages and adjust to changes no matter how long the deadlines are. FreeBSD also evolves, of course, but it has remained pretty much the same over the last 30 years and I have no reason to believe it’ll significantly change in the years to come. Performance: My apps are so much faster now it’s ridiculous. The serverless runtime of Azure Functions starts quickly for sure, but it just can’t beat a server that’s continuously running and that has hot caches at all layers. That said, I bet the real difference in performance for my use case comes from collocating the app servers with the database, duh. Ease of management: In the past, having automated deployments via GitHub Actions to Azure Functions was pretty cool, not gonna lie. But… being now able to deploy with a trivial , perform administration PostgreSQL tasks with just a , and inspecting logs trivially and quickly by looking at beats any sort of online UI and distributed system. “Doesn’t scale” you say, but it scales up my time . Cost: My Azure bill has gone from $20/month, the majority of which was going into the managed PostgreSQL instance, to almost zero. Yes, the server I’m running in the garage is probably costing me the same or more in electricity, but I was running it anyway already for other reasons. And here is what I’ve lost (for now): Availability (and redundancy): The cloud gives you the chance of very high availability by providing access to multiple regions. Leveraging these extra availability features is not cheap and often requires extra work, and I wasn’t taking advantage of them in my previous setup. So, I haven’t really decreased redundancy, but it’s funny that the day right after I finished the migration, I lost power for about 2 hours. Hah, I think I hadn’t suffered any outages with Azure other than the one described in this article. A staging deployment: In my previous setup, I had dual prod and staging deployments (via Azure Functions slots and separate PostgreSQL databases—not servers) and it was cool to deploy first to staging, perform some manual validations, and then promote the deployment to prod. In practice, this was rather annoying because the deployment flow was very slow and not fully automated (see “manual testing”), but it indeed saved me from breaking prod a few times. Auto-deployments: Lastly and also in my previous setup, I had automated the push to staging and prod by simply updating tags in the GitHub repository. Once again, this was convenient, but the biggest benefit of it all was that the prod build process was “containerized” and not subject to environmental interference. I’d very well set up a cron job or webhook-triggered local service that rebuilt and deployed my services on push… but it’s now hard to beat the simplicity of . None of the above losses are inherent to self-hosting, of course. I could provide alternatives for them all and at some point I will; consider them to-dos! On Thanksgiving morning, I woke up to one of my web services being unavailable. All HTTP requests failed with a “503 Service unavailable” error. I logged into the console, saw a simplistic “Runtime version: Error” message, and was not able to diagnose the problem. I did not spend a lot of time trying to figure the issue out and I didn’t even want to contact the support black hole. Because… there was something else hidden behind an innocent little yellow warning at the top of the dashboard: Migrate your app to Flex Consumption as Linux Consumption will reach EOL on September 30 2028 and will no longer be supported. I had known for a few weeks now, while trying to set up a new app, that all of my Azure Functions apps were on death row. The free plan I was using was going to be decommissioned and the alternatives I tried didn’t seem to support custom handlers written in Rust. I still had three years to deal with this, but hitting a showstopper error pushed me to take action. All of my web services are now hosted by the FreeBSD server in my garage with just a few tweaks to their codebase. This is their migration story. Blog System/5 and the open source projects described below are all made in my limited free time. Subscribe now to show your support; it goes a long way! How did I get here? Back in 2021, I had been developing my EndBASIC language for over a year and I wanted to create a file sharing service for it. Part of this was to satisfy my users, but another part was to force myself into the web services world as I felt “behind”. At that time, I had also been at Microsoft for a few months already working on Azure Storage. One of the perks of the job was something like $300 of yearly credit to deploy stuff on Azure for learning purposes. It was only “natural” that I’d pick Azure for what I wanted to do with EndBASIC. Now… $300 can be plentiful for a simple app, but it can also be paltry. Running a dedicated VM would eat through this in a couple of months, but the serverless model offered by Azure Functions with its “infinite” free tier would go a long way. I looked at their online documentation, found a very good guide on how to deploy Rust-native functions onto a Linux runtime , and… I was sold. I quickly got a bare bones service up and running on Azure Functions and I built it up from there. Based on these foundations, I later developed a separate service for my own site analytics (poorly named EndTRACKER ), and I recently started working on a new service to provide secure auto-unlock of encrypted ZFS volumes (stay tuned!). And, for the most part, the experience with Azure has been neat. I learned a bunch and I got to a point where I had set up “push on green” via GitHub Actions and dual staging vs. prod deployments. The apps ran completely on their own for the last three years, a testament to the stability of the platform and to the value of designing for testability . Until now that is. The cloud database Compute-wise, I was set: Azure Functions worked fine as the runtime for my apps’ logic and it cost pennies to run, so the $300 was almost untouched. But web services aren’t made of compute alone: they need to store data, which means they need a database. My initial research in 2021 concluded that the only option for a database instance with a free plan was to go with, no surprise, serverless Microsoft SQL Server (MSSQL). I had never used Microsoft’s offering but it couldn’t be that different from PostgreSQL or MySQL, could it? Maybe so, but I didn’t get very far in that line of research. The very first blocker I hit was that the MSSQL connection required TLS and this hadn’t been implemented in the connector I chose to use for my Rust-based functions. I wasted two weeks implementing TLS support in (see PR #1200 and PR #1203 ) and got it to work, but that code was not accepted upstream because it conflicted with their business strategy. Needless to say, this was disappointing because getting that to work was a frigging nightmare. In any case, once I passed that point, I started discovering more missing features and bugs in the MSSQL connector, and then I also found some really weird surprises in MSSQL’s dialect of SQL. TL;DR, this turned into a dead end. On the left, the default instance and cost selected by Azure when choosing to create a managed PostgreSQL server today. On the right, minimum possible cost after dialing down CPU, RAM, disk, and availability requirements. I had no choice other than to provision a full PostgreSQL server on Azure. Their onboarding wizard tried to push me towards a pretty beefy and redundant instance that would cost over $600 per month when all I needed was the lowest machine you could get for the amount of traffic I expected. Those options were hidden under a “for development only” panel and riddled with warnings about no redundancy, but after I dialed all the settings down and accepted the “serious risks”, I was left with an instance that’d cost $15 per month or so. This fit well well within the free yearly credit I had access to, so that was it. The outage and trigger About two months ago, I started working on a new service to securely auto-unlock ZFS encrypted volumes (more details coming). For this, I had to create a new Azure Functions deployment… and I started seeing the writing on the wall. I don’t remember the exact details, but it was really difficult to get the creation wizard to provision me the same flex plan I had used for my other services, and it was warning me that the selected plan was going to be axed in 2028. At the time of this writing, 2028 is still three years out and this warning was for a new service I was creating. I didn’t want to consider migrating neither EndBASIC nor EndTRACKER to something else just yet. Until Thanksgiving, that was. On Thanksgiving morning, I noticed that my web analytics had stopped working. All HTTP API requests failed with a “503 Service unavailable.” error but, interestingly, the cron-triggered APIs were still running in the background just fine and the staging deployment slot of the same app worked fine end-to-end as well. I tried redeploying the app with a fresh binary, thinking that a refresh would fix the problem, but that was of no use. I also poked through the dashboard trying to figure out what “Runtime version: Error” would be about, making sure the version spec in was up-to-date, and couldn’t figure it out either. Summary state of my problematic Azure Functions deployment. Note the cryptic runtime error along with the subtle warning at the top about upcoming deprecations. So… I had to get out of Azure Functions, quick. Not accidentally, I had bought a second-hand, over-provisioned ThinkStation (2x36-core Xeon E5-2697, 64 GB of RAM, a 2 TB NVMe drive, and a 4x4 TB HDD array) just two years back. The justification I gave myself was to use it as my development server, but I had this idea in the back of my mind to use it to host my own services at some point. The time to put it to serving real-world traffic with FreeBSD 14.x had come. From serverless to serverful The way you run a serverless Rust (or Go) service on Azure Functions is by creating a binary that exposes an HTTP server on the port provided to it by the environment variable. Then, you package the binary along a set of metadata JSON files that tell the runtime what HTTP routes the binary serves and push the packaged ZIP file to Azure. From there on, the Azure Functions runtime handles TLS termination for those routes, spawns your binary server on a micro VM on demand, and redirects the requests to it. By removing the Azure Functions runtime from the picture, I had to make my server binary stand alone. This was actually pretty simple because the binary was already an HTTP server: it just had to be coerced into playing nicely with FreeBSD’s approach to running services. In particular, I had to: Inject configuration variables into the server process at startup time. These used to come from the Azure Functions configuration page, and are necessary to tell the server where the database lives and what credentials to use. Make the service run as an unprivileged user, easily. Create a PID file to track the execution of the process so that the framework could handle restarts and stop requests. Store the logs that the service emits via stderr to a log file, and rotate the log to prevent local disk overruns. If you rename the file, all subsequent reads and writes go to the new file location, not the old one. If you delete the file, all subsequent reads and writes continue to go to disk but to a file you cannot reference anymore. You can run out of disk space and, while will confirm the fact, will not let you find what file is actually consuming it! The system starts a service via and redirects logs to . runs and determines that needs to be rotated because a day has passed. renames to and creates a new and empty . At this point is still writing to ! sends a to the process. The process closes its file handle for the log, reopens (which is the fresh new log file), and resumes writing. compresses the file for archival now that it’s quiesced. Interactions between client machines, Cloudflare servers, the cloudflared tunnel agent, and the actual HTTP servers I wrote. The obvious downside of relying on someone else to do TLS termination instead of doing it yourself on your own server is that they can intercept and modify your traffic. For the kinds of services I run this isn’t a big deal for me, and the simplicity of others dealing with certificates is well welcome. Note that I was already offloading TLS termination to Azure Functions anyway, so this isn’t a downgrade in security or privacy. CORS But using Cloudflare as the frontend came with a little annoyance: CORS handling. You see: the services I run require configuring extra allowed origins, and as soon as I tried to connect to them via the Cloudflare tunnel, I’d get the dreaded “405 Method not allowed” error in the requests. Before, I used to configure CORS orgins from the Azure Functions console, but no amount of peeking through the Cloudflare console showed me how to do this for my tunneled routes. At some point during the investigation, I assumed that I had to configure CORS on my own server. I’m not sure how I reached that bogus conclusion, but I ended up wasting a few hours implementing a configuration system for CORS in my web framework . Nice addition… but ultimately useless. I had not accounted for the fact that because Cloudflare acts as the frontend for the services, it is the one responsible for handling the pre-flight HTTP requests necessary for CORS. In turn, this means that Cloudflare is where CORS needs to be configured but there is nothing “obvious” about configuring CORS in the Cloudflare portal. AI to the rescue! As skeptical as I am of these tools, it’s true that they work well to get answers to common problems—and figuring out how to deal with CORS in Cloudflare was no exception. They told me to configure a transformation rule that explicitly sets CORS response headers for specific subdomains, and that did the trick: Sample rule configuration on the Cloudflare portal to rewrite CORS response headers. Even though AI was correct in this case, the whole thing looked fishy to me, so I did spend time reading about the inner workings of CORS to make sure I understood what this proposed solution was about and to gain my own confidence that it was correct. Results of the transition By now, my web services are now fully running on my FreeBSD machine. The above may have seemed complicated, but in reality it was all just a few hours of work on Thanksgiving morning. Let’s conclude by analyzing the results of the transition. On the plus side, here is what I’ve gained: Predictability: Running in the cloud puts you at the mercy of the upgrade and product discontinuation treadmill of big cloud providers. It’s no fun to have to be paying attention to deprecation messages and adjust to changes no matter how long the deadlines are. FreeBSD also evolves, of course, but it has remained pretty much the same over the last 30 years and I have no reason to believe it’ll significantly change in the years to come. Performance: My apps are so much faster now it’s ridiculous. The serverless runtime of Azure Functions starts quickly for sure, but it just can’t beat a server that’s continuously running and that has hot caches at all layers. That said, I bet the real difference in performance for my use case comes from collocating the app servers with the database, duh. Ease of management: In the past, having automated deployments via GitHub Actions to Azure Functions was pretty cool, not gonna lie. But… being now able to deploy with a trivial , perform administration PostgreSQL tasks with just a , and inspecting logs trivially and quickly by looking at beats any sort of online UI and distributed system. “Doesn’t scale” you say, but it scales up my time . Cost: My Azure bill has gone from $20/month, the majority of which was going into the managed PostgreSQL instance, to almost zero. Yes, the server I’m running in the garage is probably costing me the same or more in electricity, but I was running it anyway already for other reasons. Availability (and redundancy): The cloud gives you the chance of very high availability by providing access to multiple regions. Leveraging these extra availability features is not cheap and often requires extra work, and I wasn’t taking advantage of them in my previous setup. So, I haven’t really decreased redundancy, but it’s funny that the day right after I finished the migration, I lost power for about 2 hours. Hah, I think I hadn’t suffered any outages with Azure other than the one described in this article. A staging deployment: In my previous setup, I had dual prod and staging deployments (via Azure Functions slots and separate PostgreSQL databases—not servers) and it was cool to deploy first to staging, perform some manual validations, and then promote the deployment to prod. In practice, this was rather annoying because the deployment flow was very slow and not fully automated (see “manual testing”), but it indeed saved me from breaking prod a few times. Auto-deployments: Lastly and also in my previous setup, I had automated the push to staging and prod by simply updating tags in the GitHub repository. Once again, this was convenient, but the biggest benefit of it all was that the prod build process was “containerized” and not subject to environmental interference. I’d very well set up a cron job or webhook-triggered local service that rebuilt and deployed my services on push… but it’s now hard to beat the simplicity of .

0 views
Stratechery 1 months ago

An Interview with Atlassian CEO Mike Cannon-Brookes About Atlassian and AI

Good morning, This week’s Stratechery Interview is with Atlassian founder and CEO Mike Cannon-Brookes . Cannon-Brookes and Scott Farquhar — whom I interviewed in 2017 — founded Atlassian in 2002; their first product was Jira, a project and issue-tracking tool, followed by Confluence, a team collaboration platform. Atlassian, thanks in part to their location in Australia, pioneered several critical innovations, including downloadable software and a self-serve business model; over the ensuing two decades Atlassian has moved to the cloud and greatly expanded their offering, and is now leaning into AI. In this interview we discuss that entire journey, including Cannon-Brookes’ desire to not have a job, how the absence of venture capital shaped the company, and how the company’s go-to-market approach has evolved. We then dive into AI, including why Cannon-Brookes believes that there will be more developers doing more, and why Atlassian’s position in the enterprise lets them create compelling offerings. Finally we discuss Atlassian’s sponsorship of Williams, the F1 race team, and why Cannon-Brookes thinks they can both help Williams win and also accrue big benefits for Atlassian. To repeat a disclosure I have long made in my Ethics Statement , I did, in the earliest years of Stratechery, take on consulting work for a limited number of companies, including Atlassian. And, for what it’s worth, I’m also a huge F1 fan! Go Max. As a reminder, all Stratechery content, including interviews, is available as a podcast; click the link at the top of this email to add Stratechery to your podcast player. On to the Interview: This interview is lightly edited for content and clarity. Mike Cannon-Brooks, welcome to Stratechery. MCB: Thank you for having me, Ben. So this is admittedly a new experience for me, I’ve already interviewed the founder of Atlassian , but it wasn’t you. I’m of course referring to Scott [Farquhar] . That was eight years ago, actually, before I even had podcasts. It was very brief, but hey, like I said, new experiences. MCB: That’s true. That’s true. And you wrote a consulting paper for us in 2014! I was going to disclose, yes, in the very brief period where I did consulting work, you flew me down to Sydney for a week, I had a chance to learn a lot about Atlassian. And on a personal note, that consulting contract helped me a lot, that was when I was just starting. It’s funny how small the numbers seem in retrospect, but maybe that’s why I’ve shied away from writing about you too much over the years, because it meant a lot to me. So I appreciate it and there’s my disclosure for the interview. MCB: Thank you. It’s a good piece of work. Don’t forget, ironically, we started as a consulting and services business and then decided that software was a better business model, so I think you did the same thing. You went the scalability route instead of the consulting work via Sydney. Absolutely. I’m not doing anything that doesn’t scale anymore, but I did love visiting Sydney, so it was great. MCB: Still, we pulled out the old consulting paper you wrote for us in 2014. Why are we going to win, why are we going to lose, everything else, it was classic Ben work. Was it good? MCB: It’s pretty good! It’s interesting, I’d probably be embarrassed if I read it today. Anyhow, the good news is that since it’s the first time I’m interviewing you, I do get to do my favorite segment, which is learning more about you. Where did you grow up, but also, where were you born? I know they were different places. Then, how’d you get interested in technology and what’s your version of the Atlassian origin story? MCB: Sure, I feel like I’ve heard this question 1,000 times! Where to start? My dad was in banking, he joined the glorious institution that is Citibank today, from England. Parents are both from Cambridge and bounced around the world a lot as part of that job. Took the, “Hey, we need someone to go to this country”, and he was like, “I’ll take that”. So I was born in America, in a period I lived in New York. To be honest, lived there for three months before I moved to Taiwan. Really? Whoa. I didn’t know that. MCB: Yeah, in 1980 when it was very different than what it is today. Yeah. Were you saving that to drop that off me? I had no idea. I thought you went straight from America to Australia. MCB: I only just thought about it about 30 seconds ago, actually. No, I went to Taiwan for a few years, lived in Hong Kong for a few years, went to Australia for a few years. So how I got into technology is actually related because my parents were moving around so much, the logic was being English, that they would send us to English boarding schools and that would be a stable thing while they were moving once we got old enough. So at the mighty age of seven, I was put on Qantas and sent to England and back four times a year to go to boarding school in England for about five, six years. Because of that boarding school, I have one of the lowest frequent flyer numbers in Australia, they introduced the frequent flyer program and that was at the end of year one or end of year two. I get given this catalog by my parents and how you’ve earned all these points, “What do you want to buy?”, and it’s like, “I don’t know, trips, winery things, booze”, I’m flicking through this catalog and I’m like, “There’s literally nothing in this catalog”, of gear that you used to be able to get that I wanted and at the back is this computer, so I was like, “I guess I’ll get that”. The only thing that was potentially age appropriate. MCB: That was the only thing in the catalog, I didn’t want a toaster, I didn’t want wine, so that became my first computer, the mighty Amstrad PC20 . Four colors, no hard drive. Eventually, I bought an external floppy drive, so you could put in two and did buy magazines and type in programs and write games and stuff from magazines and play with it, played a lot of video games basically back in that era. I was into computers peripherally all through high school, came back to Australia at 12, my parents had settled here by then and weren’t moving, and so I came back here, did all high school and university here. In high school, I was always going to be an architect, that was my dream the entire way through, but come to the end of grade 12, applied for a bunch of scholarships, because university, applied for the scholarships, ended up getting one and so I thought, “Oh, well, maybe I’ll take that”, and it was in a course called BIT. Basically, half computer science, half finance and economics, but it was 15 grand a year, tax-free, so I was like, “Well, I’ll do that for a while and go back to the architecture thing”. Of course, famously in that scholarship, I met my first business partner of my first startup, met my second business partner of the second startup, they went in radically different directions in terms of outcome, but it was just 30 kids right at the right time, did the dot-com era thing. Now, ironically, as a part of that scholarship, you had to spend six months in three industrial placements, so the origin story of Atlassian comes from then a little bit, because those industrial placements were so boring. Scott spent six months installing Windows at a large corporate and he was crazy freaking smart and it was like, “Hey, go from computer to computer and upgrade to Windows 98”, or whatever it was. It was like, “Guys, this is our life, this is going to be horrible”. I worked for Nortel Bay Networks, which was a good, at the time, massive competitor, Cisco then completely disappeared and so a good tech lesson in and of itself, I basically cataloged the room full of networking gear and routers, it was mind-numbingly boring. So towards the end of the university course, I famously sent an email to a few people saying, “Look, I don’t really want to get a real job, why don’t we start a company and we’ll try some stuff?”. And this was after the dot-com era? This was the early 2000s? MCB: This was after the dot-com era, yeah. So I lived through the dot-com era actually as a journalist and writer, analyst and technology. I worked for a company called Internet.com, which became Jupiter Media and Jupiter Research and that was great, that was an amazing era for me. We ran events, newsletters, what would’ve been podcasts, didn’t have them back then. And we ran events on Mobile Monday, I think one of them was called and it was all about WAP and— Well, the real secret is you’re not the only one. There are some founders that are very successful, that they’re like, “Look, I just want to pontificate about technology”. MCB: A little bit like you, I remember getting in a lot of trouble from some of the startups, because some company would launch and I wrote basically 500 words on, “This thing’s never going to work, this is a disaster of an idea”, and they would ring up and yell at my boss and he was awesome, he’d be like, “Dude, just keep writing what you think”, and it didn’t make you very popular as a journalist type. Anyway, emailed some people, tried to start a business, we didn’t actually know what we were going to do. Atlassian has, I always tell people, a terrible origin story. You should not copy us. You just didn’t want to be installing Windows or upgrading software. MCB: We literally did not want to get a real job. And Scott replied and said, “Yeah, sure, I’m in for trying that”. He was one of the smartest kids in our class and his nickname is Skip, because he was the president of our student association and always a leader type and Eagle Scout and everything else, so we’re like, “Yeah, okay, let’s do that, we’re good mates” — and that started Atlassian. We picked the name in about five minutes, which if you consulted any branding company, would not have been chosen. Ironically, originally, we were going to do customer service and consulting, that was what the gig was. Hence the name, because Atlas was a Greek titan whose job was to stand on top of the Atlas Mountains and hold up the sky, that’s what he was supposed to be doing. He was a bad guy, so his punishment was to hold the sky up and we thought that was an act of legendary service, and so we were going to provide legendary service by holding up the sky for customers and as I said, did the service thing for about six months, decided that this is a terrible business. People paying us $350 US to answer their questions and didn’t scale and was at crazy hours of the morning and night and everything else. So in the meantime, we wrote the first version of what became Jira . We actually wrote three pieces of software, one was a knowledge basey type tool, one was a mail archiving tool for groups, so you could see each other’s email as a shared archiving. And were you seeing this and you were building tools for yourself, for your consulting business? MCB: Literally, yes, exactly. So all three were tools that we needed for ourselves. People would email us and I couldn’t see Scott’s email and he couldn’t see mine at the time and it was like this is silly, and we built Jira to handle questions and issues and problems that we were having ourselves that became a teeny bit popular. There was this glimmer that someone else cared, so we poured all the effort into that. What was that? What was the glimmer? Because this is when Agile is taking over software development and at least the legend is Jira and Agile go hand in hand, is that a correct characterization? MCB: A little bit, but this is actually pre-Agile. So Jira comes out before Agile is even a thing. I think it was about two or three years before we had any version of marketing or feature sets that involved Agile. This was just a web-based, at the time, a bug tracker. So the interesting evolution part of the company obviously is it started as a bug tracker for software developers, it became an issue tracker for technology teams and now it’s like a business workflow for tens of millions of people every day across the world, most of whom have nothing to do with technology, so it’s gone on its own evolution. Would anything have been different if this was the plan from the beginning, or did it have to be this organic, “We’re figuring it out as we go along as we’re running away from Windows installations”, sort of story? MCB: I think, look, obviously, if we could choose to follow in our own footsteps, the Back to the Future skeptic in me would say it’s gone pretty well, so I’d follow every single footstep I took. (laughing) Yep, totally. MCB: And that would’ve become the plan. But look, we had two hunches really, which both turned out to be radically correct. Now, I would say we were following waves or whatever else, but one was that the Internet would change software distribution, which sounds ridiculous now and when I talk to graduates nowadays, I have to put them in the right time and place and say, “Look, when we started, software was distributed on a CD”, BEA WebLogic was the bee’s knees and you used to have to get it on a CD if you were lucky. If not, someone would come and install it for you and that’s how software was distributed. We made that CD into a ZIP file and put it on the Internet for people to download. You didn’t access it like a SaaS application, you literally download it from our website. Right. It’s funny that when you first say that, it’s like, “Oh, it’s completely transformative”, well, but you were an on-premises software story. But actually, no, there’s several steps to getting to SaaS, one of which is just downloading software. MCB: And we had people call us before they would download to check that we were real and stuff and I’m like, “Why don’t you just download the damn ZIP file?”, and I also date them, because, well, maybe I’ll get to the business model part, but the second innovation was that we thought open source would change software costs. So we had this big hunch, we were both writing a bunch of open source code at the time. Open source was a massive movement, especially in the Java space. Embarrassingly, I actually wrote a book called Open Source Java Programming that you can find with some mates. It’s still on Amazon and we sold a few thousand copies, I think, but I swore I’d never write a book again, it was a very painful experience. Thank you, you’re validating my life decisions . MCB: Yeah. Open source did bring the cost of building software down radically. We were writing a very small layer, 5% of the code at best on top of masses of amazing open source libraries and we contributed to those libraries, but we could deliver an amazing experience for a very low cost. We learned a lot, pricing and packaging. So what was the implication of that hunch though? Just that the market for developers, that would subsequently mean there was more software? MCB: A little bit that was the implication of the hunch. Largely for us, it was that the cost was going down. Pre-open source, you had to write everything so if Jira was back then, I don’t know, a million lines of code, if you added all the open source libraries together, it was 25, 30, 40 million lines of code. It was so big that it was so expensive, because you had to write all of that. To think of Windows, they wrote everything, the networking stack, there were no libraries, there was no open source involved in the original versions, it was all written by Microsoft. So the cost of that was very high, then you had to charge a lot of money. So we thought, look, if we could take all these amazing open source libraries, contribute back to them — we were a great open source citizen — and build a piece of proprietary software on top of them that solved customer’s problems, we could deliver that really cheaply. In fact, we sold the original versions of Jira, they were $800, unlimited users, unlimited use with no lifespan. So it was just 800 bucks, one-time fee forever and we learned a lot about pricing and packaging firstly, but secondly, it was very simple. Our goal in the early days, we had to sell one copy a week to stay alive, that was it. Some weeks, we’d sell two copies. $1,600 US would roll in and we’d be like, “Cool, we got a week off to survive”, and then one copy a week became two and two became five and five became ten, and now it’s hundreds of thousands. Well, isn’t the thing you just didn’t want to have a job? So I love this part of the story, because when I started Stratechery, I had a job from Microsoft that made, I think, $104,000 or something like that. I’m like, “I just want to make that, because I don’t want to work for a corporation, so if I could just get to there, it’ll be great”. MCB: We had exactly the same sets of goals. We had a few things we wanted to make somewhere that we wanted to go to work. I wanted to get up every day and think, “I want to go to work”, and weirdly, almost 24 years later, I love coming to work, so a tick achieved. We wanted to make it so we didn’t have to wear a suit, neither of us really like wearing suits at all — in fact, it’s a bit of an allergic reaction often and so tick, don’t turn up to work in a suit every day. And thirdly, most of our friends, so this is right where IBM bought PwC ironically, so out of the 30-odd kids in our class, maybe 10 went to IBM as consultants and 10 went to PwC and then they all end up going to the same shop and their grad salary there was $47,600. So our goal for year one was to end the year making at least a grad salary and convince ourselves we’re not crazy kind of thing and we smashed that goal, so that was good, but that was there. The Internet, the distribution part is important, knowing your favorite topics. Tell me about that and along with the business model, because again, this goes back so far, I don’t think people appreciate the extent to this entire idea of self-serve or bottoms up selling. This is really where it all started. MCB: Yes. And look, a few things. Firstly, if you come from Australia, we’re an exporting nation. “We’re built on the sheep’s back”, is a phrase, Australia’s built on the sheep’s back. What that really means is because we were this colony originally, then country on the far side of the world, anything we did to make money largely had to leave the country and go somewhere else. Originally, that was a struggle to find a product that could do that. “Built on a sheep’s back” is because wool was the first product that could do that, you could put it on a wooden boat, because it wasn’t very heavy and you could ship it a long distance, because it kept really well, so we could make sheep’s wool and make money as a country by shipping it back to Europe and it could survive the journey and so the country was built on the sheep’s back. We are a massive exporting nation. Trump brings in his tariffs, we’re the only country with a negative rate of return, we have a positive trade relationship with America and we’re like, “Wait a second, why did we get taxed?”, so obviously, it’s rocks, technology, we build and export everything as a country that we do. So our mentality was like, “Well, if we’re going to make money, it’s going to be overseas”, that was the first thing, is, “Okay, it’s going to be somewhere else, it’s not going to be Australians buying our software”, and so the Internet allowed us to do this. We put up a shopfront, early website and people could come to our website, download our software and then we just needed a way to get paid for it. The problem was in order to do that and the trust barriers of the Internet, we had to have a very low price and we had to have a fully installable offering. So we spent so much time on making it installable, documentation, “How would you get yourself up and running and try it?” — the software, as we put it, had to sell itself. Our software had to be bought, not sold. We didn’t have any salespeople, we couldn’t travel to your office in Sweden or London and help you out with it. For $800, we couldn’t have done that and secondly, it didn’t make any sense. So the evolution was, “Okay, this is the only possible path that we can go down is we have to figure out how to get people to do this”, now it turns out once you have figured out how to do that, it’s an incredibly powerful motor because you have lots of people coming, you have a very cheap piece of software for its relative performance, and you get people using it in all these big businesses all over the place. I would say 50% of the customers I go meet nowadays, probably meet a handful of customers, a couple a day on an average kind of thing, many of those have been a customer for 20 years, 22 years, 23 years. How many customers have been a customer 23 years? I’m like that’s crazy, we’re only 24 years old. That’s awesome. MCB: And so they downloaded very early, they didn’t download as all of , all of them are customers. Just one guy who’s like, “I need a way to track my issues”. MCB: Exactly. It was some guy in a backroom who needed to track it. I know the Cisco origin story, that was literally a guy, he’s still there, he’s been there 22, 23 years, he’s awesome. And they started with just, “I just needed a way to manage my issues for 10 people”, and now it’s hundreds of thousands of people, seats that we have there, it’s kind of grown over time. How did we know that business model was working? Again, it dates us a lot, this didn’t mean we didn’t answer questions, we were big on customer service and helping people, email was the way to do that. A bit of IRC back then, we had a channel you could log into and we’d help you. But the first customer, we used to walk into the office in the morning and we had a fax machine with literally rolls of paper. So if you wanted to pay for this distributed software, this says how old, there was no SSL keys, I heard you complaining about it the other day, totally agree with that era. You had to download a PDF off our website, which was pretty modern that it was a PDF, fill in your credit card details, and fax it to us, that is how you paid when we started. So we would walk in the morning and there’d be these rolls of paper on the ground, you be like, “Ah, sweet, someone bought something”, you know what I mean? It became a weird dopamine drug for us. The very first company was American Airlines… MCB: About six months in that we came in the morning and there was a fax on the ground with $800 and a credit card number written on it and we had never talked to American Airlines, they had never emailed us, they had never asked for customer service, they’d never gone on IRC, they had never talked to us in any way, shape or form. Man, this thing could work, we just made $800 out of the air. MCB: I mean, there was a lot of pre-work to get them there, but obviously that was kind of different. MCB: Then secondarily, as you wrote, I’m just trying to finish a very long answer here, we started Confluence in 2004, and those two became the jewel engines and both of those I think were probably major moments. I often say Confluence is a bigger moment, actually. The business model was kind of established, this is two years into the business. We made, I think, $800 grand in year one, $1.6 million in year two, maybe $5 million in year three, and $12 million in year four, if I remember the revenue numbers. So the thing was working really well. You’re the company that’s the Microsoft heir in some respects, which is the really just you took venture eventually, but didn’t really need to, just pure bottoms up. You and Scott, we’re able to keep a huge portion of the company because of that, it’s an amazing story that is, I think, under-told in some respects. MCB: Yeah, well, we actually did. I mean, we did and didn’t. So the venture story is one of my favorites because it describes how we think from first principles. Firstly, the first capital we put on the balance sheet, institutional capital to put on the balance sheet, I guess you could argue our initial, I don’t know, $10 grand each was some money, but was in the IPO . So in 2015, when we went public, that was the first capital that went into the business all time. We took two rounds of funding, one in 2010 and one in 2013, but both of which were to employees, the first was to the founders and the second was to large number of employees who bought in so both of those companies bought ordinary stock. Secondary shares basically, yeah. MCB: They bought ordinary stock, there were no preferences, there were no anything, that was kind of the way it is. And we love the Accel guys that invested, it’s kind of funny because their business model was wildly wrong, we now have their original spreadsheets and stuff. We’ve 15 years in, you know them really, really well, they wanted us to grow it. I think we had to grow at 30% for two years, 20% the year after and something like that to double or triple their money and at the time they put in $60 mil US , that was the largest investment I think Accel had ever made in anything software, digital kind of world and it was this massive bet. It was a one-page term sheet for ordinary stock, so credit to those two partners who took massive risk on us, had to fight, we know that GC, everybody else to do this unusual funding round and I think we did 50% growth the first year, and our CAGR since then is probably 40%. Yeah, it worked out pretty well. MCB: They did very well. I think their 2-3x was more like a 300x or something. You mentioned the Confluence moment. Why was that a big deal? Usually the story is you have one product and you need to focus and you’re two years old, you’re launching a completely new product. Is that the aspect you’re referring to? MCB: Yes, I think it comes down to being bootstrapped. Look, we spent nine years convinced we were going to die every day, there was just such a mentality that this thing was all going to fall over and we better work harder and keep going. The Confluence moment was important because I remember, I don’t know exactly, but sometime around then we understood venture capital. Firstly, on the venture capital side, because they do relate to each other, there was no VC available in 2001 and 2002 in Australia. We’re a nuclear winter, we’re two idiots with no credibility. Right. You could barely get funded in San Francisco, you’re not going to get funding in Sydney. MCB: No, because 2001, you weren’t even finding San Francisco funding because the whole dot-com boom had just happened, no one was getting funded anyway. We’re in Australia and we have no credibility, so we didn’t even bother. We literally, 2010 when we went to the Accel thing and we talked to five VCs, was the first time we’d ever pitched the business. It was just not a thing, people don’t understand, we used to say we were customer-funded when people would ask the also awkward question of, “Who’s your funding come from?”, we were like, “We’re customer-funded”, They go, “Oh, okay”. Lifestyle business! MCB: But we did understand venture capital, massive readers, I have an army full of technical books, books about technology and the industry and history and stuff from that magic era of airport bookstores. We read every episode of Red Herring and Industry Standard and Wired Magazine, I have just this huge library, so voracious readers. One thing you understood about venture capital is they put the portfolio theory on their side — and I’m a big fan of venture capital, I should say, I’m the chair of Australia’s biggest VC fund and that’s my other mate that I met in university, Niki Scevak . But we wanted portfolio theory on our side, we’d done finance and economics, we had one product, this was highly risky if you’re bootstrapped. So there was a little bit of the thinking that actually if we have two products, our chances of total failure are less, one of them can fail and we’ll be okay and so we started a second product. Yes, arguably it was hard, but our first one was going all right, it was like making, I don’t know, five million bucks a year and we had a handful of really awesome backpacker programmers. And the early people, it’s like a whole total band of misfits that somehow made this thing work and we’re having a lot of fun, we’re working really hard and so we made another internal tool that became Confluence and being adjacent, but very different, selling to different audiences, but having a lot — if you bought one, there was a good reason to have the other one, no matter which way you started, became a really good symbiotic loop of these two engines that powered us for a very long time. So it was more a case of reducing our risk actually than anything else. Wasn’t it risky to be splitting your resources or did that not even occur to you? MCB: I don’t think it occurred to us, no. It was more about splitting our risk and we were doing pretty well, but it changed the business because we moved from being the Jira company to a software company, and I say that’s probably the most under-understood moment because we had to learn about not how to market Jira, but how to market software, not how to build Jira, but how to build software. So now we have 20, 25 apps in 5 different categories that sell to all sorts of different teams who own a business, but we had to become a software company. Microsoft, I don’t know the analogy’s really that fair to them, to be honest, or fair to us, it seems massively over-glamorizing what they’ve achieved, which is amazing, I’m huge fan of Microsoft. The need to understand how to sell, in their case, like Minecraft, SQL Server, Azure, AI, you have to understand the building, the creation of technology, the selling of technology, the marketing of technology at a generic level, it really helped us generify the business. I think if we’d gone too much longer, everybody would’ve been on the Jira team, it would’ve been too hard to start a second thing and instead, we’ve always been a multi-product company. You just mentioned selling a lot. When did you finally realize or transition away from just being self-serve to actually, “We’ve got to grow beyond this”? Was it almost like a pivot that came too late because your identity was so wrapped up into the, “We’re the self-serve company”? MCB: Look, it’s never been a pivot, I get asked this by investors all the time. I would say our go to-market model and our process has kept evolving pretty much every year or two for 20 years and I say evolving because we’re very aware of the strengths of the model that we came up with and we’re very aware of what it takes to power that and we’ve been very careful when we’ve evolved, changed, added to it, not to destroy the original one. So nowadays, we have two amazing business models where we call them high-touch and low-touch. So we have the low-touch model, which is literally the same thing as it’s always been, hundreds of thousands of people show up every week, they try our software, we want them to have a great experience trying the software, we want to spread it as widely as possible and as many enterprises as we can, and some of those will stick, some of those will get working and we measure aggressively the rates of return and dollars and flows and funnels and everything else. This whole team whose job is to make sure that that’s working at now massive scale, right. But at the same time, what happened is as customers got more and more Atlassian software deployed, they wanted a different relationship with us, they wanted a bigger relationship. Those days they used to be spending, as soon as we were spending $20 grand, we were like, “Oh man, maybe we should talk to these people”, nowadays it’s more like around $50 to $100 grand is when we’ll talk to you. So the lines kept moving for different reasons and we actually have online sales, inside sales in between actually, the sort of classical someone gets on an airplane and goes to travel to you. So it’s just kept evolving. We talk about the IPO a lot, it’s our 10-year anniversary coming up this month, I’m off to New York next week to ring the bell and celebrate 10 years. When we went public, as an example, we had less than 10 companies paying a million dollars a year, now we’re well north of 500 in 10 years. So that doesn’t come without an amazing enterprise sales team and teams that go out and help customers and customer success and all the trappings of a really top flight enterprise sales organization, because for most of those customers, again, I think it’s north of 85% of the Fortune 500 are deep Atlassian customers. We become a strategic partner to these businesses that if we go down, rockets don’t take off, banks shut down, it’s a real critical importance to most of these customers. How big is your business outside of directly working with developer teams? As I recall, this was part of the consulting thing was you were wanting to do Jira for sales or Jira for all these different sort of functions, where and how did that evolve? MCB: So it’s been a continuum for a long time. So nowadays, less than half of our users are in technology teams, and probably a third of those are developers, less than half of them. So a portion of our audience, it’s a very important point of words. When I talk about this, all the engineers are like, “Hey, you don’t care about us anymore”, I’m like, “No, that’s not true”, that business is a great business, it’s just the rest of our business has grown massively around it. There are not enough developers in the world for our business. Our fundamental value has always been actually, and it took us one of these things, it took a decade to realize, firstly, we don’t solve technology problems, we never have, we’ve never had anything that’s like, “I care what code you write, which language the code is in, what the code does”. We solve collaboration and people problems, we always have solved people problems, even Agile was a people problem. It’s not a technology problem, actually, it’s a people problem. It’s, “How do we organize a group of people to build a piece of technology that best meets the customer’s needs and goes off track as little as possible?”, that is a collaborative people problem, we’ve always solved people problems. Our value actually came because there’s a lot of tools for technology teams and we never wanted to be in the dev tools business, that’s a road of bones, it’s very hard to build sustainable competitive advantage and dev tools, the history shows this. There’s just a different company every few years, developers tastes are fickle, our developers taste are fickle, this is not me sledging developers at all, we have a massive R&D arm and that group changes languages every couple of years, they change how they build software every couple of years, they’re constantly moving on, they change our analytics tools and everything else because they are tool builders and toolmakers, that makes sense, but that’s a hard place to build a business. Interestingly topical today, so we’ll see. But the easier place to build a business in the long term was the level above that, which is the collaboration problems that came, which started as, “How do we get engineers, designers, product managers, business analysts to all be on the same page about what it is that they’re building and have a repeatable process for that?”. It turned out that as the world has become technology-driven, as we say, our customers are technology-driven organizations. If you’re a large organization for whom technology is your key distinct advantage, it doesn’t matter whether you’re making chips and databases or whether you’re making rockets or cars or whether you’re making financial services or insurance or healthcare, I would argue for most of the businesses that are great, technology is their key competitive advantage, then you should be our customer, that is it. And what we help you do is we help your technology teams and your business teams collaborate across that boundary because that’s actually the hardest boundary. Building great technology is one set of problems, making it work for your customers usually means in different industries, a different amount of working with all sorts of business people and that’s what Jira did from the very start. Now that’s what our whole portfolio in service management, in strategy and leadership teams is about doing that at different scales and different amounts in different places. Does it bug you when you get complaints on the Internet of, “Jira’s so complicated”, “Hard to use”, blah, blah, blah? And are you speaking to, the problem is that the problem space we’re working in is not the single developer trying to track an issue, it’s trying to herd a bunch of cats and get them the same direction and muddling through that is a lot more difficult than it seems. MCB: It bothers me anytime people don’t like our software, sure. We’ve worked for the last 20 years to make it better every day. We’ll probably work for the next 20 years to make it better every day and people will still probably be dissatisfied and that is our fundamental core design challenge. There’s a few reasons they say that. Firstly, the on-premise business model and the cloud shift is really important because with the cloud shift, we update the software, with the on-premise business model, we don’t, so you would often be on older data versions, customers would upgrade once a year or every two years or something, and so we can’t control that. Secondly, the challenge of Jira is at our core, we solve a whole lot of what we say is structured and unstructured workflows. Confluence is an unstructured workflow, Jira’s a very structured workflow. You have a set of steps, you have permissioning and restrictions, you have fields, you have what’s happening in this process. The auditor will do something and pass it to the internal accounting team, the accounting team will do this and pass it to legal, legal will do this and pass it to these people. You’re defining a workflow and you’re having information flow back and forth and a Jira work item is, as we call it, it’s a human reference to work. That’s the best description of what Jira is work in the knowledge work era is this very ephemeral concept. Back to your development example, is the code the software? Is the idea the software? Is the designs in Figma — these are all parts of what it is, this thing that’s called this virtual thing that we’ve built. What we track is with a human reference to that, so someone can say it’s a new admin console. Cool, here’s the design for the admin console, there’s the spec for the admin console, there’s the code for the admin console, here’s where it’s been tested, here’s where it’s deployed. Did customers like it? We need a reference to this thing that is otherwise spread across hundreds of systems and virtualized. Once you’re building a workflow system, companies, ours included, love process, we love workflows, we love control, and that control usually comes with more data. “Hey, don’t fill in these three fields, fill in these 50 fields”, and they’re all required for some reason and our job to customers is to say, “Do you really need 50 fields?”, because you’re creating a user experience- You’re ruining it for us! MCB: Your users are going to have to fill in all 50 fields, and it feels like that’s going to take you a while. We have customers — I went back and checked, I think almost every single person you’ve interviewed on your podcast is a customer of ours. I don’t know if it’s 100%, but it’s definitely north of 95% out of the last 20 guests. Stratechery is a customer of yours, so there you go. MCB: Oh, really? Well, there you go. Thank you. One of my engineers adores Jira, so I get the opposite angle from what I asked about. MCB: That’s right. So look, it’s a challenge for sure, but at the same time, man, the value we’ve created, the business value, the number of customers that run on it, it’s ironic, we talk about the AI era and all these other things. Literally, no chips go out of any of the chip companies you love talking about, every single one of them, soup to nuts. So at what point did you realize that AI was going to impact you in a major way? Was there an “aha” moment or it’s just been in the air? Or is it a specific time you realized, “Look, this is going to completely change what we do?” MCB: Again, I’m one of these — I’ve realized I’ve become the old man in the room. We’ve done machine learning for a long time in lots of ways because of our online business model, so I’d say we’ve done AI for a long time. Obviously, LLMs are what people refer to nowadays by AI and agents and these words that have corrupted the entire thing, the meaning changes in technology when it means something else. The launch of various versions of ChatGPT were very instructive obviously, they were a moment for everybody. The optimism, and I would say we’re massive AI optimists, it is the best thing that’s happened to our business in 25 years. Why? Because people might look at you from the outside and say you’re still characterized as — even though your business expanded far beyond developers — “Oh, you have a lot of developers”, I’m skipping over the transition to the cloud just because we’re running out of time, but it’s an interesting story. You did announce you are finally ending the on-premises software, which I’m curious, it is a sentimental moment to come to that decision, but people might look at you from the outside and say, “Oh, there’s a company that’s going to have a problem with AI, AI is going to replace developers, it’s the decreased seats . What are they going to do?” MCB: There’s a few ways to take that. I’m trying to put it on a tee for you. I think I know what you want to say. MCB: There’s a few ways to look at it. Firstly, I think AI is a good example where people are very concrete about the negatives and the positives are upside. I think it’s a huge force multiplier personally for human creativity, problem solving, all sorts of things, it’s a massive positive for society. That doesn’t mean there aren’t any negatives, but the net effect is really high. And we spend a lot of time, you hear it in the media talking about the job loss, the efficiency gains, whichever way you want to put it, that’s the thing. Well, that’s because it’s really concrete in a spreadsheet, “I can do this process with half as many people”, “Wow, look at that, that’s great”, what’s never written in the spreadsheet is all the new processes that get created, all the new ways of doing things, the quality of the output is going to be twice as high. If software costs half as much to write, I can either do it with half as many people, but core competitive forces, I would argue, in the economy mean I will need the same number of people, I would just need to do a better job of making higher quality technology. So our view on AI overall is an accelerant, not a replacement to everything we do, and just the next era of technology change is really positive. We’ve loved technology, we love the cloud, we love all the tech changes we’ve been through, mobile. Look, us as a business, we are in the game of knowledge work. We solve human problems, workflows, business processes, this is what we do. These largely revolve around text, or if it’s video nowadays, that can be reduced to text in various ways. LLMs allow us to understand that text in a massively deeper way than we ever have been, and the problems we solve aren’t going away. 20 years time, there’ll be groups of people trying to solve some sort of problem as a team and working on a project, and so these things aren’t going to go. They’re going to need to talk to each other and collaborate of what work’s going on and how it’s working, so the textual aspect of it has been amazing. The features we’ve been able to ship, we never could have built five years ago, it was literally impossible, so the ability to solve customer problems is so much higher than it ever has been. Secondly, our software is incredibly valuable at the core of these workflows, but it’s also incredibly promiscuous. What I mean by that is we have always been very highly interlinked with everything else. If it’s a sales team, there are links to Salesforce and customer records, there are links to internal systems, there are links to maybe features that need to be built, there are links to some content and document. So any Jira, Confluence, or Loom , you don’t record a Loom unless you’re talking about something, you don’t have a Jira issue without pointing to all sorts of different resources, whether that’s a GitHub or Figma, whether it’s Salesforce or Workday. That gives us a really unique knowledge, which we’ve turned into the teamwork graph, that actually started pre-AI, so the irony is the Teamwork Graph is about 6 years old. Well, it started with Confluence. This is the whole thing where you look backwards, and to your point, if you had just been the Jira company, but because from the very beginning, you mentioned Confluence was different but it was adjacent and you had to build the links and stuff together, and as you build all these different tools, because everyone wants to be this point of integration. And I wanted you to tell me about Rovo and this idea of being able to search across all your documents. Who gets permission to do that? It’s someone that’s already there, and you made the critical decision to be there back in 2004 or whatever it was. MCB: That’s true. Certainly back in 2004, and then in I think 2019, the Teamwork Graph starts, which is trying to take all of those links and turn them into a graph. The connectivity, two things linked to this Figma thing, five things linked to this customer record — okay, cool, that means something, so we built this Graph. To be honest, it was a bit of a technology lark. We have a lot of these projects that are really cool and we’re like, “We’ll be able to use this somehow and it’s going to grown”, and now it’s a hundred billion objects and connections connecting all of the company’s knowledge. It becomes the organizational memory nowadays and context and all these things nobody knew in 2019 that’s what it was going to be, it just seemed we needed it for various process connections. That turns out to be because it’s got permissions and compliance and all of the enterprise stuff built in, which is incredibly difficult, the best resource to point AI at in various forms. You still have to be good at the AI parts to get the knowledge, the context for any area, so the Teamwork Graph is our data layer. It’s not only the best kind of enterprise search engine for your content from a 10 Blue Links kind of way of thinking. If you’re chatting through your content, you still need all your organizational knowledge. I actually obviously found your Article, I was like, “Hey, what has Ben Thompson written about us last year?”, and I asked Rovo in chat and it comes back to me with he wrote this, that and the other and pulls out some snippets. I’m like, “Tell me more, do you think we’ve hit that?”, I literally got a report written by Rovo on your report as to whether it had been accurate. “Go look at the last 10 years with deep research and web search and come back and tell me, was he right or wrong?”, and it gave me a really interesting analysis of whether you were right and wrong. It’s like most AI things, it’s like 90% correct, it’s pretty good. It solved a lot of the first problem and I would not have done that work otherwise. I would have read it quickly and so I wasn’t going to put an analyst on it internally to do this work, but I could send something to do work I never would’ve done. Who’s your competitor for this spot, for this Rovo position where you have all this context, you can actually search your company in a way that just wasn’t possible previously? MCB: Who are the competitors you say? Yeah, because everyone is claiming they’re in this spot, “We can be the central place that you go and we have visibility everywhere”, why is Atlassian the one that’s going to win that space? MCB: A few reasons why we will. I think we have a great chance to be a great player is maybe the easiest way to say it. I think everybody loves this absolute win position, we don’t believe in enterprise technology, you usually get these absolute wins, it’s not quite the same as in the consumer world. We have a lot of business processes and workflows, millions every day that run through us, those are human collaboration workflows, so they are cool. The auditing team hands off to the accounting team, hands off to the tax team, whatever it is, sales workflows, marketing workflows, and they span lots of our applications and many others. If you’re going to go and introduce agents, these autonomous AI-driven software programs, whatever you want to call an agent, you’re going to put them into existing processes to make those processes either more efficient, more accurate. When the human picks up a task, it’s got all the information they need because something’s gone out to find it, that is an incredibly powerful position, which is why we support our agents and everybody else’s. You can assign a Jira work item to a Cursor agent in terms of code, you can assign it to a Salesforce agent. If you have your agent technology choice, I don’t think you’re going to have one agent platform, I think you’re probably going to have multiples, there are going to be a handful of organizational knowledge graphs that are powerful enough to solve these problems across multiple tools, but we have access to all those tools. We already know the information to some level, and that becomes a very unique advantage. Do you see this as a way to expand even further how much of a company you cover? You started with developers, then you expand to adjacent teams, and you talk about it’s now just a fraction of your user base. Do you own entire companies or could you get there? It’s like, “Okay, we still have these teams over here that are not on Jira, but Rovo’s so good that we need to bring everyone in”? MCB: Look, again, it would be great. I think it is unrealistic, and we should say “Absolutely”, right? MCB: If [Salesforce CEO Marc] Benioff was here, he’d be like, “Absolutely, we’ll own the world”, we love him, that’s the way he is, I don’t think about it as owning a customer. Our mentality has always been — I always use the subway analogy versus we have some competitors, for example, that want to be the control tower, their whole thing is we’ll be the control tower, just give us control and we’ll go and control everybody else, we’ll move the planes around. I think in enterprise IT, that’s an unrealistic view. Every CIO has been sold this for decades, it doesn’t happen because the world changes too quickly. Our philosophy and our commitment to customers has always been we will be a great citizen on all sides, we will interact with all of the applications you need, the old ones and the new ones, and we will be a valuable point of exchange in your business workflows and processes, whether those are structured like in Jira, whether unstructured like in Loom or Talent or something else. The reason for that is you have lots of systems. We want to be a valuable station on your subway network, we don’t want to be at the end of one of the lines, we want to be one of the handful of hub stations that are about moving trains around, and that is the best way to get your knowledge moving in your organization, it’s the best way to deal with your processes. Therefore, we need to have amazing AI capabilities. We have a massive investment in R&D, we have thousands of people working on AI tooling at the moment, and we have a huge creation bent, which is one of the reasons I think — we’ve talked a bit about the data advantage we have, I think we have a huge design advantage, and I actually think design is one of the hardest parts of building great AI experiences because it’s real fundamental design for the first time. You had a great line, you did a podcast a couple of weeks ago that I’ll put a link to, but you mentioned basically, the customer should not need to understand the difference between deterministic and probabilistic in the context of design, that’s what you’re driving at here. MCB: They should not need to understand that, they should need to understand when outcomes, outputs may be wrong or may be creative. Again, you talk a lot about the fact that hallucination is the other side of creativity, right, you can’t have one without the other. Hallucinations are a miracle. We have computers making stuff up! MCB: Our job is to explain to a customer when that happens, so it’s like this might be something you want to do, and that requires a lot of design. We have a feature in Jira called Work Breakdown which is super popular, where I can take a Jira issue and say, “Make me a bunch of sub-issues, this task has to be broken into a set of steps”. I don’t believe in the magic button theory of AI, that I’ll just hit a button and it’ll do all the things, I believe deeply in the value from AI will come from human-AI collaboration in a loop. It’s me and the AI working back and forth. You talk about yourself and Daman quite a lot , and it’s you, Daman and ChatGPT working together, but it’s not like you ask one thing and it’s done. It’s an interaction, it’s a collaboration back and forth, and that’s going to happen everywhere. In Work Breakdown, what it does is it says, “Hey, based on these types of documents I’ve gone to find from your whole graph in Google Docs and Confluence, whatever, I think this piece breaks down into these, is that correct?”, and it goes, “No, actually, that one doesn’t make any difference, these two are really good, you forgot about this document”, “Cool, let me go do that for you again”, and come back and say, “Is it these?”, “That’s closer”, and then you’re like, “That’s good enough, it’s 90% of what I need”, and then I go add the two that I need myself. That is a huge productivity boost but it’s not magically correct, and it requires a lot of design to tell people, “These are not the answers, these are possible answers, help us refine them and get better at it so that you get the 90% upside and the 10% downside is managed”. Are all these people pursuing these full agents that act on their own, are they just totally misguided? MCB: No, because I think, well, agents will take — there’s a snake oil sales thing going on as there always is in any bubble, and the snake oil sales is not wrong, it’s just chronologically challenged. (laughing) That’s so good. MCB: Well, customers are struggling. When I talk to customers every day, they’re like, “Is everyone else using these things to just magically transform their business with this simple, it took them five minutes and it’s replaced entire armies of people?”, and I’m like, “No, nobody’s doing that”. What they’re actually doing is taking business processes that are really important to their business and saying, “Okay, can I make this step better? This is highly error-prone. It’s compliance in a large organization, how do I make this part of the process better?”, and we’re like, “Oh, we can totally do that”, and they will replace small bits of lots of processes so that in Ship of Theseus style, five years from now, the process will look radically different. Occasionally, they are replacing entire processes, but this is the 1% case, what they’re actually doing is they have whole machines that are running and they’re trying to fix this cog and fix that cog, and that’s super valuable for them. That’s not a downside, that’s really, really valuable. And often, it’s work they didn’t want to do, work that wasn’t getting done, it wasn’t done at a high quality, so we got to remember that, I say this quite a lot, people shouldn’t be afraid of AI taking their job, I fundamentally believe this, they should be afraid of someone who’s really good at AI taking their job. That’s actually what’s going to happen, is someone is going to come along, in a sales sense, they’re really good at using all these AI tools to give better customer outcomes or handle more customers at one time. Is this why you’re hiring so many young people? MCB: Yes, I guess so. Yes, they’re more AI-native, they come out understanding these tools and technologies. I find the biggest irony in universities is all these people who “cheat” their way through every assignment, I use cheat in quote marks, using ChatGPT to handle these assignments, and then they’re worried AI is going to take all these jobs. I’m like, “Wait, you literally took your own job of writing the assignment, but you’ve also trained yourself on how to use these tools to get the outcome required” — now one might argue the university degree should be different, but just like when Google came along and you could look up any fact, knowing facts became far less important than the ability to look it up. I still think AI, it doesn’t create anything, maybe slightly controversial, but I argue it synthesizes information, it’s really good at processing huge amounts of information, giving it back to you, changing its form, bringing it back. Humans are still the only source of fundamental knowledge creation. I point out one of the flaws in the one person billion dollar company argument, and this will happen but it’ll be an anomaly. That company doesn’t get created without that one person, so there’s not AI creating companies magically. It’s like can a company eternally buy back its stock? No, because at some point, someone is going to own the final share? MCB: That’s right and I think this is missed, right? This is where we say it’s about unlocking creativity and what we do for our customers is put Rovo and these amazing data capabilities that we have alongside all the enterprise compliance and data residency, and there’s a massive amount of making this work in the enterprise with trust and probity and security. It’s very difficult. And great design to say, “What do you hire us to do? How do you get these technology and business teams to work together? What workflows do you have in your projects and your service teams, and how can we make those workflows better with more data and make your teams more informed?” That will end up with us having more share of employees in a business that use our stuff every day. Awesome. You made two big acquisitions recently, the DX acquisition , I think, makes a ton of sense to me measuring engineering productivity, particularly in the area of AI. What actual ROI are we getting on this? MCB: And how much money am I spending? Because I’m spending suddenly a lot of money, right? This is not cheap at all, I have huge bills. Internally, we use Rovo Dev , we use Claude Code, we use GitHub Copilot, we use Cursor, we have them available to all. We have a huge R&D — again, I think we’re still number one on the NASDAQ for R&D spending as proportion of revenue. You can take that as a good thing in the AI era or a bad thing, everyone gets to choose their own view on that, but we’ve always been incredibly high on R&D spending since day one. The bills that we pay though are very high, so DX is simply saying, “Okay, cool, how do I measure what I’m getting for that? Should I pay twice as much money because these bills are worthwhile, or is there a lot of it that’s actually just it’s really fun and it’s not actually leading to productivity gains?”. This is going to be a hard problem because there’s a lot of money on the line at the moment that people are paying for these tools, which is not without value, but measuring exactly what the value is is really, really hard, and that team’s done a phenomenal job. And we now have an Atlassian office in Salt Lake City, Utah, where I already spend a lot of time. Totally by coincidence, but it’s really nice. So that purchase, love it, makes a ton of sense. In perfect alignment with you. How does The Browser Company fit in? MCB: A lot of ways. So I have believed for a long time that browsers are broken. We’ve built browsers for an era of software that we don’t live in today. And I don’t, in my browser, have a bunch of tabs that represent webpages, I don’t have that. I have a bunch of tasks, I have a bunch of applications, I have a bunch of documents, and the browser was fundamentally never built to do that. That’s what Arc, first product from The Browser Company — if you don’t use Arc every single day, you should be, it’ll increase your productivity instantly because it’s built for knowledge workers and the way that they have to actually work every day and how they manage all of these tabs and tasks and flows versus serving the New York Times or whatever. That is a browser built for knowledge workers, and there’s a lot more we can do in that era as software changes. Secondly, obviously AI has come along, and we now have chats and applications as a extra part of the browser experience, so I think we can change how enterprises use browsers, security being a big issue. I think AI in the browser is a really important thing, but I suspect it’s not in the basic way of just combining Chrome and ChatGPT, that’s not how it’s going to play out. I suspect it requires a massive amount of design, which The Browser Company is phenomenal at, and it requires changing how people use their day-to-day applications. From our point of view, and I’ve been an Arc fan since day one, [The Browser Company CEO] Josh [Miller] and I have known each other a long time, there’s a knowledge worker angle and there’s obviously a business angle to it in a huge way that our customers are knowledge workers. We can change the way they do their work in a meaningful way of productivity, that is exactly what we have been trying to do in a lot of different ways. The browser itself, being chromium-based, Edge being chromium-based, Chrome being chromium-based, the rendering of webpages is not the problem, it is the fundamental user experience of, “How do I take all of my SaaS applications, my agents, my chats, my tabs, my knowledge, and put it all together in ways that make my day quicker?” — that is what we are trying to do fundamentally at the start. The context that we have is incredibly important for that. And the browser has, if you think about it, my personal memory. We used to call it the browser history. Great, it shows what I’ve seen, it does not have my organizational memory, which we have a great example of in the Teamwork Graph. So if I can put these things together, I can make a much more productive browsing experience for customers fundamentally in that world. I think we have an amazing shot of doing that and of changing how knowledge workers use SaaS. We’re not trying to make a browser, as I’ve said, for my kids, we’re not trying to make a browser for my parents, we’re not trying to make a browser for shopping or for anything else. We’re trying to make a browser for people who spend all day living in Salesforce and Jira and Google Docs and Confluence and Figma and GitHub, and that is their life. The laptop warrior that sits in that experience, I believe we can use AI and design to make that a far better experience and build an amazing product. They’re well on the way to doing that, we can supercharge doing it. You look skeptical. No, I’m looking at the clock, I skipped over a huge section. Your whole shift to the cloud, all those sorts of things. However, there is one thing I wanted to get to: you are wearing an Atlassian Williams Racing hat , I am a big F1 fan, I was very excited about you doing this . How did that come about? How was the first year? Was this another hunch this is going to work out? I mean, Williams is looking like a pretty good bet. MCB: Yes, our world’s largest sports bet. Look, how did it come about? So how do I make a short answer? F1 is changing, I think, in a massive way. I know now being incredibly deep in the business of it, the fundamental change is that hardware is becoming less important and software is becoming more important, this is a trend that we are used to. JV, James Vowles , the Team Principal, was the first person that approached us a long while ago now to help them, and for a teeny, teeny sticker in the corner, to help them get more productive as a team. What people don’t realize about F1 is these are large organizations, right? There’s 1100 people that work for Atlassian Williams Racing. And Williams was really pared down and skinny, he was brought back in with new owners to actually rebuild the entire thing? MCB: Yes, they were in deep trouble. But in rebuilding it, he is a software engineer, software developer by trade, by history kind of thing. He’s a technically-minded person. He downloaded Jira himself in 2004 to install it, so he knows us quite well. So we were brought on for our ability to help them with their teamwork and their collaboration, they really needed a technical upgrade to a whole lot of their systems. Turns out they need us in almost every part of their business because the service workflow’s important. We’re now in the garage, we’re using tons of AI to try to make them better, so there’s a lot of things we can do to build to hopefully help them win, and it’s a mission you can fall in love with. Here is one of the most storied brands in Formula 1 that’s fallen on tough times, every sportsperson loves a recovery story. And I was sold early on the recovery story, I’m like, “Fuck it, let’s go help, let’s make this happen. Let’s get back to being a championship team”. So we fell in love with the mission, and JV is super compelling, he’s got a one-decade goal, and they’re very goal-driven, and we love that, but they needed a lot of help, so that’s what they asked us for help with is initially. The more we looked at it, the more we learned about Formula 1, yes, it’s becoming a software-driven sport. So as an example, Atlassian Williams, I believe have twice as many software developers as the next team on the grid. Because it’s cost-capped, you got to choose, “Do I hire a software developer or an aerodynamicist?” — it’s a very clear cost cap, you’re choosing where to put your resources. As virtualization and everything get better, it’s less, “How well can I draw a curve?” and, “How much can I help 1100 people work together, and how can we build great software”, which really is the core of the car, right? So that then comes to us, tiny sticker, probably a founder-ish moment where I’m like, “How much is the sticker on the top?”, and they didn’t have a sticker on the top and I’m like, well, “What would that get us?” So we ran the numbers on that and the reason is twofold. You talked about our GTM, our go-to-market transformation, we have an ability to build various things. Firstly, branding is obviously massive, top three teams get 10 times the branding as the bottom three teams. So if you’re going to make a sports bet, you pay for a long period of time with the bottom three team, you help make them a top three team, and your sport bet pays out really well just on a sheer TV time and etc — the number of staff, parents, and other things, have said to staff members, “Hey, that company you work for, it’s really great, I saw them on the TV on the weekend”, and the staff member will say, “Dude, I’ve worked there for 12 years, why do you suddenly know about it?”, “Oh, I saw them driving. Carlos [Sainz Jr.] is great”, or something. And he is! So obviously, there’s a huge marketing and branding angle that’s about their position being better. The really interesting part of what we’re doing there is we have customers all around the world, we have customers in 200-odd countries, and we can’t go and visit all of our biggest customers in a meaningful way. We certainly can’t take them to some of our best and most exciting customers, right? There are electric car companies that use our stuff that we’d love to take many customers to a factory, or rockets, or whoever, I can’t take many customers into some of your favorite chip companies and say, “Look how they use our stuff”, I can maybe get one or two customers a year into that customer and show them how they use our things. With Formula 1, what we’re building is a mobile EBC, so an executive briefing center. Formula 1 goes around the world. It goes to Melbourne, it goes to Singapore, it goes to Japan, it goes to England, it goes to various parts of Northern Europe, it goes to various parts of America and you’re like, “Hey, where are our customers?” — roughly distributed like that. It comes to town, we can invite a whole lot of customers into a great experience, we can tell them a lot about Atlassian software, we can also invite them into one of our best customers. They can sit in the garage, and I can tell them how our service collection is helping power the assets, that when that wing’s broken, it gets known here, and they start making a new one back in the factory in Oxford, and this one gets shipped around the world and another one will get moved. And, “Here, I can show you the asset management and the service that goes along with it, I can show you how the garage is getting more efficient because of us, I can show you how we’re helping them win races”. We don’t drive cars, we help them be more productive as a team and I can do that in an environment of it’s an exciting environment. They can drink a great latte or a champagne or whatever they want, and I can explain to them how we are transforming this business in a meaningful way with our tools no matter which way they want to look at it, which is the most powerful customer story that you can go and tell a couple-hundred customers a year in their city. We come to their city, right? I was in Montreal, I took a whole bunch of Canadian customers over the three days, they were like, “This changes my view of Atlassian”, and I’m like, “That’s exactly our goal”, that is at the enterprise end of enterprise sales though, right? But that’s the ironic thing, it’s as far away from where you started as you could be. MCB: Well, they didn’t get there. I met two Canadian banks we had in Montreal as an example, both of whom had been customers for over 20 years, they started spending $800 bucks or maybe $4800 as we moved our pricing to around five grand — now they spend a million, two million dollars a year, and they could be spending ten. We have the ability to give the massive business value across a far larger swath of their business. And I can say, “What do you use from our system of work today? What could you use? Let me show you how Williams uses that piece of the system of work”, which is just a very visceral and exciting customer example to show them how they’re winning. And it helps, again, culturally, super aligned. They’re an awesome group of people trying really hard to win in the most ridiculously competitive sport and the highs are highs, the lows are low. Any sporting fan, you’re well familiar with various different sports that we have in common, but this is technology built by a large business team that has to win a sport. That doesn’t happen anywhere else in the sporting world, I would claim. Giannis [Antetokounmpo] doesn’t make his own shoes and have a team of people making better shoes and a better basketball so he can win, that doesn’t happen in other sports. It’s all about the people on the floor in an NBA game as to who wins, and that’s great, don’t get me wrong, I love basketball. The work in Formula 1 is done by 1000 people back in Oxford. It’s a Constructor Championship . MCB: The constructor championship I do think should be more important, especially given the current exact week we’re in, which is an amazing week for Atlassian Williams Racing, second podium . You talk about that bet, I told JV at the start of the year, I thought that he’s like, “What do you think our five-year future is?”, and I said, “Look, I think, number one, we’ll get one podium this year, 2025; 2026, we’ll win a race; and by 2030, we will have won a championship, that is my OKRs [Objectives and Key Results]”, and he said, “Oh, wow, okay, yeah I think so”. It lines up, I know the team OKRs and other things. And we won two podiums this year, so I was wrong, and I think we have a great chance for 2026, and we are working hard to make the team better and the single-best customer example we have of every piece of software that we sell. Mike, I’d love to talk again. It was great talking to you again. And, hey, good luck. And I’m a Williams fan, so I’ll be cheering for you this weekend. MCB: Oh, yeah. Well, I’m not sure this weekend, but 2026, 2027- Okay. I’m kind of kissing up, I am dying for Max [Verstappen] to win is the honest truth. I need the McLarens to run into each other . But other than that, Williams is my second love. MCB: Do you think McLaren will issue team orders to switch them if Oscar is in second and Lando’s in fourth? Yes. And I don’t know what’s going to happen if that happens, and this will be fascinating. MCB: We will have to see. It’s going to be a huge week. But that’s what makes the sport exciting, right? The whole thing is amazing. Talk to you later. MCB: All right. Thanks, man. This Daily Update Interview is also available as a podcast. To receive it in your podcast player, visit Stratechery . The Daily Update is intended for a single recipient, but occasional forwarding is totally fine! If you would like to order multiple subscriptions for your team with a group discount (minimum 5), please contact me directly. Thanks for being a supporter, and have a great day!

0 views
Robin Moffatt 1 months ago

Using Graph Analysis with Neo4j to Spot Astroturfing on Reddit

Reddit is one of the longer-standing platforms on the internet, bringing together folk to discuss, rant, grumble, and troll others on all sorts of topics, from Kafka to data engineering to nerding out over really bright torches to grumbling about the state of the country —and a whole lot more. As a social network it’s a prime candidate for using graph analysis to examine how people interact—and in today’s post, hunt down some sneaky shills ;-) I’ve loaded data for several subs into Neo4j, a graph database. Whilst RDBMS is great for digging into specific users or posts, aggregate queries, and so on, graph excels at complex pattern matching and recursive relationships. It’s a case of best tool for the job; you can do recursive SQL instead of graph, it’s just a lot more complicated. Plus the graphical tools I’ll show below are designed to be used with Neo4j or other property graph databases. In Neo4j the nodes (or vertices ) are user, subreddit, comment, and post. The edges (or relationships ) are how these interact. For example: a user [node] authored [edge] a post [node] a user [node] posted in [edge] a subreddit [node] These relationships can be analysed independently, or combined: Let’s familiarise ourselves with graph visualisations and queries. In RDBMS we use SQL to describe the data that we want to return in a query. Neo4j uses Cypher , which looks a bit like SQL but describes graph relationships. Here’s a query to show the user nodes : Neo4j includes a visualisation tool, which shows the returned nodes: We can add predicates, such as matching on a particular node property ( , in this example): You can also look at the raw data: If we zoom in a bit to the previous query results we’ll see that it’s also showing the edges that have been defined indicating a relationship ( ) between some of the nodes: Let’s build on the above predicate query to find my username ( ) and any users that I’ve interacted with: I’m going to head over to a different tool for visualising the data since the built-in capabilities in the free version of Neo4j are too limited for where we’re going with it. Data Explorer for Neo4j is a really nice tool from yWorks . It connects directly to Neo4j and can either use Cypher queries to pull in data, or directly search nodes. The first reason I like using it is the flexibility it gives for laying out the data. Here is the same set of data as above, but shown in different ways: One of the cool things that graph analysis does for us is visualise patterns that are not obvious through regular relational analysis. One of these is a form of astroturfing. Since the LLMs (GPT, Claude, etc) are trained on data that includes Reddit, it’s not uncommon now to see companies trying to play the game (just like they did with keyword-stuffing with white text on white background for Google in the old days) and 'seed' Reddit with positive content about their product. For example, genuine user A asks " what’s the best tool for embedding this nail into a piece of wood ". Genuine user B suggests " well, a hammer, DUUUHHH " (this is Reddit, after all). The Astroturfer comes along and says " What a great question! I’ve been really happy with ACME Corp’s Screwdriver! If you hold it by the blade you’ll find the handle makes a perfect tool for hitting nails. " Astroturfing also includes "asked and answered" (although not usually from the same account; that would be too obvious): Astroturfer A: "Hey guys! I’m building a house and looking for recommendations for the best value toolkit out there. Thanks!" Astroturfer B: "Gosh, well I really love my ACME Corp’s Toolbelt 2000, it is really good, and I’ve been very happy with it. Such good value too!" One of the cornerstones of Reddit is the account handle—whilst you can choose to identify yourself (as I do - ), you can also stay anonymous and be known to the world as something like . This means that what one might do on LinkedIn (click on the person’s name, figure out their company affiliation) often isn’t an option. This is where graph analysis comes in, because it’s great at both identifying and visualising patterns in behaviour that are not so easy to spot otherwise. Poking around one of the subreddits using betweenness analysis I spotted this set of three users highlighted: The accounts picked up here are key to the particular activity on the sub; but that in itself isn’t suprising. You often get key members of a community who post the bulk of the content. But, digging into these particular accounts I saw this significant pattern. The three users are shown as orange boxes; posts are blue and comments are green: It’s a nice little network of one user posting with another commenting—how helpful! To share the work they each take turns writing new posts and replying to others. Each post generally has one and only one comment, usually from one of the others in the group. You can compare this to a sub in which there is much more organic interaction. is a good example of this: Most users tend to just post replies, some only contribute new posts, and so on. Definitely not the nicely-balanced to-and-fro on the unnamed sub above ;) a user [node] authored [edge] a post [node] a user [node] posted in [edge] a subreddit [node] For example, genuine user A asks " what’s the best tool for embedding this nail into a piece of wood ". Genuine user B suggests " well, a hammer, DUUUHHH " (this is Reddit, after all). The Astroturfer comes along and says " What a great question! I’ve been really happy with ACME Corp’s Screwdriver! If you hold it by the blade you’ll find the handle makes a perfect tool for hitting nails. " Astroturfer A: "Hey guys! I’m building a house and looking for recommendations for the best value toolkit out there. Thanks!" Astroturfer B: "Gosh, well I really love my ACME Corp’s Toolbelt 2000, it is really good, and I’ve been very happy with it. Such good value too!"

0 views
Simon Willison 1 months ago

sqlite-utils 4.0a1 has several (minor) backwards incompatible changes

I released a new alpha version of sqlite-utils last night - the 128th release of that package since I started building it back in 2018. is two things in one package: a Python library for conveniently creating and manipulating SQLite databases and a CLI tool for working with them in the terminal. Almost every feature provided by the package is available via both of those surfaces. This is hopefully the last alpha before a 4.0 stable release. I use semantic versioning for this library, so the 4.0 version number indicates that there are backward incompatible changes that may affect code written against the 3.x line. These changes are mostly very minor: I don't want to break any existing code if I can avoid it. I made it all the way to version 3.38 before I had to ship a major release and I'm sad I couldn't push that even further! Here are the annotated release notes for 4.0a1. This change is for type hint enthusiasts. The Python library used to encourage accessing both SQL tables and SQL views through the syntactic sugar - but tables and view have different interfaces since there's no way to handle a on a SQLite view. If you want clean type hints for your code you can now use the and methods instead. A new feature, not a breaking change. I realized that supporting a stream of lists or tuples as an option for populating large tables would be a neat optimization over always dealing with dictionaries each of which duplicated the column names. I had the idea for this one while walking the dog and built the first prototype by prompting Claude Code for web on my phone. Here's the prompt I used and the prototype report it created , which included a benchmark estimating how much of a performance boost could be had for different sizes of tables. I was horrified to discover a while ago that I'd been creating SQLite columns called FLOAT but the correct type to use was REAL! This change fixes that. Previously the fix was to ask for tables to be created in strict mode. As part of this I also figured out recipes for using as a development environment for the package, which are now baked into the Justfile . This one is best explained in the issue . Another change which I would have made earlier but, since it introduces a minor behavior change to an existing feature, I reserved it for the 4.0 release. Back in 2018 when I started this project I was new to working in-depth with SQLite and incorrectly concluded that the correct way to create tables and columns named after reserved words was like this: That turned out to be a non-standard SQL syntax which the SQLite documentation describes like this : A keyword enclosed in square brackets is an identifier. This is not standard SQL. This quoting mechanism is used by MS Access and SQL Server and is included in SQLite for compatibility. Unfortunately I baked it into the library early on and it's been polluting the world with weirdly escaped table and column names ever since! I've finally fixed that, with the help of Claude Code which took on the mind-numbing task of updating hundreds of existing tests that asserted against the generated schemas. The above example table schema now looks like this: This may seem like a pretty small change but I expect it to cause a fair amount of downstream pain purely in terms of updating tests that work against tables created by ! I made this change first in LLM and decided to bring it to for consistency between the two tools. One last minor ugliness that I waited for a major version bump to fix. Update : Now that the embargo has lifted I can reveal that a substantial amount of the work on this release was performed using a preview version of Anthropic's new Claude Opus 4.5 model . Here's the Claude Code transcript for the work to implement the ability to use an iterator over lists instead of dictionaries for bulk insert and upsert operations. You are only seeing the long-form articles from my blog. Subscribe to /atom/everything/ to get all of my posts, or take a look at my other subscription options . Breaking change : The method now only works with tables. To access a SQL view use instead. ( #657 ) The and methods can now accept an iterator of lists or tuples as an alternative to dictionaries. The first item should be a list/tuple of column names. See Inserting data from a list or tuple iterator for details. ( #672 ) Breaking change : The default floating point column type has been changed from to , which is the correct SQLite type for floating point values. This affects auto-detected columns when inserting data. ( #645 ) Now uses in place of for packaging. ( #675 ) Tables in the Python API now do a much better job of remembering the primary key and other schema details from when they were first created. ( #655 ) Breaking change : The and mechanisms no longer skip values that evaluate to . Previously the option was needed, this has been removed. ( #542 ) Breaking change : Tables created by this library now wrap table and column names in in the schema. Previously they would use . ( #677 ) The CLI argument now accepts a path to a Python file in addition to accepting a string full of Python code. It can also now be specified multiple times. ( #659 ) Breaking change: Type detection is now the default behavior for the and CLI commands when importing CSV or TSV data. Previously all columns were treated as unless the flag was passed. Use the new flag to restore the old behavior. The environment variable has been removed. ( #679 )

0 views
Simon Willison 1 months ago

Nano Banana Pro aka gemini-3-pro-image-preview is the best available image generation model

Hot on the heels of Tuesday's Gemini 3 Pro release, today it's Nano Banana Pro , also known as Gemini 3 Pro Image . I've had a few days of preview access and this is an astonishingly capable image generation model. As is often the case, the most useful low-level details can be found in the API documentation : Designed to tackle the most challenging workflows through advanced reasoning, it excels at complex, multi-turn creation and modification tasks. [...] These 14 images can include the following: Max Woolf published the definitive guide to prompting Nano Banana just a few days ago. I decided to try his example prompts against the new model, requesting results in 4K. Here's what I got for his first test prompt, using Google's AI Studio : Here's what I got: The result came out as a 24.1MB, 5632 × 3072 pixel PNG file. I don't want to serve that on my own blog so here's a Google Drive link for the original . Then I ran his follow-up prompt: I'll note that it did put the plate-sized cookie on a regular plate. Here's the 24.9MB PNG . The new model isn't cheap. Here's the API pricing : it's 24 cents for a 4K image and 13.4 cents for a 1K or 2K image. Image inputs are 0.11 cents (just over 1/10th of a cent) each - an earlier version of their pricing page incorrectly said 6.7 cents each but that's now been fixed. Unlike most of Google's other models it also isn't available for free via AI Studio: you have to configure an API key with billing in order to use the model there. So this thing is great at following instructions. How about rendering text? I tried this prompt, this time using the Gemini consumer app in "thinking" mode (which now uses Nano Banana Pro for image generation). Here's a share link - my prompt was: Infographic explaining how the Datasette open source project works This is a great opportunity to test its ability to run searches (aka "Grounding with Google Search"). Here's what it created based on that 9 word prompt: That's a pretty extraordinary result for such a simple prompt. The text is all spelled correctly and rendered without glitches. The content is solid too - it even included logos for the most popular publish platforms, and a tiny thumbnail of the Datasette UI which is close-enough for an infographic. "Data Ingestion (Read-Only)" is a bit off. Everything else looks great to me. Here's the thinking trace for that image generation process: Initiating Datasette Infographic I'm starting a detailed infographic about the Datasette project. I plan to use Google Search to gather specific information about its core processes, which I will then visualize in a multi-panel diagram. Developing the Datasette workflow I am visualizing the Datasette workflow as described in the search results. I'm focusing on the conversion of data from different formats into a SQLite database, which is then presented as a web application complete with a user interface and JSON output. Visualizing Datasette Workflow I've sketched out the initial structure for the infographic, focusing on a flowchart style. The key components include data input from diverse sources, conversion to SQLite, the Datasette core application, its main features (web UI, JSON API, SQL querying), ecosystem plugins, and finally, deployment options. I aim for clear icons and labels to represent each step. Now I will focus on refining the flow and clarifying the relationships between the parts. Defining Core Datasette Stages I've sketched out the initial infographic structure. I have identified the key components, focusing on visual representation of the data flow and the key benefits and features of the system. I have made updates to the flowchart with clear icons and labels. Now I'm working to expand the data input section and refine the visual representation of each step. Constructing Datasette Infographic I am translating the current Datasette workflow outline into visual elements. The input section will show icons for different data formats. I've sketched the data conversion stage with clear illustrations of the necessary tools. I will add a depiction of the Datasette core application, showcasing its interfaces and functions. The plugin ecosystem and deployment options will follow. Now, I am refining the visual representation. Analyzing Datasette's Architecture I'm currently working on representing the Datasette open-source project's architecture visually. I've broken it down into key components: raw data processing, SQLite conversion, core application functionalities, plugin integration, and deployment methods. I'm focusing on creating a clear, linear representation to show the workflow. Mapping the Datasette Flow I've crafted a comprehensive graphic to illustrate Datasette's operation. It spans the data ingestion, SQLite conversion, main application features, plugin interactions, and deployment stages, aiming for clarity and a logical flow. This note from the announcement post caught my eye: We believe it's critical to know when an image is AI-generated. This is why all media generated by Google's tools are embedded with our imperceptible SynthID digital watermark. Today, we are putting a powerful verification tool directly in consumers' hands: you can now upload an image into the Gemini app and simply ask if it was generated by Google AI, thanks to SynthID technology. We are starting with images, but will expand to audio and video soon. Last night I used Nano Banana Pro to generate a fake photograph of raccoons stealing our food delivery, then scrubbed out the little diamond icon using the Apple Photos "cleanup" tool. I uploaded that Gemini app and asked "Was this image created with AI?": It replied: Yes, it appears that all or part of this image was created with Google Al. SynthID detected a watermark in 25-50% of the image. Presumably that 25-50% figure is because the rest of the photo was taken by me - it was just the raccoons that were added by Nano Banana Pro. You are only seeing the long-form articles from my blog. Subscribe to /atom/everything/ to get all of my posts, or take a look at my other subscription options . High-resolution output : Built-in generation capabilities for 1K, 2K, and 4K visuals. Advanced text rendering : Capable of generating legible, stylized text for infographics, menus, diagrams, and marketing assets. Grounding with Google Search : The model can use Google Search as a tool to verify facts and generate imagery based on real-time data (e.g., current weather maps, stock charts, recent events). Thinking mode : The model utilizes a "thinking" process to reason through complex prompts. It generates interim "thought images" (visible in the backend but not charged) to refine the composition before producing the final high-quality output. Up to 14 reference images : You can now mix up to 14 reference images to produce the final image. Up to 6 images of objects with high-fidelity to include in the final image Up to 5 images of humans to maintain character consistency

0 views
Robin Moffatt 1 months ago

Stumbling into AI: Part 6—I've been thinking about Agents and MCP all wrong

Ever tried to hammer a nail in with a potato? Nor me, but that’s what I’ve felt like I’ve been attempting to do when trying to really understand agents, as well as to come up with an example agent to build. As I wrote about previously , citing Simon Willison, an LLM agent runs tools in a loop to achieve a goal . Unlike building ETL/ELT pipelines, these were some new concepts that I was struggling to fit to an even semi-plausible real world example. That’s because I was thinking about it all wrong. For the last cough 20 cough years I’ve built data processing pipelines, either for real or as examples based on my previous experience. It’s the same pattern, always: Data comes in Data gets processed Data goes out Maybe we fiddle around with the order of things (ELT vs ETL), maybe a particular example focusses more on one particular point in the pipeline—but all the concepts remain pleasingly familiar. All I need to do is figure out what goes in the boxes: I’ve even extended this to be able to wing my way through talking about applications and microservices (kind of). We get some input, we make something else happen. Somewhat stretching beyond my experience, admittedly, but it’s still the same principles. When this thing happens, make a computer do that thing. Perhaps I’m too literal, perhaps I’m cynical after too many years of vendor hype, or perhaps it’s just how my brain is wired—but I like concrete, tangible, real examples of something. So when it comes to agents, particularly with where we’re at in the current hype-cycle, I really wanted to have some actual examples on which to build my understanding. In addition, I wanted to build some of my own. But where to start? Here was my mental model; literally what I sketched out on a piece of paper as I tried to think about what real-world example could go in each box to make something plausible: But this is where I got stuck, and spun my proverbial wheels on for several days. Every example I could think of ended up with me uttering, exasperated… but why would you do it like that . My first mistake was focussing on the LLM bit as needing to do something to the input data . I had a whole bunch of interesting data sources (like river levels , for example) but my head blocked on " but that’s numbers, what can you get an LLM to do with those?! ". The LLM bit of an agent, I mistakenly thought, demanded unstructured input data for it to make any sense. After all, if it’s structured, why aren’t we just processing it with a regular process—no need for magic fairy dust here. This may also have been an over-fitting of an assumption based on my previous work with an LLM to summarise human-input data in a conference keynote . The tool bit baffled me just as much. With hindsight, the exact problem turned out to be the solution . Let me explain… Whilst there are other options, in many cases an agent calling a tool is going to do so using MCP. Thus, grabbing the dog firmly by the tail and proceeding to wag it, I went looking for MCP servers. Looking down a list of hosted MCP servers that I found, I saw that there was only about a half-dozen that were open, including GlobalPing , AlphaVantage , and CoinGecko . Flummoxed, I cast around for an actual use of one of these, with an unstructured data source. Oh jeez…are we really going to do the ' read a stream of tweets and look up the stock price/crypto-token ' thing again? The mistake I made was this: I’d focussed on the LLM bit of the agent definition: an LLM agent runs tools in a loop to achieve a goal Actually, what an agent is about is this: […] runs tools The LLM bit can do fancy LLM stuff—but it’s also there to just invoke the tool(s) and decide when they’ve done what they need to do . A tool is quite often just a wrapper on an API. So what we’re saying is, with MCP, we have a common interface to APIs. That’s…all. We can define agents to interact with systems, and the way they interact is through a common protocol: MCP. When we load a web page, we don’t concern ourselves with what Chrome is doing, and unless we stop and think about it we don’t think about the TCP and HTTP protocols being used. It’s just the common way of things talking to each other. And that’s the idea with MCP, and thus tool calling from agents. (Yes, there are other ways you can call tools from agents, but MCP is the big one, at the moment). Given this reframing, it makes sense why there are so few open MCP servers. If an MCP server is there to offer access to an API, who leaves their API open for anyone to use? Well, read-only data provided like CoinGecko and AlphaVantage, perhaps. In general though, the really useful thing we can do with tools is change the state of systems . That’s why any SaaS platform worth its salt is rushing to provide an MCP server. Not to jump on the AI bandwagon per se, but because if this is going to be the common protocol by which things get to be automated with agents, you don’t want to be there offering Betamax when everyone else has VHS. SaaS platforms will still provide their APIs for direct integration, but they will also provide MCP servers. There’s also no reason why applications developed within an organisation wouldn’t offer MCP either, in theory. No, not really. It actually makes a bunch of sense to me. I personally also like it a lot from a SQL-first, not-really-a-real-coder point of view. Let me explain. If you want to build a system to respond to something that’s happened by interacting with another external system, you have two choices now: Write custom code to call the external system’s API. Handle failures, retries, monitoring, etc. If you want to interact with a different system, you now need to understand the different API, work out calling it, write new code to do so. Write an agent that responds to the thing that happened, and have it call the tool. The agent framework now standardises handling failures, retries, and all the rest of it. If you want to call a different system, the agent stays pretty much the same. The only thing that you change is the MCP server and tool that you call. You could write custom code—and there are good examples of where you’ll continue to. But you no longer have to . For Kafka folk, my analogy here would be data integration with Kafka Connect. Kafka Connect provides the framework that handles all of the sticky and messy things about data integration (scale, error handling, types, connectivity, restarts, monitoring, schemas, etc etc etc). You just use the appropriate connector with it and configure it. Different system? Just swap out the connector. You want to re-invent the wheel and re-solve a solved-problem? Go ahead; maybe you’re special. Or maybe NIH is real ;P So…what does an actual agent look like now, given this different way of looking at it? How about this: Sure, the LLM could do a bunch of clever stuff with the input. But it can also just take our natural language expression of what we want to happen, and make it so. Agents can use multiple tools, from multiple MCP servers. Confluent launched Streaming Agents earlier this year. They’re part of the fully-managed Confluent Cloud platform and provide a way to run agents like I’ve described above, driven by events in a Kafka topic. Here’s what the above agent would look like as a Streaming Agent: Is this over-engineered? Do you even need an agent? Why not just do this? You can. Maybe you should. But…don’t forget failure conditions. And restarts. And testing. And scaling. All these things are taken care of for you by Flink. Although having the runtime considerations taken care of for you is nice, let’s not forget another failure vector which LLMs add into the mix: talking shite hallucinations. Compared to a lump of Python code which either works or doesn’t, LLMs keep us on our toes by sometimes confidently doing the wrong thing. However, how do we know it’s wrong? Our Python program might crash, or throw a nicely-handled error, but left to its own devices an AI Agent will happily report that everything worked even if it actually made up a parameter for a tool call that doesn’t exist. There are mitigating steps we can take, but it’s important to recognise the trade-offs between the approaches. Permit me to indulge this line of steel-manning, because I think I might even have a valid argument here. Let’s say we’ve built the above simplistic agent that sends a Slack when a data point is received. Now we want to enhance it to also include information about the weather forecast. An agent would conceptually be something like this: Our streaming agent above changes to just amending the prompt and adding a new tool (just DDL statements, defining the MCP server and its tools): Whilst the bespoke application might have a seemingly-innocuous small addition: But consider what this looks like in practice. Figuring out the API, new lines of code to handle calling it, failures, and so on. Oh, whilst you’re at it; don’t introduce any bugs into the bespoke code. And remember to document the change. Not insurmountable, and probably a good challenge if you like that kind of thing. But is it as straightforward as literally changing the prompt in an agent to use an additional tool, and let it figure the rest out (courtesy of MCP)? Let’s not gloss over the reality too much here though; whilst adding a new tool call into the agent is definitely easier and less prone to introducing code errors, LLMs are by their nature non-deterministic—meaning that we still need to take care with the prompt and the tool invocation to make sure that the agent is still doing what it’s designed to do. You wouldn’t be wrong to argue that at least the non-Agent route (of coding API invocations directly into your application) can actually be tested and proved to work. There are different types of AI Agent—the one I’ve described is a tools-based one. As I mentioned above, its job is to run tools . The LLM provides the natural language interface with which to invoke the tools. It can also , optionally , do additional bits of magic: Process [unstructured] input, such as summarising or extracting key values from it Decide which tool(s) need calling in order to achieve its aim But at the heart of it, it’s about the tool that gets called. That’s where I was going wrong with this. That’s the bit I needed to think differently about :) Data comes in Data gets processed Data goes out Write custom code to call the external system’s API. Handle failures, retries, monitoring, etc. If you want to interact with a different system, you now need to understand the different API, work out calling it, write new code to do so. Write an agent that responds to the thing that happened, and have it call the tool. The agent framework now standardises handling failures, retries, and all the rest of it. If you want to call a different system, the agent stays pretty much the same. The only thing that you change is the MCP server and tool that you call. Process [unstructured] input, such as summarising or extracting key values from it Decide which tool(s) need calling in order to achieve its aim

0 views
Jim Nielsen 1 months ago

Data Storage As Files on Disk Paired With an LLM

I recently added a bunch of app icons from macOS Tahoe to my collection . Afterwards, I realized some of them were missing relational metadata. For example, I have a collection of iMove icons through the years which are related in my collection by their App Store ID. However, the latest iMovie icon I added didn’t have this ID. This got me thinking, "Crap, I really want this metadata so I can see apps over time . Am I gonna have to go back through each icon I just posted and find their associated App Store ID?” Then I thought: “Hey, I bet AI could figure this out — right? It should be able to read through my collection of icons (which are stored as JSON files on disk), look for icons with the same name and developer, and see where I'm missing and .” So I formulated a prompt (in hindsight, a really poor one lol): look through all the files in and find any that start with and then find me any icons like iMovie that have a correlation to other icons in where it's missing and But AI did pretty good with that. I’ll save you the entire output, but Cursor thought for a bit, then asked to run this command: I was like, “Ok. I couldn’t write that myself, but that looks about right. Go ahead.” It ran the command, thought some more, then asked to run another command. Then another. It seemed unsatisfied with the results, so it changed course and wrote a node script and asked permission to run that. I looked at it and said, “Hey that’s probably how I would’ve approached this.” So I gave permission. It ran the script, thought a little, then rewrote it and asked permission to run again. Here’s the final version it ran: And with that, boom! It found a few newly-added icons with corollaries in my archive, pointed them out, then asked if I wanted to add the missing metadata. The beautiful part was I said “go ahead” and when it finished, I could see and review the staged changes in git. This let me double check the LLM’s findings with my existing collection to verify everything looked right — just to make sure there were no hallucinations. Turns out, storing all my icon data as JSON files on disk (rather than a database) wasn’t such a bad idea. Part of the reason I’ve never switched from static JSON files on disk to a database is because I always figured it would be easier for future me to find and work with files on disk (as opposed to learning how to setup, maintain, and query a database). Turns out that wasn’t such a bad bet. I’m sure AI could’ve helped me write some SQL queries to do all the stuff I did here. But what I did instead already fit within a workflow I understand: files on disk, modified with scripting, reviewed with git, checked in, and pushed to prod. So hey, storing data as JSON files in git doesn’t look like such a bad idea now, does it future Jim? Reply via: Email · Mastodon · Bluesky

0 views
devansh 2 months ago

Hitchhiker's Guide to Attack Surface Management

I first heard about the word "ASM" (i.e., Attack Surface Management) probably in late 2018, and I thought it must be some complex infrastructure for tracking assets of an organization. Looking back, I realize I almost had a similar stack for discovering, tracking, and detecting obscure assets of organizations, and I was using it for my bug hunting adventures. I feel my stack was kinda goated, as I was able to find obscure assets of Apple, Facebook, Shopify, Twitter, and many other Fortune 100 companies, and reported hundreds of bugs, all through automation. Back in the day, projects like ProjectDiscovery were not present, so if I had to write an effective port scanner, I had to do it from scratch. (Masscan and nmap were present, but I had my fair share of issues using them, this is a story for another time). I used to write DNS resolvers (massdns had a high error rate), port scanners, web scrapers, directory brute-force utilities, wordlists, lots of JavaScript parsing logic using regex, and a hell of a lot of other things. I used to have up to 50+ self-developed tools for bug-bounty recon stuff and another 60-something helper scripts written in bash. I used to orchestrate (gluing together with duct tape is a better word) and slap together scripts like a workflow, and save the output in text files. Whenever I dealt with a large number of domains, I used to distribute the load over multiple servers (server spin-up + SSH into it + SCP for pushing and pulling files from it). The setup was very fragile and error-prone, and I spent countless nights trying to debug errors in the workflows. But it was all worth it. I learned the art of Attack Surface Management without even trying to learn about it. I was just a teenager trying to make quick bucks through bug hunting, and this fragile, duct-taped system was my edge. Fast forward to today, I have now spent almost a decade in the bug bounty scene. I joined HackerOne in 2020 (to present) as a vulnerability triager, where I have triaged and reviewed tens of thousands of vulnerability submissions. Fair to say, I have seen a lot of things, from doomsday level 0-days, to reports related to leaked credentials which could have led to entire infrastructure compromise, just because some dev pushed an AWS secret key in git logs, to things where some organizations were not even aware they were running Jenkins servers on some obscure subdomain which could have allowed RCE and then lateral movement to other layers of infrastructure. A lot of these issues I have seen were totally avoidable, only if organizations followed some basic attack surface management techniques. If I search "Guide to ASM" on Internet, almost none of the supposed guides are real resources. They funnel you to their own ASM solution, and the guide is just present there to provide you with some surface-level information, and is mostly a marketing gimmick. This is precisely why I decided to write something where I try to cover everything I learned and know about ASM, and how to protect your organization's assets before bad actors could get to them. This is going to be a rough and raw guide, and will not lead you to a funnel where I am trying to sell my own ASM SaaS to you. I have nothing to sell, other than offering what I know. But in case you are an organization who needs help implementing the things I am mentioning below, you can reach out to me via X or email (both available on the homepage of this blog). This guide will provide you with insights into exactly how big your attack surface really is. CISOs can look at it and see if their organizations have all of these covered, security researchers and bug hunters can look at this and maybe find new ideas related to where to look during recon. Devs can look at it and see if they are unintentionally leaving any door open for hackers. If you are into security, it has something to offer you. Attack surface is one of those terms getting thrown around in security circles so much that it's become almost meaningless noise. In theory, it sounds simple enough, right. Attack surface is every single potential entry point, interaction vector, or exploitable interface an attacker could use to compromise your systems, steal your data, or generally wreck your day. But here's the thing, it's the sum total of everything you've exposed to the internet. Every API endpoint you forgot about, every subdomain some dev spun up for "testing purposes" five years ago and then abandoned, every IoT device plugged into your network, every employee laptop connecting from a coffee shop, every third-party vendor with a backdoor into your environment, every cloud storage bucket with permissions that make no sense, every Slack channel, every git commit leaking credentials, every paste on Pastebin containing your database passwords. Most organizations think about attack surface in incredibly narrow terms. They think if they have a website, an email server, and maybe some VPN endpoints, they've got "good visibility" into their assets. That's just plain wrong. Straight up wrong. Your actual attack surface would terrify you if you actually understood it. You run , and is your main domain. You probably know about , , maybe . But what about that your intern from 2015 spun up and just never bothered to delete. It's not documented anywhere. Nobody remembers it exists. Domain attack surface goes way beyond what's sitting in your asset management system. Every subdomain is a potential entry point. Most of these subdomains are completely forgotten. Subdomain enumeration is reconnaissance 101 for attackers and bug hunters. It's not rocket science. Setting up a tool that actively monitors through active and passive sources for new subdomains and generates alerts is honestly an hour's worth of work. You can use tools like Subfinder, Amass, or just mine Certificate Transparency logs to discover every single subdomain connected to your domain. Certificate Transparency logs were designed to increase security by making certificate issuance public, and they've become an absolute reconnaissance goldmine. Every time you get an SSL certificate for , that information is sitting in public logs for anyone to find. Attackers systematically enumerate these subdomains using Certificate Transparency log searches, DNS brute-forcing with massive wordlists, reverse DNS lookups to map IP ranges back to domains, historical DNS data from services like SecurityTrails, and zone transfer exploitation if your DNS is misconfigured. Attackers are looking for old development environments still running vulnerable software, staging servers with production data sitting on them, forgotten admin panels, API endpoints without authentication, internal tools accidentally exposed, and test environments with default credentials nobody changed. Every subdomain is an asset. Every asset is a potential vulnerability. Every vulnerability is an entry point. Domains and subdomains are just the starting point though. Once you've figured out all the subdomains belonging to your organization, the next step is to take a hard look at IP address space, which is another absolutely massive component of your attack surface. Organizations own, sometimes lease, IP ranges, sometimes small /24 blocks, sometimes massive /16 ranges, and every single IP address in those blocks and ranges that responds to external traffic is part of your attack surface. And attackers enumerate them all if you won't. They use WHOIS lookups to identify your IP ranges, port scanning to find what services are running where, service fingerprinting to identify exact software versions, and banner grabbing to extract configuration information. If you have a /24 network with 256 IP addresses and even 10% of those IPs are running services, you've got 25 potential attack vectors. Scale that to a /20 or /16 and you're looking at thousands of potential entry points. And attackers aren't just looking at the IPs you know about. They're looking at adjacent IP ranges you might have acquired through mergers, historical IP allocations that haven't been properly decommissioned, and shared IP ranges where your servers coexist with others. Traditional infrastructure was complicated enough, and now we have cloud. It's literally exploded organizations' attack surfaces in ways that are genuinely difficult to even comprehend. Every cloud service you spin up, be it an EC2 instance, S3 bucket, Lambda function, or API Gateway endpoint, all of this is a new attack vector. In my opinion and experience so far, I think the main issue with cloud infrastructure is that it's ephemeral and distributed. Resources get spun up and torn down constantly. Developers create instances for testing and forget about them. Auto-scaling groups generate new resources dynamically. Containerized workloads spin up massive Kubernetes clusters you have minimal visibility into. Your cloud attack surface could be literally anything. Examples are countless, but I'd categorize them into 8 different categories. Compute instances like EC2, Azure VMs, GCP Compute Engine instances exposed to the internet. Storage buckets like S3, Azure Blob Storage, GCP Cloud Storage with misconfigured permissions. Serverless stuff like Lambda functions with public URLs or overly permissive IAM roles. API endpoints like API Gateway, Azure API Management endpoints without proper authentication. Container registries like Docker images with embedded secrets or vulnerabilities. Kubernetes clusters with exposed API servers, misconfigured network policies, vulnerable ingress controllers. Managed databases like RDS, CosmosDB, Cloud SQL instances with weak access controls. IAM roles and service accounts with overly permissive identities that enable privilege escalation. I've seen instances in the past where a single misconfigured S3 bucket policy exposed terabytes of data. An overly permissive Lambda IAM role enabled lateral movement across an entire AWS account. A publicly accessible Kubernetes API server gave an attacker full cluster control. Honestly, cloud kinda scares me as well. And to top it off, multi-cloud infrastructure makes everything worse. If you're running AWS, Azure, and GCP together, you've just tripled your attack surface management complexity. Each cloud provider has different security models, different configuration profiles, and different attack vectors. Every application now uses APIs, and all applications nowadays are like a constellation of APIs talking to each other. Every API you use in your organization is your attack surface. The problem with APIs is that they're often deployed without the same security scrutiny as traditional web applications. Developers spin up API endpoints for specific features and those endpoints accumulate over time. Some of them are shadow APIs, meaning API endpoints which aren't documented anywhere. These endpoints are the equivalent of forgotten subdomains, and attackers can find them through analyzing JavaScript files for API endpoint references, fuzzing common API path patterns, examining mobile app traffic to discover backend APIs, and mining old documentation or code repositories for deprecated endpoints. Your API attack surface includes REST APIs exposed to the internet, GraphQL endpoints with overly broad query capabilities, WebSocket connections for real-time functionality, gRPC services for inter-service communication, and legacy SOAP APIs that never got decommissioned. If your organization has mobile apps, be it iOS, Android, or both, this is a direct window to your infrastructure and should be part of your attack surface management strategy. Mobile apps communicate with backend APIs and those API endpoints are discoverable by reversing the app. The reversed source of the app could reveal hard-coded API keys, tokens, and credentials. Using JADX plus APKTool plus Dex2jar is all a motivated attacker needs. Web servers often expose directories and files that weren't meant to be publicly accessible. Attackers systematically enumerate these using automated tools like ffuf, dirbuster, gobuster, and wfuzz with massive wordlists to discover hidden endpoints, configuration files, backup files, and administrative interfaces. Common exposed directories include admin panels, backup directories containing database dumps or source code, configuration files with database credentials and API keys, development directories with debug information, documentation directories revealing internal systems, upload directories for file storage, and old or forgotten directories from previous deployments. Your attack surface must include directories which are accidentally left accessible during deployments, staging servers with production data, backup directories with old source code versions, administrative interfaces without authentication, API documentation exposing endpoint details, and test directories with debug output enabled. Even if you've removed a directory from production, old cached versions may still be accessible through web caches or CDNs. Search engines also index these directories, making them discoverable through dorking techniques. If your organization is using IoT devices, and everyone uses these days, this should be part of your attack surface management strategy. They're invisible to traditional security tools. Your EDR solution doesn't protect IoT devices. Your vulnerability scanner can't inventory them. Your patch management system can't update them. Your IoT attack surface could include smart building systems like HVAC, lighting, access control. Security cameras and surveillance systems. Printers and copiers, which are computers with network access. Badge readers and physical access systems. Industrial control systems and SCADA devices. Medical devices in healthcare environments. Employee wearables and fitness trackers. Voice assistants and smart speakers. The problem with IoT devices is that they're often deployed without any security consideration. They have default credentials that never get changed, unpatched firmware with known vulnerabilities, no encryption for data in transit, weak authentication mechanisms, and insecure network configurations. Social media presence is an attack surface component that most organizations completely ignore. Attackers can use social media for reconnaissance by looking at employee profiles on LinkedIn to reveal organizational structure, technologies in use, and current projects. Twitter/X accounts can leak information about deployments, outages, and technology stack. Employee GitHub profiles expose email patterns and development practices. Company blogs can announce new features before security review. It could also be a direct attack vector. Attackers can use information from social media to craft convincing phishing attacks. Hijacked social media accounts can be used to spread malware or phishing links. Employees can accidentally share sensitive information. Fake accounts can impersonate your brand to defraud customers. Your employees' social media presence is part of your attack surface whether you like it or not. Third-party vendors, suppliers, contractors, or partners with access to your systems should be part of your attack surface. Supply chain attacks are becoming more and more common these days. Attackers can compromise a vendor with weaker security and then use that vendor's access to reach your environment. From there, they pivot from the vendor network to your systems. This isn't a hypothetical scenario, it has happened multiple times in the past. You might have heard about the SolarWinds attack, where attackers compromised SolarWinds' build system and distributed malware through software updates to thousands of customers. Another famous case study is the MOVEit vulnerability in MOVEit Transfer software, exploited by the Cl0p ransomware group, which affected over 2,700 organizations. These are examples of some high-profile supply chain security attacks. Your third-party attack surface could include things like VPNs, remote desktop connections, privileged access systems, third-party services with API keys to your systems, login credentials shared with vendors, SaaS applications storing your data, and external IT support with administrative access. It's obvious you can't directly control third-party security. You can audit them, have them pen-test their assets as part of your vendor compliance plan, and include security requirements in contracts, but ultimately their security posture is outside your control. And attackers know this. GitHub, GitLab, Bitbucket, they all are a massive attack surface. Attackers search through code repositories in hopes of finding hard-coded credentials like API keys, database passwords, and tokens. Private keys, SSH keys, TLS certificates, and encryption keys. Internal architecture documentation revealing infrastructure details in code comments. Configuration files with database connection strings and internal URLs. Deprecated code with vulnerabilities that's still in production. Even private repositories aren't safe. Attackers can compromise developer accounts to access private repositories, former employees retain access after leaving, and overly broad repository permissions grant access to too many people. Automated scanners continuously monitor public repositories for secrets. The moment a developer accidentally pushes credentials to a public repository, automated systems detect it within minutes. Attackers have already extracted and weaponized those credentials before the developer realizes the mistake. CI/CD pipelines are massive another attack vector. Especially in recent times, and not many organizations are giving attention to this attack vector. This should totally be part of your attack surface management. Attackers compromise GitHub Actions workflows with malicious code injection, Jenkins servers with weak authentication, GitLab CI/CD variables containing secrets, and build artifacts with embedded malware. The GitHub Actions supply chain attack, CVE-2025-30066, demonstrated this perfectly. Attackers compromised the Action used in over 23,000 repositories, injecting malicious code that leaked secrets from build logs. Jenkins specifically is a goldmine for attackers. An exposed Jenkins instance provides complete control over multiple critical servers, access to hardcoded AWS keys, Redis credentials, and BitBucket tokens, ability to manipulate builds and inject malicious code, and exfiltration of production database credentials containing PII. Modern collaboration tools are massive attack surface components that most organizations underestimate. Slack has hidden security risks despite being invite-only. Slack attack surface could include indefinite data retention where every message, channel, and file is stored forever unless admins configure retention periods. Public channels accessible to all users so one breached account opens the floodgates. Third-party integrations with excessive permissions accessing messages and user data. Former contractor access where individuals retain access long after projects end. Phishing and impersonation where it's easy to change names and pictures to impersonate senior personnel. In 2022, Slack leaked hashed passwords for five years affecting 0.5% of users. Slack channels commonly contain API keys, authentication tokens, database credentials, customer PII, financial data, internal system passwords, and confidential project information. The average cost of a breached record was $164 in 2022. When 1 in 166 messages in Slack contains confidential information, every new message adds another dollar to total risk exposure. With 5,000 employees sending 30 million Slack messages per year, that's substantial exposure. Trello board exposure is a significant attack surface. Trello attack vectors include public boards with sensitive information accidentally shared publicly, default public visibility where boards are created as public by default in some configurations, unsecured REST API allowing unauthenticated access to user data, and scraping attacks where attackers use email lists to enumerate Trello accounts. The 2024 Trello data breach exposed 15 million users' personal information when a threat actor named "emo" exploited an unsecured REST API using 500 million email addresses to compile detailed user profiles. Security researcher David Shear documented hundreds of public Trello boards exposing passwords, credentials, IT support customer access details, website admin logins, and client server management credentials. IT companies were using Trello to troubleshoot client requests and manage infrastructure, storing all credentials on public Trello boards. Jira misconfiguration is a widespread attack surface issue. Common misconfigurations include public dashboards and filters with "Everyone" access actually meaning public internet access, anonymous access enabled allowing unauthenticated users to browse, user picker functionality providing complete lists of usernames and email addresses, and project visibility allowing sensitive projects to be accessible without authentication. Confluence misconfiguration exposes internal documentation. Confluence attack surface components include anonymous access at site level allowing public access, public spaces where space admins grant anonymous permissions, inherited permissions where all content within a space inherits space-level access, and user profile visibility allowing anonymous users to view profiles of logged-in users. When anonymous access is enabled globally and space admins allow anonymous users to access their spaces, anyone on the internet can access that content. Confluence spaces often contain internal documentation with hardcoded credentials, financial information, project details, employee information, and API documentation with authentication details. Cloud storage misconfiguration is epidemic. Google Drive misconfiguration attack surface includes "Anyone with the link" sharing making files accessible without authentication, overly permissive sharing defaults making it easy to accidentally share publicly, inherited folder permissions exposing everything beneath, unmanaged third-party apps with excessive read/write/delete permissions, inactive user accounts where former employees retain access, and external ownership blind spots where externally-owned content is shared into the environment. Metomic's 2023 Google Scanner Report found that of 6.5 million Google Drive files analyzed, 40.2% contained sensitive information, 34.2% were shared externally, and 0.5% were publicly accessible, mostly unintentionally. In December 2023, Japanese game developer Ateam suffered a catastrophic Google Drive misconfiguration that exposed personal data of nearly 1 million people for over six years due to "Anyone with the link" settings. Based on Valence research, 22% of external data shares utilize open links, and 94% of these open link shares are inactive, forgotten files with public URLs floating around the internet. Dropbox, OneDrive, and Box share similar attack surface components including misconfigured sharing permissions, weak or missing password protection, overly broad access grants, third-party app integrations with excessive permissions, and lack of visibility into external sharing. Features that make file sharing convenient create data leakage risks when misconfigured. Pastebin and similar paste sites are both reconnaissance sources and attack vectors. Paste site attack surface includes public data dumps of stolen credentials, API keys, and database dumps posted publicly, malware hosting of obfuscated payloads, C2 communications where malware uses Pastebin for command and control, credential leakage from developers accidentally posting secrets, and bypassing security filters since Pastebin is legitimate so security tools don't block it. For organizations, leaked API keys or database credentials on Pastebin lead to unauthorized access, data exfiltration, and service disruption. Attackers continuously scan Pastebin for mentions of target organizations using automated tools. Security teams must actively monitor Pastebin and similar paste sites for company name mentions, email domain references, and specific keywords related to the organization. Because paste sites don't require registration or authentication and content is rarely removed, they've become permanent archives of leaked secrets. Container registries expose significant attack surface. Container registry attack surface includes secrets embedded in image layers where 30,000 unique secrets were found in 19,000 images, with 10% of scanned Docker images containing secrets, and 1,200 secrets, 4%, being active and valid. Immutable cached layers contain 85% of embedded secrets that can't be removed, exposed registries with 117 Docker registries accessible without authentication, unsecured registries allowing pull, push, and delete operations, and source code exposure where full application code is accessible by pulling images. GitGuardian's analysis of 200,000 publicly available Docker images revealed a staggering secret exposure problem. Even more alarming, 99% of images containing active secrets were pulled in 2024, demonstrating real-world exploitation. Unit 42's research identified 941 Docker registries exposed to the internet, with 117 accessible without authentication containing 2,956 repositories, 15,887 tags, and full source code and historical versions. Out of 117 unsecured registries, 80 allow pull operations to download images, 92 allow push operations to upload malicious images, and 7 allow delete operations for ransomware potential. Sysdig's analysis of over 250,000 Linux images on Docker Hub found 1,652 malicious images including cryptominers, most common, embedded secrets, second most prevalent, SSH keys and public keys for backdoor implants, API keys and authentication tokens, and database credentials. The secrets found in container images included AWS access keys, database passwords, SSH private keys, API tokens for cloud services, GitHub personal access tokens, and TLS certificates. Shadow IT includes unapproved SaaS applications like Dropbox, Google Drive, and personal cloud storage used for work. Personal devices like BYOD laptops, tablets, and smartphones accessing corporate data. Rogue cloud deployments where developers spin up AWS instances without approval. Unauthorized messaging apps like WhatsApp, Telegram, and Signal used for business communication. Unapproved IoT devices like smart speakers, wireless cameras, and fitness trackers on the corporate network. Gartner estimates that shadow IT makes up 30-40% of IT spending in large companies, and 76% of organizations surveyed experienced cyberattacks due to exploitation of unknown, unmanaged, or poorly managed assets. Shadow IT expands your attack surface because it's not protected by your security controls, it's not monitored by your security team, it's not included in your vulnerability scans, it's not patched by your IT department, and it often has weak or default credentials. And you can't secure what you don't know exists. Bring Your Own Device, BYOD, policies sound great for employee flexibility and cost savings. For security teams, they're a nightmare. BYOD expands your attack surface by introducing unmanaged endpoints like personal devices without EDR, antivirus, or encryption. Mixing personal and business use where work data is stored alongside personal apps with unknown security. Connecting from untrusted networks like public Wi-Fi and home networks with compromised routers. Installing unapproved applications with malware or excessive permissions. Lacking consistent security updates with devices running outdated operating systems. Common BYOD security issues include data leakage through personal cloud backup services, malware infections from personal app downloads, lost or stolen devices containing corporate data, family members using devices that access work systems, and lack of IT visibility and control. The 60% of small and mid-sized businesses that close within six months of a major cyberattack often have BYOD-related security gaps as contributing factors. Remote access infrastructure like VPNs and Remote Desktop Protocol, RDP, are among the most exploited attack vectors. SSL VPN appliances from vendors like Fortinet, SonicWall, Check Point, and Palo Alto are under constant attack. VPN attack vectors include authentication bypass vulnerabilities with CVEs allowing attackers to hijack active sessions, credential stuffing through brute-forcing VPN logins with leaked credentials, exploitation of unpatched vulnerabilities with critical CVEs in VPN appliances, and configuration weaknesses like default credentials, weak passwords, and lack of MFA. Real-world attacks demonstrate the risk. Check Point SSL VPN CVE-2024-24919 allowed authentication bypass for session hijacking. Fortinet SSL-VPN vulnerabilities were leveraged for lateral movement and privilege escalation. SonicWall CVE-2024-53704 allowed remote authentication bypass for SSL VPN. Once inside via VPN, attackers conduct network reconnaissance, lateral movement, and privilege escalation. RDP is worse. Sophos found that cybercriminals abused RDP in 90% of attacks they investigated. External remote services like RDP were the initial access vector in 65% of incident response cases. RDP attack vectors include exposed RDP ports with port 3389 open to the internet, weak authentication with simple passwords vulnerable to brute force, lack of MFA with no second factor for authentication, and credential reuse from compromised passwords in data breaches. In one Darktrace case, attackers compromised an organization four times in six months, each time through exposed RDP ports. The attack chain went successful RDP login, internal reconnaissance via WMI, lateral movement via PsExec, and objective achievement. The Palo Alto Unit 42 Incident Response report found RDP was the initial attack vector in 50% of ransomware deployment cases. Email infrastructure remains a primary attack vector. Your email attack surface includes mail servers like Exchange, Office 365, and Gmail with configuration weaknesses, email authentication with misconfigured SPF, DKIM, and DMARC records, phishing-susceptible users targeted through social engineering, email attachments and links as malware delivery mechanisms, and compromised accounts through credential stuffing or password reuse. Email authentication misconfiguration is particularly insidious. If your SPF, DKIM, and DMARC records are wrong or missing, attackers can spoof emails from your domain, your legitimate emails get marked as spam, and phishing emails impersonating your organization succeed. Email servers themselves are also targets. The NSA released guidance on Microsoft Exchange Server security specifically because Exchange servers are so frequently compromised. Container orchestration platforms like Kubernetes introduce massive attack surface complexity. The Kubernetes attack surface includes the Kubernetes API server with exposed or misconfigured API endpoints, container images with vulnerabilities in base images or application layers, container registries like Docker Hub, ECR, and GCR with weak access controls, pod security policies with overly permissive container configurations, network policies with insufficient micro-segmentation between pods, secrets management with hardcoded secrets or weak secret storage, and RBAC misconfigurations with overly broad service account permissions. Container security issues include containers running as root with excessive privileges, exposed Docker daemon sockets allowing container escape, vulnerable dependencies in container images, and lack of runtime security monitoring. The Docker daemon attack surface is particularly concerning. Running containers with privileged access or allowing docker.sock access can enable container escape and host compromise. Serverless computing like AWS Lambda, Azure Functions, and Google Cloud Functions promised to eliminate infrastructure management. Instead, it just created new attack surfaces. Serverless attack surface components include function code vulnerabilities like injection flaws and insecure dependencies, IAM misconfigurations with overly permissive Lambda execution roles, environment variables storing secrets as plain text, function URLs with publicly accessible endpoints without authentication, and event source mappings with untrusted input from various cloud services. The overabundance of event sources expands the attack surface. Lambda functions can be triggered by S3 events, API Gateway requests, DynamoDB streams, SNS topics, EventBridge schedules, IoT events, and dozens more. Each event source is a potential injection point. If function input validation is insufficient, attackers can manipulate event data to exploit the function. Real-world Lambda attacks include credential theft by exfiltrating IAM credentials from environment variables, lateral movement using over-permissioned roles to access other AWS resources, and data exfiltration by invoking functions to query and extract database contents. The Scarlet Eel adversary specifically targeted AWS Lambda for credential theft and lateral movement. Microservices architecture multiplies attack surface by decomposing monolithic applications into dozens or hundreds of independent services. Each microservice has its own attack surface including authentication mechanisms where each service needs to verify requests, authorization rules where each service enforces access controls, API endpoints for service-to-service communication channels, data stores where each service may have its own database, and network interfaces where each service exposes network ports. Microservices security challenges include east-west traffic vulnerabilities with service-to-service communication without encryption or authentication, authentication and authorization complexity from managing auth across 40 plus services multiplied by 3 environments equaling 240 configurations, service-to-service trust where services blindly trust internal traffic, network segmentation failures with flat networks allowing unrestricted pod-to-pod communication, and inconsistent security policies with different services having different security standards. One compromised microservice can enable lateral movement across the entire application. Without proper network segmentation and zero trust architecture, attackers pivot from service to service. How do you measure something this large, right. Attack surface measurement is complex. Attack surface metrics include the total number of assets with all discovered systems, applications, and devices, newly discovered assets found through continuous discovery, the number of exposed assets accessible from the internet, open ports and services with network services listening for connections, vulnerabilities by severity including critical, high, medium, and low CVEs, mean time to detect, MTTD, measuring how quickly new assets are discovered, mean time to remediate, MTTR, measuring how quickly vulnerabilities are fixed, shadow IT assets that are unknown or unmanaged, third-party exposure from vendor and partner access points, and attack surface change rate showing how rapidly the attack surface evolves. Academic research has produced formal attack surface measurement methods. Pratyusa Manadhata's foundational work defines attack surface as a three-tuple, System Attackability, Channel Attackability, Data Attackability. But in practice, most organizations struggle with basic attack surface visibility, let alone quantitative measurement. Your attack surface isn't static. It changes constantly. Changes happen because developers deploy new services and APIs, cloud auto-scaling spins up new instances, shadow IT appears as employees adopt unapproved tools, acquisitions bring new infrastructure into your environment, IoT devices get plugged into your network, and subdomains get created for new projects. Static, point-in-time assessments are obsolete. You need continuous asset discovery and monitoring. Continuous discovery methods include automated network scanning for regular scans to detect new devices, cloud API polling to query cloud provider APIs for resource changes, DNS monitoring to track new subdomains via Certificate Transparency logs, passive traffic analysis to observe network traffic and identify assets, integration with CMDB or ITSM to sync with configuration management databases, and cloud inventory automation using Infrastructure as Code to track deployments. Understanding your attack surface is step one. Reducing it is the goal. Attack surface reduction begins with asset elimination by removing unnecessary assets entirely. This includes decommissioning unused servers and applications, deleting abandoned subdomains and DNS records, shutting down forgotten development environments, disabling unused network services and ports, and removing unused user accounts and service identities. Access control hardening implements least privilege everywhere by enforcing multi-factor authentication, MFA, for all remote access, using role-based access control, RBAC, for cloud resources, implementing zero trust network architecture, restricting network access with micro-segmentation, and applying the principle of least privilege to IAM roles. Exposure minimization reduces what's visible to attackers by moving services behind VPNs or bastion hosts, using private IP ranges for internal services, implementing network address translation, NAT, for outbound access, restricting API endpoints to authorized sources only, and disabling unnecessary features and functionalities. Security hardening strengthens what remains by applying security patches promptly, using security configuration baselines, enabling encryption for data in transit and at rest, implementing Web Application Firewalls, WAF, for web apps, and deploying endpoint detection and response, EDR, on all devices. Monitoring and detection watch for attacks in progress by implementing real-time threat detection, enabling comprehensive logging and SIEM integration, deploying intrusion detection and prevention systems, IDS/IPS, monitoring for anomalous behavior patterns, and using threat intelligence feeds to identify known bad actors. Your attack surface is exponentially larger than you think it is. Every asset you know about probably has three you don't. Every known vulnerability probably has ten undiscovered ones. Every third-party integration probably grants more access than you realize. Every collaboration tool is leaking more data than you imagine. Every paste site contains more of your secrets than you want to admit. And attackers know this. They're not just looking at what you think you've secured. They're systematically enumerating every possible entry point. They're mining Certificate Transparency logs for forgotten subdomains. They're scanning every IP in your address space. They're reverse-engineering your mobile apps. They're buying employee credentials from data breach databases. They're compromising your vendors to reach you. They're scraping Pastebin for your leaked secrets. They're pulling your public Docker images and extracting the embedded credentials. They're accessing your misconfigured S3 buckets and exfiltrating terabytes of data. They're exploiting your exposed Jenkins instances to compromise your entire infrastructure. They're manipulating your AI agents to exfiltrate private Notion data. The asymmetry is brutal. You have to defend every single attack vector. They only need to find one that works. So what do you do. Start by accepting that you don't have complete visibility. Nobody does. But you can work toward better visibility through continuous discovery, automated asset management, and integration of security tools that help map your actual attack surface. Implement attack surface reduction aggressively. Every asset you eliminate is one less thing to defend. Every service you shut down is one less potential vulnerability. Every piece of shadow IT you discover and bring under management is one less blind spot. Every misconfigured cloud storage bucket you fix is terabytes of data no longer exposed. Every leaked secret you rotate is one less credential floating around the internet. Adopt zero trust architecture. Stop assuming that anything, internal services, microservices, authenticated users, collaboration tools, is inherently trustworthy. Verify everything. Monitor paste sites and code repositories. Your secrets are out there. Find them before attackers weaponize them. Secure your collaboration tools. Slack, Trello, Jira, Confluence, Notion, Google Drive, and Airtable are all leaking data. Lock them down. Fix your container security. Scan images for secrets. Use secret managers instead of environment variables. Secure your registries. Harden your CI/CD pipelines. Jenkins, GitHub Actions, and GitLab CI are high-value targets. Protect them. And test your assumptions with red team exercises and continuous security testing. Your attack surface is what an attacker can reach, not what you think you've secured. The attack surface problem isn't getting better. Cloud adoption, DevOps practices, remote work, IoT proliferation, supply chain complexity, collaboration tool sprawl, and container adoption are all expanding organizational attack surfaces faster than security teams can keep up. But understanding the problem is the first step toward managing it. And now you understand exactly how catastrophically large your attack surface actually is.

1 views
Robin Moffatt 2 months ago

Tech Radar (Nov 2025) - data blips

The latest  Thoughtworks TechRadar  is out. Here are some of the more data-related ‘blips’ (as they’re called on the radar) that I noticed. Each item links to the blip’s entry where you can read more information about Thoughtwork’s usage and opinions on it. Databricks Assistant Apache Paimon Delta Sharing Naive API-to-MCP conversion Standalone data engineering teams Text to SQL

0 views
Simon Willison 2 months ago

A new SQL-powered permissions system in Datasette 1.0a20

Datasette 1.0a20 is out with the biggest breaking API change on the road to 1.0, improving how Datasette's permissions system works by migrating permission logic to SQL running in SQLite. This release involved 163 commits , with 10,660 additions and 1,825 deletions, most of which was written with the help of Claude Code. Datasette's permissions system exists to answer the following question: Is this actor allowed to perform this action , optionally against this particular resource ? An actor is usually a user, but might also be an automation operating via the Datasette API. An action is a thing they need to do - things like view-table, execute-sql, insert-row. A resource is the subject of the action - the database you are executing SQL against, the table you want to insert a row into. Datasette's default configuration is public but read-only: anyone can view databases and tables or execute read-only SQL queries but no-one can modify data. Datasette plugins can enable all sorts of additional ways to interact with databases, many of which need to be protected by a form of authentication Datasette also 1.0 includes a write API with a need to configure who can insert, update, and delete rows or create new tables. Actors can be authenticated in a number of different ways provided by plugins using the actor_from_request() plugin hook. datasette-auth-passwords and datasette-auth-github and datasette-auth-existing-cookies are examples of authentication plugins. The previous implementation included a design flaw common to permissions systems of this nature: each permission check involved a function call which would delegate to one or more plugins and return a True/False result. This works well for single checks, but has a significant problem: what if you need to show the user a list of things they can access, for example the tables they can view? I want Datasette to be able to handle potentially thousands of tables - tables in SQLite are cheap! I don't want to have to run 1,000+ permission checks just to show the user a list of tables. Since Datasette is built on top of SQLite we already have a powerful mechanism to help solve this problem. SQLite is really good at filtering large numbers of records. The biggest change in the new release is that I've replaced the previous plugin hook - which let a plugin determine if an actor could perform an action against a resource - with a new permission_resources_sql(actor, action) plugin hook. Instead of returning a True/False result, this new hook returns a SQL query that returns rules helping determine the resources the current actor can execute the specified action against. Here's an example, lifted from the documentation: This hook grants the actor with ID "alice" permission to view the "sales" table in the "accounting" database. The object should always return four columns: a parent, child, allow (1 or 0), and a reason string for debugging. When you ask Datasette to list the resources an actor can access for a specific action, it will combine the SQL returned by all installed plugins into a single query that joins against the internal catalog tables and efficiently lists all the resources the actor can access. This query can then be limited or paginated to avoid loading too many results at once. Datasette has several additional requirements that make the permissions system more complicated. Datasette permissions can optionally act against a two-level hierarchy . You can grant a user the ability to insert-row against a specific table, or every table in a specific database, or every table in every database in that Datasette instance. Some actions can apply at the table level, others the database level and others only make sense globally - enabling a new feature that isn't tied to tables or databases, for example. Datasette currently has ten default actions but plugins that add additional features can register new actions to better participate in the permission systems. Datasette's permission system has a mechanism to veto permission checks - a plugin can return a deny for a specific permission check which will override any allows. This needs to be hierarchy-aware - a deny at the database level can be outvoted by an allow at the table level. Finally, Datasette includes a mechanism for applying additional restrictions to a request. This was introduced for Datasette's API - it allows a user to create an API token that can act on their behalf but is only allowed to perform a subset of their capabilities - just reading from two specific tables, for example. Restrictions are described in more detail in the documentation. That's a lot of different moving parts for the new implementation to cover. Since permissions are critical to the security of a Datasette deployment it's vital that they are as easy to understand and debug as possible. The new alpha adds several new debugging tools, including this page that shows the full list of resources matching a specific action for the current user: And this page listing the rules that apply to that question - since different plugins may return different rules which get combined together: This screenshot illustrates two of Datasette's built-in rules: there is a default allow for read-only operations such as view-table (which can be over-ridden by plugins) and another rule that says the root user can do anything (provided Datasette was started with the option.) Those rules are defined in the datasette/default_permissions.py Python module. There's one question that the new system cannot answer: provide a full list of actors who can perform this action against this resource. It's not possibly to provide this globally for Datasette because Datasette doesn't have a way to track what "actors" exist in the system. SSO plugins such as mean a new authenticated GitHub user might show up at any time, with the ability to perform actions despite the Datasette system never having encountered that particular username before. API tokens and actor restrictions come into play here as well. A user might create a signed API token that can perform a subset of actions on their behalf - the existence of that token can't be predicted by the permissions system. This is a notable omission, but it's also quite common in other systems. AWS cannot provide a list of all actors who have permission to access a specific S3 bucket, for example - presumably for similar reasons. Datasette's plugin ecosystem is the reason I'm paying so much attention to ensuring Datasette 1.0 has a stable API. I don't want plugin authors to need to chase breaking changes once that 1.0 release is out. The Datasette upgrade guide includes detailed notes on upgrades that are needed between the 0.x and 1.0 alpha releases. I've added an extensive section about the permissions changes to that document. I've also been experimenting with dumping those instructions directly into coding agent tools - Claude Code and Codex CLI - to have them upgrade existing plugins for me. This has been working extremely well . I've even had Claude Code update those notes itself with things it learned during an upgrade process! This is greatly helped by the fact that every single Datasette plugin has an automated test suite that demonstrates the core functionality works as expected. Coding agents can use those tests to verify that their changes have had the desired effect. I've also been leaning heavily on to help with the upgrade process. I wrote myself two new helper scripts - and - to help test the new plugins. The and implementations can be found in this TIL . Some of my plugin upgrades have become a one-liner to the command, which runs OpenAI Codex CLI with a prompt without entering interactive mode: There are still a bunch more to go - there's a list in this tracking issue - but I expect to have the plugins I maintain all upgraded pretty quickly now that I have a solid process in place. This change to Datasette core by far the most ambitious piece of work I've ever attempted using a coding agent. Last year I agreed with the prevailing opinion that LLM assistance was much more useful for greenfield coding tasks than working on existing codebases. The amount you could usefully get done was greatly limited by the need to fit the entire codebase into the model's context window. Coding agents have entirely changed that calculation. Claude Code and Codex CLI still have relatively limited token windows - albeit larger than last year - but their ability to search through the codebase, read extra files on demand and "reason" about the code they are working with has made them vastly more capable. I no longer see codebase size as a limiting factor for how useful they can be. I've also spent enough time with Claude Sonnet 4.5 to build a weird level of trust in it. I can usually predict exactly what changes it will make for a prompt. If I tell it "extract this code into a separate function" or "update every instance of this pattern" I know it's likely to get it right. For something like permission code I still review everything it does, often by watching it as it works since it displays diffs in the UI. I also pay extremely close attention to the tests it's writing. Datasette 1.0a19 already had 1,439 tests, many of which exercised the existing permission system. 1.0a20 increases that to 1,583 tests. I feel very good about that, especially since most of the existing tests continued to pass without modification. I built several different proof-of-concept implementations of SQL permissions before settling on the final design. My research/sqlite-permissions-poc project was the one that finally convinced me of a viable approach, That one started as a free ranging conversation with Claude , at the end of which I told it to generate a specification which I then fed into GPT-5 to implement. You can see that specification at the end of the README . I later fed the POC itself into Claude Code and had it implement the first version of the new Datasette system based on that previous experiment. This is admittedly a very weird way of working, but it helped me finally break through on a problem that I'd been struggling with for months. Now that the new alpha is out my focus is upgrading the existing plugin ecosystem to use it, and supporting other plugin authors who are doing the same. The new permissions system unlocks some key improvements to Datasette Cloud concerning finely-grained permissions for larger teams, so I'll be integrating the new alpha there this week. This is the single biggest backwards-incompatible change required before Datasette 1.0. I plan to apply the lessons I learned from this project to the other, less intimidating changes. I'm hoping this can result in a final 1.0 release before the end of the year! You are only seeing the long-form articles from my blog. Subscribe to /atom/everything/ to get all of my posts, or take a look at my other subscription options . Understanding the permissions system Permissions systems need to be able to efficiently list things The new permission_resources_sql() plugin hook Hierarchies, plugins, vetoes, and restrictions New debugging tools The missing feature: list actors who can act on this resource Upgrading plugins for Datasette 1.0a20 Using Claude Code to implement this change Starting with a proof-of-concept Miscellaneous tips I picked up along the way What's next? = "test against datasette dev" - it runs a plugin's existing test suite against the current development version of Datasette checked out on my machine. It passes extra options through to so I can run or as needed. = "run against datasette dev" - it runs the latest dev command with the plugin installed. When working on anything relating to plugins it's vital to have at least a few real plugins that you upgrade in lock-step with the core changes. The and shortcuts were invaluable for productively working on those plugins while I made changes to core. Coding agents make experiments much cheaper. I threw away so much code on the way to the final implementation, which was psychologically easier because the cost to create that code in the first place was so low. Tests, tests, tests. This project would have been impossible without that existing test suite. The additional tests we built along the way give me confidence that the new system is as robust as I need it to be. Claude writes good commit messages now! I finally gave in and let it write these - previously I've been determined to write them myself. It's a big time saver to be able to say "write a tasteful commit message for these changes". Claude is also great at breaking up changes into smaller commits. It can also productively rewrite history to make it easier to follow, especially useful if you're still working in a branch. A really great way to review Claude's changes is with the GitHub PR interface. You can attach comments to individual lines of code and then later prompt Claude like this: . This is a very quick way to apply little nitpick changes - rename this function, refactor this repeated code, add types here etc. The code I write with LLMs is higher quality code . I usually find myself making constant trade-offs while coding: this function would be neater if I extracted this helper, it would be nice to have inline documentation here, this changing this would be good but would break a dozen tests... for each of those I have to determine if the additional time is worth the benefit. Claude can apply changes so much faster than me that these calculations have changed - almost any improvement is worth applying, no matter how trivial, because the time cost is so low. Internal tools are cheap now. The new debugging interfaces were mostly written by Claude and are significantly nicer to use and look at than the hacky versions I would have knocked out myself, if I had even taken the extra time to build them. That trick with a Markdown file full of upgrade instructions works astonishingly well - it's the same basic idea as Claude Skills . I maintain over 100 Datasette plugins now and I expect I'll be automating all sorts of minor upgrades in the future using this technique.

0 views
devansh 2 months ago

AI pentest scoping playbook

Disclosure: Certain sections of this content were grammatically refined/updated using AI assistance, as English is not my first language. Organizations are throwing money at "AI red teams" who run a few prompt injection tests, declare victory, and cash checks. Security consultants are repackaging traditional pentest methodologies with "AI" slapped on top, hoping nobody notices they're missing 80% of the actual attack surface. And worst of all, the people building AI systems, the ones who should know better, are scoping engagements like they're testing a CRUD app from 2015. This guide/playbook exists because the current state of AI security testing is dangerously inadequate. The attack surface is massive. The risks are novel. The methodologies are immature. And the consequences of getting it wrong are catastrophic. These are my personal views, informed by professional experience but not representative of my employer. What follows is what I wish every CISO, security lead, and AI team lead understood before they scoped their next AI security engagement. Traditional web application pentests follow predictable patterns. You scope endpoints, define authentication boundaries, exclude production databases, and unleash testers to find SQL injection and XSS. The attack surface is finite, the vulnerabilities are catalogued, and the methodologies are mature. AI systems break all of that. First, the system output is non-deterministic . You can't write a test case that says "given input X, expect output Y" because the model might generate something completely different next time. This makes reproducibility, the foundation of security testing, fundamentally harder. Second, the attack surface is layered and interconnected . You're not just testing an application. You're testing a model (which might be proprietary and black-box), a data pipeline (which might include RAG, vector stores, and real-time retrieval), integration points (APIs, plugins, browser tools), and the infrastructure underneath (cloud services, containers, orchestration). Third, novel attack classes exist that don't map to traditional vuln categories . Prompt injection isn't XSS. Data poisoning isn't SQL injection. Model extraction isn't credential theft. Jailbreaks don't fit CVE taxonomy. The OWASP Top 10 doesn't cover this. Fourth, you might not control the model . If you're using OpenAI's API or Anthropic's Claude, you can't test the training pipeline, you can't audit the weights, and you can't verify alignment. Your scope is limited to what the API exposes, which means you're testing a black box with unknown internals. Fifth, AI systems are probabilistic, data-dependent, and constantly evolving . A model that's safe today might become unsafe after fine-tuning. A RAG system that's secure with Dataset A might leak PII when Dataset B is added. An autonomous agent that behaves correctly in testing might go rogue in production when it encounters edge cases. This isn't incrementally harder than web pentesting. It's just fundamentally different. And if your scope document looks like a web app pentest with "LLM" find-and-replaced in, you're going to miss everything that matters. Before you can scope an AI security engagement, you need to understand what you're actually testing. And most organizations don't. Here's the stack: This is the thing everyone focuses on because it's the most visible. But "the model" isn't monolithic. Base model : Is it GPT-4? Claude? Llama 3? Mistral? A custom model you trained from scratch? Each has different vulnerabilities, different safety mechanisms, different failure modes. Fine-tuning : Have you fine-tuned the base model on your own data? Fine-tuning can break safety alignment. It can introduce backdoors. It can memorize training data and leak it during inference. If you've fine-tuned, that's in scope. Instruction tuning : Have you applied instruction-tuning or RLHF to shape model behavior? That's another attack surface. Adversaries can craft inputs that reverse your alignment work. Multi-model orchestration : Are you running multiple models and aggregating outputs? That introduces new failure modes. What happens when Model A says "yes" and Model B says "no"? How do you handle consensus? Can an adversary exploit disagreements? Model serving infrastructure : How is the model deployed? Is it an API? A container? Serverless functions? On-prem hardware? Each deployment model has different security characteristics. AI systems don't just run models. They feed data into models. And that data pipeline is massive attack surface. Training data : Where did the training data come from? Who curated it? How was it cleaned? Is it public? Proprietary? Scraped? Licensed? Can an adversary poison the training data? RAG (Retrieval-Augmented Generation) : Are you using RAG to ground model outputs in retrieved documents? That's adding an entire data retrieval system to your attack surface. Can an adversary inject malicious documents into your knowledge base? Can they manipulate retrieval to leak sensitive docs? Can they poison the vector embeddings? Vector databases : If you're using RAG, you're running a vector database (Pinecone, Weaviate, Chroma, etc.). That's infrastructure. That has vulnerabilities. That's in scope. Real-time data ingestion : Are you pulling live data from APIs, databases, or user uploads? Each data source is a potential injection point. Data preprocessing : How are inputs sanitized before hitting the model? Are you stripping dangerous characters? Validating formats? Filtering content? Attackers will test every preprocessing step for bypasses. Models don't exist in isolation. They're integrated into applications. And those integration points are attack surface. APIs : How do users interact with the model? REST APIs? GraphQL? WebSockets? Each has different attack vectors. Authentication and authorization : Who can access the model? How are permissions enforced? Can an adversary escalate privileges? Rate limiting : Can an adversary send 10,000 requests per second? Can they DOS your model? Can they extract the entire training dataset via repeated queries? Logging and monitoring : Are you logging inputs and outputs? If yes, are you protecting those logs from unauthorized access? Logs containing sensitive user queries are PII. Plugins and tool use : Can the model call external APIs? Execute code? Browse the web? Use tools? Every plugin is an attack vector. If your model can execute Python, an adversary will try to get it to run . Multi-turn conversations : Do users have multi-turn dialogues with the model? Multi-turn interactions create new attack surfaces because adversaries can condition the model over multiple turns, bypassing safety mechanisms gradually/ If you've built agentic systems, AI that can plan, reason, use tools, and take actions autonomously, you've added an entire new dimension of attack surface. Tool access : What tools can the agent use? File system access? Database queries? API calls? Browser automation? The more powerful the tools, the higher the risk. Planning and reasoning : How does the agent decide what actions to take? Can an adversary manipulate the planning process? Can they inject malicious goals? Memory systems : Do agents have persistent memory? Can adversaries poison that memory? Can they extract sensitive information from memory? Multi-agent coordination : Are you running multiple agents that coordinate? Can adversaries exploit coordination protocols? Can they cause agents to turn on each other or collude against safety mechanisms? Escalation paths : Can an agent escalate privileges? Can it access resources it shouldn't? Can it spawn new agents? AI systems run on infrastructure. That infrastructure has traditional security vulnerabilities that still matter. Cloud services : Are you running on AWS, Azure, GCP? Are your S3 buckets public? Are your IAM roles overly permissive? Are your API keys hardcoded in repos? Containers and orchestration : Are you using Docker, Kubernetes? Are your container images vulnerable? Are your registries exposed? Are your secrets managed properly? CI/CD pipelines : How do you deploy model updates? Can an adversary inject malicious code into your pipeline? Dependencies : Are you using vulnerable Python libraries? Compromised npm packages? Poisoned PyPI distributions? Secrets management : Where are your API keys, database credentials, and model weights stored? Are they in environment variables? Config files? Secret managers? How much of that did you include in your last AI security scope document? If the answer is "less than 60%", your scope is inadequate. And you're going to get breached by someone who understands the full attack surface. The OWASP Top 10 for LLM Applications is the closest thing we have to a standardized framework for AI security testing. If you're scoping an AI engagement and you haven't mapped every item in this list to your test plan, you're doing it wrong. Here's the 2025 version: That's your baseline. But if you stop there, you're missing half the attack surface. The OWASP LLM Top 10 is valuable, but it's not comprehensive. Here's what's missing: Safety ≠ security . But unsafe AI systems cause real harm, and that's in scope for red teaming. Alignment failures : Can the model be made to behave in ways that violate its stated values? Constitutional AI bypass : If you're using constitutional AI techniques (like Anthropic's Claude), can adversaries bypass the constitution? Bias amplification : Does the model exhibit or amplify demographic biases? This isn't just an ethics issue—it's a legal risk under GDPR, EEOC, and other regulations. Harmful content generation : Can the model be tricked into generating illegal, dangerous, or abusive content? Deceptive behavior : Can the model lie, manipulate, or deceive users? Traditional adversarial ML attacks apply to AI systems. Evasion attacks : Can adversaries craft inputs that cause misclassification? Model inversion : Can adversaries reconstruct training data from model outputs? Model extraction : Can adversaries steal model weights through repeated queries? Membership inference : Can adversaries determine if specific data was in the training set? Backdoor attacks : Does the model have hidden backdoors that trigger on specific inputs? If your AI system handles multiple modalities (text, images, audio, video), you have additional attack surface. Cross-modal injection : Attackers embed malicious instructions in images that the vision-language model follows. Image perturbation attacks : Small pixel changes invisible to humans cause model failures. Audio adversarial examples : Audio inputs crafted to cause misclassification. Typographic attacks : Adversarial text rendered as images to bypass filters. Multi-turn multimodal jailbreaks : Combining text and images across multiple turns to bypass safety. AI systems must comply with GDPR, HIPAA, CCPA, and other regulations. PII handling : Does the model process, store, or leak personally identifiable information? Right to explanation : Can users get explanations for automated decisions (GDPR Article 22)? Data retention : How long is data retained? Can users request deletion? Cross-border data transfers : Does the model send data across jurisdictions? Before you write your scope document, answer every single one of these questions. If you can't answer them, you don't understand your system well enough to scope a meaningful AI security engagement. If you can answer all these questions, you're ready to scope. If you can't, you're not. Your AI pentest/engagement scope document needs to be more detailed than a traditional pentest scope. Here's the structure: What we're testing : One-paragraph description of the AI system. Why we're testing : Business objectives (compliance, pre-launch validation, continuous assurance, incident response). Key risks : Top 3-5 risks that drive the engagement. Success criteria : What does "passing" look like? Architectural diagram : Include everything—model, data pipelines, APIs, infrastructure, third-party services. Component inventory : List every testable component with owner, version, and deployment environment. Data flows : Document how data moves through the system, from user input to model output to downstream consumers. Trust boundaries : Identify where data crosses trust boundaries (user → app, app → model, model → tools, tools → external APIs). Be exhaustive. List: For each component, specify: Map every OWASP LLM Top 10 item to specific test cases. Example: LLM01 - Prompt Injection : Include specific threat scenarios: Explicitly list what's NOT being tested: Tools : List specific tools testers will use: Techniques : Test phases : Authorization : All testing must be explicitly authorized in writing. Include names, signatures, dates. Ethical boundaries : No attempts at physical harm, financial fraud, illegal content generation (unless explicitly scoped for red teaming). Disclosure : Critical findings must be disclosed immediately via designated channel (email, Slack, phone). Standard findings can wait for formal report. Data handling : Testers must not exfiltrate user data, training data, or model weights except as explicitly authorized for demonstration purposes. All test data must be destroyed post-engagement. Legal compliance : Testing must comply with all applicable laws and regulations. If testing involves accessing user data, appropriate legal review must be completed. Technical report : Detailed findings with severity ratings, reproduction steps, evidence (screenshots, logs, payloads), and remediation guidance. Executive summary : Business-focused summary of key risks and recommendations. Threat model : Updated threat model based on findings. Retest availability : Will testers be available for retest after fixes? Timeline : Start date, end date, report delivery date, retest window. Key contacts : That's your scope document. It should be 10-20 pages. If it's shorter, you're missing things. Here's what I see organizations get wrong: Mistake 1: Scoping only the application layer, not the model You test the web app that wraps the LLM, but you don't test the LLM itself. You find XSS and broken authz, but you miss prompt injection, jailbreaks, and data extraction. Fix : Scope the full stack-app, model, data pipelines, infrastructure. Mistake 2: Treating the model as a black box when you control it If you fine-tuned the model, you have access to training data and weights. Test for data poisoning, backdoors, and alignment failures. Don't just test the API. Fix : If you control any part of the model lifecycle (training, fine-tuning, deployment), include that in scope. Mistake 3: Ignoring RAG and vector databases You test the LLM, but you don't test the document store. Adversaries inject malicious documents, manipulate retrieval, and poison embeddings—and you never saw it coming. Fix : If you're using RAG, the vector database and document ingestion pipeline are in scope. Mistake 4: Not testing multi-turn interactions You test single-shot prompts, but adversaries condition the model over 10 turns to bypass refusal mechanisms. You missed the attack entirely. Fix : Test multi-turn dialogues explicitly. Test conversation history isolation. Test memory poisoning. Mistake 5: Assuming third-party models are safe You're using OpenAI's API, so you assume it's secure. But you're passing user PII in prompts, you're not validating outputs before execution, and you haven't considered what happens if OpenAI's safety mechanisms fail. Fix : Even with third-party models, test your integration. Test input/output handling. Test failure modes. Mistake 6: Not including AI safety in security scope You test for technical vulnerabilities but ignore alignment failures, bias amplification, and harmful content generation. Then your model generates racist outputs or dangerous instructions, and you're in the news. Fix : AI safety is part of AI security. Include alignment testing, bias audits, and harm reduction validation. Mistake 7: Underestimating autonomous agent risks You test the LLM, but your agent can execute code, call APIs, and access databases. An adversary hijacks the agent, and it deletes production data or exfiltrates secrets. Fix : Autonomous agents are their own attack surface. Test tool permissions, privilege escalation, and agent behavior boundaries. Mistake 8: Not planning for continuous testing You do one pentest before launch, then never test again. But you're fine-tuning weekly, adding new plugins monthly, and updating RAG documents daily. Your attack surface is constantly changing. Fix : Scope for continuous red teaming, not one-time assessment. Organizations hire expensive consultants to run a few prompt injection tests, declare the system "secure," and ship to production. Then they get breached six months later when someone figures out a multi-turn jailbreak or poisons the RAG document store. The problem isn't that the testers are bad. The problem is that the scopes are inadequate . You can't find what you're not looking for. If your scope doesn't include RAG poisoning, testers won't test for it. If your scope doesn't include membership inference, testers won't test for it. If your scope doesn't include agent privilege escalation, testers won't test for it. And attackers will. The asymmetry is brutal: you have to defend every attack vector. Attackers only need to find one that works. So when you scope your next AI security engagement, ask yourself: "If I were attacking this system, what would I target?" Then make sure every single one of those things is in your scope document. Because if it's not in scope, it's not getting tested. And if it's not getting tested, it's going to get exploited. Traditional pentests are point-in-time assessments. You test, you report, you fix, you're done. That doesn't work for AI systems. AI systems evolve constantly: Every change introduces new attack surface. And if you're only testing once a year, you're accumulating risk for 364 days. You need continuous red teaming . Here's how to build it: Use tools like Promptfoo, Garak, and PyRIT to run automated adversarial testing on every model update. Integrate tests into CI/CD pipelines so every deployment is validated before production. Set up continuous monitoring for: Quarterly or bi-annually, bring in expert red teams for comprehensive testing beyond what automation can catch. Focus deep assessments on: Train your own security team on AI-specific attack techniques. Develop internal playbooks for: Every quarter, revisit your threat model: Update your testing roadmap based on evolving threats. Scoping AI security engagements is harder than traditional pentests because the attack surface is larger, the risks are novel, and the methodologies are still maturing. But it's not impossible. You need to: If you do this right, you'll find vulnerabilities before attackers do. If you do it wrong, you'll end up in the news explaining why your AI leaked training data, generated harmful content, or got hijacked by adversaries. First, the system output is non-deterministic . You can't write a test case that says "given input X, expect output Y" because the model might generate something completely different next time. This makes reproducibility, the foundation of security testing, fundamentally harder. Second, the attack surface is layered and interconnected . You're not just testing an application. You're testing a model (which might be proprietary and black-box), a data pipeline (which might include RAG, vector stores, and real-time retrieval), integration points (APIs, plugins, browser tools), and the infrastructure underneath (cloud services, containers, orchestration). Third, novel attack classes exist that don't map to traditional vuln categories . Prompt injection isn't XSS. Data poisoning isn't SQL injection. Model extraction isn't credential theft. Jailbreaks don't fit CVE taxonomy. The OWASP Top 10 doesn't cover this. Fourth, you might not control the model . If you're using OpenAI's API or Anthropic's Claude, you can't test the training pipeline, you can't audit the weights, and you can't verify alignment. Your scope is limited to what the API exposes, which means you're testing a black box with unknown internals. Fifth, AI systems are probabilistic, data-dependent, and constantly evolving . A model that's safe today might become unsafe after fine-tuning. A RAG system that's secure with Dataset A might leak PII when Dataset B is added. An autonomous agent that behaves correctly in testing might go rogue in production when it encounters edge cases. Base model : Is it GPT-4? Claude? Llama 3? Mistral? A custom model you trained from scratch? Each has different vulnerabilities, different safety mechanisms, different failure modes. Fine-tuning : Have you fine-tuned the base model on your own data? Fine-tuning can break safety alignment. It can introduce backdoors. It can memorize training data and leak it during inference. If you've fine-tuned, that's in scope. Instruction tuning : Have you applied instruction-tuning or RLHF to shape model behavior? That's another attack surface. Adversaries can craft inputs that reverse your alignment work. Multi-model orchestration : Are you running multiple models and aggregating outputs? That introduces new failure modes. What happens when Model A says "yes" and Model B says "no"? How do you handle consensus? Can an adversary exploit disagreements? Model serving infrastructure : How is the model deployed? Is it an API? A container? Serverless functions? On-prem hardware? Each deployment model has different security characteristics. Training data : Where did the training data come from? Who curated it? How was it cleaned? Is it public? Proprietary? Scraped? Licensed? Can an adversary poison the training data? RAG (Retrieval-Augmented Generation) : Are you using RAG to ground model outputs in retrieved documents? That's adding an entire data retrieval system to your attack surface. Can an adversary inject malicious documents into your knowledge base? Can they manipulate retrieval to leak sensitive docs? Can they poison the vector embeddings? Vector databases : If you're using RAG, you're running a vector database (Pinecone, Weaviate, Chroma, etc.). That's infrastructure. That has vulnerabilities. That's in scope. Real-time data ingestion : Are you pulling live data from APIs, databases, or user uploads? Each data source is a potential injection point. Data preprocessing : How are inputs sanitized before hitting the model? Are you stripping dangerous characters? Validating formats? Filtering content? Attackers will test every preprocessing step for bypasses. APIs : How do users interact with the model? REST APIs? GraphQL? WebSockets? Each has different attack vectors. Authentication and authorization : Who can access the model? How are permissions enforced? Can an adversary escalate privileges? Rate limiting : Can an adversary send 10,000 requests per second? Can they DOS your model? Can they extract the entire training dataset via repeated queries? Logging and monitoring : Are you logging inputs and outputs? If yes, are you protecting those logs from unauthorized access? Logs containing sensitive user queries are PII. Plugins and tool use : Can the model call external APIs? Execute code? Browse the web? Use tools? Every plugin is an attack vector. If your model can execute Python, an adversary will try to get it to run . Multi-turn conversations : Do users have multi-turn dialogues with the model? Multi-turn interactions create new attack surfaces because adversaries can condition the model over multiple turns, bypassing safety mechanisms gradually/ Tool access : What tools can the agent use? File system access? Database queries? API calls? Browser automation? The more powerful the tools, the higher the risk. Planning and reasoning : How does the agent decide what actions to take? Can an adversary manipulate the planning process? Can they inject malicious goals? Memory systems : Do agents have persistent memory? Can adversaries poison that memory? Can they extract sensitive information from memory? Multi-agent coordination : Are you running multiple agents that coordinate? Can adversaries exploit coordination protocols? Can they cause agents to turn on each other or collude against safety mechanisms? Escalation paths : Can an agent escalate privileges? Can it access resources it shouldn't? Can it spawn new agents? Cloud services : Are you running on AWS, Azure, GCP? Are your S3 buckets public? Are your IAM roles overly permissive? Are your API keys hardcoded in repos? Containers and orchestration : Are you using Docker, Kubernetes? Are your container images vulnerable? Are your registries exposed? Are your secrets managed properly? CI/CD pipelines : How do you deploy model updates? Can an adversary inject malicious code into your pipeline? Dependencies : Are you using vulnerable Python libraries? Compromised npm packages? Poisoned PyPI distributions? Secrets management : Where are your API keys, database credentials, and model weights stored? Are they in environment variables? Config files? Secret managers? Alignment failures : Can the model be made to behave in ways that violate its stated values? Constitutional AI bypass : If you're using constitutional AI techniques (like Anthropic's Claude), can adversaries bypass the constitution? Bias amplification : Does the model exhibit or amplify demographic biases? This isn't just an ethics issue—it's a legal risk under GDPR, EEOC, and other regulations. Harmful content generation : Can the model be tricked into generating illegal, dangerous, or abusive content? Deceptive behavior : Can the model lie, manipulate, or deceive users? Evasion attacks : Can adversaries craft inputs that cause misclassification? Model inversion : Can adversaries reconstruct training data from model outputs? Model extraction : Can adversaries steal model weights through repeated queries? Membership inference : Can adversaries determine if specific data was in the training set? Backdoor attacks : Does the model have hidden backdoors that trigger on specific inputs? Cross-modal injection : Attackers embed malicious instructions in images that the vision-language model follows. Image perturbation attacks : Small pixel changes invisible to humans cause model failures. Audio adversarial examples : Audio inputs crafted to cause misclassification. Typographic attacks : Adversarial text rendered as images to bypass filters. Multi-turn multimodal jailbreaks : Combining text and images across multiple turns to bypass safety. PII handling : Does the model process, store, or leak personally identifiable information? Right to explanation : Can users get explanations for automated decisions (GDPR Article 22)? Data retention : How long is data retained? Can users request deletion? Cross-border data transfers : Does the model send data across jurisdictions? What base model are you using (GPT-4, Claude, Llama, Mistral, custom)? Is the model proprietary (OpenAI API) or open-source? Have you fine-tuned the base model? On what data? Have you applied instruction tuning, RLHF, or other alignment techniques? How is the model deployed (API, on-prem, container, serverless)? Do you have access to model weights? Can testers query the model directly, or only through your application? Are there rate limits? What are they? What's the model's context window size? Does the model support function calling or tool use? Is the model multimodal (vision, audio, text)? Are you using multiple models in ensemble or orchestration? Where did training data come from (public, proprietary, scraped, licensed)? Was training data curated or filtered? How? Is training data in scope for poisoning tests? Are you using RAG (Retrieval-Augmented Generation)? If RAG: What's the document store (vector DB, traditional DB, file system)? If RAG: How are documents ingested? Who controls ingestion? If RAG: Can testers inject malicious documents? If RAG: How is retrieval indexed and searched? Do you pull real-time data from external sources (APIs, databases)? How is input data preprocessed and sanitized? Is user conversation history stored? Where? For how long? Can users access other users' data? How do users interact with the model (web app, API, chat interface, mobile app)? What authentication mechanisms are used (OAuth, API keys, session tokens)? What authorization model is used (RBAC, ABAC, none)? Are there different user roles with different permissions? Is there rate limiting? At what levels (user, IP, API key)? Are inputs and outputs logged? Where? Who has access to logs? Are logs encrypted at rest and in transit? How are errors handled? Are error messages exposed to users? Are there webhooks or callbacks that the model can trigger? Can the model call external APIs? Which ones? Can the model execute code? In what environment? Can the model browse the web? Can the model read/write files? Can the model access databases? What permissions do plugins have? How are plugin outputs validated before use? Can users add custom plugins? Are plugin interactions logged? Do you have autonomous agents that plan and execute multi-step tasks? What tools can agents use? Can agents spawn other agents? Do agents have persistent memory? Where is it stored? How are agent goals and constraints defined? Can agents access sensitive resources (DBs, APIs, filesystems)? Can agents escalate privileges? Are there kill-switches or circuit breakers for agents? How is agent behavior monitored? What cloud provider(s) are you using (AWS, Azure, GCP, on-prem)? Are you using containers (Docker)? Orchestration (Kubernetes)? Where are model weights stored? Who has access? Where are API keys and secrets stored? Are secrets in environment variables, config files, or secret managers? How are dependencies managed (pip, npm, Docker images)? Have you scanned dependencies for known vulnerabilities? How are model updates deployed? What's the CI/CD pipeline? Who can deploy model updates? Are there staging environments separate from production? What safety mechanisms are in place (content filters, refusal training, constitutional AI)? Have you red-teamed for jailbreaks? Have you tested for bias across demographic groups? Have you tested for harmful content generation? Do you have human-in-the-loop review for sensitive outputs? What's your incident response plan if the model behaves unsafely? Can testers attempt to jailbreak the model? Can testers attempt prompt injection? Can testers attempt data extraction (training data, PII)? Can testers attempt model extraction or inversion? Can testers attempt DoS or resource exhaustion? Can testers poison training data (if applicable)? Can testers test multi-turn conversations? Can testers test RAG document injection? Can testers test plugin abuse? Can testers test agent privilege escalation? Are there any topics, content types, or test methods that are forbidden? What's the escalation process if critical issues are found during testing? What regulations apply (GDPR, HIPAA, CCPA, FTC, EU AI Act)? Do you process PII? What types? Do you have data processing agreements with model providers? Do you have the legal right to test this system? Are there export control restrictions on the model or data? What are the disclosure requirements for findings? What's the confidentiality agreement for testers? Model(s) : Exact model names, versions, access methods APIs : All endpoints with authentication requirements Data stores : Databases, vector stores, file systems, caches Integrations : Every third-party service, plugin, tool Infrastructure : Cloud accounts, containers, orchestration Applications : Web apps, mobile apps, admin panels Access credentials testers will use Environments (dev, staging, prod) that are in scope Testing windows (if limited) Rate limits or usage restrictions Test direct instruction override Test indirect injection via RAG documents Test multi-turn conditioning Test system prompt extraction Test jailbreak techniques (roleplay, hypotheticals, encoding) Test cross-turn memory poisoning "Can an attacker leak other users' conversation history?" "Can an attacker extract training data containing PII?" "Can an attacker bypass content filters to generate harmful instructions?" Production environments (if testing only staging) Physical security Social engineering of employees Third-party SaaS providers we don't control Specific attack types (if any are prohibited) Manual testing Promptfoo for LLM fuzzing Garak for red teaming PyRIT for adversarial prompting ART (Adversarial Robustness Toolbox) for ML attacks Custom scripts for specific attack vectors Traditional tools (Burp Suite, Caido, Nuclei) for infrastructure Prompt injection testing Jailbreak attempts Data extraction attacks Model inversion Membership inference Evasion attacks RAG poisoning Plugin abuse Agent privilege escalation Infrastructure scanning Reconnaissance and threat modeling Automated vulnerability scanning Manual testing of high-risk areas Exploitation and impact validation Reporting and remediation guidance Engagement lead (security team) Technical point of contact (AI team) Escalation contact (for critical findings) Legal contact (for questions on scope) Models get fine-tuned RAG document stores get updated New plugins get added Agents gain new capabilities Infrastructure changes Prompt injection attempts Jailbreak successes Data extraction queries Unusual tool usage patterns Agent behavior anomalies Novel attack vectors that tools don't cover Complex multi-step exploitation chains Social engineering combined with technical attacks Agent hijacking and multi-agent exploits Prompt injection testing Jailbreak methodology RAG poisoning Agent security testing What new attacks have been published? What new capabilities have you added? What new integrations are in place? What new risks does the threat landscape present? Understand the full stack : model, data pipelines, application, infrastructure, agents, everything. Map every attack vector : OWASP LLM Top 10 is your baseline, not your ceiling. Answer scoping questions (mentioned above) : If you can't answer them, you don't understand your system. Write detailed scope documents : 10-20 pages, not 2 pages. Use the right tools : Promptfoo, Garak, ART, LIME, SHAP—not just Burp Suite. Test continuously : Not once, but ongoing. Avoid common mistakes : Don't ignore RAG, don't underestimate agents, don't skip AI safety.

0 views
Lalit Maganti 2 months ago

CAST(x AS STRING) casts to integer in SQLite

Discussed on lobste.rs . As an “SQLite consultant” for my local area of Google, I often have people come to me having written SQL like: and ask me “Why is baz always an integer?! Have I hit an SQLite bug?”. I have to again reach for my list of “odd quirks that SQLite has that people don’t know about”. Because this is not a bug, at least according to the SQLite manual. Instead, the correct way to write the above query is: The reason for this? Quoting from “Determination of Column Affinity” For tables not declared as STRICT, the affinity of a column is determined by the declared type of the column, according to the following rules in the order shown: Otherwise, the affinity is NUMERIC. does not match any of the numbered rules and so fallback to affinity which, in the general case, means integer. Due to SQLite’s staunch stance of being “backwards compatible” there’s very little chance of this paper-cut ever going away. But at least next time someone comes to me with this issue, I’ll be able to link to this post instead of writing the same thing for the nth time :) If the declared type contains the string “INT” then it is assigned INTEGER affinity. If the declared type of the column contains any of the strings “CHAR”, “CLOB”, or “TEXT” then that column has TEXT affinity. Notice that the type VARCHAR contains the string “CHAR” and is thus assigned TEXT affinity. If the declared type for a column contains the string “BLOB” or if no type is specified then the column has affinity BLOB. If the declared type for a column contains any of the strings “REAL”, “FLOA”, or “DOUB” then the column has REAL affinity.

0 views