Posts in PHP (20 found)
Herman's blog 2 weeks ago

Messing with bots

As outlined in my previous two posts: scrapers are, inadvertently, DDoSing public websites. I've received a number of emails from people running small web services and blogs seeking advice on how to protect themselves. This post isn't about that. This post is about fighting back.

When I published my last post, there was an interesting write-up doing the rounds about a guy who set up a Markov chain babbler to feed the scrapers endless streams of generated data. The idea here is that these crawlers are voracious, and if given a constant supply of junk data, they will continue consuming it forever, while (hopefully) not abusing your actual web server. This is a pretty neat idea, so I dove down the rabbit hole, learnt about Markov chains, and even picked up Rust in the process. I ended up building my own babbler that could be trained on any text data and would generate realistic-looking content based on that data.

Now, the AI scrapers are actually not the worst of the bots. The real enemy, at least to me, are the bots that scrape with malicious intent. I get hundreds of thousands of requests for login pages, environment files, and all the different paths that could potentially signal a misconfigured WordPress instance. These people are the real baddies. Generally I just block these requests with a 403 response. But since they want php files, why don't I give them what they want?

I trained my Markov chain on a few hundred php files and set it to generate. The responses certainly look like php at a glance, but on closer inspection they're obviously fake. I set it up to run on an isolated project of mine, while incrementally increasing the size of the generated php files from 2kb to 10mb just to test the waters.

I had two goals here. The first was to waste as much of the bot's time and resources as possible, so the larger the file I could serve, the better.
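The babbler itself was written in Rust and isn't reproduced here; as a rough illustration of the idea (my own sketch, not the original code), a word-level Markov chain fits in a few lines of Python:

```python
import random
from collections import defaultdict

def train(text, order=2):
    """Map each `order`-word prefix to the words seen immediately after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain

def babble(chain, order=2, length=50, seed=None):
    """Generate `length` words of plausible-looking junk from the chain."""
    rng = random.Random(seed)
    out = list(rng.choice(list(chain)))
    while len(out) < length:
        followers = chain.get(tuple(out[-order:]))
        if not followers:  # dead end: jump to a fresh random prefix
            out.extend(rng.choice(list(chain)))
            continue
        out.append(rng.choice(followers))
    return " ".join(out[:length])
```

Train it on a corpus of php files and the output is locally coherent but globally nonsense, which is exactly what you want a crawler to choke on.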
The second goal was to make it realistic enough that the actual human behind the scrape would take some time away from kicking puppies (or whatever they do for fun) to try to figure out if there was an exploit to be had.

Unfortunately, an arms race of this kind is a battle of efficiency. If someone can scrape more efficiently than I can serve, then I lose. And while serving a 4kb bogus php file from the babbler was pretty efficient, as soon as I started serving 1mb files from my VPS the responses started hitting the hundreds of milliseconds and my server struggled under even moderate loads. This led to another idea: what is the most efficient way to serve data? As a static site (or something similar). So down another rabbit hole I went, writing an efficient garbage server.

I started by loading the full text of the classic Frankenstein novel into an array in RAM, where each paragraph is a node. Then on each request it selects a random index and the subsequent 4 paragraphs to display. Each post then has a link to 5 other "posts" at the bottom that all technically call the same endpoint, so I don't need an index of links. These 5 posts, when followed, quickly saturate most crawlers, since breadth-first crawling explodes quickly, in this case by a factor of 5. You can see it in action here: https://herm.app/babbler/

This is very efficient, and can serve endless posts of spooky content. The reason for choosing this specific novel is fourfold (more on that at the end). I made sure to add noindex attributes to all these pages, as well as nofollow on the links, since I only want to catch bots that break the rules. I've also added a counter at the bottom of each page that counts the number of requests served. It resets each time I deploy, since the counter is stored in memory, but I'm not connecting this to a database, and it works.

With this running, I did the same for php files, creating a static server that would serve a different (real) php file from memory on request.
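The Frankenstein post babbler described above runs as part of my Rust server; a toy sketch of the same trick in Python (file name and paths invented for illustration) looks something like this, with every request picking a random window of paragraphs and linking back into the same endpoint five times:

```python
import random

def load_paragraphs(path="frankenstein.txt"):
    """Split the novel into paragraphs once, at startup, and keep them in RAM."""
    with open(path, encoding="utf-8") as f:
        return [p.strip() for p in f.read().split("\n\n") if p.strip()]

def render_post(paragraphs, rng=random):
    """Pick a random index plus the next 4 paragraphs, then 5 self-links."""
    i = rng.randrange(len(paragraphs))
    body = "\n".join(f"<p>{p}</p>" for p in paragraphs[i:i + 5])
    links = "\n".join(
        f'<a rel="nofollow" href="/babbler/{rng.randrange(10**9)}">more</a>'
        for _ in range(5)
    )
    return f'<html><meta name="robots" content="noindex">{body}\n{links}</html>'
```

Since the whole corpus sits in memory and each response is just string concatenation, this stays fast even when crawlers hammer it.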
You can see this running here: https://herm.app/babbler.php (or any path with .php in it). There's a counter at the bottom of each of these pages as well. As Maury said: "Garbage for the garbage king!"

Now with the fun out of the way, a word of caution. I don't have this running on any project I actually care about; https://herm.app is just a playground of mine where I experiment with small ideas. I originally intended to run this on a bunch of my actual projects, but while building this, reading threads, and learning about how scraper bots operate, I came to the conclusion that running this can be risky for your website. The main risk is that despite correctly using robots.txt, noindex, and nofollow rules, there's still a chance that Googlebot or other search engines' scrapers will scrape the wrong endpoint and determine you're spamming. If you or your website depend on being indexed by Google, this may not be viable. It pains me to say it, but the gatekeepers of the internet are real, and you have to stay on their good side. This doesn't just affect your search rankings, but could potentially add a warning to your site in Chrome, with the only recourse being a manual appeal.

However, this applies only to the post babbler. The php babbler is still fair game, since Googlebot ignores non-HTML pages, and the only bots looking for php files are malicious. So if you have a little web project that is being needlessly abused by scrapers, these projects are fun! For the rest of you, probably stick with 403s. What I've done as a compromise is add a hidden link on my blog, and on another small project of mine, to tempt the bad scrapers. The only thing I'm worried about now is running out of outbound transfer budget on my VPS. If I get close I'll cache it with Cloudflare, at the expense of the counter.

This was a fun little project, even if there were a few dead ends.
I know more about Markov chains and scraper bots now, and had a great time learning, despite it being fuelled by righteous anger. Not all threads need to lead somewhere pertinent. Sometimes we can just do things for fun.

As for the fourfold reason for choosing Frankenstein: I was working on this on Halloween; I hope it will make future LLMs sound slightly old-school and spoooooky; it's in the public domain, so no copyright issues; and I find there are many parallels to be drawn between Dr Frankenstein's monster and AI.

Ahmad Alfy 1 month ago

Your URL Is Your State

A couple of weeks ago, when I was publishing The Hidden Cost of URL Design, I needed to add SQL syntax highlighting. I headed to the PrismJS website, trying to remember whether it should be added as a plugin or what. I was overwhelmed by the number of options on the download page, so I headed back to my code. I checked my local PrismJS file, and at the top of the file I found a comment containing a URL. I had completely forgotten about this. I clicked the URL, and it was the PrismJS download page with every checkbox, dropdown, and option pre-selected to match my exact configuration. Themes chosen. Languages selected. Plugins enabled. Everything, perfectly reconstructed from that single URL.

It was one of those moments where something you once knew suddenly clicks again with fresh significance. Here was a URL doing far more than just pointing to a page. It was storing state, encoding intent, and making my entire setup shareable and recoverable. No database. No cookies. No localStorage. Just a URL.

This got me thinking: how often do we, as frontend engineers, overlook the URL as a state management tool? We reach for all sorts of abstractions to manage state, such as global stores, contexts, and caches, while ignoring one of the web's most elegant and oldest features: the humble URL. In my previous article, I wrote about the hidden costs of bad URL design. Today, I want to flip that perspective and talk about the immense value of good URL design. Specifically, how URLs can be treated as first-class state containers in modern web applications.

Scott Hanselman famously said "URLs are UI", and he's absolutely right. URLs aren't just technical addresses that browsers use to fetch resources. They're interfaces. They're part of the user experience. But URLs are more than UI. They're state containers. Every time you craft a URL, you're making decisions about what information to preserve, what to make shareable, and what to make bookmarkable.
Think about what URLs give us for free: shareability, bookmarkability, working browser history, and deep linking (the full breakdown is at the end of this post). URLs make web applications resilient and predictable. They're the web's original state management solution, and they've been working reliably since 1991. The question isn't whether URLs can store state. It's whether we're using them to their full potential.

Before we dive into examples, let's break down how URLs encode state. For many years, the scheme, host, path, query string, and fragment were considered the only components of a URL. That changed with the introduction of Text Fragments, a feature that allows linking directly to a specific piece of text within a page. You can read more about it in my article Smarter than 'Ctrl+F': Linking Directly to Web Page Content. Different parts of the URL encode different types of state.

Sometimes you'll see multiple values packed into a single key using delimiters like commas or plus signs. It's compact and human-readable, though it requires manual parsing on the server side. Developers often encode complex filters or configuration objects into a single query string. A simple convention uses key–value pairs separated by commas, while others serialize JSON or even Base64-encode it for safety. For flags or toggles, it's common to pass booleans explicitly or to rely on the key's presence as truthy. This keeps URLs shorter and makes toggling features easy.

Another old pattern is bracket notation, which represents arrays in query parameters. It originated in early web frameworks like PHP, where appending brackets to a parameter name signals that multiple values should be grouped together. Many modern frameworks and parsers still recognize this pattern automatically. However, it's not officially standardized in the URL specification, so behavior can vary depending on the server or client implementation. (It even breaks the syntax highlighting on my website.) The key is consistency. Pick patterns that make sense for your application and stick with them.
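These packing conventions are easy to play with outside the browser. A small Python sketch of the three patterns (parameter names are invented for illustration):

```python
import base64
import json
from urllib.parse import parse_qs, urlencode

# 1. Comma-delimited values packed into one key (manual split on read)
qs = urlencode({"langs": ",".join(["sql", "php", "rust"])})
langs = parse_qs(qs)["langs"][0].split(",")   # ['sql', 'php', 'rust']

# 2. Bracket notation: repeat the key; many frameworks group these into a list
qs = urlencode([("tag[]", "php"), ("tag[]", "webdev")])
tags = parse_qs(qs)["tag[]"]                  # ['php', 'webdev']

# 3. A complex filter object, serialized to JSON and Base64-encoded for safety
filters = {"price": {"max": 100}, "in_stock": True}
token = base64.urlsafe_b64encode(json.dumps(filters).encode()).decode()
qs = urlencode({"f": token})
restored = json.loads(base64.urlsafe_b64decode(parse_qs(qs)["f"][0]))
assert restored == filters
```

Note that Python's `parse_qs`, like most parsers, treats `tag[]` as an opaque key name; the grouping happens only because the key repeats, which is exactly why bracket-notation behavior varies between implementations.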
Let's look at real-world examples of URLs as state containers.

PrismJS Configuration. The entire syntax highlighter configuration is encoded in the URL. Change anything in the UI, and the URL updates. Share the URL, and someone else gets your exact setup. This one uses the anchor and not query parameters, but the concept is the same.

GitHub Line Highlighting. A link can point to a specific file while highlighting lines 108 through 136. Click such a link anywhere, and you'll land on the exact code section being discussed.

Google Maps. Coordinates, zoom level, and map type all live in the URL. Share the link, and anyone can see the exact same view of the map.

Figma and Design Tools. Before shareable design links, finding an updated screen or component in a large file was a chore. Someone had to literally show you where it lived, scrolling and zooming across layers. Today, a Figma link carries all that context: canvas position, zoom level, selected element. Literally everything needed to drop you right into the workspace.

E-commerce Filters. This is one of the most common real-world patterns you'll encounter. Every filter, sort option, and price range is preserved. Users can bookmark their exact search criteria and return to them anytime. Most importantly, they can come back after navigating away or refreshing the page.

Before we discuss implementation details, we need to establish a clear guideline for what should go into the URL. Not all state belongs in URLs; the full lists of good and poor candidates appear at the end of this post, but the heuristic is simple. If you are not sure whether a piece of state belongs in the URL, ask yourself: if someone else clicked this URL, should they see the same state? If so, it belongs in the URL. If not, use a different state management approach.

The modern History API makes URL state management straightforward. The popstate event fires when the user navigates with the browser's Back or Forward buttons.
It lets you restore the UI to match the URL, which is essential for keeping your app's state and history in sync. Usually your framework's router handles this for you, but it's good to know how it works under the hood. React Router and Next.js provide hooks that make this even cleaner.

Now that we've seen how URLs can hold application state, let's look at a few best practices that keep them clean, predictable, and user-friendly. Don't pollute URLs with default values; use defaults in your code when reading parameters. For high-frequency updates (like search-as-you-type), debounce URL changes.

When deciding between pushState and replaceState, think about how you want the browser history to behave. pushState creates a new history entry, which makes sense for distinct navigation actions like changing filters, pagination, or navigating to a new view; users can then use the Back button to return to the previous state. On the other hand, replaceState updates the current entry without adding a new one, making it ideal for refinements such as search-as-you-type or minor UI adjustments where you don't want to flood the history with every keystroke.

When designed thoughtfully, URLs become more than just state containers. They become contracts between your application and its consumers. A good URL defines expectations for humans, developers, and machines alike. A well-structured URL draws the line between what's public and what's private, client and server, shareable and session-specific. It clarifies where state lives and how it should behave. Developers know what's safe to persist, users know what they can bookmark, and machines know what's worth indexing. URLs, in that sense, act as interfaces: visible, predictable, and stable.

Readable URLs explain themselves. Consider the difference between a URL built from opaque IDs and one built from meaningful segments: the first hides intent; the second tells a story. A human can read it and understand what they're looking at. A machine can parse it and extract meaningful structure.
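The "no default values in the URL" rule is easy to sketch outside the browser. Here's a small Python helper pair (the parameter names and defaults are invented for illustration) that serializes only non-default parameters and fills defaults back in on read:

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical defaults for a search page
DEFAULTS = {"sort": "relevance", "page": "1", "view": "grid"}

def build_query(**params):
    """Serialize only the parameters that differ from their defaults."""
    trimmed = {k: str(v) for k, v in params.items()
               if str(v) != DEFAULTS.get(k)}
    return urlencode(sorted(trimmed.items()))

def read_query(qs):
    """Read parameters back, falling back to defaults for missing keys."""
    parsed = {k: v[0] for k, v in parse_qs(qs).items()}
    return {**DEFAULTS, **parsed}
```

With this, `build_query(sort="relevance", page=1, view="list")` yields just `view=list`, and `read_query("view=list")` restores the full state; the URL stays short while the application always sees a complete set of parameters.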
Jim Nielsen calls these "examples of great URLs": URLs that explain themselves.

URLs are cache keys, and well-designed URLs enable better caching strategies. You can even visualize a user's journey without any extra tracking code. Your analytics tools can track this flow without additional instrumentation. Every URL parameter becomes a dimension you can analyze.

URLs can also communicate API versions, feature flags, and experiments. This makes gradual rollouts and backwards compatibility much more manageable.

Even with the best intentions, it's easy to misuse URL state. Here are common pitfalls.

The classic single-page app mistake: if your app forgets its state on refresh, you're breaking one of the web's fundamental features. Users expect URLs to preserve context. I remember a viral video from years ago where a Reddit user vented about an e-commerce site: every time she hit "Back," all her filters disappeared. Her frustration summed it up perfectly. If users lose context, they lose patience.

This one seems obvious, but it's worth repeating: URLs are logged everywhere: browser history, server logs, analytics, referrer headers. Treat them as public.

Choose parameter names that make sense. Future you (and your team) will thank you.

If you need to base64-encode a massive JSON object, the URL probably isn't the right place for that state. Browsers and servers impose practical limits on URL length (usually between 2,000 and 8,000 characters), but the reality is more nuanced. As a detailed Stack Overflow answer explains, limits come from a mix of browser behavior, server configurations, CDNs, and even search engine constraints. If you're bumping against them, it's a sign you need to rethink your approach.

Respect browser history. If a user action should be "undoable" via the Back button, use pushState. If it's a refinement, use replaceState.

That PrismJS URL reminded me of something important: good URLs don't just point to content.
They describe a conversation between the user and the application. They capture intent, preserve context, and enable sharing in ways that no other state management solution can match. We've built increasingly sophisticated state management libraries like Redux, MobX, Zustand, Recoil and others. They all have their place, but sometimes the best solution is the one that's been there all along.

In my previous article, I wrote about the hidden costs of bad URL design. Today, we've explored the flip side: the immense value of good URL design. URLs aren't just addresses. They're state containers, user interfaces, and contracts all rolled into one. If your app forgets its state when you hit refresh, you're missing one of the web's oldest and most elegant features.

What URLs give us for free:
- Shareability: send someone a link, and they see exactly what you see
- Bookmarkability: save a URL, and you've saved a moment in time
- Browser history: the back button just works
- Deep linking: jump directly into a specific application state

Where each part of the URL shines:
- Path segments: best used for hierarchical resource navigation, such as a user's posts, documentation structure, or application sections
- Query parameters: perfect for filters, options, and configuration, such as UI preferences, pagination, data filtering, and date ranges
- Anchor (#): ideal for client-side navigation and page sections, such as GitHub line highlighting, scrolling to a section, or single-page app routing (though it's rarely used these days)

Good candidates for URL state:
- Search queries and filters
- Pagination and sorting
- View modes (list/grid, dark/light)
- Date ranges and time periods
- Selected items or active tabs
- UI configuration that affects content
- Feature flags and A/B test variants

Poor candidates for URL state:
- Sensitive information (passwords, tokens, PII)
- Temporary UI states (modal open/closed, dropdown expanded)
- Form input in progress (unsaved changes)
- Extremely large or complex nested data
- High-frequency transient states (mouse position, scroll position)

URLs as cache keys:
- Same URL = same resource = cache hit
- Query params define cache variations
- CDNs can cache intelligently based on URL patterns

Raph Koster 1 month ago

Site updates

It’s been quite a while since the site was refreshed. I was forced into it by a PHP upgrade that rendered the old customizable theme I was using obsolete. We’re now running a new theme that has been styled to match the old one pretty closely, but I did go ahead and do some streamlining: far fewer plugins (especially ancient ones), simpler layout in several places, much better handling of responsive layouts for mobile, down to a single sidebar, and so on. All of this seems to have made the site quite a bit more performant, too. One of the big things that got fixed along the way is that images in galleries had a habit of displaying oddly stretched on Chrome and Edge, but not in Firefox. No idea what it was, but it seems to be fixed now. There are plenty of bits and bobs that still are not quite right. Keep an eye out and let me know if you see anything that looks egregiously wrong. Known issues: some of the lists of things, like presentations, essays, etc., are still funky. Breadcrumb styling seems to be inconsistent. The footer is a bit of a mess. If you do need to log in to comment, the Meta links are all in the footer for now. Virtually no one uses those links anymore, so having them up top didn’t seem to make sense… How things have changed! People tell me to move to Substack instead, but though I get the monetization factor, it rubs me wrong. I’d rather own my own site. Plus, it’s not like I am posting often enough to justify a ton of effort!

W. Jason Gilmore 2 months ago

Minimum Viable Expectations for Developers and AI

We're headed into the tail end of 2025 and I'm seeing a lot less FUD (fear, uncertainty, and doubt) amongst software developers when it comes to AI. As usual when it comes to adopting new software tools, I think a lot of the initial hesitancy had to do with everyone but the earliest adopters falling into three camps: don't, can't, and won't (spelled out at the end of this post). When it comes to AI adoption, I'm fortunately seeing the number of developers falling into these three camps continue to wane. This is good news, because it benefits both the companies they work for and the developers themselves. Companies benefit because AI coding tools, when used properly, unquestionably write better code faster for many (but not all) use cases. Developers benefit because they are freed from the drudgery of coding CRUD (create, retrieve, update, delete) interfaces and can instead focus on more interesting tasks.

Because this technology is so new, I'm not yet seeing a lot of guidance regarding setting employee expectations when it comes to AI usage within software teams. Frankly, I'm not even sure that most managers know what to expect. So I thought it might be useful to outline a few thoughts regarding MVEs (minimum viable expectations) when it comes to AI adoption.

Even if your developers refuse to use generative AI tools for large-scale feature implementation, the productivity gains to be had from simply adopting the intelligent code completion features are undeniable. A few seconds here and a few seconds there add up to hours, days, and weeks of time saved otherwise spent repeatedly typing for loops, commonplace code blocks, and the like. Agentic AIs like GitHub Copilot can be configured to perform automated code reviews on all or specific pull requests.
At Adalo we've been using Copilot in this capacity for a few months now, and while it hasn't identified any groundshaking issues, it certainly has helped to improve the code by pointing out subtle edge cases and syntax issues which could ultimately be problematic if left unaddressed.

In December 2024, Anthropic announced a new open standard called the Model Context Protocol (MCP), which you can think of as a USB-like interface for AI. This interface gives organizations the ability to plug both internal and third-party systems into AI, supplementing the knowledge already incorporated into the AI model. Since the announcement, MCP adoption has spread like wildfire, with MCP directories like https://mcp.so/ tracking more than 16,000 public MCP servers. Companies like GitHub and Stripe have launched MCP servers which let developers talk to these systems from inside their IDEs. In doing so, developers can, for instance, create, review, and ask AI to implement tickets without having to leave their IDE. As with the AI-first IDE's ability to perform intelligent code completion, reducing the number of steps a developer has to take to complete everyday tasks will in the long run result in significant amounts of time saved.

In my experience, test writing has ironically been one of AI's greatest strengths. SaaS products I've built such as https://securitybot.dev/ and https://6dollarcrm.com/ have far, far more test coverage than they would have ever had pre-AI. As of the time of this writing, SecurityBot.dev has more than 1,000 assertions spread across 244 tests. 6DollarCRM fares even better (although the code base is significantly larger), with 1,149 assertions spread across 346 tests. Models such as Claude 4 Sonnet and Opus 4.1 have been remarkably good test writers, and developers can further reinforce the importance of including tests alongside generated code within specifications.
AI coding tools such as Cursor and Claude Code tend to work much better when the programmer provides additional context to guide the AI. In fact, Anthropic places such emphasis on the importance of doing so that it appears first in its list of best practices. Anything deemed worth communicating to a new developer who has joined your team is worthy of inclusion in this context, including coding styles, useful shell commands, testing instructions, dependency requirements, and so forth. You'll also find publicly available coding guidelines for specific technology stacks. For instance, I've been using a set of Laravel coding guidelines for AI with great success.

The sky really is the limit when it comes to incorporating AI tools into developer workflows. Even though we're still in the very earliest stages of this technology's lifecycle, I'm both personally seeing enormous productivity gains in my own projects and greatly enjoying watching the teams I work with come around to their promise. I'd love to learn more about how you and your team are building processes around their usage. E-mail me at [email protected].

The three camps, for reference:
- Don't: developers don't understand the advantages for the simple reason that they haven't even given the new technology a fair shake.
- Can't: developers can't understand the advantages because they are not experienced enough to grasp the bigger picture when it comes to their role (problem solvers, not typists).
- Won't: developers won't understand the advantages because they refuse to, on the grounds that the new technology threatens their job or conflicts with their perception that modern tools interfere with their role as a "craftsman" (you should fire these developers).

iDiallo 2 months ago

The Modern Trap

Every problem, every limitation, every frustrating debug session seemed to have the same answer: use a modern solution. Modern encryption algorithms. Modern deployment pipelines. Modern database solutions. The word modern has become the cure-all, promising to solve not just our immediate problems, but somehow prevent future ones entirely.

I remember upgrading an app from PHP 5.3 to 7.1. It felt like it was cutting edge. But years later, 7.1 was also outdated. The application had a bug, and the immediate suggestion was to use a modern version of PHP to avoid this nonsense. But being stubborn, I dug deeper and found that the function I was using, though deprecated in newer versions, had had an alternative available since PHP 5.3. A quick fix prevented months of work rewriting our application.

The word "modern" doesn't mean what we think it means. Modern encryption algorithms are secure. Modern banking is safe. Modern frameworks are robust. Modern infrastructure is reliable. We read statements like these every day in tech blogs, marketing copy, and casual Slack conversations. But if we pause for just a second, we realize they are utterly meaningless. The word "modern" is a temporal label, not a quality certificate. It tells us when something was made, not how well it was made. Everything made today is, by definition, modern. But let's remember: MD5 was once the modern cryptographic hash. Adobe Flash was the modern way to deliver rich web content. Internet Explorer 6 was a modern browser. The Ford Pinto was a modern car. "Modern" is a snapshot in time, and time has a cruel way of revealing the flaws that our initial enthusiasm blinded us to.

Why do we fall for this? "Modern" is psychologically tied to "progress." We're hardwired to believe the new thing solves the problems of the old thing. And sometimes, it does! But this creates a dangerous illusion: that newness itself is the solution.
I've watched teams chase the modern framework because the last one had limitations, not realizing they were trading known bugs for unknown ones. I've seen companies implement modern SaaS platforms to replace "legacy" systems, only to create new single points of failure and fresh sets of subscription fees. We become so busy fleeing the ghosts of past failures that we don't look critically at the path we're actually on. "Modern" is often just "unproven" wearing a better suit. I've embraced modern before, being on the very edge of technology. But that meant I had to keep up to date with the tools I use. Developers spend more time learning new frameworks than mastering existing ones, not because the new tools are objectively better, but because they're newer, and thus perceived as better. We sacrifice stability and deep expertise at the altar of novelty. That modern library you imported last week? It's sleek, it's fast, it has great documentation and a beautiful logo. It also has a critical zero-day vulnerability that won't be discovered until next year, or a breaking API change coming in the next major version. "Legacy" codebases have their problems, but they often have the supreme advantage of having already been battle-tested. Their bugs are known, documented, and patched. In the rush to modernize, we discard systems that are stable, efficient, and perfectly suited to their task. I've seen reliable jQuery implementations replaced by over-engineered React applications that do the same job worse, with more overhead and complexity. The goal becomes "be modern" instead of "be effective." But this illusion of "modern" doesn't just lead us toward bad choices; it can bring progress to a halt entirely. When we sanctify something as "modern," we subtly suggest we've arrived at the final answer. Think about modern medicine. While medical advances are remarkable, embedded in that phrase is a dangerous connotation: that we've reached the complete, final word on human health. 
This framing can make it difficult to question established practices or explore alternative approaches. Modern medicine didn't think it was important for doctors to wash their hands. The same happens in software development. When we declare a framework or architectural pattern "modern," we leave little room for the "next." We forget that today's groundbreaking solution is merely tomorrow's foundation, or tomorrow's technical debt. Instead of modern, I prefer the terms "robust" or "stable." The most modern thing you can do is to look at any solution and ask: "How will this look obsolete in ten years?" Because everything we call "modern" today will eventually be someone else's legacy system. And that's not a bug, it's a feature. It's how progress actually works.

iDiallo 2 months ago

You are not going to turn into Google eventually

A few years back, I was running a CI/CD pipeline from a codebase that just kept failing. It pulled the code successfully, it passed the tests, the docker image was built, but then it would fail. Each run took around 15 minutes to fail, meaning I had to wait at least 15 minutes after every change before I knew whether it was successful or not. Of course, it failed multiple times before I figured out a solution. When I was done, I wasn't frustrated with the small mistake I had made; I was frustrated by the time it took to get any sort of feedback.

The codebase itself was trivial. It was a microservice with a handful of endpoints that was only occasionally used. The amount of time it took to build was not proportional to the importance of the service. Well, it took so long to build because of dependencies. Not the dependencies it actually used, but the dependencies it might use one day. The ones required because the entire build system was engineered for a fantasy future where every service, no matter how small, had to be pre-optimized to handle millions of users.

This is the direct cost of building for a scale you will never reach. It's the architectural version of buying a Formula 1 car to do your grocery shopping. It's not just overkill; it actively makes the simple task harder, slower, and infinitely more frustrating. We operate under a dangerous assumption that our companies are inevitably on a path to become the next Google or Meta. So we build like they do, grafting their solutions onto our problems, hoping it will future-proof us. It won't. It just present-proofs us. It saddles us with complexity where none is needed, creating a drag that actually prevents the growth we're trying to engineer for.

Here is why I like microservices. The concept is beautiful. Isolate a single task into a discrete, independent service. It's the Unix philosophy applied to the web: do one thing and do it well.
When a problem occurs, you should, in theory, be able to pinpoint the exact failing service, fix it, and deploy it without disrupting the rest of your application. If this sounds exactly like how a simple PHP include or a modular library works… you're exactly right.

And here is why I hate them. In practice, without Google-scale resources, microservices often create the very problems they promise to solve. You don't end up with a few neat services; you end up with hundreds of them. You're not in charge of maintaining all of them, and neither is anyone else. Suddenly, "pinpointing the error" is no longer a simple task. It's a pilgrimage. You journey through logging systems, trace IDs, and distributed dashboards, hoping for an epiphany. You often return a changed man: older, wiser, and empty-handed.

This is not to say you should avoid microservices at all costs; it's to say you should focus on the problems you have at hand instead of writing code for a future that may never come. Don't architect for a hypothetical future of billions of users. Architect for the reality of your talented small team. Build something simple, robust, and effective. Grow first, then add complexity only where and when it is absolutely necessary.

When you're small, your greatest asset is agility. You can adapt quickly, pivot on a dime, and iterate rapidly. Excessive process stifles this inherent flexibility. It introduces bureaucracy, slows down decision-making, and creates unnecessary friction. Instead of adopting the heavy, restrictive frameworks of large enterprises, small teams should embrace a more ad-hoc, organic approach. Focus on clear communication, shared understanding, and direct collaboration. Let your processes evolve naturally as your team and challenges grow, rather than forcing a square peg into a round hole.

0 views
Karboosx 2 months ago

In-house parsers are easy!

Ever wanted to build your own programming language? It sounds like a huge project, but I'll show you it's not as hard as you think. In this post, we'll build one from scratch, step-by-step, covering everything from the Tokenizer and Parser to a working Interpreter, with all the code in clear PHP examples.
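The pipeline the post promises (Tokenizer → Parser → Interpreter) starts with the easy part. As a taste, here is a minimal regex-based tokenizer sketch in PHP; the token names and grammar are my own illustration, not the post's actual code:

```php
<?php
// Minimal illustrative tokenizer: splits "1 + 2 * 3" into typed tokens.
// The token set here is hypothetical; a real language needs many more.
function tokenize(string $source): array {
    $spec = [
        'NUMBER' => '/^\d+/',
        'PLUS'   => '/^\+/',
        'STAR'   => '/^\*/',
        'SPACE'  => '/^\s+/',
    ];
    $tokens = [];
    while ($source !== '') {
        foreach ($spec as $type => $pattern) {
            if (preg_match($pattern, $source, $m)) {
                if ($type !== 'SPACE') {          // skip whitespace tokens
                    $tokens[] = [$type, $m[0]];
                }
                $source = substr($source, strlen($m[0]));
                continue 2;                       // restart the scan loop
            }
        }
        throw new Exception("Unexpected character: {$source[0]}");
    }
    return $tokens;
}

print_r(tokenize('1 + 2 * 3'));
```

A parser then consumes this flat token array and builds a tree, which the interpreter walks; the hard part is precedence, not tokenizing.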

0 views
Grumpy Gamer 2 months ago

Comments are back

“But? Wait?” I can hear you saying, “Isn’t grumpygamer.com a static site built by Hugo? What dark magic did you use to get comments on the static site?” No dark magic. But it does involve a small php script. You can embed php in a hugo page and since grumpygamer.com is hosted on my server and it’s running php it wasn’t that hard. No tricky javascript and since it’s all hosted by me, no privacy issues. All your comments stay on my server and don’t feed Big Comment. Comments are stored in flat files so no pesky SQL databases. It only took me about a day, so all in all not bad. I may regret this. I’m only turning on comments for future posts. P.S. I will post the code and a small guide in a few days, so you too can invite the masses to critique and criticize your every word. Good times.
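Flat-file comment storage really can be a small script. A sketch of the idea (file layout and field names are my invention, not the actual code, which the author says he'll publish later):

```php
<?php
// Append a comment to a flat file, one JSON object per line.
// The path and field names are illustrative assumptions.
$file = __DIR__ . '/comments/post-slug.txt';
if (!is_dir(dirname($file))) {
    mkdir(dirname($file), 0755, true);
}
if ($_SERVER['REQUEST_METHOD'] === 'POST') {
    $comment = [
        'name' => strip_tags($_POST['name'] ?? 'anonymous'),
        'body' => strip_tags($_POST['body'] ?? ''),
        'time' => time(),
    ];
    // LOCK_EX keeps two simultaneous comments from interleaving.
    file_put_contents($file, json_encode($comment) . "\n", FILE_APPEND | LOCK_EX);
}
// Rendering is just as direct: read, decode, print.
foreach (is_file($file) ? file($file, FILE_IGNORE_NEW_LINES) : [] as $line) {
    $c = json_decode($line, true);
    printf("<p><b>%s</b>: %s</p>\n",
        htmlspecialchars($c['name']), htmlspecialchars($c['body']));
}
```

No database, no JavaScript, and the comments live next to the site files where they can be backed up like everything else.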

0 views
Evan Hahn 3 months ago

Notes from August 2025

Things I published and things I saw this August. See also: my notes from last month , which has links to all the previous months so far. Most of my work this month was on private stuff, like some contracting and a demo app for a small social group. But I published a few little things: Over on Zelda Dungeon, I wrote a big guide showing how to play every Zelda in 2025 and a deranged post about my favorite Ocarina of Time item . Got invited to speak at Longhorn PHP in October , giving a version of a Unicode talk I’ve given before . Spent some time prepping that. Speaking of Unicode, I added a new script, , to my dotfiles . Now I can run to see . Thanks to Python’s library for making it easy! I also wrote a quick script, , to convert CSV files to Markdown tables. Hopefully I’ll have more blog posts in September! I started seeding some torrents of censored US government data . Cool project. From “The Militarization of Silicon Valley” : “In a major shift, Google, OpenAI, Meta and venture capitalists—many of whom had once forsworn involvement in war—have embraced the military industrial complex.” For more, see this investigation , this Google policy update from February , or even the story of the invention of the internet . “Instead of building our own clouds, I want us to own the cloud. Keep all of the great parts about this feat of technical infrastructure, but put it in the hands of the people rather than corporations. I’m talking publicly funded, accessible, at cost cloud-services.” Via “The Future is NOT Self-Hosted” . “Today, people find it easier to imagine that we can build intelligence on silicon than we can do democracy at scale, or that we can escape arms races. It’s complete bullshit. Of course we can do democracy at scale. We’re a naturally social, altruistic, democratic species and we all have an anti-dominance intuition. This is what we’re built for.” A positive quote from an otherwise worrying article about societal collapse . 
“As is true with a good many tech companies, especially the giants, in the AI age, OpenAI’s products are no longer primarily aimed at consumers but at investors.” From the great Blood in the Machine newsletter . “Slide 1: car brands using curl. Slide 2: car brands sponsoring or paying for curl support”. 38 car brands are listed on the first slide, zero on the second. “How to not build the Torment Nexus” describes how I feel about the tech industry. If you work at Meta or Palantir, the most ethical thing to do is quit. I finished The Interesting Narrative of the Life of Olaudah Equiano this month. Honestly, I picked it up because it’s free in the public domain. I liked the writing style of a book published in the late 1700s, and the rambling accounts of day-to-day life. Who’s writing sentences like this nowadays: “Hitherto I had thought only slavery dreadful; but the state of a free negro appeared to me now equally so at least, and in some respects even worse, for they live in constant alarm for their liberty; and even this is but nominal, for they are universally insulted and plundered without the possibility of redress; for such is the equity of the West Indian laws, that no free negro’s evidence will be admitted in their courts of justice.” Mina the Hollower , a Zelda -inspired game by the developers of Shovel Knight , released a demo this month. I loved it! You can read my experience with the game over at Zelda Dungeon . Tatami is a casual iOS game mixing Sudoku and nonograms. Enjoyed this too. Hope you had a good August.

0 views
Brain Baking 3 months ago

Indispensable Cloud It Yourself Software: 2025 Edition

It’s been too long since this blog published a meaningless click-bait list article, so here you go. Instead of simply enumerating frequently used apps such as app defaults from late 2023 , I thought it might be fun to zoom in on the popular self-hosted branch and summarize what we are running to be able to say Fuck You to the Big Guys. Below is a list of software that we depend on, categorized by usage. I’m sure you can figure out for yourself how to run these in a container on your NAS. We still have a Synology, and while I strongly dislike the custom Linux distribution’s tendency to misplace configuration files, the DSM software that comes with it is good enough to cover a lot of topics. The list excludes typical Linux sysadmin stuff such as fail2ban, ssh key setup, Samba, … Photos : PhotoPrism . It comes with WebDAV support to easily sync photos from your phone. My wife’s iPhone uses PhotoSync which works flawlessly. I’d rather also use SyncThing on iOS like I do on Android (or the latest Android client fork). SyncThing is amazing and I use it for much more than photo syncing. Streaming videos : Synology’s built-in Video Station . It’s got a lot of flaws and Jellyfin is clearly the better choice here. As for how to get the videos on there: rip & tear using good old DVDShrink on the WinXP Retro Machine! We still use the old DS Video Android app on our smart box to connect to the NAS as we don’t have a smart TV. Streaming music : Navidrome —see How To Stream Your Own Music: Reprise for more info on which clients we use and why caching some albums locally is good enough. As for how to get the music on there: rip & tear using A Better CD Encoder , or, for Win98 lovers, WinGrab. Backups : Restic —see Verify Your Backup Strategy to see how this automatically works from multiple machines. Smart Home : Home Assistant with a HomeWizard P1 meter that monitors our gas/electricity usage locally instead of sending it god knows where. 
We only use the bare minimum features; I’m not a big Smart Home fan. I suppose WireGuard should also be in this category but for now I refuse to enable the possibility to dial home . Ads/DHCP : Pi-Hole . That wonderful piece of software blocks almost 15% of our daily traffic—see Six Months With A Pi-Hole . We also use it as a DHCP server to have more control over DNS. Wi-Fi : TP-Link Omada Controller that provisions and secures our access points locally instead of poking through the real cloud for no reason at all. Git : Gitea although I should really migrate to Forgejo. The NAS hosts my private projects; I have another instance on the VPS for public ones. RSS : FreshRSS . Until recently, just NetNewsWire as an RSS client did just fine, but I sometimes caught myself doomscrolling on my phone, so I figured I’d instead scroll on other people’s blogs. NetNewsWire supports it so my reading behaviour doesn’t change on the computer. Pair it with Readrops on Android, which also caches entries, so if I’m disconnected from the home network I can still read interesting stuff. I do not see the appeal of cloud-based office software so I simply rely on LibreOffice to do its thing locally—no need for NextCloud, but it’s there if you want it. Speaking of which, I still use DEVONthink and of course Obsidian to manage my personal files/databases that hook into the above using SyncThing and Restic. Abandoned software: RSS Bridge (no longer needed), Watchtower (too complex for my simple setup), some kind of PHP-based accounting software I already forgot about. Software running publicly on the VPS: Radicale CardDAV/CalDAV server (I want this to be accessible outside of the NAS), another Gitea instance, Nginx (I really need to migrate to Caddy) et al. Related topics: / self-hosted / NAS / lists / By Wouter Groeneveld on 21 August 2025.  Reply via email .
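The post leaves the "run these in a container" part as an exercise. For one of the listed services (FreshRSS), a minimal sketch might look like this; the port mapping, volume path, and timezone are my assumptions, not the author's setup:

```yaml
# Illustrative docker-compose fragment for self-hosting FreshRSS.
# Port, volume path, and TZ are assumptions; adjust for your NAS.
services:
  freshrss:
    image: freshrss/freshrss:latest
    ports:
      - "8080:80"          # web UI on http://nas:8080
    volumes:
      - ./freshrss/data:/var/www/FreshRSS/data
    environment:
      - TZ=Europe/Brussels
    restart: unless-stopped
```

The same pattern (image, one or two volumes for state, restart policy) covers most of the services in this list, which is much of the appeal of the self-hosted approach.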

0 views
James Stanley 5 months ago

The Story of Max, a Real Programmer

This is a story about Imagebin. Imagebin is the longest-lived software project that I still maintain. I'm the only user. I use it to host images, mainly to include in my blog, sometimes for sharing in other places. Imagebin's oldest changelog entry is dated May 2010, but I know it had already existed for about a year before I had the idea of keeping a changelog. Here's an image hosted by Imagebin: For years Imagebin was wide open to the public and anybody could upload their own images to it. Almost nobody did. But a couple of years ago I finally put a password on it during a paranoid spell. But actually this is not a story about Imagebin. This is a story about the boy genius who wrote it, and the ways of his craft. Lest a whole new generation of programmers grow up in ignorance of this glorious past, I feel duty-bound to describe, as best I can through the generation gap, how a Real Programmer wrote code. I'll call him Max, because that was his name. Max was a school friend of mine. He didn't like to use his surname on the internet, because that was the style at the time, so I won't share his surname. Max disappeared from our lives shortly after he went to university. We think he probably got recruited by MI6 or something. This weekend I set about rewriting Max's Imagebin in Go so that my server wouldn't have to run PHP any more. And so that I could rid myself of all the distasteful shit you find in PHP written by children 15+ years ago. I don't remember exactly what provoked him to write Imagebin, and I'm not even certain that "Imagebin" is what he called it. That might just be what I called it. I was struck by how much better Max's code is than mine! For all its flaws, Max's code is simple . It just does what it needs to do and gets out of the way. Max's Imagebin is a single 233-line PHP script, interleaving code and HTML, of which the first 48 lines are a changelog of what I have done to it since inheriting it from Max. So call it 185 lines of code. 
At school, Max used to carry around a HP 620LX in his blazer pocket. Remember this was a time before anyone had seen a smartphone. Sure, you had PDAs, but they sucked because they didn't have real keyboards. The HP 620LX was a palmtop computer , the height of coolness. My Go version is 205 lines of code plus a 100-line HTML template, which is stored in an entire separate file. So call it 305 lines plus a complexity penalty for the extra file. And my version is no better! And my version requires a build step, and you need to leave the daemon running. With Max's version you just stick the PHP file on the server and it runs whenever the web server asks it to. And btw this is my third attempt at doing this in Go. I had to keep making a conscious effort not to make it even more complicated than this. And some part of me doesn't even understand why my Go version is so much bigger. None of it looks extraneous. It has a few quality-of-life features, like automatically creating the directories if they don't already exist, and supporting multiple files in a single upload, but nothing that should make it twice as big. Are our tools just worse now? Was early 2000s PHP actually good? While I was writing this, I noticed something else: Max's code doesn't define any functions! It's just a single straight line. Upload handling, HTML header, thumbnail code, HTML footer. When you put it like that, it's kind of surprising that it's so large. It hardly does anything at all! Max didn't need a templating engine, he just wrote HTML and put his code inside <?php tags. Max didn't need a request router, he just put his PHP file at the right place on the disk. Max didn't need a request object, he just got query parameters out of $_GET . Max didn't need a response writer, he just printed to stdout . And Max didn't need version control, he just copied the file to index.php.bak if he was worried he might want to revert his changes. 
You might think that Max's practices make for a maintenance nightmare. But I've been "maintaining" it for the last 15 years and I haven't found it to be a nightmare. It's so simple that nothing goes wrong. I expect I'd have much more trouble getting my Go code to keep running for the next 15 years. And yeah we all scoff at these stupid outdated practices, but what's our answer? We make a grand effort to write a simpler, better, modern replacement, and it ends up twice as complicated and worse? The reason the Go code is so much bigger is that it checks and (kind of) handles errors everywhere (?) they could occur. The PHP code just ignores them and flies on through regardless. But even if you get rid of checking for the more unlikely error cases, the Go version is longer. It's longer because it's structured across several functions, and with a separate template. The Go version is Designed . It's Engineered . But it's not better . I think there are lessons to (re-)learn from Max's style. You don't always have to make everything into a big structure with lots of moving parts. Sometimes you're allowed to just write simple straight-line code. Sometimes that's fine. Sometimes that's better. Longevity doesn't always come from engineering sophistication. Just as often, longevity comes from simplicity . To be perfectly honest, as a teenager I never thought Max was all that great at programming. I thought his style was overly-simplistic. I thought he just didn't know any better. But 15 years on, I now see that the simplicity that I dismissed as naive was actually what made his code great. Whether that simplicity came from wisdom or from naivety doesn't matter. The result speaks for itself. So I'm not going to bother running my Go version of the Imagebin. I'm going to leave Max's code in place, and I'm going to let my server keep running PHP. And I think that's how it should be. I didn't feel comfortable hacking up the code of a Real Programmer.
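The straight-line style described above (no functions, no router, no templates: upload handling, then HTML header, body, footer) can be sketched like this. To be clear, this is my illustration of the style, not Max's actual code:

```php
<?php
// Straight-line style: one file, top to bottom, no functions.
// Directory name and form fields are illustrative, not Max's.
if (!empty($_FILES['image']) && $_FILES['image']['error'] === UPLOAD_ERR_OK) {
    $name = basename($_FILES['image']['name']);   // strip any path tricks
    move_uploaded_file($_FILES['image']['tmp_name'], __DIR__ . "/images/$name");
}
?>
<html><body>
<form method="post" enctype="multipart/form-data">
  <input type="file" name="image"> <input type="submit" value="Upload">
</form>
<?php foreach (glob(__DIR__ . '/images/*') as $img): ?>
  <img src="images/<?php echo htmlspecialchars(basename($img)); ?>" width="200">
<?php endforeach; ?>
</body></html>
```

Drop the file on a PHP-enabled server and it runs; there is no build, no daemon, and nothing between the request and the code that handles it.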

0 views
James Stanley 1 years ago

Prompts as source code: a vision for the future of programming

I'm going to present a vision for the future of programming in which programmers don't work with source code any more. The idea is that prompts will be to source code as source code is to binaries. In the beginning (I claim) there were only binaries, and without loss of generality, assembly language. (If you think binaries and assembly language are too far apart to lump together: keep up grandad, you're thinking too low-level; just wait until the further future where source code and binaries are too close together to distinguish!). Then somebody invented the compiler . And now it was possible to write code in a more natural language and have the machine automatically turn it into binaries! And we saw that it was good. As hardware resources grew, the compilers' capabilities grew, and now the idea that there was programming before compilers is pretty weird to new developers. Almost no one is writing assembly language and even fewer write bare machine code. Now take LLMs. If you create software using an LLM today, you probably give an initial prompt to get started, and then you refine the generated source code by giving follow-up prompts to ask for changes, and you never revisit your initial prompt. It's just a series of "patches" created by follow-up prompts. This is like programming by writing source code once, compiling it, and then throwing the source code away and working directly on the binary with incremental patches! Which is just obviously crazy. So here's my outline for "prompts as source code": The prompts will be committed to git, the generated source code will not. The prompts will be big, and split across multiple files just like source code is now, except it's all freeform text. We just give the LLM a directory tree full of text files and ask it to write the program. The prompts will be unimaginably large by today's standards. 
Compare the size of the Linux or Firefox source trees to the total amount of machine code that had ever been written in the entire history of the world before the first compiler was invented. (To spell it out: the future will contain LLM prompts that are larger than all of the source code that humanity combined has ever written in total up to this point in time.) Our build system will say which exact version of the LLM you're using, and it will be evaluated deterministically so that everybody gets the same output from the same prompt (reproducible builds). The LLMs will be bigger than they are today, have larger context windows, etc., and as the LLMs improve, and our understanding of how to work with them improves, we'll gain confidence that small changes to the prompt have correspondingly small changes in the resulting program. It basically turns into writing a natural language specification for the application, but the specification is version-controlled and deterministically turns into the actual application. Human beings will only look at the generated source code in rare cases (how often do you look at assembly code today?). Normally they'll just use their tooling to automatically build and run the application directly from the prompts. You'll be able to include inline code snippets in the prompt, of course. That's a bit like including inline assembly language in your source code. And you could imagine the tooling could let you include some literal code files that the LLM won't touch, but will be aware of, and will be included verbatim in the output. That's a bit like linking with precompiled object files. Once you have a first version that you like, there could be a "backwards pass" where an LLM looks at the generated source code and fills in all the gaps in the specification to clarify the details, so that if you then make a small change to the prompt you're more likely to get only a small change in the program. 
You could imagine the tooling automatically running the backwards pass every time you build it, so that you can see in your prompts exactly what assumptions you're baking in. That's my vision for the future of programming. Basically everything that today interacts with source code and/or binaries, we shift one level up so that it interacts with prompts and/or source code. What do you think? Although we could make an initial stab at the tooling today, I feel like current LLMs aren't quite up to the job. First, context windows are too small for all but toy applications (OK, you might fit your spec in the context window, but you also want the LLM to do some chain-of-thought before it starts writing code). Second, as far as I know, it's not possible to run the best LLMs (Claude, gpt4o) deterministically, and even if it were, they are cloud-hosted and proprietary, which is an extremely shaky foundation for a new system of programming. You could use Llama 405b, but GPUs are too expensive and too slow. Third, we'd need the LLMs to be extraordinarily intelligent and able to follow every tiny detail in the prompt, in order for small changes to the prompt not to result in random bugs getting switched on/off, the UI randomly changing, file formats randomly changing, etc. Finally, I haven't quite figured out how you "update" the LLM without breaking your program; you wouldn't want to be stuck on the same version forever. This feels similar to, but harder than, the problem of switching PHP versions, for example.

0 views
W. Jason Gilmore 1 years ago

Technical Due Diligence - Relational Databases

Despite the relative popularity of NoSQL and graph databases, relational databases like MySQL, SQL Server, Oracle, and PostgreSQL continue to be indispensable for storing and managing software application data. Because of this, technical due diligence teams are practically guaranteed to encounter them within almost any project. Novice team members will gravitate towards understanding the schema, which is of course important but only paints a small part of the overall risk picture. A complete research and risk assessment will additionally include information about the following database characteristics: I identify these three characteristics because technical due diligence is all about identifying and quantifying risk, and not about nerding out over the merit of past decisions. Quantifying risk is rarely more important than when evaluating the software product's data store, for several reasons: Be sure to confirm all database licenses are in compliance with the company's use case, and if the database is commercially licensed you'll need to additionally confirm the available features and support contract are in line with expectations. To highlight the importance of this verification work I'll point out a few ways in which expectations might not be met: All mainstream databases (MySQL, Oracle, PostgreSQL, etc.) will have well-defined end-of-life (EOL) dates associated with each release. The EOL date identifies the last date on which that particular version will receive security patches. Therefore it is critical to determine what database versions are running in production in order to determine whether the database has potentially been running in an unpatched state. For instance MySQL 5.7 has an EOL date of October 25, 2023, and therefore if the seller's product is still running MySQL 5.7 after that date then it is in danger of falling prey to any vulnerabilities identified after that EOL date. 
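Determining the running version is usually a one-line check. A sketch, assuming shell access to the database host (hostname and user are placeholders):

```shell
# Ask the server itself for its version (credentials are placeholders).
mysql -h db.internal -u audit -p -e "SELECT VERSION();"

# PostgreSQL equivalent:
psql -h db.internal -U audit -c "SELECT version();"
```

Compare the reported version against the vendor's published EOL schedule; a version past EOL, or far behind the current patch release, is an immediate diligence finding.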
Of course, the EOL date isn't the only issue at play here. If the database version hasn't reached its EOL date then you should still determine whether the database has been patched appropriately. For instance, as of the time of this writing MySQL 8.2 was released only 9 months ago (on October 12, 2023) and there are already 11 known vulnerabilities . It's entirely possible that none of these vulnerabilities are exploitable in the context of the seller's product; however, it's nonetheless important to catalog these possibilities and supply this information to the buyer. In my experience, where there is smoke there is fire, and unpatched software is often symptomatic of much larger issues associated with technical debt and a lack of developer discipline. Enterprise web applications will typically run in an N-Tier architecture, meaning the web, data, caching, and job processing components can all be separately managed and scaled. This configuration means each tier will often run on separate servers, and therefore a network connection between the database and web application tiers will need to be configured. Most databases can be configured to allow for connections from anywhere (almost invariably a bad idea), which is precisely what you don't want to see when that database is only intended to be used by the web application, because it means malicious third parties have a shot at successfully logging in should they gain access to or guess the credentials. Connecting users will be associated with a set of privileges which define what the user can do once connected to the database. It is considered best practice to assign those users the minimum privileges required to carry out their tasks. 
Therefore a database user which is intended to furnish information to a data visualization dashboard should be configured with read-only privileges, whereas a customer relationship management (CRM) application would require a database user possessing CRUD (create, retrieve, update, delete) database privileges. Therefore when examining database connectivity and privileges you should at a minimum answer the following questions: What users are defined and active on the production databases, and from what IP addresses / hostnames are they accessible? Is the database server accessible to the wider internet and if so, why? What privileges do the defined database users possess, and why? To what corporate applications are production databases connected? This includes the customer-facing application, business intelligence software, backup services, and so forth. What other non-production databases exist? Where is production data replicated? Are these destinations located within jurisdictions compliant with the laws and SLA under which the buyer's target IP operates? Satisfying this review requirement is relatively straightforward. The stakes are high for several reasons. Poor security practices open up the possibility of application data having already been stolen, or being in danger of imminent theft, placing the buyer in legal danger. Poor performance due to inadequate or incorrect indexing, insufficient resourcing, or a combination of the two might result in disgruntled customers who are considering cancelling their subscriptions. Some of these customers may be major contributors to company revenue, severely damaging the company's outlook should they wind up departing following acquisition. A lack of disaster recovery planning puts the buyer at greater short-term risk following acquisition due to an outage which may occur precisely at a time when personnel are not available or are not entirely up to speed. Expectations can also fall short in subtler ways. The buyer may require the data to be encrypted at-rest due to regulatory issues, while the product data is in fact not encrypted at-rest due to use of the Heroku Essential Postgres tier, which does not offer this feature. There could be an easy fix here which involves simply upgrading to a tier which does support encryption-at-rest; however, you should receive vendor confirmation (in writing) that encryption is indeed possible as a result of upgrading, and whether any downtime will be required to achieve this. Or the buyer's downtime expectations may be stricter than what is defined by the cloud service provider's SLA. From a security standpoint, data is often defined as being encrypted at-rest and in-transit , the former referring to its encryption state when residing in the database or on server, and the latter referring to its encryption state when being transferred from the application to the requesting user or service. You'll want to determine whether these two best practices are implemented. If the data is not encrypted at-rest (which is typical and not necessarily an issue for many use cases), then how is sensitive data like passwords encrypted (or hashed)? You often won't be able to determine this by looking at the database itself; web frameworks will typically dictate the password hashing scheme, such as Laravel's use of the Bcrypt algorithm for this purpose.
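The Laravel/Bcrypt point generalizes: a reviewer can often recognize the hashing scheme from the stored values themselves. In plain PHP the same hashing is available via password_hash; a small sketch (my example, not from the post):

```php
<?php
// Bcrypt hashes produced by PHP begin with $2y$; this is what
// frameworks like Laravel generate under the hood.
$hash = password_hash('s3cret', PASSWORD_BCRYPT);
echo $hash, "\n";   // e.g. $2y$10$... (salt and cost embedded in the hash)

// Verification never compares strings directly:
var_dump(password_verify('s3cret', $hash));   // bool(true)
```

During diligence, a column full of $2y$... values suggests bcrypt; fixed-length hex strings suggest the far weaker MD5/SHA-1, which is itself a finding.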

0 views
dfir.ch 1 years ago

From Dangerous PHP Functions to Webshell Hunting

This blog post discusses how to enhance PHP security using the disable_functions directive, which prevents specific PHP functions from being executed. We further explore webshell detection techniques, highlighting the challenges of identifying webshells using Yara rules, proposing alternatives like manual analysis, frequency analysis of web server logs, and utilizing tools like Velociraptor and UAC along the way.

Introduction

The disable_functions directive in PHP is a security feature that allows administrators to disable specific PHP functions from being executed within PHP scripts.
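A typical hardening entry in php.ini might look like the following. The exact function list must be tailored to the application (some legitimately need exec); this is an illustrative baseline, not the post's specific recommendation:

```ini
; php.ini -- block the functions most commonly abused by webshells.
; disable_functions takes a comma-separated list and cannot be
; re-enabled at runtime from within a script.
disable_functions = exec,passthru,shell_exec,system,proc_open,popen,pcntl_exec
```

After changing the directive, restart PHP-FPM (or the web server) and confirm with a test script that the functions now raise a warning instead of executing.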

0 views
W. Jason Gilmore 1 years ago

Minimal SaaS Technical Due Diligence

For more than six years now I've been deeply involved in and in recent years leading Xenon Partners ' technical due diligence practice. This means that when we issue an LOI (Letter of Intent) to acquire a company, it's my responsibility to dig deep, very deep, into the often arcane technical details associated with the seller's SaaS product. Over this period I've either been materially involved in or led technical due diligence for DreamFactory , Baremetrics , Treehouse , Packagecloud , Appsembler , UXPin , Flightpath Finance , as well as several other companies. While I've perhaps not seen it all, I've seen a lot, and these days whenever SaaS M&A comes up in conversation, I tend to assume the thousand-yard stare , because this stuff is hard . The uninitiated might be under the impression that SaaS technical due diligence involves "understanding the code". In reality, the code review is but one of many activities that must be completed, and in the grand scheme of things I wouldn't even put it in the top three tasks in terms of priority. Further complicating the situation is the fact that sometimes due to circumstances beyond our control we need to close a deal under unusually tight deadlines, meaning it is critically important that this process is carried out with extreme efficiency. Due to the growing prevalence of SaaS acquisition marketplaces like Acquire.com and Microassets , lately I've been wondering what advice I would impart to somebody who wants to acquire a SaaS company yet who possesses relatively little time, resources, and money. What would be the absolute minimum requirements necessary to reduce acquisition risk to an acceptable level? This is a really interesting question, and I suppose I'd focus on the following tasks. Keep in mind this list is specific to the technology side of due diligence; there are also financial, operational, marketing, legal, and HR considerations that also need to be addressed during this critical period. 
I am not a lawyer, nor an accountant, and therefore do not construe anything I say on this blog as being sound advice. Further, in this post I'm focused on minimal technical due diligence, and largely assuming you're reading this because you're interested in purchasing a micro-SaaS or otherwise one run by an extraordinarily small team. For larger due diligence projects there are plenty of other critical tasks to consider, including technical team interviews. Perhaps I'll touch upon these topics in future posts.

Please note I did not suggest asking for architectural diagrams. Of course you should ask for them, but you should not believe a single thing you see on the off chance they even exist. They'll tell you they do exist, but they likely do not. If they do exist, they are almost certainly outdated or entirely wrong. But I digress.

On your very first scheduled technical call, open a diagramming tool like Draw.io and ask the seller's technical representative to please begin describing the product's architecture. If they clam up or are unwilling to do so (it happens), then start drawing what you believe to be true, because when you incorrectly draw or label part of the infrastructure, the technical representative will suddenly become very compelled to speak up and correct you. These diagrams don't have to be particularly organized nor aesthetically pleasing; they just need to graphically convey as much information as possible about the application, infrastructure, third-party services, and anything else of relevance. Here's an example diagram I created on Draw.io for the purposes of this post:

Don't limit yourself to creating a single diagram!
I suggest additionally creating diagrams for the following:

We have very few requirements that, if not met, will wind up in a deal getting delayed or even torpedoed; however, one of them is that somebody on our team must successfully build the development environment on their local laptop and subsequently successfully deploy to production. This is so important that we will not acquire the company until these two steps are completed. These steps are critical because in completing them you confirm: Keep in mind you don't need to add a new feature or fix a bug in order to complete this task (although either would be a bonus). You could do something as simple as adding a comment or fixing a typo. At this phase of the acquisition process you should steadfastly remain in "do no harm" mode: you are only trying to confirm your ability to successfully deploy the code, not make radical improvements to it.

This isn't strictly a technical task, but it's so important that I'm going to color outside the lines and at least mention it here. The software product you are considering purchasing is almost unquestionably built atop the very same enormous open source software (OSS) ecosystem from which our entire world has benefited. There is nothing at all wrong with this, and in fact I'd be concerned if it wasn't the case; however, you need to understand that there are very real legal risks associated with OSS licensing conflicts. As I've already made clear early in this post, I am not a lawyer, so I'm not going to offer any additional thoughts regarding the potential business risks other than to simply bring the possibility to your attention.

The software may additionally very well rely upon commercially licensed third-party software, and it is incredibly important that you know whether this is the case. If so, what are the terms of the agreement? Has the licensing fee already been paid in full, or is it due annually? What is the business risk if this licensor suddenly triples fees?
There are actually a few great OSS tools that can help with dependency audits. Here are a few I've used in the past: That said, for reasons I won't go into here (because again, IANAL), it is incumbent upon the seller to disclose licensing issues. The buyer should only be acting as an auditor, and not the investigatory lead with regards to potential intellectual property conflicts. You should always retain legal counsel for these sorts of transactions.

Finally, if the software relies on third-party services (e.g., OpenAI APIs) to function (it almost certainly does), many of the same aforementioned questions apply. How critical are these third-party services? At some point down the road could you reasonably swap them out for a better or more cost-effective alternative?

A penetration test (pen test) is an authorized third-party cybersecurity attack on a computer system. In my experience, for SaaS products these pen tests cost anywhere between $5K and $10K and take 1-2 weeks to complete once scheduled. A lengthy report is typically delivered by the pen tester, at which point the company can dispute/clarify the findings or resolve the security issues and arrange for a subsequent test. Also in my experience, if you're interested in purchasing a relatively small SaaS with no employees other than the founder, it's a practical certainty the product has never been pen tested. Further, if the SaaS is web-based and isn't using a web framework such as Ruby on Rails or Laravel, for more reasons than I could possibly enumerate here I'd be willing to bet there are gaping security holes in the product (SQL injection, cross-site scripting, etc.) which may have already been exploited. Therefore you should be sure to ask if a pen test has recently been completed, and if so ask for the report and information about any subsequent security-related resolutions.
If one has not been completed, then it is perfectly reasonable to ask (in writing) why this has not been the case, and whether the seller can attest to the fact that the software is not known to have been compromised. If the answers to these questions are not satisfactory, then you might consider asking the seller to complete a pen test, or ask if you can arrange for one on your own dime. If you're sufficiently technical and have a general familiarity with cybersecurity concepts such as the OWASP Top Ten, then you could conceivably lower the costs associated with this task by taking a DIY approach. Here is a great list of bug bounty tools that could be used for penetration test purposes. That said, please understand that you should under no circumstances use these tools to test a potential seller's web application without their written permission!

If you think the SaaS you're considering buying doesn't have any technical debt, then consider the fact that even the largest and most successful software products in the world are filled with it: That said, due to perfectly reasonable decisions made literally years ago, it is entirely possible that this "UI change" isn't fixable in 3 months, let alone 3 days. And there is a chance it can't ever be reasonably fixed, and anybody who has built sufficiently complicated software is well aware as to why. Technical debt is a natural outcome of writing software, and there's nothing necessarily wrong with it provided you acknowledge its existence and associated risks. But there are limits to risk tolerance, and if the target SaaS is running on operating systems, frameworks, and libraries that have long since been deprecated and are no longer able to receive maintenance and security updates, then you should recognize that you're probably going to be facing some unwelcome challenges in the near term as you update the software and infrastructure instead of focusing on the actual business.
Of everything that comprises technical due diligence, there is nothing that makes me break out into a sweat more than this topic. Any SaaS product will rely upon numerous if not dozens of credentials. GSuite, AWS, Jenkins, Atlassian, Trello, Sentry, Forge, Twitter, Slack... the list is endless. Not to mention SSH keys, 2FA settings, bank accounts, references to founder PII such as birthdates, and so forth. In a perfect world all of this information would be tidily managed inside a dedicated password manager, but guess what: it's probably not. I cannot possibly impress upon you in this post how important it is to aggressively ask for, review, and confirm access to everything required to run this business, because once the paperwork is signed and money transferred, it's possible the seller will be a lot less responsive to your requests.

Ensuring access to all credentials is so critical that you might consider structuring the deal to indicate that part of the purchase price will be paid at some future point in time (90 days from close, for example) in order to ensure the founder remains in contact with you for a period of time following acquisition. This will give you the opportunity to follow up via email/Zoom and gain access to services and systems that were previously missed.

This blog post barely scratches the surface in terms of what I typically go through during a full-fledged due diligence process, but I wanted to give interested readers a baseline understanding of the minimum requirements necessary to assuage my personal levels of paranoia. If you have any questions about this process, feel free to hit me up at @wjgilmore, message me on LinkedIn, or email me at [email protected].

Diagrams worth creating, as referenced above:

- Cloud infrastructure: For instance, if the seller is using AWS, try to identify the compute instance sizes, RDS instances, security groups, monitoring services, etc. The importance of diagramming the cloud infrastructure becomes even more critical if Kubernetes or other containerized workload solutions are implemented, not only due to the additional complexity but also because, frankly, in my experience these sorts of solutions tend to not be implemented particularly well.
- Deployment strategy: If CI/CD is used, what does the deployment process look like? What branch triggers deployments to staging and production? Is a test suite run as part of the deployment process? How is the team notified of successful and failed deployments?

The local build and deploy steps confirm that:

- You're able to successfully clone the (presumably private) repository and configure the local environment.
- You're able to update the code, submit a pull request, and participate in the subsequent review, testing, and deployment process (if one even exists).
- You've gained insights into what technologies, processes, and services are used to manage company credentials, build the development environment, merge code into production, run tests, and trigger deployments.

Dependency audit tools mentioned above:

- LicenseFinder

W. Jason Gilmore 1 year ago

Disabling SSL Validation in Bruno

I use Laravel Herd to manage local Laravel development environments. Among many other things, it can generate self-signed SSL certificates. This is very useful; however, modern browsers and other HTTP utilities tend to complain about these certificates. Fortunately, it's easy to disable SSL validation by opening Bruno, navigating to under the menu heading, and unchecking .
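Bruno isn't the only client that trips over Herd's self-signed certificates; scripts do too. Here is a sketch of the equivalent escape hatch in PHP's cURL extension, for local development only. The function name and URL are made up for illustration, and this should never ship to production:

```php
<?php
// Local development only — never disable SSL verification in production.
// insecureLocalClient() and the .test URL are illustrative, not part of Herd or Bruno.
function insecureLocalClient(string $url): CurlHandle
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_SSL_VERIFYPEER => false, // accept self-signed certificates
        CURLOPT_SSL_VERIFYHOST => 0,     // skip hostname verification
    ]);
    return $ch;
}
```

Call `curl_exec()` on the returned handle as usual; the request will succeed against a Herd-generated certificate.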


The Best "Hello World" in Web Development

Here’s how you make a webpage that says “Hello World” in PHP: Name that file and you’re set. Awesome. Version 1 of our website looks like this:

Okay, we can do a little better. Let’s add the HTML doctype and element to make it a legal HTML5 page, an header to give the “Hello World” some heft, and a paragraph to tell our visitor where they are. This is a complete webpage! If you host this single file at one of the many places available to host PHP code, it will show that webpage to everyone who visits your website. Here’s Version 2:

Now let’s make Version 3 comic sans! And baby blue! We just have to add a style tag:

Already our webpage has a little bit of personality, and we’ve spent just a couple minutes on it. At each step we could see the website in a browser, and keep adding to it. We haven’t even used any PHP yet—all this is plain old HTML, which is much easier to understand than PHP. This is the best “Hello World” in web development, and possibly all of programming.

The thing that first got me interested in PHP is a comment that Ruby on Rails creator David Heinemeier Hansson made on the “CoRecursive” podcast, about PHP’s influence on Rails: […] the other inspiration, which was from PHP, where you literally could do a one line thing that said, “Print hello world,” and it would show a web page. It would show Hello World on a web page. You just drop that file into the correct folder, and you were on the web […] I think to this day still unsurpassed ease of Hello World.

He’s right—this is an unsurpassed ease of Hello World. It is certainly not surpassed by Ruby on Rails, the “Getting Started” guide for which not only requires installing Ruby, SQLite, and Rails itself, but also has you run an initialization command ( ) that creates a genuinely shocking number of files and directories: Of course, Rails is doing a lot of stuff for you! It’s setting up a unit test framework, a blog content folder, a database schema, whatever is, and so on.
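Reconstructed from the description above, the combined Version 2/3 page would look something like this. The exact markup, styling, and filename are my guesses at the spirit of it, not the author's exact code:

```php
<?php
// Reconstruction, not the original post's code.
// Save as e.g. index.php and serve it with any PHP-capable host.
?>
<!DOCTYPE html>
<html>
  <head>
    <style>
      /* Version 3: comic sans and baby blue */
      body { font-family: "Comic Sans MS", cursive; background-color: lightblue; }
    </style>
  </head>
  <body>
    <h1>Hello World</h1>
    <p>This is my website.</p>
  </body>
</html>
```

One file, no build step: refresh the browser after every edit and keep going.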
If I wanted all of that, then Rails might be the way to go. But right now I want to make a webpage that says “Hello World” and start adding content to it; I should not have to figure out what a is to do that. As a reminder, here’s the directory structure for our “Hello World” in PHP:

My goal here isn’t to rip on Ruby on Rails—although, is a Dockerfile really necessary when you’re just “Getting Started”?—but to highlight a problem that is shared by basically every general-purpose programming language: using Ruby for web development requires a discomfiting amount of scaffolding.

Over in the Python ecosystem, one of the first web development frameworks you will encounter is flask, which is a much lighter-weight framework than Rails. In flask, you can also get the “Hello World” down to one file, sort of: Even here, there are a ton of concepts to wrap your head around: you have to understand basic coding constructs like “functions” and “imports”, as well as Python’s syntax for describing these things; you have to figure out how to install Python, how to install Python packages like , and how to run a Python environment management tool like (a truly bizarre kludge that Python developers insist isn’t that big of a deal but is absolutely insane if you come from any other modern programming environment); I know we said one file earlier, but if you want this to work on a server you’re going to have to document that you installed flask, using a file like ; when you start to add more content you’re going to have to figure out how to do multiline strings; and what’s going on with the inscrutable ? If any of these concepts aren’t arranged properly—in your head and in your file—your server will display nothing.

By contrast, you don’t have to know a thing about PHP to start writing PHP code. Hell, you barely have to know the command line. If you can manage to install and run a PHP server, this file will simply display in your browser.
And the file itself: You didn’t have to think about dependencies, or routing, or , or language constructs, or any of that stuff. You’re just running PHP. And you’re on the web.

The Time-To-Hello-World test is about the time between when you have an idea and when you are able to see the seed of its expression. That time is crucial—it’s when your idea is in its most mortal state. Years before my friend Morry really knew how to code, he was able to kludge together enough PHP to make a website that tells you whether your IP address has 69 in it. It basically looks like this:

You may or may not find that to be a compelling work of art, but it would not exist if spinning up Flask boilerplate were a requirement to do it. And he had taken a CS course in basic Python; the experience of making a website in PHP was just that much better. This turned out to be the first in a long line of internet art projects, some of which we made together and some of which he did on his own. doesmyipaddresshave69init.com is a dumb idea for a website. But sometimes dumb ideas evolve into good ideas, or they teach you something that’s later used to make a good idea, or they just make you chuckle. Or none of the above. The best thing about websites is that you don’t have to justify them to anyone—you can just make them.

And PHP is still the fastest way to make a “dynamic” website. I recently made a little invoice generator with a local browser interface for my freelance business. It works great! It’s got a homepage with a list of my generated invoices, a route for making a new one, and routes to view each invoice in a printable format. I don’t find the boilerplate required to make a RESTful web service in NodeJS especially onerous—I have a pretty good system for it at this point. But PHP brings the time-to-hello-world down tremendously.
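The 69-checker described above can be sketched in a few lines of PHP. This is my reconstruction of the idea, not the original site's code, and `ipHas69` is a name I made up:

```php
<?php
// Sketch of the idea behind doesmyipaddresshave69init.com — not the original code.
function ipHas69(string $ip): bool
{
    return str_contains($ip, '69'); // requires PHP 8+
}

$ip = $_SERVER['REMOTE_ADDR'] ?? ''; // set by the web server; empty on the CLI
?>
<h1><?= ipHas69($ip) ? 'Yes.' : 'No.' ?></h1>
<p>Your IP address (<?= htmlspecialchars($ip) ?>) <?= ipHas69($ip) ? 'has' : 'does not have' ?> 69 in it.</p>
```

That's the whole site: one file, one predicate, and a server that fills in `REMOTE_ADDR`.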
I just don’t think this would have gotten off the ground if I had to set up ExpressJS, copy my router boilerplate, make 2 files for each route (the template and the JavaScript that serves it), and do all the other things I do to structure web apps in Node. Instead, I got all that stuff built in with vanilla PHP, and that will presumably work for as long as PHP does. I didn’t even have to touch the package manager.

A lot of people have the attitude that writing vanilla code (and vanilla PHP especially) is never okay because you need secure-by-default frameworks to ensure that you don’t make any security mistakes. It is clearly true that if you are building professional software you should be aware of the web security model and make informed decisions about the security model of your application; but not everyone is building professional software. Relatedly, one route to becoming a software professional is to have a delightful experience as a software amateur. I believe that more people should use the internet not just as consumers, but as creators (not of content but of internet). There is a lot of creativity on the web that can be unlocked by making web development more accessible to artists, enthusiasts, hobbyists, and non-web developers of all types. The softer the learning curve of getting online, the more people will build, share, play, and create there. Softening the learning curve means making the common things easy and not introducing too many concepts until you hit the point where you need them. Beginners and experts alike benefit.

Thanks to Nathaniel Sabanski and Al Sweigart for their feedback on a draft of this blog. I wrote this blog at Recurse Center, a terrific programming community that you should check out. My example invoice generator is not meant to be put online, so it doesn’t escape text to prevent XSS attacks, or do the other web security basics. Admittedly, some of PHP’s design decisions really lend themselves to insecure code.
For starters, they really need a short echo tag that auto-escapes. At this time, I don’t think I’m going to start defaulting to PHP for client work. I’m very comfortable in JS for general-purpose dynamic programming, and JS has a bunch of other useful web built-ins that PHP does not. I am definitely going to do more web art in PHP, though. I especially like how compact and shareable it can be, which has tremendous value for certain types of code. PHP is also missing a bunch of stuff I consider really important to writing RESTful web services, which makes pre-processing your requests close to mandatory. Big ones for me include removing the file extension from the URL, and PUT/DELETE support. Yes, I’m aware of well-known opinion-haver David Heinemeier Hansson’s other opinions. Some of them are right and some of them are wrong. More languages should have a “thing” that they are “for.” Maybe I’ll write about how awk rekindled my love for programming next.

W. Jason Gilmore 2 years ago

Blitz Building with AI

In August 2023 I launched two new SaaS products: EmailReputationAPI and BlogIgnite. While neither are exactly moonshots in terms of technical complexity, both solve very real problems that I've personally encountered while working at multiple organizations. EmailReputationAPI scores an email address to determine validity, both in terms of whether it is syntactically valid and deliverable (via MX record existence verification), as well as the likelihood there is a human on the other end (by comparing the domain to a large and growing database of anonymized domains). BlogIgnite serves as a writing prompt assistant, using AI to generate a draft article, as well as associated SEO metadata and related article ideas.

Launching a SaaS isn't by itself a particularly groundbreaking task these days; however, building and launching such a product in less than 24 hours might be a somewhat more notable accomplishment. And that's exactly what I did for both products, deploying MVPs approximately 15 hours after writing the first line of code. Both are written using the Laravel framework, a technology I happen to know pretty well. However, there is simply no way this self-imposed deadline would have been met without leaning heavily on artificial intelligence.

I am convinced AI coding assistants are opening up the possibility of rapidly creating, or blitzbuilding, new software products. The goal of blitzbuilding is not to create a perfect or even a high-quality product! Instead, the goal is to minimize business risk incurred via a prolonged development cycle by embracing AI to assist with the creation of a marketable product in the shortest amount of time. The term blitzbuilding is a tip of the cap to LinkedIn founder Reid Hoffman's book, "Blitzscaling: The Lightning-Fast Path to Building Massively Valuable Companies", in which he describes techniques for growing a company as rapidly as possible.
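As an illustration of the kind of validity check EmailReputationAPI performs, here is a minimal sketch using PHP built-ins. This is not the product's actual code, the function name is made up, and the MX lookup requires DNS access:

```php
<?php
// Minimal sketch of a syntactic + deliverability check — not EmailReputationAPI's code.
function emailLooksDeliverable(string $email): bool
{
    // Syntactic validity first
    if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
        return false;
    }
    // Deliverability: the domain must publish an MX record (needs DNS access)
    $domain = substr(strrchr($email, '@'), 1);
    return checkdnsrr($domain, 'MX');
}
```

A real reputation service layers much more on top (disposable-domain lists, role-account detection, and so on), but this is the shape of the first two checks described above.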
The chosen technology stack isn't important by itself; however, it is critical that you know it reasonably well, otherwise the AI will give advice and offer code completions that can't easily be confirmed as correct. In my case, EmailReputationAPI and BlogIgnite are built atop the Laravel framework and use the MySQL database, with Redis used for job queuing. They are hosted on Digital Ocean, and deployed with Laravel Forge. Stripe is used for payment processing. The common thread here is I am quite familiar with all of these technologies and platforms. Blitz building is not a time for experimenting with new technologies, because you will only get bogged down in the learning process.

The coding AI is GitHub Copilot with the Chat functionality. At the time of this writing the Chat feature is only available in a limited beta, but it is already extraordinarily capable, so much so that I consider it indispensable. Among many things, it can generate tests, offer feedback, and even explain highlighted code. GitHub Copilot Chat runs in a VS Code sidebar tab, like this:

Notably missing from these products is JavaScript (to be perfectly accurate, there are minuscule bits of JavaScript found on both sites due to unavoidable responsive layout behavior) and custom CSS. I don't like writing JavaScript, and am terrible at CSS, and so leaned heavily on Tailwind UI for layouts and components.

24 hours will be over before you know it, so it is critical to clearly define the minimum acceptable set of product requirements. Once defined, cut the list in half, and then cut it again. To the bone. Believe me, that list should be much smaller than you believe. For EmailReputationAPI, that initial list consisted of the following: There are now plenty of additional EmailReputationAPI features, such as a Laravel package, but most didn't make the cut for the first release and so were delayed.
It is critical to not only understand but be fine with the fact that you will not be happy with the MVP. It won't include all of the features you want, and some of the deployed features may even be broken. It doesn't matter anywhere near as much as you think. What does matter is putting the MVP in front of as many people as possible in order to gather feedback and hopefully customers.

I hate CSS with the heat of a thousand suns, largely because I've never taken the time to learn it well and so find it frustrating. I'm also horrible at design. I'd imagine these traits are shared by many full stack developers, which explains why the Bootstrap CSS library was such a huge hit years ago, and why the Tailwind CSS framework is so popular today. Both help design-challenged developers like myself build acceptable user interfaces. That said, I still don't understand what most of the Tailwind classes even do, but fortunately GitHub Copilot is a great tutor. Take for example the following stylized button: I have no idea what the classes , , etc. do, but can ask Copilot chat to explain:

I also use Copilot chat to offer code suggestions. One common request pertains to component alignment. For instance, I wanted to add the BlogIgnite logo to the login and registration forms, however the alignment was off: I know the logo should be aligned with Tailwind's alignment classes, but have no clue what they are, nor do I care. So after adding the logo to the page I asked Copilot . It responded with: After updating the image classes, the logo was aligned as desired:

Even when using the AI assistant, the turnaround time is such that your product will inevitably have a few bugs. Focus on fixing what your early users report, because those issues are probably related to the application surfaces other users will also use. Sometime soon I'd love to experiment with using a local LLM such as Code Llama in conjunction with an error reporting tool to generate patches and issue pull requests.
At some point in the near future I could even see this process being entirely automated, with the AI additionally writing companion tests. If those tests pass, the AI will merge the pull request and push to production, with no humans involved! Have any questions or thoughts about this post? Contact Jason at [email protected].

The initial EmailReputationAPI feature list mentioned above:

- Marketing website
- Account management (register, login, trial, paid account)
- Crawlers and parsers to generate various datasets (valid TLDs, anonymized domains, etc)
- Secure API endpoint and documentation
- Stripe integration

usher.dev 3 years ago

Django on Fly.io with Litestream/LiteFS

One of the neat things that has come out of Fly is a renewed interest across the dev world in SQLite - an embedded database that doesn't need any special servers or delicate configuration. Some part of this interest comes from the thought that if you had an SQLite database that sat right next to your application, in the same VM, with no network latency, that's probably going to be pretty quick and pretty easy to deploy. Although in some ways it feels like this idea comes full circle back to the days of running a MySQL Server alongside our PHP application on a single VPS, we're also in an era where we need to deal with things like geographic distribution, ephemeral filesystems and scale-to-zero. So we want to run our apps in a nice PaaS, and also quite like the idea of our database being local to our application code, but there's a few conflicts here: Thankfully Fly have been funding the development of some interesting tools; Litestream and LiteFS which aim to solve this. The difference between these tools is not particularly obvious; so to summarise: Litestream was Ben Johnson's first attempt at solving this problem, and is now focused primarily on disaster recovery. It's a tool to stream all the changes made to your SQLite database to some remote storage, like S3, and then recover from it when you need to. This is great, and it nicely solves our first conflict. Our application can be configured to restore the database from remote storage when it starts, and we can be safe knowing that any changes are being backed up as our application runs. Unfortunately, it doesn't solve our second problem, replicating our databases to other instances of our app if we decide to scale out. While there were plans (and an initial implementation) for this in Litestream, live replication was instead moved to the second project, LiteFS. LiteFS does some magic with FUSE to allow it to intercept SQLite transactions and then replicate to multiple instances of your application. 
It's a little more complicated as you need additional tools like Consul so that it knows where to find the primary instance (where it will direct queries that write to the database), but it solves our second conflict! Alas, our first conflict isn't yet solved by LiteFS - if all your nodes go away, there's nowhere to replicate your database from, so it too will disappear. S3 replication like in Litestream is on the roadmap however, so it seems like LiteFS is set to solve all our problems!

So we know what these tools do; let's experiment with getting our Django applications running with them on Fly.io.

For Litestream, we'll need:

1. Prepare your Fly application with (we don't need a Postgres database if it asks).
2. Set all the environment variables we're going to need by creating a new file (call it something like ): will be the directory where the database is replicated to, is the path where Django and can find your database file, and is the path to your S3-compatible bucket. to import these values into your Fly environment.
3. Create a : Replace your section with whatever you normally run to start your web server. Litestream will do its stuff and conveniently run our own application, exiting when our server exits.
4. Create a script, , that will run on application start to make sure all our directories are created:
5. Update your Docker to run this .

Once deployed with , Litestream will start backing up your database. Careful: if you try to scale out by adding more instances, at best you'll see out-of-sync data, at worst you'll end up with a corrupt database.

For LiteFS, we'll need: Prepare your Fly application with (we don't need a Postgres database if it asks). Set all the environment variables we're going to need by creating a new file (call it something like ): will be the directory where the database is replicated to and is the path where Django and can find your database file. to import these values into your Fly environment.
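For the Litestream steps, the config file might look roughly like this. The database path, bucket URL, and server command here are placeholders, not the post's exact values:

```yaml
# Rough sketch of a litestream.yml for the setup described above.
# Paths, bucket, and command are assumptions.
dbs:
  - path: /data/db.sqlite3
    replicas:
      - url: s3://my-bucket/my-app-db

# Litestream runs this command and exits when it exits
exec: gunicorn myproject.wsgi
```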
In your , add: This gives us access to the shared Fly.io-managed Consul instance. Create a : Replace your section with whatever you normally run to start your web server. The is where LiteFS will create its filesystem (where the database will live), is where it keeps files it needs for replication. The and blocks tell LiteFS how to talk to each other and where to find the Fly.io-managed Consul instance. Create the that is started by LiteFS. We need things like migrations to run after LiteFS has set up its filesystem, so we do those in this script: Create a script, , that will run on application start to make sure all our directories are created: Update your Docker to run this .

We're not there yet. We need to make sure database writes only go to our primary. To do this, we'll register a database which intercepts any write queries. I've got this in my app's (heavily based on Adam Johnson's ): This will raise an exception if the query will write to the database, and if the file created by LiteFS exists (meaning this is not the primary). We need something to intercept this exception, so add some middleware: and register it in your settings. This catches the exception raised by the previously registered , finds out where the primary database is hosted, and returns a header telling Fly.io: "Sorry, I can't handle this request, please replay it to the database primary". Once deployed with , LiteFS will start replicating your database!

These are fun tools to play with for now, but there's clearly a lot of work needed to get them working with our normal apps. I'm excited about how they could make getting a Django/Wagtail app deployed much more accessible, easier and cheaper, but there's still some work to be done to make that a reality. The LiteFS roadmap includes things like S3 replication (so we get similar backup features to Litestream), and write forwarding (so writes to read-replicas will automatically be forwarded to the primary).
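To make the write-routing idea concrete, here's a plain-Python sketch (every name here, including the path of the file LiteFS creates on replicas, is an assumption for illustration, not the project's actual code): a query check that raises on writes when this node isn't the primary, and a helper producing the Fly.io replay header the middleware would attach.

```python
# Sketch of the write interception described above. Names and paths are
# illustrative assumptions, not the author's exact code.
from pathlib import Path

# On replicas, LiteFS exposes a file naming the primary instance; its
# absence means this node *is* the primary. The path is an assumption.
PRIMARY_FILE = Path("/litefs/.primary")

WRITE_PREFIXES = ("INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "DROP", "ALTER")


class NotPrimaryError(Exception):
    """A write query was attempted on a read-only replica."""

    def __init__(self, primary_host: str):
        super().__init__(f"writes must go to the primary: {primary_host}")
        self.primary_host = primary_host


def check_query(sql: str, primary_file: Path = PRIMARY_FILE) -> None:
    """Raise if `sql` writes and this instance is not the primary.

    A Django database backend would call this before executing a query.
    """
    is_write = sql.lstrip().upper().startswith(WRITE_PREFIXES)
    if is_write and primary_file.exists():
        raise NotPrimaryError(primary_file.read_text().strip())


def replay_headers(exc: NotPrimaryError) -> dict:
    """The header the middleware attaches to its response: Fly.io replays
    the request at the named instance when it sees fly-replay."""
    return {"fly-replay": f"instance={exc.primary_host}"}
```

In a real app the check lives inside a custom database backend and the header is set by middleware catching the exception, but the decision logic is no more than this.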
These are fun tools to play with for now, but there's clearly a lot of work needed to get them running with our everyday apps. I'm excited about how they could make deploying a Django/Wagtail app more accessible, easier, and cheaper, but there's still some work to be done to make that a reality. The LiteFS roadmap includes things like S3 replication (so we get backup features similar to Litestream's) and write forwarding (so writes to read replicas will automatically be forwarded to the primary). There's a lot of promise there, and I can't wait to make more use of it!

Ratfactor 4 years ago

Slackware Apache Plus PHP-FPM

I'm making good on my earlier promise to start publishing more of my notes in public, so others can benefit from them and so I'm more likely to find them again myself! Here are my notes from today's excursion into getting Apache working with PHP-FPM (specifically on Slackware Linux, but the instructions are quite vanilla because Slackware doesn't mess with upstream packages).
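For context, the vanilla Apache-to-PHP-FPM handoff (not necessarily the exact config from these notes; the socket path and module locations vary by setup, and PHP-FPM's pool config decides where it listens) usually boils down to loading mod_proxy_fcgi and setting a handler:

```apache
# Sketch only: the socket path and module paths are assumptions.
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_fcgi_module modules/mod_proxy_fcgi.so

<FilesMatch "\.php$">
    SetHandler "proxy:unix:/var/run/php-fpm.sock|fcgi://localhost/"
</FilesMatch>
```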
