Django on Fly.io with Litestream/LiteFS
One of the neat things to come out of Fly is a renewed interest across the dev world in SQLite: an embedded database that doesn't need any special servers or delicate configuration. Part of this interest comes from the thought that an SQLite database sitting right next to your application, in the same VM, with no network latency, is probably going to be pretty quick and pretty easy to deploy.

Although in some ways this idea comes full circle back to the days of running a MySQL server alongside our PHP application on a single VPS, we're also in an era where we need to deal with things like geographic distribution, ephemeral filesystems and scale-to-zero. So we want to run our apps in a nice PaaS, and we also quite like the idea of our database being local to our application code, but there are a couple of conflicts here:

1. PaaS tools like Heroku/Fly tend to offer ephemeral storage, or no guarantees on the safety of storage. Trying to keep an SQLite database around on this sort of storage just won't work out.
2. A common approach to scaling is to "scale out": start up more instances of your application and load balance between them. How would that work with SQLite? Even if you could access the same database file from each instance, we'd be re-introducing latency, and as SQLite can't be written to by multiple processes at once, we'd probably be slowing our app down too.

Thankfully Fly have been funding the development of some interesting tools, Litestream and LiteFS, which aim to solve this. The difference between these tools is not particularly obvious, so to summarise:

Litestream was Ben Johnson's first attempt at solving this problem, and is now focused primarily on disaster recovery. It's a tool to stream all the changes made to your SQLite database to some remote storage, like S3, and then recover from it when you need to. This is great, and it nicely solves our first conflict. Our application can be configured to restore the database from remote storage when it starts, and we can be safe knowing that any changes are being backed up as our application runs. Unfortunately, it doesn't solve our second problem: replicating our database to other instances of our app if we decide to scale out. While there were plans (and an initial implementation) for this in Litestream, live replication was instead moved to the second project, LiteFS.

LiteFS does some magic with FUSE to intercept SQLite transactions and then replicate them to multiple instances of your application. It's a little more complicated, as you need additional tools like Consul so that it knows where to find the primary instance (where it will direct queries that write to the database), but it solves our second conflict! Alas, our first conflict isn't yet solved by LiteFS: if all your nodes go away, there's nowhere to replicate your database from, so it too will disappear. S3 replication like in Litestream is on the roadmap however, so it seems like LiteFS is set to solve all our problems!

So we know what these tools do. Let's experiment with getting our Django applications running with them on Fly.io.

For Litestream, we'll need:

- An S3-compatible storage bucket and access keys.
- Our Django app, ideally configured so that the database location comes from the environment, for convenience.
- The Litestream binary available to our application (I install it in my Dockerfile).

Then:

1. Prepare your Fly application (we don't need a Postgres database if it asks).
2. Set all the environment variables we're going to need by creating a new file. It holds three values: the directory where the database is replicated to, the path where Django and Litestream can find your database file, and the path to your S3-compatible bucket. Import these values into your Fly environment.
3. Create a Litestream config. In it, replace the command that Litestream runs with whatever you normally run to start your web server. Litestream will do its stuff and conveniently run our own application, exiting when our server exits.
4. Create a script that will run on application start to make sure all our directories are created. This:
   - Checks important environment variables are set.
   - Creates a database directory and makes sure it's open enough for the app to read/write to it (you might choose to tighten this up if appropriate).
   - Restores the database using Litestream if it doesn't already exist.
   - Runs migrate to make sure the database is up to date (or creates it if there wasn't anything to restore).
   - Runs Litestream, which will in turn run the command in the Litestream config, starting the application.
5. Update your Dockerfile so the container runs this script on start.

Once deployed, Litestream will start backing up your database.

Careful: if you try to scale out by adding more instances, at best you'll see out-of-sync data; at worst you'll end up with a corrupt database.

For LiteFS, we'll need:

- Our Django app, ideally configured so that the database location comes from the environment, for convenience.
- The LiteFS binary available to our application (I install it in my Dockerfile; alternatively, copy the binary out of its Docker image).
- Some way to make sure our write requests only end up with the primary (we'll come back to this).

Then:

1. Prepare your Fly application (we don't need a Postgres database if it asks).
2. Set all the environment variables we're going to need by creating a new file. It holds two values: the directory where the database is replicated to, and the path where Django can find your database file. Import these values into your Fly environment.
3. In your Fly app configuration, add the setting that gives us access to the shared Fly.io-managed Consul instance.
4. Create a LiteFS config. In it, replace the command that LiteFS runs with whatever you normally run to start your web server. One directory setting is where LiteFS will create its filesystem (where the database will live); another is where it keeps the files it needs for replication. Two further blocks tell LiteFS instances how to talk to each other and where to find the Fly.io-managed Consul instance.
5. Create the script that is started by LiteFS. We need things like migrations to run after LiteFS has set up its filesystem, so we do those in this script.
6. Create a script that will run on application start to make sure all our directories are created.
7. Update your Dockerfile so the container runs this script on start.

We're not there yet. We need to make sure database writes only go to our primary. To do this, we'll register a database hook which intercepts any write queries. Mine lives in my app's code and is heavily based on Adam Johnson's work. It raises an exception if the query would write to the database and the file created by LiteFS exists (meaning this is not the primary).

We need something to intercept this exception, so add some middleware and register it in your settings. This catches the exception raised by the previously registered hook, finds out where the primary database is hosted and returns a header telling Fly.io: "Sorry, I can't handle this request, please replay it to this database primary".

Once deployed, LiteFS will start replicating your database!

These are fun tools to play with for now, but there's clearly a lot of work to get them working with our normal apps. I'm excited about how they could make getting a Django/Wagtail app deployed much more accessible, easier and cheaper, but there's still some work to be done to make that a reality. The LiteFS roadmap includes things like S3 replication (so we get similar backup features to Litestream), and write forwarding (so writes to read-replicas will automatically be forwarded to the primary). There's a lot of promise there and I can't wait to make more use of it!
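Going back to the write-blocking database hook from the LiteFS steps, here is a minimal sketch of how such a hook might look. It is not the code from the original post. It uses the argument order Django passes to database execute wrappers, and it assumes the marker file is the `.primary` file that LiteFS keeps in its mount directory on replicas. The names `make_write_blocker`, `ReadOnlyReplicaError` and `READ_PREFIXES`, and the `/litefs` mount path, are hypothetical, and the read/write check is a deliberately simple prefix heuristic.

```python
from pathlib import Path

# Assumption: LiteFS is mounted here; use whatever directory your config names.
LITEFS_DIR = "/litefs"

# Simplified heuristic: statements starting with these are treated as reads
# or transaction control; everything else is treated as a write.
READ_PREFIXES = (
    "SELECT", "EXPLAIN", "PRAGMA", "BEGIN", "COMMIT",
    "ROLLBACK", "SAVEPOINT", "RELEASE SAVEPOINT",
)


class ReadOnlyReplicaError(Exception):
    """Raised when a write query is attempted on a LiteFS replica."""


def looks_like_write(sql):
    """Return True unless the statement starts with a known read prefix."""
    return not sql.lstrip(" \n\t(").upper().startswith(READ_PREFIXES)


def make_write_blocker(litefs_dir=LITEFS_DIR):
    """Build an execute wrapper that refuses writes while on a replica."""
    marker = Path(litefs_dir) / ".primary"  # assumed to exist only on replicas

    def block_replica_writes(execute, sql, params, many, context):
        if looks_like_write(sql) and marker.exists():
            raise ReadOnlyReplicaError(sql)
        return execute(sql, params, many, context)

    return block_replica_writes


# In a Django project this could be installed from AppConfig.ready(), e.g.:
#
#   from django.db.backends.signals import connection_created
#
#   def install(sender, connection, **kwargs):
#       connection.execute_wrappers.append(make_write_blocker())
#
#   connection_created.connect(install)
```

Checking for the marker file on every write keeps the hook stateless, so a node that is promoted to primary starts accepting writes without a restart.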
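The middleware described above, which turns the "this is a replica" exception into a replay instruction for Fly.io, might look something like the following sketch. It is an illustration under assumptions rather than the original code: it assumes LiteFS writes the primary's hostname into a `.primary` file in its mount directory on replicas, and that Fly.io's proxy honours a `fly-replay` response header with an `instance=` target. `ReplayWritesMiddleware`, `replay_header`, the stand-in `ReadOnlyReplicaError`, the `/litefs` path and the 409 status code are all hypothetical choices.

```python
from pathlib import Path

# Assumption: LiteFS is mounted here; use whatever directory your config names.
LITEFS_DIR = "/litefs"


class ReadOnlyReplicaError(Exception):
    """Stand-in for whatever exception your write-blocking hook raises."""


def replay_header(litefs_dir):
    """Return the replay header for the primary, or {} if we are the primary."""
    try:
        primary = (Path(litefs_dir) / ".primary").read_text().strip()
    except FileNotFoundError:
        return {}  # no marker file: this node is assumed to be the primary
    return {"fly-replay": f"instance={primary}"}


class ReplayWritesMiddleware:
    """Django-style middleware: replay blocked writes on the primary."""

    def __init__(self, get_response, litefs_dir=LITEFS_DIR):
        self.get_response = get_response
        self.litefs_dir = litefs_dir

    def __call__(self, request):
        return self.get_response(request)

    def process_exception(self, request, exception):
        # Anything other than a blocked write is left to Django's usual handling.
        if not isinstance(exception, ReadOnlyReplicaError):
            return None
        headers = replay_header(self.litefs_dir)
        if not headers:
            return None
        # Imported lazily so the helper above can be used without Django.
        from django.http import HttpResponse

        # The status code is an arbitrary choice; the header is what matters.
        return HttpResponse(status=409, headers=headers)
```

To use it, the class would be added to `MIDDLEWARE` in your settings. Returning `None` when no primary can be found lets the original exception surface instead of hiding it behind a replay that could never succeed.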