Latest Posts (20 found)
Gabriel Garrido 3 weeks ago

WireGuard topologies for self-hosting at home

I recently migrated my self-hosted services from a VPS (virtual private server) at a remote data center to a physical server at home. This change was motivated by wanting to be in control of the hardware and network where said services run, while trying to keep things as simple as possible. What follows is a walk-through of how I reasoned through different WireGuard topologies for the VPN (virtual private network) in which my devices and services reside.

Before starting, it's worth emphasizing that using WireGuard (or a VPN at all) is not strictly required for self-hosting. WireGuard implies one more moving part in your system, the cost of which is justified only by what it affords you to do. The constraints that I outline below should provide clarity as to why using WireGuard is appropriate for my needs. It goes without saying that not everyone has the same needs, resources, and threat model, all of which a design should account for. That said, there isn't anything particularly special about what I'm doing. There is likely enough overlap here for this to be useful to individuals or small to medium-sized organizations looking to host their services. I hope that this review helps others build a better mental model of WireGuard, and of the sorts of networks that you can build up to per practical considerations. Going through this exercise proved to be an excellent learning experience, and that is worthwhile on its own.

This post assumes some familiarity with networking. This is a subject in which acronyms are frequently employed, so I've made sure to spell these out wherever they are introduced.

The constraints behind the design of my network can be categorized into first-order and second-order constraints. Deploying WireGuard responds to the first-order constraints, whereas the specifics of how WireGuard is deployed respond to the second-order constraints.

1. There should be no dependencies on services or hardware outside of the physical network. I should be able to connect to my self-hosted services while I'm at home as long as there's electricity in the house and the hardware involved is operating without problems.
2. Borrow elements of the Zero Trust Architecture where appropriate. Right now that means treating all of my services and devices as resources, securing all communications (i.e. not trusting the underlying network), and enforcing least-privileged access.
3. Provisions made to connect to a device from outside the home network should be secondary and optional. While I do wish to use my services while I'm away, satisfying this should not compromise the fundamental design of my setup. For example, I shouldn't rely on tunneling services provided by third parties.

Choosing to deploy WireGuard is motivated by constraints two and three. Constraint one is not sufficient on its own to necessitate using WireGuard because everything can run on the local area network (LAN). Once deployed, I should be able to connect all of my devices using hardware, software, and keys that I control within the boundaries of my home office. These devices all exist in the same physical network, but may reside in separate virtual LANs (VLANs) or subnets. Regardless, WireGuard is used to establish secure communications within and across these boundaries, while working in tandem with network and device firewalls for access control. I cannot connect to my home network directly from the wide area network (WAN, e.g. the Internet) because it is behind Carrier-Grade Network Address Translation (CGNAT).
A remote host is added to the WireGuard network to establish connections from outside. This host runs on hardware that I do not control, which goes against the spirit of the first constraint. However, an allowance is made considering that the role of this peer is not load-bearing in the overarching design, and it can be removed from the network as needed.

Assuming WireGuard is now inherent in this design, its use should adhere to the following constraints:

1. Use WireGuard natively as opposed to software that builds on top of WireGuard. I choose to favor simplicity and ease of understanding rather than convenience or added features, ergo, complexity.
2. Use of a control plane should not be required. All endpoints are first-class citizens and managed individually, regardless of using a network topology that confers routing responsibilities to a given peer.

Satisfying these constraints precludes the use of solutions such as Tailscale, Headscale, or Netbird. Using WireGuard natively has the added benefit that I can rely on a vetted and stable version as packaged by my Linux distribution of choice, Debian.

Lastly, it is worth stating requirements or features that are often found in designs such as these, but that are not currently relevant to me:

- Mesh networking and direct peer-to-peer connections. It's ok to have peers act as gateways if connections need to be established across different physical or logical networks. The size, throughput, and bandwidth of the network is small enough that prioritizing performance is not strictly necessary.
- Automatic discovery or key distribution. It's ok for nodes in the network to be added or reconfigured manually.

Let's look at the resources in the network, and how these connect with each other. Consider the following matrix. Each row denotes whether the resource in the first column connects to the resources in the remaining columns, either to consume a service or perform a task. For example, we can tell that the server does not connect to any device, but all devices connect to the server. Said specifically:

- The desktop computer connects to the server to access a calendar, git repositories, etc.
- The tablet connects to the server to download RSS feeds.
- The laptop and desktop connect with each other to sync files.

The purpose of this matrix is to determine which connections between devices ought to be supported, regardless of the network topology. This informs how WireGuard peers are configured, and what sort of firewall rules need to be established.

Before proceeding, let's define the networks and device IP addresses that will be used. A single WireGuard network interface name is used where applicable 1, and the same UDP port is used on every device that needs to define one.

I'll explore different topologies as I build up to the design that I currently employ. By starting with the simplest topology, we can appreciate the benefits and trade-offs involved in each step, while strengthening our conceptual model of WireGuard. Each topology below is accompanied by a simple diagram of the network. In it, the orange arrow denotes a device connecting to another device. Where two devices connect to each other, a bidirectional arrow is employed. Later on, green arrows denote a device forwarding packets to and from other resources.

The basic scenario, and perhaps the most familiar to someone looking to start using WireGuard to self-host at home, is hosting in the network that is established by the router provided by an Internet service provider (ISP). Let's assume its configuration has not been modified other than changing the Wi-Fi and admin passwords.
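For the sketches in the rest of this post, assume the following addressing; every value here (interface name, port, subnets, and per-device addresses) is a placeholder of my own choosing rather than the actual values used in my network:

```text
Interface name:       wg0
WireGuard UDP port:   51820
Home LAN:             192.168.1.0/24
WireGuard network:    10.10.10.0/24
  desktop 10.10.10.1   laptop 10.10.10.2   phone  10.10.10.3   tablet 10.10.10.4
  server  10.10.10.10  VPS    10.10.10.20  router 10.10.10.254
```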
A topology that can be used here is point-to-point, where each device lists every other device it connects to as its peer. In WireGuard terminology, "peers" are endpoints configured to connect with each other to establish an encrypted "tunnel" through which packets are sent and received. According to the connections matrix, the desktop computer and the server are peers, but the desktop computer and tablet aren't. In the desktop computer's WireGuard configuration, the LAN IP address of each peer is specified under Endpoint. This is used to find the peer's device and establish the WireGuard tunnel. AllowedIPs specifies the IP addresses that the peer uses within the WireGuard network. In other words, the phone is the desktop's peer, and its LAN address is used to establish the tunnel.

Let's assume each of these devices has a firewall that allows UDP traffic through the WireGuard port and all subsequent traffic through the WireGuard interface. Once the WireGuard configurations of the server, laptop, and phone include the corresponding peers, secure communication is established through the WireGuard network interface. Sending a packet from the desktop computer to the phone confirms that the packet is routed directly to the phone and echoed back. At this point access control is enforced in each device's firewall. Allowing everything that comes through the interface is convenient, but it should be limited to the relevant ports and protocols for least-privileged access.

An obvious problem with this scenario is that the Dynamic Host Configuration Protocol (DHCP) server in the router likely allocates IP addresses dynamically when devices connect to the LAN network. The IP address for a device may thus change over time, and WireGuard will be unable to find a peer to establish a connection to it. For example, I'm at home and my phone dies. Its LAN IP address is freed and assigned to another device that comes online. WireGuard will attempt to connect to that address (per the configured Endpoint) and fail for any of the following reasons:

- Said device is not running WireGuard.
- Said device is listening on a different address or port, in which case the peer's Endpoint does not match.
- Said device is using a different key, in which case the peer's public key does not match.

Fortunately, most routers support configuring static IP addresses for a given device in the network. Doing so for all devices in our WireGuard network fixes this problem, as the IP address used in each Endpoint will be reserved accordingly.

Suppose I want to work at a coffee shop, but still need access to something that's hosted on my home server. As mentioned in the constraints, my home network is behind CGNAT. This means that I cannot connect directly to it using whatever WAN IP address my router is using at the moment. What I can do instead is use a device that has a publicly routable IP address and make that a WireGuard peer of our server. In this case that'll be a VPS in some data center.

How is the packet ultimately relayed to and from the server at home? Both the server and laptop establish direct encrypted tunnels with the VPS. WireGuard on the VPS will receive the encrypted packets from the laptop, decrypt them, and notice that they're meant for the server. It will then encrypt these packets with the server's key and send them through the server's tunnel. It'll do the same thing with the server's response, except towards the laptop using the laptop's tunnel. A device that forwards packets between peers needs to be configured for IPv4 packet forwarding. I will not cover the specifics of this configuration because it depends on what operating system and firewall are used 2. The VPS has a publicly routable IP address, and it is assigned its own address within the WireGuard network.
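A sketch of what the VPS's configuration might look like under the placeholder addressing above; the keys are placeholders and the file layout assumes wg-quick:

```ini
# /etc/wireguard/wg0.conf on the VPS
[Interface]
Address = 10.10.10.20/24
ListenPort = 51820
PrivateKey = <vps-private-key>

[Peer]
# Laptop: no Endpoint, its address is only learned when it connects
PublicKey = <laptop-public-key>
AllowedIPs = 10.10.10.2/32

[Peer]
# Home server: no Endpoint either, it is behind CGNAT and dials out to the VPS
PublicKey = <server-public-key>
AllowedIPs = 10.10.10.10/32
```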
As sketched above, the laptop and server are listed as peers in the VPS's WireGuard configuration. Note that Endpoint is omitted for each peer. The publicly routable IP addresses of the laptop and the home router are not known to us. Even if they were, they cannot be reached by the VPS. However, they will be known to the VPS when these peers connect to it. Now, the server at home adds the VPS as its peer, using the VPS's public IP address as the Endpoint. We also make use of PersistentKeepalive to send an empty packet every 25 seconds. This is done to establish the tunnel ahead of time, and to keep it open. This is necessary because otherwise the tunnel may not exist when I'm at the coffee shop trying to access the server at home. Remember, the VPS doesn't know how to reach the server unless the server is connected to it.

Let's take a careful look at the laptop's configuration, and what we're looking to achieve. When the laptop is at home, it connects to the server using an endpoint address that is routable within the home LAN network. This endpoint address is not routable when I'm outside, in which case I want the connection to go through the VPS. To achieve this, the laptop maintains two mutually exclusive WireGuard interfaces: one for home and one for when I'm on the go. The former is active only while I'm in the home network, and the latter while I'm away. Unlike the server, the VPS does not need to be added as a peer to the laptop's home interface because the laptop doesn't need to connect to the VPS while at home. Instead, the VPS is added to the roaming interface's configuration.

The [Interface] section of both configurations is mostly the same. The laptop should have the same address and key, regardless of where it is. Only ListenPort is omitted in the roaming configuration because no other device will look to connect to it, in which case we can have WireGuard pick a port dynamically. What differs is the peer configuration. In the roaming configuration, the VPS is set as the only peer. However, the home server's WireGuard IP address is added to the VPS peer's list of AllowedIPs. WireGuard uses this information to route any packets for the VPS or the server through the VPS. Unlike the server's peer configuration for the VPS, PersistentKeepalive is not needed because the laptop is always the one initiating the tunnel when it reaches out to the server.

We can verify, with a quick ping, that packets are being routed appropriately to the server through the VPS. We solved for outside connectivity using a network topology called hub-and-spoke. The laptop and home server are not connecting point-to-point. The VPS acts as a hub or gateway through which connections among members of the network (i.e. the spokes) are routed. If we scope down our network to just the laptop and the home server, we see how this hub is not only a peer of every spoke, but also each spoke's only peer.

Yet, how exactly is the packet routed back to the laptop? Mind you, at home the laptop is a peer of the server. When the server responds to the laptop, it will attempt to route the response directly to the laptop's peer endpoint. This fails because the laptop is not actually reachable via that direct connection when I'm on the go. This makes the laptop a "roaming client": it connects to the network from different locations, and its endpoint may change. This all works because the hub has been configured to do Network Address Translation (NAT); it replaces the source address of each packet with its own as the packet is forwarded. The spoke at the other end of each tunnel accepts the packets because they appear to originate from its peer, the hub. In other words, when the laptop is reaching out to the home server, the server sees traffic coming from the VPS and returns it there.
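On the hub itself, IPv4 forwarding and NAT could be set up along these lines with nftables; this is a rough sketch (the table and chain names are arbitrary), and the article referenced in footnote 2 covers it properly:

```sh
# Enable IPv4 packet forwarding
sysctl -w net.ipv4.ip_forward=1

# Forward traffic arriving on the WireGuard interface back out of it,
# and masquerade it so spokes see the hub as the source of forwarded packets
nft add table inet wg
nft 'add chain inet wg forward { type filter hook forward priority 0; policy drop; }'
nft add rule inet wg forward iifname "wg0" oifname "wg0" accept
nft 'add chain inet wg postrouting { type nat hook postrouting priority srcnat; }'
nft add rule inet wg postrouting oifname "wg0" masquerade
```

The broad wg0-to-wg0 accept rule is exactly what should be tightened for least-privileged access, as discussed next.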
The hub is forwarding packets among the spokes without regard to access control. Thus, its firewall should be configured for least-privileged access. For example, if the laptop is only accessing git repositories on the home server over SSH, then the hub firewall should only allow forwarding from the laptop's peer connection to the home server's IP address and SSH port.

Let's iterate. If I now wish to sync my laptop with my desktop computer from outside the network, I would be adding yet another spoke to this hub. The desktop computer and the VPS configure each other as peers, while the desktop's WireGuard IP address is included in the VPS peer's AllowedIPs in the laptop's configuration. Our topology within the home network is still point-to-point. As soon as I return home, my laptop will connect directly to the server when I toggle the roaming interface off and the home interface on.

But now that we know about hub-and-spoke, it might make sense to consider using it at home as well. According to the connection matrix, the server can assume the role of a hub because all other devices already connect to it. Likewise, the server runs 24/7, so it will always be online to route packets. This topology simplifies the WireGuard configurations for all of the spokes. The desktop computer, phone, laptop, and tablet can now list the server as their only peer. This is convenient because now only one static address in the LAN network needs to be allocated by the DHCP server – the server's.

Consider the changes to the WireGuard configuration of the desktop computer. All peers are removed except the server, and the WireGuard IPs of the phone and laptop are added to the server peer's AllowedIPs. WireGuard will route packets for these other hosts through the server. We could also use Classless Inter-Domain Routing (CIDR) notation to state that packets for all hosts in the WireGuard network go through the server peer. The server, in turn, keeps listing every device at home as its peer but no longer needs an Endpoint for each. The peers will initiate the connection to the server.

Once again, let's test sending a packet from the desktop computer to the phone. The packet hops once through the server, is received by the phone, and is echoed back. The downside to this topology is that the server is now a single point of failure. If the server dies then the spokes won't be able to connect with each other through WireGuard. There's also an added cost to having every packet flow through the hub. As for access control, much like we saw with the VPS, the hub now concentrates firewalling responsibilities. It knows which peer is looking to connect to which peer, thus it should establish rules for which packets can be forwarded. This is not mutually exclusive with input firewall rules on each device; those should exist as well.

We've seen that the home hub will route packets between the spokes. Furthermore, because it is a peer of the VPS, the server can be used to route connections coming from outside the LAN network. Effectively, these are two hubs that connect to each other so that packets can flow across separate physical networks. If the laptop wants to sync with the desktop while it is outside the LAN network, then the packets make two hops: once through the VPS, and another through the server. If the laptop is within the LAN network, the packets hop only once through the server. Yet, there's a subtle caveat to this design. The laptop can initiate a sync with the desktop from outside the LAN network and receive the response that it expects.
However, the desktop can only initiate a sync with the laptop while the latter is within the LAN network. Why? Similar to our previous example of the laptop communicating with the server, the laptop is configured as a peer of the home hub. When the desktop initiates a sync, the server will attempt to route the packet to the laptop. Per our last change, the laptop doesn't have a fixed endpoint, and there is no established tunnel because the laptop is outside the network. Additionally, the home hub is not configured to route packets destined for the laptop through the VPS peer. The packet is thus dropped by the hub. One could look into making the routing dynamic such that the packets are delivered through other means, perhaps through mesh networking. But herein lies a compromise that I've made. In this design, a spoke in the home hub cannot initiate connections to a roaming client. It can only receive connections from them, because the roaming client uses NAT through the remote hub. I'm perfectly fine with this compromise as I don't actually need this bidirectionality, and I don't want the additional complexity from solving this issue. The remote hub facilitates tunneling into the home hub, not out of it. My needs call for allowing my mobile devices (e.g. laptop, phone, tablet) to communicate with the non-mobile devices at home (e.g. server, desktop), and this has been solved.

At this point we're done insofar as the overarching topology of our WireGuard network goes, but there is an improvement that can be made to make our home hub less brittle. Consider the case where I'm using a router that can run WireGuard. Making the router the hub of our home network poses some benefits over the previous setup. First, the router is already a single point of failure by way of being responsible for the underlying LAN network. Making the router the hub isn't as costly as it is with some other device in the network. Second, all devices in the network are already connected to the router. This simplifies the overall configuration because it is no longer necessary to configure static IP addresses in the DHCP server. Instead, each spoke can use the network gateway address to reach the hub.

Let's assume that each spoke can reach the router at the LAN gateway address, and that the router gets its own WireGuard IP address. Each spoke replaces the server peer with the router's, and uses the gateway address as its Endpoint; the desktop computer, for example, now lists the router as its only peer. The server is demoted to a spoke and is configured like all other spokes. In turn, the router lists all peers like the server previously did. Again, the firewall in the router is now responsible for enforcing access control between spokes.

For the sake of illustrating how much further the underlying networks can evolve without interfering with the WireGuard network, consider the final design. I've broken apart the LAN network into separate VLANs to isolate network traffic. The server resides in its own VLAN, and client devices in another. The router keeps on forwarding packets in the WireGuard network regardless of where these devices are. The only change that is necessary to keep things working is to update the Endpoint address for the router peer in each spoke. Each spoke now uses the corresponding VLAN gateway address, rather than that of the original LAN network.

I've been using this setup for some months now and it's been working without issues. A couple of thoughts come to mind having gone through this exercise and written about it. Running WireGuard on the router simplifies things considerably.
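For reference, a spoke's entire configuration in this final design can be as small as the following sketch; the VLAN gateway address and keys are placeholders:

```ini
# /etc/wireguard/wg0.conf on the desktop
[Interface]
Address = 10.10.10.1/24
PrivateKey = <desktop-private-key>

[Peer]
# The router is the hub; its endpoint is the VLAN gateway address
PublicKey = <router-public-key>
Endpoint = 192.168.20.1:51820
AllowedIPs = 10.10.10.0/24
```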
If the home network were not behind CGNAT then I could do away with the VPS hub altogether. I would still need separate WireGuard interfaces for when I'm on the go, but that's not a big deal. Nonetheless, within the LAN network, configuration is simpler by using a hub-and-spoke topology with the router as hub. Centralizing access control on the router's firewall is also appreciated.

WireGuard is simple to deploy and it just works. Nonetheless, some knowledge of networking is required to think through how to deploy WireGuard appropriately for a given context. Being comfortable with configuring interfaces and firewalls is also necessary to troubleshoot the inevitable connectivity issues. One can appreciate why solutions that abstract over WireGuard exist. I used Tailscale extensively before this and did not have to think through things as much as I did here. This was all solved for me. I just had to install the agent on each device, authorize it, and suddenly packets moved securely and efficiently across networks. And yet, WireGuard was there all along and I knew that I could unearth the abstraction. Now I appreciate its simplicity even more, and relish having a stronger understanding of what I previously took for granted.

Lastly, I purposefully omitted other aspects of my WireGuard setup for self-hosting, particularly around DNS. This will be the subject of another article, which is rather similar to the one I wrote on using Tailscale with custom domains. Furthermore, a closer look at access control in this topology might be of interest to others considering that there are multiple firewalls that come into play.

Each interface has its own configuration file, and can be thought of as a "connection profile" in the context of a VPN. This profile is managed with wg-quick on Linux, or through the WireGuard app for macOS, Android, etc.  ↩︎

In my case, that's Debian and nftables. This article from Pro Custodibus explains how to configure this for the hub.  ↩︎

Gabriel Garrido 10 months ago

Automatic KDE Theme switching using systemd

Update August 8th, 2025: Theme switching is configurable in system settings as of KDE Plasma 6.5.

I use KDE Plasma 6 as my desktop environment. Among my petty grievances is the fact that KDE does not provide a way to automatically switch themes. In my case, I like to use a light theme during the day, and a dark theme as soon as the sun sets. An option to configure theme switching would be nice.

I searched for a solution and found an old thread on Reddit where someone shared a command that you can use to set a theme manually, for example to switch to "Breeze Dark". The command is already installed, so all I needed was to run it on a schedule using cron or systemd. I've been learning the ins and outs of systemd to automate tasks, so naturally I thought of using a systemd timer. With systemd, I define a service that changes the theme, and a timer that runs the service at a predetermined time. In this case, the theme will switch to "Breeze Dark" at 6:00pm. Persistent=true is used so that the theme also switches when I turn on the computer and it's already past the scheduled time. Similarly, I defined a separate service and timer so that the "Breeze Light" theme is switched to at 8am. The only difference between these is the timer's scheduled time.

Finally, I installed these as user units. I want to keep these unit files under version control, so I'd rather not maintain them directly under the systemd user unit directory. Instead, I use systemctl --user link, which links the files from wherever they live into that directory. Then, the timers are enabled and started, and the schedule can be verified with systemctl --user list-timers.

The neat thing is that I can also invoke the service unit manually. If during the day I want to use the dark mode, I can start the dark-theme service by hand. To view a log of the service runs, I can use journalctl.
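A sketch of the dark-theme pair of units and the commands involved; the unit names, file locations, and the plasma-apply-colorscheme command are assumptions on my part (substitute the command from that Reddit thread and your preferred paths):

```ini
# ~/units/dark-theme.service (linked into ~/.config/systemd/user)
[Unit]
Description=Switch to the Breeze Dark color scheme

[Service]
Type=oneshot
ExecStart=/usr/bin/plasma-apply-colorscheme BreezeDark
```

```ini
# ~/units/dark-theme.timer
[Unit]
Description=Switch to the dark theme in the evening

[Timer]
OnCalendar=*-*-* 18:00:00
Persistent=true

[Install]
WantedBy=timers.target
```

```sh
systemctl --user link ~/units/dark-theme.service ~/units/dark-theme.timer
systemctl --user enable --now dark-theme.timer
systemctl --user list-timers                 # verify the schedule
systemctl --user start dark-theme.service    # switch manually
journalctl --user -u dark-theme.service      # view past runs
```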

Gabriel Garrido 12 months ago

What self-hosting teaches

We're well into times in which it's clear why you'd care to own your data and host the software that you depend on. Enshittification and surveillance capitalism are not heterodox concepts, so I won't make that the point of this post. Yet, people on the internet can be quick to dissuade you from doing so. Whether that's claiming that companies are inherently better than you at operating servers and safeguarding your data, or reminding you that your time is worth more than what you'd pay for a subscription, or that you'd be mad to subject your family to a bus factor of one – the reasons abound. Some of the arguments may be valid, others may not 1. How do you reconcile the plea to take action with the apparent futility of the task? I guess you can only speak from your own experience, which means you have to try it out.

Let us return to pathemata mathemata (learning through pain) and consider its reverse: learning through thrills and pleasure. People have two brains, one when there is skin in the game, one when there is none. – Nassim Taleb, Skin in the Game

When you look after your own stuff, reality dawns on you. When things go awry, there's likely nobody to blame but yourself. But that's not necessarily a bad thing in and of itself. It encourages action and responsibility, and it provides you with an opportunity to learn. And boy, have I learned. So you start small 2, you put up a guardrail or two, and permit some slack. Not all lessons have to be expensive for them to teach.

When you look after your own stuff, you think twice before deploying new software. Do the benefits outweigh the setup and operating costs? How much will it demand from you? And so, little by little, self-hosting has changed the way that I approach my digital life. No longer am I haphazardly signing up to new services and giving out my data. I keep my dependencies to a minimum. I think of my needs in terms of systems and workflows, as opposed to a hodgepodge of apps. I've come to value standards and protocols, interoperability, and extendability. I'm forced to organize myself from the very beginning – what data do I truly need to have on me, and on each of my devices? Do I really need sync? Where is the data backed up to, and how often? Do I need all of those photos? How soon can I recover from disaster? Constraints are great at separating the wheat from the chaff. More likely than not, you can do more with less.

And ultimately, you become pragmatic. Life is unpredictable, and sometimes you'd rather not. That's perfectly fine, for the lessons have stuck and you're more judicious in your approach. You contemplate on-ramps and off-ramps, offline alternatives, or doing away with dispensable matters altogether. If anything, if the time comes around to move to somebody else's turf, at least you won't have to beg anyone for the data.

It's hard to ignore the constant news of leaks, or stories of people getting locked out of their own data with no recourse other than publicizing their case on popular news aggregators.  ↩︎

I'm not going about hosting my email, at least for now.  ↩︎

Gabriel Garrido 1 year ago

Servicing my Framework laptop for the first time

Last week my two year old Framework laptop went completely unresponsive while I was using it. Unable to bring it back into a working state, I pressed down the power button until it turned off. When I turned on the laptop again, it powered up but failed to boot. Being that I was travelling, and that I was not carrying the laptop's screwdriver to attempt further troubleshooting, I stowed my laptop away and touched grass for a couple of days.

Once I was back at home I came across a page in Framework's help center about a laptop not powering on. I learned that a Framework laptop will run a diagnosis when it fails to exit the BIOS successfully. The results are emitted as a sequence of lights on the side LEDs. The LEDs on my laptop were in fact blinking, but the sequence is emitted too fast for me to decipher it at a glance. I recorded the sequence in slow motion and jotted down the results. Here's how my diagnosis went, as reproduced from the help page. A green blink means a check passed, whereas an orange blink means that a check failed.

It appeared that my laptop's memory module was not initializing correctly. According to the help page, this is a known cause for the issue that I was having. Framework's documentation suggests removing the memory modules and testing them individually in each socket. I unscrewed the laptop, disconnected the input cover, and removed the left memory chip.

Reinstalling the memory on my Framework laptop

I connected the input cover and pressed the power button. The laptop turned on and then booted up correctly! The remaining memory chip was recognized. I ran some tests and everything seemed to work fine. Wanting to verify whether the memory chip had gone bad, I turned the laptop off and reinstalled the memory in the same socket. To my surprise, this time the laptop booted up as usual and both memory chips were recognized. At this point, I did not attempt any further tests and have since used the laptop without any further issue. I'm still not confident that I won't run into this issue again, and I'm reaching out to Framework's support to better understand why this happened in the first place. Further investigation confirmed that others have resolved this exact issue by reinstalling the memory.

That said, I'm pleased with how this turned out. Funnily enough, hours before the incident the thought came to me that I should write about my experience with the laptop thus far. While this post is not that, it touches on an important aspect of the experience. Some years ago a two-year-old MacBook Air died on me. The motherboard had an issue, and both the warranty and Apple Care periods had elapsed. Fixing it would've been almost as costly as purchasing a new laptop. In fact, the only recourse I had was replacing said laptop. In contrast, this week I knew there was a resolution to this incident that did not involve discarding the laptop entirely. As soon as I was able to diagnose the apparent issue, I was reassured of the probable causes and fixes. Considering that the laptop's warranty period has elapsed, thus assuming that I would bear the cost of fixing it, these were the scenarios that I was contemplating:

- Faulty memory chip: I would need to replace a memory chip or two. At the time of this writing, I can find a SODIMM DDR4-3200 16GB chip for $40.
- Faulty mainboard: Assuming the mainboard's memory slots had an issue, I would need to replace the mainboard. At the time of this writing, I can order the exact same Intel i7-1280P mainboard from Framework for $699 1, or alternatives for more or less.

Luckily, neither of these scenarios materialized. I did not expect reinstalling the memory would fix the issue. I purchased this computer specifically for the ability to repair or modify it in situations like these, and I'm glad that my expectations were met. The help pages and the community boards were particularly helpful, and servicing the hardware was easy.
Now here's hoping I don't have to do it again any time soon.

This represents almost 30% of the laptop's original price. Costly, but not ruinous.  ↩︎

Gabriel Garrido 1 year ago

Caching Hugo resources in Forgejo actions

Hugo keeps a resources directory that is used to cache asset pipeline output across builds. Running hugo for the first time processes all static assets to apply any specified transformations 1. Subsequent runs leverage the cache in the resources directory to process only what is necessary for a given build.

This site is built and deployed using a Forgejo action. Unlike my computer, an action's file system is ephemeral, which means that the resources directory is discarded when the action finishes executing. Unless this directory is explicitly persisted as action cache, every site asset will be needlessly reprocessed whenever the site is built. When this site is built in an action that does not reuse the resources directory, it takes Hugo approximately 7 seconds to generate the site. When the site is built in an action that loads the directory from cache, Hugo is able to build the site many times faster by leveraging its cache. It just needs to have it. Although I'm writing about this in the context of a Forgejo action, I suspect this works exactly the same using Gitea's and GitHub's caching actions.

The way this works is as follows. Before Hugo runs, the resources directory is restored from cache and loaded into the action's working directory. Hugo uses files from this directory as needed while the site is built. Then, the cache is updated in case Hugo modified the directory.

Action cache works by assigning a key to the cache. You use said key to save and restore the cache across action runs. However, action cache cannot be overwritten after it has been created. Instead, a new key should be used if the cache is meant to be updated. In our case, using a static key is insufficient, as it yields only what existed in the resources directory when the cache was created. Any assets added or modified thereafter would get recreated on each subsequent run. To this end, we need to use a dynamic key that is unique based on what's inside the directory. This way, we ensure that the action cache is updated automatically whenever Hugo modifies its cache. We can use the hashFiles function to suffix the key with a hash of the directory contents. We'll also want to prefix the keys with a string that is common to all keys used for this particular kind of cache. In the end, each cache key is that common prefix followed by the hash.

There is no resources directory in the working directory until we use the restore action to load the cache. Thus, it is not possible to derive a hash and attempt matching a cache key directly. Passing just the key prefix as the key will cause a cache miss. However, passing the same prefix in restore-keys allows falling back to the latest cache whose key starts with that prefix. Lastly, we use path to tell the restore action where to load the cache within Hugo's working directory. If everything goes as expected, the action logs a success message.

After the Hugo build step is finished, the resources directory should be saved using the save action. As mentioned beforehand, the key should be suffixed with a hash of the contents of the directory. When a commit contains a new asset, or an existing one is modified, the directory will change. When this occurs, the cache key will be new and the cache is saved. Conversely, when a commit yields no changes to the restored directory, no cache will be saved; the save action fails silently with a warning. This warning is expected, as no new hash was derived from the directory, and the existing cache will be reused in the next run.
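The relevant steps of the workflow might look like this; the action references, paths, and key prefix here are assumptions rather than my exact setup:

```yaml
- name: Restore Hugo resources cache
  uses: actions/cache/restore@v4
  with:
    path: resources
    # A bare prefix never matches an existing key directly...
    key: hugo-resources-
    # ...but restore-keys falls back to the newest cache with this prefix.
    restore-keys: |
      hugo-resources-

- name: Build site
  run: hugo --minify

- name: Save Hugo resources cache
  uses: actions/cache/save@v4
  with:
    path: resources
    # Hash the restored-and-updated directory so the key changes only
    # when Hugo actually modified its cache.
    key: hugo-resources-${{ hashFiles('resources/**') }}
```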
In my case, that's generating variants of images and fingerprinting static assets.  ↩︎

Gabriel Garrido 1 year ago

Archiving and syndicating Mastodon posts

I've been using Mastodon as a microblog. It's a more instantaneous, and at times conversational, medium than this site. Yet, whatever I make public there I would make public here as well. Social media platforms come and go. Throughout the years I've joined and left several of them, and I suspect that this will continue to be the case. However, I'm increasingly turning to my personal site for online discourse. Some months ago I started archiving posts from my Mastodon account to this site, following IndieWeb's "Publish Elsewhere, Syndicate (to your) Own Site" (PESOS) syndication model:

It's a syndication model where publishing starts by posting to a 3rd party service, then using infrastructure (e.g. feeds, Micropub, webhooks) to create an archive copy on your site.

By syndicating to my site, people who are not on the Fediverse can easily follow my statuses. By archiving on my site, I can switch external platforms, or drop them altogether, and the statuses would persist. By syndicating to my site, I can leverage my site's taxonomy to link relevant content together.

All Mastodon servers have a public API, and there's an endpoint to retrieve an account's statuses. I use Hugo to build this site, and I started out with the simplest approach using Hugo's data fetching capabilities for templates. This approach is quite limited. The statuses endpoint returns a maximum of 40 posts per API call. This is ok if all you care about is showing your most recent activity, which is what I did. However, my intent was to have a long-lived archive of my posts, so these statuses have to persist somewhere. At this point I wanted to support other features, such as:

- Having threaded posts appear as a single post, including those whose posts exist beyond a single batch of posts returned by the statuses endpoint
- Integrate Mastodon tags with tags on this site
- Archive attached media
- Backfill the archive to a given date

Hugo's new content adapters feature addresses some of these, but it's still limited by the batch size and the fact that it only runs at build time. At the time I was learning the Go 1 language, so I decided to write my own tool for some practice. This tool fetches an account's statuses from Mastodon's API and turns each status into a file. In my case, these are markdown files that adhere to Hugo's content system. However, this tool is flexible enough to fit most archiving and syndicating approaches involving static files. The tool mirrors Mastodon's API interface via command-line arguments. Additionally, it provides options to further filter statuses by visibility, to thread replies, to customize the file contents and filenames, and more. For detailed information on using this tool, refer to its documentation.

I use this tool in two ways: once, initially, to create the archive up to a given date, and then periodically to retrieve the latest statuses. The archive is version-controlled using git, with files stored in this site's content directory. I don't archive reblogs or replies made to others. I only archive statuses whose visibility is set to public. My replies to my own statuses are archived chronologically under the originating status (i.e. they're threaded), even if the visibility of these is set to unlisted 2.

To start, I wanted to generate an archive of statuses published since April of this year. Though Mastodon's statuses endpoint returns only 40 statuses at once, you can pass several parameters to control the bounds of the batch. This way, you can fetch statuses published before or after a given status, or combine these to fetch between statuses. To create the archive, and to update it without recreating the entire archive, I need to keep track of one of the delimiting statuses that were fetched.
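For reference, this is roughly what paging through the statuses endpoint looks like at the API level; the instance name and account id below are placeholders:

```sh
# Latest batch, at most 40 statuses, skipping reblogs
curl "https://mastodon.example/api/v1/accounts/<account-id>/statuses?limit=40&exclude_reblogs=true"

# Older statuses only: everything published before a given status id
curl "https://mastodon.example/api/v1/accounts/<account-id>/statuses?limit=40&max_id=<status-id>"

# Newer statuses only: everything published after a given status id
curl "https://mastodon.example/api/v1/accounts/<account-id>/statuses?limit=40&min_id=<status-id>"
```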
Though the tool is stateless, it supports saving the id of the first and/or last status to a file. You can then parse this file to set the parameters that control the bounds of the statuses batch. With that in mind, creating an archive means running the tool in a loop until no posts are returned. To support this, a flag 3 can be used for machine-readable output, and two flags are used in combination to scroll the bounds of the batch from the latest status backwards in reverse chronological order. Alternatively, I could have started from a particular toot and fetched statuses in chronological order (non-inclusive) by first saving that toot's id in a file, and using the corresponding flags instead.

Once the archive was created, all that was left to do was to run the tool periodically to fetch the latest posts. This could be done as simply as running the command manually, or delegating it to a cron job. In my case, I decided to use a scheduled Forgejo action in my site's git repository. This time, however, I needed to use different arguments. The archive was created by saving the last status returned and going backwards in time. To update the archive, we save the first status returned and go forward in time using the corresponding flags. A Forgejo action runs this script every hour. If there are any new statuses, the files are added, the latest status cursor is bumped, and everything is committed to the repository. A build action immediately takes off to update the site.

The latest status is committed to the archive

Integrating with the site

The file template that I use for these statuses adheres to Hugo's front matter, and the status content is transformed to markdown. There is no title associated with each status, the date is kept intact, and tags are integrated if present. The posts index page itself uses a simple template that renders the statuses as a list. Because the archive consists of just content files, it integrates easily with other features in this site. Statuses are included in my activity page, and tagged statuses are surfaced in tag pages.

Statuses shown in my activity page

Tags used in statuses automatically integrated as Hugo tags

Lastly, I did not want to generate individual HTML pages for each status. I also did not want statuses to be included in the main RSS feed for the site. I used a front-matter variable in the content section index (i.e. _index.md) to specify this behavior. If I look at what's inside the directory served by my server, I only see the HTML index file, and directories for statuses containing images.

I still am! For this particular program, I ended up learning a lot about templates and Go's pass-by-value. The latter was particularly relevant for someone who has mainly written JavaScript.  ↩︎

In Mastodon it is customary to set your threaded replies as unlisted to avoid cluttering others' feeds.  ↩︎

Inspired by a similar option in another tool.  ↩︎

Gabriel Garrido 1 year ago

Backing up a Thunderbird profile in Flatpak

I use Thunderbird to manage multiple email inboxes, calendars, and address books. All of the corresponding accounts, and their data, are associated with what Thunderbird calls a "profile". To back up your accounts and their data, including email messages, you can export the entire profile. Thunderbird provides a helpful article on how to export a profile. The size of my profile is larger than what's supported by their export option, so I needed to manually back up the directory where the profile is stored.

In the profile export screen there is a button to open the profile directory. However, clicking it did not open it for me. To find the actual profile path, I had to open Thunderbird's troubleshooting information, scroll down to the profile row, and follow the link to a separate page where all profiles are listed, including their root directory. Here too, there's a button to open the profile's root directory. However, the button did not open the file explorer either. Furthermore, I realized that the root directory path listed there does not even exist on my machine.

I installed Thunderbird using Flatpak. Upon inspecting its associated permissions, I realized that Thunderbird has no permission to access user files, like the profile root directory it reports. Instead, its application files live under Flatpak's per-application data directory in my home directory.

No home access for Thunderbird

Knowing this, the Thunderbird profiles can be found inside that per-application directory. Noting the profile that I'm currently using, I added its path to my autorestic configuration file and I was good to go.
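A sketch of what this ends up looking like; the Flatpak application id is the one Thunderbird uses on Flathub, while the profile name, home path, and backup backend below are placeholders:

```sh
# Flatpak keeps per-application data under ~/.var/app/<application-id>;
# the Thunderbird profiles end up inside it rather than in ~/.thunderbird.
ls ~/.var/app/org.mozilla.Thunderbird/.thunderbird/
# profiles.ini  abcd1234.default-release/
```

```yaml
# .autorestic.yml: back up the active profile directory
locations:
  thunderbird-profile:
    from: /home/me/.var/app/org.mozilla.Thunderbird/.thunderbird/abcd1234.default-release
    to: my-backend
```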

Gabriel Garrido 1 year ago

Hello FreeBSD

I have started to learn about the FreeBSD operating system. I picked up a copy of Michael W. Lucas' "Absolute FreeBSD" and created a virtual machine that I am tinkering with. This is the first time that I deliberately set out to learn a particular operating system. I currently use different distributions of Linux on a daily basis, both on servers and on my personal machines. Yet, learning Linux and the idiosyncrasies of each distribution has been a multi-year endeavor carried out without much forethought. I simply started using them and made my way through, footguns and revelations notwithstanding.

While my work experience has been primarily in software engineering, in recent years I've worked on projects where I have had to provision infrastructure and administer systems. Whereas previously I was mainly responsible for writing software, I was now required to understand parts of the system that I took for granted 1 (particularly as someone who mainly delved into front-end engineering). Naturally, having learned a thing or two, I was drawn to hosting and managing my own data and the services that I use. Then, to feeling more comfortable assuming back-end and sysadmin work. Next thing you know, I'm spinning up virtual machines to test operating systems, and developing an opinion or two on the matter. In other words, I feel that I have reached a point where I can actually appreciate the strengths and weaknesses of working with different operating systems. Which leads me to my interest in derivatives of the BSD operating system.

My humble watercolor rendition of the FreeBSD Project logo

To start, I'll share what Ruben Schade has to say in his article "It's worth running a FreeBSD or NetBSD desktop": Sometimes you need to give yourself permission to tinker. I like to tinker with computers. I've done so since I was a kid, and I still get a kick out of probing, breaking, fixing, and ultimately understanding these machines and systems to whom we've delegated so much.

Perhaps the most compelling characteristic of FreeBSD to me is that it is designed, developed, and distributed as a cohesive whole. This level of integration makes it less intimidating to learn deeply about the machine and operating system that I rely on. Likewise, I'd expect that knowledge in this area accrues more consistently and durably, though only time will tell if that's the case. Then, I'm intrigued by particularities of FreeBSD such as its documentation, cleaner delineation of the operating system and third-party software, its jails mechanism, and its integration with the Z File System. These stand out as I recall what my experience has been across multiple operating systems with regards to using third-party packages 2, isolating software, and managing files. I look forward to seeing how these compare in practice. Lastly, I've also been interested in expanding my knowledge of networks by repurposing some old devices in a home lab. As far as I understand, FreeBSD's networking stack and virtualization capabilities are well suited to this endeavor.

For now, my goal is to replace the Ubuntu servers that I use as web and file servers with FreeBSD. I'm pretty happy with Fedora as a daily driver 3, so I have no plans to use FreeBSD on the desktop. Here are some links to writing elsewhere that I've found interesting regarding FreeBSD and other BSDs.
- "Let's talk FreeBSD (finally) with Allan Jude (Changelog Interviews #574)"
- FreeBSD Handbook
- Stefano Marinelli's writing on FreeBSD
- Ruben Schade's writing on FreeBSD
- Derek Sivers's "OpenBSD: why and how"
- Matt Fuller's "BSD vs Linux"

I do argue that everyone benefits when software engineers understand how the software that they write is ultimately deployed, served, and operated.  ↩︎

I shudder thinking about that one update that slipped through Homebrew once and provoked a lot of headaches on macOS. And don't get me started with Python environments. Developing with containers has provided some relief but not without its own drawbacks.  ↩︎

Immutability in Fedora with its Atomic Desktops, Flatpak, and Toolbx have served me well with regards to third-party software and development workflows.  ↩︎

Gabriel Garrido 1 year ago

Bunny CDN Edge Rules for HTML canonical URLs

For some months I hosted this site using Bunny's file storage and CDN. Bunny's CDN is minimally opinionated about serving HTML. It does have some default behavior that is reasonable. For example, not caching HTML. Other than that, it's on you to use their Edge Rules for anything custom like redirects, managing headers, caching HTML, rate limiting, etc.

Update: In June 2024, Bunny renovated its edge rule system, making it easier to create and maintain edge rules. However, the logic to implement the functionality described in this note remains unchanged.

This site is completely static and is built using Hugo. When the site is built, Hugo generates a directory of HTML files. By default, Hugo produces pretty URLs for all permalinks in the site. This means that a permalink for a given page will end with a trailing slash, whether the permalink is relative or absolute. However, when you upload the HTML files to Bunny's file storage, a given page will be served at three different URLs: with the trailing slash, without the trailing slash, and with the index.html file name appended. This may be fine for some, but in my case I wanted pages to be accessible only at the URL specified by the canonical link element in the page header.

To accomplish this, I needed to use Bunny's Edge Rules. I had a conflicting experience using Bunny's Edge Rule system. Configuring static rules is as straightforward as it gets, but as soon as you're trying to apply actions or match rules dynamically then you're in for some trial and error. Bunny's edge rules support dynamic variables and variable expansion, which you use to achieve dynamic behavior. To redirect to the canonical URL I have to split the request's path into its parts and then add or remove the trailing slash manually. There are many URLs on my site so this rule has to be applied dynamically. Unfortunately, given how Bunny's rule system works, I had to create variations of each rule below to account for the presence or absence of query parameters. This is not ideal but it'll work. In my opinion, the documentation could be improved and the rule editor could provide some sort of testing ground to avoid frustrations. Bunny's support is good, so they will happily fill in the blanks for you.

Note: If you're using multiple hostnames for the corresponding pull zone, you should use the corresponding dynamic variable instead of hard-coding the host name as in the examples below.

For the first rule, I was looking to trim index.html from requests. This should only be applied to any request containing index.html in its path. To redirect to the correct path, I use a dynamic variable that is expanded at evaluation time to get the requested file's directory.

- Requests without query parameters: condition matching should be set to all.
- Requests with query parameters: condition matching should be set to all.

For the second rule, I was looking to match any URL whose path does not end in a trailing slash and has no file extension. After several iterations, I failed at figuring out how to get this to work and had to reach out to support to understand if it was feasible at all. Turns out this is only possible by using a dynamic variable that is not documented at all 1. I didn't want to apply this rule to my webfinger endpoint, so I made sure to exclude its directory from this rule.

- Requests without query parameters: condition matching should be set to all.
- Requests with query parameters: condition matching should be set to all.

In an earlier version of this note the rules were set up in a way that conflicted with query parameters if they were present.
Thank you Vikram for pointing it out.

They mentioned they were going to add this to their documentation but as of the time of this writing this has not been the case.  ↩︎

Gabriel Garrido 1 year ago

Hosting my own software forge

Update: I've since moved on to stagit for my public-facing repositories. The repositories are still found at git.garrido.io, and can be cloned using the dumb HTTP protocol.

It's been a couple of months since I started hosting my own software forge. Since then I've used it as a remote for all of my git repositories, as a registry for container images, to store gists, and for continuous integration (CI). I chose Forgejo because, for my use case, it has feature parity with GitHub, while also being open source, lightweight, and simple to operate. It also happens to be governed by its contributors, and is stewarded by Codeberg, a non-profit organization.

Unlike GitHub, I'm not using Forgejo to collaborate on other people's projects. My server is a single-user instance. This isn't much of a problem for me as my main motivation to host a software forge is not social. However, something particularly exciting about Forgejo is that federation is being implemented via ForgeFed, an ActivityPub extension for software forges:

ForgeFed is a federation protocol for software forges and code collaboration tools for the software development lifecycle and ecosystem. This includes repository hosting websites, issue trackers, code review applications, and more. ForgeFed provides a common substrate for people to create interoperable code collaboration websites and applications.

What's great about version control systems like git is that it is easy to move code from one remote to another as one deems appropriate. Yet, tangential artifacts that are equally as important are not as portable or accessible. Initiatives like ForgeFed pave the way to a world in which we're not dependent on a centralized entity whose interests and incentives may no longer align with one's own.

Hosting Forgejo has been uneventful and virtually maintenance-free. The setup was quick, though configuring Forgejo Runners did require some trial and error, as some parts of the documentation were not entirely clear to me. With regards to performance, Forgejo does run lightly. A quick glance at Grafana tells me that in the past seven days it has been using approximately 150MB of memory and 1-2% of CPU on a low-powered VPS. Such a footprint would be even smaller if I were not using this for CI. Like all other services that I self-host, Forgejo and Forgejo Runners run in Docker containers. This adds some overhead, but that's the tradeoff I choose to keep things simpler on the host 1. I use Tailscale to access the server privately over SSH while also exposing the public repositories through a proxy on a separate public server.

As for backups, Forgejo has a helpful command to export its files and SQLite database. I use this to run automated backups using Restic. When I decide to upgrade versions I read their well-written release notes, bump the container's image, and run an Ansible playbook that runs a backup, restarts the containers, and migrates the database if necessary. It's worth pointing out that Forgejo just released their first Long Term Support release, so the maintenance burden could be eased further by sticking with the LTS version and applying minor updates.
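The export command referred to above is presumably forgejo dump, which bundles the repositories, configuration, and a database dump into a single archive. A minimal sketch of how it might be invoked in a containerized setup like mine (the container name, user, and flags are assumptions; check the Forgejo documentation for the exact invocation):

```sh
# Run the dump inside the Forgejo container, then copy the archive out
docker exec -u git forgejo forgejo dump --file /tmp/forgejo-backup.zip
docker cp forgejo:/tmp/forgejo-backup.zip ./forgejo-backup.zip
```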
↩︎ Forgejo is written in Go so running it on “bare metal” is as simple as it gets, but I still use a container to keep logging, deployments, and configuration management consistent across many services that I host on a single VPS.  ↩︎
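For reference, here is a minimal sketch of what that backup step can look like. The container name, the UID, the dump path, and the use of Restic’s standard environment variables are illustrative assumptions rather than my exact configuration:

```bash
#!/usr/bin/env bash
# Sketch of a Forgejo backup handed off to Restic. Container name, UID,
# and paths are placeholders; adjust them to your own deployment.
set -euo pipefail

# Export Forgejo's files and SQLite database into a single archive,
# running as the user Forgejo runs as inside the container.
docker exec -u 1000 forgejo forgejo dump --file /data/forgejo-dump.zip

# Copy the dump out of the container and back it up with Restic
# (RESTIC_REPOSITORY and RESTIC_PASSWORD_FILE are assumed to be set).
docker cp forgejo:/data/forgejo-dump.zip /tmp/forgejo-dump.zip
restic backup /tmp/forgejo-dump.zip
rm /tmp/forgejo-dump.zip
```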

Gabriel Garrido 1 year ago

Simple automated deployments using git push

Using git push remains one of my favorite ways of deploying software. It’s simple, effective, and you can stretch it significantly until you need more complex workflows. I’m not referring to using git push to trigger a Github action which builds and deploys software. I’m talking about using git push to deploy your branch to a remote that lives on a server of your own. I learned this workflow from Josef Strzibny’s excellent book Deployment from Scratch, which I’ve adapted somewhat.

This note supposes you have SSH access to a server that has git installed, and that said server is already configured as a host in your machine’s SSH configuration file (the client-side sketch at the end of this note shows an example). I keep an Ansible playbook that automates provisioning this workflow. It should be easy to derive an equivalent bash script if you’re not using Ansible.

The way this works is that you keep a bare git repository on the server where you want to deploy software. A bare repository is a repository that does not have a working directory. It does not push or pull. Anyone with access and permission to the server and the directory where the git repository is created will be able to push to it to deploy. My convention is to create a directory for the project at hand, and inside it create two directories: one where the bare repository lives, and one where the source-controlled project files live.

Then, you configure a post-receive script in the bare repository’s hooks directory. This script will check out the code that was pushed to the deployment branch into the source directory. You could do other git operations here, like obtaining the current hash if you’re using that somewhere in your application code. Finally, you trigger a deployment script that also lives in the project root. This deployment script, in turn, takes care of whatever is necessary to build and release the pushed code. For example, you could use it to generate a new version of a website that uses Hugo.
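Here is a minimal sketch of those server-side pieces. The project path, the main branch, and the Hugo output directory are placeholders rather than a prescribed layout:

```bash
# Create the bare repository and the checkout directory.
mkdir -p /home/deploy/example/src
git init --bare /home/deploy/example/repo

# The post-receive hook checks out whatever was pushed to main into the
# source directory and then hands off to the deploy script that is
# committed at the project root.
cat > /home/deploy/example/repo/hooks/post-receive <<'EOF'
#!/usr/bin/env bash
set -e
git --git-dir=/home/deploy/example/repo \
    --work-tree=/home/deploy/example/src checkout -f main
exec /home/deploy/example/src/deploy.sh
EOF
chmod +x /home/deploy/example/repo/hooks/post-receive

# deploy.sh lives in the repository itself and ends up in .../src after
# checkout. For a Hugo site it could be as small as:
#
#   #!/usr/bin/env bash
#   set -e
#   cd /home/deploy/example/src
#   hugo --minify --destination /var/www/example
```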
It’s important to note that the current working directory for the hook will be the bare repository, so you may need to change directory or use absolute paths in the deploy script. Another important consideration is that the remote repository gets updated even if the hook exits because of an error. It is recommended, particularly in the deploy script, to use set -e so that the script stops if any command exits with a non-zero status. If an error occurs you’ll know right away because the stdout and stderr of the hook are piped back to the client that pushed. I also recommend writing the deploy script such that it is not coupled to a particular push. It should work with whatever is in the source directory. In other words, I should be able to use the same script to manually deploy the application if I have a reason to.

Here are a couple of uses for this workflow, some of which I’ve done:
- Build a new binary of a Go program and replace the running process
- Build a new image of a Docker container 1 and replace the running containers
- Restart a Node.js server

If you’re using PHP, or serving plain HTML files, you may even get away with not having a deploy script at all, given that the hook updates the source files.

At this point all you need to do is create a new SSH remote for the repository on your machine. The remote’s name is arbitrary; its address combines the host defined in your SSH configuration file with the path to the bare repository on the server. And finally, you push to it. Because this workflow is version-controlled, reverting or jumping to a specific version is just another git operation.

I should emphasize that this workflow is pinned to a single branch. Any change that you make must be reflected in that branch in order to go live. That said, you can push other branches to this remote as it is a regular git repository. However, note the following difference: pushing another branch will not deploy a new version with the contents of that branch, because the hook above always checks out the pinned branch. You can either merge into the pinned branch on your machine and then push, or push directly from your branch to the remote’s pinned branch. You can force push if the remote warns you about discrepancies. The client-side sketch below shows what this looks like.

This one is quite convenient for projects where you can get away with not using an image repository and build pipeline, and if building your image is not too process-intensive. Cleaning up old images periodically is necessary though.  ↩︎
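And a matching sketch of the client side, again with placeholder host name, user, and paths:

```bash
# SSH configuration entry for the server (~/.ssh/config).
cat >> ~/.ssh/config <<'EOF'
Host deploy
    HostName server.example.net
    User deploy
    IdentityFile ~/.ssh/id_ed25519
EOF

# Add the bare repository as a remote; the name is arbitrary.
git remote add production deploy:/home/deploy/example/repo

# Deploy the pinned branch.
git push production main

# Deploy the contents of a feature branch by pushing it onto the pinned
# branch; add --force if the remote warns about discrepancies.
git push production my-feature:main
```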

Gabriel Garrido 1 year ago

Caching HTML in CDNs

Content delivery networks do not cache HTML out of the box. This is expected; otherwise their support lines would flood with the same issue: “why is my site not updating?” I want to cache my site fully. It’s completely static and I post infrequently. With some exceptions, most pages remain unchanged once published. Caching at the edge means that requests to this site travel shorter distances across the globe. Keeping pages and assets small means fewer bytes to send along. This is the way.

Most CDNs have the ability to configure which files to cache and for how long. I use Bunny, which does not cache HTML out of the box. What follows should translate to other CDNs, but you may have to adapt a setting or two.

I want the CDN to hold on to HTML files for a year. To that end, I define an edge rule that looks like this:
- Override Cache Time: one year, in seconds
- If Response Header: the content type matches HTML or the feeds
- If Status Code: 200

If my origin responds to a request with an HTML or feed 1 content type, then the CDN will cache it. I don’t want the CDN to cache requests for pages that do not exist, so I configure it to check that the origin also returns a 200 status code.

I need to make sure that the CDN instructs browsers to not cache the page. Why? If I publish an update to a page, I can invalidate the cache in the CDN but not in someone’s browser. I create a second edge rule that looks like this:
- Set Response Header: Cache-Control
- If File Extension: matches the feeds
- If Response Header: matches HTML pages

The Cache-Control header is used to instruct the browser on how to cache the resource. The value I set marks the response as stale and ensures the browser validates the page against the CDN. Suppose someone has loaded my page once. If they return to this page, the browser will verify with the CDN whether the page has been modified since it was requested. If it hasn’t, the CDN will send a 304 without transmitting back the contents of the requested page. The header is set only if the requested resource has the feeds’ file extension or no extension at all (as HTML pages do) 2.

Lastly, I need to tell the CDN to clear some of its cache when I publish an update to the site. For example:
- I edit a page’s content
- I publish a new post
- I update the site’s CSS 3

I’ll admit that invalidating the cache is a reason why someone may just not bother with caching HTML. Following the list above, there are different scenarios to consider. If I edit a given post I may be tempted to think that only that page’s cache must be invalidated. However, in my site an update to a single page can manifest in changes to other pages: a change in the title requires updating the index, a new tag requires updating the tag index, a new internal backlink requires updating the referenced page. Lest you’re down to play whack-a-mole, the feasibility of this endeavour rests in the ability to invalidate the cache correctly and automatically. In my case I check every new site build against the live build to purge the cache automatically. A sketch of how to verify the resulting behavior follows below.

I’m also caching the feed files, as RSS readers are probing my site’s feeds frequently.  ↩︎
These are the affordances that I can use in my CDN’s edge rules. Other CDNs may provide something more explicit.  ↩︎
The CSS file for this site has a hash in its filename. Updating the CSS means a new hash, which means all pages update their CSS reference.  ↩︎
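A quick way to confirm the combination of rules behaves as intended is to look at the response headers with curl. The commands below are a generic sketch; exact header names on the CDN side vary by provider:

```bash
# The HTML response should carry a Cache-Control header telling the
# browser to revalidate, while the CDN itself keeps serving from cache.
curl -sI https://garrido.io/ | grep -iE '^(cache-control|content-type)'

# A conditional request for an unchanged page should come back as a 304
# with no body (this relies on the page having a Last-Modified header
# to validate against).
curl -s -o /dev/null -w '%{http_code}\n' \
  -H "If-Modified-Since: $(date -u '+%a, %d %b %Y %H:%M:%S GMT')" \
  https://garrido.io/
```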

Gabriel Garrido 1 year ago

Using Rclone to automate CDN cache invalidation

I want to cache HTML in Bunny’s content delivery network (CDN). Unlike other static files, HTML filenames cannot be hashed 1 or they’d be served with ugly URLs. I need a way to invalidate the cache granularly when the site is published.

When I publish this note, the corresponding HTML file is uploaded to Bunny’s file storage. This file is automatically replicated to Bunny’s file storage regions throughout the world and served by the CDN. HTML files are not cached by the CDN, so each request to this note is retrieved from the nearest file storage region. For example, if someone visits this page from Santiago, Chile, the HTML will be retrieved from the file storage region in São Paulo, Brazil. However, Bunny has several points of presence (PoP) that are closer to this person, including one in Santiago. The other static files are likely being served from there, so why not do HTML too? Once published, this page will likely remain unchanged for a long time. Unless I edit it, or change its styling, this page can be cached indefinitely in Bunny’s 123 PoPs around the globe. When the page needs updating, I can purge the cache granularly.

In this note I’m skipping the part of configuring my CDN to cache HTML, which has some caveats that I’ll explain in a separate note. For now all you need to know is that HTML is cached in the CDN but not in the browser.

A new set of files is generated each time this site is built using Hugo. I then use Rclone to upload these files to Bunny’s file storage. Rclone has a check command that I can use to compare my new set of files against those in Bunny’s file storage. It can also generate a manifest of the files that are different in the destination. The flags I pass to the check command do the following:
- take a filename where the list of differing files will be saved
- limit the check to just HTML files
- assume that files in the destination do exist in the source
- check the data of both files during the check, as hash comparison is not supported in Bunny’s storage

Suppose I’m updating a page. This page is already live and cached in the PoPs that requests for it have been routed through. Once I’m ready to publish the changes, I build the new version of the site and run the check command. Inspecting the manifest tells me that the page’s HTML file in the file storage is different and will be updated when I upload the new build.

Bunny has an API that I can use to purge cache in the CDN. Following the example above, I need to tell it to purge that page’s URL. I wrote a small script that loops through the manifest file, formats each file path as its URL equivalent, and sends a request to Bunny’s API. This script is invoked after I finish uploading the new files to Bunny’s file storage. The cache is immediately purged throughout Bunny’s CDN and the update is now live. A sketch of this check-and-purge flow follows below.

You can use this approach to fully cache your site. Other types of files that would need the same cache purging are RSS/Atom feeds; in other words, any content whose URL remains constant across updates.

CSS, JavaScript, SVGs and images have unique filenames based on their contents. Cache does not need to be invalidated for these because changing them means adding a new file to the CDN. In contrast, each page’s HTML file keeps the same name, and changing it means modifying the file in place.  ↩︎
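Here is a sketch of the whole check-and-purge flow. The remote and storage zone names, the API key variable, and the exact shape of Bunny’s purge endpoint are assumptions for illustration, so verify them against Rclone’s and Bunny’s documentation before relying on this:

```bash
#!/usr/bin/env bash
# Compare the new build against Bunny's storage, upload it, and purge
# the changed pages from the CDN. "bunny:site" and BUNNY_API_KEY are
# placeholders.
set -euo pipefail

# List HTML files that differ in the destination. --one-way ignores
# files that only exist in storage, and --download compares contents
# because Bunny's storage does not support hash comparison. rclone
# check exits non-zero when differences are found, hence the || true.
rclone check public/ bunny:site \
  --include "*.html" \
  --one-way \
  --download \
  --differ changed.txt || true

# Upload the new build.
rclone copy public/ bunny:site

# Purge each changed page from the CDN. The endpoint and AccessKey
# header reflect Bunny's purge API as I understand it.
while read -r path; do
  url="https://garrido.io/${path%index.html}"
  curl -s -X POST -G \
    -H "AccessKey: ${BUNNY_API_KEY}" \
    --data-urlencode "url=${url}" \
    "https://api.bunny.net/purge"
done < changed.txt
```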

Gabriel Garrido 1 year ago

Implementing internal backlinks in Hugo

As of today, some pages on this site surface any other internal page where they are cross-referenced. If you’ve used a tool like Obsidian, you’ll know how useful backlinks are to navigate related content. The joys of hypertext!

Hugo does not generate or expose backlinks out of the box. Instead, you must compute the references on each page as it is being built. This approach considers the following constraints:
- Only links in the markdown content are eligible
- All content pages are eligible
- Links use Hugo’s ref and relref shortcodes
- No explicit front-matter is required
- Anchors within the same page are not considered backlinks
- Multiple languages are not considered

For example, you can go to my /about page and see the various pages that reference it, including this one. When this page is built, all other pages are inspected; if a page references this page with one of those shortcodes in its content, it will be matched.

Create a partial in your theme’s directory with the markup that performs this lookup, then instantiate it in any template where you want to show backlinks. I did some light testing, and references made with ref, relative references made with relref, and references to page bundles are all matched.

I wish Hugo had better affordances to accomplish this type of task. Until then, you must bear with adding logic to your templates and O(n^2) lookups.

Gabriel Garrido 1 year ago

Bundling a JSON file dynamically as a typed module

I recently refactored a Next.js single-page application to support white-labeled deployments. No requirements called for pulling and defining the configuration at runtime. Instead, the configuration is pulled once at build time and saved to the project root as a JSON file. This configuration is referred to both during the build and at runtime. For example, at build time the configuration is used to define a color scheme in Tailwind’s configuration file. At runtime, the configuration is used to set the logo URL in the page header. Without the corresponding setup, referring to this configuration in the runtime modules results in two issues:
- The module does not exist where the import would normally resolve, so importing it raises an error
- The module has no type information, so Typescript will raise an error when referring to its contents

I’m going to gloss over the part of pulling the configuration file from its source during the build. Assume the configuration has been fetched and stored in a temporary file in the project root directory in the CI environment.

The first issue is addressed using the aliasing feature that exists in most front-end tools like Webpack, Rollup and esbuild. This feature allows you to specify how a given module is resolved. In my case I wanted the JSON file to exist in the dependency graph as a module, which allows me to import the configuration anywhere at runtime. Next.js uses Webpack to build the application, so the alias is created by adding custom Webpack configuration to the Next.js configuration file.

Now we need to provide type information for the configuration module. I created a type declaration file in a directory that is covered by the include option of the Typescript configuration file, so this type information will be picked up by the Typescript compiler. At this point, the module should have the appropriate types.

Type hints for the configuration file in Sublime Text

Validating at build time: I want to make sure that the configuration file is valid before building the application. Using zod, I define a schema and validate the contents of the configuration JSON before running the build.

Gabriel Garrido 1 year ago

Running private services with Tailscale and a custom domain

I use a virtual private server to host multiple services that are accessed only by me. Instead of exposing these services to the public internet, I use Tailscale to access them privately through my Tailscale network. There are several ways in which I can access these services through Tailscale. I have settled on an approach involving a custom domain, proper TLS certificates, and without opening the server to the public internet 1. Assuming I’m connected to Tailscale, I can do the following on any of my devices, each service at its own address:
- Access my RSS reader
- Fetch my calendars and address books
- Pull code from my private git repositories

In this note I’ll share the different ways in which I’ve used Tailscale for this to explain why I prefer the current approach. Jump ahead if you care only about this particular implementation. For the sake of this post, let’s pretend that my server has a fixed tailnet IP address and that I own a custom domain to go with it.

The easiest way to reach a service within the Tailscale network is to expose it directly on the server’s tailnet IP address. For example, if I’m using Docker I can publish the container’s port on that address. Once running, I can reach my RSS reader at the tailnet IP address and the published port. If MagicDNS is enabled, I can reach it at my server’s machine name instead. In this approach I’m accessing the service using HTTP. This is not much of a problem because all connections between devices through the tailnet are end-to-end encrypted. However, the browser doesn’t know about this and warns me accordingly. Besides the fact that I prefer to run services behind a reverse proxy, a drawback with this approach is that I’d rather not have to remember the port assigned to each service.

Serve is Tailscale’s way of exposing a local service to all other devices in the tailnet. Unlike the previous approach, this method supports TLS certificates. However, Tailscale’s HTTPS certificates feature must be turned on. To use Serve, I publish my container’s port to localhost instead, and I can then reach my RSS reader at the machine’s Tailscale hostname by running Serve on the server. The drawback here is that I only get one domain per device and I cannot use subdomains. I have to use subpaths if I want to expose multiple services at once; for example, the RSS reader would be served under a subpath of that single domain. The drawback with the subpath approach is that it typically implies having to reconfigure the base URL for each service.

Having used the other approaches, I wanted to use a custom domain to isolate each service with subdomains. Likewise, I wanted to run the webserver myself in order to consolidate the ingress behavior and configuration. The main component of this approach is to configure the tailnet with a DNS server that will resolve queries for the custom domain to a tailnet IP address. I had an unused domain in Porkbun that I decided to repurpose for this. All I had to do here was generate API credentials and enable API access for the domain.

In Tailscale you can configure your tailnet to use a specific nameserver. In my case, I chose NextDNS as I was already using it 2 and Tailscale supports it out of the box. In NextDNS I created a new profile just for my tailnet and added a Rewrite for my custom domain in the Settings page. The domain will resolve to my server’s tailnet IP address. Lastly, I noted the endpoint ID shown in the Endpoints section of the Setup page. Back in the Tailscale admin console, I went to the Nameservers section of the DNS settings page, pressed Add nameserver and selected NextDNS. In the endpoint input, I entered the NextDNS profile endpoint ID. Once added, I enabled the Override local DNS option.
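A quick way to confirm the rewrite is working is to query the domain from a device that is connected to the tailnet. The domain and address below are placeholders for my actual values:

```bash
# From a tailnet device, the custom domain resolves to the server's
# tailnet IP address (with MagicDNS, queries go through Tailscale's
# resolver at 100.100.100.100).
dig +short rss.home.example.com
# 100.64.0.1

# From a device outside the tailnet the rewrite does not exist, so the
# name resolves to nothing useful.
```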
At this point, DNS lookups for the custom domain in any of the tailnet devices resolved correctly. Lastly, I installed and configured Caddy on the server. I created a snippet in the Caddy configuration file that I could reuse across all of my hosts. This snippet is responsible for:
- Binding the corresponding host to the tailnet IP so that the hosts are reachable only at the tailnet address
- Rejecting requests made from outside the tailnet address space
- Configuring TLS using a DNS challenge with Porkbun

Caddy does not support all of the available DNS providers out of the box, so I had to build 3 the Caddy binary with the porkbun module. I included the server’s tailnet IP address and the API credentials that I generated in Porkbun in Caddy’s env file, and set the certificate resolver to use the Porkbun DNS challenge. Lastly, I included my email address in the global directive to be used for the issued certificates. At this point, I was good to go. Anytime I want to add a new hosted service I add a couple of lines to the Caddyfile that include the snippet and define the reverse proxy. A sketch of the manual build step follows at the end of this note.

I should note that I can swap any of the components in this implementation for an alternative:
- Traefik instead of Caddy for the webserver
- Hosting CoreDNS instead of the cloud offering of NextDNS
- Wireguard instead of Tailscale

Will Norris of Tailscale has a great article on a different approach to this setup.

There are no public DNS records pointing to this server, and a firewall blocks all incoming connections except those made through the Tailscale interface.  ↩︎
I use NextDNS to block ads at the network level in my router.  ↩︎
I use Ansible to manage my server, and the caddy-ansible role makes it trivial to add modules to your Caddy binary.  ↩︎
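For reference, the custom Caddy build can also be produced manually with xcaddy; the caddy-ansible role mentioned in the footnotes takes care of this for me, so treat the module path, env file location, and variable names below as illustrative:

```bash
# Build a Caddy binary that includes the Porkbun DNS provider module.
xcaddy build --with github.com/caddy-dns/porkbun

# The DNS challenge credentials go into the environment file that the
# Caddy service loads; the file path and variable names are
# placeholders that must match whatever the Caddyfile references.
cat >> /etc/caddy/caddy.env <<'EOF'
PORKBUN_API_KEY=pk_xxx
PORKBUN_API_SECRET_KEY=sk_xxx
EOF
```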

Gabriel Garrido 1 year ago

Read your Ghost subscriptions using RSS

I follow a handful of Ghost-powered sites that don’t serve their articles fully in their RSS feed. Most of the time this is because an article requires a paid or free subscription to be read. This turns out to be a nuisance as I prefer to read things using an RSS reader. For example, 404 Media’s recent article on celebrity AI scam ads in Youtube ends abruptly in my RSS reader. If I want to read the entire article then I am expected to visit their site, log in, and read it there.

Article in the RSS reader cut short

Update: In March 2024, 404 Media introduced full text RSS feeds for paid subscribers. If you’re not a paid subscriber, this note is still relevant if you wish to read free articles in full-text via RSS for any Ghost blog out there.

Miniflux is a minimal RSS reader and client that I self-host. I like to use it with NetNewsWire on my iPad so that I can process and read articles in batches while offline. One of my favorite features in Miniflux is the ability to set cookies for any feed that you’re subscribed to. You can use this to have Miniflux pull the feed content as if you were authenticated 1 in websites that use cookies to maintain the user’s session. Ghost uses cookies to keep users authenticated for at least six months. A Ghost-powered site will respond with a different RSS feed depending on who is making the request. If you’re logged in and your subscription is valid for the article at hand, you get the entire article. If you’re not logged in or if you don’t have the appropriate subscription, you get the abbreviated article. This is great! I can continue to support the publisher that I’m subscribed to while retaining control over my reading experience.

Only the cookies are necessary
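Before wiring this into Miniflux, you can confirm the cookies do what you expect by fetching the feed with and without them. The feed path below is Ghost’s default, and the cookie names and values stand in for whichever ones you copy from the browser:

```bash
# Anonymous request: items contain the truncated previews.
curl -s https://www.404media.co/rss/ -o anonymous.xml

# Same request with the session cookies copied from the browser.
curl -s -H 'Cookie: COOKIE_NAME=VALUE; OTHER_COOKIE=OTHER_VALUE' \
  https://www.404media.co/rss/ -o authenticated.xml

# If the cookies are honored, the authenticated feed is noticeably
# larger because items now carry the full article bodies.
wc -c anonymous.xml authenticated.xml
```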
To set this up:
1. Open your browser, visit the Ghost-powered site that you’re following, and log in.
2. Open your browser’s developer tools and head to the storage section (under “Application” in Chromium-based browsers, “Storage” in Firefox and Safari).
3. Look for the cookies section, locate the Ghost-powered site, and note the session cookies that Ghost set when you logged in.
4. Back in Miniflux, head to the Feeds page, press Add feed, and enter the site’s URL. Toggle the Advanced Options dropdown, look for the Set Cookies field, and add those cookies with the corresponding values that you see in the browser’s cookie jar. Press Find a feed.

At this point Miniflux should find the RSS feed automatically and configure it accordingly. If you’ve already added the feed before, you don’t need to remove and add the feed again. Instead, go to the feed settings page, add the cookies, and click Refresh to force Miniflux to re-fetch the feed.

Article in the RSS reader rendered fully

Cookie expiration: I didn’t read enough of Ghost’s code to verify whether they refresh the authentication cookies every once in a while. That said, the cookie’s expiration time is long enough that I’d be fine with having to replace them once every six months if necessary.

Anyone with access to these cookies can impersonate your account in the corresponding website. I self-host Miniflux so no one has access to the Miniflux database but me. If you pay for Miniflux then you’ll want to make sure you feel comfortable trusting them with your account cookies. I wouldn’t be too worried but you should be aware of this fact.  ↩︎

Gabriel Garrido 1 year ago

Adding Webfinger to this domain

Mastodon supports WebFinger to find a Mastodon account by searching for an email address. This is helpful if you’re looking for someone but you don’t know what server they’re on. I recently migrated my Mastodon account to a different server. Mastodon does as much as possible to make this a smooth process. It allows you to move over your followers, followings, and some other things. It also allows you to redirect from the old account to the new one so people can follow you around. Great! I figured this was a good time to add WebFinger to the mix. I can now be found by searching for [email protected], regardless of what server I’m on.

My Mastodon profile is found in Elk

This works because Mastodon will perform a lookup to https://garrido.io/.well-known/webfinger?resource=acct:[email protected], which contains the necessary information to resolve the account.

First, you must obtain the data contained by the WebFinger. In my case that meant visiting https://social.coop/.well-known/webfinger?resource=acct:[email protected]. You can replace the domain with your Mastodon server domain, and the account with your Mastodon account name. Next up, you must serve this data under /.well-known/webfinger on your own domain. If you need to resolve a different WebFinger per searched email, you can do that too; Mastodon will send the email as a query parameter 1. In my case, I’m the only Mastodon user for this domain so I will respond to all requests with the same response.

I’m using Hugo for this site, which means that I can add this as a static file. I added configuration to my Hugo file that instructs Hugo to include an additional directory as a source of static files. Then, I created the webfinger file inside that directory and pasted in the data that I received above. Because this site is completely static, all that is left for me is to build the site and upload it to my server. My web server will then return this file for any requests made to /.well-known/webfinger. A sketch of these steps follows at the end of this note.

Edit 15 November 2023: It was brought up to me that it wasn’t clear whether the searched email matters or not. I’ve updated the post to address this.

This is helpful in cases where there are multiple Mastodon users for a given domain. If so, your web server would need to look up the requested email and return the matching WebFinger or a 404.  ↩︎
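The fetch-and-serve steps boil down to a couple of commands. Here I’m assuming the directory configured as a static file source is called static (adjust the path to match your configuration), and the placeholders stand for your Mastodon server and account:

```bash
# Fetch the WebFinger document from your Mastodon server and place it
# where the static site generator will publish it as-is.
mkdir -p static/.well-known
curl -s 'https://YOUR_SERVER/.well-known/webfinger?resource=acct:YOUR_USER@YOUR_SERVER' \
  -o static/.well-known/webfinger

# After building and deploying the site, verify that your own domain
# now serves the document.
curl -s 'https://garrido.io/.well-known/webfinger' | head
```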
