Posts in Kubernetes (5 found)
alikhil 4 months ago

Why Graceful Shutdown Matters in Kubernetes

Have you ever deployed a new version of your app in Kubernetes and noticed errors briefly spiking during the rollout? Many teams do not even realize this is happening, especially if they are not closely monitoring their error rates during deployments. There is a common misconception in the Kubernetes world that bothers me. The official Kubernetes documentation and most guides claim that “if you want zero downtime upgrades, just use rolling update mode on deployments”. I have learned the hard way that this simply is not true - rolling updates alone are NOT enough for true zero-downtime deployments. And it is not just about deployments. Your pods can be terminated for many other reasons: scaling events, node maintenance, preemption, resource constraints, and more. Without proper graceful shutdown handling, any of these events can lead to dropped requests and frustrated users.

In this post, I will share what I have learned about implementing proper graceful shutdown in Kubernetes. I will show you exactly what happens behind the scenes, provide working code examples, and back everything with real test results that clearly demonstrate the difference.

If you are running services on Kubernetes, you have probably noticed that even with rolling updates (where Kubernetes gradually replaces pods), you might still see errors during deployment. This is especially annoying when you are trying to maintain “zero-downtime” systems. When Kubernetes needs to terminate a pod (for any reason), it follows this sequence:

1. Sends a SIGTERM signal to your container
2. Waits for a grace period (30 seconds by default)
3. If the container does not exit after the grace period, it gets brutal and sends a SIGKILL signal

The problem? Most applications do not properly handle that SIGTERM signal. They just die immediately, dropping any in-flight requests. In the real world, while most API requests complete in 100-300ms, there are often long-running operations that take 5-15 seconds or more. Think about processing uploads, generating reports, or running complex database queries. When these longer operations get cut off, that is when users really feel the pain.

Rolling updates are just one scenario where your pods might be terminated. Here are other common situations that can lead to pod terminations:

Horizontal Pod Autoscaler Events: When HPA scales down during low-traffic periods, some pods get terminated.
Resource Pressure: If your nodes are under resource pressure, the Kubernetes scheduler might decide to evict certain pods.
Node Maintenance: During cluster upgrades, node draining causes many pods to be evicted.
Spot/Preemptible Instances: If you are using cost-saving node types like spot instances, these can be reclaimed with minimal notice.

All these scenarios follow the same termination process, so implementing proper graceful shutdown handling protects you from errors in all of these cases - not just during upgrades.

Instead of just talking about theory, I built a small lab to demonstrate the difference between proper and improper shutdown handling. I created two nearly identical Go services:

Basic Service: A standard HTTP server with no special shutdown handling
Graceful Service: The same service but with proper SIGTERM handling

Both services:

Process requests that take about 4 seconds to complete (intentionally configured for easier demonstration)
Run in the same Kubernetes cluster with identical configurations
Serve the same endpoints

I specifically chose a 4-second processing time to make the problem obvious. While this might seem long compared to typical 100-300ms API calls, it perfectly simulates those problematic long-running operations that occur in real-world applications. The only difference between the services is how they respond to termination signals.

To test them, I wrote a simple k6 script that hammers both services with requests while triggering a rolling restart of each service's deployment.
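As a rough illustration of how such a test run can be driven - the script and deployment names below are placeholders, not the ones from the original lab:

```bash
# Start the k6 load test, then trigger rolling restarts of both deployments.
# "load-test.js", "basic-service", and "graceful-service" are hypothetical names.
k6 run load-test.js &
kubectl rollout restart deployment/basic-service deployment/graceful-service
wait  # let k6 finish and print its request/error summary
```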
Here is what happened - the results speak for themselves. The basic service dropped 14 requests during the update (that is 2% of all traffic), while the graceful service handled everything perfectly without a single error. You might think “2% is not that bad” - but if you are doing several deployments per day and have thousands of users, that adds up to a lot of errors. Plus, in my experience, these errors tend to happen at the worst possible times.

After digging into this problem and testing different solutions, I have put together a simple recipe for proper graceful shutdown. While my examples are in Go, the fundamental principles apply to any language or framework you are using. Here are the key ingredients.

First, your app needs to catch that SIGTERM signal instead of ignoring it. This part is easy - you are just telling your app to wake up when Kubernetes asks it to shut down.

Second, you need to know when it is safe to shut down, so keep track of ongoing requests. This counter lets you check if there are still requests being processed before shutting down. It is especially important for those long-running operations that users have already waited several seconds for - the last thing they want is to see an error right before completion!

Third, here is a commonly overlooked trick - you need different health check endpoints for liveness and readiness. This separation is crucial. The readiness probe tells Kubernetes to stop sending new traffic, while the liveness probe says “do not kill me yet, I’m still working!”

Now for the most important part: the shutdown sequence. I have found this order to be optimal. First, we mark ourselves as “not ready” but keep running. We pause to give Kubernetes time to notice and update its routing. Then we patiently wait until all in-flight requests finish before actually shutting down the server.

Finally, do not forget to adjust your Kubernetes configuration: the termination grace period tells Kubernetes to wait up to 30 seconds for your app to finish processing requests before forcefully terminating it.
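Since the original code snippets did not survive in this excerpt, here is a minimal, consolidated Go sketch of the ingredients above (signal handling, in-flight tracking, split health checks, and the shutdown sequence). It is illustrative rather than the exact lab code; the handler paths, port, and timings are assumptions.

```go
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"
)

var (
	inFlight int64     // number of requests currently being processed
	ready    int32 = 1 // flipped to 0 as soon as shutdown starts
)

// trackInFlight counts active requests so we know when it is safe to exit.
func trackInFlight(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		atomic.AddInt64(&inFlight, 1)
		defer atomic.AddInt64(&inFlight, -1)
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()

	// Simulated long-running work (the lab used roughly 4 seconds).
	mux.HandleFunc("/work", func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(4 * time.Second)
		w.Write([]byte("done"))
	})

	// Liveness: "am I running?" - keeps returning 200 even during shutdown.
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})

	// Readiness: "can I take traffic?" - fails once shutdown begins.
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
		if atomic.LoadInt32(&ready) == 1 {
			w.WriteHeader(http.StatusOK)
			return
		}
		w.WriteHeader(http.StatusServiceUnavailable)
	})

	srv := &http.Server{Addr: ":8080", Handler: trackInFlight(mux)}

	// Catch SIGTERM so Kubernetes can ask us to shut down gracefully.
	stop := make(chan os.Signal, 1)
	signal.Notify(stop, syscall.SIGTERM, syscall.SIGINT)

	go func() {
		if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			log.Fatalf("server error: %v", err)
		}
	}()

	<-stop
	log.Println("SIGTERM received, starting graceful shutdown")

	// 1. Fail readiness so Kubernetes stops routing new traffic to this pod.
	atomic.StoreInt32(&ready, 0)

	// 2. Give the endpoints controller and kube-proxy time to notice.
	time.Sleep(5 * time.Second)

	// 3. Wait for in-flight requests, then shut the server down cleanly.
	for atomic.LoadInt64(&inFlight) > 0 {
		time.Sleep(100 * time.Millisecond)
	}
	ctx, cancel := context.WithTimeout(context.Background(), 20*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		log.Printf("shutdown error: %v", err)
	}
	log.Println("shutdown complete")
}
```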
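And a sketch of the matching deployment settings mentioned above - the probe paths, timings, and grace period are illustrative values, not the exact ones from the lab:

```yaml
# Pod template portion of the Deployment (values are illustrative)
spec:
  terminationGracePeriodSeconds: 30      # give the app time to drain in-flight requests
  containers:
    - name: app
      image: example/graceful-service:latest   # placeholder image
      ports:
        - containerPort: 8080
      readinessProbe:                    # stops new traffic once we report "not ready"
        httpGet:
          path: /readyz
          port: 8080
        periodSeconds: 2
      livenessProbe:                     # only checks that the process is still alive
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 10
```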
If you are in a hurry, here are the key takeaways:

Catch SIGTERM Signals: Do not let your app be surprised when Kubernetes wants it to shut down.
Track In-Flight Requests: Know when it is safe to exit by counting active requests.
Split Your Health Checks: Use separate endpoints for liveness (am I running?) and readiness (can I take traffic?).
Fail Readiness First: As soon as shutdown begins, start returning “not ready” on your readiness endpoint.
Wait for Requests: Do not just shut down - wait for all active requests to complete first.
Use Built-In Shutdown: Most modern web frameworks have graceful shutdown options; use them!
Configure Termination Grace Period: Give your pods enough time to complete the shutdown sequence.
Test Under Load: You will not catch these issues in simple tests - you need realistic traffic patterns.

You might be wondering if adding all this extra code is really worth it. After all, we’re only talking about a 2% error rate during pod termination events. From my experience working with high-traffic services, I would say absolutely yes - for three reasons:

User Experience: Even small error rates look bad to users. Nobody wants to see “Something went wrong” messages, especially after waiting 10+ seconds for a long-running operation to complete.
Cascading Failures: Those errors can cascade through your system, especially if services depend on each other. Long-running requests often touch multiple critical systems.
Deployment Confidence: With proper graceful shutdown, you can deploy more frequently without worrying about causing problems.

The good news is that once you have implemented this pattern once, it is easy to reuse across your services. You can even create a small library or template for your organization. In production environments where I have implemented these patterns, we have gone from seeing a spike of errors with every deployment to deploying multiple times per day with zero impact on users. That is a win in my book!

If you want to dive deeper into this topic, I recommend checking out the article Graceful shutdown and zero downtime deployments in Kubernetes from learnk8s.io. It provides additional technical details about graceful shutdown in Kubernetes, though it does not emphasize the critical role of readiness probes in properly implementing the pattern as we have discussed here. For those interested in seeing the actual code I used in my testing lab, I’ve published it on GitHub with instructions for running the demo yourself.

Have you implemented graceful shutdown in your services? Did you encounter any other edge cases I didn’t cover? Let me know in the comments how this pattern has worked for you!

0 views
fasterthanli.me 6 months ago

More devops than I bargained for

I recently had a bit of impromptu disaster recovery, and it gave me a hunger for more! More downtime! More kubernetes manifests! More DNS! Ahhhh! The plan was really simple. I love dedicated Hetzner servers with all my heart, but they are not very fungible. You have to wait entire minutes for a new dedicated server to be provisioned. Sometimes you pay a setup fee, et cetera. And at some point, to serve static websites and serve as a K3S server, it’s simply too big, and approximately twice the price that I should pay.

0 views
//pauls dev blog 1 year ago

How To Deploy Portainer in Kubernetes With Traefik Ingress Controller

This tutorial will show how to deploy Portainer (Business Edition) with Traefik as an Ingress Controller in Kubernetes (or k8s) to manage installed services. To follow this tutorial you need the following:

A running Kubernetes cluster or a Managed Kubernetes running a Traefik Ingress Controller (see this tutorial)
A PRIMARY_DOMAIN

Helm is the primary package manager used for our Kubernetes cluster. We can use the official Helm installer script to automatically install the latest version by downloading the script and executing it locally (the commands for this and the following steps are sketched after this tutorial).

To access our Kubernetes cluster we have to use kubectl and supply a kubeconfig file which we can download from our provider. Then we can store the path to our kubeconfig in the KUBECONFIG environment variable to enable the configuration for all following commands. Alternatively, we can install "Lens - The Kubernetes IDE" from https://k8slens.dev/ . I would recommend working with Lens!

Running Portainer on Kubernetes needs data persistence to store user data and other important information. During installation using Helm, Portainer will automatically use the default storage class from our Kubernetes cluster. To list all storage classes in our Kubernetes cluster and identify the default, we run kubectl get storageclass. The default storage class is marked with (default) after its name. As I (or we) don't want to use that one, I switch the default storage class with a kubectl patch command. It is also possible to set the desired storage class via a parameter while installing with Helm.

We will deploy Portainer in our Kubernetes cluster with Helm. To install with Helm we first have to add the Portainer Helm repository and update it. After the update finishes we will install Portainer and expose it via NodePort, because we utilize Traefik to proxy requests to a URL and generate an SSL certificate. With this command, Portainer will be installed with default values, and after some seconds Helm should report the release as successfully deployed.

If you are using Lens you can now select the Pod and scroll down to the Ports section to forward the port to your local machine. Press Forward and enter a port to access the Portainer instance in your browser to test Portainer before creating the Deployment. Press Start and a new browser window will open showing the initial registration screen for Portainer, in which we can insert the first user. After the form is filled out and the button Create user is pressed, we have successfully created our Administrator user and Portainer is ready to use. If you have installed the Business Edition you should now insert your License Key, which we got from following the registration process on the Portainer website.

To make Portainer available under a URL with an SSL certificate on the web, we have to add an IngressRoute for Traefik. The IngressRoute will contain the service name, the port Portainer is using, and the URL on which Portainer can be accessed. We should save (and maybe adjust) this manifest to a file and apply it to our Kubernetes cluster with kubectl apply, as sketched below.
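Since the command snippets did not survive in this excerpt, this is roughly what the steps above can look like. The Helm installer URL and the Portainer chart repository are the officially documented ones, but the namespace, storage class, and file names are placeholders:

```bash
# Install Helm using the official installer script
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

# Use the kubeconfig downloaded from the provider for all following commands
export KUBECONFIG=$HOME/kubeconfig.yaml

# List storage classes; the default one is marked with "(default)"
kubectl get storageclass

# Optionally switch which storage class is the default
kubectl patch storageclass my-storage-class \
  -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'

# Add the Portainer Helm repository and install Portainer exposed via NodePort
helm repo add portainer https://portainer.github.io/k8s/
helm repo update
helm install portainer portainer/portainer \
  --create-namespace --namespace portainer \
  --set service.type=NodePort

# Apply the IngressRoute manifest sketched below
kubectl apply -f portainer-ingressroute.yaml
```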
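And a sketch of what the Traefik IngressRoute could look like - the host name, entry point, certificate resolver, and service port are assumptions that need to be adjusted to your setup:

```yaml
apiVersion: traefik.io/v1alpha1   # older Traefik versions use traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: portainer
  namespace: portainer
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`portainer.example.com`)   # replace with your PRIMARY_DOMAIN
      kind: Rule
      services:
        - name: portainer
          port: 9000                          # Portainer's HTTP port behind the Helm service
  tls:
    certResolver: letsencrypt                 # assumes a configured Traefik cert resolver
```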
Congratulations! We have reached the end of this short tutorial! I hope this article gave you a quick and neat overview of how to set up Portainer in your Kubernetes cluster using Traefik Proxy as an Ingress Controller. I would love to hear your feedback about this tutorial. Furthermore, if you already use Portainer in Kubernetes with Traefik and took a different approach, please comment here and explain what you have done differently. Also, if you have any questions, please ask them in the comments. I will try to answer them if possible. Feel free to connect with me on Medium, LinkedIn, Twitter, and GitHub. Thank you for reading, and happy deploying!

0 views
Karan Sharma 2 years ago

Nomad can do everything that K8s can

This blog post is ignited by the following Twitter exchange: I don’t take the accusation of making unsubstantiated arguments, especially on a technical topic, lightly. I firmly believe in substantiated arguments and hence, here I am, elaborating on my stance. If found mistaken, I am open to corrections and will revise my stance.

In my professional capacity, I have run and managed several K8s clusters (using AWS EKS) for our entire team of devs (been there, done that). The most complex piece of our otherwise simple and clean stack was K8s, and we had been longing to find a better replacement. None of us knew whether that would be Nomad or anything else. But we took the chance, and we have reached a stage where we can objectively argue that, for our specific workloads, Nomad has proven to be a superior tool compared to K8s.

Nomad presents a fundamental building block approach to designing your own services. It used to be true that Nomad was primarily a scheduler, and for serious production workloads, you had to rely on Consul for service discovery and Vault for secret management. However, this scenario has changed as Nomad now seamlessly integrates these features, making them first-class citizens in its environment. Our team replaced our HashiCorp stack with just Nomad, and we never felt constrained in terms of what we could accomplish with Consul/Vault. While these tools still hold relevance for larger clusters managed by numerous teams, they are not necessary for our use case.

Kubernetes employs a declarative state for every operation in the cluster, essentially operating as a reconciliation mechanism to keep everything in check. In contrast, Nomad requires dealing with fewer components, making it appear lacking compared to K8s’s concept of everything being a “resource.” However, that is far from the truth.

One of my primary critiques of K8s is its hidden complexities. While these abstractions might simplify things on the surface, debugging becomes a nightmare when issues arise. Even after three years of managing K8s clusters, I’ve never felt confident dealing with databases or handling complex networking problems involving dropped packets. You might argue that it’s about technical chops, which I won’t disagree with - but then do you want to add value to the business by getting shit done, or do you want to be the resident K8s whiz at your organization?

Consider this: How many people do you know who run their own K8s clusters? Even the K8s experts themselves preach about running prod clusters on EKS/GKE etc. How many fully leverage all that K8s has to offer? How many are even aware of all the network routing intricacies managed by kube-proxy? If these queries stir up clouds of uncertainty, it’s possible you’re sipping the Kubernetes Kool-Aid without truly comprehending the recipe, much like I found myself doing at one point.

Now, if you’re under the impression that I’m singing unabashed praises for Nomad, let me clarify - Nomad has its share of challenges. I’ve personally encountered and reported several. However, the crucial difference lies in Nomad’s lesser degree of abstraction, allowing for a comprehensive understanding of its internals. For instance, we encountered service reconciliation issues with a particular Nomad version. However, we could query the APIs, identify the problem, and write a bash script to resolve and reconcile it. That would not have been possible in a system with too many moving parts, where we would not even know where to begin debugging.
The YAML hell is all too well known to all of us. In K8s, writing job manifests required a lot of effort (from developers who don’t work with K8s all day) and the manifests were very complex to understand. It felt “too verbose” and involved copy-pasting large blocks from the docs and trying to make things work. Compare that to HCL: it feels much nicer to read and is shorter, and things are more straightforward to understand (a small illustrative job spec is sketched at the end of this post).

I’ve not even touched upon the niceties of Nomad yet. Like better, humanly understandable ACLs? A cleaner and simpler job spec, which defines the entire job in one file? A UI which actually shows everything about your cluster, nodes, and jobs? Not restricting your workloads to be run as Docker containers? A single binary which powers all of this?

The central question this post aims to raise is: What can K8s do that Nomad can’t, especially considering the features people truly need? My perspectives are informed not only by my organization but also through interactions with several other organizations at various meetups and conferences. Yet, I have rarely encountered a use case that could only be managed by K8s. While Nomad isn’t a panacea for all issues, it’s certainly worth a try. Reducing the complexity of your tech stack can prove beneficial for your applications and, most importantly, your developers.

At this point, K8s enjoys immense industry-wide support, while Nomad remains the unassuming newcomer. This contrast is not a negative aspect, per se. Large organizations often gravitate towards complexity and the opportunity to engage more engineers. However, if simplicity were the primary goal, the prevailing sense of overwhelming complexity in the infrastructure and operations domain wouldn’t be as pervasive. I hope my arguments provide a more comprehensive perspective and address the earlier critique of being unsubstantiated. Darren has responded to this blog post. You can read the response on Twitter.

For completeness, here is how we handle the usual K8s talking points in our Nomad setup:

Ingress: We run a set of HAProxy instances on a few nodes which act as “L7 LBs”. Configured with Nomad services, they can do the routing based on Host headers.
DNS: To provide external access to a service without using a proxy, we developed a tool that scans all services registered in the cluster and creates a corresponding DNS record on AWS Route53.
Monitoring: Ah, my fav. You wanna monitor your K8s cluster? Sure, here’s kube-prometheus, prometheus-operator, kube-state-metrics. Choices, choices. Enough to confuse you for days. Anyone who’s ever deployed any of these, tell me why this thing needs such a monstrous setup of CRDs and operators. Monitoring Nomad is such a breeze: 3 lines of HCL config and done.
Statefulsets: It’s 2023 and the irony is rich - the recommended way to run a database inside K8s is… not to run it inside K8s at all. In Nomad, we run a bunch of EC2 instances and tag them as nodes. The DBs don’t float around as containers to random nodes. And there’s no CSI plugin reaching for a storage disk in AZ-1 when the node is basking in AZ-2. Running a DB on Nomad feels refreshingly like running it on an unadorned EC2 instance.
Autoscale: All our client nodes (except for the nodes) are ephemeral and part of AWS’s Auto Scaling Groups (ASGs). We use ASG rules for the horizontal scaling of the cluster. While Nomad does have its own autoscaler, our preference is to run large instances dedicated to specific workloads, avoiding a mix of different workloads on the same machine.
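To make the HCL comparison concrete, here is a small illustrative Nomad job spec - not one of the author's production jobs; the job name, image, and resource values are placeholders:

```hcl
# A minimal Nomad job spec - purely illustrative.
job "web" {
  datacenters = ["dc1"]
  type        = "service"

  group "app" {
    count = 2

    network {
      port "http" { to = 8080 }
    }

    # Registers the service in Nomad's built-in service catalog.
    service {
      name     = "web"
      port     = "http"
      provider = "nomad"
    }

    task "server" {
      driver = "docker"

      config {
        image = "nginx:1.25"
        ports = ["http"]
      }

      resources {
        cpu    = 200 # MHz
        memory = 128 # MB
      }
    }
  }
}
```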

0 views
Humble Thoughts 2 years ago

Self-hosted Blog Guide for Engineers: (the Sweet) Part III — Further maintenance

This article is part of a guide for engineers on setting up a self-hosted blog (or any other service) on Ghost using Terraform, Ansible, and Kubernetes (part I, part II). Congratulations on successfully setting up your Kubernetes cluster with a Ghost blog! As you embark on this new journey of managing and maintaining your dynamic platform, it is essential to stay proactive in ensuring the smooth operation and optimal performance of your environment. This article will explore three crucial aspects of maintaining your newly created Kubernetes cluster with a Ghost blog: updating the Ghost configuration file, upgrading Ghost and MySQL versions, and expanding the storage volume.

The Ghost configuration file serves as the backbone of your blog's settings, defining everything from database connectivity to theme customization. Regularly updating this file ensures that your Ghost installation remains up to date with any changes or improvements introduced by the Ghost team. Considering you've followed the instructions from Part II of this manual, you should now have the configuration file set up as a Kubernetes secret. In that case, all you'd need to do to update the configuration is change that secret and roll the change out to the Ghost deployment.

Ghost, being an open-source platform, constantly receives updates and new features that improve its security, performance, and overall user experience. Similarly, MySQL, the database management system that powers your Ghost blog, often releases new versions with bug fixes and optimizations. Fortunately for us, the setup we've configured allows us to update both technologies with little effort. All you need to do is update the image version directive in the Ghost deployment manifest and apply the change to the Kubernetes cluster in the blog's namespace. It is done similarly for MySQL: update the image version directive in the MySQL deployment file and apply the change.

As your Ghost blog grows in content and attracts more visitors, you may find that the initial storage volume allocated to your Kubernetes cluster becomes insufficient. In this section, we will look into expanding your storage volume to accommodate the increasing demands of your blog. One of the critical aspects of our setup was the CSI driver installation for Hetzner, which lets us modify a separate volume via Kubernetes cluster configuration changes. To expand the volume, change the storage size parameter to a new value in the volume configuration and apply it (see the sketch below). This will result in smooth Hetzner volume resizing without re-creating it. Note that this value can only be increased and can't be shrunk back - don't go too far, as it will affect the cost.

As you can see, maintaining such a blog is simple and hassle-free. In future articles, I'll aim to cover details on further expansion - moving from a single-node configuration to a truly clustered, highly available solution. Subscribe to not miss out on the updates!
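As a rough sketch of the volume-expansion step - the claim name, namespace, and sizes are placeholders, and the storage class assumes the Hetzner CSI driver set up in the earlier parts of this guide:

```yaml
# PersistentVolumeClaim with an increased size request.
# Applying this lets the Hetzner CSI driver resize the underlying volume in place.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ghost-content          # placeholder claim name
  namespace: blog              # placeholder namespace
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: hcloud-volumes
  resources:
    requests:
      storage: 20Gi            # increased value - volumes can grow but never shrink
```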

0 views