Posts in Kubernetes (9 found)
alikhil 2 weeks ago

Kubernetes In-Place Pod Resize

About six years ago, while operating a large Java-based platform in Kubernetes, I noticed a recurring problem: our services required significantly higher CPU and memory during application startup. Heavy use of Spring Beans and AutoConfiguration forced us to set inflated resource requests and limits just to survive bootstrap, even though those resources were mostly unused afterwards. This workaround never felt right. As an engineer, I wanted a solution that reflected the actual lifecycle of an application rather than its worst moment.

I opened an issue in the Kubernetes repository describing the problem and proposing an approach to adjust pod resources dynamically without restarts. The issue received little discussion but quietly accumulated interest over time (13 👍 emoji reactions). Every few months, an automation bot attempted to mark it as stale, and every time, I removed the label. This went on for nearly six years… until the release of Kubernetes 1.35, where the In-Place Pod Resize feature was marked as stable.

In-Place Pod Resize allows Kubernetes to update CPU and memory requests and limits without restarting pods, whenever it is safe to do so. This significantly reduces unnecessary restarts caused by resource changes, leading to fewer disruptions and more reliable workloads. For applications whose resource needs evolve over time, especially after startup, this feature provides a long-missing building block.

The new field is configured at the pod spec level. While it is technically possible to change pod resources manually, doing so does not scale. In practice, this feature should be driven by a workload controller. At the moment, the only controller that supports in-place pod resize is the Vertical Pod Autoscaler (VPA). There are two enhancement proposals that enable this behavior:

AEP-4016: Support for in place updates in VPA, which introduces a new in-place update mode.
AEP-7862: CPU Startup Boost, which temporarily boosts a pod by giving it more CPU during startup. This is conceptually similar to the approach proposed in my original issue.

Here is an example of a Deployment and VPA using both AEP features:
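As a sketch of what such a configuration could look like (not the post's original manifest): the container-level resizePolicy field and the VerticalPodAutoscaler kind are standard, while the updateMode value and the startup-boost stanza follow the naming proposed in AEP-4016 and AEP-7862, so the exact field names may differ in the released VPA.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app                  # hypothetical workload name
spec:
  replicas: 2
  selector:
    matchLabels: { app: demo-app }
  template:
    metadata:
      labels: { app: demo-app }
    spec:
      containers:
        - name: app
          image: example.org/demo-app:latest   # placeholder image
          resources:
            requests: { cpu: 500m, memory: 1Gi }
            limits:   { cpu: "1",  memory: 1Gi }
          # Allow CPU to be resized in place; memory changes still restart the container.
          resizePolicy:
            - resourceName: cpu
              restartPolicy: NotRequired
            - resourceName: memory
              restartPolicy: RestartContainer
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: demo-app
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-app
  updatePolicy:
    updateMode: InPlaceOrRecreate   # AEP-4016: apply updates in place when possible
  resourcePolicy:
    containerPolicies:
      - containerName: app
        # AEP-7862 (CPU Startup Boost): double CPU while the pod starts up.
        # Field names here follow the proposal and may change before release.
        startupBoost:
          cpu:
            factor: 2
```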
With such a configuration, the pod will have doubled CPU requests and limits during startup. During the boost period, no resizing will happen. Once the pod reaches the Ready state, the VPA controller scales CPU down to the currently recommended value. After that, VPA continues operating normally, with the key difference that resource updates are applied in place whenever possible.

Does this feature fully solve the problem described above? Only partially. First, most application runtimes still impose fundamental constraints. Java and Python runtimes do not currently support resizing memory limits without a restart. This limitation exists outside of Kubernetes itself and is tracked in the OpenJDK project via an open ticket. Second, Kubernetes does not yet support decreasing memory limits, even with In-Place Pod Resize enabled. This is a known limitation documented in the enhancement proposal for memory limit decreases. As a result, while In-Place Pod Resize effectively addresses CPU-related startup spikes, memory resizing remains an open problem.

In-Place Pod Resize gives a foundation for new features like CPU Startup Boost and makes the use of VPA more reliable. While important gaps remain, such as memory decrease support and a scheduling race condition, this change represents a meaningful step forward. For workloads with distinct startup and steady-state phases, Kubernetes is finally beginning to model reality more closely.

0 views
alikhil 2 months ago

kubectl-find - UNIX-find-like plugin to find resources and perform action on them

Recently, I developed a kubectl plugin inspired by the UNIX find utility, for finding resources and performing actions on them. A few days ago the number of stars in the repo reached 50! I think it's a good moment to tell more about the project.

As an engineer who works with Kubernetes every day, I use kubectl a lot. Actually, more than 50% of the commands in my terminal history are related to Kubernetes; my top 10 commands make that obvious. Run the snippet at the end of this post if you are curious what your own most popular commands are.

I use kubectl to check the status of pods, delete orphaned resources, trigger syncs, and much more. When I realized half my terminal history was just kubectl commands, I thought — there must be a better way to find things in Kubernetes without chaining pipes through other command-line tools. And I imagined how nice it would be to have a UNIX-find-like tool — something that lets you search for exactly what you need in the cluster and then perform actions directly on the matching resources. I searched for a krew plugin like this, but there was not any. For that reason, I decided to develop one!

I used sample-cli-plugin as a starting point. Its clean repository structure and straightforward design make it a great reference for working with the Kubernetes API. Additionally, it allows easy reuse of the extensive Kubernetes client libraries. Almost everything in the Kubernetes ecosystem is written in Go, and this plugin is no exception — which is great, as it allows building binaries for a wide range of CPU architectures and operating systems.

Use a filter to find any resource by any custom condition; filtering uses the gojq implementation of jq. By default, the plugin will print found resources to stdout. However, there are flags that you can provide to perform an action on the found resources:

- to delete them
- to patch them with provided JSON
- to run a command on pods

Use krew to install the plugin. I'm currently working on adding:

JSON/YAML output format
More filters
Saved queries

If you're tired of writing long command chains, give kubectl-find a try — it's already saved me countless keystrokes. Check out the repo ⭐ github.com/alikhil/kubectl-find and share your ideas or issues — I'd love to hear how you use it!
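The history-inspection snippet referenced above: a generic one-liner along these lines does the job (assumes bash-style history output; adjust slightly for zsh):

```bash
# Count the most frequently used commands in your shell history (top 10).
history | awk '{print $2}' | sort | uniq -c | sort -rn | head -n 10
```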

0 views
matduggan.com 6 months ago

What Would a Kubernetes 2.0 Look Like

Around 2012-2013 I started to hear a lot in the sysadmin community about a technology called "Borg". It was (apparently) some sort of Linux container system inside of Google that ran all of their stuff. The terminology was a bit baffling, with something called a "Borglet" inside of clusters with "cells", but the basics started to leak. There was a concept of "services" and a concept of "jobs", where applications could use services to respond to user requests and then jobs to complete batch work that ran for much longer periods of time.

Then on June 7th, 2014, we got our first commit of Kubernetes. The Greek word for 'helmsman' that absolutely no one could pronounce correctly for the first three years. (Is it koo-ber-NET-ees? koo-ber-NEET-ees? Just give up and call it k8s like the rest of us.) Microsoft, RedHat, IBM, and Docker joined the Kubernetes community pretty quickly after this, which raised Kubernetes from an interesting Google thing to "maybe this is a real product?" On July 21st 2015 we got the v1.0 release as well as the creation of the CNCF.

In the ten years since that initial commit, Kubernetes has become a large part of my professional life. I use it at home, at work, on side projects—anywhere it makes sense. It's a tool with a steep learning curve, but it's also a massive force multiplier. We no longer "manage infrastructure" at the server level; everything is declarative, scalable, recoverable and (if you're lucky) self-healing. But the journey hasn't been without problems. Some common trends have emerged, where mistakes or misconfiguration arise where Kubernetes isn't opinionated enough. Even ten years on, we're still seeing a lot of churn inside of the ecosystem and people stepping on well-documented landmines. So, knowing what we know now, what could we do differently to make this great tool even more applicable to more people and problems?

Let's start with the positive stuff. Why are we still talking about this platform now?

Containers at scale

Containers as a tool for software development make perfect sense. Ditch the confusion of individual laptop configuration and have one standard, disposable concept that works across the entire stack. While tools like Docker Compose allowed for some deployments of containers, they were clunky and still required you as the admin to manage a lot of the steps. I set up a Compose stack with a deployment script that would remove the instance from the load balancer, pull the new containers, make sure they started and then re-add it to the LB, as did lots of folks. K8s allowed this concept to scale out, meaning it was possible to take a container from your laptop and deploy an identical container across thousands of servers. This flexibility allowed organizations to revisit their entire design strategy, dropping monoliths and adopting more flexible (and often more complicated) micro-service designs.

Low-Maintenance

If you think of the history of Operations as a sort of "naming timeline from pets to cattle", we started with what I affectionately call the "Simpsons" era. Servers were bare metal boxes set up by teams; they often had one-off names that became slang inside of teams, and everything was a snowflake. The longer a server ran, the more cruft it picked up, until it became a scary operation to even reboot them, much less attempt to rebuild them. I call it the "Simpsons" era because among the jobs I was working at the time, naming servers after Simpsons characters was surprisingly common.
Nothing fixed itself; everything was a manual operation. Then we transitioned into the "01 Era". Tools like Puppet and Ansible became commonplace, servers were more disposable, and you started to see things like bastion hosts and other access control systems become the norm. Servers weren't all facing the internet; they were behind a load balancer, and we'd dropped the cute names for stuff like "app01" or "vpn02". Organizations designed it so they could lose some of their servers some of the time. However, failures still weren't self-healing: someone still had to SSH in to see what broke, write up a fix in the tooling and then deploy it across the entire fleet. OS upgrades were still complicated affairs.

We're now in the "UUID Era". Servers exist to run containers; they are entirely disposable concepts. Nobody cares about how long a particular version of the OS is supported for, you just bake a new AMI and replace the entire machine. K8s wasn't the only technology enabling this, but it was the one that accelerated it. Now the idea of a bastion server with SSH keys, where I go to the underlying server to fix problems, is seen as more of a "break-glass" solution. Almost all solutions are "destroy that Node, let k8s reorganize things as needed, make a new Node". A lot of the Linux skills that were critical to my career are largely nice to have now, not need to have. You can be happy or sad about that, I certainly switch between the two emotions on a regular basis, but it's just the truth.

Running Jobs

The k8s jobs system isn't perfect, but it's so much better than the "snowflake cron01 box" that was an extremely common sight at jobs for years. Running on a cron schedule or running from a message queue, it was now possible to reliably put jobs into a queue, have them get run, have them restart if they didn't work and then move on with your life. Not only does this free up humans from a time-consuming and boring task, but it's also simply a more efficient use of resources. You are still spinning up a pod for every item in the queue, but your teams have a lot of flexibility inside of the "pod" concept for what they need to run and how they want to run it. This has really been a quality-of-life improvement for a lot of people, myself included, who just need to be able to easily background tasks and not think about them again.

Service Discoverability and Load Balancing

Hard-coded IP addresses that lived inside of applications as the template for where requests should be routed have been a curse following me around for years. If you were lucky, these dependencies weren't based on IP address but were actually DNS entries, and you could change the thing behind the DNS entry without coordinating a deployment of a million applications. K8s allowed for simple DNS names to call other services. It removed an entire category of errors and hassle and simplified the entire thing down. With the Service API you had a stable, long-lived IP and hostname that you could just point things towards and not think about any of the underlying concepts. You even have concepts like ExternalName that allow you to treat external services like they're in the cluster.

YAML was appealing because it wasn't JSON or XML, which is like saying your new car is great because it's neither a horse nor a unicycle. It demos nicer for k8s, looks nicer sitting in a repo and has the illusion of being a simple file format. In reality, YAML is just too much for what we're trying to do with k8s and it's not a safe enough format.
Indentation is error-prone, the files don't scale well (you really don't want a super long YAML file), and debugging can be annoying. YAML has so many subtle behaviors outlined in its spec. I still remember not believing what I was seeing the first time I saw the Norway Problem. For those lucky enough to not deal with it, the Norway Problem in YAML is when 'NO' gets interpreted as false. Imagine explaining to your Norwegian colleagues that their entire country evaluates to false in your configuration files. Add in accidental numbers from lack of quotes, and the list goes on and on. There are much better posts on why YAML is crazy than I'm capable of writing: https://ruudvanasseldonk.com/2023/01/11/the-yaml-document-from-hell

HCL is already the format for Terraform, so at least we'd only have to hate one configuration language instead of two. It's strongly typed with explicit types. There are already good validation mechanisms. It is specifically designed to do the job that we are asking YAML to do, and it's not much harder to read. It has built-in functions people are already using that would allow us to remove some of the third-party tooling from the YAML workflow. I would wager 30% of Kubernetes clusters today are already being managed with HCL via Terraform. We don't need the Terraform part to get a lot of the benefits of a superior configuration language. The only downsides are that HCL is slightly more verbose than YAML, and its Mozilla Public License 2.0 (MPL-2.0) would require careful legal review for integration into an Apache 2.0 project like Kubernetes. However, for the quality-of-life improvements it offers, these are hurdles worth clearing.

Why HCL is better

Let's take a simple YAML file. Even in the most basic example, there are footguns everywhere, and HCL and its type system would catch all of these problems. Take a YAML file like the one below, of which you probably have 6,000 in your k8s repo, then look at the HCL equivalent, which needs no external tooling.
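As an illustrative sketch (not the post's original example), here is a fragment of ordinary Kubernetes YAML where unquoted scalars silently change type; ConfigMap data is supposed to be strings, so each of these either fails validation or gets coerced somewhere along the way:

```yaml
# Plain YAML: every unquoted value below is a footgun.
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config          # illustrative object, not from the original post
data:
  country: NO               # YAML reads this as boolean false, not the string "NO"
  version: 1.20             # becomes the float 1.2, silently dropping the trailing zero
  debug: off                # becomes boolean false, not the string "off"
```

And a rough HCL equivalent of the kind being argued for; the resource shape here is hypothetical, not an existing Kubernetes schema:

```hcl
# Hypothetical HCL rendering of the same object: types are explicit, so these
# mistakes are caught (or simply impossible) before anything reaches the cluster.
resource "k8s_config_map" "app_config" {
  metadata {
    name = "app-config"
  }
  data = {
    country = "NO"          # unambiguously a string: no Norway Problem
    version = "1.20"        # stays "1.20", never the number 1.2
    debug   = "off"         # still a string, because the schema says so
  }
}
```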
Here are all the pros you get with this move:

Type Safety: Preventing type-related errors before deployment
Variables and References: Reducing duplication and improving maintainability
Functions and Expressions: Enabling dynamic configuration generation
Conditional Logic: Supporting environment-specific configurations
Loops and Iteration: Simplifying repetitive configurations
Better Comments: Improving documentation and readability
Error Handling: Making errors easier to identify and fix
Modularity: Enabling reuse of configuration components
Validation: Preventing invalid configurations
Data Transformations: Supporting complex data manipulations

I know, I'm the 10,000th person to write this. Etcd has done a fine job, but it's a little crazy that it is the only tool for the job. For smaller clusters or smaller hardware configurations, it's a large use of resources in a cluster type where you will never hit the node count where it pays off. It's also a strange relationship between k8s and etcd now, where k8s is basically the only etcd customer left. What I'm suggesting is taking the work of kine and making it official. It makes sense for the long-term health of the project to have the ability to plug in more backends; adding this abstraction means it (should) be easier to swap in new/different backends in the future, and it also allows for more specific tuning depending on the hardware I'm putting out there. What I suspect this would end up looking like is much like this: https://github.com/canonical/k8s-dqlite . Distributed SQLite in-memory with Raft consensus and almost zero upgrade work required would allow cluster operators to have more flexibility with the persistence layer of their k8s installations. If you have a conventional server setup in a datacenter and etcd resource usage is not a problem, great! But this allows for lower-end k8s to be a nicer experience and (hopefully) reduces dependence on the etcd project.

Helm is a perfect example of a temporary hack that has grown to be a permanent dependency. I'm grateful to the maintainers of Helm for all of their hard work, growing what was originally a hackathon project into the de-facto way to install software into k8s clusters. It has done as good a job as something could in fulfilling that role without having a deeper integration into k8s. All that said, Helm is a nightmare to use. The Go templates are tricky to debug, often containing complex logic that results in really confusing error scenarios, and the error messages you get from those scenarios are often gibberish. Helm isn't a very good package system because it fails at some of the basic tasks you need a package system to do, which are transitive dependencies and resolving conflicts between dependencies. What do I mean? Tell me what the conditional logic buried in a typical chart template is trying to do, or, if I provide multiple values files to my chart, which one wins. Ok, what if I want to manage my application and all the application dependencies with a Helm chart? This makes sense: I have an application that itself has dependencies on other stuff, so I want to put them all together. So I define my sub-charts or umbrella charts inside of my Chart.yaml. But assuming I have multiple applications, it's entirely possible that I have 2 services, both with a dependency on nginx or whatever, like this:
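A minimal sketch of the situation being described, with two hypothetical umbrella charts that each pull in the same nginx dependency (chart names and versions are illustrative):

```yaml
# service-a/Chart.yaml (hypothetical umbrella chart)
apiVersion: v2
name: service-a
version: 0.1.0
dependencies:
  - name: nginx
    version: 15.1.0
    repository: https://charts.bitnami.com/bitnami

# service-b/Chart.yaml (hypothetical umbrella chart)
apiVersion: v2
name: service-b
version: 0.1.0
dependencies:
  - name: nginx
    version: 16.0.2           # a different version of the same chart
    repository: https://charts.bitnami.com/bitnami
```

Install both under a single parent or release, and the globally named nginx templates collide, which is the failure mode described next.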
Helm doesn't handle this situation gracefully, because template names are global and templates are loaded alphabetically. Basically you need to:

Don't declare a dependency on the same chart more than once (hard to do for a lot of microservices)
If you do have the same chart declared multiple times, they have to use the exact same version

The list of issues goes on and on:

Cross-namespace installation stinks
The chart verification process is a pain and nobody uses it
No metadata in chart searching. You can only search by name and description, not by features, capabilities, or other metadata.
Helm doesn't strictly enforce semantic versioning
If you uninstall and reinstall a chart with CRDs, it might delete resources created by those CRDs. This one has screwed me multiple times and is crazy unsafe.

Let's just go to the front page of artifacthub: I'll grab elasticsearch cause that seems important. Seems pretty bad for the Official Elastic helm chart. Certainly it will be right, it's an absolutely critical dependency for the entire industry. Nope. Also, how is the maintainer of the chart "Kubernetes" and it's still not marked as a verified publisher? Like Christ, how much more verified does it get. I could keep writing for another 5000 words and still wouldn't have outlined all the problems. There isn't a way to make Helm good enough for the task of "package manager for all the critical infrastructure on the planet".

Let's call our hypothetical package system KubePkg, because if there's one thing the Kubernetes ecosystem needs, it's another abbreviated name with a 'K' in it. We would try to copy as much of the existing work inside the Linux ecosystem as possible while taking advantage of the CRD power of k8s. My idea looks something like this: the packages are bundles like a Linux package; there's a definition file that accounts for as many of the real scenarios as you actually encounter when installing a thing; and there's a real signing process that would be required and would allow you more control over the process. Like how great would it be to have something where I could automatically update packages without needing to do anything on my side. What k8s needs is a system that meets the following requirements:

True Kubernetes Native: Everything is a Kubernetes resource with proper status and events
First-Class State Management: Built-in support for stateful applications
Enhanced Security: Robust signing, verification, and security scanning
Declarative Configuration: No templates, just structured configuration with schemas
Lifecycle Management: Comprehensive lifecycle hooks and upgrade strategies
Dependency Resolution: Linux-like dependency management with semantic versioning
Audit Trail: Complete history of changes with who, what, and when, not what Helm currently provides
Policy Enforcement: Support for organizational policies and compliance
Simplified User Experience: Familiar Linux-like package management commands

It seems wild that we're trying to go a different direction from the package systems that have worked for decades.

Try to imagine, across the entire globe, how much time and energy has been invested in trying to solve any one of the following three problems:

I need this pod in this cluster to talk to that pod in that cluster.
There is a problem happening somewhere in the NAT traversal process and I need to solve it.
I have run out of IP addresses in my cluster because I didn't account for how many you use. Remember: a company starting with a /20 subnet (4,096 addresses) deploys 40 nodes with 30 pods each, and suddenly realizes they're approaching their IP limit. Not that many nodes!

I am not suggesting the entire internet switches over to IPv6, and right now k8s happily supports IPv6-only if you want, as well as a dual-stack approach. But I'm saying now is the time to flip the default and just go IPv6. You eliminate a huge collection of problems all at once:

Flatter, less complicated network topology inside of the cluster
The distinction between multiple clusters becomes a thing organizations can choose to ignore if they want to get public IPs
Easier to understand exactly the flow of traffic inside of your stack
Built-in IPSec

It has nothing to do with driving IPv6 adoption across the entire globe; it's just an acknowledgement that we no longer live in a world where you have to accept the weird limitations of IPv4 in a universe where you may need 10,000 IPs suddenly with very little warning. The benefits for organizations with public IPv6 addresses are pretty obvious, but there's enough value there for cloud providers and users that even the corporate overlords might get behind it. AWS never needs to try and scrounge up more private IPv4 space inside of a VPC. That's gotta be worth something.

The common rebuttal to these ideas is, "Kubernetes is an open platform, so the community can build these solutions." While true, this argument misses a crucial point: defaults are the most powerful force in technology. The "happy path" defined by the core project dictates how 90% of users will interact with it. If the system defaults to expecting signed packages and provides a robust, native way to manage them, that is what the ecosystem will adopt.

This is an ambitious list, I know. But if we're going to dream, let's dream big. After all, we're the industry that thought naming a technology 'Kubernetes' would catch on, and somehow it did! We see this all the time in other areas like mobile development and web development, where platforms assess their situation and make radical jumps forward. Not all of these are necessarily projects that the maintainers or companies would take on, but I think they're all ideas that someone should at least revisit and think "is it worth doing now that we're this nontrivial percentage of all datacenter operations on the planet?"

Questions/feedback/got something wrong? Find me here: https://c.im/@matdevdug

0 views
alikhil 7 months ago

Why Graceful Shutdown Matters in Kubernetes

Have you ever deployed a new version of your app in Kubernetes and noticed errors briefly spiking during rollout? Many teams do not even realize this is happening, especially if they are not closely monitoring their error rates during deployments.

There is a common misconception in the Kubernetes world that bothers me. The official Kubernetes documentation and most guides claim that “if you want zero downtime upgrades, just use rolling update mode on deployments”. I have learned the hard way that this simply is not true - rolling updates alone are NOT enough for true zero-downtime deployments. And it is not just about deployments. Your pods can be terminated for many other reasons: scaling events, node maintenance, preemption, resource constraints, and more. Without proper graceful shutdown handling, any of these events can lead to dropped requests and frustrated users.

In this post, I will share what I have learned about implementing proper graceful shutdown in Kubernetes. I will show you exactly what happens behind the scenes, provide working code examples, and back everything with real test results that clearly demonstrate the difference.

If you are running services on Kubernetes, you have probably noticed that even with rolling updates (where Kubernetes gradually replaces pods), you might still see errors during deployment. This is especially annoying when you are trying to maintain “zero-downtime” systems. When Kubernetes needs to terminate a pod (for any reason), it follows this sequence:

Sends a SIGTERM signal to your container
Waits for a grace period (30 seconds by default)
If the container does not exit after the grace period, it gets brutal and sends a SIGKILL signal

The problem? Most applications do not properly handle that SIGTERM signal. They just die immediately, dropping any in-flight requests. In the real world, while most API requests complete in 100-300ms, there are often those long-running operations that take 5-15 seconds or more. Think about processing uploads, generating reports, or running complex database queries. When these longer operations get cut off, that’s when users really feel the pain.

Rolling updates are just one scenario where your pods might be terminated. Here are other common situations that can lead to pod terminations:

Horizontal Pod Autoscaler Events: When HPA scales down during low-traffic periods, some pods get terminated.
Resource Pressure: If your nodes are under resource pressure, the Kubernetes scheduler might decide to evict certain pods.
Node Maintenance: During cluster upgrades, node draining causes many pods to be evicted.
Spot/Preemptible Instances: If you are using cost-saving node types like spot instances, these can be reclaimed with minimal notice.

All these scenarios follow the same termination process, so implementing proper graceful shutdown handling protects you from errors in all of these cases - not just during upgrades.

Instead of just talking about theory, I built a small lab to demonstrate the difference between proper and improper shutdown handling. I created two nearly identical Go services:

Basic Service: A standard HTTP server with no special shutdown handling
Graceful Service: The same service but with proper SIGTERM handling

Both services:

Process requests that take about 4 seconds to complete (intentionally configured for easier demonstration)
Run in the same Kubernetes cluster with identical configurations
Serve the same endpoints

I specifically chose a 4-second processing time to make the problem obvious. While this might seem long compared to typical 100-300ms API calls, it perfectly simulates those problematic long-running operations that occur in real-world applications. The only difference between the services is how they respond to termination signals. To test them, I wrote a simple k6 script that hammers both services with requests while triggering a rolling restart of each service's deployment. Here is what happened: the results speak for themselves.
The basic service dropped 14 requests during the update (that is 2% of all traffic), while the graceful service handled everything perfectly, without a single error. You might think “2% is not that bad” - but if you are doing several deployments per day and have thousands of users, that adds up to a lot of errors. Plus, in my experience, these errors tend to happen at the worst possible times.

After digging into this problem and testing different solutions, I have put together a simple recipe for proper graceful shutdown. While my examples are in Go, the fundamental principles apply to any language or framework you are using. Here are the key ingredients (a consolidated sketch that pulls them all together appears right after the takeaways below).

First, your app needs to catch that SIGTERM signal instead of ignoring it. This part is easy - you are just telling your app to wake up when Kubernetes asks it to shut down.

Second, you need to know when it is safe to shut down, so keep track of ongoing requests. This counter lets you check whether there are still requests being processed before shutting down. It is especially important for those long-running operations that users have already waited several seconds for - the last thing they want is to see an error right before completion!

Here is a commonly overlooked trick - you need different health check endpoints for liveness and readiness. This separation is crucial: the readiness probe tells Kubernetes to stop sending new traffic, while the liveness probe says “do not kill me yet, I’m still working!”

Now for the most important part - the shutdown sequence. I have found this sequence to be optimal: first, we mark ourselves as “not ready” but keep running. We pause to give Kubernetes time to notice and update its routing. Then we patiently wait until all in-flight requests finish before actually shutting down the server.

Do not forget to adjust your Kubernetes configuration: it should tell Kubernetes to wait up to 30 seconds for your app to finish processing requests before forcefully terminating it (a sample manifest appears at the end of this post).

If you are in a hurry, here are the key takeaways:

Catch SIGTERM Signals: Do not let your app be surprised when Kubernetes wants it to shut down.
Track In-Flight Requests: Know when it is safe to exit by counting active requests.
Split Your Health Checks: Use separate endpoints for liveness (am I running?) and readiness (can I take traffic?).
Fail Readiness First: As soon as shutdown begins, start returning “not ready” on your readiness endpoint.
Wait for Requests: Do not just shut down - wait for all active requests to complete first.
Use Built-In Shutdown: Most modern web frameworks have graceful shutdown options; use them!
Configure Termination Grace Period: Give your pods enough time to complete the shutdown sequence.
Test Under Load: You will not catch these issues in simple tests - you need realistic traffic patterns.
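A minimal, self-contained sketch of the recipe above (not the author's exact lab code): it catches SIGTERM, tracks in-flight requests, fails the readiness endpoint, waits for Kubernetes to stop routing traffic, and then drains before exiting.

```go
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"sync"
	"sync/atomic"
	"syscall"
	"time"
)

func main() {
	var (
		shuttingDown atomic.Bool    // readiness flag
		inFlight     sync.WaitGroup // counter of requests still being processed
	)

	mux := http.NewServeMux()

	// Liveness: "am I running?" - keeps returning 200 during shutdown.
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})

	// Readiness: "can I take traffic?" - fails as soon as shutdown begins.
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
		if shuttingDown.Load() {
			http.Error(w, "shutting down", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})

	// A slow endpoint standing in for the 4-second workload from the lab.
	mux.HandleFunc("/work", func(w http.ResponseWriter, r *http.Request) {
		inFlight.Add(1)
		defer inFlight.Done()
		time.Sleep(4 * time.Second)
		w.Write([]byte("done\n"))
	})

	srv := &http.Server{Addr: ":8080", Handler: mux}

	// Catch SIGTERM (and SIGINT for local runs) instead of ignoring it.
	stop := make(chan os.Signal, 1)
	signal.Notify(stop, syscall.SIGTERM, syscall.SIGINT)

	go func() {
		if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			log.Fatalf("server error: %v", err)
		}
	}()

	<-stop
	log.Println("SIGTERM received, failing readiness and draining")

	// 1. Mark ourselves as "not ready" but keep serving.
	shuttingDown.Store(true)

	// 2. Give Kubernetes time to observe the failing readiness probe
	//    and remove the pod from Service endpoints.
	time.Sleep(10 * time.Second)

	// 3. Wait for in-flight requests, then shut the server down gracefully.
	inFlight.Wait()
	ctx, cancel := context.WithTimeout(context.Background(), 15*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		log.Printf("graceful shutdown failed: %v", err)
	}
	log.Println("shutdown complete")
}
```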
You might be wondering if adding all this extra code is really worth it. After all, we’re only talking about a 2% error rate during pod termination events. From my experience working with high-traffic services, I would say absolutely yes - for three reasons:

User Experience: Even small error rates look bad to users. Nobody wants to see “Something went wrong” messages, especially after waiting 10+ seconds for a long-running operation to complete.
Cascading Failures: Those errors can cascade through your system, especially if services depend on each other. Long-running requests often touch multiple critical systems.
Deployment Confidence: With proper graceful shutdown, you can deploy more frequently without worrying about causing problems.

The good news is that once you have implemented this pattern once, it is easy to reuse across your services. You can even create a small library or template for your organization. In production environments where I have implemented these patterns, we have gone from seeing a spike of errors with every deployment to deploying multiple times per day with zero impact on users. That is a win in my book!

If you want to dive deeper into this topic, I recommend checking out the article Graceful shutdown and zero downtime deployments in Kubernetes from learnk8s.io. It provides additional technical details about graceful shutdown in Kubernetes, though it does not emphasize the critical role of readiness probes in properly implementing the pattern as we have discussed here.

For those interested in seeing the actual code I used in my testing lab, I’ve published it on GitHub with instructions for running the demo yourself. Have you implemented graceful shutdown in your services? Did you encounter any other edge cases I didn’t cover? Let me know in the comments how this pattern has worked for you!
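For the Kubernetes side of the recipe (the probe split and the grace period mentioned above), a sketch of the relevant Deployment fields might look like this; names and timings are illustrative:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: graceful-service            # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels: { app: graceful-service }
  template:
    metadata:
      labels: { app: graceful-service }
    spec:
      # Must cover the readiness delay plus the longest in-flight request.
      terminationGracePeriodSeconds: 30
      containers:
        - name: app
          image: example.org/graceful-service:latest   # placeholder
          ports:
            - containerPort: 8080
          livenessProbe:
            httpGet: { path: /healthz, port: 8080 }
            periodSeconds: 10
          readinessProbe:
            httpGet: { path: /readyz, port: 8080 }
            periodSeconds: 2        # fail fast so traffic stops quickly during shutdown
```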

0 views
fasterthanli.me 9 months ago

More devops than I bargained for

I recently had a bit of impromptu disaster recovery, and it gave me a hunger for more! More downtime! More Kubernetes manifests! More DNS! Ahhhh! The plan was really simple. I love dedicated Hetzner servers with all my heart, but they are not very fungible. You have to wait entire minutes for a new dedicated server to be provisioned. Sometimes you pay a setup fee, et cetera. And at some point, for serving static websites and acting as a K3s server, it's simply just too big, and approximately twice the price that I should pay.

0 views
//pauls dev blog 1 year ago

How To Deploy Portainer in Kubernetes With Traefik Ingress Controller

This tutorial will show how to deploy Portainer (Business Edition) with Traefik as an Ingress Controller in Kubernetes (or k8s) to manage installed services. To follow this tutorial you need the following:

A running Kubernetes cluster or a managed Kubernetes running a Traefik Ingress Controller (see this tutorial)
A PRIMARY_DOMAIN

Helm is the primary package manager we use for our Kubernetes cluster. We can use the official Helm installer script to automatically install the latest version. To download the script and execute it locally, we run the installer command (the commands for this and the following steps are sketched at the end of this post).

To access our Kubernetes cluster we have to use kubectl and supply a kubeconfig file, which we can download from our provider. Then we can store our kubeconfig in the KUBECONFIG environment variable to enable the configuration for all following commands. Alternatively, we can install "Lens - The Kubernetes IDE" from https://k8slens.dev/ . I would recommend working with Lens!

Running Portainer on Kubernetes needs data persistence to store user data and other important information. During installation with Helm, Portainer will automatically use the default storage class from our Kubernetes cluster. To list all storage classes in our Kubernetes cluster and identify the default, we list them with kubectl. The default storage class is marked with "(default)" after its name. As I (or we) don't want to use that one, I switch the default storage class with a kubectl patch command. It is also possible to set the storage class via a parameter while installing with Helm.

We will deploy Portainer in our Kubernetes cluster with Helm. To install with Helm we have to add the Portainer Helm repository. After the update finishes we will install Portainer and expose it via NodePort, because we utilize Traefik to proxy requests to a URL and generate an SSL certificate. With this command, Portainer will be installed with default values. After a few seconds we should see output confirming the installation.

If you are using Lens you can now select the Pod and scroll down to the Ports section to forward the port to your local machine. Press Forward and enter a port to access the Portainer instance in your browser to test Portainer before creating the Deployment. Press Start and a new browser window will open showing the initial registration screen for Portainer, in which we can insert the first user. After the form is filled out and the button Create user is pressed, we have successfully created our Administrator user and Portainer is ready to use. If you have installed the Business Edition you should now insert your License Key, which we got from following the registration process on the Portainer website.

To make Portainer available with a URL and an SSL certificate within the WWW, we have to add an IngressRoute for Traefik. The IngressRoute will contain the service name, the port Portainer is using, and the URL on which Portainer can be accessed. We should save (and maybe adjust) this code snippet as a YAML file and apply it to our Kubernetes cluster with kubectl apply (a sketch of such an IngressRoute appears at the end of this post).

Congratulations! We have reached the end of this short tutorial! I hope this article gave you a quick and neat overview of how to set up Portainer in your Kubernetes cluster using Traefik Proxy as an Ingress Controller. I would love to hear your feedback about this tutorial. Furthermore, if you already used Portainer in Kubernetes with Traefik and use a different approach, please comment here and explain what you have done differently. Also, if you have any questions, please ask them in the comments. I try to answer them if possible. Feel free to connect with me on Medium, LinkedIn, Twitter, and GitHub. Thank you for reading, and happy deploying!
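A sketch of the commands behind the steps above, assembled from the Helm and Portainer documentation rather than the original tutorial, so treat names and flags as likely rather than exact:

```bash
# Install the latest Helm release via the official installer script
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh

# Point kubectl at the cluster using the kubeconfig from your provider
export KUBECONFIG=~/Downloads/my-cluster-kubeconfig.yaml   # adjust the path

# List storage classes; the default is marked with "(default)"
kubectl get storageclass

# Make a different storage class the default (replace the names accordingly)
kubectl patch storageclass old-default -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
kubectl patch storageclass new-default -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

# Add the Portainer chart repository and install Portainer exposed via NodePort
helm repo add portainer https://portainer.github.io/k8s/
helm repo update
helm install portainer portainer/portainer \
  --namespace portainer --create-namespace \
  --set service.type=NodePort
```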
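And a sketch of the Traefik IngressRoute described above; the entry point, certificate resolver, and the PRIMARY_DOMAIN placeholder are assumptions to adjust to your own Traefik setup:

```yaml
apiVersion: traefik.io/v1alpha1       # use traefik.containo.us/v1alpha1 on older Traefik versions
kind: IngressRoute
metadata:
  name: portainer
  namespace: portainer
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`portainer.PRIMARY_DOMAIN`)
      kind: Rule
      services:
        - name: portainer
          port: 9000                  # Portainer's HTTP UI port
  tls:
    certResolver: letsencrypt         # assumes a configured ACME resolver
```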

0 views
Binary Igor 1 year ago

Kubernetes: maybe a few Bash/Python scripts is enough

When it comes to the infrastructure of a software system, there are some features that are virtually always needed, independently of the project nature, and some that are additional, optional, or useful only in some projects and contexts ... Infrastructure is a crucial component of every software system: what do we need from it?

0 views
Karan Sharma 2 years ago

Nomad can do everything that K8s can

This blog post is ignited by the following Twitter exchange: I don't take the accusation of an unsubstantiated argument, especially on a technical topic, lightly. I firmly believe in substantiated arguments and hence, here I am, elaborating on my stance. If found mistaken, I am open to corrections and will revise my stance.

In my professional capacity, I have run and managed several K8s clusters (using AWS EKS) for our entire team of devs (been there, done that). The most complex piece of our otherwise simple and clean stack was K8s, and we'd been longing to find a better replacement. None of us knew whether that would be Nomad or anything else. But we took the chance, and we have reached a stage where we can objectively argue that, for our specific workloads, Nomad has proven to be a superior tool compared to K8s.

Nomad presents a fundamental building-block approach to designing your own services. It used to be true that Nomad was primarily a scheduler, and for serious production workloads, you had to rely on Consul for service discovery and Vault for secret management. However, this scenario has changed, as Nomad now seamlessly integrates these features, making them first-class citizens in its environment. Our team replaced our HashiCorp stack with just Nomad, and we never felt constrained in terms of what we could accomplish with Consul/Vault. While these tools still hold relevance for larger clusters managed by numerous teams, they are not necessary for our use case.

Kubernetes employs a declarative state for every operation in the cluster, essentially operating as a reconciliation mechanism to keep everything in check. In contrast, Nomad requires dealing with fewer components, making it appear lacking compared to K8s's concept of everything being a "resource." However, that is far from the truth.

One of my primary critiques of K8s is its hidden complexities. While these abstractions might simplify things on the surface, debugging becomes a nightmare when issues arise. Even after three years of managing K8s clusters, I've never felt confident dealing with databases or handling complex networking problems involving dropped packets. You might argue that it's about technical chops, which I won't disagree with - but then do you want to add value to the business by getting shit done, or do you want to be the resident K8s whiz at your organization?

Consider this: How many people do you know who run their own K8s clusters? Even the K8s experts themselves preach about running prod clusters on EKS/GKE etc. How many fully leverage all that K8s has to offer? How many are even aware of all the network routing intricacies managed by kube-proxy? If these queries stir up clouds of uncertainty, it's possible you're sipping the Kubernetes Kool-Aid without truly comprehending the recipe, much like I found myself doing at one point.

Now, if you're under the impression that I'm singing unabashed praises for Nomad, let me clarify - Nomad has its share of challenges. I've personally encountered and reported several. However, the crucial difference lies in Nomad's lesser degree of abstraction, allowing for a comprehensive understanding of its internals. For instance, we encountered service reconciliation issues with a particular Nomad version. However, we could query the APIs, identify the problem, and write a bash script to resolve and reconcile it. It wouldn't have been possible when there are too many moving parts in the system and we don't know where to even begin debugging.
The YAML hell is all too well known to all of us. In K8s, writing job manifests required a lot of effort (by developers who don't work with K8s all day), and the manifests were very complex to understand. It felt "too verbose" and involved copy-pasting large blocks from the docs and trying to make things work. Compare that to HCL: it feels much nicer to read, and shorter. Things are more straightforward to understand.

I've not even touched upon the niceties of Nomad yet. Like better, humanly understandable ACLs? A cleaner and simpler job spec, which defines the entire job in one file? A UI which actually shows everything about your cluster, nodes, and jobs? Not restricting your workloads to be run as Docker containers? A single binary which powers all of this?

The central question this post aims to raise is: What can K8s do that Nomad can't, especially considering the features people truly need? My perspectives are informed not only by my organization but also through interactions with several other organizations at various meetups and conferences. Yet, I have rarely encountered a use case that could only be managed by K8s. While Nomad isn't a panacea for all issues, it's certainly worth a try. Reducing the complexity of your tech stack can prove beneficial for your applications and, most importantly, your developers.

At this point, K8s enjoys immense industry-wide support, while Nomad remains the unassuming newcomer. This contrast is not a negative aspect, per se. Large organizations often gravitate towards complexity and the opportunity to engage more engineers. However, if simplicity were the primary goal, the prevailing sense of overwhelming complexity in the infrastructure and operations domain wouldn't be as pervasive. I hope my arguments provide a more comprehensive perspective and address the earlier critique of being unsubstantiated. Darren has responded to this blog post. You can read the response on Twitter.

Ingress: We run a set of HAProxy instances on a few nodes which act as "L7 LBs". Configured with Nomad services, they can do the routing based on Host headers.
DNS: To provide external access to a service without using a proxy, we developed a tool that scans all services registered in the cluster and creates a corresponding DNS record on AWS Route53.
Monitoring: Ah, my fav. You wanna monitor your K8s cluster? Sure, here's kube-prometheus, prometheus-operator, kube-state-metrics. Choices, choices. Enough to confuse you for days. Anyone who's ever deployed any of these, tell me why this thing needs such a monstrous setup of CRDs and operators. Monitoring Nomad is such a breeze: 3 lines of HCL config and done (see the sketch at the end of this post).
Statefulsets: It's 2023 and the irony is rich - the recommended way to run a database inside K8s is… not to run it inside K8s at all. In Nomad, we run a bunch of EC2 instances and tag them as nodes. The DBs don't float around as containers to random nodes. And there's no CSI plugin reaching for a storage disk in AZ-1 when the node is basking in AZ-2. Running a DB on Nomad feels refreshingly like running it on an unadorned EC2 instance.
Autoscale: All our client nodes (except for the nodes) are ephemeral and part of AWS's Auto Scaling Groups (ASGs). We use ASG rules for the horizontal scaling of the cluster. While Nomad does have its own autoscaler, our preference is to run large instances dedicated to specific workloads, avoiding a mix of different workloads on the same machine.
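As a concrete illustration of the "3 lines of HCL" point about monitoring, the Nomad agent's telemetry stanza that exposes Prometheus-compatible metrics looks roughly like this (check the exact options against your Nomad version's docs):

```hcl
# In the Nomad agent configuration: expose metrics for Prometheus to scrape.
telemetry {
  prometheus_metrics         = true
  publish_allocation_metrics = true
  publish_node_metrics       = true
}
```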

0 views
Humble Thoughts 2 years ago

Self-hosted Blog Guide for Engineers: (the Sweet) Part III — Further maintenance

This article is part of a guide for engineers on setting up a self-hosted blog (or any other service) on Ghost using Terraform, Ansible, and Kubernetes (part I, part II). Congratulations on successfully setting up your Kubernetes cluster with a Ghost blog! As you embark on this new journey of managing and maintaining your dynamic platform, it is essential to stay proactive in ensuring the smooth operation and optimal performance of your environment. This article will explore three crucial aspects of maintaining your newly created Kubernetes cluster with a Ghost blog: updating the Ghost configuration file, upgrading Ghost and MySQL versions, and expanding the storage volume.

The Ghost configuration file serves as the backbone of your blog's settings, defining everything from database connectivity to theme customization. Regularly updating this file ensures that your Ghost installation remains up to date with any changes or improvements introduced by the Ghost team. Considering you've followed the instructions from Part II of this manual, you should now have the configuration file set up as a Kubernetes Secret. In that case, updating the configuration takes very little effort.

Ghost, being an open-source platform, constantly receives updates and new features that improve its security, performance, and overall user experience. Similarly, MySQL, the database management system that powers your Ghost blog, often releases new versions with bug fixes and optimizations. Fortunately for us, the setup we've configured allows us to update both technologies with little effort. All you need to do is update the image version in the Ghost deployment manifest and apply the change to the Kubernetes cluster in the blog's namespace. Similarly, it's done for MySQL by updating the image version directive in the MySQL deployment manifest and applying the change.

As your Ghost blog grows in content and attracts more visitors, you may find that the initial storage volume allocated to your Kubernetes cluster becomes insufficient. In this section, we will look into expanding your storage volume to accommodate the increasing demands of your blog. One of the critical aspects of our setup was the CSI driver installation for Hetzner, which helps us modify a separate volume via Kubernetes cluster configuration changes. To expand the volume, change the storage size parameter to a new value in the config and apply it by running the apply command (a sketch follows below). This will result in smooth Hetzner volume resizing without re-creating it. Note that this value can only be increased and can't be shrunk back — don't go too far, as it will affect the cost.

As you can see, maintaining such a blog is simple and hassle-free. In future articles, I'll aim to cover details on further expansion — moving from a single-node configuration to a truly clustered, highly available solution. Subscribe to not miss out on the updates!
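As an illustration of that resize step, expanding a PersistentVolumeClaim generally comes down to raising its storage request and re-applying it; the claim name, namespace, and sizes below are placeholders, not the ones from this guide:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ghost-content              # hypothetical PVC name
  namespace: blog                  # hypothetical namespace
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: hcloud-volumes # Hetzner CSI storage class; name may differ in your setup
  resources:
    requests:
      storage: 20Gi                # was e.g. 10Gi; this value can only grow, never shrink
```

Applied with kubectl apply (or kubectl patch), the CSI driver then grows the underlying Hetzner volume in place.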

0 views