Kubernetes Autoscaling: HPA vs. VPA vs. KEDA vs. CA vs. Karpenter vs. Fargate
Summary
TL;DR: This video takes a deep dive into autoscaling in Kubernetes, focusing on how to dynamically adjust the number of Pods and nodes based on application load. It first introduces the Horizontal Pod Autoscaler (HPA), which automatically adjusts the Pod replica count based on CPU and memory usage, then covers autoscaling on custom metrics with Prometheus and KEDA. For stateful applications, the Vertical Pod Autoscaler (VPA) can adjust Pod resources to match demand. The video also explains how the Cluster Autoscaler and Karpenter work, and weighs the pros and cons of serverless Kubernetes clusters.
Key Takeaways
- 📈 Kubernetes is built to run applications at scale
- 🔄 HPA automatically scales Pods based on CPU and memory
- 📊 VPA adjusts resources for stateful applications
- 📉 KEDA enables autoscaling based on message queues
- ⚙️ The Cluster Autoscaler watches for pending Pods
- 🛠️ Karpenter optimizes node scaling
- ☁️ Serverless Kubernetes reduces infrastructure maintenance
- 📊 Use Prometheus for custom-metric scaling
- 🔍 Verify that the Metrics Server is deployed
- ⚠️ HPA and VPA must not be used together
Timeline
- 00:00:00 - 00:05:00
Kubernetes is designed to run applications at scale, and this video covers autoscaling. Autoscaling can be done with built-in controllers or with controllers installed separately. For example, an online store may see heavy traffic during the day and almost none at night, so you can adjust manually or use a cron job to scale the application up during peak hours and down at night to save compute. Stateless applications are relatively easy to scale, while distributed databases are much harder. The video shows how to autoscale both Pods and Kubernetes nodes; the most common approach is the CPU- or memory-based HorizontalPodAutoscaler, which automatically updates the replica count of a Deployment or StatefulSet based on metrics.
- 00:05:00 - 00:14:37
Autoscaling on CPU or memory alone is not very accurate, because different applications have different requirements. A better approach is to scale on more meaningful metrics such as latency, traffic, errors, and saturation. To use custom metrics with the Horizontal Pod Autoscaler, you deploy the Prometheus Operator and a Prometheus instance in the cluster, and use the Prometheus Adapter to register the metrics with the custom.metrics API (a minimal adapter rule is sketched below). For stateful applications, Kubernetes offers the Vertical Pod Autoscaler (VPA), which helps add CPU or memory to existing Pods. VPA has several modes, but it should not be used together with HPA on the same workload, to avoid conflicts.
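To make the adapter step concrete, here is a rough sketch of a prometheus-adapter rule (for example, as part of its Helm values) that exposes an application counter as a per-second custom metric. The metric name http_requests_total and the derived name http_requests_per_second are illustrative, not taken from the video:

```yaml
# Illustrative prometheus-adapter rule: turn a hypothetical counter
# http_requests_total into a per-pod rate exposed on the custom metrics API.
rules:
  custom:
    - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        matches: "^(.*)_total$"
        as: "${1}_per_second"
      metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```

An HPA can then reference this as a `type: Pods` metric named `http_requests_per_second` with a `target.averageValue` of, say, "100" to match the 100-requests-per-second example from the video.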
Video Q&A
What is the Horizontal Pod Autoscaler?
The Horizontal Pod Autoscaler (HPA) is a Kubernetes API resource and controller that automatically adjusts the number of Pod replicas based on CPU or memory usage.
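For illustration, a minimal autoscaling/v2 manifest matching the 80% CPU / 70% memory thresholds used in the video might look like the sketch below; the Deployment name myapp comes from the demo, while the replica bounds are placeholders. Note that the target Pods must declare resources.requests, because the HPA computes utilization as a percentage of requests:

```yaml
# Sketch of an HPA for the video's "myapp" Deployment; min/max replicas are assumptions.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 1        # placeholder lower bound
  maxReplicas: 10       # placeholder upper bound
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80   # scale out above 80% average CPU of requests
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70   # scale out above 70% average memory of requests
```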
How do you autoscale with KEDA?
KEDA can automatically scale an application based on the number of messages in a queue or topic, and it supports many cloud services and open-source projects.
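As a rough sketch (the Deployment, queue, and environment-variable names are hypothetical), a KEDA ScaledObject for the RabbitMQ example from the video, targeting roughly 5 messages per replica and allowing scale-to-zero, could look like this:

```yaml
# Hypothetical ScaledObject: scale the "consumer" Deployment on RabbitMQ queue depth.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: consumer-scaler
spec:
  scaleTargetRef:
    name: consumer              # placeholder Deployment name
  minReplicaCount: 0            # scale to zero when the queue is empty
  maxReplicaCount: 10           # placeholder upper bound
  triggers:
    - type: rabbitmq
      metadata:
        queueName: orders       # placeholder queue name
        mode: QueueLength
        value: "5"              # ~5 messages per replica, as in the video
        hostFromEnv: RABBITMQ_URL   # AMQP connection string from the target's env
```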
What does the Vertical Pod Autoscaler do?
The Vertical Pod Autoscaler (VPA) automatically adjusts a Pod's CPU and memory requests based on the application's resource needs.
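Here is a minimal sketch of a VPA in recommendation-only mode (updateMode "Off"), the mode the video recommends for stateful workloads; the StatefulSet name postgres is a placeholder:

```yaml
# Hypothetical VPA that only produces recommendations and never evicts Pods.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: postgres
spec:
  targetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: postgres          # placeholder stateful workload
  updatePolicy:
    updateMode: "Off"       # recommend only; apply requests/limits manually later
```

You can then describe or get the VPA object to read the recommended requests and apply them during the next maintenance window.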
How do you verify that the Metrics Server is deployed?
Run kubectl top pods; if it returns usage data instead of a "Metrics API not available" error, the Metrics Server is deployed.
How are Kubernetes nodes autoscaled?
With the Cluster Autoscaler or Karpenter: the former watches for pending Pods and increases the node count of the autoscaling group, while the latter creates right-sized instances that fit the pending Pods' requirements.
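For Karpenter on AWS, a rough NodePool sketch (assuming Karpenter v1 and an EC2NodeClass named default, neither of which is spelled out in the video) shows how Karpenter is told which instances it may launch for pending Pods:

```yaml
# Hypothetical Karpenter v1 NodePool: let Karpenter pick right-sized on-demand
# instances for pending Pods, up to an overall CPU limit.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default                 # assumed EC2NodeClass
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
  limits:
    cpu: "100"                        # cap total provisioned CPU (placeholder)
```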
- 00:00:00Kubernetes was created to run applications at scale. So, in this video, let's talk
- 00:00:05about autoscaling. Now, there are many different controllers, some of them built-in and some that
- 00:00:11you need to additionally install in your cluster. For example, your online store may get a lot of
- 00:00:17traffic during the day and almost no one using it at night. So, in order to save on compute,
- 00:00:23you can either manually adjust or have some kind of cron job to scale up your application
- 00:00:29during the peak hours and scale down at night to avoid wasting resources. It’s much easier to do
- 00:00:36if you have stateless applications than scaling up and down some kind of distributed databases.
- 00:00:43Another example would be a big data ELT pipeline where you periodically, let's say every hour,
- 00:00:50run a batch job that takes maybe 10 minutes. And, you don’t want to pay for the remaining 50 minutes
- 00:00:56of compute when you're not running anything. For example, to scale up your application to
- 00:01:01handle more traffic, you increase the number of pods. Now, those pods run on Kubernetes nodes.
- 00:01:08A node can be a VM or a server in a data center. If you don’t have enough nodes in your Kubernetes
- 00:01:14cluster, you need to scale the cluster itself as well. So, in this video, we’ll talk about
- 00:01:19how to autoscale pods on one hand and Kubernetes nodes on another. The most common approach that
- 00:01:26comes to mind when you need to autoscale your application to handle an increasing load is to
- 00:01:32use the HorizontalPodAutoscaler based on CPU or memory. It automatically updates the replica count
- 00:01:39on your deployment or statefulset object. The HorizontalPodAutoscaler is implemented
- 00:01:44as a Kubernetes API resource and as a controller. The controller runs within the Kubernetes control
- 00:01:50plane, so there's no need to install anything extra in this case. It periodically adjusts the
- 00:01:57desired scale of its target, such as a Deployment object, based on metrics like CPU, memory,
- 00:02:03or custom metrics, which we will discuss later. Now, the controller ships with Kubernetes, but
- 00:02:09you still need to provide metrics for it to work. Again, the most common approach would be to deploy
- 00:02:16the metrics server in your cluster. Some managed Kubernetes services, such as GKE, come with a
- 00:02:22metrics server by default, while for others, such as EKS, you need to install it as an additional step.
- 00:02:29To follow along, you can use Minikube, and you can find the source code in my GitHub repository. The
- 00:02:35easiest way to verify that the metrics server is deployed is to run kubectl top pods. If you get
- 00:02:42the error 'Metrics API not available', you will need to install it, and I’ll show you how. You
- 00:02:48can use the Helm CLI to deploy it manually, or my preferred approach would be to use Terraform
- 00:02:54with the Helm provider. Now, when you deploy the metrics server in your cluster, it will scrape the
- 00:03:01kubelet of each node and provide those aggregated metrics to other components in your Kubernetes
- 00:03:06cluster via the metrics API. To verify, you can use the kubectl top pods command to get the usage.
- 00:03:14Another way is to run kubectl proxy, and then in your browser, go to apis/metrics/namespaces to
- 00:03:21get the usage. And, you can also obtain the same metrics using the kubectl get --raw command. Now,
- 00:03:28you can start using the Horizontal Pod Autoscaler resource. Let's say we want to target this
- 00:03:34deployment object with 'myapp'. Keep in mind that for the autoscaler to work, you must provide
- 00:03:40resource requests. Limits are optional but highly recommended. The HPA uses requests, not limits,
- 00:03:48to calculate usage in percentage. For example, in this case, we want to automatically scale pods
- 00:03:55if the average CPU utilization across all pods exceeds 80 percent. You can also include memory
- 00:04:02usage. Here, we want to scale if the average exceeds 70%. Let me quickly run the demo with
- 00:04:09all these objects deployed. When you apply, it may take a few seconds for the HPA to show the current
- 00:04:16usage. If it takes longer, you can describe the HPA object to find any errors. Most likely,
- 00:04:23this happens when you forget to define requests for your pods. If we simulate high CPU usage,
- 00:04:30it will spin up enough pods to reduce the average CPU usage below 80 percent. Now,
- 00:04:36when the load decreases, it may take a minute or so for the HPA to scale down the pods. The last
- 00:04:43thing I want to mention is that you should not set the replica count on the deployment or statefulset
- 00:04:50object if you use a GitOps approach. In that case, the HPA and your tool, such as ArgoCD or FluxCD,
- 00:04:58will constantly fight to set the desired replica count based on their spec. Now,
- 00:05:03autoscaling based on CPU or memory is not very accurate because different applications may have
- 00:05:10different requirements. One application may be fine running at 90% CPU usage, while another may
- 00:05:17only handle 40% CPU usage. The best way to scale your app is to use more meaningful metrics from
- 00:05:25the client's perspective. A good starting point is the four golden signals: latency, traffic,
- 00:05:31errors, and saturation. For example, suppose you ran some tests and determined that a single instance
- 00:05:38of your application can only handle 100 requests per second. Unfortunately, we can’t use the metrics
- 00:05:46server for that. We need something more powerful, such as Prometheus. In order to use custom metrics
- 00:05:53for the horizontal pod autoscaler, we need to deploy a few things in our cluster. First of all,
- 00:05:59we need a Prometheus operator that will manage the lifecycle of our Prometheus instances as well
- 00:06:05as convert service and pod monitors into the native Prometheus configuration. Then, we’ll
- 00:06:11deploy the Prometheus instance itself using the custom resource provided by the operator. Let’s
- 00:06:17say we also have the app running in Kubernetes that we want to monitor; we’ll create a service or
- 00:06:24pod monitor to scrape that app and store metrics in Prometheus itself. The next step is to provide
- 00:06:31those metrics to the horizontal pod autoscaler. For that, we need to deploy a Prometheus adapter
- 00:06:37that will convert Prometheus metrics and register them at the custom.metrics API. From that point,
- 00:06:44we can use custom Prometheus metrics exposed by our application in the autoscaling policy like
- 00:06:50this. Now, in the previous part, we discussed that to autoscale based on CPU and memory,
- 00:06:57we need to deploy a metrics server. Since we already have Prometheus, we can get rid
- 00:07:02of the metrics server altogether and use cAdvisor to get CPU and memory usage from the pods,
- 00:07:10and register a Prometheus adapter with a metrics API. If you want to fully replace the metrics
- 00:07:16server, you would also want to deploy a node exporter on each node to get node metrics. So,
- 00:07:24if you configured everything correctly, you should be able to scale your application based on custom
- 00:07:29metrics, such as the number of requests per second or any other metrics. Now, sometimes you
- 00:07:36have stateful applications that are very difficult or impossible to scale horizontally. For example,
- 00:07:43a standalone database such as Postgres or MySQL. The only option you have to handle more load is
- 00:07:51to scale those applications vertically. This simply means adding more CPU or memory to the
- 00:07:57existing pods. Kubernetes has a tool called Vertical Pod Autoscaler that can help you with
- 00:08:04that. It has a few modes. There is a 'Recreate' mode, which should be used rarely because VPA
- 00:08:10will try to evict and create a new pod with new recommended resources, which can be very
- 00:08:17dangerous for standalone databases. Another mode is 'Initial', which only sets requests
- 00:08:23and limits when you deploy the application. And finally, which I use most often, is to simply
- 00:08:30get recommendations and not take any actions. The Vertical Pod Autoscaler also consists of a
- 00:08:36custom resource and a controller, but it does not ship with Kubernetes, and you need to install it
- 00:08:43additionally in your cluster. Using this mode, you can describe or get the VPA in your cluster, see
- 00:08:49recommendations, and perhaps apply those requests and limits during the next maintenance window.
- 00:08:56Keep in mind that you should never use HPA and VPA simultaneously targeting the same deployment
- 00:09:03or stateful set. They will conflict with each other and may disrupt your workloads. Also,
- 00:09:09I don’t see a point in getting recommendations from the VPA for stateless applications that can
- 00:09:16be scaled horizontally. For example, if you run 5 web servers, you get recommendations specific to
- 00:09:23the current load that 5 servers can handle. If you run 20 of the same web servers and try to get VPA
- 00:09:31recommendations, they will be very different. So, don’t use VPA for stateless applications,
- 00:09:38even in recommendation mode, and only use it for stateful apps that cannot be scaled horizontally.
- 00:09:44There are a lot of companies nowadays using event sourcing. Some companies completely
- 00:09:50rely on some sort of messaging system to communicate between
- 00:09:54different microservices. It can be Apache Kafka, RabbitMQ, NATS, and many others.
- 00:10:01On one side, you have a bunch of producers that write to the messaging system,
- 00:10:06and on the other, you have consumers. This pattern allows you to decouple
- 00:10:11your services and simplifies the development of new features.
- 00:10:15There is a KEDA project that can help you to autoscale based on the number of messages in the
- 00:10:21queue or a topic. For example, KEDA can monitor a RabbitMQ queue and scale your application if
- 00:10:29the queue keeps getting more messages and your service is not able to handle the current load.
- 00:10:35One advantage of this approach is that it can scale your application
- 00:10:39to 0 if there are no messages in the queue.
- 00:10:42There are many different scalers supported, and you can find them
- 00:10:46on the official website. It includes cloud services such as DynamoDB as well
- 00:10:52as open-source projects such as Apache Kafka, etcd, MySQL, and many others.
- 00:10:59To start using KEDA, you need to deploy the controller using a
- 00:11:03single Helm chart. It does not have any other dependencies.
- 00:11:07After that, you can configure a custom resource to automatically scale your
- 00:11:11application. In this case, we will assign 5 messages from the queue for each replica.
- 00:11:18Now, if you deploy it, you can apply the Kubernetes job to start publishing messages to the
- 00:11:23queue, and in a few seconds, KEDA will scale up your application from 0 to the maximum you defined
- 00:11:30in the custom resource. After your application processes all the messages, KEDA will scale your
- 00:11:36application down to 0. It’s optional, but you can keep a few instances running if you want.
- 00:11:41So far, we’ve talked about how to autoscale applications or pods running
- 00:11:46in your Kubernetes cluster. The next major topic is how to scale the Kubernetes nodes.
- 00:11:52Let’s start with a cluster autoscaler. It was one of the
- 00:11:55first projects that automated Kubernetes node autoscaling.
- 00:11:59In some clouds, such as AWS, you need to explicitly configure permissions
- 00:12:05and deploy your own cluster autoscaler controller. Others, such as Azure and GCP,
- 00:12:11allow you to simply check a box, and the cloud will deploy and manage it for you.
- 00:12:17In most clouds, Kubernetes node groups are created as autoscaling groups.
- 00:12:21Some allow you to specify multiple instance types within a single group,
- 00:12:26while others only permit the use of a single instance type.
- 00:12:31After you deploy the autoscaler, it will watch for pending pods in your cluster. If it detects
- 00:12:37a pending pod that cannot fit onto the existing nodes, the autoscaler will increase the desired
- 00:12:43size of your autoscaling group, and the cloud will spin up additional nodes for your cluster.
- 00:12:49The problem with that approach is if you use large instance types and a single tiny
- 00:12:55pod does not fit onto the existing nodes, the cluster autoscaler will
- 00:12:59create another node with the same CPU and memory as all other nodes. In many cases,
- 00:13:06this can lead to wasted resources, and you would pay more than you actually use.
- 00:13:11To fix this issue, AWS developed another tool called Karpenter. It works with other
- 00:13:18clouds as well, not only AWS. Instead of simply scaling up your node group
- 00:13:23with the same instance types, Karpenter will analyze the pending pods and create
- 00:13:29EC2 instances directly with enough CPU and memory to fit the pending workloads.
- 00:13:36In general, this approach is more efficient than scaling up node groups. However, there are some
- 00:13:42edge cases when you run logging, monitoring, and other agents as daemonsets and generally
- 00:13:49want to use large instance types to minimize the number of agents you have to run on each node.
- 00:13:55And finally, you can use serverless Kubernetes clusters provided by AWS, such as Fargate,
- 00:14:01and by GCP, which they call Autopilot. When you create a pod, Kubernetes will spin up a dedicated
- 00:14:08node for it. In this case, you don’t have to manage your nodes yourself and worry about
- 00:14:14wasted resources. However, these serverless clusters are much more expensive in terms of
- 00:14:21how much you pay for CPU and memory compared to EC2. So, you might want to test it first before
- 00:14:27committing to the serverless approach, but it does reduce the maintenance of your infrastructure.
- 00:14:33That’s all for this video. Thank you for watching, and I’ll see you in the next one.
- Kubernetes
- Autoscaling
- Horizontal Pod Autoscaler
- Vertical Pod Autoscaler
- Prometheus
- KEDA
- Cluster Autoscaler
- Karpenter
- Serverless Kubernetes
- Metrics Server