Kubernetes Autoscaling: HPA vs. VPA vs. Keda vs. CA vs. Karpenter vs. Fargate

00:14:37
https://www.youtube.com/watch?v=hsJ2qtwoWZw

Summary

TL;DR: This video takes a deep look at autoscaling in Kubernetes, focusing on how the number of pods and nodes can be adjusted dynamically based on application load. It starts with the Horizontal Pod Autoscaler (HPA), which automatically adjusts the pod replica count based on CPU and memory usage, then covers autoscaling on custom metrics with Prometheus and KEDA. For stateful applications, the Vertical Pod Autoscaler (VPA) can adjust a pod's resources to match demand. The video also explains how the Cluster Autoscaler and Karpenter work, and the pros and cons of serverless Kubernetes clusters.

Key Takeaways

  • 📈 Kubernetes is built to run applications at scale
  • 🔄 HPA automatically scales pods based on CPU and memory
  • 📊 VPA adjusts resources for stateful applications
  • 📉 KEDA enables autoscaling driven by message queues
  • ⚙️ The Cluster Autoscaler watches for pending pods
  • 🛠️ Karpenter optimizes node scaling
  • ☁️ Serverless Kubernetes reduces infrastructure maintenance
  • 📊 Prometheus enables scaling on custom metrics
  • 🔍 Verify that the Metrics Server is deployed
  • ⚠️ HPA and VPA cannot be used together on the same workload

Timeline

  • 00:00:00 - 00:05:00

    Kubernetes is designed to run applications at scale, and this video covers autoscaling. Autoscaling can be achieved with built-in controllers or with controllers you install separately. For example, an online store gets heavy traffic during the day and is almost unused at night, so you could adjust it manually or use a cron job to scale the application up during peak hours and back down at night to save compute. Scaling stateless applications is relatively simple, while scaling distributed databases is much harder. The video shows how to autoscale both pods and Kubernetes nodes. The most common approach is the CPU- or memory-based HorizontalPodAutoscaler, which automatically updates the replica count of a Deployment or StatefulSet based on those metrics (a minimal manifest sketch follows this timeline).

  • 00:05:00 - 00:14:37

    Autoscaling on CPU or memory alone is not very accurate, because different applications have different requirements. The better approach is to scale on more meaningful signals such as latency, traffic, errors, and saturation. To use custom metrics with the Horizontal Pod Autoscaler, you deploy the Prometheus Operator and a Prometheus instance in the cluster, then use the Prometheus adapter to register the metrics with the custom.metrics API. For stateful applications, Kubernetes offers the Vertical Pod Autoscaler (VPA), which can add CPU or memory to existing pods. The VPA has several modes, but it should not be used together with the HPA on the same workload, to avoid conflicts.
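
For reference, here is a minimal sketch of the HPA described above, using the autoscaling/v2 API. The Deployment name myapp and the 80% CPU / 70% memory thresholds come from the video; the replica bounds are illustrative assumptions, and the target pods must define resource requests for the utilization percentages to be computed.

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: myapp
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: myapp                    # Deployment targeted in the video's demo
      minReplicas: 1                   # illustrative bounds (assumption)
      maxReplicas: 10
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 80   # scale out when average CPU exceeds 80% of requests
        - type: Resource
          resource:
            name: memory
            target:
              type: Utilization
              averageUtilization: 70   # scale out when average memory exceeds 70% of requests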

Video Q&A

  • What is the Horizontal Pod Autoscaler?

    The Horizontal Pod Autoscaler (HPA) is a Kubernetes API resource and controller that automatically adjusts the number of pod replicas based on CPU or memory usage.

  • How do you autoscale with KEDA?

    KEDA can scale an application automatically based on the number of messages in a queue, and it supports many cloud services and open-source projects as scalers (see the ScaledObject sketch after this Q&A).

  • What does the Vertical Pod Autoscaler do?

    The Vertical Pod Autoscaler (VPA) automatically adjusts a pod's CPU and memory requests to match the application's resource needs (see the VPA sketch after this Q&A).

  • How do you verify that the Metrics Server is deployed?

    Run kubectl top pods; if it returns usage figures instead of a 'Metrics API not available' error, the Metrics Server is deployed.

  • How are Kubernetes nodes autoscaled?

    With the Cluster Autoscaler or Karpenter. The former watches for pending pods and increases the size of the node group, while the latter analyzes the pending pods and creates right-sized instances to fit them (a Karpenter NodePool sketch follows this Q&A).
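
As referenced in the KEDA answer above, KEDA is configured through a ScaledObject custom resource. Below is a minimal sketch for the RabbitMQ example from the video, assuming a consumer Deployment named myapp-consumer, a queue named orders, and an AMQP connection string exposed to the workload via a RABBITMQ_HOST environment variable (all hypothetical names).

    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      name: myapp-consumer
    spec:
      scaleTargetRef:
        name: myapp-consumer          # hypothetical Deployment that consumes the queue
      minReplicaCount: 0              # allows scale-to-zero when the queue is empty
      maxReplicaCount: 10             # illustrative upper bound
      triggers:
        - type: rabbitmq
          metadata:
            queueName: orders             # hypothetical queue name
            mode: QueueLength
            value: "5"                    # target of 5 messages per replica, as in the video
            hostFromEnv: RABBITMQ_HOST    # connection string read from the workload's env (assumption)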
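
Likewise, the VPA answer above maps to a small custom resource. A sketch in recommendation-only mode (updateMode: "Off"), targeting a hypothetical standalone Postgres StatefulSet:

    apiVersion: autoscaling.k8s.io/v1
    kind: VerticalPodAutoscaler
    metadata:
      name: postgres
    spec:
      targetRef:
        apiVersion: apps/v1
        kind: StatefulSet
        name: postgres          # hypothetical stateful workload that cannot scale horizontally
      updatePolicy:
        updateMode: "Off"       # recommendation-only: record suggested requests/limits, never evict pods

Describing the VPA object (for example with kubectl describe vpa postgres) then shows the current recommendations, which can be applied by hand during a maintenance window, as the video suggests.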
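
Finally, the node-autoscaling answer above mentions Karpenter, which provisions right-sized nodes from a NodePool definition instead of scaling a fixed node group. A sketch assuming Karpenter on AWS with an EC2NodeClass named default already defined (names and limits are illustrative):

    apiVersion: karpenter.sh/v1
    kind: NodePool
    metadata:
      name: default
    spec:
      template:
        spec:
          nodeClassRef:                # cloud-specific settings live in a separate EC2NodeClass
            group: karpenter.k8s.aws
            kind: EC2NodeClass
            name: default
          requirements:                # Karpenter picks any instance type satisfying these constraints
            - key: kubernetes.io/arch
              operator: In
              values: ["amd64"]
            - key: karpenter.sh/capacity-type
              operator: In
              values: ["on-demand"]
      limits:
        cpu: "100"                     # cap on the total CPU the pool may provision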

Transcript (en)

  • 00:00:00
    Kubernetes was created to run applications  at scale. So, in this video, let's talk
  • 00:00:05
    about autoscaling. Now, there are many different  controllers, some of them built-in and some that
  • 00:00:11
    you need to additionally install in your cluster.  For example, your online store may get a lot of
  • 00:00:17
    traffic during the day and almost no one using  it at night. So, in order to save on compute,
  • 00:00:23
    you can either manually adjust or have some  kind of cron job to scale up your application
  • 00:00:29
    during the peak hours and scale down at night to  avoid wasting resources. It’s much easier to do
  • 00:00:36
    if you have stateless applications than scaling  up and down some kind of distributed databases.
  • 00:00:43
    Another example would be a big data ELT pipeline  where you periodically, let's say every hour,
  • 00:00:50
    run a batch job that takes maybe 10 minutes. And,  you don’t want to pay for the remaining 50 minutes
  • 00:00:56
    of compute when you're not running anything.  For example, to scale up your application to
  • 00:01:01
    handle more traffic, you increase the number of  pods. Now, those pods run on Kubernetes nodes.
  • 00:01:08
    A node can be a VM or a server in a data center.  If you don’t have enough nodes in your Kubernetes
  • 00:01:14
    cluster, you need to scale the cluster itself  as well. So, in this video, we’ll talk about
  • 00:01:19
    how to autoscale pods on one hand and Kubernetes  nodes on another. The most common approach that
  • 00:01:26
    comes to mind when you need to autoscale your  application to handle an increasing load is to
  • 00:01:32
    use the HorizontalPodAutoscaler based on CPU or  memory. It automatically updates the replica count
  • 00:01:39
    on your deployment or statefulset object. The HorizontalPodAutoscaler is implemented
  • 00:01:44
    as a Kubernetes API resource and as a controller.  The controller runs within the Kubernetes control
  • 00:01:50
    plane, so there's no need to install anything  extra in this case. It periodically adjusts the
  • 00:01:57
    desired scale of its target, such as a Deployment  object, based on metrics like CPU, memory,
  • 00:02:03
    or custom metrics, which we will discuss later.  Now, the controller ships with Kubernetes, but
  • 00:02:09
    you still need to provide metrics for it to work.  Again, the most common approach would be to deploy
  • 00:02:16
    the metrics server in your cluster. Some managed  Kubernetes services, such as GKE, come with a
  • 00:02:22
    metrics server by default, while for others, such as EKS, you need to install it as an additional step.
  • 00:02:29
    To follow along, you can use Minikube, and you can  find the source code in my GitHub repository. The
  • 00:02:35
    easiest way to verify that the metrics server is  deployed is to run kubectl top pods. If you get
  • 00:02:42
    the error 'Metrics API not available', you will  need to install it, and I’ll show you how. You
  • 00:02:48
    can use the Helm CLI to deploy it manually, or  my preferred approach would be to use Terraform
  • 00:02:54
    with the Helm provider. Now, when you deploy the  metrics server in your cluster, it will scrape the
  • 00:03:01
    kubelet of each node and provide those aggregated  metrics to other components in your Kubernetes
  • 00:03:06
    cluster via the metrics API. To verify, you can  use the kubectl top pods command to get the usage.
  • 00:03:14
    Another way is to run kubectl proxy, and then in your browser, go to apis/metrics/namespaces to
  • 00:03:21
    get the usage. And, you can also obtain the same  metrics using the kubectl get --raw command. Now,
  • 00:03:28
    you can start using the Horizontal Pod Autoscaler  resource. Let's say we want to target this
  • 00:03:34
    deployment object with 'myapp'. Keep in mind  that for the autoscaler to work, you must provide
  • 00:03:40
    resource requests. Limits are optional but highly  recommended. The HPA uses requests, not limits,
  • 00:03:48
    to calculate usage in percentage. For example,  in this case, we want to automatically scale pods
  • 00:03:55
    if the average CPU utilization across all pods  exceeds 80 percent. You can also include memory
  • 00:04:02
    usage. Here, we want to scale if the average  exceeds 70%. Let me quickly run the demo with
  • 00:04:09
    all these objects deployed. When you apply, it may  take a few seconds for the HPA to show the current
  • 00:04:16
    usage. If it takes longer, you can describe  the HPA object to find any errors. Most likely,
  • 00:04:23
    this happens when you forget to define requests  for your pods. If we simulate high CPU usage,
  • 00:04:30
    it will spin up enough pods to reduce the  average CPU usage below 80 percent. Now,
  • 00:04:36
    when the load decreases, it may take a minute or  so for the HPA to scale down the pods. The last
  • 00:04:43
    thing I want to mention is that you should not set  the replica count on the deployment or statefulset
  • 00:04:50
    object if you use a GitOps approach. In that case,  the HPA and your tool, such as ArgoCD or FluxCD,
  • 00:04:58
    will constantly fight to set the desired  replica count based on their spec. Now,
  • 00:05:03
    autoscaling based on CPU or memory is not very  accurate because different applications may have
  • 00:05:10
    different requirements. One application may be  fine running at 90% CPU usage, while another may
  • 00:05:17
    only handle 40% CPU usage. The best way to scale  your app is to use more meaningful metrics from
  • 00:05:25
    the client's perspective. A good starting point  is the four golden signals: latency, traffic,
  • 00:05:31
    errors, and saturation. For example, if you ran  some tests and determined that a single instance
  • 00:05:38
    of your application can only handle 100 requests  per second, unfortunately, we can’t use a metrics
  • 00:05:46
    server for that. We need something more powerful,  such as Prometheus. In order to use custom metrics
  • 00:05:53
    for the horizontal pod autoscaler, we need to  deploy a few things in our cluster. First of all,
  • 00:05:59
    we need a Prometheus operator that will manage  the lifecycle of our Prometheus instances as well
  • 00:06:05
    as convert service and pod monitors into the  native Prometheus configuration. Then, we’ll
  • 00:06:11
    deploy the Prometheus instance itself using the  custom resource provided by the operator. Let’s
  • 00:06:17
    say we also have the app running in Kubernetes  that we want to monitor; we’ll create a service or
  • 00:06:24
    pod monitor to scrape that app and store metrics  in Prometheus itself. The next step is to provide
  • 00:06:31
    those metrics to the horizontal pod autoscaler.  For that, we need to deploy a Prometheus adapter
  • 00:06:37
    that will convert Prometheus metrics and register  them at the custom.metrics API. From that point,
  • 00:06:44
    we can use custom Prometheus metrics exposed by  our application in the autoscaling policy like
  • 00:06:50
    this. Now, in the previous part, we discussed  that to autoscale based on CPU and memory,
  • 00:06:57
    we need to deploy a metrics server. Since  we already have Prometheus, we can get rid
  • 00:07:02
    of the metrics server altogether and use cAdvisor  to get CPU and memory usage from the pods,
  • 00:07:10
    and register a Prometheus adapter with a metrics  API. If you want to fully replace the metrics
  • 00:07:16
    server, you would also want to deploy a node exporter on each node to get node metrics. So,
  • 00:07:24
    if you configured everything correctly, you should  be able to scale your application based on custom
  • 00:07:29
    metrics, such as the number of requests per  second or any other metrics. Now, sometimes you
  • 00:07:36
    have stateful applications that are very difficult  or impossible to scale horizontally. For example,
  • 00:07:43
    a standalone database such as Postgres or MySQL.  The only option you have to handle more load is
  • 00:07:51
    to scale those applications vertically. This  simply means adding more CPU or memory to the
  • 00:07:57
    existing pods. Kubernetes has a tool called  Vertical Pod Autoscaler that can help you with
  • 00:08:04
    that. It has a few modes. There is a 'Recreate'  mode, which should be used rarely because VPA
  • 00:08:10
    will try to evict and create a new pod with  new recommended resources, which can be very
  • 00:08:17
    dangerous for standalone databases. Another  mode is 'Initial', which only sets requests
  • 00:08:23
    and limits when you deploy the application. And  finally, which I use most often, is to simply
  • 00:08:30
    get recommendations and not take any actions.  The Vertical Pod Autoscaler also consists of a
  • 00:08:36
    custom resource and a controller, but it does not  ship with Kubernetes, and you need to install it
  • 00:08:43
    additionally in your cluster. Using this mode, you  can describe or get the VPA in your cluster, see
  • 00:08:49
    recommendations, and perhaps apply those requests  and limits during the next maintenance window.
  • 00:08:56
    Keep in mind that you should never use HPA and  VPA simultaneously targeting the same deployment
  • 00:09:03
    or stateful set. They will conflict with each  other and may disrupt your workloads. Also,
  • 00:09:09
    I don’t see a point in getting recommendations  from the VPA for stateless applications that can
  • 00:09:16
    be scaled horizontally. For example, if you run 5  web servers, you get recommendations specific to
  • 00:09:23
    the current load that 5 servers can handle. If you  run 20 of the same web servers and try to get VPA
  • 00:09:31
    recommendations, they will be very different.  So, don’t use VPA for stateless applications,
  • 00:09:38
    even in recommendation mode, and only use it for  stateful apps that cannot be scaled horizontally.
  • 00:09:44
    There are a lot of companies nowadays using  event sourcing. Some companies completely
  • 00:09:50
    rely on some sort of messaging  system to communicate between
  • 00:09:54
    different microservices. It can be Apache  Kafka, RabbitMQ, NATS, and many others.
  • 00:10:01
    On one side, you have a bunch of producers  that write to the messaging system,
  • 00:10:06
    and on the other, you have consumers.  This pattern allows you to decouple
  • 00:10:11
    your services and simplifies  the development of new features.
  • 00:10:15
    There is a KEDA project that can help you to  autoscale based on the number of messages in the
  • 00:10:21
    queue or a topic. For example, KEDA can monitor  a RabbitMQ queue and scale your application if
  • 00:10:29
    the queue keeps getting more messages and your  service is not able to handle the current load.
  • 00:10:35
    One advantage of this approach is  that it can scale your application
  • 00:10:39
    to 0 if there are no messages in the queue.
  • 00:10:42
    There are many different scalers  supported, and you can find them
  • 00:10:46
    on the official website. It includes  cloud services such as DynamoDB as well
  • 00:10:52
    as open-source projects such as Apache  Kafka, etcd, MySQL, and many others.
  • 00:10:59
    To start using KEDA, you need  to deploy the controller using a
  • 00:11:03
    single Helm chart. It does not  have any other dependencies.
  • 00:11:07
    After that, you can configure a custom  resource to automatically scale your
  • 00:11:11
    application. In this case, we will assign  5 messages from the queue for each replica.
  • 00:11:18
    Now, if you deploy it, you can apply the  Kubernetes job to start publishing messages to the
  • 00:11:23
    queue, and in a few seconds, KEDA will scale up  your application from 0 to the maximum you defined
  • 00:11:30
    in the custom resource. After your application  processes all the messages, KEDA will scale your
  • 00:11:36
    application down to 0. It’s optional, but you  can keep a few instances running if you want.
  • 00:11:41
    So far, we’ve talked about how to  autoscale applications or pods running
  • 00:11:46
    in your Kubernetes cluster. The next major  topic is how to scale the Kubernetes nodes.
  • 00:11:52
    Let’s start with a cluster  autoscaler. It was one of the
  • 00:11:55
    first projects that automated  Kubernetes node autoscaling.
  • 00:11:59
    In some clouds, such as AWS, you need  to explicitly configure permissions
  • 00:12:05
    and deploy your own cluster autoscaler  controller. Others, such as Azure and GCP,
  • 00:12:11
    allow you to simply check a box, and the  cloud will deploy and manage it for you.
  • 00:12:17
    In most clouds, Kubernetes node groups  are created as autoscaling groups.
  • 00:12:21
    Some allow you to specify multiple  instance types within a single group,
  • 00:12:26
    while others only permit the  use of a single instance type.
  • 00:12:31
    After you deploy the autoscaler, it will watch  for pending pods in your cluster. If it detects
  • 00:12:37
    a pending pod that cannot fit onto the existing  nodes, the autoscaler will increase the desired
  • 00:12:43
    size of your autoscaling group, and the cloud  will spin up additional nodes for your cluster.
  • 00:12:49
    The problem with that approach is if you  use large instance types and a single tiny
  • 00:12:55
    pod does not fit onto the existing  nodes, the cluster autoscaler will
  • 00:12:59
    create another node with the same CPU and  memory as all other nodes. In many cases,
  • 00:13:06
    this can lead to wasted resources, and  you would pay more than you actually use.
  • 00:13:11
    To fix this issue, AWS developed another  tool called Karpenter. It works with other
  • 00:13:18
    clouds as well, not only AWS. Instead  of simply scaling up your node group
  • 00:13:23
    with the same instance types, Karpenter  will analyze the pending pods and create
  • 00:13:29
    EC2 instances directly with enough CPU  and memory to fit the pending workloads.
  • 00:13:36
    In general, this approach is more efficient than  scaling up node groups. However, there are some
  • 00:13:42
    edge cases when you run logging, monitoring,  and other agents as daemonsets and generally
  • 00:13:49
    want to use large instance types to minimize the  number of agents you have to run on each node.
  • 00:13:55
    And finally, you can use serverless Kubernetes  clusters provided by AWS, such as Fargate,
  • 00:14:01
    and by GCP, which they call Autopilot. When you  create a pod, Kubernetes will spin up a dedicated
  • 00:14:08
    node for it. In this case, you don’t have to  manage your nodes yourself and worry about
  • 00:14:14
    wasted resources. However, these serverless  clusters are much more expensive in terms of
  • 00:14:21
    how much you pay for CPU and memory compared to  EC2. So, you might want to test it first before
  • 00:14:27
    committing to the serverless approach, but it does  reduce the maintenance of your infrastructure.
  • 00:14:33
    That’s all for this video. Thank you for  watching, and I’ll see you in the next one.
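
The transcript's Prometheus section describes creating a ServiceMonitor to scrape the app and then exposing the metric to the HPA through the Prometheus adapter. Below is a minimal sketch of both pieces, assuming the app's Service carries the label app: myapp, exposes a named metrics port, and publishes a per-pod http_requests_per_second metric that the adapter registers at the custom.metrics API; all names and the 100 requests-per-second target are illustrative assumptions based on the example in the video.

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: myapp
      labels:
        release: prometheus          # must match the Prometheus instance's serviceMonitorSelector (assumption)
    spec:
      selector:
        matchLabels:
          app: myapp                 # label on the app's Service (assumption)
      endpoints:
        - port: metrics              # named Service port serving /metrics
          interval: 30s
    ---
    # HPA consuming the per-pod custom metric exposed via the Prometheus adapter
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: myapp-custom
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: myapp
      minReplicas: 1
      maxReplicas: 20
      metrics:
        - type: Pods
          pods:
            metric:
              name: http_requests_per_second   # metric name as registered by the adapter (assumption)
            target:
              type: AverageValue
              averageValue: "100"              # roughly one replica per 100 requests/sec, per the video's example
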
Tags
  • Kubernetes
  • Autoscaling
  • Horizontal Pod Autoscaler
  • Vertical Pod Autoscaler
  • Prometheus
  • KEDA
  • Cluster Autoscaler
  • Karpenter
  • Serverless Kubernetes
  • Metrics Server