Kubernetes Autoscaling: HPA vs. VPA vs. Keda vs. CA vs. Karpenter vs. Fargate

00:14:37
https://www.youtube.com/watch?v=hsJ2qtwoWZw

Summary

TL;DR: This video takes a deep look at autoscaling in Kubernetes, focusing on how the number of pods and nodes can be adjusted dynamically based on application load. It starts with the Horizontal Pod Autoscaler (HPA), which automatically adjusts the pod replica count based on CPU and memory usage, then covers autoscaling on custom metrics with Prometheus and KEDA. For stateful applications, the Vertical Pod Autoscaler (VPA) can adjust a pod's resources to match demand. The video also explains how the Cluster Autoscaler and Karpenter work, and the pros and cons of serverless Kubernetes clusters.

Key Takeaways

  • 📈 Kubernetes is built to run applications at scale
  • 🔄 HPA automatically scales pods based on CPU and memory
  • 📊 VPA adjusts resources for stateful applications
  • 📉 KEDA enables autoscaling driven by message queues
  • ⚙️ The Cluster Autoscaler watches for pending pods
  • 🛠️ Karpenter optimizes node scaling
  • ☁️ Serverless Kubernetes reduces infrastructure maintenance
  • 📊 Prometheus enables scaling on custom metrics
  • 🔍 Verify that the Metrics Server is deployed
  • ⚠️ HPA and VPA cannot be used together on the same workload

Timeline

  • 00:00:00 - 00:05:00

    Kubernetes is designed to run applications at scale, and this video covers autoscaling. Autoscaling can be achieved with built-in controllers or with controllers you install separately. For example, an online store gets heavy traffic during the day and is almost unused at night, so you could adjust it manually or use a cron job to scale the application up during peak hours and back down at night to save compute. Scaling stateless applications is relatively simple, while scaling distributed databases is much harder. The video shows how to autoscale both pods and Kubernetes nodes. The most common approach is the CPU- or memory-based HorizontalPodAutoscaler, which automatically updates the replica count of a Deployment or StatefulSet based on those metrics (a minimal manifest sketch follows this timeline).

  • 00:05:00 - 00:14:37

    Autoscaling on CPU or memory alone is not very accurate, because different applications have different requirements. The better approach is to scale on more meaningful signals such as latency, traffic, errors, and saturation. To use custom metrics with the Horizontal Pod Autoscaler, you deploy the Prometheus Operator and a Prometheus instance in the cluster, then use the Prometheus adapter to register the metrics with the custom.metrics API. For stateful applications, Kubernetes offers the Vertical Pod Autoscaler (VPA), which can add CPU or memory to existing pods. The VPA has several modes, but it should not be used together with the HPA on the same workload, to avoid conflicts.
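
For reference, here is a minimal sketch of the HPA described above, using the autoscaling/v2 API. The Deployment name myapp and the 80% CPU / 70% memory thresholds come from the video; the replica bounds are illustrative assumptions, and the target pods must define resource requests for the utilization percentages to be computed.

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: myapp
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: myapp                    # Deployment targeted in the video's demo
      minReplicas: 1                   # illustrative bounds (assumption)
      maxReplicas: 10
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 80   # scale out when average CPU exceeds 80% of requests
        - type: Resource
          resource:
            name: memory
            target:
              type: Utilization
              averageUtilization: 70   # scale out when average memory exceeds 70% of requests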

Video Q&A

  • What is the Horizontal Pod Autoscaler?

    The Horizontal Pod Autoscaler (HPA) is a Kubernetes API resource and controller that automatically adjusts the number of pod replicas based on CPU or memory usage.

  • How do you autoscale with KEDA?

    KEDA can scale an application automatically based on the number of messages in a queue, and it supports many cloud services and open-source projects as scalers (see the ScaledObject sketch after this Q&A).

  • What does the Vertical Pod Autoscaler do?

    The Vertical Pod Autoscaler (VPA) automatically adjusts a pod's CPU and memory requests to match the application's resource needs (see the VPA sketch after this Q&A).

  • How do you verify that the Metrics Server is deployed?

    Run kubectl top pods; if it returns usage figures instead of a 'Metrics API not available' error, the Metrics Server is deployed.

  • How are Kubernetes nodes autoscaled?

    With the Cluster Autoscaler or Karpenter. The former watches for pending pods and increases the size of the node group, while the latter analyzes the pending pods and creates right-sized instances to fit them (a Karpenter NodePool sketch follows this Q&A).
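
As referenced in the KEDA answer above, KEDA is configured through a ScaledObject custom resource. Below is a minimal sketch for the RabbitMQ example from the video, assuming a consumer Deployment named myapp-consumer, a queue named orders, and an AMQP connection string exposed to the workload via a RABBITMQ_HOST environment variable (all hypothetical names).

    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      name: myapp-consumer
    spec:
      scaleTargetRef:
        name: myapp-consumer          # hypothetical Deployment that consumes the queue
      minReplicaCount: 0              # allows scale-to-zero when the queue is empty
      maxReplicaCount: 10             # illustrative upper bound
      triggers:
        - type: rabbitmq
          metadata:
            queueName: orders             # hypothetical queue name
            mode: QueueLength
            value: "5"                    # target of 5 messages per replica, as in the video
            hostFromEnv: RABBITMQ_HOST    # connection string read from the workload's env (assumption)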
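
Likewise, the VPA answer above maps to a small custom resource. A sketch in recommendation-only mode (updateMode: "Off"), targeting a hypothetical standalone Postgres StatefulSet:

    apiVersion: autoscaling.k8s.io/v1
    kind: VerticalPodAutoscaler
    metadata:
      name: postgres
    spec:
      targetRef:
        apiVersion: apps/v1
        kind: StatefulSet
        name: postgres          # hypothetical stateful workload that cannot scale horizontally
      updatePolicy:
        updateMode: "Off"       # recommendation-only: record suggested requests/limits, never evict pods

Describing the VPA object (for example with kubectl describe vpa postgres) then shows the current recommendations, which can be applied by hand during a maintenance window, as the video suggests.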
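
Finally, the node-autoscaling answer above mentions Karpenter, which provisions right-sized nodes from a NodePool definition instead of scaling a fixed node group. A sketch assuming Karpenter on AWS with an EC2NodeClass named default already defined (names and limits are illustrative):

    apiVersion: karpenter.sh/v1
    kind: NodePool
    metadata:
      name: default
    spec:
      template:
        spec:
          nodeClassRef:                # cloud-specific settings live in a separate EC2NodeClass
            group: karpenter.k8s.aws
            kind: EC2NodeClass
            name: default
          requirements:                # Karpenter picks any instance type satisfying these constraints
            - key: kubernetes.io/arch
              operator: In
              values: ["amd64"]
            - key: karpenter.sh/capacity-type
              operator: In
              values: ["on-demand"]
      limits:
        cpu: "100"                     # cap on the total CPU the pool may provision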

Transcript (en)

  • 00:00:00
    Kubernetes was created to run applications  at scale. So, in this video, let's talk
  • 00:00:05
    about autoscaling. Now, there are many different  controllers, some of them built-in and some that
  • 00:00:11
    you need to additionally install in your cluster.  For example, your online store may get a lot of
  • 00:00:17
    traffic during the day and almost no one using  it at night. So, in order to save on compute,
  • 00:00:23
    you can either manually adjust or have some  kind of cron job to scale up your application
  • 00:00:29
    during the peak hours and scale down at night to  avoid wasting resources. It’s much easier to do
  • 00:00:36
    if you have stateless applications than scaling  up and down some kind of distributed databases.
  • 00:00:43
    Another example would be a big data ELT pipeline  where you periodically, let's say every hour,
  • 00:00:50
    run a batch job that takes maybe 10 minutes. And,  you don’t want to pay for the remaining 50 minutes
  • 00:00:56
    of compute when you're not running anything.  For example, to scale up your application to
  • 00:01:01
    handle more traffic, you increase the number of  pods. Now, those pods run on Kubernetes nodes.
  • 00:01:08
    A node can be a VM or a server in a data center.  If you don’t have enough nodes in your Kubernetes
  • 00:01:14
    cluster, you need to scale the cluster itself  as well. So, in this video, we’ll talk about
  • 00:01:19
    how to autoscale pods on one hand and Kubernetes  nodes on another. The most common approach that
  • 00:01:26
    comes to mind when you need to autoscale your  application to handle an increasing load is to
  • 00:01:32
    use the HorizontalPodAutoscaler based on CPU or  memory. It automatically updates the replica count
  • 00:01:39
    on your deployment or statefulset object. The HorizontalPodAutoscaler is implemented
  • 00:01:44
    as a Kubernetes API resource and as a controller.  The controller runs within the Kubernetes control
  • 00:01:50
    plane, so there's no need to install anything  extra in this case. It periodically adjusts the
  • 00:01:57
    desired scale of its target, such as a Deployment  object, based on metrics like CPU, memory,
  • 00:02:03
    or custom metrics, which we will discuss later.  Now, the controller ships with Kubernetes, but
  • 00:02:09
    you still need to provide metrics for it to work.  Again, the most common approach would be to deploy
  • 00:02:16
    the metrics server in your cluster. Some managed  Kubernetes services, such as GKE, come with a
  • 00:02:22
    metrics server by default, while for others, such as EKS, you need to install it as an additional step.
  • 00:02:29
    To follow along, you can use Minikube, and you can  find the source code in my GitHub repository. The
  • 00:02:35
    easiest way to verify that the metrics server is  deployed is to run kubectl top pods. If you get
  • 00:02:42
    the error 'Metrics API not available', you will  need to install it, and I’ll show you how. You
  • 00:02:48
    can use the Helm CLI to deploy it manually, or  my preferred approach would be to use Terraform
  • 00:02:54
    with the Helm provider. Now, when you deploy the  metrics server in your cluster, it will scrape the
  • 00:03:01
    kubelet of each node and provide those aggregated  metrics to other components in your Kubernetes
  • 00:03:06
    cluster via the metrics API. To verify, you can  use the kubectl top pods command to get the usage.
  • 00:03:14
    Another way is to run kubectl proxy, and then in your browser, go to apis/metrics/namespaces to
  • 00:03:21
    get the usage. And, you can also obtain the same  metrics using the kubectl get --raw command. Now,
  • 00:03:28
    you can start using the Horizontal Pod Autoscaler  resource. Let's say we want to target this
  • 00:03:34
    deployment object with 'myapp'. Keep in mind  that for the autoscaler to work, you must provide
  • 00:03:40
    resource requests. Limits are optional but highly  recommended. The HPA uses requests, not limits,
  • 00:03:48
    to calculate usage in percentage. For example,  in this case, we want to automatically scale pods
  • 00:03:55
    if the average CPU utilization across all pods  exceeds 80 percent. You can also include memory
  • 00:04:02
    usage. Here, we want to scale if the average  exceeds 70%. Let me quickly run the demo with
  • 00:04:09
    all these objects deployed. When you apply, it may  take a few seconds for the HPA to show the current
  • 00:04:16
    usage. If it takes longer, you can describe  the HPA object to find any errors. Most likely,
  • 00:04:23
    this happens when you forget to define requests  for your pods. If we simulate high CPU usage,
  • 00:04:30
    it will spin up enough pods to reduce the  average CPU usage below 80 percent. Now,
  • 00:04:36
    when the load decreases, it may take a minute or  so for the HPA to scale down the pods. The last
  • 00:04:43
    thing I want to mention is that you should not set  the replica count on the deployment or statefulset
  • 00:04:50
    object if you use a GitOps approach. In that case,  the HPA and your tool, such as ArgoCD or FluxCD,
  • 00:04:58
    will constantly fight to set the desired  replica count based on their spec. Now,
  • 00:05:03
    autoscaling based on CPU or memory is not very  accurate because different applications may have
  • 00:05:10
    different requirements. One application may be  fine running at 90% CPU usage, while another may
  • 00:05:17
    only handle 40% CPU usage. The best way to scale  your app is to use more meaningful metrics from
  • 00:05:25
    the client's perspective. A good starting point  is the four golden signals: latency, traffic,
  • 00:05:31
    errors, and saturation. For example, if you ran  some tests and determined that a single instance
  • 00:05:38
    of your application can only handle 100 requests  per second, unfortunately, we can’t use a metrics
  • 00:05:46
    server for that. We need something more powerful,  such as Prometheus. In order to use custom metrics
  • 00:05:53
    for the horizontal pod autoscaler, we need to  deploy a few things in our cluster. First of all,
  • 00:05:59
    we need a Prometheus operator that will manage  the lifecycle of our Prometheus instances as well
  • 00:06:05
    as convert service and pod monitors into the  native Prometheus configuration. Then, we’ll
  • 00:06:11
    deploy the Prometheus instance itself using the  custom resource provided by the operator. Let’s
  • 00:06:17
    say we also have the app running in Kubernetes  that we want to monitor; we’ll create a service or
  • 00:06:24
    pod monitor to scrape that app and store metrics  in Prometheus itself. The next step is to provide
  • 00:06:31
    those metrics to the horizontal pod autoscaler.  For that, we need to deploy a Prometheus adapter
  • 00:06:37
    that will convert Prometheus metrics and register  them at the custom.metrics API. From that point,
  • 00:06:44
    we can use custom Prometheus metrics exposed by  our application in the autoscaling policy like
  • 00:06:50
    this. Now, in the previous part, we discussed  that to autoscale based on CPU and memory,
  • 00:06:57
    we need to deploy a metrics server. Since  we already have Prometheus, we can get rid
  • 00:07:02
    of the metrics server altogether and use cAdvisor  to get CPU and memory usage from the pods,
  • 00:07:10
    and register a Prometheus adapter with a metrics  API. If you want to fully replace the metrics
  • 00:07:16
    server, you would also want to deploy a node exporter on each node to get node metrics. So,
  • 00:07:24
    if you configured everything correctly, you should  be able to scale your application based on custom
  • 00:07:29
    metrics, such as the number of requests per  second or any other metrics. Now, sometimes you
  • 00:07:36
    have stateful applications that are very difficult  or impossible to scale horizontally. For example,
  • 00:07:43
    a standalone database such as Postgres or MySQL.  The only option you have to handle more load is
  • 00:07:51
    to scale those applications vertically. This  simply means adding more CPU or memory to the
  • 00:07:57
    existing pods. Kubernetes has a tool called  Vertical Pod Autoscaler that can help you with
  • 00:08:04
    that. It has a few modes. There is a 'Recreate'  mode, which should be used rarely because VPA
  • 00:08:10
    will try to evict and create a new pod with  new recommended resources, which can be very
  • 00:08:17
    dangerous for standalone databases. Another  mode is 'Initial', which only sets requests
  • 00:08:23
    and limits when you deploy the application. And  finally, which I use most often, is to simply
  • 00:08:30
    get recommendations and not take any actions.  The Vertical Pod Autoscaler also consists of a
  • 00:08:36
    custom resource and a controller, but it does not  ship with Kubernetes, and you need to install it
  • 00:08:43
    additionally in your cluster. Using this mode, you  can describe or get the VPA in your cluster, see
  • 00:08:49
    recommendations, and perhaps apply those requests  and limits during the next maintenance window.
  • 00:08:56
    Keep in mind that you should never use HPA and  VPA simultaneously targeting the same deployment
  • 00:09:03
    or stateful set. They will conflict with each  other and may disrupt your workloads. Also,
  • 00:09:09
    I don’t see a point in getting recommendations  from the VPA for stateless applications that can
  • 00:09:16
    be scaled horizontally. For example, if you run 5  web servers, you get recommendations specific to
  • 00:09:23
    the current load that 5 servers can handle. If you  run 20 of the same web servers and try to get VPA
  • 00:09:31
    recommendations, they will be very different.  So, don’t use VPA for stateless applications,
  • 00:09:38
    even in recommendation mode, and only use it for  stateful apps that cannot be scaled horizontally.
  • 00:09:44
    There are a lot of companies nowadays using  event sourcing. Some companies completely
  • 00:09:50
    rely on some sort of messaging  system to communicate between
  • 00:09:54
    different microservices. It can be Apache  Kafka, RabbitMQ, NATS, and many others.
  • 00:10:01
    On one side, you have a bunch of producers  that write to the messaging system,
  • 00:10:06
    and on the other, you have consumers.  This pattern allows you to decouple
  • 00:10:11
    your services and simplifies  the development of new features.
  • 00:10:15
    There is a KEDA project that can help you to  autoscale based on the number of messages in the
  • 00:10:21
    queue or a topic. For example, KEDA can monitor  a RabbitMQ queue and scale your application if
  • 00:10:29
    the queue keeps getting more messages and your  service is not able to handle the current load.
  • 00:10:35
    One advantage of this approach is  that it can scale your application
  • 00:10:39
    to 0 if there are no messages in the queue.
  • 00:10:42
    There are many different scalers  supported, and you can find them
  • 00:10:46
    on the official website. It includes  cloud services such as DynamoDB as well
  • 00:10:52
    as open-source projects such as Apache  Kafka, etcd, MySQL, and many others.
  • 00:10:59
    To start using KEDA, you need  to deploy the controller using a
  • 00:11:03
    single Helm chart. It does not  have any other dependencies.
  • 00:11:07
    After that, you can configure a custom  resource to automatically scale your
  • 00:11:11
    application. In this case, we will assign  5 messages from the queue for each replica.
  • 00:11:18
    Now, if you deploy it, you can apply the  Kubernetes job to start publishing messages to the
  • 00:11:23
    queue, and in a few seconds, KEDA will scale up  your application from 0 to the maximum you defined
  • 00:11:30
    in the custom resource. After your application  processes all the messages, KEDA will scale your
  • 00:11:36
    application down to 0. It’s optional, but you  can keep a few instances running if you want.
  • 00:11:41
    So far, we’ve talked about how to  autoscale applications or pods running
  • 00:11:46
    in your Kubernetes cluster. The next major  topic is how to scale the Kubernetes nodes.
  • 00:11:52
    Let’s start with a cluster  autoscaler. It was one of the
  • 00:11:55
    first projects that automated  Kubernetes node autoscaling.
  • 00:11:59
    In some clouds, such as AWS, you need  to explicitly configure permissions
  • 00:12:05
    and deploy your own cluster autoscaler  controller. Others, such as Azure and GCP,
  • 00:12:11
    allow you to simply check a box, and the  cloud will deploy and manage it for you.
  • 00:12:17
    In most clouds, Kubernetes node groups  are created as autoscaling groups.
  • 00:12:21
    Some allow you to specify multiple  instance types within a single group,
  • 00:12:26
    while others only permit the  use of a single instance type.
  • 00:12:31
    After you deploy the autoscaler, it will watch  for pending pods in your cluster. If it detects
  • 00:12:37
    a pending pod that cannot fit onto the existing  nodes, the autoscaler will increase the desired
  • 00:12:43
    size of your autoscaling group, and the cloud  will spin up additional nodes for your cluster.
  • 00:12:49
    The problem with that approach is if you  use large instance types and a single tiny
  • 00:12:55
    pod does not fit onto the existing  nodes, the cluster autoscaler will
  • 00:12:59
    create another node with the same CPU and  memory as all other nodes. In many cases,
  • 00:13:06
    this can lead to wasted resources, and  you would pay more than you actually use.
  • 00:13:11
    To fix this issue, AWS developed another  tool called Karpenter. It works with other
  • 00:13:18
    clouds as well, not only AWS. Instead  of simply scaling up your node group
  • 00:13:23
    with the same instance types, Karpenter  will analyze the pending pods and create
  • 00:13:29
    EC2 instances directly with enough CPU  and memory to fit the pending workloads.
  • 00:13:36
    In general, this approach is more efficient than  scaling up node groups. However, there are some
  • 00:13:42
    edge cases when you run logging, monitoring,  and other agents as daemonsets and generally
  • 00:13:49
    want to use large instance types to minimize the  number of agents you have to run on each node.
  • 00:13:55
    And finally, you can use serverless Kubernetes  clusters provided by AWS, such as Fargate,
  • 00:14:01
    and by GCP, which they call Autopilot. When you  create a pod, Kubernetes will spin up a dedicated
  • 00:14:08
    node for it. In this case, you don’t have to  manage your nodes yourself and worry about
  • 00:14:14
    wasted resources. However, these serverless  clusters are much more expensive in terms of
  • 00:14:21
    how much you pay for CPU and memory compared to  EC2. So, you might want to test it first before
  • 00:14:27
    committing to the serverless approach, but it does  reduce the maintenance of your infrastructure.
  • 00:14:33
    That’s all for this video. Thank you for  watching, and I’ll see you in the next one.
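
The transcript's Prometheus section describes creating a ServiceMonitor to scrape the app and then exposing the metric to the HPA through the Prometheus adapter. Below is a minimal sketch of both pieces, assuming the app's Service carries the label app: myapp, exposes a named metrics port, and publishes a per-pod http_requests_per_second metric that the adapter registers at the custom.metrics API; all names and the 100 requests-per-second target are illustrative assumptions based on the example in the video.

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: myapp
      labels:
        release: prometheus          # must match the Prometheus instance's serviceMonitorSelector (assumption)
    spec:
      selector:
        matchLabels:
          app: myapp                 # label on the app's Service (assumption)
      endpoints:
        - port: metrics              # named Service port serving /metrics
          interval: 30s
    ---
    # HPA consuming the per-pod custom metric exposed via the Prometheus adapter
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: myapp-custom
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: myapp
      minReplicas: 1
      maxReplicas: 20
      metrics:
        - type: Pods
          pods:
            metric:
              name: http_requests_per_second   # metric name as registered by the adapter (assumption)
            target:
              type: AverageValue
              averageValue: "100"              # roughly one replica per 100 requests/sec, per the video's example
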
Tags
  • Kubernetes
  • Autoscaling
  • Horizontal Pod Autoscaler
  • Vertical Pod Autoscaler
  • Prometheus
  • KEDA
  • Cluster Autoscaler
  • Karpenter
  • Serverless Kubernetes
  • Metrics Server