There's more than one way to scale Redis/Valkey to 1M op/s...

00:28:00
https://www.youtube.com/watch?v=uptpnVdwFM4

Summary

TLDR: The video discusses how to push the performance of Redis and its fork, Valkey, beyond 1 million requests per second. The speaker shares his experience with different machines and how to get the best results. He explains the difference between vertical and horizontal scaling, and how pipelining is an effective way to improve performance. The video also covers Dragonfly DB, which offers a high-performance alternative to Redis. The speaker demonstrates how to set up a cluster and the benefits it brings, including improved read and write performance. In the end, he reaches his goal of 1 million requests per second, both on a local network and using VPS instances.

Takeaways

  • 🚀 Redis can handle more than 100,000 requests per second.
  • 💻 Pipelining can double performance without hardware changes.
  • 📈 Horizontal scaling increases performance by using multiple instances.
  • 🔄 Read replication improves read performance by adding replicas.
  • 🔑 Hashtags help place related keys in the same hash slot.
  • ⚙️ Dragonfly DB offers a high-performance alternative to Redis.
  • 🌐 Clusters distribute data across multiple nodes for better performance.
  • 📊 1 million requests per second is achievable with the right configuration.
  • 💰 VPS instances can be expensive, costing around $1,100 per month.
  • 🔧 Cluster-aware clients are required for multi-key operations.

Timeline

  • 00:00:00 - 00:05:00

    The speaker was a fan of Redis but switched to Valkey because of issues with Redis. He achieved a throughput of 100,000 requests per second with Valkey on a bare-metal machine and wants to push it to 1 million requests per second. He discusses several ways to improve Redis performance, starting with vertical scaling.

  • 00:05:00 - 00:10:00

    The first approach was to run Redis on bigger machines, known as vertical scaling. He used three machines with different specifications and a standard benchmarking tool to measure throughput. He found that performance does not necessarily improve with more cores, since Redis uses only a single thread to handle requests.
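
    The standard benchmarking tool used in the video is memtier_benchmark. A minimal invocation might look like the sketch below; the hostname is a placeholder and the thread/client counts are illustrative rather than the exact values used in the video.

      memtier_benchmark -s <valkey-host> -p 6379 \
        --threads=4 --clients=50 --test-time=60 --hide-histogram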

  • 00:10:00 - 00:15:00

    The speaker discussed pipelining as a way to improve performance by sending multiple requests at once. He saw throughput rise to 2.5 million requests per second, but noticed a bottleneck on his network. He also covered the limitations of pipelining, including the need to have a batch of requests in order to use it.
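
    In memtier_benchmark terms, the pipelining experiment corresponds to adding the --pipeline flag, which the video sets to 2, 10, and finally 100. A hedged sketch with a placeholder host:

      memtier_benchmark -s <valkey-host> -p 6379 --pipeline=100 --test-time=60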

  • 00:15:00 - 00:20:00

    The second approach was horizontal scaling, which involves deploying multiple Redis instances across different machines. He discussed the concept of read replication, where a primary instance handles writes and several replicas serve reads, along with the advantages and drawbacks of this approach, including the memory cost and the risk of data-integrity issues.
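
    A rough sketch of the setup described here, with placeholder hostnames: replication is enabled with the REPLICAOF command, and the benchmark is then split so that the primary receives only writes and the replicas only reads (memtier_benchmark's --ratio flag takes a SET:GET ratio).

      valkey-cli -h <replica-1> REPLICAOF <primary-host> 6379
      valkey-cli -h <replica-2> REPLICAOF <primary-host> 6379

      memtier_benchmark -s <primary-host> --ratio=1:0 --test-time=60   # writes only
      memtier_benchmark -s <replica-1> --ratio=0:1 --test-time=60      # reads only
      memtier_benchmark -s <replica-2> --ratio=0:1 --test-time=60      # reads only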

  • 00:20:00 - 00:28:00

    The speaker also discussed Redis Cluster, which provides a way to distribute data across multiple nodes. He explained the process of setting up a cluster and the performance improvements it brings, including the ability to read and write concurrently, as well as the complexity of managing a cluster and the need for a cluster-aware client.
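
    The cluster setup walked through in the video boils down to three configuration directives plus a create command. A hedged sketch with placeholder hosts, assuming valkey-cli (which mirrors redis-cli for these subcommands):

      # in each node's valkey.conf
      cluster-enabled yes
      cluster-config-file nodes.conf
      cluster-node-timeout 5000

      # form the cluster from any one node, then verify it and benchmark in cluster mode
      valkey-cli --cluster create <host1>:6379 <host2>:6379 <host3>:6379
      valkey-cli -h <host1> cluster info
      memtier_benchmark -s <host1> -p 6379 --cluster-mode --test-time=60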

Q&A

  • What is the main goal of the video?

    The video focuses on how to push the performance of Redis and Valkey beyond 1 million requests per second.

  • What is pipelining?

    Pipelining is a technique for improving performance by sending multiple commands at once without waiting for the response to each individual command.
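
    The video mentions doing this in Go with a pipeline method; below is a minimal sketch assuming the go-redis client (the video doesn't name a specific library), fetching 100 keys in a single round trip. The address and key names are placeholders.

      package main

      import (
          "context"
          "fmt"

          "github.com/redis/go-redis/v9"
      )

      func main() {
          ctx := context.Background()
          rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"}) // placeholder address

          // Queue a batch of commands without waiting for individual replies.
          pipe := rdb.Pipeline()
          gets := make([]*redis.StringCmd, 0, 100)
          for i := 0; i < 100; i++ {
              gets = append(gets, pipe.Get(ctx, fmt.Sprintf("key:%d", i)))
          }

          // Exec sends the whole batch in one round trip; replies come back
          // in the same order the commands were queued.
          if _, err := pipe.Exec(ctx); err != nil && err != redis.Nil {
              panic(err)
          }
          for _, cmd := range gets {
              _ = cmd.Val() // empty string for keys that don't exist
          }
      }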

  • What is the difference between vertical and horizontal scaling?

    Vertical scaling means upgrading the resources of a single machine, while horizontal scaling means deploying multiple machines.

  • What is the role of Dragonfly DB in the video?

    Dragonfly DB is presented as a high-performance alternative to Redis that makes better use of the available hardware resources.

  • What is a cluster in the context of Redis?

    A cluster is a set of Redis instances that automatically shards data across multiple nodes.

  • What is the benefit of read replication?

    Read replication increases read performance by adding replicas that handle read-only operations.

  • What is a hashtag in Redis?

    A hashtag is a designated part of a key that ensures related keys are placed in the same hash slot.
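
    For intuition, here is a small Go sketch of the slot calculation described in the video: CRC16 of the key, modulo 16,384, with the hashtag rule that only the part between the first "{" and the following "}" is hashed. The key names are made up for illustration.

      package main

      import (
          "fmt"
          "strings"
      )

      // crc16 is the CRC16-CCITT (XMODEM) variant used by Redis/Valkey Cluster.
      func crc16(data []byte) uint16 {
          var crc uint16
          for _, b := range data {
              crc ^= uint16(b) << 8
              for i := 0; i < 8; i++ {
                  if crc&0x8000 != 0 {
                      crc = crc<<1 ^ 0x1021
                  } else {
                      crc <<= 1
                  }
              }
          }
          return crc
      }

      // slotFor returns the hash slot (0..16383) a key maps to. If the key
      // contains a non-empty {hashtag}, only the tag is hashed.
      func slotFor(key string) uint16 {
          if start := strings.Index(key, "{"); start >= 0 {
              if end := strings.Index(key[start+1:], "}"); end > 0 {
                  key = key[start+1 : start+1+end]
              }
          }
          return crc16([]byte(key)) % 16384
      }

      func main() {
          // Both keys share the tag "123", so they land in the same slot.
          fmt.Println(slotFor("user:{123}:profile"), slotFor("session:{123}"))
      }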

  • What is the challenge of using a cluster?

    The challenges include needing a cluster-aware client and handling multi-key operations.
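
    As a sketch of what a cluster-aware client looks like in Go (again assuming go-redis; the addresses are placeholders), note how a shared hashtag keeps both keys in one hash slot so the multi-key transaction is allowed:

      package main

      import (
          "context"

          "github.com/redis/go-redis/v9"
      )

      func main() {
          ctx := context.Background()

          // A cluster-aware client: given a few seed nodes, it discovers the
          // rest of the topology and routes each key to the node that owns
          // its hash slot.
          rdb := redis.NewClusterClient(&redis.ClusterOptions{
              Addrs: []string{"10.0.0.1:6379", "10.0.0.2:6379", "10.0.0.3:6379"},
          })

          // Both keys share the {1234} hashtag, so they map to the same slot
          // and can be written together in a MULTI/EXEC transaction.
          pipe := rdb.TxPipeline()
          pipe.Set(ctx, "user:{1234}:name", "alice", 0)
          pipe.Set(ctx, "user:{1234}:visits", 1, 0)
          if _, err := pipe.Exec(ctx); err != nil {
              panic(err)
          }
      }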

  • How many requests per second did the speaker reach in the end?

    The speaker reached 1 million requests per second with a cluster of 15 Valkey instances.

  • What did the VPS instances cost?

    The VPS instances cost around $1,100 per month.

Transcript
  • 00:00:00
    It's no secret that I'm a fan of Redis,
  • 00:00:03
    or at least I was a fan of Redis until
  • 00:00:06
    they went and forked things up. These
  • 00:00:09
    days, I'm now pinning pictures of
  • 00:00:10
    Valkey on my wall instead. In any
  • 00:00:13
    case, I've been using Redis for a good
  • 00:00:16
    number of years, and through that time,
  • 00:00:18
    I've always been fascinated with how
  • 00:00:20
    incredibly fast both Redis and its
  • 00:00:22
    forks, such as Valkey, can be. In
  • 00:00:25
    fact, by just running a single instance
  • 00:00:27
    of Valkey on a bare metal machine, you
  • 00:00:30
    can easily exceed 100,000 requests per
  • 00:00:33
    second, which is a lot of throughput.
  • 00:00:36
    Whilst 100,000 requests per second is
  • 00:00:39
    pretty impressive, I wanted to see how
  • 00:00:41
    difficult it would be to push this even
  • 00:00:43
    further, say 10 times further to 1
  • 00:00:46
    million requests per second. So, how
  • 00:00:49
    does one go about increasing the number
  • 00:00:51
    of requests per second or RPS that
  • 00:00:54
    Redis can handle to greater than 1
  • 00:00:57
    million? Well, as it turns out, there
  • 00:00:59
    are a number of different ways to do so,
  • 00:01:02
    each with their own pros and cons
  • 00:01:04
    depending on the situation that you find
  • 00:01:06
    yourself in. The first approach I
  • 00:01:09
    decided to take when it came to scaling
  • 00:01:10
    Redis, or Valkey in this case, was to just
  • 00:01:13
    run it on a bigger machine. This is
  • 00:01:16
    known as vertical scaling. And whilst it
  • 00:01:19
    can be effective for some software, when
  • 00:01:22
    it comes to Redis, it's a little bit
  • 00:01:23
    more complicated. To show what I mean, I
  • 00:01:26
    decided to deploy a Valkey instance onto
  • 00:01:28
    three separate machines. The first being
  • 00:01:31
    a Beelink Mini S12, which has a low-powered
  • 00:01:34
    4 core CPU, the Intel
  • 00:01:37
    N95, which is the least powerful of the
  • 00:01:40
    three. The second machine that I
  • 00:01:42
    installed Valkyrie on, I've defined as
  • 00:01:44
    the mid machine, which is yet another
  • 00:01:46
    Beelink mini PC. This one, the SER6, which
  • 00:01:50
    runs an AMD 7735 CPU with eight cores
  • 00:01:55
    and 16 threads. So, it's a little bit
  • 00:01:58
    more powerful. The final machine in this
  • 00:02:00
    testing is the big boy. This is the AMD
  • 00:02:03
    Thread Ripper,
  • 00:02:04
    3970X with a massive 32 cores and 64
  • 00:02:08
    threads, as well as boasting 128 gigs of
  • 00:02:11
    RAM. To test how much throughput each of
  • 00:02:13
    these machines can handle, I'm going to
  • 00:02:15
    go ahead and use the following memtier_benchmark
  • 00:02:17
    command, which is pretty much
  • 00:02:19
    the standard tool when it comes to
  • 00:02:21
    benchmarking Redis and its forks.
  • 00:02:24
    Additionally, I'm running this tool on
  • 00:02:25
    another machine and sending the commands
  • 00:02:27
    over the network in order to simulate
  • 00:02:29
    real-world usage. And when I run this
  • 00:02:32
    command, you can see that the mini PC is
  • 00:02:35
    handling a huge amount of requests per
  • 00:02:37
    second, around 200,000, which honestly
  • 00:02:41
    is pretty impressive. This high number
  • 00:02:44
    is already happening because I'm running
  • 00:02:45
    on my local network and running the
  • 00:02:47
    Valkey instance on bare metal, which both
  • 00:02:50
    Redis and Valkey handle really well. If
  • 00:02:53
    instead I was making use of a couple of
  • 00:02:55
    VPS instances, one to host the instance
  • 00:02:57
    and one to actually perform the test in
  • 00:03:00
    the same data center, you'll see that
  • 00:03:02
    the request count drops significantly.
  • 00:03:05
    And if I try to go ahead and benchmark
  • 00:03:06
    this from my machine to one of these VPS
  • 00:03:08
    instances over the internet, then it
  • 00:03:11
    drops even further. So that's one thing
  • 00:03:13
    to consider if you want to scale Redis.
  • 00:03:16
    Bare metal is best, but also
  • 00:03:18
    colocating or running in the same data
  • 00:03:21
    center is really important. If you're
  • 00:03:24
    sending commands halfway across the
  • 00:03:25
    world, then you're going to have a bad
  • 00:03:27
    time. In any case, going back to my
  • 00:03:29
    local area network setup, the four core
  • 00:03:31
    machine did pretty well. So, let's go
  • 00:03:34
    ahead and see how it works when it comes
  • 00:03:35
    to the mid-level machine. As you can see,
  • 00:03:38
    we're actually getting around the same
  • 00:03:40
    number of operations per second, which
  • 00:03:43
    at first may feel a little surprising.
  • 00:03:46
    This, however, is actually expected when
  • 00:03:48
    it comes to Redis. This is because
  • 00:03:51
    Redis and some of its forks such as
  • 00:03:53
    Valkey only make use of a single thread
  • 00:03:55
    when it comes to handling commands.
  • 00:03:58
    Meaning that more cores or more CPUs
  • 00:04:00
    isn't going to increase performance. In
  • 00:04:03
    fact, in some cases, it can actually
  • 00:04:05
    hinder it. For example, if I go ahead
  • 00:04:07
    and run the benchmark against my
  • 00:04:09
    Valkey instance on the 32 core
  • 00:04:11
    machine, you can see now we're actually
  • 00:04:13
    producing fewer operations, down to around
  • 00:04:16
    1/4 what we were before. This is because
  • 00:04:19
    the 32 core Thread Ripper machine
  • 00:04:21
    actually has worse single core
  • 00:04:23
    performance than the other two. And
  • 00:04:25
    because Redis is so dependent on single
  • 00:04:27
    core performance, then it's having a
  • 00:04:29
    negative impact. Now there are both
  • 00:04:32
    forks of Redis and Redis-compatible
  • 00:04:34
    solutions that are better able to make
  • 00:04:36
    use of multiple cores on a machine with
  • 00:04:38
    one such solution being Dragonfly DB who
  • 00:04:42
    are also the sponsors of today's video.
  • 00:04:44
    We'll talk a little bit more about
  • 00:04:45
    Dragonfly DB later on and how they
  • 00:04:48
    provide increased performance on
  • 00:04:50
    multi-core machines. However, for the
  • 00:04:52
    meantime, we're going to go ahead and
  • 00:04:53
    focus on Redis or Valkey and see how we
  • 00:04:56
    can get it to perform millions of
  • 00:04:58
    requests per second using a single
  • 00:05:00
    threaded instance. One approach to doing
  • 00:05:03
    so is to make use of something called
  • 00:05:06
    pipelining. Pipelining is a technique
  • 00:05:08
    for improving performance by issuing
  • 00:05:10
    multiple commands at once without
  • 00:05:12
    waiting for the response to each
  • 00:05:14
    individual command to return. It's very
  • 00:05:16
    similar to the concept of batching. To
  • 00:05:19
    show the performance improvement of
  • 00:05:20
    using pipelining, if I go ahead and use
  • 00:05:22
    the benchmark test tool again, this time
  • 00:05:24
    passing in the --pipeline flag, setting
  • 00:05:27
    it to be two. So, we're sending a
  • 00:05:29
    pipeline of two commands at once. As you
  • 00:05:32
    can see, we're now effectively doubling
  • 00:05:33
    our throughput without making any
  • 00:05:35
    hardware changes to our instance. This
  • 00:05:38
    is pretty great, but how far can we
  • 00:05:40
    actually push it? Well, if I go ahead
  • 00:05:42
    and set a pipeline to be an order of
  • 00:05:44
    magnitude higher from 2 to 10, you can
  • 00:05:47
    see now we're pushing over 1 million
  • 00:05:49
    requests a second, give or take. Pretty
  • 00:05:52
    cool. However, we don't just have to
  • 00:05:53
    stop here and we can actually push
  • 00:05:55
    pipelining a little further. Let's say
  • 00:05:57
    we go ahead and set a pipeline of 100.
  • 00:06:00
    This time we're now maxing out at about
  • 00:06:03
    2.5 million requests a second. Whilst
  • 00:06:06
    this is incredibly fast, you'll notice
  • 00:06:08
    it's slightly less than we would expect.
  • 00:06:10
    Given that we were hitting 1 million
  • 00:06:12
    operations when it came to a pipeline
  • 00:06:13
    size of 10, we should expect to see 10
  • 00:06:16
    million operations when it came to a
  • 00:06:17
    pipeline size of 100. Unfortunately,
  • 00:06:19
    however, we're actually hitting a
  • 00:06:21
    bottleneck, which is caused by the
  • 00:06:23
    available bandwidth on my local network.
  • 00:06:26
    Either way, reaching 2.5 million
  • 00:06:28
    operations per second is impressive. And
  • 00:06:30
    pipelining is an effective way to
  • 00:06:32
    achieve this, especially as it's
  • 00:06:34
    supported by most Redis clients
  • 00:06:35
    already, and it's pretty easy to
  • 00:06:38
    perform. In Go, you can achieve this by
  • 00:06:40
    using the pipeline method as follows.
  • 00:06:42
    Then sending commands to this pipeline
  • 00:06:44
    followed by executing it, and all of the
  • 00:06:46
    responses will be in the same order you
  • 00:06:48
    pass them in. Despite being simple to
  • 00:06:50
    implement, there is unfortunately a
  • 00:06:52
    catch. Pipelining isn't always possible
  • 00:06:55
    for every use case when it comes to
  • 00:06:57
    working with Redis as it requires you
  • 00:06:59
    to have a batch of commands in order to
  • 00:07:01
    send up. In some situations, this is
  • 00:07:04
    actually going to be the case. For
  • 00:07:06
    example, if you want to send multiple
  • 00:07:07
    commands at once, such as if you're
  • 00:07:09
    enriching a bunch of data and need to
  • 00:07:11
    perform a get command for a large number
  • 00:07:13
    of keys, you can effectively send all of
  • 00:07:16
    these keys up to Redis in a batch of
  • 00:07:18
    say 100. However, for most cases when it
  • 00:07:21
    comes to Redis, pipelining just doesn't
  • 00:07:23
    really make sense as you don't have a
  • 00:07:25
    batch of commands that you can send at
  • 00:07:27
    once. So, whilst it is a great way to
  • 00:07:29
    improve performance, it doesn't work for
  • 00:07:31
    every case. Not only this, but there's
  • 00:07:33
    also some other limitations when it
  • 00:07:34
    comes to Redis that pipelining can't
  • 00:07:36
    resolve, specifically when it comes to
  • 00:07:39
    resources. As we've seen already, Redis
  • 00:07:41
    uses a single thread when it comes to
  • 00:07:43
    handling commands, which means even
  • 00:07:46
    though it's incredibly fast, there's
  • 00:07:47
    going to be an upper limit as to what a
  • 00:07:49
    single instance can do. Not only this,
  • 00:07:52
    but the network stack itself can also be
  • 00:07:54
    a bottleneck, especially when dealing
  • 00:07:56
    with operations over the internet that
  • 00:07:58
    we saw already. Lastly, one thing that's
  • 00:08:00
    really important when it comes to Redis
  • 00:08:02
    and Valkey is system memory, given that
  • 00:08:05
    Redis is an in-memory data store and
  • 00:08:07
    more memory means more data stored and
  • 00:08:09
    less evictions. Therefore, whilst
  • 00:08:12
    pipelining is a great way to get more
  • 00:08:13
    performance out of your Redis instance,
  • 00:08:15
    it's not going to work when it comes to
  • 00:08:17
    true scaling. So, how can we achieve 1
  • 00:08:20
    million operations per second without
  • 00:08:22
    needing to use pipelining? Well, that's
  • 00:08:25
    where another approach comes in.
  • 00:08:27
    Horizontal scaling. Horizontal scaling
  • 00:08:30
    is where you increase the performance of
  • 00:08:32
    a system by scaling the number of
  • 00:08:34
    instances rather than the amount of
  • 00:08:36
    resources per instance. Basically,
  • 00:08:38
    you're deploying multiple instances of
  • 00:08:40
    Redis or Valkey across multiple
  • 00:08:43
    machines. However, just deploying these
  • 00:08:45
    across multiple machines doesn't really
  • 00:08:47
    do that much by itself. Instead, you
  • 00:08:49
    need to couple these deployments with a
  • 00:08:51
    horizontal scaling strategy in order to
  • 00:08:54
    determine how data is both stored and
  • 00:08:56
    retrieved. When it comes to Redis and
  • 00:08:58
    well most data storage applications,
  • 00:09:01
    there are two horizontal scaling
  • 00:09:02
    strategies that you can take. The first
  • 00:09:05
    horizontal scaling strategy is known as
  • 00:09:08
    read
  • 00:09:09
    replication. This is where you deploy a
  • 00:09:11
    single instance known as the primary and
  • 00:09:14
    a number of other instances called
  • 00:09:16
    replicas. These replicas are constrained
  • 00:09:20
    to read operations only with the only
  • 00:09:23
    instance that allows data to be written
  • 00:09:25
    to it being the primary. When data is
  • 00:09:28
    then written to this primary instance,
  • 00:09:30
    it's then synchronized to the other
  • 00:09:31
    replicas in the replica set. To set up
  • 00:09:34
    read replication in Redis and Valkey is
  • 00:09:37
    actually incredibly simple. You can
  • 00:09:39
    either do so in the configuration or by
  • 00:09:42
    sending the REPLICAOF command, which you
  • 00:09:44
    can do through the CLI. In my case, I
  • 00:09:46
    decided to set this up on both my small
  • 00:09:48
    instance and on my thread ripper to
  • 00:09:50
    become replicas of the mid instance. As
  • 00:09:53
    you can see, I'm using the REPLICAOF
  • 00:09:55
    command to achieve this, passing in the
  • 00:09:57
    mid hostname. If your primary instance
  • 00:10:00
    requires authentication, then there are
  • 00:10:02
    a couple of other steps you need to
  • 00:10:04
    take. I recommend reading the Redis or
  • 00:10:06
    Valkey documentation for whichever
  • 00:10:08
    version you're using. In any case, upon
  • 00:10:10
    executing these commands, replication is
  • 00:10:12
    now set up. And if I go ahead and make a
  • 00:10:15
    write to my primary instance, you'll see
  • 00:10:17
    that this key is now available on the
  • 00:10:19
    two replicas as well. Therefore, we can
  • 00:10:22
    now go ahead and make use of these in
  • 00:10:24
    order to improve the throughput of our
  • 00:10:26
    Redis deployment. In order to test how
  • 00:10:28
    much throughput we now have, we can
  • 00:10:30
    modify our benchmark command so that we
  • 00:10:33
    only write data to the primary and read
  • 00:10:35
    from the replicas, which is done using
  • 00:10:38
    the following three commands, setting
  • 00:10:40
    the ratio for the primary to be write
  • 00:10:42
    only and setting the ratio for the
  • 00:10:44
    replicas to be read only. Now, if I go
  • 00:10:46
    ahead and run this for about 60 seconds,
  • 00:10:48
    you can see the performance is pretty
  • 00:10:50
    good. We're hitting around 300,000
  • 00:10:53
    requests per second using three
  • 00:10:55
    instances with two replicas. So, all it
  • 00:10:57
    would take to reach 1 million would be
  • 00:10:59
    adding in maybe another seven. Whilst
  • 00:11:02
    this is perfectly achievable, when it
  • 00:11:04
    comes to real-world setups, read
  • 00:11:06
    replication isn't always viable. For
  • 00:11:09
    starters, whilst our total performance
  • 00:11:11
    across the three nodes has increased, we
  • 00:11:13
    haven't actually improved our write
  • 00:11:15
    performance, only our reads. This is
  • 00:11:17
    because we're constrained to only being
  • 00:11:19
    able to write to a single node, which
  • 00:11:21
    means our performance is constrained to
  • 00:11:23
    this one instance. In some setups, this
  • 00:11:26
    is actually okay, especially when it
  • 00:11:29
    comes to more read-heavy workflows where
  • 00:11:31
    having multiple instances or multiple
  • 00:11:33
    replicas where you can read from will
  • 00:11:35
    directly scale performance. However,
  • 00:11:38
    there are still some trade-offs when it
  • 00:11:39
    comes to using this horizontal scaling
  • 00:11:41
    strategy. For one thing, this approach
  • 00:11:44
    is more expensive when it comes to
  • 00:11:46
    memory as we're effectively having to
  • 00:11:48
    replicate our entire data set across
  • 00:11:50
    multiple nodes. This means when it comes
  • 00:11:52
    to scaling the actual storage or the
  • 00:11:54
    amount of memory that we have to store
  • 00:11:56
    keys in our instances, we're back to
  • 00:11:59
    vertical scaling. And we can only
  • 00:12:01
    increase this performance by increasing
  • 00:12:03
    the size of memory available on each of
  • 00:12:05
    our nodes. Additionally, replication
  • 00:12:08
    also comes with another caveat called
  • 00:12:10
    lag. This is the time delta where data
  • 00:12:13
    is written to the primary before it's
  • 00:12:15
    available to the replicas and in some
  • 00:12:17
    cases can cause data integrity issues.
  • 00:12:20
    This means that read replication has
  • 00:12:22
    what's known as eventual consistency,
  • 00:12:25
    which is something you don't normally
  • 00:12:26
    have to worry about when running on
  • 00:12:28
    standalone mode. Not only this, but
  • 00:12:30
    there's also a single point of failure
  • 00:12:32
    when it comes to this setup, the
  • 00:12:35
    primary. If this instance happens to go
  • 00:12:37
    down, then we're no longer able to write
  • 00:12:40
    any data to our entire Redis system.
  • 00:12:43
    Fortunately, there is a solution
  • 00:12:45
    provided to this by both Redis and
  • 00:12:47
    Valkey, which is known as Sentinel. This
  • 00:12:50
    solution monitors both the replicas and
  • 00:12:53
    the primary and will promote a replica
  • 00:12:55
    in the event that a primary goes down.
  • 00:12:58
    Sentinel is actually really awesome when
  • 00:13:00
    it comes to ensuring high availability
  • 00:13:02
    on a Redis installation. So much so
  • 00:13:04
    that it actually deserves its own
  • 00:13:06
    dedicated video. In any case, whilst
  • 00:13:08
    read replication is a simple approach to
  • 00:13:11
    horizontal scaling and is really
  • 00:13:13
    powerful when it comes to more read
  • 00:13:15
    heavy workflows, it still doesn't solve
  • 00:13:17
    some of the other issues that we've
  • 00:13:19
    mentioned. Therefore, this is where a
  • 00:13:21
    second approach to horizontally scaling
  • 00:13:23
    Redis comes in, known as Redis Cluster.
  • 00:13:27
    Redis or Valkey Cluster provides a way
  • 00:13:30
    to run an installation where data is
  • 00:13:32
    automatically sharded across multiple
  • 00:13:34
    nodes. This allows you to distribute
  • 00:13:36
    your requests across multiple instances
  • 00:13:39
    which means you can effectively scale
  • 00:13:41
    CPU memory and networking to an infinite
  • 00:13:44
    amount. Not really. There is still a
  • 00:13:47
    finite limit. Regardless, the way that
  • 00:13:50
    this is achieved is by sharding data
  • 00:13:52
    across multiple nodes. Meaning rather
  • 00:13:54
    than each node having the full data set,
  • 00:13:56
    it splits across each node inside of the
  • 00:13:58
    cluster. This is done by using a
  • 00:14:00
    sharding algorithm. The way the
  • 00:14:03
    algorithm works is actually kind of
  • 00:14:05
    simple. The idea is that the cluster has
  • 00:14:09
    16,384 different hash slots. You can
  • 00:14:12
    think of these as being a bucket of
  • 00:14:15
    keys. Each of these slots or buckets is
  • 00:14:18
    then distributed across the nodes
  • 00:14:20
    evenly. Then in order to determine which
  • 00:14:22
    bucket or slot a key belongs to, the key
  • 00:14:26
    itself is then hashed using
  • 00:14:28
    CRC16 followed by then taking that
  • 00:14:30
    result and taking it modulo the number
  • 00:14:32
    of hash slots, i.e.
  • 00:14:36
    16,384. This then returns the slot that
  • 00:14:39
    the key belongs to, allowing you to then
  • 00:14:41
    distribute it to the node that owns that
  • 00:14:43
    hash slot. Whilst setting up cluster
  • 00:14:45
    mode is a little more involved than read
  • 00:14:47
    replication, it's still not that
  • 00:14:49
    difficult. To do so, you first need to
  • 00:14:52
    add in the following three options into
  • 00:14:54
    your Valkey/Redis configuration. These are
  • 00:14:58
    cluster-enabled, cluster-config-file,
  • 00:15:00
    and cluster-node-timeout. With those
  • 00:15:03
    three configuration options applied for
  • 00:15:05
    each of the Valky instances you wish to
  • 00:15:07
    clusterize, all that remains is to use
  • 00:15:10
    the following cluster create command on
  • 00:15:12
    one of the nodes, passing in the host
  • 00:15:14
    port combinations of all of the
  • 00:15:16
    instances you wish to form the cluster
  • 00:15:18
    with. This will then present you with
  • 00:15:21
    the following screen which will show you
  • 00:15:23
    the distribution of hash slots across
  • 00:15:25
    each of the nodes as well as prompting
  • 00:15:27
    you for confirmation. Upon doing so, the
  • 00:15:30
    cluster will then be set up hopefully
  • 00:15:32
    and should let you know when everything
  • 00:15:34
    is working, which we can then go ahead
  • 00:15:36
    and confirm using the cluster info
  • 00:15:38
    command on one or all of our instances.
  • 00:15:41
    With the cluster setup, if I now go
  • 00:15:43
    ahead and run the memtier_benchmark
  • 00:15:45
    command again, this time making sure
  • 00:15:47
    it's set to cluster mode by using the
  • 00:15:49
    following flag, you can see that both
  • 00:15:51
    the read and write throughput is now
  • 00:15:54
    substantially increased, hitting around
  • 00:15:56
    400,000 requests a second. Very cool. As
  • 00:16:00
    you can see, this is similar to the
  • 00:16:02
    throughput we were getting when it came
  • 00:16:03
    to using read replicas. However, the
  • 00:16:06
    benefit here is that we're able to both
  • 00:16:08
    read from and write to all three of our
  • 00:16:11
    instances instead of just only being
  • 00:16:13
    able to write to one. Not only does this
  • 00:16:16
    mean that we have improved performance
  • 00:16:17
    when it comes to our write operations,
  • 00:16:19
    but it also means we can make use of the
  • 00:16:21
    available resources on each of our
  • 00:16:23
    machines by deploying multiple Valkey
  • 00:16:26
    instances on each node. For example,
  • 00:16:29
    here I've gone and deployed another
  • 00:16:30
    seven instances on my mid-tier machine,
  • 00:16:33
    bringing the total number of nodes in my
  • 00:16:35
    Valkey cluster to 10. This means that
  • 00:16:37
    the cluster should be making better use
  • 00:16:39
    of all of the available hardware on my
  • 00:16:41
    mid-tier machine, which if I now go
  • 00:16:43
    ahead and run a benchmark test against,
  • 00:16:45
    you can see I'm hitting 1 million
  • 00:16:47
    requests per second. Hooray. Of course,
  • 00:16:51
    this is in perfect conditions, running
  • 00:16:54
    on my local network on bare metal. So,
  • 00:16:58
    in order to really complete this
  • 00:16:59
    challenge, then we're going to want to
  • 00:17:01
    take a look at how we can achieve 1
  • 00:17:03
    million operations per second whilst
  • 00:17:05
    running on the cloud using VPS
  • 00:17:08
    instances. Before we do that, however,
  • 00:17:10
    let's first talk about some of the
  • 00:17:11
    caveats associated with cluster mode
  • 00:17:14
    because there are a few. The first of
  • 00:17:16
    which is that in order to be able to
  • 00:17:18
    send commands to it, you need to use a
  • 00:17:21
    cluster-aware client. Now to be fair,
  • 00:17:23
    this isn't too much of an issue as most
  • 00:17:26
    Redis clients provide support for
  • 00:17:27
    cluster mode. However, it does mean that
  • 00:17:29
    any existing code that makes use of
  • 00:17:31
    Redis does need to be modified at least
  • 00:17:34
    slightly. For example, if I try to
  • 00:17:35
    connect to an instance in my cluster
  • 00:17:37
    using the Valkey CLI and try to pull out
  • 00:17:40
    the following key, you'll see I get an
  • 00:17:42
    error letting me know that the key has
  • 00:17:43
    been moved. Therefore, in order to be
  • 00:17:46
    able to use the Valkey CLI to send
  • 00:17:48
    commands to the Valkey cluster, I would
  • 00:17:51
    need to make use of the -c flag in
  • 00:17:53
    order to connect to cluster mode and my
  • 00:17:56
    client will then be routed to the
  • 00:17:57
    correct node that contains this key. In
  • 00:18:00
    addition to ensuring that the client
  • 00:18:01
    connects to the cluster mode correctly,
  • 00:18:03
    there are some other caveats as well.
  • 00:18:06
    Specifically, when it comes to multi-key
  • 00:18:08
    operations, such as working with
  • 00:18:10
    transaction pipelines or Redis Lua
  • 00:18:13
    scripts, in each of these cases, you
  • 00:18:15
    need to ensure that any related keys
  • 00:18:17
    will belong to the same key slot.
  • 00:18:20
    Otherwise, any scripts or transactions
  • 00:18:22
    won't be able to be used. Fortunately,
  • 00:18:24
    Reddus and Valky provide a way to
  • 00:18:26
    achieve this, which is to make use of a
  • 00:18:29
    hashtag. Not the social media kind of
  • 00:18:31
    hashtags. Instead, a hashtag is defined
  • 00:18:34
    within an actual key as follows.
  • 00:18:37
    Specifying the ID that you want to be
  • 00:18:39
    hashed using the following syntax. In
  • 00:18:42
    this case, the key being hashed is 1 2 3
  • 00:18:45
    rather than the actual full key itself.
  • 00:18:48
    By doing this, it means that any keys
  • 00:18:50
    that share the same hashtag will be
  • 00:18:52
    placed inside of the same hash slot,
  • 00:18:54
    which means you're then able to perform
  • 00:18:55
    any multi-key operations such as
  • 00:18:58
    transactions or scripts. Whilst hashtags
  • 00:19:01
    solve the issue of key distribution,
  • 00:19:03
    there are still many other problems when
  • 00:19:05
    it comes to running a distributed system
  • 00:19:07
    such as a Valkey cluster with perhaps the
  • 00:19:10
    most major one being how to ensure
  • 00:19:12
    reliability in the event that a node
  • 00:19:14
    goes down. Now to be fair, Redis
  • 00:19:16
    cluster does provide some resilience
  • 00:19:18
    when it comes to availability. If a node
  • 00:19:21
    is dropped from a cluster, then those
  • 00:19:22
    key slots will be redistributed.
  • 00:19:24
    However, the data can be lost.
  • 00:19:27
    Fortunately, cluster mode also provides
  • 00:19:29
    the ability to set up replication.
  • 00:19:31
    However, this differs slightly from the
  • 00:19:34
    replication we saw before in that rather
  • 00:19:36
    than replicating the entire data set,
  • 00:19:39
    these replicas instead contain a copy of
  • 00:19:41
    their respective shards or different
  • 00:19:43
    hash slots. Additionally, this means you
  • 00:19:45
    can have multiple replicas per shard,
  • 00:19:48
    providing you high availability and
  • 00:19:50
    reducing the risk of data loss in
  • 00:19:52
    cluster mode. This does mean however
  • 00:19:54
    that you'll want to ensure that each of
  • 00:19:55
    these replicas is on a different machine
  • 00:19:58
    than the primary and ideally from each
  • 00:20:00
    other as well and ultimately means your
  • 00:20:02
    total Redis system is going to become
  • 00:20:05
    more complex. Fortunately, there are
  • 00:20:07
    tools out there such as IaC, Kubernetes
  • 00:20:10
    or managed providers to help make this
  • 00:20:12
    complexity more manageable. In my case,
  • 00:20:15
    when it came to deploying a cluster onto
  • 00:20:17
    a number of VPS instances, I ended up
  • 00:20:20
    writing the following Terraform
  • 00:20:21
    configuration. Well, actually, it's an
  • 00:20:23
    OpenTofu configuration, which I'm now
  • 00:20:26
    pinning on my wall instead. Regardless,
  • 00:20:28
    this configuration allows me to deploy a
  • 00:20:30
    Valkey cluster onto one of two different
  • 00:20:33
    providers, either Digital Ocean or
  • 00:20:35
    Hetzner, which I used to see if I could
  • 00:20:38
    reach 1 million operations per second.
  • 00:20:41
    As it turned out, it was a little bit
  • 00:20:43
    more challenging than I thought.
  • 00:20:45
    However, before we take a look at
  • 00:20:46
    whether or not I was able to achieve
  • 00:20:48
    this on the public cloud, let's take a
  • 00:20:50
    quick look at another way to achieve 1
  • 00:20:53
    million requests per second. One that is
  • 00:20:55
    actually a lot more simple than setting
  • 00:20:57
    up a Valkey cluster. This is through
  • 00:20:59
    using the sponsor of today's video,
  • 00:21:02
    Dragonfly DB, who, as I mentioned
  • 00:21:04
    before, provide a drop-in replacement
  • 00:21:06
    for Redis that boasts greater
  • 00:21:08
    performance. To show how much
  • 00:21:09
    performance improvement Dragonfly DB
  • 00:21:11
    has, if I go ahead and deploy an
  • 00:21:13
    instance of it onto my small machine,
  • 00:21:16
    followed by performing the following
  • 00:21:17
    benchmark test we've been using before,
  • 00:21:19
    you can see I'm hitting about 250,000
  • 00:21:22
    requests per second, which isn't that
  • 00:21:24
    much of an improvement compared to the
  • 00:21:26
    existing instance I was using before.
  • 00:21:28
    However, if I go ahead and now deploy
  • 00:21:29
    this on my mid machine, you can see the
  • 00:21:32
    performance improvement is now
  • 00:21:34
    substantial. This time I'm running about
  • 00:21:37
    twice as fast as I was before, which
  • 00:21:39
    makes a lot of sense given there's twice
  • 00:21:40
    as many cores on this machine. As you
  • 00:21:43
    can see, by using Dragonfly DB, which
  • 00:21:45
    makes better use of the available
  • 00:21:46
    resources on a system, we're able to now
  • 00:21:49
    vertically scale our system compared to
  • 00:21:51
    just using a single core implementation
  • 00:21:53
    like we were before. So, let's see what
  • 00:21:55
    happens if we run Dragonfly on the big
  • 00:21:58
    boy Thread Ripper. Can we hit 1 million
  • 00:22:00
    requests per second by just running a
  • 00:22:03
    single instance in standalone mode?
  • 00:22:05
    Turns out we can by just a hair. So 1
  • 00:22:09
    million requests achieved by just using
  • 00:22:12
    vertical scaling. And this number can
  • 00:22:14
    actually go even further. In fact, the
  • 00:22:17
    team at Dragonfly managed to reach 6
  • 00:22:19
    million requests per second when it came
  • 00:22:21
    to running on a VPS. All of this is
  • 00:22:23
    achieved without having to worry about
  • 00:22:25
    setting up multiple instances or any of
  • 00:22:28
    the caveats that come from using cluster
  • 00:22:30
    mode. That being said, there are still
  • 00:22:33
    some good reasons to embrace the
  • 00:22:34
    complexity of clustering, such as when
  • 00:22:37
    you want to distribute your key set
  • 00:22:38
    across multiple machines, either for
  • 00:22:40
    scaling memory or for redundancy.
  • 00:22:43
    Dragonfly itself does provide a way to
  • 00:22:45
    horizontally scale using their own swarm
  • 00:22:48
    mode, which was announced a short while
  • 00:22:50
    before filming this video. So, if you're
  • 00:22:53
    interested in Dragonfly DB as a drop-in
  • 00:22:55
    replacement for Redis that offers
  • 00:22:57
    greater performance, then check them out
  • 00:22:59
    using the link below. Okay, so now that
  • 00:23:02
    we've seen how to reach 1 million
  • 00:23:03
    requests per second in perfect
  • 00:23:05
    conditions, let's take a look at how
  • 00:23:07
    difficult it is to achieve this on
  • 00:23:09
    something less perfect, the cloud. As I
  • 00:23:12
    mentioned before, I'd managed to set up
  • 00:23:13
    a Terraform or OpenTofu configuration
  • 00:23:16
    for both Digital Ocean and Hetzner, which
  • 00:23:19
    you can actually download yourself from
  • 00:23:20
    GitHub if you want to try it out. Just
  • 00:23:23
    remember, don't leave these instances
  • 00:23:25
    running for a long time, or you'll end
  • 00:23:27
    up with a large bill at the end of the
  • 00:23:30
    month. So there are some instructions on
  • 00:23:32
    how to actually deploy this Terraform
  • 00:23:33
    configuration on the actual repo itself,
  • 00:23:36
    but at a high level, you mainly just
  • 00:23:38
    need to set your API token and SSH key
  • 00:23:40
    values in the tfvars file for whichever
  • 00:23:43
    cloud provider you want to use, followed
  • 00:23:46
    by then running the Tofu apply command.
  • 00:23:49
    Once you're done with your testing, in
  • 00:23:51
    order to clean everything up, go ahead
  • 00:23:52
    and use the tofu destroy command just to
  • 00:23:55
    make sure you don't go bankrupt. As I
  • 00:23:57
    mentioned, I ended up writing an
  • 00:23:59
    OpenTofu configuration to deploy a Valkey
  • 00:24:02
    cluster to either Digital Ocean or
  • 00:24:04
    Hetzner. However, this wasn't my original
  • 00:24:07
    plan as I had intended to only use one
  • 00:24:10
    of these providers, Hetzner. But for some
  • 00:24:14
    reason, I couldn't seem to get anywhere
  • 00:24:16
    close to 1 million operations per
  • 00:24:18
    second, instead barely exceeding
  • 00:24:22
    100,000 no matter what I did or how I
  • 00:24:24
    had the cluster configured. Overall, it
  • 00:24:28
    was kind of strange and my best guess as
  • 00:24:31
    to why this was happening was because I
  • 00:24:33
    was using shared vCPUs. So, I went about
  • 00:24:36
    migrating to dedicated ones instead.
  • 00:24:39
    However, because my account was too new
  • 00:24:41
    to request a limit increase, I was
  • 00:24:43
    unable to provision any more dedicated
  • 00:24:45
    vCPUs. So when it came to hitting a
  • 00:24:48
    million Valky operations per second
  • 00:24:50
    using Hetzner, I was out of moves.
  • 00:24:53
    Therefore, I instead decided to try with
  • 00:24:55
    Digital Ocean. First reaching out to
  • 00:24:58
    support in order to get access to the
  • 00:25:00
    larger instance sizes so that I could
  • 00:25:02
    run benchmarking without having any
  • 00:25:04
    bottlenecking. Once I had my compute
  • 00:25:06
    limits increased, I then deployed the
  • 00:25:08
    cluster using Tofu Apply and SSH into my
  • 00:25:11
    benchmark box. Once I had everything set
  • 00:25:14
    up and the cluster was deployed, I then
  • 00:25:17
    went about running the memtier_benchmark
  • 00:25:19
    command. And on my initial attempt,
  • 00:25:22
    which had nine Valkey nodes inside of
  • 00:25:23
    the cluster and using a 16 vCPU machine
  • 00:25:26
    for the benchmarking tool, I was hitting
  • 00:25:28
    around 450,000 requests per second. Not
  • 00:25:32
    bad. After confirming that it was the
  • 00:25:34
    benchmark tool that was bottlenecking, I
  • 00:25:36
    decided to scale up the instance it was
  • 00:25:38
    running on to one with 32 vCPUs. This
  • 00:25:42
    time when I ran the benchmarking tool
  • 00:25:43
    with 32 threads, I was getting around
  • 00:25:47
    900,000 off the get-go. However, as
  • 00:25:50
    time went on, this number would start to
  • 00:25:52
    decrease down to about
  • 00:25:54
    800,000. So, I decided to go all in and
  • 00:25:58
    scaled up the number of Valkey nodes I
  • 00:26:00
    had from 9 to 15. After doing a quick
  • 00:26:03
    tofu apply and seeing all of the
  • 00:26:05
    instances come through on the digital
  • 00:26:07
    ocean dashboard, I SSHed in and ran the
  • 00:26:10
    memtier_benchmark command
  • 00:26:13
    again. Success. I was now hitting a
  • 00:26:16
    sustained 1 million operations per
  • 00:26:19
    second.
  • 00:26:22
    With that, my goal had been achieved,
  • 00:26:24
    and all it took was for me to deploy 15
  • 00:26:26
    Valkey instances on premium Intel nodes,
  • 00:26:30
    which had I left running would have only
  • 00:26:31
    cost me around $1,100 a month. Yeah, a
  • 00:26:35
    little out of my infrastructure budget.
  • 00:26:38
    Just for fun, I decided to see how much
  • 00:26:40
    throughput I could get by using a
  • 00:26:42
    pipeline of 100 when it came to this
  • 00:26:44
    setup, which ended up producing around
  • 00:26:47
    14 million operations per second. So
  • 00:26:50
    yeah, that just goes to show how much
  • 00:26:51
    improvement you can get when it comes to
  • 00:26:53
    reducing the round trip time by using
  • 00:26:55
    pipelining with Redis. In any case, all
  • 00:26:58
    that remained was to tear down my setup
  • 00:27:00
    using Tofu Destroy. And with that, I had
  • 00:27:03
    managed to achieve my goal, hitting 1
  • 00:27:05
    million requests, both using bare metal
  • 00:27:08
    on my home network, which honestly is
  • 00:27:09
    kind of cheating, and through using a
  • 00:27:12
    VPS instance on Digital Ocean, using
  • 00:27:15
    private networking in the data center.
  • 00:27:18
    In any case, I want to give a big thank
  • 00:27:19
    you to Dragonfly DB for sponsoring this
  • 00:27:22
    video and making all of this happen.
  • 00:27:24
    Without them, I wouldn't have been able
  • 00:27:25
    to spend so much cash on VPS instances
  • 00:27:28
    for testing. Additionally, if you want
  • 00:27:30
    to use a high-performance Redis
  • 00:27:32
    alternative without managing your own
  • 00:27:34
    infrastructure, then Dragonfly also
  • 00:27:36
    offers a fully managed service,
  • 00:27:38
    Dragonfly Cloud, which runs the same
  • 00:27:41
    code as if you were self-hosting, but
  • 00:27:43
    just handles all of the operational
  • 00:27:45
    heavy lifting for you. So, if you want a
  • 00:27:48
    hassle-free caching solution, then check
  • 00:27:50
    it out using the link in the description
  • 00:27:52
    below. Otherwise, I want to give a big
  • 00:27:55
    thank you for watching, and I'll see you
  • 00:27:56
    on the next one.
Tags
  • Redis
  • Valkey
  • pipelining
  • horizontal scaling
  • vertical scaling
  • Dragonfly DB
  • cluster
  • read replication
  • performance
  • requests per second