00:00:00
It's no secret that I'm a fan of Redis,
00:00:03
or at least I was a fan of Redis until
00:00:06
they went and forked things up. These
00:00:09
days, I'm now pinning pictures of
00:00:10
Valkey on my wall instead. In any
00:00:13
case, I've been using Redis for a good
00:00:16
number of years, and through that time,
00:00:18
I've always been fascinated with how
00:00:20
incredibly fast both Redis and its
00:00:22
forks, such as Valkey, can be. In
00:00:25
fact, by just running a single instance
00:00:27
of Valkey on a bare metal machine, you
00:00:30
can easily exceed 100,000 requests per
00:00:33
second, which is a lot of throughput.
00:00:36
Whilst 100,000 requests per second is
00:00:39
pretty impressive, I wanted to see how
00:00:41
difficult it would be to push this even
00:00:43
further, say 10 times further to 1
00:00:46
million requests per second. So, how
00:00:49
does one go about increasing the number
00:00:51
of requests per second, or RPS, that
00:00:54
Redis can handle to greater than 1
00:00:57
million? Well, as it turns out, there
00:00:59
are a number of different ways to do so,
00:01:02
each with their own pros and cons
00:01:04
depending on the situation that you find
00:01:06
yourself in. The first approach I
00:01:09
decided to take when it came to scaling
00:01:10
Redis, or Valkey in this case, was to just
00:01:13
run it on a bigger machine. This is
00:01:16
known as vertical scaling. And whilst it
00:01:19
can be effective for some software, when
00:01:22
it comes to Redis, it's a little bit
00:01:23
more complicated. To show what I mean, I
00:01:26
decided to deploy a Valkey instance onto
00:01:28
three separate machines. The first being
00:01:31
a Beelink Mini S12, which has a low-powered
00:01:34
4-core CPU, the Intel
00:01:37
N95, which is the least powerful of the
00:01:40
three. The second machine that I
00:01:42
installed Valkey on, I've defined as
00:01:44
the mid machine, which is yet another
00:01:46
Beelink mini PC. This one, the SER6, which
00:01:50
runs an AMD 7735 CPU with eight cores
00:01:55
and 16 threads. So, it's a little bit
00:01:58
more powerful. The final machine in this
00:02:00
testing is the big boy. This is the AMD
00:02:03
Ryzen Threadripper
00:02:04
3970X with a massive 32 cores and 64
00:02:08
threads, as well as boasting 128 gigs of
00:02:11
RAM. To test how much throughput each of
00:02:13
these machines can handle, I'm going to
00:02:15
go ahead and use the following
00:02:17
memtier_benchmark command, which is pretty much
00:02:19
the standard tool when it comes to
00:02:21
benchmarking Redis and its forks.
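For reference, a basic invocation looks roughly like the line below; the exact thread, client, and duration values used on screen aren't captured in the transcript, so treat them as placeholders.

memtier_benchmark -s <valkey-host> -p 6379 --threads 4 --clients 50 --ratio 1:1 --test-time 60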
00:02:24
Additionally, I'm running this tool on
00:02:25
another machine and sending the commands
00:02:27
over the network in order to simulate
00:02:29
real-world usage. And when I run this
00:02:32
command, you can see that the mini PC is
00:02:35
handling a huge amount of requests per
00:02:37
second, around 200,000, which honestly
00:02:41
is pretty impressive. This high number
00:02:44
is possible because I'm running
00:02:45
on my local network and running the
00:02:47
Valkey instance on bare metal, which both
00:02:50
Redis and Valkey handle really well. If
00:02:53
instead I was making use of a couple of
00:02:55
VPS instances, one to host the instance
00:02:57
and one to actually perform the test in
00:03:00
the same data center, you'll see that
00:03:02
the request count drops significantly.
00:03:05
And if I try to go ahead and benchmark
00:03:06
this from my machine to one of these VPS
00:03:08
instances over the internet, then it
00:03:11
drops even further. So that's one thing
00:03:13
to consider if you want to scale Redis:
00:03:16
bare metal is best, but also
00:03:18
colocating or running in the same data
00:03:21
center is really important. If you're
00:03:24
sending commands halfway across the
00:03:25
world, then you're going to have a bad
00:03:27
time. In any case, going back to my
00:03:29
local area network setup, the four-core
00:03:31
machine did pretty well. So, let's go
00:03:34
ahead and see how it works when it comes
00:03:35
to the mid-level machine. As you can see,
00:03:38
we're actually getting around the same
00:03:40
number of operations per second, which
00:03:43
at first may feel a little surprising.
00:03:46
This, however, is actually expected when
00:03:48
it comes to Redis. This is because
00:03:51
Redis and some of its forks, such as
00:03:53
Valkey, only make use of a single thread
00:03:55
when it comes to handling commands.
00:03:58
Meaning that more cores or more CPUs
00:04:00
isn't going to increase performance. In
00:04:03
fact, in some cases, it can actually
00:04:05
hinder it. For example, if I go ahead
00:04:07
and run the benchmark against my
00:04:09
Valkey instance on the 32-core
00:04:11
machine, you can see now we're actually
00:04:13
producing fewer operations, down to around
00:04:16
a quarter of what we were before. This is because
00:04:19
the 32-core Threadripper machine
00:04:21
actually has worse single core
00:04:23
performance than the other two. And
00:04:25
because Redis is so dependent on single
00:04:27
core performance, it's having a
00:04:29
negative impact. Now there are both
00:04:32
forks of Redis and Redis-compatible
00:04:34
solutions that are better able to make
00:04:36
use of multiple cores on a machine with
00:04:38
one such solution being Dragonfly DB, who
00:04:42
are also the sponsors of today's video.
00:04:44
We'll talk a little bit more about
00:04:45
Dragonfly DB later on and how they
00:04:48
provide increased performance on
00:04:50
multi-core machines. However, for the
00:04:52
meantime, we're going to go ahead and
00:04:53
focus on Redis or Valkey and see how we
00:04:56
can get it to perform millions of
00:04:58
requests per second using a single
00:05:00
threaded instance. One approach to doing
00:05:03
so is to make use of something called
00:05:06
pipelining. Pipelining is a technique
00:05:08
for improving performance by issuing
00:05:10
multiple commands at once without
00:05:12
waiting for the response to each
00:05:14
individual command to return. It's very
00:05:16
similar to the concept of batching. To
00:05:19
show the performance improvement of
00:05:20
using pipelining, I'll go ahead and use
00:05:22
the benchmark tool again, this time
00:05:24
passing in the --pipeline flag, setting
00:05:27
it to 2. So, we're sending a pipeline of two commands at once.
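For reference, this is just the same benchmark invocation with the pipeline flag appended, something along these lines (host is a placeholder):

memtier_benchmark -s <valkey-host> --pipeline=2 --test-time 60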
00:05:29
As you
00:05:32
can see, we're now effectively doubling
00:05:33
our throughput without making any
00:05:35
hardware changes to our instance. This
00:05:38
is pretty great, but how far can we
00:05:40
actually push it? Well, if I go ahead
00:05:42
and set a pipeline to be an order of
00:05:44
magnitude higher from 2 to 10, you can
00:05:47
see now we're pushing over 1 million
00:05:49
requests a second, give or take. Pretty
00:05:52
cool. However, we don't just have to
00:05:53
stop here and we can actually push
00:05:55
pipelining a little further. Let's say
00:05:57
we go ahead and set a pipeline of 100.
00:06:00
This time we're now maxing out at about
00:06:03
2.5 million requests a second. Whilst
00:06:06
this is incredibly fast, you'll notice
00:06:08
it's noticeably less than we would expect.
00:06:10
Given that we were hitting 1 million
00:06:12
operations when it came to a pipeline
00:06:13
size of 10, we should expect to see 10
00:06:16
million operations when it came to a
00:06:17
pipeline size of 100. Unfortunately,
00:06:19
however, we're actually hitting a
00:06:21
bottleneck, which is caused by the
00:06:23
available bandwidth on my local network.
00:06:26
Either way, reaching 2.5 million
00:06:28
operations per second is impressive. And
00:06:30
pipelining is an effective way to
00:06:32
achieve this, especially as it's
00:06:34
supported by most Redis clients
00:06:35
already, and it's pretty easy to
00:06:38
perform. In Go, you can achieve this by
00:06:40
using the pipeline method as follows.
00:06:42
Then sending commands to this pipeline
00:06:44
followed by executing it, and all of the
00:06:46
responses will be in the same order you
00:06:48
pass them in.
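For reference, here's a minimal sketch of that pattern using the go-redis client; the transcript doesn't name a specific client library, so treat this as one illustrative option, with placeholder keys and values.

package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

	// Queue commands on the pipeline without waiting for individual replies.
	pipe := rdb.Pipeline()
	setCmd := pipe.Set(ctx, "key:1", "value-1", 0)
	getCmd := pipe.Get(ctx, "key:1")

	// Exec sends the whole batch in a single round trip; replies come back
	// in the same order the commands were queued.
	if _, err := pipe.Exec(ctx); err != nil {
		panic(err)
	}
	fmt.Println(setCmd.Val(), getCmd.Val())
}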
00:06:50
Despite being simple to implement, there is unfortunately a
00:06:52
catch. Pipelining isn't always possible
00:06:55
for every use case when it comes to
00:06:57
working with Redis, as it requires you
00:06:59
to have a batch of commands in order to
00:07:01
send up. In some situations, this is
00:07:04
actually going to be the case. For
00:07:06
example, if you want to send multiple
00:07:07
commands at once, such as if you're
00:07:09
enriching a bunch of data and need to
00:07:11
perform a get command for a large number
00:07:13
of keys, you can effectively send all of
00:07:16
these keys up to Redis in a batch of
00:07:18
say 100. However, for most cases when it
00:07:21
comes to Redis, pipelining just doesn't
00:07:23
really make sense as you don't have a
00:07:25
batch of commands that you can send at
00:07:27
once. So, whilst it is a great way to
00:07:29
improve performance, it doesn't work for
00:07:31
every case. Not only this, but there's
00:07:33
also some other limitations when it
00:07:34
comes to Redis that pipelining can't
00:07:36
resolve, specifically when it comes to
00:07:39
resources. As we've seen already, Redis
00:07:41
uses a single thread when it comes to
00:07:43
handling commands, which means even
00:07:46
though it's incredibly fast, there's
00:07:47
going to be an upper limit as to what a
00:07:49
single instance can do. Not only this,
00:07:52
but the network stack itself can also be
00:07:54
a bottleneck, especially when dealing
00:07:56
with operations over the internet that
00:07:58
we saw already. Lastly, one thing that's
00:08:00
really important when it comes to Redis
00:08:02
and Valkey is system memory, given that
00:08:05
Redis is an in-memory data store and
00:08:07
more memory means more data stored and
00:08:09
fewer evictions. Therefore, whilst
00:08:12
pipelining is a great way to get more
00:08:13
performance out of your Redis instance,
00:08:15
it's not going to work when it comes to
00:08:17
true scaling. So, how can we achieve 1
00:08:20
million operations per second without
00:08:22
needing to use pipelining? Well, that's
00:08:25
where another approach comes in.
00:08:27
Horizontal scaling. Horizontal scaling
00:08:30
is where you increase the performance of
00:08:32
a system by scaling the number of
00:08:34
instances rather than the amount of
00:08:36
resources per instance. Basically,
00:08:38
you're deploying multiple instances of
00:08:40
Redis or Valkey across multiple
00:08:43
machines. However, just deploying these
00:08:45
across multiple machines doesn't really
00:08:47
do that much by itself. Instead, you
00:08:49
need to couple these deployments with a
00:08:51
horizontal scaling strategy in order to
00:08:54
determine how data is both stored and
00:08:56
retrieved. When it comes to Redis, and
00:08:58
indeed most data storage applications,
00:09:01
there are two horizontal scaling
00:09:02
strategies that you can take. The first
00:09:05
horizontal scaling strategy is known as
00:09:08
read
00:09:09
replication. This is where you deploy a
00:09:11
single instance known as the primary and
00:09:14
a number of other instances called
00:09:16
replicas. These replicas are constrained
00:09:20
to read-only operations, with the primary
00:09:23
being the only instance that allows data
00:09:25
to be written to it. When data is
00:09:28
then written to this primary instance,
00:09:30
it's then synchronized to the other
00:09:31
replicas in the replica set. To set up
00:09:34
read replication in Redis and Valkey is
00:09:37
actually incredibly simple. You can
00:09:39
either do so in the configuration or by
00:09:42
sending the REPLICAOF command, which you
00:09:44
can do through the CLI. In my case, I
00:09:46
decided to set this up on both my small
00:09:48
instance and on my Threadripper to
00:09:50
become replicas of the mid instance. As
00:09:53
you can see, I'm using the REPLICAOF
00:09:55
command to achieve this, passing in the mid machine's hostname.
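For reference, the command looks roughly like this from valkey-cli on each replica (the hostname and port are placeholders for the mid machine's address):

REPLICAOF mid-machine.local 6379

(Running REPLICAOF NO ONE later turns a replica back into a standalone primary.)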
00:09:57
If your primary instance
00:10:00
requires authentication, then there are
00:10:02
a couple of other steps you need to
00:10:04
take. I recommend reading the Redis or
00:10:06
Valkey documentation for whichever
00:10:08
version you're using. In any case, upon
00:10:10
executing these commands, replication is
00:10:12
now set up. And if I go ahead and make a
00:10:15
write to my primary instance, you'll see
00:10:17
that this key is now available on the
00:10:19
two replicas as well. Therefore, we can
00:10:22
now go ahead and make use of these in
00:10:24
order to improve the throughput of our
00:10:26
Redis deployment. In order to test how
00:10:28
much throughput we now have, we can
00:10:30
modify our benchmark command so that we
00:10:33
only write data to the primary and read
00:10:35
from the replicas, which is done using
00:10:38
the following three commands, setting
00:10:40
the ratio for the primary to be write-only
00:10:42
and setting the ratio for the replicas to be read-only.
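For reference, the three invocations look something like this (hostnames are placeholders; memtier_benchmark's --ratio flag takes a SET:GET ratio, so 1:0 is write-only and 0:1 is read-only):

memtier_benchmark -s mid-machine.local --ratio=1:0 --test-time 60
memtier_benchmark -s small-machine.local --ratio=0:1 --test-time 60
memtier_benchmark -s threadripper.local --ratio=0:1 --test-time 60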
00:10:44
Now, if I go
00:10:46
ahead and run this for about 60 seconds,
00:10:48
you can see the performance is pretty
00:10:50
good. We're hitting around 300,000
00:10:53
requests per second using three
00:10:55
instances with two replicas. So, all it
00:10:57
would take to reach 1 million would be
00:10:59
adding in maybe another seven. Whilst
00:11:02
this is perfectly achievable, when it
00:11:04
comes to realworld setups, read
00:11:06
replication isn't always viable. For
00:11:09
starters, whilst our total performance
00:11:11
across the three nodes has increased, we
00:11:13
haven't actually improved our write
00:11:15
performance, only our reads. This is
00:11:17
because we're constrained to only being
00:11:19
able to write to a single node, which
00:11:21
means our performance is constrained to
00:11:23
this one instance. In some setups, this
00:11:26
is actually okay, especially when it
00:11:29
comes to more read-heavy workflows, where
00:11:31
having multiple instances or multiple
00:11:33
replicas where you can read from will
00:11:35
directly scale performance. However,
00:11:38
there are still some trade-offs when it
00:11:39
comes to using this horizontal scaling
00:11:41
strategy. For one thing, this approach
00:11:44
is more expensive when it comes to
00:11:46
memory as we're effectively having to
00:11:48
replicate our entire data set across
00:11:50
multiple nodes. This means when it comes
00:11:52
to scaling the actual storage or the
00:11:54
amount of memory that we have to store
00:11:56
keys in our instances, we're back to
00:11:59
vertical scaling. And we can only
00:12:01
increase this capacity by increasing
00:12:03
the amount of memory available on each of
00:12:05
our nodes. Additionally, replication
00:12:08
also comes with another caveat called
00:12:10
lag. This is the time delta between data
00:12:13
being written to the primary and it being
00:12:15
available on the replicas, and in some
00:12:17
cases can cause data integrity issues.
00:12:20
This means that read replication has
00:12:22
what's known as eventual consistency,
00:12:25
which is something you don't normally
00:12:26
have to worry about when running in
00:12:28
standalone mode. Not only this, but
00:12:30
there's also a single point of failure
00:12:32
when it comes to this setup, the
00:12:35
primary. If this instance happens to go
00:12:37
down, then we're no longer able to write
00:12:40
any data to our entire Redis system.
00:12:43
Fortunately, there is a solution
00:12:45
provided for this by both Redis and
00:12:47
Valkey, which is known as Sentinel. This
00:12:50
solution monitors both the replicas and
00:12:53
the primary and will promote a replica
00:12:55
in the event that a primary goes down.
00:12:58
Sentinel is actually really awesome when
00:13:00
it comes to ensuring high availability
00:13:02
on a Redis installation. So much so
00:13:04
that it actually deserves its own
00:13:06
dedicated video. In any case, whilst
00:13:08
read replication is a simple approach to
00:13:11
horizontal scaling and is really
00:13:13
powerful when it comes to more read
00:13:15
heavy workflows, it still doesn't solve
00:13:17
some of the other issues that we've
00:13:19
mentioned. Therefore, this is where a
00:13:21
second approach to horizontally scaling
00:13:23
Redis comes in, known as Redis Cluster.
00:13:27
Redis or Valkey cluster provides a way
00:13:30
to run an installation where data is
00:13:32
automatically sharded across multiple
00:13:34
nodes. This allows you to distribute
00:13:36
your requests across multiple instances
00:13:39
which means you can effectively scale
00:13:41
CPU, memory, and networking to an infinite
00:13:44
amount. Not really. There is still a
00:13:47
finite limit. Regardless, the way that
00:13:50
this is achieved is by sharding data
00:13:52
across multiple nodes. Meaning rather
00:13:54
than each node having the full data set,
00:13:56
it splits across each node inside of the
00:13:58
cluster. This is done by using a
00:14:00
sharding algorithm. The way the
00:14:03
algorithm works is actually kind of
00:14:05
simple. The idea is that the cluster has
00:14:09
16,384 different hash slots. You can
00:14:12
think of these as being a bucket of
00:14:15
keys. Each of these slots or buckets is
00:14:18
then distributed across the nodes
00:14:20
evenly. Then in order to determine which
00:14:22
bucket or slot a key belongs to, the key
00:14:26
itself is then hashed using
00:14:28
CRC16, followed by taking that
00:14:30
result modulo the number
00:14:32
of hash slots, i.e.
00:14:36
16,384. This then returns the slot that
00:14:39
the key belongs to, allowing you to then
00:14:41
distribute it to the node that owns that hash slot.
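As a rough sketch of that calculation in Go (this follows the CRC16-XMODEM variant Redis and Valkey document for cluster key slots; the real implementation also only hashes the hash-tag portion of a key, which comes up a little later):

package main

import "fmt"

// crc16 implements the CRC16-XMODEM checksum (polynomial 0x1021) used by
// Redis/Valkey cluster to map keys to hash slots.
func crc16(data []byte) uint16 {
	var crc uint16
	for _, b := range data {
		crc ^= uint16(b) << 8
		for i := 0; i < 8; i++ {
			if crc&0x8000 != 0 {
				crc = crc<<1 ^ 0x1021
			} else {
				crc <<= 1
			}
		}
	}
	return crc
}

// hashSlot returns the cluster slot (0-16383) a key belongs to.
func hashSlot(key string) uint16 {
	return crc16([]byte(key)) % 16384
}

func main() {
	fmt.Println(hashSlot("user:1001")) // prints the slot for an example key
}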
00:14:43
Whilst setting up cluster
00:14:45
mode is a little more involved than read
00:14:47
replication, it's still not that
00:14:49
difficult. To do so, you first need to
00:14:52
add in the following three options into
00:14:54
your Valkey/Redis configuration. These are
00:14:58
cluster-enabled, cluster-config-file, and cluster-node-timeout.
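For reference, the configuration snippet looks something like this (the timeout value is a placeholder; pick whatever suits your network):

cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000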
00:15:00
With those
00:15:03
three configuration options applied for
00:15:05
each of the Valky instances you wish to
00:15:07
clusterize, all that remains is to use
00:15:10
the following cluster create command on
00:15:12
one of the nodes, passing in the host
00:15:14
port combinations of all of the
00:15:16
instances you wish to form the cluster with.
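For reference, the create command looks roughly like this, with placeholder addresses for the three machines:

valkey-cli --cluster create 192.168.1.10:6379 192.168.1.11:6379 192.168.1.12:6379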
00:15:18
This will then present you with
00:15:21
the following screen which will show you
00:15:23
the distribution of hash slots across
00:15:25
each of the nodes as well as prompting
00:15:27
you for confirmation. Upon doing so, the
00:15:30
cluster will then be set up hopefully
00:15:32
and should let you know when everything
00:15:34
is working, which we can then go ahead
00:15:36
and confirm using the cluster info
00:15:38
command on one or all of our instances.
00:15:41
With the cluster setup, if I now go
00:15:43
ahead and run the memtier_benchmark
00:15:45
command again, this time making sure
00:15:47
it's set to cluster mode by using the
00:15:49
following flag, you can see that both
00:15:51
the read and write throughput is now
00:15:54
substantially increased, hitting around
00:15:56
400,000 requests a second. Very cool.
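For reference, cluster mode in memtier_benchmark is switched on with the --cluster-mode flag, so the run looks roughly like this (the node address is a placeholder):

memtier_benchmark -s 192.168.1.10 --cluster-mode --test-time 60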
00:16:00
As you can see, this is similar to the
00:16:02
throughput we were getting when it came
00:16:03
to using read replicas. However, the
00:16:06
benefit here is that we're able to both
00:16:08
read from and write to all three of our
00:16:11
instances instead of just only being
00:16:13
able to write to one. Not only does this
00:16:16
mean that we have improved performance
00:16:17
when it comes to our write operations,
00:16:19
but it also means we can make use of the
00:16:21
available resources on each of our
00:16:23
machines by deploying multiple Valky
00:16:26
instances on each node. For example,
00:16:29
here I've gone and deployed another
00:16:30
seven instances on my midtier machine,
00:16:33
bringing the total number of nodes in my
00:16:35
Valkey cluster to 10. This means that
00:16:37
the cluster should be making better use
00:16:39
of all of the available hardware on my
00:16:41
mid-tier machine, which if I now go
00:16:43
ahead and run a benchmark test against,
00:16:45
you can see I'm hitting 1 million
00:16:47
requests per second. Hooray. Of course,
00:16:51
this is in perfect conditions, running
00:16:54
on my local network on bare metal. So,
00:16:58
in order to really complete this
00:16:59
challenge, then we're going to want to
00:17:01
take a look at how we can achieve 1
00:17:03
million operations per second whilst
00:17:05
running on the cloud using VPS
00:17:08
instances. Before we do that, however,
00:17:10
let's first talk about some of the
00:17:11
caveats associated with cluster mode
00:17:14
because there are a few. The first of
00:17:16
which is that in order to be able to
00:17:18
send commands to it, you need to use a
00:17:21
cluster-aware client. Now, to be fair,
00:17:23
this isn't too much of an issue as most
00:17:26
Redis clients provide support for
00:17:27
cluster mode. However, it does mean that
00:17:29
any existing code that makes use of
00:17:31
Redis does need to be modified at least
00:17:34
slightly. For example, if I try to
00:17:35
connect to an instance in my cluster
00:17:37
using the Valkey CLI and try to pull out
00:17:40
the following key, you'll see I get an
00:17:42
error letting me know that the key has
00:17:43
been moved. Therefore, in order to be
00:17:46
able to use the Valkey CLI to send
00:17:48
commands to the Valkey cluster, I would
00:17:51
need to make use of the -c flag in
00:17:53
order to connect in cluster mode, and my
00:17:56
client will then be routed to the correct node that contains this key.
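For reference (the key name and node address are placeholders):

valkey-cli -h 192.168.1.10 -p 6379 get mykey      # may return a MOVED error
valkey-cli -c -h 192.168.1.10 -p 6379 get mykey   # -c follows the redirect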
00:17:57
In
00:18:00
addition to ensuring that the client
00:18:01
connects to the cluster correctly,
00:18:03
there are some other caveats as well.
00:18:06
Specifically, when it comes to multi-key
00:18:08
operations, such as working with
00:18:10
transactions, pipelines, or Redis Lua
00:18:13
scripts, in each of these cases, you
00:18:15
need to ensure that any related keys
00:18:17
will belong to the same hash slot.
00:18:20
Otherwise, any scripts or transactions
00:18:22
won't be able to be used. Fortunately,
00:18:24
Reddus and Valky provide a way to
00:18:26
achieve this, which is to make use of a
00:18:29
hashtag. Not the social media kind of
00:18:31
hashtags. Instead, a hashtag is defined
00:18:34
within an actual key as follows,
00:18:37
specifying the ID that you want to be
00:18:39
hashed using the following syntax. In
00:18:42
this case, the part being hashed is 123
00:18:45
rather than the actual full key itself.
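For reference, a pair of keys sharing a hash tag looks something like this (names and values are illustrative); only the 123 between the braces is hashed, so both land in the same slot:

SET user:{123}:name "alice"
SET user:{123}:email "alice@example.com"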
00:18:48
By doing this, it means that any keys
00:18:50
that share the same hashtag will be
00:18:52
placed inside of the same hash slot,
00:18:54
which means you're then able to perform
00:18:55
any multi-key operations such as
00:18:58
transactions or scripts. Whilst hashtags
00:19:01
solve the issue of key distribution,
00:19:03
there are still many other problems when
00:19:05
it comes to running a distributed system
00:19:07
such as a Valky cluster with perhaps the
00:19:10
most major one being how to ensure
00:19:12
reliability in the event that a node
00:19:14
goes down. Now, to be fair, Redis
00:19:16
cluster does provide some resilience
00:19:18
when it comes to availability. If a node
00:19:21
is dropped from a cluster, then those
00:19:22
hash slots will be redistributed.
00:19:24
However, the data can be lost.
00:19:27
Fortunately, cluster mode also provides
00:19:29
the ability to set up replication.
00:19:31
However, this differs slightly from the
00:19:34
replication we saw before in that rather
00:19:36
than replicating the entire data set,
00:19:39
these replicas instead contain a copy of
00:19:41
their respective shards or different
00:19:43
hash slots. Additionally, this means you
00:19:45
can have multiple replicas per shard,
00:19:48
providing you high availability and
00:19:50
reducing the risk of data loss in
00:19:52
cluster mode. This does mean however
00:19:54
that you'll want to ensure that each of
00:19:55
these replicas is on a different machine
00:19:58
than the primary and ideally from each
00:20:00
other as well and ultimately means your
00:20:02
total Redis system is going to become
00:20:05
more complex. Fortunately, there are
00:20:07
tools out there such as IaC, Kubernetes,
00:20:10
or managed providers to help make this
00:20:12
complexity more manageable. In my case,
00:20:15
when it came to deploying a cluster onto
00:20:17
a number of VPS instances, I ended up
00:20:20
writing the following Terraform
00:20:21
configuration. Well, actually, it's an
00:20:23
OpenTofu configuration, which I'm now
00:20:26
pinning on my wall instead. Regardless,
00:20:28
this configuration allows me to deploy a
00:20:30
Valkey cluster onto one of two different
00:20:33
providers, either DigitalOcean or
00:20:35
Hetzner, which I used to see if I could
00:20:38
reach 1 million operations per second.
00:20:41
As it turned out, it was a little bit
00:20:43
more challenging than I thought.
00:20:45
However, before we take a look at
00:20:46
whether or not I was able to achieve
00:20:48
this on the public cloud, let's take a
00:20:50
quick look at another way to achieve 1
00:20:53
million requests per second. One that is
00:20:55
actually a lot simpler than setting
00:20:57
up a Valkey cluster. This is through
00:20:59
using the sponsor of today's video,
00:21:02
Dragonfly DB, who, as I mentioned
00:21:04
before, provide a drop-in replacement
00:21:06
for Redis that boasts greater
00:21:08
performance. To show how much
00:21:09
of a performance improvement Dragonfly DB
00:21:11
offers, if I go ahead and deploy an
00:21:13
instance of it onto my small machine,
00:21:16
followed by performing the same
00:21:17
benchmark test we've been using before,
00:21:19
you can see I'm hitting about 250,000
00:21:22
requests per second, which isn't that
00:21:24
much of an improvement compared to the
00:21:26
existing instance I was using before.
00:21:28
However, if I go ahead and now deploy
00:21:29
this on my mid machine, you can see the
00:21:32
performance improvement is now
00:21:34
substantial. This time I'm running about
00:21:37
twice as fast as I was before, which
00:21:39
makes a lot of sense given there's twice
00:21:40
as many cores on this machine. As you
00:21:43
can see, by using Dragonfly DB, which
00:21:45
makes better use of the available
00:21:46
resources on a system, we're able to now
00:21:49
vertically scale our system compared to
00:21:51
just using a single-core implementation
00:21:53
like we were before. So, let's see what
00:21:55
happens if we run Dragonfly on the big
00:21:58
boy Thread Ripper. Can we hit 1 million
00:22:00
requests per second by just running a
00:22:03
single instance in standalone mode?
00:22:05
Turns out we can, by just a hair. So, 1
00:22:09
million requests achieved by just using
00:22:12
vertical scaling. And this number can
00:22:14
actually go even further. In fact, the
00:22:17
team at Dragonfly managed to reach 6
00:22:19
million requests per second when it came
00:22:21
to running on a VPS. All of this is
00:22:23
achieved without having to worry about
00:22:25
setting up multiple instances or any of
00:22:28
the caveats that come from using cluster
00:22:30
mode. That being said, there are still
00:22:33
some good reasons to embrace the
00:22:34
complexity of clustering, such as when
00:22:37
you want to distribute your key set
00:22:38
across multiple machines, either for
00:22:40
scaling memory or for redundancy.
00:22:43
Dragonfly itself does provide a way to
00:22:45
horizontally scale using their own swarm
00:22:48
mode, which was announced a short while
00:22:50
before filming this video. So, if you're
00:22:53
interested in Dragonfly DB as a drop-in
00:22:55
replacement for Redis that offers
00:22:57
greater performance, then check them out
00:22:59
using the link below. Okay, so now that
00:23:02
we've seen how to reach 1 million
00:23:03
requests per second in perfect
00:23:05
conditions, let's take a look at how
00:23:07
difficult it is to achieve this on
00:23:09
something less perfect, the cloud. As I
00:23:12
mentioned before, I'd managed to set up
00:23:13
a Terraform, or OpenTofu, configuration
00:23:16
for both DigitalOcean and Hetzner, which
00:23:19
you can actually download yourself from
00:23:20
GitHub if you want to try it out. Just
00:23:23
remember, don't leave these instances
00:23:25
running for a long time, or you'll end
00:23:27
up with a large bill at the end of the
00:23:30
month. So there are some instructions on
00:23:32
how to actually deploy this Terraform
00:23:33
configuration on the actual repo itself,
00:23:36
but at a high level, you mainly just
00:23:38
need to set your API token and SSH key
00:23:40
values in the tfvars file for whichever
00:23:43
cloud provider you want to use, followed
00:23:46
by then running the tofu apply command.
00:23:49
Once you're done with your testing, in
00:23:51
order to clean everything up, go ahead
00:23:52
and use the tofu destroy command just to make sure you don't go bankrupt.
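For reference, the workflow is roughly the following (the actual variable names to set live in the repo's example tfvars file):

tofu init
tofu apply
# ...run your benchmarks...
tofu destroy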
00:23:55
As I
00:23:57
mentioned, I ended up writing an
00:23:59
OpenTofu configuration to deploy a Valkey
00:24:02
cluster to either DigitalOcean or
00:24:04
Hetzner. However, this wasn't my original
00:24:07
plan as I had intended to only use one
00:24:10
of these providers, Hetzner. But for some
00:24:14
reason, I couldn't seem to get anywhere
00:24:16
close to 1 million operations per
00:24:18
second, instead barely exceeding
00:24:22
100,000 no matter what I did or how I
00:24:24
had the cluster configured. Overall, it
00:24:28
was kind of strange and my best guess as
00:24:31
to why this was happening was because I
00:24:33
was using shared vCPUs. So, I went about
00:24:36
migrating to dedicated ones instead.
00:24:39
However, because my account was too new
00:24:41
to request a limit increase, I was
00:24:43
unable to provision any more dedicated
00:24:45
vCPUs. So when it came to hitting a
00:24:48
million Valkey operations per second
00:24:50
using Hetzner, I was out of moves.
00:24:53
Therefore, I instead decided to try with
00:24:55
DigitalOcean, first reaching out to
00:24:58
support in order to get access to the
00:25:00
larger instance sizes so that I could
00:25:02
run benchmarking without having any
00:25:04
bottlenecking. Once I had my compute
00:25:06
limits increased, I then deployed the
00:25:08
cluster using tofu apply and SSHed into my
00:25:11
benchmark box. Once I had everything set
00:25:14
up and the cluster was deployed, I then
00:25:17
went about running the memtier_benchmark
00:25:19
command. And on my initial attempt,
00:25:22
which had nine Valkey nodes inside of
00:25:23
the cluster and a 16-vCPU machine
00:25:26
for the benchmarking tool, I was hitting
00:25:28
around 450,000 requests per second. Not
00:25:32
bad. After confirming that it was the
00:25:34
benchmark tool that was bottlenecking, I
00:25:36
decided to scale up the instance it was
00:25:38
running on to one with 32 vCPUs. This
00:25:42
time when I ran the benchmarking tool
00:25:43
with 32 threads, I was getting around
00:25:47
900,000 off the get-go. However, as
00:25:50
time went on, this number would start to
00:25:52
decrease down to about
00:25:54
800,000. So, I decided to go all-in and
00:25:58
scaled up the number of Valkey nodes I
00:26:00
had from 9 to 15. After doing a quick
00:26:03
tofu apply and seeing all of the
00:26:05
instances come through on the DigitalOcean
00:26:07
dashboard, I SSHed in and ran the
00:26:10
memtier_benchmark command
00:26:13
again. Success. I was now hitting a
00:26:16
sustained 1 million operations per
00:26:19
second.
00:26:22
With that, my goal had been achieved,
00:26:24
and all it took was for me to deploy 15
00:26:26
Valkey instances on premium Intel nodes,
00:26:30
which, had I left them running, would have only
00:26:31
cost me around $1,100 a month. Yeah, a
00:26:35
little out of my infrastructure budget.
00:26:38
Just for fun, I decided to see how much
00:26:40
throughput I could get by using a
00:26:42
pipeline of 100 when it came to this
00:26:44
setup, which ended up producing around
00:26:47
14 million operations per second. So
00:26:50
yeah, that just goes to show how much
00:26:51
improvement you can get when it comes to
00:26:53
reducing the round trip time by using
00:26:55
pipelining with Redis. In any case, all
00:26:58
that remained was to tear down my setup
00:27:00
using tofu destroy. And with that, I had
00:27:03
managed to achieve my goal, hitting 1
00:27:05
million requests per second, both using bare metal
00:27:08
on my home network, which honestly is
00:27:09
kind of cheating, and through using a
00:27:12
VPS instances on DigitalOcean, using
00:27:15
private networking in the data center.
00:27:18
In any case, I want to give a big thank
00:27:19
you to Dragonfly DB for sponsoring this
00:27:22
video and making all of this happen.
00:27:24
Without them, I wouldn't have been able
00:27:25
to spend so much cash on VPS instances
00:27:28
for testing. Additionally, if you want
00:27:30
to use a high-performance Redis
00:27:32
alternative without managing your own
00:27:34
infrastructure, then Dragonfly also
00:27:36
offers a fully managed service,
00:27:38
Dragonfly Cloud, which runs the same
00:27:41
code as if you were self-hosting, but
00:27:43
just handles all of the operational
00:27:45
heavy lifting for you. So, if you want a
00:27:48
hassle-free caching solution, then check
00:27:50
it out using the link in the description
00:27:52
below. Otherwise, I want to give a big
00:27:55
thank you for watching, and I'll see you
00:27:56
on the next one.