Which markets can the described system cover?

The system can be used for stock markets, cryptocurrencies, and forex trading.

What is the main function of the matching engine?

The matching engine matches buy and sell orders efficiently.

What properties must the matching engine have?

It must be able to add, cancel, split, and match orders efficiently, ideally in O(1) or O(log n) time.

Why must the matching engine perform its operations in memory?

To work efficiently and process thousands of orders per second, the engine must do its work in memory rather than on disk or in a database.

What approaches are there for ensuring system reliability?

Running active and passive matching engines, and periodically saving the system state as a backup.

STOCK EXCHANGE SYSTEM DESIGN | AMAZON INTERVIEW QUESTION DESIGN STOCK EXCHANGE

00:35:02

https://www.youtube.com/watch?v=dUMWMZmMsVE

Summary

TL;DR: The video explains in detail the software architecture of a stock exchange system that also supports cryptocurrency and forex trading. It covers the system requirements, including processing buy and sell orders in real time and managing risk. The core of the system is the matching engine, which uses data structures such as lists, linked lists, or heaps to match orders efficiently. The design also discusses components such as API gateways, risk management systems, and database storage, aimed at high availability and low latency. It describes an approach to reliability in which ZooKeeper tracks which matching engine is active and which is passive. Finally, the video explains how real-time updates are delivered to users and high-frequency traders.

Key takeaways

📈 Exchange system for stocks, crypto, and forex
⚙️ Matching engine for order processing
🕒 Real-time price information required
📉 Risk analysis before order execution
💡 Lists or heaps used inside the engine
🌐 API gateways for request handling
🔄 High availability and scalability
🗂️ ZooKeeper used for reliable operation
📊 Real-time updates for users and HFT
💾 State kept in memory for efficiency

Timeline

00:00:00 - 00:05:00
Marina introduces the software architecture and system design for stock trading. The same design also applies to cryptocurrencies and forex trading. The requirements cover buying and selling, real-time price updates, viewing holdings and order history, and risk management. The goals are processing 0.1 million (100,000) orders per second, low latency, and high availability.
00:05:00 - 00:10:00
The heart of the system is the matching engine, which brings buy and sell orders together. It takes incoming buy and sell orders, keeps buy orders sorted by descending price and sell orders by ascending price, and executes them by matching the best prices and the available quantities. Data structures such as arrays, linked lists, and heaps are used to keep this efficient.
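
The matching step can be sketched in Python with two heaps. This is a minimal sketch under assumptions, not the video's implementation: the `OrderBook` class, its field layout, and the rule that a trade executes at the sell price (as in the video's walkthrough, where a $10 buy meets an $8 sell) are illustrative choices, and order cancellation is left out.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count

_seq = count()  # arrival counter, breaks price ties first come, first served


@dataclass
class OrderBook:
    # buys is a max-heap (price stored negated), sells is a min-heap.
    buys: list = field(default_factory=list)
    sells: list = field(default_factory=list)

    def add(self, side: str, price: float, qty: int, order_id: str) -> list:
        """Queue an order, then return the trades it triggers."""
        key = -price if side == "buy" else price
        heap = self.buys if side == "buy" else self.sells
        heapq.heappush(heap, [key, next(_seq), qty, order_id])
        return self.match()

    def match(self) -> list:
        trades = []
        # A trade is possible while the best bid is at or above the best ask.
        while self.buys and self.sells and -self.buys[0][0] >= self.sells[0][0]:
            buy, sell = self.buys[0], self.sells[0]
            qty = min(buy[2], sell[2])  # a partial fill splits the larger order
            trades.append((buy[3], sell[3], sell[0], qty))  # (buy id, sell id, price, qty)
            buy[2] -= qty
            sell[2] -= qty
            if buy[2] == 0:
                heapq.heappop(self.buys)
            if sell[2] == 0:
                heapq.heappop(self.sells)
        return trades


book = OrderBook()
for i, (price, qty) in enumerate([(8, 5), (9, 5), (9, 5), (9, 5)], start=1):
    book.add("sell", price, qty, f"s{i}")
print(book.add("buy", 10, 5, "b1"))   # fills against the $8 sell order
print(book.add("buy", 9, 10, "b2"))   # split across two $9 sell orders
```

Inserting into a heap costs O(log n) and reading the best price is O(1), which is why heaps fit the stated efficiency goal. Reducing the quantity of the order at the top of a heap is safe here because the quantity is not part of the ordering key.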
00:10:00 - 00:15:00
Choosing suitable data structures is necessary for these operations to run efficiently. All data must be processed in memory to maximize speed. The structures must be serializable so that the system state can be backed up periodically and restored after a crash, allowing processing to resume after a failure.
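
The periodic backup of serializable state can be sketched as follows. This is a hedged illustration, not the video's code: a local JSON file stands in for whatever cache or persistent store is used, and the function names and the order fields (`id`, `price`, `quantity`, `status`) are assumptions based on the fields listed in the transcript.

```python
import json
import os
import tempfile


def save_snapshot(state: dict, path: str) -> None:
    """Write the order book state to disk without leaving partial files."""
    # Write to a temporary file first, then rename it over the target,
    # so a crash in the middle of a write never corrupts the last snapshot.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def load_snapshot(path: str) -> dict:
    """Rebuild the in-memory state after a restart."""
    with open(path) as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "book.json")
    state = {
        "buys": [{"id": "b3", "price": 8, "quantity": 5, "status": "queued"}],
        "sells": [{"id": "s4", "price": 9, "quantity": 5, "status": "queued"}],
    }
    save_snapshot(state, path)
    print(load_snapshot(path) == state)
```

A snapshot alone only recovers the state as of the last backup. Orders that arrived afterwards would have to be replayed from the queue, which is one reason the design also runs a passive engine in parallel.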
00:15:00 - 00:20:00
The system design diagram shows the data flow from the user through the API gateway to risk management, the router, and the matching engine. Unique IDs are assigned so orders can be tracked. The risk management system checks that each order is safe to accept, based on predefined parameters such as price, size, and capital. Orders are routed to queues according to the stock they concern, one queue per stock.
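
The path from ID assignment through the risk check to the per-stock queue can be sketched in a single process. Everything here is an assumption made for illustration: the limits `MAX_QTY` and `MAX_NOTIONAL`, the function names, and the use of in-process deques in place of the Kafka or SQS queues mentioned in the video. Real risk systems check far more than this.

```python
from collections import defaultdict, deque
from itertools import count

_next_id = count(1)          # accumulator: in-memory sequence numbers, no database
queues = defaultdict(deque)  # router target: one queue per stock symbol

# Hypothetical risk limits, chosen only for this example.
MAX_QTY = 10_000
MAX_NOTIONAL = 1_000_000


def passes_risk(order: dict) -> bool:
    """Accept only orders within the predefined price, size, and capital limits."""
    return (
        order["price"] > 0
        and 0 < order["quantity"] <= MAX_QTY
        and order["price"] * order["quantity"] <= MAX_NOTIONAL
    )


def submit(order: dict):
    """Assign a unique ID, run the risk check, then route by symbol."""
    order = {**order, "id": next(_next_id)}
    if not passes_risk(order):
        return None  # rejected before it reaches any queue
    queues[order["symbol"]].append(order)
    return order["id"]


print(submit({"symbol": "GOOG", "side": "buy", "price": 10, "quantity": 5}))
print(submit({"symbol": "TSLA", "side": "sell", "price": 9, "quantity": 50_000}))
```

The ID is assigned before the risk check, matching the order of the components in the video, so a rejected order still consumes a sequence number.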
00:20:00 - 00:25:00
Matching runs in parallel: each queue is served by its own virtual machine running an instance of the matching engine. Active and passive matching engines provide high availability: both process the same orders, and the passive one takes over if the active one fails. ZooKeeper coordinates which instance is active.
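
The way sequential, ephemeral entries decide which engine is active can be illustrated with a toy registry. This is a pure-Python simulation of the idea and not the ZooKeeper client API: the `ElectionRegistry` class and the engine names are hypothetical.

```python
class ElectionRegistry:
    """Toy stand-in for ephemeral sequential entries in a coordinator."""

    def __init__(self):
        self._next_seq = 0
        self._members = {}  # sequence number -> engine name

    def register(self, engine: str) -> int:
        """Called when an engine connects; returns its sequence number."""
        seq = self._next_seq
        self._next_seq += 1
        self._members[seq] = engine
        return seq

    def disconnect(self, seq: int) -> None:
        """An ephemeral entry disappears when its connection is lost."""
        self._members.pop(seq, None)

    def active(self):
        """The lowest surviving sequence number marks the active engine."""
        return self._members[min(self._members)] if self._members else None


registry = ElectionRegistry()
first = registry.register("engine-dc1")
registry.register("engine-dc2")
print(registry.active())       # engine-dc1 registered first
registry.disconnect(first)
print(registry.active())       # engine-dc2 takes over
registry.register("engine-dc1")
print(registry.active())       # engine-dc2 stays active, dc1 is now passive
```

Because both engines consume the same queue, the engine that becomes active already holds the same in-memory state and only needs to start writing its output downstream.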
00:25:00 - 00:35:02
Once a match is found, the matching engine sends the trade and price-change data to servers that update the databases and user interfaces. Payments are integrated so that users' accounts are debited or credited for each transaction. Measures are in place to persist data and to restore the system after a crash.
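
The transcript describes a primary data server that turns matched trades into price ticks and writes them to a database sharded by stock, with hash sharding as an alternative when there is less data. A rough sketch of both steps follows. The shard count, the record layout, and the function names are assumptions, and a tick is modelled here simply as a change in traded price relative to the previous trade.

```python
import zlib

NUM_SHARDS = 8   # hypothetical shard count
last_price = {}  # symbol -> price of the previous trade


def shard_for(symbol: str) -> int:
    """Map a symbol to a shard with a hash that is stable across processes."""
    # Python's built-in hash() is randomized per process for strings,
    # so a checksum is used instead.
    return zlib.crc32(symbol.encode()) % NUM_SHARDS


def to_tick(symbol: str, price: float):
    """Return a tick record when the traded price moved, else None."""
    prev = last_price.get(symbol)
    last_price[symbol] = price
    if prev is None or prev == price:
        return None  # first trade sets the baseline; unchanged price is no tick
    return {
        "symbol": symbol,
        "price": price,
        "change": price - prev,
        "shard": shard_for(symbol),
    }


to_tick("GOOG", 8)
print(to_tick("GOOG", 9))
```

Keeping all rows for one stock on one shard means reads and writes for that stock never have to touch more than one database node.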

Subtitles

00:00:00
hello everyone my name is marina and
00:00:01
in this session let's understand software
00:00:03
architecture or system design for stock
00:00:05
exchange and this system design also
00:00:08
holds good for crypto currency or forex
00:00:11
trading as well
00:00:13
thanks a lot for all the support until
00:00:15
now and in future as well if you like
00:00:19
the content on this channel and if you'd
00:00:20
like to share or buy a cup of coffee for
00:00:24
me you can do so by joining to the
00:00:26
channel
00:00:29
in return I have custom emojis for
00:00:32
Prada Me's and also I'm gonna feature
00:00:35
your name in the end of the videos which
00:00:38
I am going to upload in future and
00:00:40
thanks a lot for the support we can use
00:00:43
this system wherever we have to match
00:00:45
the supply to the demand or vice versa
00:00:47
so here are the requirements for our
00:00:50
system and this is the goals of our
00:00:52
system design so the requirements which
00:00:54
we want to cover in this system design
00:00:56
is users should be able to buy and sell
00:00:59
stocks on this platform the second one
00:01:01
is real-time price update information
00:01:03
because we need to keep on sending the
00:01:06
real-time price information to the users
00:01:08
for the fact that users will be looking
00:01:11
at the price fluctuation and then he is
00:01:13
going to place an order to buy or sell
00:01:16
so it has to be real-time and the third
00:01:18
one is we should have basic
00:01:19
functionality like look at all the
00:01:21
stocks which he owns and also the
00:01:23
order history or information and
00:01:25
the fourth one is risk management we
00:01:27
have to have a system which analyzes
00:01:29
the risk before executing the order
00:01:31
matching and what are the goals of our
00:01:34
system so the first one is we have to
00:01:36
process at least you know 0.1 million
00:01:39
orders per second
00:01:41
second one is thousands of stocks or
00:01:44
registered companies are on the platform
00:01:46
so we have to process orders for all of
00:01:48
them with low latency and the third one
00:01:51
is geo-specific
00:01:52
we don't have to design the system to
00:01:54
cater users from all over the world
00:01:56
because usually the exchanges are
00:01:58
specific to a country even though
00:02:00
there are users from outside the
00:02:02
country it's fine
00:02:03
our goal here is to design the system
00:02:06
specifically for users in a specific geo
00:02:08
and the fourth one is low latency we
00:02:10
have to process this order matching as
00:02:13
fast as possible it could be in an
00:02:15
around 300 milliseconds something like
00:02:18
that and the fifth one is high
00:02:19
availability or scalability where in
00:02:21
this system should be always available
00:02:23
for users to place orders to buy or sell
00:02:26
and also the system should be able to
00:02:28
add more companies or stocks into the
00:02:31
platform and keep on scaling linearly
00:02:34
first let's design the core component of
00:02:37
our system that is matching engine but
00:02:40
before understanding the matching
00:02:42
engine you need to understand what
00:02:43
exactly stock exchange does a stock
00:02:46
exchange is a place which is usually a
00:02:49
regulated place where people come
00:02:51
together and then they have something to
00:02:54
buy and something to sell and the
00:02:56
matching engine is going to match those
00:02:58
two together and how does it look like
00:03:00
it usually looks like this if you see
00:03:03
this image is for the reliance stock
00:03:06
which has these many orders to bid
00:03:10
or buy and the right side shows the
00:03:14
orders to be sold and this one is the
00:03:17
screen shot taken from the crypto
00:03:18
currency exchange where people want to
00:03:20
trade bitcoins and it all boils down to
00:03:23
this one called as order book which
00:03:25
usually looks like this where in this
00:03:29
example we can see that there are four
00:03:31
columns the buy quantity is the number
00:03:34
of shares waiting to be bought and the buy
00:03:37
price is the price which a buyer is
00:03:39
willing to pay for this particular share
00:03:42
and the sell price on the right side
00:03:45
is the price which a seller is quoting
00:03:48
for a share the sell quantity is the
00:03:51
number of shares you want to sell so in
00:03:54
total we have around 500 shares waiting
00:03:56
to be bought at a rate of 5,250 and
00:04:00
we have 1,650 shares waiting
00:04:02
to be sold at five thousand two hundred
00:04:04
and fifty two so now let's design our
00:04:08
matching engine so the characteristic of
00:04:10
our matching engine should be it should
00:04:12
be able to add an order it should be
00:04:14
able to cancel the order if I don't want
00:04:16
to buy or sell and it should be able to
00:04:18
split an order that means that if I want
00:04:20
to buy ten stocks and if nobody is
00:04:23
selling 10 stocks as a whole our
00:04:26
matching engine should be able to buy in
00:04:29
a split or whatever it can and it should
00:04:32
be able to match obviously and the
00:04:34
second one is it should be able to do
00:04:37
all of these actions much efficiently
00:04:40
ideally it is always good to do it in
00:04:42
order of one or in log n time
00:04:45
and it should do everything reliably
00:04:48
so how does the matching engine work
00:04:51
it's not really complicated it's very
00:04:53
simple consider you have these orders to
00:04:56
buy okay
00:04:57
usually people try to buy the shares
00:05:02
or the stocks or any commodities at
00:05:04
cheap price so the winner is who wants
00:05:07
to buy at the higher price so he's going
00:05:10
to stand out right because anyone who is
00:05:13
trying to sell something he doesn't want
00:05:14
to sell cheap so usually the sellers as
00:05:17
well will try to sell it with higher
00:05:20
price but the people who stand out in
00:05:23
sell queue is the one who wants to sell it
00:05:26
cheaply so the buy order list will be
00:05:31
sorted in the decreasing order where the
00:05:35
buyer with higher price or the buyer who
00:05:38
wants to buy a high price will stand out
00:05:40
okay here in this case there are a
00:05:42
couple of orders or people who wants to
00:05:44
buy some stocks and here someone want to
00:05:47
buy an order with price of $10 and he
00:05:51
wants to buy 5 stocks and here with
00:05:53
price of $9 he wants to buy 10 this guy
00:05:56
with the price of $8 he wants to buy 5
00:05:59
stocks and in sell queue there are a lot
00:06:02
of orders here and we want to sort this
00:06:06
in increasing order and this one in the
00:06:09
decreasing order
00:06:10
so here the guys or
00:06:13
the orders which stand out are the
00:06:15
ones who want to sell cheaper so in this
00:06:18
case there are orders with
00:06:21
$8 and 5 quantity $9 and 5 quantity and $9 5
00:06:26
quantity $9 5 quantity because no one
00:06:28
wants to sell it for cheap right
00:06:30
because most likely the prices will go
00:06:32
is same or it will be always higher so
00:06:36
when we sort it in the increasing order
00:06:38
all the other orders which are selling
00:06:42
at really high price will go in the end
00:06:44
the orders who wants to sell it
00:06:46
at the cheaper price will come in the
00:06:48
front in this case the guys or
00:06:51
the orders who want to buy at higher price
00:06:54
will come in the starting of the list
00:06:57
here so now how to match it it's very
00:07:00
simple it's just like the array elements
00:07:02
matching that's simple so what we have
00:07:05
to do is take the first order from this
00:07:08
this and then try to match from the
00:07:10
starting of the cell list so $10 and 5
00:07:14
so someone wants to buy five shares at
00:07:16
$10 price but someone is also selling 5
00:07:20
shares with $8 price now it's definitely
00:07:24
a good match even we are getting for $2
00:07:27
cheap now we can match this because the
00:07:29
quantities are also matching that means
00:07:32
that this and this can be matched so the
00:07:36
the buyer who wanted to buy at $10 he is
00:07:39
actually getting it only for $8 and
00:07:41
these two are matched and we can execute
00:07:43
this consider we executed so I'm gonna
00:07:46
remove this from the list okay now those
00:07:51
two orders are gone and who is standing
00:07:53
out here so there is a guy who wants to
00:07:55
buy at $9 and 10 stocks and someone
00:07:59
selling some stocks with $9 price and
00:08:02
five quantity now we can definitely
00:08:03
match this and how do we match it is
00:08:06
this one and this one get matched okay
00:08:08
and the problem here is the quantity now
00:08:12
he wants to buy 10 and here there is an
00:08:15
order where the guy wants to sell only 5
00:08:18
shares only but there are a couple more
00:08:21
orders here where you can buy from but
00:08:23
we can't really match like that what
00:08:25
we do is we only purchase 5 stocks
00:08:29
from here okay and then we can either
00:08:33
update this to 5 or other strategy is
00:08:38
totally remove this and then we add it
00:08:41
into this list so it will come back here
00:08:44
with you know $9 and 5 why it will come
00:08:48
back is every time when the new order
00:08:49
comes into the list will always be
00:08:51
sorting in the descending order and here
00:08:54
we always sorted in the increasing order
00:08:56
so the same order which was split comes
00:08:59
back and sits in the front of this list
00:09:01
and we can match it now now this and
00:09:03
this will get matched and these two can
00:09:04
be executed okay and these two are
00:09:06
executed and similarly we can keep on
00:09:08
executing and this list will keep on
00:09:10
burning and as and when the new order
00:09:11
comes in the list will also be inserted
00:09:14
with the new orders like say someone
00:09:16
wants to buy at say $10 20 shares and
00:09:20
if there is something we can
00:09:21
match this now $10 $9 $1 cheap you can
00:09:24
keep matching and then executing so what
00:09:27
are the data structure we can use to do
00:09:29
with this the first one is you can use
00:09:31
lists and do this the second one is you
00:09:35
can use linked lists and the third one
00:09:37
is you can use heaps okay
00:09:41
now in linked lists you can keep the
00:09:43
order of this list in increasing and
00:09:46
decreasing because we don't have to sort
00:09:48
it because when we start usually we
00:09:51
start from one order and then when we
00:09:53
keep on appending we can always check
00:09:56
and then insert it and the same thing
00:09:59
with the list as well we start with
00:10:01
empty lists and then as and when the
00:10:03
entries are inserted we keep on
00:10:05
searching for the proper place where
00:10:08
that order should be inserted and then
00:10:09
we keep inserting based on the price now
00:10:12
the time complexity is you know right in
00:10:14
the in this case of Lists
00:10:15
the insertion deletion will all be order
00:10:19
n but in case of access it will be order
00:10:22
of 1 but in case of linked list it will
00:10:24
be in order of 1 where this is the ideal
00:10:27
candidate where you can use and also you
00:10:29
can use heaps where you can use max heap
00:10:32
and min heap so in this case you can use
00:10:37
max heap because we always want the
00:10:41
order with the highest price so we can
00:10:43
use max heap here and we can use min heap
00:10:45
here and then we can perform this in
00:10:48
log n time as well couple of things to
00:10:52
remember here we need to make all of
00:10:54
this operation in memory there is no
00:10:57
escape to that the reason is we want to
00:11:00
perform all of this so efficiently as
00:11:03
soon as we write this into database this
00:11:06
list into the database or into the disk
00:11:08
we can't really function efficiently
00:11:11
or we can't really process thousands of
00:11:14
orders or millions of orders per second so
00:11:16
key takeaway is we have to consider to
00:11:20
do the whole thing always in memory and
00:11:24
the second thing is we need to consider
00:11:27
that these data structures or whatever
00:11:31
we are doing should be serializable
00:11:33
because we always
00:11:35
want to keep this state somewhere
00:11:37
periodically we want to take a backup
00:11:39
and then keep it in the cache so in case
00:11:41
of our matching engine crashes or the
00:11:45
machine where this matching engine is
00:11:47
running crashes we can go back to the
00:11:50
persistent storage and we can populate
00:11:52
the list which we had and all the orders
00:11:55
in it so we get back to the state so you
00:11:58
can think of the whole thing as a state
00:12:01
machine where it does all of these
00:12:04
things so the next thing we need to
00:12:05
understand is what are the information
00:12:07
which we need to capture for a specific
00:12:09
order if it if it was a linked list in
00:12:12
that case in each node what are the
00:12:14
information we need to save or if it was
00:12:16
a list in that case list of object or
00:12:19
list of dictionary or list of tuple what
00:12:21
is the information we want to capture
00:12:23
the first thing is we need to capture
00:12:25
the ID of the order and the next thing
00:12:27
is the price of the order at which we
00:12:29
want to sell or buy the quantity of the
00:12:32
stocks or the shares which we want to
00:12:35
buy and the status of that usually
00:12:38
queued and once we execute it maybe we
00:12:40
can change it as executed and once we
00:12:42
delete it we will not need the status
00:12:45
anyway but if you have any other
00:12:47
use cases you can use that and of course
00:12:49
metadata where you want to store any
00:12:51
other extra information so this could
00:12:55
look like if you're using lists it will
00:12:57
be looking like either struct or
00:13:00
dictionary or it could be even tuple so
00:13:03
it will be looking like this tuple with
00:13:06
only values in it something like that
00:13:08
but if it was a node we'll have all of
00:13:11
these keys in it and you are basically
00:13:13
linking it all together let's understand
00:13:17
the system design diagram in here you
00:13:19
can see there are users and there is hft
00:13:22
users are like any normal user who uses
00:13:26
UI to interact with the system whereas
00:13:29
hft basically uses API calls directly
00:13:33
what does hft stands for is
00:13:35
high-frequency trading what happens here
00:13:38
is the companies or individual persons
00:13:40
will usually own a couple of servers
00:13:44
and they run sophisticated algorithms
00:13:47
or machine
00:13:48
learning techniques to identify when to
00:13:51
purchase an order and when to sell it
00:13:54
usually as users what we do is we see
00:13:58
the information on the UI and take a
00:14:00
decision but in this case HFT consumes
00:14:02
all the price changes and how many
00:14:05
stocks were sold and bought all of this
00:14:07
information and based on that the
00:14:10
algorithms decide when to place an order
00:14:13
and when to sell those stocks for the
00:14:15
profit usually companies make a lot of
00:14:18
money like crores and crores of money
00:14:20
using
00:14:21
hft so that comes to the first entry
00:14:25
point for all our API is API gateway the
00:14:29
hft might be placing a call here or
00:14:31
users using the UI or mobile apps could
00:14:34
be placing calls through HTTP REST API
00:14:39
calls to API gateway so what is the
00:14:42
functionality of API gateway so API
00:14:45
gateway takes all the API calls from the
00:14:47
clients and hft basically all of them
00:14:51
are clients and then this API gateway
00:14:54
routes them to appropriate services here
00:14:58
I have only shown the accumulator and
00:15:01
risk management system but internally
00:15:04
they could be having a number of micro
00:15:06
services they want to make call in this
00:15:10
case just just for your understanding
00:15:13
API gateway will mainly perform some of
00:15:15
the actions like protocol transformation
00:15:18
where in all of these APIs
00:15:22
where in all of these APIs are exposed
00:15:26
using HTTP but internally they might be
00:15:29
using some other protocol it could be
00:15:30
RPC it could be another rest call or it
00:15:35
could be anything it doesn't matter in
00:15:38
here usually all the API will be
00:15:41
interacted using HTTP or HTTPS and
00:15:45
inside it could be only HTTP so all the
00:15:49
security layer will be dropped here
00:15:50
there could be a firewall as well in the
00:15:56
front that is a web application firewall or WAF you
00:16:00
can call it as
00:16:01
once the request is sent from API
00:16:05
gateway it goes through accumulator what
00:16:10
happens in accumulator is all the orders
00:16:13
are given a sequence number based on a
00:16:16
lot of different criteria for your
00:16:18
understanding
00:16:19
all you have to understand is when the
00:16:21
order comes in it needs a unique ID the
00:16:24
unique ID is basically assigned at this
00:16:26
point of time and you could be thinking
00:16:29
why in the world we are assigning a unique ID
00:16:33
over here why not when we save in the
00:16:35
databases because you will understand a
00:16:38
lot of our functionality over here is
00:16:41
done all in memory without using
00:16:43
database so we need a unique
00:16:45
identification number so that will be
00:16:48
happening here so we will basically
00:16:50
assign a unique ID for any order
00:16:54
request it could be buy or a sell order
00:16:56
they will assign a unique ID over here and
00:16:59
then that order information will be sent
00:17:02
to risk management system or it is also
00:17:06
called as RMS usually what it does
00:17:10
is it makes sure that the order is safe to
00:17:15
be entered into the whole system and
00:17:17
there are so many checks which are
00:17:20
mostly to check if the order is within
00:17:23
the predefined parameters of price size
00:17:26
capital etc so these are the
00:17:31
list of risk management checks usually any
00:17:35
stock market will do for any given order
00:17:39
I will be posting some of the links in
00:17:41
the description you can go and read all
00:17:43
about it
00:17:45
so once the order passes through the
00:17:48
risk management if it is safe to be
00:17:50
processed that order will actually go to
00:17:53
the router what happens in the router is
00:17:57
based on the kind of stock you want to
00:18:01
sell or buy it will be redirected to one
00:18:04
of the queues so we will be having n number
00:18:06
of queues which is equivalent to the
00:18:09
number of companies we will be handling
00:18:11
so if suppose we are handling 1000
00:18:16
companies in our system obviously will
00:18:20
be having 1000 stocks to handle so there
00:18:23
will be 1,000 queues over here so all the
00:18:26
requests say for example all the
00:18:28
requests for Google stock will be
00:18:31
redirected over here all the orders you
00:18:34
know buy orders and sell orders for
00:18:37
Tesla will be redirected over here so at
00:18:40
any given point of time in any of the
00:18:42
queue only similar kind of stocks will
00:18:45
be there so this router could be
00:18:49
anything you could implement using Kafka
00:18:52
streams if you are using Kafka queue
00:18:55
otherwise if you're on AWS maybe SNS can
00:18:59
also act as a router based on your
00:19:04
routing key these queues could be sqs or
00:19:07
it could be Kafka as well once our order
00:19:12
is placed over here inside here and this
00:19:15
is picked up by our matching engine we
00:19:20
have already discussed how we match it
00:19:22
it is simple algorithm which runs
00:19:25
completely in memory so the idea over
00:19:30
here is so consider if you have three
00:19:35
queues so there will be one virtual
00:19:37
machine per queue okay so this VM is
00:19:42
responsible to handle Apple and this is
00:19:46
responsible to handle Google and this is
00:19:48
responsible to handle Tesla so this will
00:19:51
be having one process which will be
00:19:54
reading from the queue and then matching
00:19:59
by using two arrays one is for buy and
00:20:02
one is for sell so as we discussed the
00:20:05
algorithm it's the same algorithm or a
00:20:07
matching engine which will be running
00:20:09
inside this virtual machine which will
00:20:13
be using heaps or linked lists or
00:20:15
whatever array and keeps on consuming
00:20:18
from the queue and then matches it or if
00:20:21
you have very less orders per second
00:20:24
what you could do is you can just have a
00:20:27
couple of virtual machines and
00:20:29
then say for example it is this one
00:20:38
suppose if you're handling very less
00:20:41
traffic or orders per second you can
00:20:43
have a couple of virtual machines and
00:20:45
you can run you know one or more
00:20:49
instances of matching engine in in the
00:20:52
same virtual machine and you could be
00:20:55
consuming from multiple queues so say for
00:20:58
example Tesla Google and Apple you might
00:21:03
be consuming all of that into same
00:21:06
machine in different processes
00:21:10
you know different processes each
00:21:12
instance of matching engine will be
00:21:14
running here and then they will be
00:21:16
executing in the same machine but it all
00:21:20
depends on how you want to design it's
00:21:21
okay
00:21:23
it based on the requirement and traffic
00:21:28
so the idea here again is we've so any
00:21:33
at any given point of time we need two
00:21:36
matching engine instances per stock or
00:21:40
two matching engine processes per stock
00:21:43
and they should be deployed in
00:21:45
different machines and different data
00:21:48
centers in this case let's say for
00:21:51
Google will be having active matching
00:21:54
engine over here and also a passive
00:21:56
matching engine over here why do we need
00:21:59
that way because we need to have high
00:22:02
availability and also reliability if for
00:22:05
some reason if our active matching
00:22:08
engine fails then immediately the
00:22:12
passive matching engine will take over
00:22:14
and starts processing the orders and
00:22:17
then it does all the things which active
00:22:21
matching engine would have done we will be
00:22:24
needing zookeeper here because we need
00:22:28
someone to take care of who is the
00:22:31
active engine and it has to do its all
00:22:33
other responsibilities in in usual case
00:22:37
what happens is if both of these engines
00:22:40
like active engine passive engines are
00:22:42
always a
00:22:43
and running then both cannot be updating
00:22:47
into the databases only the active
00:22:50
engine should be updating all the data
00:22:52
into the database and backups and
00:22:54
everything so we need these systems to
00:22:58
understand who is the active matching
00:23:01
engine because both will be running they
00:23:03
don't have an idea of am i the active
00:23:06
matching engine so zookeeper will
00:23:08
basically help you to identify who is
00:23:10
the active matching engine zookeeper has
00:23:13
a concept called sequential znode and
00:23:17
also ephemeral znode what this is
00:23:22
a short living znode it only has an
00:23:27
entry for a given server when there is
00:23:29
always an active connection from that
00:23:32
machine in this case our ephemeral
00:23:34
znode will have two entries one for the
00:23:40
primary active matching engine and one
00:23:43
for the passive you know matching
00:23:45
engine in the sequential node whoever
00:23:48
registers first they will have an entry
00:23:50
and the second will be having so the
00:23:52
first one is always the active instance
00:23:54
if suppose the active machine get
00:23:57
disconnected then this will go away so
00:24:00
the second will become active in that
00:24:01
case so this will not be there until
00:24:04
unless it comes back again and then the
00:24:07
ID will be here and that will be the
00:24:09
passive and also one more interesting
00:24:13
thing you need to know is both of these
00:24:18
matching engines will be active active
00:24:20
that means both will be consuming ok
00:24:24
both will be consuming the messages
00:24:26
which is sent from this router to this
00:24:29
group all at the same time that means
00:24:31
that everything is done over here in the
00:24:35
same time it could be a little bit of
00:24:37
difference in the time or it could be
00:24:39
same but basically the idea over here is
00:24:41
whatever this active matching
00:24:45
engine is processing in the same time
00:24:47
the passive matching engine is also
00:24:49
processing the only difference is it
00:24:51
will not be writing or sending any
00:24:54
outputs to the underlying system but the
00:24:56
active
00:24:57
matching engine will keep on sending
00:24:59
why do we have to do that is if this
00:25:03
goes down this passive matching engine
00:25:06
should take over immediately so it is
00:25:07
already in the same state of the active
00:25:10
matching engine so this kind of design
00:25:12
is needed because we are doing
00:25:14
everything in memory so that's the
00:25:19
reason why we need both of these active
00:25:22
and passive matching engine to keep on
00:25:25
performing the same things so what
00:25:30
happens later there once the active
00:25:36
matching engine figures out a perfect
00:25:38
match for a buy order in the selling
00:25:42
array or linked list what it does is
00:25:45
yeah not here it actually sends a
00:25:49
message to another router saying that I
00:25:53
actually found a matching order and then
00:25:56
it is sent to a lot of different
00:25:57
partitions so in case of Kafka the N
00:26:02
number of threads or n number of nodes
00:26:05
in the consumer side we will be having n
00:26:07
number of partitions you can design here
00:26:10
as well something like how we did it
00:26:12
here like you can have individual queue
00:26:15
for each stock or you can do it as a
00:26:18
partition, and only this server consumes
00:26:21
every stock. What is the idea of the
00:26:25
primary data server? It basically
00:26:28
computes the stock ticks. A stock tick
00:26:33
looks like this: it is simply the
00:26:37
representation
00:26:41
or measurement of the minimum upward or
00:26:43
downward movement in the price on the
00:26:45
stock exchange. Once the order
00:26:49
is matched, the server receives the information
00:26:52
about the order or the trade that just
00:26:54
happened, and then it is going to convert
00:26:57
the price changes into ticks, and those
00:27:01
changes will be written into the RDBMS,
00:27:04
and the idea in the database is that
00:27:08
this RDBMS is sharded. You can
00:27:11
shard it by stock, so
00:27:15
basically you'll have one shard
00:27:18
per stock. So this could be the Apple shard,
00:27:24
this could be the Google shard, so you will
00:27:28
be maintaining thousands of shards. Or, if
00:27:31
you don't have too much data, you can use
00:27:34
hashed sharding or any other strategy;
00:27:37
it's up to you. Usually it's a good
00:27:39
idea to shard by the stock itself, so
00:27:42
you have all of a specific stock's data
00:27:46
in one shard and it will be easier to
00:27:49
scale and you can have more number of
00:27:51
writes and reads achieved in the same
00:27:53
shard itself and also this primary data
00:27:56
server is going to write to the
00:27:58
time series database, which will be used
00:28:01
to render graphs, like the
00:28:06
price variation at any given point in
00:28:10
time, and something like that.
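As a rough sketch of the two ideas just described (turning a price change into ticks, and picking a shard for a stock), something like the following could work. The tick size of 0.01, the function names and the use of a SHA-256 hash for the hashed-sharding variant are all assumptions made for illustration; the video does not specify any of them.

```python
import hashlib
from decimal import Decimal

# Assumed minimum price movement; real exchanges define this per instrument.
TICK_SIZE = Decimal("0.01")


def price_change_in_ticks(previous_price, trade_price, tick_size=TICK_SIZE):
    """Express the move from previous_price to trade_price as a whole number of ticks."""
    ticks = (trade_price - previous_price) / tick_size
    return int(ticks.to_integral_value())


def shard_for_symbol(symbol, shard_count):
    """Hashed sharding: map a stock symbol to a stable shard index.

    With one shard per stock, this lookup would simply be a
    dictionary from symbol to shard instead.
    """
    digest = hashlib.sha256(symbol.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


if __name__ == "__main__":
    print(price_change_in_ticks(Decimal("150.20"), Decimal("150.25")))
    print(shard_for_symbol("AAPL", 16))
```

Decimal is used instead of float so that the division by the tick size stays exact, and a cryptographic digest is used rather than Python's built-in hash() because the latter is randomized per process for strings.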
00:28:13
So time series databases are really good at that.
00:28:15
Who is going to consume that data?
00:28:17
The UI: the front-end application
00:28:21
needs to show graphs like this one, so it
00:28:25
always needs a time series
00:28:27
database. It is not really a good idea
00:28:29
to use an RDBMS for that pattern. Also,
00:28:33
all of this information is also consumed
00:28:35
by your stream processing engine where
00:28:37
you can do a lot of other streaming
00:28:41
processing, like fraud detection, other
00:28:44
machine learning, whatever you want to do.
00:28:46
Basically, you
00:28:49
will be having stream processing over
00:28:50
here. So I guess we did one cycle of
00:28:55
how an order starts its journey from an
00:29:00
API gateway into risk management, then the router,
00:29:02
then the queue; from the queue it goes
00:29:04
to the active matching engine; from the matching
00:29:07
engine it comes to the router here, and
00:29:10
then it is saved and everything. Meanwhile,
00:29:13
this primary data server
00:29:18
can talk to this payment system and
00:29:21
deduct the amount of money from
00:29:25
the user's wallet or account once the
00:29:28
matching is done and that information is
00:29:31
also updated over here. Okay, so we need a
00:29:35
payment system. That itself is a
00:29:37
big system, and I don't want to talk about it
00:29:40
now. If the user wants to add some
00:29:44
money, you can actually make this API
00:29:46
call and redirect them to the bank or whatever
00:29:50
it is,
00:29:51
so they'll be loading the money into
00:29:53
the payment wallet so this system will
00:29:56
basically take care of that and you can
00:29:59
also see that the UI is coupled with a
00:30:03
CDN, or rather these APIs can be
00:30:08
coupled to the CDN to cache some stuff,
00:30:10
like the graphs I showed you here,
00:30:13
because once this data is
00:30:16
there, it is always historical data, so
00:30:20
you don't need to really query the time
00:30:23
series database all the time so you can
00:30:25
cache that as well. The other
00:30:27
important thing we need to discuss is
00:30:29
the reliability of this matching
00:30:33
engine. We already spoke about the passive
00:30:36
and active engines. What happens in
00:30:38
the cases where both machines are down?
00:30:40
In that case, how things work is: at a
00:30:44
specific interval, say
00:30:47
once every minute, the data in the active
00:30:52
matching engine will be dumped into, you
00:30:56
know, Redis or any storage engine. The
00:31:00
idea here is, as I already mentioned in
00:31:04
the algorithm part, that the data structure,
00:31:06
whatever you use in your algorithm,
00:31:07
should be serializable.
00:31:10
Okay, so that way you will be dumping all
00:31:14
the recent data into snapshots. In
00:31:20
other words, in the worst case, if both
00:31:22
matching engines are down, when they
00:31:26
come back they can read
00:31:28
the last-known state of the machine from
00:31:32
here, and that way you will get
00:31:37
back all the
00:31:39
orders which were already in
00:31:41
the order book. One more thing you
00:31:43
also need to understand is that Kafka does
00:31:47
support transactions and manual
00:31:53
acknowledgments;
00:31:57
manual acknowledgments and transactions are features
00:31:59
which you can actually use to build
00:32:03
your matching algorithm as well. That way
00:32:06
you can make the system more
00:32:08
robust. Also, as and when you receive an
00:32:12
order in the active engine or passive
00:32:15
engine, you can also write a copy of the
00:32:18
order into Cassandra. This is not really
00:32:24
the job of the matching engine; there could
00:32:26
be one more thread running that
00:32:30
keeps on writing into Cassandra,
00:32:32
so that way, after you get the latest
00:32:35
information from the snapshot, you can
00:32:37
query the remaining order information
00:32:41
from Cassandra and then load it back
00:32:43
into your matching engine's memory so you
00:32:46
have the latest state available. I mean,
00:32:49
there are a lot of different
00:32:51
strategies you can use here; somehow you need
00:32:53
to persist the latest known state into
00:32:56
Redis and then get that information back.
00:32:59
So that's about it.
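The snapshot-plus-replay recovery described above can be sketched roughly as follows. This is only an illustration under stated assumptions: a JSON string stands in for the snapshot stored in Redis, a plain Python list stands in for the order copies written to Cassandra, and the per-order sequence number, the OrderBook class and its method names are hypothetical and not taken from the video.

```python
import json


class OrderBook:
    """Minimal serializable in-memory order store (not a real matching engine)."""

    def __init__(self):
        self.orders = {}    # order id (str) -> order details (dict)
        self.last_seq = 0   # sequence number of the last order applied

    def apply(self, seq, order):
        """Add an order, skipping anything already covered by the current state."""
        if seq <= self.last_seq:
            return
        self.orders[order["id"]] = order
        self.last_seq = seq

    def snapshot(self):
        """Serialize the whole state; this is what would be dumped periodically."""
        return json.dumps({"last_seq": self.last_seq, "orders": self.orders})

    @classmethod
    def restore(cls, blob, order_log):
        """Rebuild state from the last snapshot, then replay the newer orders."""
        state = json.loads(blob)
        book = cls()
        book.orders = state["orders"]
        book.last_seq = state["last_seq"]
        for seq, order in order_log:
            book.apply(seq, order)  # orders at or below last_seq are ignored
        return book


if __name__ == "__main__":
    log = [
        (1, {"id": "o1", "side": "buy", "price": "150.25", "qty": 10}),
        (2, {"id": "o2", "side": "sell", "price": "150.30", "qty": 5}),
        (3, {"id": "o3", "side": "buy", "price": "150.20", "qty": 7}),
    ]
    live = OrderBook()
    live.apply(*log[0])
    live.apply(*log[1])
    blob = live.snapshot()   # taken before order 3 arrives
    live.apply(*log[2])
    recovered = OrderBook.restore(blob, log)
    print(recovered.last_seq, sorted(recovered.orders))
```

The sequence number is what lets the replay step skip orders the snapshot already contains, so the recovered engine ends up with each order exactly once.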
00:33:02
I guess I have covered most of the
00:33:04
information you need. The other
00:33:07
thing to understand here is that this
00:33:12
primary data server is also sending the
00:33:15
information to the pub/sub servers. What the
00:33:18
pub/sub servers do is basically
00:33:21
provide the updates to the users or
00:33:26
HFT clients in real time, basically as a
00:33:28
broadcast using WebSockets or using
00:33:32
TCP/IP or UDP ports. The idea here is that you
00:33:39
should be able to send the price changes
00:33:42
or ticks as soon as possible to the end
00:33:48
users, because HFT is high-frequency
00:33:51
trading, where they will be
00:33:53
deciding what stocks to purchase or sell
00:33:56
with, you know, something like ten-millisecond
00:34:01
precision, so they need to know as soon
00:34:04
as possible. So they will be listening to
00:34:08
these pub/sub servers. There are other
00:34:10
strategies as well. In real exchanges,
00:34:13
what they do is assign
00:34:15
only a specific number of connections to
00:34:18
each server, because they don't want to
00:34:21
overload it. Also, instead of
00:34:23
looping through each and every
00:34:24
connection
00:34:27
and sending to each one in turn,
00:34:28
they try to broadcast the update,
00:34:31
instead of just looping through, because
00:34:33
when they loop through, the very
00:34:35
first connection might get the
00:34:41
tick or price
00:34:42
information first, and then the last connection
00:34:44
might get it a little later. They don't
00:34:47
want to lose out on that either; they
00:34:49
want to receive it as fast as possible.
00:34:51
There are a lot of strategies here
00:34:52
as well, but that is out of scope right
00:34:55
now.
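One possible way to broadcast a tick once, rather than looping over every connection, is UDP multicast: the server sends a single datagram to a group address and the network delivers it to every subscribed listener. The video only mentions WebSockets, TCP/IP and UDP in general, so multicast is an assumption here, as are the group address, the port, the JSON message layout and the function names in this minimal sketch.

```python
import json
import socket

# Hypothetical values chosen for illustration only.
MULTICAST_GROUP = "239.1.1.1"
MULTICAST_PORT = 5007


def encode_tick(symbol, price, seq):
    """Serialize one tick update; price is a string to avoid float rounding."""
    return json.dumps({"symbol": symbol, "price": price, "seq": seq}).encode("utf-8")


def open_multicast_sender(ttl=1):
    """Create a UDP socket whose datagrams stay within ttl network hops."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock


def publish_tick(sock, symbol, price, seq):
    """Send one datagram to the group; every listener receives the same packet."""
    payload = encode_tick(symbol, price, seq)
    return sock.sendto(payload, (MULTICAST_GROUP, MULTICAST_PORT))


if __name__ == "__main__":
    sender = open_multicast_sender()
    publish_tick(sender, "AAPL", "150.25", 1)
    sender.close()
```

Because the server performs a single send per tick, no subscriber is served systematically later than another by the sending loop itself. UDP gives no delivery guarantee, which is why the sketch carries a sequence number that a receiver could use to detect gaps.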