STOCK EXCHANGE SYSTEM DESIGN | AMAZON INTERVIEW QUESTION DESIGN STOCK EXCHANGE

00:35:02
https://www.youtube.com/watch?v=dUMWMZmMsVE

Summary

TL;DR: The video explains in detail the software architecture for an exchange system that supports stock, cryptocurrency, and forex trading. It describes the system requirements, including processing buy and sell orders in real time and managing risk. The core of the system is the matching engine, which uses data structures such as lists, linked lists, or heaps to match orders efficiently. The design also covers components such as API gateways, risk management systems, and database storage, with the aim of high availability and low latency. An approach to reliability is described as well, including the use of ZooKeeper to track which matching engine is active and which is passive. Finally, the video explains how real-time updates are delivered to users and high-frequency traders.

Key takeaways

  • 📈 Exchange system for stocks, crypto, and forex
  • ⚙️ Matching engine for order processing
  • 🕒 Real-time price information required
  • 📉 Risk analysis before order execution
  • 💡 Lists or heaps used in the engine
  • 🌐 API gateways for request management
  • 🔄 High availability and scalability
  • 🗂️ ZooKeeper for reliable operation
  • 📊 Real-time updates for users and HFT
  • 💾 In-memory storage for efficiency
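
The risk-analysis and request-routing takeaways can be sketched in Python roughly as follows. This is only a minimal sketch under stated assumptions: the limits `MAX_QUANTITY` and `MAX_NOTIONAL`, the function names `passes_risk_checks` and `submit`, and the in-process `deque` queues are hypothetical stand-ins. The video does not give concrete limits, and it suggests a message broker such as Kafka or SQS for the per-stock queues rather than in-process queues.

```python
from collections import defaultdict, deque
from itertools import count

MAX_QUANTITY = 10_000      # hypothetical per-order share limit
MAX_NOTIONAL = 1_000_000   # hypothetical limit on price * quantity

_order_ids = count(1)          # stand-in for the component that assigns unique IDs
queues = defaultdict(deque)    # one queue per stock symbol (assumption: in-process)


def passes_risk_checks(price, quantity):
    """Return True if the order stays within the predefined limits."""
    if price <= 0 or quantity <= 0:
        return False
    if quantity > MAX_QUANTITY:
        return False
    return price * quantity <= MAX_NOTIONAL


def submit(symbol, side, price, quantity):
    """Assign an ID, run the risk checks, then route the order by symbol.

    Returns the order ID, or None if the order was rejected.
    """
    order_id = next(_order_ids)
    if not passes_risk_checks(price, quantity):
        return None
    queues[symbol].append(
        {"id": order_id, "side": side, "price": price, "quantity": quantity}
    )
    return order_id
```

Calling `submit("GOOG", "buy", 100, 5)` would place the order on the `GOOG` queue only, so each queue holds orders for a single stock and can be drained by its own matching engine instance.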

Timeline

  • 00:00:00 - 00:05:00

    The presenter introduces the software architecture and system design for a stock exchange. The same design also applies to cryptocurrency and forex trading. The requirements include buying and selling, real-time price updates, viewing holdings and order history, and risk management. The goals are to process 0.1 million (100,000) orders per second with low latency and high availability.

  • 00:05:00 - 00:10:00

    The heart of the system is the matching engine, which brings buy and sell orders together. The engine takes incoming buy and sell orders, keeps them sorted by price, and executes them by finding the best price and quantity matches. Data structures such as arrays, lists, and heaps are used to make this efficient.

  • 00:10:00 - 00:15:00

    Suitable data structures are needed to perform these operations efficiently. All data must be processed in memory to maximize speed. The structures must be serializable so that the system state can be preserved if a crash occurs, which allows processing to resume after a failure.

  • 00:15:00 - 00:20:00

    The system design diagram shows the data flow from the user through API gateways to risk management, the router, and the matching engine. Unique IDs are used to track orders. The risk management system checks whether an order is safe based on predefined parameters. Orders are assigned to queues according to the stock they concern so that they can be processed efficiently.

  • 00:20:00 - 00:25:00

    Matching runs in a parallel setup in which each queue is served by a separate virtual machine running an instance of the matching engine. Active and passive matching engines provide high availability through parallel processing and takeover on failure. ZooKeeper coordinates between the instances.

  • 00:25:00 - 00:35:02

    Once a match is found, the matching engine sends order execution and price change data to servers that update the databases and user interfaces. Payment processing is integrated to debit or credit user accounts for each transaction. There are measures to ensure that data is stored durably and that the system can be restored after a crash.
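
The matching step described in the timeline can be sketched with two heaps, one per side of the order book. This is a minimal Python sketch, not the presenter's implementation. The names `Order` and `OrderBook` are hypothetical, prices are assumed to be integers, each trade is assumed to execute at the sell price (as in the video's example where a $10 buy is filled by an $8 sell at $8), and cancellation is left out for brevity.

```python
import heapq
from dataclasses import dataclass
from itertools import count


@dataclass
class Order:
    order_id: str
    side: str      # "buy" or "sell"
    price: int     # price per share
    quantity: int  # shares still unfilled


class OrderBook:
    def __init__(self):
        self._bids = []      # max-heap via negated price: highest buy first
        self._asks = []      # min-heap: lowest sell first
        self._seq = count()  # arrival counter, breaks ties between equal prices

    def add(self, order):
        """Insert an order in O(log n) and return the trades it triggers."""
        if order.side == "buy":
            heapq.heappush(self._bids, (-order.price, next(self._seq), order))
        else:
            heapq.heappush(self._asks, (order.price, next(self._seq), order))
        return self._match()

    def _match(self):
        trades = []
        while self._bids and self._asks:
            bid = self._bids[0][2]
            ask = self._asks[0][2]
            if bid.price < ask.price:
                break  # best buy is below best sell, nothing can match
            qty = min(bid.quantity, ask.quantity)  # split when sizes differ
            trades.append((bid.order_id, ask.order_id, ask.price, qty))
            bid.quantity -= qty
            ask.quantity -= qty
            if bid.quantity == 0:
                heapq.heappop(self._bids)
            if ask.quantity == 0:
                heapq.heappop(self._asks)
        return trades
```

A partially filled order simply stays at the top of its heap with a reduced quantity, which corresponds to the "update the quantity in place" strategy for split orders; the alternative of removing and re-inserting the remainder would give the same result here.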

Video Q&A

  • Which markets can the described system cover?

    The system can be used for stock markets, cryptocurrencies, and forex trading.

  • What is the main function of the matching engine?

    The matching engine efficiently matches buy orders against sell orders.

  • Which capabilities must the matching engine have?

    It must be able to add, cancel, split, and match orders efficiently, in O(1) or O(log n) time.

  • Why must the matching engine perform its operations in memory?

    To work efficiently and process thousands of orders per second, the engine has to run its operations in memory.

  • Which approaches are there for ensuring system reliability?

    Using active and passive matching engines, and periodically saving the system state as a backup.
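
The periodic state backup mentioned in the last answer relies on the order data being serializable. A minimal Python sketch of that idea follows, assuming JSON as the snapshot format. The fields `id`, `price`, `quantity`, `status`, and `metadata` follow the order record described in the video; the `side` field, the `Order` class, and the `snapshot` and `restore` function names are hypothetical additions for illustration. Where the snapshot is stored (cache, disk, database) is left open here.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Order:
    id: int
    side: str                 # assumption: "buy" or "sell"
    price: int
    quantity: int
    status: str = "queued"    # for example "queued" or "executed"
    metadata: dict = field(default_factory=dict)


def snapshot(orders):
    """Serialize the open orders to a JSON string for periodic backup."""
    return json.dumps([asdict(order) for order in orders])


def restore(blob):
    """Rebuild the list of orders from a snapshot after a crash."""
    return [Order(**record) for record in json.loads(blob)]
```

After a crash, a replacement engine could call `restore` on the latest snapshot and re-insert the orders into its in-memory book. Orders that arrived after the snapshot was taken would still have to be replayed from the queue, which this sketch does not cover.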

Subtitles
en
  • 00:00:00
    hello everyone my name is Narendra and
  • 00:00:01
    in this session let's understand software
  • 00:00:03
    architecture or system design for stock
  • 00:00:05
    exchange and this system design also
  • 00:00:08
    holds good for cryptocurrency or forex
  • 00:00:11
    trading as well
  • 00:00:13
    thanks a lot for all the support until
  • 00:00:15
    now and in future as well if you like
  • 00:00:19
    the content on this channel and if you'd
  • 00:00:20
    like to share or buy a cup of coffee for
  • 00:00:24
    me you can do so by joining the
  • 00:00:26
    channel
  • 00:00:29
    in return I have custom emojis for
  • 00:00:32
    Prada Me's and also I'm gonna feature
  • 00:00:35
    your name in the end of the videos which
  • 00:00:38
    I am going to upload in future and
  • 00:00:40
    thanks a lot for the support we can use
  • 00:00:43
    this system wherever we have to match
  • 00:00:45
    the supply to the demand or vice versa
  • 00:00:47
    so here are the requirements for our
  • 00:00:50
    system and these are the goals of our
  • 00:00:52
    system design so the requirements which
  • 00:00:54
    we want to cover in this system design
  • 00:00:56
    is users should be able to buy and sell
  • 00:00:59
    stocks on this platform the second one
  • 00:01:01
    is real-time price update information
  • 00:01:03
    because we need to keep on sending the
  • 00:01:06
    real-time price information to the users
  • 00:01:08
    for the fact that users will be looking
  • 00:01:11
    at the price fluctuation and then he is
  • 00:01:13
    going to place an order to buy or sell
  • 00:01:16
    so it has to be real-time and the third
  • 00:01:18
    one is we should have basic
  • 00:01:19
    functionality like look at all the
  • 00:01:21
    stocks which he owns and also the
  • 00:01:23
    order history information and
  • 00:01:25
    the fourth one is risk management we
  • 00:01:27
    have to have a system which analyzes
  • 00:01:29
    the risk before executing the order
  • 00:01:31
    matching and what are the goals of our
  • 00:01:34
    system so the first one is we have to
  • 00:01:36
    process at least you know 0.1 million
  • 00:01:39
    orders per second
  • 00:01:41
    second one is thousands of stocks or
  • 00:01:44
    registered companies are on the platform
  • 00:01:46
    so we have to process orders for all of
  • 00:01:48
    them with low latency and the third one
  • 00:01:51
    is geo-specific
  • 00:01:52
    we don't have to design the system to
  • 00:01:54
    cater users from all over the world
  • 00:01:56
    because usually the exchanges are
  • 00:01:58
    specific to a country even though
  • 00:02:00
    there are users from outside the
  • 00:02:02
    country it's fine
  • 00:02:03
    our goal here is to design the system
  • 00:02:06
    specifically for users in a specific geo
  • 00:02:08
    and the fourth one is low latency we
  • 00:02:10
    have to process this order matching as
  • 00:02:13
    fast as possible it could be
  • 00:02:15
    around 300 milliseconds something like
  • 00:02:18
    that and the fifth one is high
  • 00:02:19
    availability or scalability where in
  • 00:02:21
    this system should be always available
  • 00:02:23
    for users to place orders to buy or sell
  • 00:02:26
    and also the system should be able to
  • 00:02:28
    add more companies or stocks into the
  • 00:02:31
    platform and keep on scaling linearly
  • 00:02:34
    first let's design the core component of
  • 00:02:37
    our system that is matching engine but
  • 00:02:40
    before understanding the matching
  • 00:02:42
    engine you need to understand what
  • 00:02:43
    exactly a stock exchange does a stock
  • 00:02:46
    exchange is a place which is usually a
  • 00:02:49
    regulated place where people come
  • 00:02:51
    together and then they have something to
  • 00:02:54
    buy and something to sell and the
  • 00:02:56
    matching engine is going to match those
  • 00:02:58
    two together and how does it look like
  • 00:03:00
    it usually looks like this if you see
  • 00:03:03
    this image is for the Reliance stock
  • 00:03:06
    which has these many orders to bid
  • 00:03:10
    or buy and the right side shows the
  • 00:03:14
    orders to be sold and this one is the
  • 00:03:17
    screen shot taken from the crypto
  • 00:03:18
    currency exchange where people want to
  • 00:03:20
    trade bitcoins and it all boils down to
  • 00:03:23
    this one called as order book which
  • 00:03:25
    usually looks like this where in this
  • 00:03:29
    example we can see that there are four
  • 00:03:31
    columns the buy quantity is the number
  • 00:03:34
    of shares waiting to be bought and the buy
  • 00:03:37
    price is the price which a buyer is
  • 00:03:39
    willing to pay for this particular share
  • 00:03:42
    and the sell price on the right side
  • 00:03:45
    is the price which a seller is quoting
  • 00:03:48
    for a share the sell quantity is the
  • 00:03:51
    number of shares you want to sell so in
  • 00:03:54
    total we have around 500 shares waiting
  • 00:03:56
    to be bought at a rate of 5,250 and
  • 00:04:00
    we have 1,650 shares waiting
  • 00:04:02
    to be sold at five thousand two hundred
  • 00:04:04
    and fifty two so now let's design our
  • 00:04:08
    matching engine so the characteristic of
  • 00:04:10
    our matching engine should be it should
  • 00:04:12
    be able to add an order it should be
  • 00:04:14
    able to cancel the order if I don't want
  • 00:04:16
    to buy or sell and it should be able to
  • 00:04:18
    split an order that means that if I want
  • 00:04:20
    to buy ten stocks and if nobody is
  • 00:04:23
    selling 10 stocks as a whole our
  • 00:04:26
    matching engine should be able to buy in
  • 00:04:29
    a split or whatever it can and it should
  • 00:04:32
    be able to match obviously and the
  • 00:04:34
    second one is it should be able to do
  • 00:04:37
    all of these actions very efficiently
  • 00:04:40
    ideally it is always good to do it in
  • 00:04:42
    order of one or in log n time
  • 00:04:45
    and it should do everything reliably
  • 00:04:48
    so how does the matching engine work
  • 00:04:51
    it's not really complicated it's very
  • 00:04:53
    simple consider you have these orders to
  • 00:04:56
    buy okay
  • 00:04:57
    usually people try to buy the shares
  • 00:05:02
    or the stocks or any commodities at
  • 00:05:04
    cheap price so the winner is who wants
  • 00:05:07
    to buy at the higher price so he's going
  • 00:05:10
    to stand out right because anyone who is
  • 00:05:13
    trying to sell something he doesn't want
  • 00:05:14
    to sell cheap so usually the sellers as
  • 00:05:17
    well want to sell it at a higher
  • 00:05:20
    price but the person who stands out in
  • 00:05:23
    sell queue is the one who wants to sell it
  • 00:05:26
    cheaply so the buy order list will be
  • 00:05:31
    sorted in the decreasing order where the
  • 00:05:35
    buyer with higher price or the buyer who
  • 00:05:38
    wants to buy a high price will stand out
  • 00:05:40
    okay here in this case there are a
  • 00:05:42
    couple of orders or people who wants to
  • 00:05:44
    buy some stocks and here someone wants to
  • 00:05:47
    buy with an order price of $10 and he
  • 00:05:51
    wants to buy 5 stocks and here with
  • 00:05:53
    price of $9 he wants to buy 10 this guy
  • 00:05:56
    with the price of $8 he wants to buy 5
  • 00:05:59
    stocks and in the sell queue there are a lot
  • 00:06:02
    of orders here and we want to sort this
  • 00:06:06
    in increasing order and this one in the
  • 00:06:09
    decreasing order
  • 00:06:10
    so here the guy who
  • 00:06:13
    or the order which stands out is the
  • 00:06:15
    one who wants to sell cheaper so in this
  • 00:06:18
    case the orders are there is an order with
  • 00:06:21
    $8 and 5 quantity $9 and 5 quantity and $9 5
  • 00:06:26
    quantity $9 5 quantity because no one
  • 00:06:28
    wants to sell for cheap right
  • 00:06:30
    because most likely the prices will stay
  • 00:06:32
    the same or it will always be higher so
  • 00:06:36
    when we sort it in the increasing order
  • 00:06:38
    all the other orders which are selling
  • 00:06:42
    at really high price will go in the end
  • 00:06:44
    the orders which want to sell
  • 00:06:46
    at the cheaper price will come in the
  • 00:06:48
    front in this case the guys or
  • 00:06:51
    the orders wanting to buy at higher price
  • 00:06:54
    will come at the start of the list
  • 00:06:57
    here so now how to match it it's very
  • 00:07:00
    simple it's just like the array elements
  • 00:07:02
    matching that's simple so what we have
  • 00:07:05
    to do is take the first order from this
  • 00:07:08
    list and then try to match from the
  • 00:07:10
    start of the sell list so $10 and 5
  • 00:07:14
    so someone wants to buy five shares at
  • 00:07:16
    $10 price but someone is also selling 5
  • 00:07:20
    shares with $8 price now it's definitely
  • 00:07:24
    a good match we are even getting it $2
  • 00:07:27
    cheaper now we can match this because the
  • 00:07:29
    quantities are also matching that means
  • 00:07:32
    that this and this can be matched so the
  • 00:07:36
    the buyer who wanted to buy at $10 he is
  • 00:07:39
    actually getting it only for $8 and
  • 00:07:41
    these two are matched and we can execute
  • 00:07:43
    this consider we executed so I'm gonna
  • 00:07:46
    remove this from the list okay now those
  • 00:07:51
    two orders are gone and who is standing
  • 00:07:53
    out here so there is a guy who wants to
  • 00:07:55
    buy at $9 and 10 stocks and someone
  • 00:07:59
    selling some stocks with $9 price and
  • 00:08:02
    5 quantity now we can definitely
  • 00:08:03
    match this and how do we match it is
  • 00:08:06
    this one and this one get matched okay
  • 00:08:08
    and the problem here is the quantity now
  • 00:08:12
    he wants to buy 10 and here there is an
  • 00:08:15
    order where the guy wants to sell only 5
  • 00:08:18
    shares only but there are a couple more
  • 00:08:21
    orders here where you can buy from but
  • 00:08:23
    we can't really match like that what
  • 00:08:25
    we do is we only purchase 5 stocks
  • 00:08:29
    from here okay and then we can either
  • 00:08:33
    update this to 5 or the other strategy is
  • 00:08:38
    to totally remove this and then add it
  • 00:08:41
    into this list so it will come back here
  • 00:08:44
    with you know $9 and 5 why it will come
  • 00:08:48
    back is every time when the new order
  • 00:08:49
    comes into the list we will always be
  • 00:08:51
    sorting in the descending order and here
  • 00:08:54
    we always sort in the increasing order
  • 00:08:56
    so the same order which was split comes
  • 00:08:59
    back and sits in the front of this list
  • 00:09:01
    and we can match it now now this and
  • 00:09:03
    this will get matched and these two can
  • 00:09:04
    be executed okay and these two are
  • 00:09:06
    executed and similarly we can keep on
  • 00:09:08
    executing and this list will keep on
  • 00:09:10
    burning and as and when the new order
  • 00:09:11
    comes in the list will also be inserted
  • 00:09:14
    with the new orders like say someone
  • 00:09:16
    wants to buy at say $10 20 shares and
  • 00:09:20
    if there is something we can
  • 00:09:21
    match this now $10 $9 $1 cheaper you can
  • 00:09:24
    keep matching and then executing so what
  • 00:09:27
    are the data structures we can use to do
  • 00:09:29
    this the first one is you can use
  • 00:09:31
    lists and do this the second one is you
  • 00:09:35
    can use linked lists and the third one
  • 00:09:37
    is you can use heaps okay
  • 00:09:41
    now in linked lists you can keep the
  • 00:09:43
    order of this list in increasing and
  • 00:09:46
    decreasing because we don't have to sort
  • 00:09:48
    it because when we start usually we
  • 00:09:51
    start from one order and then when we
  • 00:09:53
    keep on appending we can always check
  • 00:09:56
    and then insert it and the same thing
  • 00:09:59
    with the list as well we start with
  • 00:10:01
    empty lists and then as and when the
  • 00:10:03
    entries are inserted we keep on
  • 00:10:05
    searching for the proper place where
  • 00:10:08
    that order should be inserted and then
  • 00:10:09
    we keep inserting based on the price now
  • 00:10:12
    the time complexity is you know right in
  • 00:10:14
    the case of lists
  • 00:10:15
    the insertion deletion will all be order
  • 00:10:19
    n but in case of access it will be order
  • 00:10:22
    of 1 but in case of linked list it will
  • 00:10:24
    be in order of 1 where this is the ideal
  • 00:10:27
    candidate where you can use and also you
  • 00:10:29
    can use heaps where you can use max heap
  • 00:10:32
    and min heap so in this case you can use
  • 00:10:37
    max heap because we always want the
  • 00:10:41
    order with the highest price so we can
  • 00:10:43
    use max heap here and we can use min heap
  • 00:10:45
    here and then we can perform this in
  • 00:10:48
    log n time as well couple of things to
  • 00:10:52
    remember here we need to make all of
  • 00:10:54
    this operation in memory there is no
  • 00:10:57
    escape to that the reason is we want to
  • 00:11:00
    perform all of this very efficiently as
  • 00:11:03
    soon as we write this into database this
  • 00:11:06
    list into the database or into the disk
  • 00:11:08
    we can't really function efficiently
  • 00:11:11
    or we can't really process thousands of
  • 00:11:14
    orders or millions of orders per second so
  • 00:11:16
    key takeaway is we have to consider to
  • 00:11:20
    do the whole thing always in memory and
  • 00:11:24
    the second thing is we need to consider
  • 00:11:27
    that these data structures or whatever
  • 00:11:31
    we are doing should be serializable
  • 00:11:33
    because we always
  • 00:11:35
    want to keep this state somewhere
  • 00:11:37
    periodically we want to take a backup
  • 00:11:39
    and then keep it in the cache so in case
  • 00:11:41
    our matching engine crashes or the
  • 00:11:45
    machine where this matching engine is
  • 00:11:47
    running crashes we can go back to the
  • 00:11:50
    persistent storage and we can populate
  • 00:11:52
    the list which we had and all the orders
  • 00:11:55
    in it so we get back to the state so you
  • 00:11:58
    can think of the whole thing as a state
  • 00:12:01
    machine where it does all of these
  • 00:12:04
    things so the next thing we need to
  • 00:12:05
    understand is what are the information
  • 00:12:07
    which we need to capture for a specific
  • 00:12:09
    order if it was a linked list in
  • 00:12:12
    that case in each node what are the
  • 00:12:14
    information we need to save or if it was
  • 00:12:16
    a list in that case list of object or
  • 00:12:19
    list of dictionary or list of tuple what
  • 00:12:21
    is the information we want to capture
  • 00:12:23
    the first thing is we need to capture
  • 00:12:25
    the ID of the order and the next thing
  • 00:12:27
    is the price of the order at which we
  • 00:12:29
    want to sell or buy the quantity of the
  • 00:12:32
    stocks or the shares which we want to
  • 00:12:35
    buy and the status of that usually
  • 00:12:38
    queued and once we execute it maybe we
  • 00:12:40
    can change it as executed and once we
  • 00:12:42
    delete it we will not need the status
  • 00:12:45
    anyway but if you have any other
  • 00:12:47
    use cases you can use that and of course
  • 00:12:49
    metadata where you want to store any
  • 00:12:51
    other extra information so this could
  • 00:12:55
    look like if you're using lists it will
  • 00:12:57
    be looking like either a struct or
  • 00:13:00
    dictionary or it could be even tuple so
  • 00:13:03
    it will be looking like this tuple with
  • 00:13:06
    only values in it something like that
  • 00:13:08
    but if it was a node we'll have all of
  • 00:13:11
    these keys in it and you are basically
  • 00:13:13
    linking it all together let's understand
  • 00:13:17
    the system design diagram in here you
  • 00:13:19
    can see there are users and there is HFT
  • 00:13:22
    users are like any normal user who uses
  • 00:13:26
    UI to interact with the system whereas
  • 00:13:29
    hft basically uses API calls directly
  • 00:13:33
    what does hft stands for is
  • 00:13:35
    high-frequency trading what happens here
  • 00:13:38
    is the companies or individual persons
  • 00:13:40
    will usually own a couple of servers
  • 00:13:44
    and they run sophisticated algorithms
  • 00:13:47
    or machine
  • 00:13:48
    learning techniques to identify when to
  • 00:13:51
    purchase an order and when to sell it
  • 00:13:54
    usually as users what we do is we see
  • 00:13:58
    the information on the UI and take a
  • 00:14:00
    decision but in this case HFT consumes
  • 00:14:02
    all the price changes and how many
  • 00:14:05
    stocks were sold and bought all of this
  • 00:14:07
    information and based on that the
  • 00:14:10
    algorithms decide when to place an order
  • 00:14:13
    and when to sell those stocks for the
  • 00:14:15
    profit usually companies make a lot of
  • 00:14:18
    money like crores and crores of money
  • 00:14:20
    using
  • 00:14:21
    hft so that comes to the first entry
  • 00:14:25
    point for all our APIs is API gateway the
  • 00:14:29
    HFT might be placing a call here or
  • 00:14:31
    users using the UI or mobile apps could
  • 00:14:34
    be placing calls through HTTP REST API
  • 00:14:39
    calls to API gateway so what is the
  • 00:14:42
    functionality of API gateway so API
  • 00:14:45
    gateway takes all the API calls from the
  • 00:14:47
    clients and hft basically all of them
  • 00:14:51
    are clients and then this API gateway
  • 00:14:54
    routes them to appropriate services here
  • 00:14:58
    I have only shown the accumulator and
  • 00:15:01
    risk management system but internally
  • 00:15:04
    they could be having a number of micro
  • 00:15:06
    services they want to make call in this
  • 00:15:10
    case just just for your understanding
  • 00:15:13
    API gateway will mainly perform some of
  • 00:15:15
    the actions like protocol transformation
  • 00:15:18
    wherein all of these APIs
  • 00:15:22
    all of these APIs are exposed
  • 00:15:26
    using HTTP but internally they might be
  • 00:15:29
    using some other protocol it could be
  • 00:15:30
    RPC it could be another rest call or it
  • 00:15:35
    could be anything it doesn't matter in
  • 00:15:38
    here usually all the API will be
  • 00:15:41
    interacted using HTTP or HTTPS and
  • 00:15:45
    inside it could be only HTTP so all the
  • 00:15:49
    security layer will be dropped here
  • 00:15:50
    there could be a firewall as well in the
  • 00:15:56
    front that is a web application firewall or WAF you
  • 00:16:00
    can call it as
  • 00:16:01
    once the request is sent from API
  • 00:16:05
    gateway it goes through accumulator what
  • 00:16:10
    happens in accumulator is all the orders
  • 00:16:13
    are given a sequence number based on a
  • 00:16:16
    lot of different criteria for your
  • 00:16:18
    understanding
  • 00:16:19
    all you have to understand is when the
  • 00:16:21
    order comes in it needs a unique ID the
  • 00:16:24
    unique ID is basically assigned at this
  • 00:16:26
    point of time and you could be thinking
  • 00:16:29
    why in the world we are assigning a unique ID
  • 00:16:33
    over here why not when we save in the
  • 00:16:35
    databases because you will understand a
  • 00:16:38
    lot of our functionality over here is
  • 00:16:41
    done all in memory without using
  • 00:16:43
    database so we need a unique
  • 00:16:45
    identification number so that will be
  • 00:16:48
    happening here so we will basically
  • 00:16:50
    assign a unique ID for any order
  • 00:16:54
    request it could be buy or a sell order
  • 00:16:56
    they will assign a unique ID over here and
  • 00:16:59
    then that order information will be sent
  • 00:17:02
    to risk management system or it is also
  • 00:17:06
    called RMS usually what it does
  • 00:17:10
    is it makes sure that the order is safe to
  • 00:17:15
    be entered into the whole system and
  • 00:17:17
    there are so many checks which are
  • 00:17:20
    mostly to check if the order is within
  • 00:17:23
    the predefined parameters of price size
  • 00:17:26
    capital etc so this is the
  • 00:17:31
    list of risk management checks usually any
  • 00:17:35
    stock market will do for any given order
  • 00:17:39
    I will be posting some of the links in
  • 00:17:41
    the description you can go and read all
  • 00:17:43
    about it
  • 00:17:45
    so once the order passes through the
  • 00:17:48
    risk management if it is safe to be
  • 00:17:50
    processed that order will actually go to
  • 00:17:53
    the router what happens in the router is
  • 00:17:57
    based on the kind of stock you want to
  • 00:18:01
    sell or buy it will be redirected to one
  • 00:18:04
    of the queues so we will be having n number
  • 00:18:06
    of queues which is equivalent to the
  • 00:18:09
    number of companies we will be handling
  • 00:18:11
    so if suppose we are handling 1000
  • 00:18:16
    companies in our system obviously we will
  • 00:18:20
    be having 1000 stocks to handle so there
  • 00:18:23
    will be 1,000 queues over here so all the
  • 00:18:26
    requests say for example all the
  • 00:18:28
    requests for Google stock will be
  • 00:18:31
    redirected over here all the orders you
  • 00:18:34
    know buy orders and sell orders for
  • 00:18:37
    Tesla will be redirected over here so at
  • 00:18:40
    any given point of time in any of the
  • 00:18:42
    queue only similar kind of stocks will
  • 00:18:45
    be there so this router could be
  • 00:18:49
    anything you could implement using Kafka
  • 00:18:52
    streams if you are using Kafka queue
  • 00:18:55
    otherwise if you're on AWS maybe SNS can
  • 00:18:59
    also act as a router based on your
  • 00:19:04
    routing key these queues could be SQS or
  • 00:19:07
    it could be Kafka as well once our order
  • 00:19:12
    is placed over here inside here and this
  • 00:19:15
    is picked up by our matching engine we
  • 00:19:20
    have already discussed how we match it
  • 00:19:22
    it is simple algorithm which runs
  • 00:19:25
    completely in memory so the idea over
  • 00:19:30
    here is so consider if you have three
  • 00:19:35
    queues so there will be one virtual
  • 00:19:37
    machine per queue okay so this VM is
  • 00:19:42
    responsible to handle Apple and this is
  • 00:19:46
    responsible to handle Google and this is
  • 00:19:48
    responsible to handle Tesla so this will
  • 00:19:51
    be having one process which will be
  • 00:19:54
    reading from the queue and then matching
  • 00:19:59
    by using two arrays one is for buy and
  • 00:20:02
    one is for sell so as we discussed the
  • 00:20:05
    algorithm it's the same algorithm of the
  • 00:20:07
    matching engine which will be running
  • 00:20:09
    inside this virtual machine which will
  • 00:20:13
    be using heaps or linked lists or
  • 00:20:15
    whatever array and keeps on consuming
  • 00:20:18
    from the queue and then matches it or if
  • 00:20:21
    you have very few orders per second
  • 00:20:24
    what you could do is you can just have a
  • 00:20:27
    couple of virtual machines and
  • 00:20:29
    then say for example it is this one
  • 00:20:38
    suppose if you're handling very little
  • 00:20:41
    traffic or orders per second you can
  • 00:20:43
    have a couple of virtual machines and
  • 00:20:45
    you can run you know one or more
  • 00:20:49
    instances of matching engine in in the
  • 00:20:52
    same virtual machine and you could be
  • 00:20:55
    consuming from multiple queues so say for
  • 00:20:58
    example Tesla Google and Apple you might
  • 00:21:03
    be consuming all of that into same
  • 00:21:06
    machine in different processes
  • 00:21:10
    you know different processes each
  • 00:21:12
    instance of matching engine will be
  • 00:21:14
    running here and then they will be
  • 00:21:16
    executing in the same machine but it all
  • 00:21:20
    depends on how you want to design it's
  • 00:21:21
    okay
  • 00:21:23
    it based on the requirement and traffic
  • 00:21:28
    so the idea here again is
  • 00:21:33
    at any given point of time we need two
  • 00:21:36
    matching engine instances per stock or
  • 00:21:40
    two matching engine processes per stock
  • 00:21:43
    and they should be deployed in
  • 00:21:45
    different machines and different data
  • 00:21:48
    centers in this case let's see for
  • 00:21:51
    Google will be having active matching
  • 00:21:54
    engine over here and also a passive
  • 00:21:56
    matching engine over here why do we need
  • 00:21:59
    that way because we need to have high
  • 00:22:02
    availability and also reliability if for
  • 00:22:05
    some reason if our active matching
  • 00:22:08
    engine fails then immediately the
  • 00:22:12
    passive matching engine will take over
  • 00:22:14
    and starts processing the orders and
  • 00:22:17
    then it does all the things which active
  • 00:22:21
    matching engine would have done we will be
  • 00:22:24
    needing zookeeper here because we need
  • 00:22:28
    someone to take care of who is the
  • 00:22:31
    active engine and it has to do all its
  • 00:22:33
    other responsibilities in in usual case
  • 00:22:37
    what happens is, if both of these engines,
  • 00:22:40
    the active engine and the passive engine, are
  • 00:22:42
    always up
  • 00:22:43
    and running, then both cannot be updating
  • 00:22:47
    the databases; only the active
  • 00:22:50
    engine should be updating all the data
  • 00:22:52
    into the database and backups and
  • 00:22:54
    everything so we need these systems to
  • 00:22:58
    understand who is the active matching
  • 00:23:01
    engine. Because both will be running, they
  • 00:23:03
    don't have an idea of "am I the active
  • 00:23:06
    matching engine?" So ZooKeeper will
  • 00:23:08
    basically help you to identify who is
  • 00:23:10
    the active matching engine. ZooKeeper has
  • 00:23:13
    a concept called the sequential znode and
  • 00:23:17
    also the ephemeral znode. An ephemeral znode is
  • 00:23:22
    a short-lived znode: it only has an
  • 00:23:27
    entry for a given server while there is
  • 00:23:29
    an active connection from that
  • 00:23:32
    machine. In this case our ephemeral
  • 00:23:34
    znode will have two entries: one for the
  • 00:23:40
    primary active matching engine and one
  • 00:23:43
    for the passive matching
  • 00:23:45
    engine. In the sequential znode, whoever
  • 00:23:48
    registers first will have the first entry
  • 00:23:50
    and the second will have the next one, so the
  • 00:23:52
    first one is always the active instance.
  • 00:23:54
    Suppose the active machine gets
  • 00:23:57
    disconnected: then its entry will go away, so
  • 00:24:00
    the second will become active in that
  • 00:24:01
    case. So this entry will not be there
  • 00:24:04
    unless the machine comes back again, and then its
  • 00:24:07
    ID will be here and that will be the
  • 00:24:09
    passive. And also, one more interesting
  • 00:24:13
    thing you need to know is both of these
  • 00:24:18
    matching engines will be active-active;
  • 00:24:20
    that means both will be consuming, OK,
  • 00:24:24
    both will be consuming the messages
  • 00:24:26
    which are sent from this router to this
  • 00:24:29
    group all at the same time that means
  • 00:24:31
    that everything is done over here at the
  • 00:24:35
    same time. There could be a little bit of
  • 00:24:37
    difference in the time or it could be the
  • 00:24:39
    same, but basically the idea over here is:
  • 00:24:41
    whatever this active matching
  • 00:24:45
    engine is processing, at the same time
  • 00:24:47
    the passive matching engine is also
  • 00:24:49
    processing. The only difference is that it
  • 00:24:51
    will not be writing or sending any
  • 00:24:54
    outputs to the underlying system, but the
  • 00:24:56
    active
  • 00:24:57
    matching engine will keep on sending.
  • 00:24:59
    Why do we have to do that? If this
  • 00:25:03
    goes down, this passive matching engine
  • 00:25:06
    should take over immediately, so it is
  • 00:25:07
    already in the same state as the active
  • 00:25:10
    matching engine. So this kind of design
  • 00:25:12
    is needed because we are doing
  • 00:25:14
    everything in memory so that's the
  • 00:25:19
    reason why we need both of these active
  • 00:25:22
    and passive matching engines to keep on
  • 00:25:25
    performing the same things. So what
  • 00:25:30
    happens later? Once the active
  • 00:25:36
    matching engine figures out a perfect
  • 00:25:38
    match for a buy order in the sell
  • 00:25:42
    array or linked list, what it does is
  • 00:25:45
    this: it actually sends a
  • 00:25:49
    message to another router saying that I
  • 00:25:53
    actually found a matching order and then
  • 00:25:56
    it is sent to a lot of different
  • 00:25:57
    partitions. So in the case of Kafka, for N
  • 00:26:02
    threads or N nodes
  • 00:26:05
    on the consumer side we will be having N
  • 00:26:07
    partitions. You can design it here
  • 00:26:10
    as well, something like how we did it
  • 00:26:12
    here: you can have an individual queue
  • 00:26:15
    for each stock, or you can do it as a
  • 00:26:18
    partition, and only the server consumes
  • 00:26:21
    every stock. What is the idea of the
  • 00:26:25
    primary data server? It basically
  • 00:26:28
    computes the stock ticks. A stock tick
  • 00:26:33
    looks like this: it is simply the
  • 00:26:37
    representation
  • 00:26:41
    or measurement of the minimum upward or
  • 00:26:43
    downward movement in the price. In this
  • 00:26:45
    stock exchange, once the order
  • 00:26:49
    is matched, the server receives the information
  • 00:26:52
    about the order or the trade that just
  • 00:26:54
    happened, and then it is going to convert
  • 00:26:57
    the price changes into ticks, and those
  • 00:27:01
    changes will be written into the RDBMS.
  • 00:27:04
    And the idea in the database is that
  • 00:27:08
    this RDBMS is sharded: you can
  • 00:27:11
    shard it by stock, so
  • 00:27:15
    basically you'll have one shard
  • 00:27:18
    per stock: this could be the Apple shard,
  • 00:27:24
    this could be the Google shard. So you will
  • 00:27:28
    be maintaining thousands of shards, or if
  • 00:27:31
    your data is not too much you can use
  • 00:27:34
    hash sharding or any other strategy;
  • 00:27:37
    it's up to you. Usually it's a good
  • 00:27:39
    way to shard by the stock itself, so
  • 00:27:42
    you have all of a specific stock's data
  • 00:27:46
    in one shard and it will be easier to
  • 00:27:49
    scale, and you can have a larger number of
  • 00:27:51
    writes and reads achieved in the same
  • 00:27:53
    shard itself. And also, this primary data
  • 00:27:56
    server is going to write to the
  • 00:27:58
    time series database, which will be used
  • 00:28:01
    to render the graphs, like the
  • 00:28:06
    price variation at any given point of
  • 00:28:10
    time and something like that.
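A rough sketch of what the primary data server does with matched trades, under stated assumptions: the tick size of 0.01, the `(timestamp, price)` trade format and the function names `to_tick_points` and `shard_for` are all hypothetical and not taken from the talk. The first function turns consecutive trade prices into time-series points that record how many ticks the price moved; the second shows the hash-sharding alternative mentioned for smaller data sets.

```python
import hashlib
from decimal import Decimal

TICK_SIZE = Decimal("0.01")  # assumed minimum price increment


def to_tick_points(symbol, trades, tick_size=TICK_SIZE):
    """Convert (timestamp, price-as-text) trades into points for a time series store."""
    points = []
    previous_price = None
    for timestamp, price_text in trades:
        price = Decimal(price_text)  # Decimal avoids float rounding on prices
        if previous_price is None:
            ticks_moved = 0
        else:
            ticks_moved = int((price - previous_price) / tick_size)
        points.append(
            {"symbol": symbol, "ts": timestamp, "price": price, "ticks_moved": ticks_moved}
        )
        previous_price = price
    return points


def shard_for(symbol, shard_count):
    """Hash sharding: map a stock symbol to a stable shard index."""
    digest = hashlib.sha256(symbol.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count
```

For example, trades at 100.00, 100.03 and 99.98 would yield moves of 0, +3 and -5 ticks. With one shard per stock, `shard_for` is unnecessary because the symbol itself names the shard.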
  • 00:28:13
    So time series databases are really good at that.
  • 00:28:15
    So who is going to consume that data?
  • 00:28:17
    The UI: the front-end application
  • 00:28:21
    needs to show graphs like this one, so it,
  • 00:28:25
    you know, always needs a time series
  • 00:28:27
    database. It is not really a good idea
  • 00:28:29
    to use an RDBMS for that pattern. And also,
  • 00:28:33
    all of this information is also consumed
  • 00:28:35
    by your stream processing engine where
  • 00:28:37
    you can do a lot of other streaming
  • 00:28:41
    processing, like fraud detection or other
  • 00:28:44
    machine learning, whatever you want to do.
  • 00:28:46
    Basically you
  • 00:28:49
    will be having stream processing over
  • 00:28:50
    here. So I guess we did one cycle of
  • 00:28:55
    how an order starts its journey from an
  • 00:29:00
    API gateway into risk management and the router,
  • 00:29:02
    then it goes to the queue; from the queue it goes
  • 00:29:04
    to the active matching engine; from the matching
  • 00:29:07
    engine it comes to the router here, and
  • 00:29:10
    then it is saved and everything. Meanwhile,
  • 00:29:13
    this primary data server
  • 00:29:18
    can talk to this payment system and
  • 00:29:21
    deduct the amount of money from
  • 00:29:25
    the user's wallet or account once the
  • 00:29:28
    matching is done and that information is
  • 00:29:31
    also updated over here. OK, so we need a
  • 00:29:35
    payment system; that itself is a
  • 00:29:37
    big system, I don't want to talk about it
  • 00:29:40
    now. If the user wants to add some
  • 00:29:44
    money, you can actually make this API
  • 00:29:46
    call and redirect it to the bank or whatever
  • 00:29:50
    it is
  • 00:29:51
    so they'll be loading all the money into
  • 00:29:53
    the payment wallet, and this system will
  • 00:29:56
    basically take care of that. And you can
  • 00:29:59
    also see that the UI is coupled with a
  • 00:30:03
    CDN, or these APIs can be
  • 00:30:08
    coupled to the CDN to cache some stuff,
  • 00:30:10
    like the graphs I showed you here,
  • 00:30:13
    because this data, once it is
  • 00:30:16
    there, is always historical data, so
  • 00:30:20
    you don't need to really query the time
  • 00:30:23
    series database all the time, so you can
  • 00:30:25
    cache that as well. So the other
  • 00:30:27
    important thing we need to discuss is
  • 00:30:29
    the reliability of this matching
  • 00:30:33
    engine. We already spoke about the passive
  • 00:30:36
    and active engines. What happens in
  • 00:30:38
    the case that both machines are down?
  • 00:30:40
    In that case, how things work is: at a
  • 00:30:44
    specific interval, say
  • 00:30:47
    once every minute, the data in the active
  • 00:30:52
    matching engine will be dumped into, you
  • 00:30:56
    know, Redis or any storage engine. The
  • 00:31:00
    idea here is as I already mentioned in
  • 00:31:04
    the algorithm your data structure
  • 00:31:06
    whatever you use in your algorithm
  • 00:31:07
    should be serializable
  • 00:31:10
    OK, so that way you will be dumping all
  • 00:31:14
    the recent data into the snapshots. In
  • 00:31:20
    other words, in the worst case, if both
  • 00:31:22
    matching engines are down, when they
  • 00:31:26
    come back they can read
  • 00:31:28
    the last-known state of the machine from
  • 00:31:32
    here, and that way you will get
  • 00:31:37
    back all the
  • 00:31:39
    orders which were there already in
  • 00:31:41
    the order book. And one more thing you
  • 00:31:43
    also need to understand is that Kafka does
  • 00:31:47
    support transactions and manual
  • 00:31:53
    acknowledgments,
  • 00:31:57
    manual acknowledgments and transactions,
  • 00:31:59
    which you can actually use to build
  • 00:32:03
    your matching algorithm as well. That way
  • 00:32:06
    you can make the system more
  • 00:32:08
    robust. Also, as and when you receive your
  • 00:32:12
    order in the active engine or passive
  • 00:32:15
    engine you can also write a copy of
  • 00:32:18
    order into Cassandra. This is not really
  • 00:32:24
    the job of the matching engine; there could
  • 00:32:26
    be one more thread running that
  • 00:32:30
    could keep on writing into Cassandra,
  • 00:32:32
    so that way after you get all the latest
  • 00:32:35
    information from the snapshot you can
  • 00:32:37
    query the remaining order information
  • 00:32:41
    from Cassandra and then load it back
  • 00:32:43
    into your matching engine's memory, so you
  • 00:32:46
    have the latest state available. I mean,
  • 00:32:49
    here there are a lot of different
  • 00:32:51
    strategies you can use; somehow you need
  • 00:32:53
    to persist the latest known state into
  • 00:32:56
    Redis and then get back that information
  • 00:32:59
    so that's about it
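The recovery path described above (periodic snapshot, then replay of orders received after the snapshot) can be sketched as follows. This is only an illustration under assumptions: a JSON string stands in for the Redis snapshot, a Python list stands in for the Cassandra order log, and the per-order sequence number `seq` is a hypothetical field added here so replay can skip orders the snapshot already contains. Fills and cancellations are not modelled.

```python
import json


class OrderBook:
    """Tiny in-memory book of open orders, serializable for snapshots."""

    def __init__(self):
        self.orders = {}   # order_id (str) -> order dict
        self.last_seq = 0  # sequence number of the last order applied

    def apply(self, order):
        """Add an incoming order unless it was already applied."""
        if order["seq"] <= self.last_seq:
            return
        self.orders[order["order_id"]] = order
        self.last_seq = order["seq"]

    def snapshot(self):
        """Serialize the current state (what would be dumped every interval)."""
        return json.dumps({"last_seq": self.last_seq, "orders": self.orders})

    @classmethod
    def restore(cls, snapshot_blob, order_log):
        """Rebuild state from the last snapshot, then replay newer logged orders."""
        book = cls()
        state = json.loads(snapshot_blob)
        book.last_seq = state["last_seq"]
        book.orders = state["orders"]
        for order in sorted(order_log, key=lambda o: o["seq"]):
            book.apply(order)  # orders already in the snapshot are skipped
        return book
```

The snapshot bounds how much has to be replayed, and the separately written order copies cover whatever arrived between the last snapshot and the crash.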
  • 00:33:02
    I guess I have covered most of the
  • 00:33:04
    information you need there. The other
  • 00:33:07
    thing to understand here is that this
  • 00:33:12
    primary data server is also sending the
  • 00:33:15
    information to the pub/sub servers. What the
  • 00:33:18
    pub/sub servers do is basically
  • 00:33:21
    provide the updates to the users or
  • 00:33:26
    HFT in real time, basically
  • 00:33:28
    broadcasting using WebSockets or using
  • 00:33:32
    TCP/IP or UDP ports. The idea here is you
  • 00:33:39
    should be able to send the price changes
  • 00:33:42
    or ticks as soon as possible to the end
  • 00:33:48
    users, because HFT is high-frequency
  • 00:33:51
    trading, where they will be
  • 00:33:53
    deciding what stocks to purchase or sell
  • 00:33:56
    with, you know, around ten-millisecond
  • 00:34:01
    precision, so they need to know as soon
  • 00:34:04
    as possible so they will be listening to
  • 00:34:08
    these pub/sub servers. There are other
  • 00:34:10
    strategies as well. In real exchanges,
  • 00:34:13
    what they do is they specifically assign
  • 00:34:15
    only a specific number of connections to
  • 00:34:18
    each server, because they don't want to
  • 00:34:21
    overload it. And also, instead of
  • 00:34:23
    looping through each and every
  • 00:34:24
    connection
  • 00:34:27
    and sending one by one,
  • 00:34:28
    they try to broadcast it,
  • 00:34:31
    because
  • 00:34:33
    when they loop through, the very
  • 00:34:35
    first connection might get the
  • 00:34:41
    tick information or the price
  • 00:34:42
    information first, and then the last connection
  • 00:34:44
    might get it a little later. So they don't
  • 00:34:47
    want to lose out on that as well; they
  • 00:34:49
    want to receive it as fast as possible.
  • 00:34:51
    There are a lot of strategies over here
  • 00:34:52
    as well; that is out of scope right
  • 00:34:55
    now.
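A small sketch of the connection cap mentioned above, assuming each feed server accepts only a fixed number of subscribers and new clients go to the first server with room. `FeedServer`, `assign` and the tick fields are hypothetical names. Note that `publish` here is a plain in-memory loop used only to make the example runnable; it is exactly the per-connection looping the talk says real exchanges try to avoid by using a true broadcast transport.

```python
class FeedServer:
    """Stand-in for one update server with a hard limit on connections."""

    def __init__(self, name, max_connections):
        self.name = name
        self.max_connections = max_connections
        self.inboxes = {}  # client_id -> list of ticks delivered to that client

    def has_capacity(self):
        return len(self.inboxes) < self.max_connections

    def subscribe(self, client_id):
        self.inboxes[client_id] = []

    def publish(self, tick):
        # Simplification: a real server would broadcast rather than loop.
        for inbox in self.inboxes.values():
            inbox.append(tick)


def assign(servers, client_id):
    """Attach the client to the first server that still has spare capacity."""
    for server in servers:
        if server.has_capacity():
            server.subscribe(client_id)
            return server.name
    raise RuntimeError("no feed server has spare capacity")
```

With two servers capped at two connections each, the third client lands on the second server, which keeps any single server from being overloaded.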
Tags
  • Software architecture
  • Stock exchange system
  • Cryptocurrency
  • Forex trading
  • Matching engine
  • Real-time data
  • Risk management
  • Scalability
  • High-frequency trading
  • System reliability