Database Tuning at Zerodha - India's Largest Stock Broker
Summary
TLDR: This talk, given by a member of Zerodha's engineering team, covers their experience building and running databases on PostgreSQL, specifically for critical financial data. He explains how Zerodha's database systems are organized, including sharding data by financial year and using an additional caching layer to optimize database performance. He also stresses the importance of striking a balance in the use of indexing and materialized views to remove bottlenecks in database queries. One good practice he highlights is using partial indexes rather than relying on excessive indexing. He further emphasizes understanding the query planner and using denormalization to work around performance bottlenecks. The speaker shares practical, actionable experience on scaling successfully without over-engineering solutions, and touches on the need to stay flexible about the choice of database.
Key Takeaways
- 🍽️ The session took place right after lunch, with a quip about keeping the audience awake.
- 📉 Zerodha, a fintech stock broker, uses PostgreSQL to manage critical financial data.
- 🔍 Personal experience and hard-won lessons in database management.
- 🗄️ Sharding by financial year keeps the data load manageable.
- 💾 A caching layer optimizes the serving of ad-hoc data requests.
- 🤝 Praise for the PostgreSQL community and its excellent documentation.
- 🛠️ Partial indexing as a practice for better database performance.
- 🛡️ PostgreSQL has proven resilient, even through incidents such as overflowing log files.
- 🚀 Materialized views and denormalization to optimize queries.
- 🔧 Manually configured vacuuming to fit an irregular data-import schedule.
Timeline
- 00:00:00 - 00:05:00
The speaker opens the presentation by setting the context for how Zerodha uses Postgres to manage its data. He stresses that the lessons shared may not apply to anyone else, arguing that database setups should be specific rather than generic. He also thanks the Postgres development team for its excellent documentation and for Postgres's resilience.
- 00:05:00 - 00:10:00
Zerodha's history with Postgres began when the company was newly built and imported roughly 150 MB of data per day. Problems faced along the way included over-indexing, log file overflow, and crashes, all of which had to be worked through. The lessons learned were to refine the schema design, adapt the application to fit the database, and resolve problems quickly for efficient improvement.
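One way to catch the over-indexing mistake described above is to ask Postgres which indexes are never used. This is a generic sketch, not Zerodha's actual tooling; it relies only on the built-in pg_stat_user_indexes statistics view.

```sql
-- Indexes that have never been scanned, largest first.
-- Unused indexes still consume disk and slow down every write.
SELECT schemaname,
       relname      AS table_name,
       indexrelname AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
       idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```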
- 00:10:00 - 00:15:00
The speaker talks about managing large amounts of data, noting that the term 'big data' is often not relevant to every organization. He promotes selective use of indexes, denormalizing data to make querying simpler, and using materialized views for better performance.
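A minimal sketch of the materialized-view idea, assuming a hypothetical trades table (not Zerodha's actual schema): precompute a per-user daily P&L once, instead of re-joining and re-aggregating raw trades on every request.

```sql
-- Precomputed summary that heavy report queries can read directly.
CREATE MATERIALIZED VIEW daily_pnl AS
SELECT user_id,
       trade_date,
       SUM(sell_value - buy_value) AS realized_pnl
FROM trades
GROUP BY user_id, trade_date;

-- A unique index lets REFRESH ... CONCURRENTLY keep the view readable
-- while it is rebuilt after the nightly import.
CREATE UNIQUE INDEX ON daily_pnl (user_id, trade_date);
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_pnl;
```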
- 00:15:00 - 00:20:00
Query management and optimization take center stage, with an exploration of selective indexing, denormalization, and materialized views to make queries more efficient. The speaker stresses the importance of understanding the query planner and tuning the database around individual tables and specific queries rather than globally.
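A small illustration of both points, again with made-up table names: reading the plan with EXPLAIN, and tuning a single hot table instead of the whole cluster.

```sql
-- Read the plan from the innermost node outward: which relation is scanned
-- first, whether an index is used, and where sorts or hash joins happen.
EXPLAIN (ANALYZE, BUFFERS)
SELECT user_id, SUM(quantity * price) AS turnover
FROM trades
WHERE trade_date >= DATE '2024-04-01'
GROUP BY user_id;

-- Per-table tuning: give one heavily scanned table more parallel workers
-- without touching the global configuration.
ALTER TABLE trades SET (parallel_workers = 8);
```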
- 00:20:00 - 00:25:00
Understanding the query planner and the shape of the data in more depth improves database performance. He also discusses experimenting with Postgres's autovacuum, concluding that in their case it is better turned off and managed manually, because it does not fit their particular workload. These lessons came from hands-on experience.
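A hedged sketch of the manual-vacuum approach described in the talk; the table names are placeholders and the exact script Zerodha runs is not public.

```sql
-- Disable autovacuum only on the bulk-loaded tables.
ALTER TABLE trades  SET (autovacuum_enabled = off);
ALTER TABLE ledgers SET (autovacuum_enabled = off);

-- After the nightly import finishes, a scheduled script runs the cheap
-- variant: VACUUM ANALYZE marks dead tuples reusable and refreshes planner
-- statistics. VACUUM FULL is avoided because it takes an exclusive lock.
VACUUM (ANALYZE) trades;
VACUUM (ANALYZE) ledgers;
```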
- 00:25:00 - 00:30:00
Postgres is also used for caching at Zerodha, via a secondary database that caches heavily accessed data. This keeps load off the primary system and provides a caching structure for transactional data, so reports are served quickly during trading hours.
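The caching pattern, as a rough sketch rather than Zerodha's actual SQL Jobber code: an async worker materializes each fetched result set into its own small table on the cache instance, and reads for the rest of the day hit that table.

```sql
-- On the caching Postgres instance (names are illustrative only).
CREATE UNLOGGED TABLE cache_tradebook_ab1234 (
    trade_date    date,
    tradingsymbol text,
    quantity      bigint,
    price         numeric
);
-- ...rows fetched from the primary DB are bulk-inserted here by the worker...

-- Unlike a plain key-value cache, Postgres can sort, filter and paginate
-- the cached result directly:
SELECT *
FROM cache_tradebook_ab1234
ORDER BY trade_date DESC
LIMIT 50 OFFSET 0;
```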
- 00:30:00 - 00:35:00
The presentation explains the caching structure and query flow used to reduce load on the primary database while serving traders. The speaker describes partitioning data by fiscal year and using the foreign data wrapper for large-scale data management.
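The transcript also mentions per-month partitions within each financial-year database. A minimal sketch with declarative partitioning, using hypothetical table and column names:

```sql
CREATE TABLE tradebook (
    user_id       text,
    trade_date    date NOT NULL,
    tradingsymbol text,
    quantity      bigint,
    price         numeric
) PARTITION BY RANGE (trade_date);

-- One partition per month; the planner prunes partitions that a
-- date-bounded query cannot touch.
CREATE TABLE tradebook_2024_04 PARTITION OF tradebook
    FOR VALUES FROM ('2024-04-01') TO ('2024-05-01');
CREATE TABLE tradebook_2024_05 PARTITION OF tradebook
    FOR VALUES FROM ('2024-05-01') TO ('2024-06-01');
```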
- 00:35:00 - 00:44:36
The speaker closes by emphasizing a willingness to test and adopt whichever database solution best fits a given context, rather than being dogmatic about Postgres or any other system. He notes that a handful of deliberately non-over-engineered solutions have delivered significant progress and stability for Zerodha.
FAQ
What is Zerodha?
Zerodha is an Indian fintech stock broker, sometimes described as the Robinhood of India.
When is the stock market open in India?
The Indian stock market is open from 9:15 AM to 3:30 PM on trading days.
What makes PostgreSQL special, according to the speaker?
PostgreSQL stands out for its excellent documentation and friendly community, along with the strength and resilience it has shown handling critical financial data.
How does Zerodha store and import trade books?
Zerodha imports trade books every night and uses them to compute financial information such as profit and loss and ledgers.
What kind of caching layer does Zerodha use?
Zerodha runs an additional PostgreSQL database as a caching layer, populated from the main database, to hold data temporarily.
Why does Zerodha not use traditional DB replication?
Zerodha does not run master-slave replication; instead it relies on backups and data archived in S3, so the database can be restored quickly if something goes wrong.
How does Zerodha handle data sharding?
Zerodha splits its databases by financial year and uses the foreign data wrapper to connect data across servers.
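A sketch of the kind of postgres_fdw setup this answer refers to; the hosts, credentials and table definitions here are invented for illustration.

```sql
-- On the primary server: expose a table living on the server that holds an
-- older financial year as a local foreign table.
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER fy2021 FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'fy2021.internal', dbname 'console', port '5432');

CREATE USER MAPPING FOR app_user SERVER fy2021
    OPTIONS (user 'app_user', password 'secret');

CREATE FOREIGN TABLE tradebook_fy2021 (
    user_id       text,
    trade_date    date,
    tradingsymbol text,
    quantity      bigint,
    price         numeric
) SERVER fy2021 OPTIONS (schema_name 'public', table_name 'tradebook');

-- Queries against tradebook_fy2021 on the primary are transparently
-- executed on the remote server.
```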
Why did Zerodha decide not to run autovacuum automatically in PostgreSQL?
Vacuuming is configured manually because the irregular data-import schedule makes automatic vacuuming a poor fit; running it manually also helps manage server resources predictably.
What is Console's role in Zerodha's systems?
Console is the back-office platform that imports and stores trade books to compute financial information such as users' profit and loss and ledgers.
How does Zerodha use indexing in PostgreSQL?
Zerodha uses partial indexes and avoids over-indexing to keep query processing efficient.
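A partial-index sketch along the lines described; the orders table and its status values are hypothetical.

```sql
-- Index only the rows the hot, user-facing queries touch, so the index
-- stays small and cheap to maintain.
CREATE INDEX idx_orders_open
    ON orders (user_id, placed_at)
    WHERE status = 'OPEN';

-- Matches the partial index's WHERE clause, so it can be used:
SELECT *
FROM orders
WHERE user_id = 'AB1234'
  AND status = 'OPEN'
ORDER BY placed_at DESC;
```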
Transcript
- 00:00:00okay um first of all uh good afternoon
- 00:00:03everyone uh I hope the lunch was good uh
- 00:00:06but obviously not too good so that you
- 00:00:08don't sleep off while I give this talk
- 00:00:11uh and welcome to my presentation of how
- 00:00:14we use Postgres in
- 00:00:16Zerodha uh and what what have we learned
- 00:00:19from it and our mistakes our experiences
- 00:00:22everything and where we are right
- 00:00:25now so setting up the context for the
- 00:00:28talk um to quot my favorite Salman B
- 00:00:32movie race three our learnings are our
- 00:00:34learnings none of your
- 00:00:36learnings uh what it means is that
- 00:00:38everything that I'm going to speak about
- 00:00:40here is something that we have learned
- 00:00:42in our experience in the context of the
- 00:00:45data that how we use how we import it
- 00:00:47might not apply to even one person in
- 00:00:50this room and that is how databases
- 00:00:52should be it it should not be extremely
- 00:00:54generic either um you might disagree or
- 00:00:57be triggered by how we use Postgres
- 00:01:01and that is okay I have been told by uh
- 00:01:04Kash our CTO to not make any jokes even
- 00:01:07pg3
- 00:01:10months uh little bit about me uh I've
- 00:01:12been at zeroda for since day one of Tech
- 00:01:15Team um as the 10x engineer uh these are
- 00:01:19all memes that we have internally about
- 00:01:21each other and been managing the
- 00:01:24backend uh full stack backend for the
- 00:01:28entire time I've been at zeroda gone
- 00:00:31through all possible databases um MySQL
- 00:00:35Postgres Redis uh MongoDB uh Click
- 00:00:40House Cockroach the list is
- 00:01:43endless uh and and before I get into
- 00:01:47this talk I first of all like to I mean
- 00:01:50say thanks to the core team of postgress
- 00:01:54uh because I've come across multiple
- 00:01:56languages databases softwares Force or
- 00:02:00Enterprise but I don't think there has
- 00:02:03been anyone better at
- 00:02:06documenting their features as well as
- 00:02:08postgress has done I don't think there
- 00:02:10is anyone that has a better blueprint of
- 00:02:13what they want to do in their future
- 00:02:15updates like postgress has done I don't
- 00:02:17think there is I might be wrong here
- 00:02:19again because as I said it's our
- 00:02:21learnings but I don't think there is
- 00:02:22anything as resilient as Postgres has
- 00:02:24been for us um and we have done
- 00:02:28ridiculous things with it and this just
- 00:02:30worked uh from upgrading from postgress
- 00:02:348 is where we started to postgress uh we
- 00:02:37right now at
- 00:02:38PG-13 uh and it has the updation has
- 00:02:42never caused an issue no data loss
- 00:02:44nothing and that is U like cannot be
- 00:02:48more thankful to the code development
- 00:02:50team of postgress and the community of
- 00:02:52postgress which has always been super
- 00:02:54nice in answering any of our doubts on
- 00:02:56the slack
- 00:02:57channels so
- 00:03:00uh history of uh postgress usage in
- 00:03:03seroa we started out uh first of all let
- 00:03:06me set a bit of context of how zeroda
- 00:03:09imports or uses its data and maybe that
- 00:03:11will be helpful in understanding why we
- 00:03:13do things with Postgres the way we do
- 00:03:16it the as you know V zeroda is an
- 00:03:21fintech Indian broker maybe I should
- 00:03:23have introduce zeroda first I don't
- 00:03:24think everyone knows uh what zeroda is
- 00:03:27so we are a stock broker uh we uh Robin
- 00:03:32Hood of India or Robinhood is the Zerodha of
- 00:03:33the US uh we deal with stock market and uh
- 00:03:38we
- 00:03:40import trade books we basically build a
- 00:03:43software for people to trade on so which
- 00:03:44means that we have to deal with all
- 00:03:46kinds of financial information and it
- 00:03:48also means Computing a lot of financial
- 00:03:50information like pnl of uh profit and
- 00:03:53loss of users how much The Ledger of
- 00:03:55users how much money they have
- 00:03:57transferred in transferred out all
- 00:03:58critical financial information that we
- 00:04:00store and we use postgress for
- 00:04:02it uh markets are open from 9:15 to 3
- 00:04:083:30 every day after that M6 is open but
- 00:04:11I don't think we ever cared a lot about
- 00:04:14it but uh yeah markets are open from
- 00:04:169:15 to 330 for majority for most of our
- 00:04:19Traders um and we our systems that we
- 00:04:24have built some of them are read only
- 00:04:26throughout the day and become write only
- 00:04:29at night
- 00:04:30many of many of the systems that are
- 00:04:32built are usually read and write
- 00:04:34throughout the day and night but our
- 00:04:36systems are a bit different than that
- 00:04:37and the systems that I have worked on uh
- 00:04:40we have a trading platform called kite
- 00:04:43uh which has a transactional DB which
- 00:04:46again uses Postgres that is a read write
- 00:04:48throughout the day but console which is
- 00:04:51our backend back office platform where
- 00:04:54all the trade books all the information
- 00:04:56regarding anything the user has done
- 00:04:58throughout the day on our Trading
- 00:04:59platform gets imported in that import
- 00:05:03happens at night that is the writes the
- 00:05:05bulk writes happen at night but majority
- 00:05:08of it it remains a read-only platform
- 00:05:10throughout the day with very few writes
- 00:05:12so that is the context on which we built
- 00:05:15our schemas our queries our databases
- 00:05:17and how we
- 00:05:19scale so uh we started off with
- 00:05:21importing around uh so when I joined
- 00:05:24zeroa used to have 20,000 clients um not
- 00:05:28even all of them are active and we used
- 00:05:30to import around 150 MBS of data per day
- 00:05:35at best and I used to have uh I am
- 00:05:40saying I a lot here because at point of
- 00:05:42time it was just two or three of us uh I
- 00:05:44mean if you have read our blogs you
- 00:05:46would know that we are a very lean very
- 00:05:47small team and we still have remained so
- 00:05:50like that so I used to face a lot of
- 00:05:52issues with scaling even that 100 MB of
- 00:05:55data when we started out with um when I
- 00:05:59look back back at it lot of things that
- 00:06:00I did was extremely obviously dumb uh
- 00:06:03lack of understanding of how data Works
- 00:06:05understanding of how databases work um
- 00:06:08over indexing issues under indexing
- 00:06:11everything every possible thing that you
- 00:06:12can think of can go wrong in a database
- 00:06:15um for example let's say uh the log
- 00:06:18files overflowing and causing the
- 00:06:21database to crash so everything that can
- 00:06:24possibly go wrong uh has gone wrong with
- 00:06:26us we have learned from it uh We've
- 00:06:28improved our softwares way we deal with
- 00:06:32uh storing our own data so started off
- 00:06:35with 100 MB uh 100 MB failed uh there
- 00:06:38was postgress 8 uh improved on our
- 00:06:41schemas improved our schema design
- 00:06:43improved the way an app has to built on
- 00:06:46has to be built on top of um our
- 00:06:49databases not rewrote our apps multiple
- 00:06:51times uh again if you have read any of
- 00:06:54our posts you would know that we we
- 00:06:56rewrite a lot of our things multiple
- 00:06:59times over over and over again um it is
- 00:07:02mundane it might be but it solves it solves
- 00:07:05a lot of headache for us by removing uh
- 00:07:08Legacy code Legacy issues and I would
- 00:07:11say Legacy schemas too because you might
- 00:07:13have started with a schema that doesn't
- 00:07:16make sense right now uh because your
- 00:07:18queries have changed the way you deal
- 00:07:19with the data has changed so we end up
- 00:07:23rewriting everything we know that
- 00:07:24nothing is constant Everything Will
- 00:07:26Change needs to change everything will
- 00:07:27break and that's okay we are okay with
- 00:07:29it uh we currently deal with hundreds of
- 00:07:32GBS of import every single day um uh
- 00:07:36absolutely no issues at all I mean there
- 00:07:38are plenty of issues but postgress has
- 00:07:40worked fine for us till now though we
- 00:07:44have other plans of doing other things
- 00:07:45with it but till now again nothing as
- 00:07:49resilient as good as postgress has been
- 00:07:52for us so how do we
- 00:07:56manage uh this big amount of data I've
- 00:08:00put a question mark there
- 00:08:01because when we when we started out um
- 00:08:06understanding our data better I remember
- 00:08:08this was six years back probably I
- 00:08:11remember sitting with Nan our CEO and
- 00:08:13even Kash and Nan used to be like so so
- 00:08:16we are very close to Big Data right
- 00:08:18because big data used to be this fancy
- 00:08:21term at that point of time I never
- 00:08:23understood what Big Data meant uh I
- 00:08:26assumed that it's just a nice looking
- 00:08:28term on your assume right you you're
- 00:08:30you're managing Big Data um eventually
- 00:08:33we uh eventually I guess we all realize
- 00:08:37that all of that is pretty much hogwash
- 00:08:39uh there are companies which need big
- 00:08:42data there are companies which don't
- 00:08:43need big data you don't have to be a
- 00:08:46serious engineering company if you I
- 00:08:48mean if you don't need to have big data
- 00:08:50to be a serious engineering company you
- 00:08:52can make do with little less data so um
- 00:08:56I'm going to be this talk is probably
- 00:08:58going to be a bit of an over overview of
- 00:09:00how we manage our data till now but um I
- 00:09:03glad to I'll be more than glad to take
- 00:09:05questions at the end of it if there are
- 00:09:07more doubts or anything else uh first
- 00:09:10thing is uh index uh but don't overdo it
- 00:09:14so when we started
- 00:09:16out I I thought that indexing was like a
- 00:09:19foolproof plan to solve everything that
- 00:09:22is there realized it much later that
- 00:09:24indexing itself takes a lot of space
- 00:09:27indexing in itself uh uh you can't index
- 00:09:31for every query that you write you need
- 00:09:33to First understand that there are some
- 00:09:35queries that need to be fast and some
- 00:09:36queries that you can afford it to be
- 00:09:38slow and that's okay so how we have
- 00:09:42designed our systems is the queries that
- 00:09:44um
- 00:09:46are the the the number of queries are
- 00:09:49higher for let's say a particular set of
- 00:09:50columns those columns are indexed and uh
- 00:09:54the columns that are not indexed they
- 00:09:56might be queried and but we don't index
- 00:09:58them at all and that's okay those
- 00:10:00queries might take a long enough long
- 00:10:02time but they're not user facing they
- 00:10:05are backend reports that it generated
- 00:10:07over time not everything has to happen
- 00:10:09in 1 second or half a millisecond or
- 00:10:11stuff like that so we're very aware of
- 00:10:12that when we index we use partial
- 00:10:14indexes everywhere U that's another
- 00:10:16thing that we learned that uh even if
- 00:10:18you're indexing a column you can partial
- 00:10:21indexing will be much more helpful for
- 00:10:23you in categorizing the kind of data
- 00:10:25that you want to search um the second
- 00:10:28thing is materialized views um I'll
- 00:10:31combine materialized views and the
- 00:10:33denormalization point into one uh the
- 00:10:35reason being uh if if any of you have
- 00:10:38done engineering here you would you
- 00:10:40would have studied database systems and
- 00:10:41one of the first things that that is
- 00:10:43taught to us is normalize normalize
- 00:10:45normalize everything right and when we
- 00:10:47come out we we come out with this with
- 00:10:49this idea that we need to
- 00:10:51normalize uh all of our data sets you'll
- 00:10:54realize that this works well on smaller
- 00:10:58data
- 00:10:59as the data grows those join queries
- 00:11:02will stop working those join queries
- 00:11:04will become so slow that there is
- 00:11:06absolutely nothing you can do to fix it
- 00:11:09so we took a conscious decision to
- 00:11:12denormalize a lot of our data sets so
- 00:11:15majority of our data sets majority of
- 00:11:17our tables have nothing to do with each
- 00:11:19other and we are okay with that it
- 00:11:21obviously leads to
- 00:11:23increase in the size of data that we
- 00:11:25store but the the trade-off that we get
- 00:11:29in improvement improvement of query is
- 00:11:31much higher than the size increase we
- 00:11:35can always Shard and make our database
- 00:11:37smaller or delete data or do whatever
- 00:11:39but query Improvement is a very
- 00:11:41difficult task to pull off uh if you if
- 00:11:44your entire query is a bunch of nested
- 00:11:46joins across uh two heavy tables we
- 00:11:50avoid that everywhere and one of the
- 00:11:52ways we avoid it is obviously as I said
- 00:11:53we denormalize a lot and we uh have
- 00:11:58materialized views
- 00:11:59everywhere in our system uh and that is
- 00:12:03one of the easiest cleanest fastest way
- 00:12:06to make your queries work faster if
- 00:12:09there is a bunch of small data set that
- 00:12:12is getting reused all over your
- 00:12:13postgress query multiple times over use
- 00:12:16width statements use materialized views
- 00:12:18and it will be uh your queries will
- 00:12:21automatically be fast I don't want to
- 00:12:23give you statistics about 10x fast or
- 00:12:2520x fast and all because it again
- 00:12:27depends upon data your query your server
- 00:12:29size all of those things so no no
- 00:12:32metrics as such being thrown here but it
- 00:12:35will have a much better experience than
- 00:12:37doing multiple joins across massive
- 00:12:39tables avoid that at all costs um one
- 00:12:43more thing is understanding your data
- 00:12:45better and by that I
- 00:12:48mean I feel like uh and this is
- 00:12:51something that I've learned after
- 00:12:52talking to a lot of people uh of
- 00:12:55different companies or uh different
- 00:12:57startups and how they work
- 00:12:59and they pick the database first and
- 00:13:02then they figure out how to put the data
- 00:13:03into the database I don't know why they
- 00:13:05do that maybe the stack looks more uh
- 00:13:08Rockstar like I guess uh if you choose
- 00:13:10some fancy database and then try to pigeon
- 00:13:12hole the data into it uh picking
- 00:13:15first understanding the data then
- 00:13:18understanding how you will query the
- 00:13:19data should be the first step before you
- 00:13:22pick what kind of database and how you
- 00:13:25will uh design the schema of the
- 00:13:27database if you don't do that if if you
- 00:13:29say that you know what it's it's a
- 00:13:30postgress conference it's going to be
- 00:13:33just postgress in my stack there will be
- 00:13:34nothing else nowhere uh postgress is
- 00:13:38like the one true solution for
- 00:13:40everything so that's that's not going to
- 00:13:42work um then the next point is post is
- 00:13:46Db tuning around queries uh one more
- 00:13:50thing we have uh realized is many people
- 00:13:53tune the database this something that I
- 00:13:55came across again very recently while I
- 00:13:57was dealing with another company uh
- 00:13:59database stack they have tuned their
- 00:14:02database in in a wholesome manner that
- 00:14:04means that the entire database has a set
- 00:14:07of parameters that they have done PG
- 00:14:08tuning for uh and it caters to every
- 00:14:12single table that is there in database
- 00:14:13and that is a terrible approach if you
- 00:14:16have a lot of data a better way to do is
- 00:14:19you tune your D there's no denying that
- 00:14:22but you also tune your tables maybe a
- 00:14:24particular table needs more parallel
- 00:14:26workers maybe a particular table needs
- 00:14:29frequently vacuumed compared to the
- 00:14:31other set of tables that you have in
- 00:14:33your DB so um you need to you need to
- 00:14:37tune based upon the queries that hit
- 00:14:39those particular tables rather than the
- 00:14:40entire database in
- 00:14:42itself um the last I mean understanding
- 00:14:46a query planner I'm sure there is uh
- 00:14:49there's a mistake understanding a query
- 00:14:50planner so uh another mistake when I
- 00:14:54started out was I'm sure I don't know
- 00:14:57how many of you feel that way with a
- 00:14:59query planner of postgress or any
- 00:15:01database is a little hard to understand
- 00:15:04um and I felt that for the longest time
- 00:15:06I would it will just print a bunch of
- 00:15:08things and all I will read is the the
- 00:15:10last set of things right so it took this
- 00:15:13much time it accessed this much data and
- 00:15:16that's all I understood from those query
- 00:15:18planners took me a very long time to
- 00:15:21understand the direction of the query
- 00:15:23which is very very important to
- 00:15:25understand uh direction of the query
- 00:15:26would be what is called first a where
- 00:15:28clause and and Clause a join clause in
- 00:15:30your entire query if you do not
- 00:15:32understand that you will not be able to
- 00:15:33understand your query plan at all and
- 00:15:36it's very easy to understand a query
- 00:15:37plan of a simple query right if you do a
- 00:15:39select star from whatever table and
- 00:15:40fetch that you don't even need a query
- 00:15:42plan for that if the database is if the
- 00:15:45if there's if the index is not there
- 00:15:47that query will be slow you don't need a
- 00:15:48query plan to tell you that but query
- 00:15:51plan is super helpful when you're doing
- 00:15:53joins across multiple tables and uh
- 00:15:57understanding what kind of
- 00:15:59uh sorts are being called is very very
- 00:16:02important to understand I think me and
- 00:16:04Kash must have sat and debugged multiple
- 00:16:06queries trying to understand the query
- 00:16:08planner of it all and Postgres is very
- 00:16:10funny with its query planning so uh
- 00:16:14there will be a certain clause in which
- 00:16:17a completely different query plan will
- 00:16:18be chosen for no reason at all and you
- 00:16:21have to and there have been reasons
- 00:16:22where we don't I still don't understand
- 00:16:24some of the query plans that are there
- 00:16:25but we have backtracked like into a into
- 00:16:28a explanation for ourselves that if we
- 00:16:31do this this this this then our query
- 00:16:33plans will look like this and if we do
- 00:16:35these set of things our query plans will
- 00:16:36look like that this is better than this
- 00:16:38we'll stick to this and we have followed
- 00:16:40that everywhere
- 00:16:42and I don't think I don't think you can
- 00:16:45look at a documentation and understand a
- 00:16:47query plan either this is something that
- 00:16:49you have to play around with your
- 00:16:50queries play around with your data to
- 00:16:52get to the point um the queries that I
- 00:16:55would have in my system on my set of
- 00:16:58data would have a you reduce a you
- 00:17:01reduce the data by half and the query
- 00:17:03plan will work very differently just the
- 00:17:05way postgress is and that is something
- 00:17:09that you have to respect you have to
- 00:17:11understand and if you don't understand
- 00:17:12query plan uh forget about optimizing
- 00:17:15your queries DB schema nothing nothing
- 00:17:17will ever happen you will just keep
- 00:17:18vacuuming which which brings me back to
- 00:17:20the last point and this is this is funny
- 00:17:24because I was in the vacuuming talk the
- 00:17:27one that happened right before for uh uh
- 00:17:30right before lunch break so the first
- 00:17:33thing he said was do not turn off autov
- 00:17:35vacuum the first thing I would say is
- 00:17:37turn off autov vacuum so uh and I'll
- 00:17:40tell you why we do that and why it works
- 00:17:43in our context and might not work for
- 00:17:45someone else autov vacuum is an
- 00:17:47incredible feature if tuned properly if
- 00:17:50you have seen the tuning parameters
- 00:17:52they're not very easy to understand what
- 00:17:54does delete tuples after X number of
- 00:17:57things even mean there
- 00:17:59they're not easy to what does nap time
- 00:18:02mean how does someone who has not dealt
- 00:18:05with database for a very long time
- 00:18:07understand the set of parameters there
- 00:18:08that is documentation and all of that
- 00:18:10but it's really hard to read an abstract
- 00:18:13documentation and relate it to a schema
- 00:18:17um we we played around with every single
- 00:18:20parameter that autov vacuum has nothing
- 00:18:22worked for us and I'll tell you why we
- 00:18:25would bulk we would bulk import billions
- 00:18:28of row in a fixed set of time now you
- 00:18:31might say that well if you are importing
- 00:18:33everything in a fixed set of time why
- 00:18:35don't you trigger why don't you write
- 00:18:37your autov vacuum to work right after
- 00:18:40the UT has been
- 00:18:41done that UT is never under our control
- 00:18:45the files can come delayed from anywhere
- 00:18:47any point point of time and because none
- 00:18:51of it is under our control we decided
- 00:18:54that autov vacuum is not a solution for
- 00:18:56us turned it off because it was it was
- 00:18:59going to run forever and ever and ever
- 00:19:02we vacuum uh so I hope most of you know
- 00:19:05the difference between vacuum full and
- 00:19:06vacuum analyze but if you don't know
- 00:19:07vacuum full a very simple explanation
- 00:19:10vacuum full will give you back your
- 00:19:11space that you have updated deleted
- 00:19:13vacuum analyze will improve your query
- 00:19:14plan we don't vacuum full anything
- 00:19:16because that completely blocks the DB we
- 00:19:18vacuum analyze all our queries right
- 00:19:20after doing a massive bulk UT uh we we
- 00:19:24realize that um I'm sure if you have
- 00:19:27been in the talk he spoke about Max
- 00:19:28parallel workers while autov vacuuming
- 00:19:30we understand that autov vacuuming uses
- 00:19:32the parallelism of postgress that is
- 00:19:34inbuilt into it which we don't but we
- 00:19:37don't really care about it because this
- 00:19:39happens late in the night
- 00:19:41and vacuuming taking half an hour more
- 00:19:44or 10 minutes more doesn't make a big
- 00:19:46difference for us at that point of time
- 00:19:48so in this context in this scenario
- 00:19:50turning off autov vacuum and running
- 00:19:52vacuum on our own as a script that
- 00:19:55triggers vacuum for multiple tables one
- 00:19:57after the other once are Imports are
- 00:19:59done works for us but to uh to reiterate
- 00:20:03again it might not work for your context
- 00:20:05and maybe autov vacuum is the better
- 00:20:07solution but remember that autov vacuum
- 00:20:09has a lot of pitfalls and I will I mean
- 00:20:13I read postgress 13 documentation a
- 00:20:16while back it still hasn't improved to
- 00:20:18an extent that I thought it should have
- 00:20:20by now and it still has its set of
- 00:20:23issues while dealing with massive sets
- 00:20:24of data um but I hope I hope it gets
- 00:20:27better over time and uh if if some if
- 00:20:30some code developers can do it then it
- 00:20:32has to be postest so I hope they do that
- 00:20:34so
- 00:20:37yeah um okay so this is another
- 00:20:40interesting part of the the talk I guess
- 00:20:44but before I get there um I remember
- 00:20:47speaking to someone outside and they
- 00:20:49said that how is your setup like and uh
- 00:20:52what do you do for um
- 00:20:55replica and Master Slave and all of of
- 00:20:58those set of things
- 00:20:59so I guess this will be triggering for
- 00:21:02everyone we don't have a Master Slave at
- 00:21:04all we don't have a replica either uh we
- 00:21:08have one database and one one node we
- 00:21:12have sharded it using the foreign data
- 00:21:14wrapper uh why we have sharded it like that
- 00:21:17I will explain I'll get to that um but
- 00:21:20we have sharded it using the foreign data
- 00:21:21wrapper so we have divided our data
- 00:21:23across multiple financial years and kept
- 00:21:27older historical financial years in a
- 00:21:29different uh database server and
- 00:21:32connected both of them using FDW and we
- 00:21:36query the primary DB and it figures out
- 00:21:38from the partitioning that the other the
- 00:21:42data is not in this database server
- 00:21:44right now and it is in the other
- 00:21:45database it figures it out fetches the
- 00:21:47query for us fetches the data for us um
- 00:21:51no slave setup uh our backups are
- 00:21:53archived in S3 we are okay this is by
- 00:21:57the way to reiterate this is uh a back
- 00:22:00office platform we do not promise that
- 00:22:03we'll have 100% off time we are okay
- 00:22:05with that we understand that if
- 00:22:08postgress goes down which has never ever
- 00:22:10happened again thankfully to postgress
- 00:22:12but even if it goes down for whatever
- 00:22:14number of reasons we have been able to
- 00:22:16bring that back up bring the database
- 00:22:18back up by using S3 within minutes and I
- 00:22:22have restarted postgress that is
- 00:22:24pointing towards a completely fresh
- 00:22:26backup from S3 with maybe 15 20
- 00:22:29terabytes of data under under a minute
- 00:22:32or two so it works so there is there
- 00:22:35there might be fancy complicated
- 00:22:37interesting setup to make your replicas
- 00:22:40work but this also works and I many
- 00:22:43people might call it jugaad uh hacky way
- 00:22:46of doing things but I don't think it is
- 00:22:48I think it's a sensible approach we
- 00:22:50don't want to over engineer anything at
- 00:22:52all if this works why have a bunch of
- 00:22:55systems that you need to understand just
- 00:22:57to manage manage um a replica setup now
- 00:23:02coming back uh to the question of if we
- 00:23:05don't have a replica how do we load
- 00:23:07balance we don't but what we have done
- 00:23:11differently is that we have a we have a
- 00:23:14second postgress server that sits on top
- 00:23:17of our primary DB and acts like a
- 00:23:19caching layer we have uh we have an
- 00:23:23open-source uh piece of software called
- 00:23:25SQL Jobber which is an async uh job based
- 00:23:30mechanism that keeps pulling the DB and
- 00:23:33then fetches the data stores it in
- 00:23:34another postgress instance um and then
- 00:23:37eventually our app understands that the
- 00:23:41fetch is done data is ready to be served
- 00:23:43and it fetches the DB fetches the data
- 00:23:46from the caching layer so we end up
- 00:23:49creating
- 00:23:51around 500 GB worth of I would say
- 00:23:54around 20 30 millions of tables per day
- 00:23:58uh I remember speaking I remember asking
- 00:24:00someone postgress slack a long time back
- 00:24:03that we are doing this thing where we
- 00:24:05creating 20 million tables a day and
- 00:24:07they like why are you doing this isn't
- 00:24:09there another way of doing it and we're
- 00:24:11like no this works and the reason why we
- 00:24:13do this is because uh postgress in in
- 00:24:16itself supports sorting uh which red
- 00:24:18this doesn't postgress I mean at that
- 00:24:20point of time uh it it lets us do
- 00:24:23pagination it lets us do search on top
- 00:24:26of trading symbols if we need I mean
- 00:24:28search on top of any columns that we
- 00:24:29need to do if necessary so we have
- 00:24:32postgress setting as a caching layer on
- 00:24:34top of our primary postgress and all the
- 00:24:38queries first come to the SQL jobber
- 00:24:40application they go to our primary DB
- 00:24:44nothing gets hammered to the primary DB
- 00:24:45though so the primary DB is not under
- 00:24:47any load at any point of time I mean
- 00:24:49there is a query load but it's not
- 00:24:51getting hammered at all the hammering
- 00:24:53happens to this caching DB which gets
- 00:24:55set eventually at some point of time
- 00:24:57with the data
- 00:24:59and then we serve the data to to our end
- 00:25:02users and that remains for the entire
- 00:25:04day because as I said during the day the
- 00:25:06data doesn't change a lot so we can
- 00:25:08afford to cach this data for the entire
- 00:25:10time duration there are some instances
- 00:25:12in which we need to clear our C clear
- 00:25:15the cache we can just delete the key and
- 00:25:17then this entire process happens all
- 00:25:18over again for that particular user so
- 00:25:21that's our that's how our postgress
- 00:25:23caching layer is Works has worked fine
- 00:25:25for us every night we clean the 500gb so
- 00:25:28how we do is every night uh we have two
- 00:25:31500 GB discs uh pointing at to the
- 00:25:34server we switch from disk one to dis
- 00:25:36two then the disk one gets cleaned up
- 00:25:38then the next day goes to dis goes back
- 00:25:40from disk two to disk one and again the
- 00:25:42new tables are set all over it again
- 00:25:44works fine uh never been an issue with
- 00:25:47this um coming back to our learnings
- 00:25:50with postgress yeah sorry
- 00:26:05can hello are you able to hear me yeah
- 00:26:09yeah uh you know you're telling that
- 00:26:10about kite platform so uh from the kite
- 00:26:14platform data that is LP enement right
- 00:26:16so from the kite platform data you are
- 00:26:19porting to the console database yeah so
- 00:26:21that is a nightly job yeah that's a
- 00:26:23nightly job that's a nightly job yeah so
- 00:26:25that's what you telling in the console
- 00:26:28uh uh system that create millions of
- 00:26:30data right yeah so um okay maybe I
- 00:26:33should explain this again so uh you
- 00:26:35place your orders your trades and
- 00:26:37everything on kite right at the night we
- 00:26:40get a order book or a trade book that
- 00:26:42gets imported into console we do that to
- 00:26:46compute the buy average which is the
- 00:26:48average price at which you bought that
- 00:26:50stock or the profit and loss statement
- 00:26:53which you'll use for taxation for any
- 00:26:54other reason that you might need it for
- 00:26:56that is why we import data into console
- 00:26:59so to fetch these set of statements you
- 00:27:01have to come to console to fetch that
- 00:27:03now when you are fetching the statement
- 00:27:05we this caching layer sits on top of
- 00:27:07that your fetches go to this caching
- 00:27:10layer first it checks if there is
- 00:27:11already a prefetched cach for you ready
- 00:27:13or not if not the query goes through the
- 00:27:16DB the the data is fetched put into the
- 00:27:18caching layer and for the entire day the
- 00:27:20caching layer is serving the data to you
- 00:27:22not the primary DB the primary DB
- 00:27:24remains free at least for you as the
- 00:27:26user so let's say you come in you lay
- 00:27:28around in console you load a bunch of
- 00:27:29reports everything is cashed for the
- 00:27:31entire day in this caching layer so
- 00:27:33primary DB remains as it is till the
- 00:27:36till that night so that night we would
- 00:27:38have gotten all the trades orders any
- 00:27:40things that you have done on your
- 00:27:41trading platform into uh console we
- 00:27:44import all of that we clear our cash
- 00:27:46because it's a fresh set of data your
- 00:27:48pnl your Ledger your financial
- 00:27:50statements have changed because maybe
- 00:27:51you have traded that day maybe have
- 00:27:53bought stocks that day or anything would
- 00:27:54have done happened to your account that
- 00:27:55day so we clear our cach then next year
- 00:27:59when you come and fetch the data again
- 00:28:01all of this is set all over again and
- 00:28:03then whenever you can keep revisiting
- 00:28:04console keep fetching whatever amounts
- 00:28:06of data you want to it will come from
- 00:28:08this cache unless of course you change
- 00:28:10the the date parameters only then we uh
- 00:28:15uh go and fet the data from our primary
- 00:28:17DP but we have realized that most users
- 00:28:20use the data of a particular time frame
- 00:28:23and they don't they don't want to come
- 00:28:25and check for last 3 years what has
- 00:28:27happened it is always last 6 months last
- 00:28:282 months last one month and they check
- 00:28:31that once they go back and uh we cannot
- 00:28:36obviously build a system where every
- 00:28:38single date range has to be uh equally
- 00:28:41scalable and equally available uh we are
- 00:28:44very aware that the older which I'll
- 00:28:46talk about how we have shed we are very
- 00:28:48aware that our older Financial year data
- 00:28:50points don't need to be available all
- 00:28:52the time at the highest possible metrics
- 00:28:54of a server uh they don't have to be at
- 00:28:57a don't have to be served at a very fast
- 00:28:59rate either right so these are the
- 00:29:01decisions that we have taken and it has
- 00:29:03worked fine for us uh might not work for
- 00:29:05another person but yeah so I I hope that
- 00:29:09answered your question one doubt on that
- 00:29:11kite is also having postgress DB right
- 00:29:13so are using PG dump or postgress
- 00:29:16utilities itself uh no so kite uh uses
- 00:29:19postgress for its market watch uh so
- 00:29:22market watch is the place where you add
- 00:29:24different scripts or different stocks
- 00:29:27and it tells you the current price of
- 00:29:28the stock uh though we have we have
- 00:29:31plans of moving away from that to S DB
- 00:29:34um that has got nothing to do with this
- 00:29:37uh how I mean I guess you're asking a
- 00:29:39more of a how a broker Works question or
- 00:29:41how a trading platform works but you
- 00:29:43place an order the order comes as an
- 00:29:46exchange file at the end of the day for
- 00:29:48us and we import that so there is no PG
- 00:29:50dump that happens from kite to console
- 00:29:53so those are completely two different
- 00:29:54silos that have very little to do with
- 00:29:56each other they rarely share data among
- 00:29:58each other and they're never in the hot
- 00:30:00path because we understand that kite is
- 00:30:02a u extremely fast read and WR platform
- 00:30:05where everything has to happen over
- 00:30:06milliseconds and it can't ever go down
- 00:30:08so these set of fundamentals will not
- 00:30:10really work there so there is no
- 00:30:12connection of PG dumping kite data into
- 00:30:14console kite Works throughout the day
- 00:30:16console Works after the day after your
- 00:30:19trading Market is done that's where you
- 00:30:20come to console and check how well you
- 00:30:22have performed in the day so I that's
- 00:30:26that one more just caching layer do you
- 00:30:28have in kite also caching layer on kite
- 00:30:31yeah it's redis it's predominantly redis
- 00:30:34caching layer so we also use caching uh
- 00:30:37redis caching on everywhere actually
- 00:30:39it's not just kite we pretty much use
- 00:30:41redis like uh if you have used our
- 00:30:44platform coin uh it used to set on a $5
- 00:30:47digital ocean droplet for the longest
- 00:30:49possible time because everything was
- 00:30:50cached on a redis uh instance and used
- 00:30:53to work just fine so we use redis
- 00:30:56predominantly to cach we don't use redis
- 00:30:58in console for these kind of caching
- 00:31:00layer because sorting and pagination is
- 00:31:02not supported on it uh this is a very
- 00:31:05specific requirement here it works here
- 00:31:07so we use postgress here for
- 00:31:09that is it fine yeah that's I use this
- 00:31:12skyen console that's why I asked this
- 00:31:13cool no issues um thank
- 00:31:17you so our learnings with postgress and
- 00:31:21um I'll start off
- 00:31:23with how we because I I remember my my
- 00:31:27summary of my talk uh that is there on
- 00:31:29the posters and Etc outside talks about
- 00:31:32how we Shard and why we Shard the way we
- 00:31:34do it um if you have seen cus DB
- 00:31:38extension or a lot of sharding examples
- 00:31:40all over the world of all the DBS in the
- 00:31:43world how they set it up is have
- 00:31:47a have a have a master DB have a parent
- 00:31:51DB or whatever and have tenets to every
- 00:31:54single child that is connected to it now
- 00:31:58how those tenets uh work is that you
- 00:32:00query the master DB it figures out that
- 00:32:02these set of tenets are in this uh child
- 00:32:05setup or the sharded setup and the query
- 00:32:08goes there we believe that there is no
- 00:32:13reason to add another column that has
- 00:32:15these IDs on it we actually in most of
- 00:32:18our tables we have deleted all our IDs U
- 00:32:20extra data don't need it so we follow
- 00:32:23that in a lot of places so um what we
- 00:32:27did decided was was that we partition
- 00:32:29our database per month because it works
- 00:32:32for
- 00:32:32us then for every single Financial year
- 00:32:36we put it in a different database uh
- 00:32:38server and we connect it via the foreign data wrapper
- 00:32:42and that is our entire sharded
- 00:32:45setup uh has worked fine for us but I
- 00:32:49would I would say
- 00:32:50that at our scale um and our scale
- 00:32:55is 30 40 terab of 50 terabytes you can
- 00:32:58say right now
- 00:33:01um it it's starting to falter a bit it's
- 00:33:04not it's not a great experience anymore
- 00:33:07and which is why we are moving to a very
- 00:33:09different setup different way of
- 00:33:11sharding maybe that is for another talk
- 00:33:13but till now we could scale uh to
- 00:33:16millions of users serving billions of
- 00:33:19requests uh 500 600 GBS of data per day
- 00:33:23using just foreign data rapper and a SQL
- 00:33:25jobber caching layer on top of our
- 00:33:27primary DB no nodes no uh load balancer
- 00:33:31nothing at all um so our learnings of
- 00:33:36postgress um has been that this is
- 00:33:39something that is a there is a gut
- 00:33:42feeling when you write your queries or
- 00:33:45when you write when you look at a
- 00:33:46database schema that uh our gut feeling
- 00:33:49is
- 00:33:50that every query has a time to it like
- 00:33:55for a particular amount of data for a
- 00:33:57particular query should not take more
- 00:33:58than x number of milliseconds I guess
- 00:33:59that comes with experience many of you
- 00:34:01can just look at the data look at the
- 00:34:03query and know that something is wrong
- 00:34:05if even if it's slow by a few
- 00:34:06milliseconds you can figure that out so
- 00:34:08we have a hard limit that certain
- 00:34:10queries cannot cross this limit and we
- 00:34:12optimize and keep on optimizing based on
- 00:34:15that um most of our heavy queries are in
- 00:34:17an async setup like the job or cash you
- 00:34:20said we ensure that none of it is on the
- 00:34:23hot path of an app um there is no glory
- 00:34:27in storing to too much data so we we
- 00:34:30delete a lot of data uh so someone was
- 00:34:32surprised that our total database is 50
- 00:34:35terabytes or um yeah probably around 50
- 00:34:39or 60 not more than that for sure um and
- 00:34:42one of the reasons why it is 50 and not
- 00:34:44500 terabytes is we delete a lot of data
- 00:34:47we do not believe in storing data that
- 00:34:50we do not need what what does it mean is
- 00:34:53that we uh for most of the computations
- 00:34:56that we do for most of the Imports and
- 00:34:58inserts and everything that we do we
- 00:35:00have a hot backup or whatever you can
- 00:35:03call it of the last 15 days or last 15
- 00:35:06days after that we have checkpoint
- 00:35:08backups of last one month last two
- 00:35:10months last 3 months one backup for each
- 00:35:12month we do not have any backup in
- 00:35:15between any of those dates because we
- 00:35:17can go back to any single month and
- 00:35:19regenerate everyday's data till now we
- 00:35:22are okay doing that because we have that
- 00:35:25a night where uh anything can go wrong
- 00:35:28and we can run these set of computations
- 00:35:30and come back to the current state that
- 00:35:32is right now maybe it doesn't work for
- 00:35:35others but I again this is another
- 00:35:38experience that I've learned looking at
- 00:35:39databases of others that there is a lot
- 00:35:41of frivolous data that people like to
- 00:35:42keep for no reason at all because it
- 00:35:44just makes the database looks bigger and
- 00:35:46I don't know makes it looks fancier just
- 00:35:48delete it it doesn't it's back it up in
- 00:35:51a S3 and put it somewhere like don't
- 00:35:53have to be in a database why does
- 00:35:54six-year-old data unless it's a
- 00:35:56compliance that is being set by the the
- 00:35:59company you work for or the organization
- 00:36:01you work for unless it's a compliance
- 00:36:02that you have to do it it can be an S3
- 00:36:05backup it can be a file um doesn't have
- 00:36:08to be in a database and you don't have
- 00:36:09to be responsible for every query of
- 00:36:12last 10 years to be served under 1
- 00:36:15millisecond doesn't make sense it will
- 00:36:17never scale don't do that um the other
- 00:36:22thing that I've also noticed is a lot of
- 00:36:24people write maybe this is a front end
- 00:36:27develop are going into backend issue uh
- 00:36:29where a lot of the logic that should
- 00:36:32have been done by postgress gets done by
- 00:36:34the app and I've noticed that in a lot
- 00:36:37of places and I think that is uh
- 00:36:39something that fundamentally should
- 00:36:41change Postgres in itself can do a lot
- 00:36:44of computations like summing average
- 00:36:47window functions you can do so many
- 00:36:48things by overloading onto Postgres rather
- 00:36:51than your app doing it um and I find
- 00:36:54that strange because your app should be
- 00:36:56responsible for just loading the queries
- 00:36:59fetching the data it should not be
- 00:37:01Computing for most of the scenarios I
- 00:37:03think I mean I don't know why this this
- 00:37:06this is something that we had done as a
- 00:37:08mistake too and we learned and I hope
- 00:37:11that uh maybe if there is only one
- 00:37:14learning from my entire talk uh because
- 00:37:16I've noticed this in a lot of places uh
- 00:37:20is to overload your postgress with most
- 00:37:22of the computations it can do it faster
- 00:37:24than any app that you write unless I
- 00:37:26don't know you using r or something else
- 00:37:28but still Postgres will be really fast so
- 00:37:30try to do that and um yeah uh as you
- 00:37:35would have noticed that our engineering
- 00:37:37setup is very lean we are it's not
- 00:37:41overwhelming or underwhelming it's stay
- 00:37:43whelmed I guess uh we we don't overdo
- 00:37:46anything at all we we
- 00:37:49always uh we always hit the limits of
- 00:37:52what we have right now in every possible
- 00:37:54way and only then look out for other
- 00:37:57Solutions
- 00:37:58and it has worked pretty good for us we
- 00:38:00have never over engineered any of our
- 00:38:03Solutions till now and we have always
- 00:38:05organically found solutions for whenever
- 00:38:07we have come across issues if postgress
- 00:38:09hasn't worked for us then that's fine
- 00:38:12we'll find another solution for it so as
- 00:38:15I said sometimes postgress is might not
- 00:38:17be the answer sometimes a different
- 00:38:18database would be the answer for you
- 00:38:20and you should be I guess humble enough
- 00:38:22to accept that and move on from
- 00:38:24postgress most databases are very
- 00:38:26similar to each other if you go through
- 00:38:28there how they design the data how the
- 00:38:30schemas are made unless you're dealing
- 00:38:32with columnar databases they're very
- 00:38:34similar and this the the uh the
- 00:38:37fundamentals remain the same across all
- 00:38:40databases if they are not then that is a
- 00:38:42wrong database so even if so your route
- 00:38:45is experimenting with click house a lot
- 00:38:47and the fundamentals are very similar
- 00:38:49so do not be afraid to experiment with a
- 00:38:53different set of databases we all do
- 00:38:54that a lot in our free time uh because
- 00:38:57because I mean it's a strange way to I
- 00:38:59guess end the talk but Postgres might not
- 00:39:01be an answer for every single problem
- 00:39:03though we found an answer for a lot of
- 00:39:04our problems and you should be okay with
- 00:39:07that uh thank
- 00:39:09[Applause]
- 00:39:14you hello um so even if the application
- 00:39:18users are you can have R inside Postgres so
- 00:39:20that that that solves the problem anyway
- 00:39:22but my question is uh when you say the
- 00:39:24caching layer has 20 million tables um
- 00:39:27do how do you take care of the catalog
- 00:39:29bloat or do you just drop and recreate
- 00:39:31the whole cluster every night we just
- 00:39:34rmrf the entire data okay Fant yeah
- 00:39:36that's what I was thinking the other
- 00:39:37problem is uh even with that um I've had
- 00:39:41scenarios where uh you run into extfs or
- 00:39:44whatever file system related limitations
- 00:39:46on because like poster stores everything
- 00:39:49in a single directory right so have you
- 00:39:51had hit something like that and if so
- 00:39:52what do you do yeah I mean
- 00:39:55U I would I would categorize it as some
- 00:39:58of the mistakes we did at the beginning
- 00:39:59the file limits were wrong at the to
- 00:40:01begin with but post that we' have never
- 00:40:03AC never really come across any file
- 00:40:05limit issues uh we have I mean more than
- 00:40:09happy to admit it we have come across
- 00:40:10issues where the we have run out of
- 00:40:12integers for our
- 00:40:14ID because that's a number of columns we
- 00:40:17stored in one single go that also has
- 00:40:18happened so uh and then the import
- 00:40:21stopped then we had to do a ridiculous
- 00:40:23amount of things alter you know how much
- 00:40:24time would have altering the table would
- 00:40:26have taken but but no we didn't uh it
- 00:40:29was a it was a server configuration
- 00:40:31mistake that from our end but it was
- 00:40:33never the issue of post so I haven't
- 00:40:36come across it in my experience okay
- 00:40:38thank
- 00:40:41you so you said you hardly have any
- 00:40:44crashes or any know downtime so is it
- 00:40:47with some kind of a ha solution or it's
- 00:40:50just you know the instance doesn't crash
- 00:40:52what's the magic oh what's the m i mean
- 00:40:55I think the magic is by the Postgres
- 00:40:57developers no uh I think the reason we
- 00:41:00don't have a lot of Crash is we um we
- 00:41:05have ensured that all our apps are not
- 00:41:07sitting on top of massive databases
- 00:41:09they're always sitting on top of caching
- 00:41:10layers one uh you cannot ever ever ever
- 00:41:14scale an app on top of 10 20 terabytes
- 00:41:17of data and expect it to work without
- 00:41:18crashing it will crash if that happens
- 00:41:20it will overload and we have crashed our
- 00:41:22databases but the mistake was not of
- 00:41:24postgress that is wrong to expect that
- 00:41:26the mistake was that we thought our app
- 00:41:28can easily query that much data in this
- 00:41:30much amount of time and be fine with it
- 00:41:33it will never work as soon as we meet it
- 00:41:34asnc as as soon as we made it uh behind
- 00:41:37our caching layer it worked absolutely
- 00:41:39fine so it's uh again to there's the
- 00:41:42same answer it wasn't the issue of post
- 00:41:44it was our mistake that we had to
- 00:41:45rectify
- 00:41:51thanks okay so we'll take last questions
- 00:41:55after that you go offline questions
- 00:42:00uh this is regarding today's morning
- 00:42:02session right like kaas was addressing
- 00:42:05that uh before covid you could able to
- 00:42:07take uh 2 million request and during
- 00:42:11covid like you are able to scale up to 8
- 00:42:14million to 12 million uh without scaling
- 00:42:17your system how did that
- 00:42:19happen
- 00:42:22um okay um I'm going to S sound a little
- 00:42:25dumb here I guess but caching is a
- 00:42:27magical layer on top of everything I
- 00:42:29guess we were already ready to serve uh
- 00:42:32we did increase we did increase our
- 00:42:34primary DB servers uh the number of
- 00:42:36cores number of parallel workers that
- 00:42:38query the database all of those tuning
- 00:42:40had to change obviously now was it over
- 00:42:42provisioned uh no it was never
- 00:42:44over-provisioned it was always 1db so
- 00:42:46there is no over-provisioning 1 DB it's
- 00:42:47not like it was multi- sharded setup so
- 00:42:49it was 1db we added more cores to it the
- 00:42:52the jobber is a separate server that
- 00:42:54runs the the caching server that we call
- 00:42:56it right right so that was never
- 00:42:58over-provisioned that is still whatever
- 00:43:00we started with it's the exact same
- 00:43:02setup till now 16 CES 32 GB Ram still
- 00:43:04now and that's how we started three
- 00:43:06years back uh works fine um I don't know
- 00:43:10man the I guess that's how good the
- 00:43:12caching layer
- 00:43:13is uh you can say that probably we over
- 00:43:17proficient before that because when you
- 00:43:21we by default start with this 16 uh
- 00:43:23course 32 when you're dealing with a
- 00:43:25pogus DB because we are used to tuning
- 00:43:27it for that so we know the tuning
- 00:43:30parameters for those set of numbers so
- 00:43:32that's how we start off with that
- 00:43:33usually in that case maybe that's how we
- 00:43:35started here like that we thought that
- 00:43:36it would work fine have you ever
- 00:43:37forecasted that have you ever forecasted
- 00:43:40that load uh sorry I couldn't load load
- 00:43:42load tested uh yeah couple of times uh
- 00:43:45the maximum load that we have gone to uh
- 00:43:48was four or five uh and that's it it's
- 00:43:51never been more than that our post
- 00:43:53database has been overloaded multiple
- 00:43:54times and every single time it has been
- 00:43:57loaded has been our mistake where we
- 00:43:59have skipped the caching layer and hit
- 00:44:01the database directly and as I said that
- 00:44:03will never scale it doesn't matter if
- 00:44:04it's one terabyte or 500 GB it it will
- 00:44:06not work so we have every time we
- 00:44:09consciously write a new API endpoint we
- 00:44:11ensure that the first uh thing first
- 00:44:14Frontier has to be the caching layer on
- 00:44:16sitting on top of it and everything has
- 00:44:18to be async it cannot be concurrent uh
- 00:44:21it cannot be concurrent queries hitting
- 00:44:22the DB and uh an HTTP API endpoint
- 00:44:26waiting for the response to happen uh
- 00:44:28again that will not scale your app will
- 00:44:29go down for sure eventually everything
- 00:44:32will be in a wait-IO situation and
- 00:44:33nothing will work thank you
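
One recurring point in the talk is to push computations (sums, averages, window functions) into Postgres instead of the application. A small hedged example with an invented ledger table:

```sql
-- Let the database compute the running ledger balance rather than looping
-- over rows in application code.
SELECT entry_date,
       amount,
       SUM(amount) OVER (PARTITION BY user_id ORDER BY entry_date)
           AS running_balance
FROM ledger_entries
WHERE user_id = 'AB1234'
ORDER BY entry_date;
```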
- PostgreSQL
- Zerodha
- fintech
- database
- indexing
- sharding
- manual vacuum
- materialized views
- query performance
- caching layer