00:00:00
If you want to level up from a junior
00:00:02
developer to a senior engineer or land a
00:00:04
high paying job at a big tech company,
00:00:06
you need to learn system design. But
00:00:07
where do you start? To master system
00:00:09
design, you first need to understand the
00:00:11
core concepts and fundamental building
00:00:13
blocks that come up when designing real
00:00:15
world systems or tackling system design
00:00:17
interview questions. In this video, I
00:00:18
will break down the 30 most important
00:00:20
system design concepts you need to know.
00:00:22
Learning these concepts helped me land
00:00:24
high paying offers from multiple big
00:00:25
tech companies. And in my 8 years as a
00:00:27
software engineer, I've seen them used
00:00:29
repeatedly when building and scaling
00:00:31
large scale systems. Let's get started.
00:00:34
Almost every web application that you
00:00:35
use is built on this simple yet powerful
00:00:37
concept called client server
00:00:39
architecture. Here is how it works. On
00:00:41
one side, you have a client. This could
00:00:43
be a web browser, a mobile app or any
00:00:45
other frontend application. And on the
00:00:46
other side, you have a server, a machine
00:00:49
that runs continuously waiting to handle
00:00:51
incoming requests. The client sends a
00:00:53
request to store, retrieve or modify
00:00:55
data. The server receives the request,
00:00:58
processes it, performs the necessary
00:01:00
operations, and sends back a response.
00:01:02
This sounds simple, right? But there is
00:01:04
a big question. How does the client even
00:01:06
know where to find a server? A client
00:01:08
doesn't magically know where a server
00:01:10
is. It needs an address to locate and
00:01:12
communicate with it. On the internet,
00:01:14
computers identify each other using IP
00:01:16
addresses, which work like phone
00:01:18
numbers for servers. Every publicly
00:01:20
deployed server has a unique IP address.
00:01:22
Something like this. When a client wants
00:01:24
to interact with a service, it must send
00:01:26
requests to the correct IP address. But
00:01:28
there's a problem. When we visit a
00:01:30
website, we don't type its IP address.
00:01:32
We just enter the website name. Right?
00:01:33
Instead of relying on hard to remember
00:01:35
IP addresses, we use something much more
00:01:37
human friendly, domain names. But we
00:01:39
need a way to map a domain name to its
00:01:41
corresponding IP address. This is where
00:01:43
DNS, or Domain Name System, comes in. It
00:01:46
maps easy to remember domain names like
00:01:48
algomaster.io to their
00:01:49
corresponding IP addresses. When you
00:01:51
type algomaster.io into your
00:01:53
browser, your computer asks a DNS server
00:01:55
for the corresponding IP address. Once
00:01:57
the DNS server responds with the IP,
00:01:59
your browser uses it to establish a
00:02:01
connection with the server and make a
00:02:03
request. You can find the IP address of
00:02:05
any domain name using the ping command.
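If you'd rather see that lookup from code than from the terminal, here is a minimal sketch using Python's standard socket module; the domain is just an example, and the address you get back depends on the DNS response.

```python
import socket

# Ask DNS for the IP address behind a domain name (example domain).
ip = socket.gethostbyname("algomaster.io")
print(ip)  # the exact address depends on the DNS response you receive
```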
00:02:07
When you visit a website, your request
00:02:09
doesn't always go directly to the
00:02:10
server. Sometimes it passes through a
00:02:12
proxy or reverse proxy first. A proxy
00:02:14
server acts as a middleman between your
00:02:17
device and the internet. When you
00:02:18
request a web page, the proxy forwards
00:02:20
your request to the target server,
00:02:22
retrieves the response, and sends it
00:02:24
back to you. A proxy server hides your
00:02:26
IP address, keeping your location and
00:02:28
identity private. A reverse proxy works
00:02:30
the other way around. It intercepts the
00:02:32
client request and forwards them to the
00:02:34
back end server based on predefined
00:02:36
rules. Whenever a client communicates
00:02:38
with a server, there is always some
00:02:39
delay. One of the biggest causes of this
00:02:41
delay is physical distance. For example,
00:02:43
if our server is in New York, but a user
00:02:45
in India sends a request, the data has
00:02:48
to travel halfway across the world and
00:02:50
then the response has to make the same
00:02:52
long trip back. This roundtrip delay is
00:02:54
called latency. High latency can make
00:02:56
applications feel slow and unresponsive.
00:02:58
One way to reduce latency is by
00:03:00
deploying our service across multiple
00:03:02
data centers worldwide. This way, users
00:03:05
can connect to the nearest server
00:03:06
instead of waiting for data to travel
00:03:08
across the globe. Once a connection is
00:03:10
made, how do clients and servers
00:03:11
actually communicate? Every time you
00:03:13
visit a website, your browser and the
00:03:15
server communicate using a set of rules
00:03:17
called HTTP. That's why most URLs
00:03:20
start with HTTP or its secure version
00:03:22
HTTPS. The client sends a request to the
00:03:24
server. This request includes a header
00:03:26
containing details like the request
00:03:28
type, browser type, and cookies and
00:03:30
sometimes a request body which carries
00:03:32
additional data like form inputs. The
00:03:34
server processes the request and
00:03:36
responds with an HTTP response either
00:03:39
returning the requested data or an error
00:03:41
message if something goes wrong. HTTP
00:03:43
has a major security flaw. It sends data
00:03:45
in plain text. Modern websites use
00:03:47
HTTPS. HTTPS encrypts all data using SSL
00:03:51
or TLS protocol ensuring that even if
00:03:53
someone intercepts the request, they
00:03:55
can't read or alter it.
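To make the request/response exchange concrete, here is a minimal sketch using Python's standard http.client module; example.com is just a placeholder host.

```python
import http.client

# Open a connection and send one HTTPS GET request (the host is a placeholder).
conn = http.client.HTTPSConnection("example.com")
conn.request("GET", "/", headers={"Accept": "text/html"})  # request line plus headers

resp = conn.getresponse()
print(resp.status, resp.reason)   # e.g. 200 OK, or an error code if something went wrong
print(resp.getheaders()[:3])      # a few of the response headers
body = resp.read()                # the response body
conn.close()
```

But clients and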
00:03:57
servers don't directly exchange raw HTTP
00:03:59
requests and responses. HTTP is just a
00:04:01
protocol for transferring data, but it
00:04:03
doesn't define how requests should be
00:04:05
structured, what format responses should
00:04:08
be in, or how different clients should
00:04:10
interact with the server. This is where
00:04:12
APIs or application programming
00:04:14
interfaces come in. Think of an API as a
00:04:16
middleman that allows clients to
00:04:18
communicate with servers without
00:04:20
worrying about low-level details. A
00:04:22
client sends a request to an API. The
00:04:24
API hosted on a server processes the
00:04:26
request, interacts with databases or
00:04:28
other services, and prepares a response.
00:04:30
The API sends back the response in a
00:04:32
structured format, usually JSON or XML,
00:04:35
which the client understands and can
00:04:37
display. There are different API styles
00:04:39
to serve different needs. Two of the
00:04:41
most popular ones are REST and GraphQL.
00:04:43
Just a quick note to keep this video
00:04:45
concise, I'm covering these topics at a
00:04:47
high level, but if you want to go deeper
00:04:48
and learn these topics in more detail,
00:04:50
check out my blog at
00:04:52
blog.algomaster.io. Every week I
00:04:54
publish in-depth articles on complex
00:04:56
system design topics with clear
00:04:57
explanations and real world examples.
00:05:00
Make sure to subscribe so that you don't
00:05:02
miss my new articles. Among the
00:05:03
different API styles, REST is the most
00:05:05
widely used. A REST API follows a set of
00:05:08
rules that defines how clients and
00:05:10
servers communicate over HTTP in a
00:05:12
structured way. REST is stateless. Every
00:05:15
request is independent. Everything is
00:05:17
treated as a resource. For example,
00:05:18
users, orders, products. It uses
00:05:21
standard HTTP methods like GET to
00:05:23
retrieve data, POST to create new data,
00:05:25
PUT to update existing data, and DELETE
00:05:28
to remove data.
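As a rough illustration, here is how a client might call a hypothetical REST API for a users resource, using the third-party requests library; the base URL and endpoints are made up.

```python
import requests  # third-party HTTP client: pip install requests

BASE = "https://api.example.com"  # hypothetical REST API

requests.post(f"{BASE}/users", json={"name": "Alice"})       # POST: create a user
requests.get(f"{BASE}/users/42")                              # GET: retrieve user 42
requests.put(f"{BASE}/users/42", json={"name": "Alice B."})   # PUT: update user 42
requests.delete(f"{BASE}/users/42")                           # DELETE: remove user 42
```

REST APIs are great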
00:05:30
because they are simple, scalable, and
00:05:32
easy to cache, but they have limitations,
00:05:34
especially when dealing with complex
00:05:35
data retrieval. REST endpoints often
00:05:38
return more data than needed, leading to
00:05:40
inefficient network usage. To address
00:05:42
these challenges, GraphQL was introduced
00:05:44
in 2015 by Facebook. Unlike REST,
00:05:46
GraphQL lets clients ask for exactly what
00:05:49
they need. Nothing more, nothing less.
00:05:50
With a REST API, if you need a user's
00:05:53
profile along with their recent posts,
00:05:54
you might have to make multiple requests
00:05:56
to different endpoints. With GraphQL,
00:05:58
you can combine those requests into one
00:06:00
and fetch exactly the data you need in a
00:06:03
single query. The server responds with
00:06:05
only the requested fields.
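Here is a rough sketch of what that single query might look like, sent as an HTTP POST to a hypothetical GraphQL endpoint; the schema fields are made up for illustration.

```python
import requests  # third-party HTTP client: pip install requests

# One query asks for the user's name and their three most recent post titles.
query = """
{
  user(id: "42") {
    name
    posts(last: 3) { title }
  }
}
"""

resp = requests.post("https://api.example.com/graphql",  # hypothetical endpoint
                     json={"query": query})
print(resp.json())  # contains only the fields the client asked for
```

However,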
00:06:07
GraphQL also comes with trade-offs. It
00:06:09
requires more processing on the server
00:06:11
side, and it isn't as easy to cache as
00:06:14
REST. Now, when a client makes a request,
00:06:16
they usually want to store or retrieve
00:06:18
data. But this brings up another
00:06:19
question: where is the actual data
00:06:20
stored? If our application deals with
00:06:22
small amounts of data we could store it
00:06:25
as a variable or as a file and load it
00:06:27
in memory. But modern applications
00:06:29
handle massive volumes of data far more
00:06:31
than what memory can efficiently handle.
00:06:33
That's why we need a dedicated server
00:06:35
for storing and managing data. A
00:06:38
database. A database is the backbone of
00:06:40
any modern application. It ensures that
00:06:42
data is stored, retrieved and managed
00:06:44
efficiently while keeping it secure,
00:06:46
consistent and durable. When a client
00:06:48
requests to store or retrieve data, the
00:06:50
server communicates with the database,
00:06:52
fetches the required information and
00:06:54
returns it to the client. But not all
00:06:56
databases are the same. In system
00:06:58
design, we typically choose between SQL
00:07:00
and NoSQL databases. SQL databases store
00:07:03
data in tables with a strict predefined
00:07:05
schema, and they follow ACID properties.
00:07:08
Because of these guarantees, SQL
00:07:09
databases are ideal for applications
00:07:11
that require strong consistency and
00:07:13
structured relationships such as banking
00:07:15
systems. NoSQL databases on the other
00:07:17
hand are designed for high scalability
00:07:19
and performance. They don't require a
00:07:20
fixed schema and use different data
00:07:22
models including key value stores,
00:07:25
document stores, graph databases, and
00:07:27
wide column stores which are optimized
00:07:29
for large scale distributed data. So,
00:07:31
which one should you use? If you need
00:07:33
structured relational data with strong
00:07:35
consistency, SQL is the better choice.
00:07:38
If you need high scalability and a flexible
00:07:39
schema, NoSQL is the better choice. Many
00:07:42
modern applications use both SQL and
00:07:43
NoSQL together. As our user base grows,
00:07:46
so does the number of requests hitting
00:07:47
our application servers. One of the
00:07:49
quickest solutions is to upgrade the
00:07:51
existing server by adding more CPU, RAM
00:07:53
or storage. This approach is called
00:07:55
vertical scaling or scaling up which
00:07:57
makes a single machine more powerful.
00:07:59
But there are some major limitations
00:08:01
with this approach. You can't keep
00:08:02
upgrading a server forever. Every
00:08:04
machine has a maximum capacity. More
00:08:06
powerful servers become exponentially
00:08:08
more expensive. If this one server
00:08:10
crashes, the entire system goes down.
00:08:12
So, while vertical scaling is a quick
00:08:13
fix, it's not a long-term solution for
00:08:15
handling high traffic and ensuring
00:08:17
system reliability. Let's look at a
00:08:19
better approach, one that makes our
00:08:21
system more scalable and fault tolerant.
00:08:23
Instead of upgrading a single server,
00:08:25
what if we add more servers to share the
00:08:27
load? This approach is called horizontal
00:08:29
scaling or scaling out where we
00:08:31
distribute the workload across multiple
00:08:32
machines. More servers equals more
00:08:34
capacity which means the system can
00:08:36
handle increasing traffic more
00:08:37
effectively. If one server goes down,
00:08:39
others can take over which improves
00:08:41
reliability. But horizontal scaling
00:08:44
introduces a new challenge. How do
00:08:46
clients know which server to connect to?
00:08:48
This is where a load balancer comes in.
00:08:50
A load balancer sits between clients and
00:08:52
backend servers acting as a traffic
00:08:54
manager that distributes requests across
00:08:56
multiple servers. If one server crashes,
00:08:58
the load balancer automatically
00:08:59
redirects traffic to another healthy
00:09:01
server. But how does a load balancer
00:09:03
decide which server should handle the
00:09:05
next request? It uses a load balancing
00:09:06
algorithm such as round robin, least
00:09:09
connections, and IP hashing.
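Round robin is the simplest of these: hand each new request to the next server in the pool. Here is a minimal sketch; the server addresses are made up.

```python
from itertools import cycle

# Rotate through a pool of backend servers, one request at a time.
servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
pool = cycle(servers)

def route_request(request_id: int) -> str:
    server = next(pool)                      # next server in round-robin order
    print(f"request {request_id} -> {server}")
    return server

for i in range(6):
    route_request(i)   # requests spread evenly: .1, .2, .3, .1, .2, .3
```

Least connections instead picks the server with the fewest active connections, and IP hashing keeps a given client pinned to the same server. So far we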
00:09:11
have talked about scaling our
00:09:12
application servers. But as traffic
00:09:14
grows, the volume of data also
00:09:16
increases. At first we can scale a
00:09:18
database vertically by adding more CPU,
00:09:20
RAM and storage similar to application
00:09:22
servers. But there is a limit to how
00:09:24
much a single machine can handle. So
00:09:26
let's explore other database scaling
00:09:28
techniques that can help manage large
00:09:29
volumes of data efficiently. One of the
00:09:32
quickest and most effective ways to
00:09:33
speed up database read queries is
00:09:35
indexing. Think of it like the index
00:09:37
page at the back of a book. Instead of
00:09:39
flipping through every page, you jump
00:09:40
directly to the relevant section. A
00:09:42
database index works the same way. It's
00:09:44
a super efficient lookup table that
00:09:46
helps the database quickly locate the
00:09:47
required data without scanning the
00:09:49
entire table. An index stores column
00:09:51
values along with pointers to actual
00:09:53
data rows in the table. Indexes are
00:09:55
typically created on columns that are
00:09:57
frequently queried such as primary keys,
00:09:59
foreign keys, and columns frequently
00:10:01
used in WHERE clauses. While indexes
00:10:04
speed up reads, they slow down writes
00:10:06
since the index needs to be updated
00:10:08
whenever data changes. That's why we
00:10:10
should only index the most frequently
00:10:11
accessed columns.
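Here is a small sketch using Python's built-in sqlite3 module; the table and column names are made up, but the idea carries over to any relational database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany("INSERT INTO users (email, name) VALUES (?, ?)",
                 [(f"user{i}@example.com", f"User {i}") for i in range(10_000)])

# Index the column we filter on most often, so lookups stop scanning the whole table.
conn.execute("CREATE INDEX idx_users_email ON users (email)")

plan = conn.execute("EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = ?",
                    ("user42@example.com",)).fetchall()
print(plan)  # the plan should mention idx_users_email rather than a full table scan
```

Indexing can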
00:10:13
significantly improve read performance.
00:10:15
But what if even indexing isn't enough
00:10:17
and our single database server can't
00:10:19
handle the growing number of read
00:10:21
requests? That's where our next database
00:10:23
scaling technique, replication, comes
00:10:24
in. Just like we added more application
00:10:26
servers to handle increasing traffic, we
00:10:28
can scale our database by creating
00:10:30
copies of it across multiple servers.
00:10:32
Here is how it works. We have one
00:10:34
primary database also called the primary
00:10:36
replica that handles all write
00:10:37
operations. We have multiple read
00:10:39
replicas that handle read queries.
00:10:41
Whenever data is written to the primary
00:10:43
database, it gets copied to the read
00:10:45
replicas so that they stay in sync.
00:10:47
Replication improves the read
00:10:48
performance since read requests are
00:10:50
spread across multiple replicas reducing
00:10:52
the load on each one. This also improves
00:10:54
availability since if the primary
00:10:56
replica fails, a read replica can take
00:10:58
over as the new primary. Replication is
00:11:00
great for scaling read heavy
00:11:02
applications. But what if we need to
00:11:04
scale write operations or store huge
00:11:05
amounts of data? Let's say our service
00:11:07
became popular. It now has millions of
00:11:09
users and our database has grown to
00:11:11
terabytes of data. A single database
00:11:13
server will eventually struggle to
00:11:15
handle all this data efficiently.
00:11:16
Instead of keeping everything in one
00:11:18
place, we split the database into
00:11:20
smaller, more manageable pieces and
00:11:22
distribute them across multiple servers.
00:11:24
This technique is called sharding. Here
00:11:26
is how it works. We divide the database
00:11:28
into smaller parts called shards. Each
00:11:30
shard contains a subset of the total
00:11:32
data. Data is distributed based on the
00:11:34
sharding key, for example, the user ID. By
00:11:36
distributing data this way, we reduce
00:11:39
database load since each shard handles only
00:11:41
a portion of queries, and speeds up read
00:11:43
and write performance since queries are
00:11:45
distributed across multiple shards instead
00:11:47
of hitting a single database. Sharding is
00:11:49
also referred to as horizontal
00:11:50
partitioning since it splits data by
00:11:53
rows.
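A common way to pick a shard is to hash the shard key and take it modulo the number of shards. Here is a minimal sketch with four shards and made-up user IDs.

```python
import hashlib

NUM_SHARDS = 4

def shard_for(user_id: str) -> int:
    """Hash the shard key (user ID) to decide which shard holds this user's data."""
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# The same user always lands on the same shard, so reads find what writes stored.
for uid in ["user-1", "user-2", "user-3", "user-42"]:
    print(uid, "-> shard", shard_for(uid))
```

One caveat: with plain modulo hashing, changing the number of shards remaps most keys, which is why many systems use consistent hashing instead. But what if the issue isn't the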
00:11:55
number of rows but rather the number of
00:11:57
columns? In such cases, we use vertical
00:11:59
partitioning where we split the database
00:12:01
by columns. Imagine we have a user table
00:12:04
that stores profile details, login
00:12:06
history, and billing information. As
00:12:09
this table grows, queries become slower
00:12:11
because queries must scan many columns
00:12:13
even when a request only needs a few
00:12:15
specific fields. To optimize this, we
00:12:17
use vertical partitioning where we split
00:12:19
user table into smaller, more focused
00:12:21
tables based on usage patterns. This
00:12:23
improves query performance since each
00:12:25
request only scans relevant columns
00:12:27
instead of the entire table. It also
00:12:29
reduces unnecessary disk I/O, making data
00:12:31
retrieval quicker. However, no matter
00:12:33
how much we optimize the database,
00:12:35
retrieving data from disk is always
00:12:36
slower than retrieving it from memory.
00:12:38
What if we could store frequently accessed
00:12:40
data in memory? This is called caching.
00:12:42
Caching is used to optimize the
00:12:43
performance of a system by storing
00:12:45
frequently accessed data in memory instead
00:12:47
of repeatedly fetching it from the database.
00:12:49
One of the most common caching
00:12:50
strategies is the cache-aside pattern.
00:12:52
Here is how it works. When a user
00:12:54
requests the data, the application first
00:12:56
checks the cache. If the data is in the
00:12:58
cache, it's returned instantly avoiding
00:13:00
a database call. If the data is not in
00:13:02
the cache, the application retrieves it from
00:13:04
the database. It stores it in the cache for
00:13:07
future requests and returns it to the
00:13:08
user. Next time the same data is
00:13:10
requested, it's served directly from the cache,
00:13:13
making the request much faster. To
00:13:15
prevent outdated data from being served,
00:13:17
we use a time-to-live value, or TTL.
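Here is a minimal in-process sketch of the cache-aside pattern with a TTL; a real system would typically use a dedicated cache like Redis or Memcached, and the database call here is only a stand-in.

```python
import time

cache = {}          # key -> (expires_at, value)
TTL_SECONDS = 60

def fetch_user_from_db(user_id):
    return {"id": user_id, "name": "Alice"}   # stand-in for a real database query

def get_user(user_id):
    entry = cache.get(user_id)
    if entry and entry[0] > time.time():       # cache hit, not yet expired
        return entry[1]
    user = fetch_user_from_db(user_id)          # cache miss: read from the database
    cache[user_id] = (time.time() + TTL_SECONDS, user)   # store for future requests
    return user

print(get_user("42"))   # first call goes to the database
print(get_user("42"))   # second call is served straight from the cache
```

Let's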
00:13:19
look at the next database scaling technique.
00:13:21
Most relational databases use
00:13:23
normalization to store data efficiently
00:13:25
by breaking it into separate tables.
00:13:27
While this reduces redundancy, it also
00:13:29
introduces joins. When retrieving data
00:13:31
from multiple tables, the database must
00:13:33
combine them using join operations,
00:13:35
which can slow down queries as the data
00:13:37
set grows. Denormalization reduces the
00:13:39
number of joins by combining related
00:13:41
data into a single table, even if it
00:13:43
means some data gets duplicated. For
00:13:45
example, instead of keeping users and
00:13:47
orders in separate tables, we create a
00:13:49
user orders table that stores user
00:13:51
details along with the latest orders.
00:13:53
Now, when retrieving a user's order
00:13:55
history, we don't need a join operation.
00:13:57
The data is already stored together
00:13:58
leading to faster queries and better
00:14:00
read performance. Denormalization is
00:14:02
often used in read heavy applications
00:14:04
where read speed is critical. But the
00:14:06
downside is it leads to increased
00:14:08
storage and more complex updates.
00:14:11
As we scale our system across multiple
00:14:13
servers, databases and data centers, we
00:14:15
enter the world of distributed systems.
00:14:18
One of the fundamental principles of
00:14:19
distributed systems is the CAP theorem,
00:14:21
which states that no distributed system
00:14:23
can achieve all three of the following
00:14:25
at the same time. Consistency,
00:14:27
availability, and partition tolerance.
00:14:29
Since network failures are inevitable,
00:14:31
we must choose between consistency plus
00:14:33
partition tolerance or availability plus
00:14:36
partition tolerance. If you want to
00:14:38
learn about the CAP theorem in more detail,
00:14:39
you can check out this article on my
00:14:41
blog called CAP Theorem Explained. Most
00:14:42
modern applications don't just store
00:14:44
text records. They also need to handle
00:14:46
images, videos, PDFs, and other large
00:14:49
files. Traditional databases are not
00:14:50
designed to store large unstructured
00:14:52
files efficiently. So, what's the
00:14:54
solution? We use blob storage like
00:14:56
Amazon S3. Blobs are individual
00:14:58
files like images, videos, or documents.
00:15:01
These blobs are stored inside logical
00:15:03
containers or buckets in the cloud. Each
00:15:05
file gets a unique URL making it easy to
00:15:07
retrieve and serve over the web. There
00:15:09
are several advantages to using blob
00:15:11
storage, like scalability, a pay-as-you-go
00:15:14
model, automatic replication, and easy
00:15:16
access.
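For example, uploading a file to Amazon S3 with the boto3 SDK looks roughly like this; the bucket name is made up, and the sketch assumes AWS credentials are already configured.

```python
import boto3  # AWS SDK for Python: pip install boto3

s3 = boto3.client("s3")

# Upload a local file into a bucket under a key; the object can then be served by URL.
s3.upload_file("video.mp4", "my-media-bucket", "videos/video.mp4")
# e.g. https://my-media-bucket.s3.amazonaws.com/videos/video.mp4
```

A common use case is to stream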
00:15:18
audio or video files to user
00:15:20
applications in real time. But streaming
00:15:22
the video file directly from blob
00:15:24
storage can be slow, especially if the
00:15:26
data is stored in a distant location.
00:15:28
For example, imagine you are in India
00:15:30
trying to watch a YouTube video that's
00:15:32
hosted on a server in California. Since
00:15:34
the video data has to travel across the
00:15:36
world, this could lead to buffering and
00:15:38
slow load times. A content delivery
00:15:40
network or CDN solves this problem by
00:15:42
delivering content faster to users based
00:15:44
on their location. A CDN is a global
00:15:46
network of distributed servers that work
00:15:48
together to deliver web content like
00:15:50
HTML pages, JavaScript files, images,
00:15:53
and videos to users based on their
00:15:55
geographic location. Since content is
00:15:56
served from the closest CDN server,
00:15:58
users experience faster load times with
00:16:00
minimal buffering. Let's move to the
00:16:02
next system design concept which can
00:16:04
help us build realtime applications.
00:16:06
Most web applications use HTTP which
00:16:08
follows a request-response model. The
00:16:11
client sends a request. The server
00:16:12
processes the request and sends a
00:16:14
response. If the client needs new data,
00:16:16
it must send another request. This works
00:16:18
fine for static web pages but it's too
00:16:21
slow and inefficient for real-time
00:16:22
applications like live chat
00:16:24
applications, stock market dashboards or
00:16:27
online multiplayer games. With HTTP, the
00:16:29
only way to get real-time updates is
00:16:31
through frequent polling, sending
00:16:33
repeated requests every few seconds. But
00:16:35
polling is inefficient because it
00:16:37
increases the server load and wastes
00:16:37
bandwidth, as most responses are empty
00:16:39
when there is no new data. WebSockets
00:16:43
solve this problem by allowing
00:16:44
continuous two-way communication between
00:16:46
the client and the server over a single
00:16:48
persistent connection. The client
00:16:50
initiates a WebSocket connection with
00:16:52
the server. Once established, the
00:16:54
connection remains open. The server can
00:16:55
push updates to the client at any time
00:16:58
without waiting for a request. The
00:16:59
client can also send messages instantly
00:17:01
to the server. This enables real-time
00:17:03
interactions and eliminates the need for
00:17:05
polling.
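On the client side, that persistent two-way connection might look like this sketch, using the third-party websockets library; the URL is a placeholder.

```python
import asyncio
import websockets  # third-party library: pip install websockets

async def main():
    # One long-lived connection: the client can send and the server can push anytime.
    async with websockets.connect("wss://example.com/chat") as ws:
        await ws.send("hello from the client")
        async for message in ws:             # messages pushed by the server
            print("server pushed:", message)

asyncio.run(main())
```

WebSockets enable real-time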
00:17:07
communication between a client and a
00:17:08
server. But what if a server needs to
00:17:10
notify another server when an event
00:17:12
occurs? For example, when a user makes a
00:17:15
payment, the payment gateway needs to
00:17:16
notify your application instantly.
00:17:19
Instead of constantly polling an API to
00:17:21
check if an event has occurred,
00:17:23
webhooks allow a server to send an HTTP
00:17:25
request to another server as soon as the
00:17:27
event occurs. Here is how it works. The
00:17:29
receiver, for example, your app
00:17:31
registers a webhook URL with the
00:17:33
provider. When an event occurs, the
00:17:35
provider sends an HTTP POST request to
00:17:37
the webhook URL with event details.
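On the receiving side, a webhook endpoint is just an HTTP handler that accepts the provider's POST. Here is a minimal sketch with the Flask microframework; the route and event fields are made up.

```python
from flask import Flask, request  # pip install flask

app = Flask(__name__)

@app.post("/webhooks/payments")        # the webhook URL registered with the provider
def handle_payment_event():
    event = request.get_json()          # event details sent by the provider
    print("received event:", event.get("type"))
    return "", 204                      # acknowledge quickly; do heavy work asynchronously

if __name__ == "__main__":
    app.run(port=8000)
```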
00:17:40
This saves server resources and reduces
00:17:42
unnecessary API calls. Traditionally
00:17:44
applications were built using a
00:17:45
monolithic architecture where all
00:17:47
features are inside one large codebase.
00:17:50
This setup works fine for small
00:17:52
applications but for large scale systems
00:17:54
monoliths become hard to manage, scale
00:17:56
and deploy. The solution is to break
00:17:58
down your application into smaller
00:18:00
independent services called
00:18:01
microservices that work together. Each
00:18:04
microservice handles a single responsibility,
00:18:06
has its own database and logic so it can
00:18:09
scale independently, and communicates with
00:18:11
other microservices using APIs or
00:18:13
message queues. This way, services can be
00:18:15
scaled and deployed individually without
00:18:16
affecting the entire system. However,
00:18:18
when multiple microservices need to
00:18:20
communicate, direct API calls aren't
00:18:22
always efficient. This is where message
00:18:24
queues come in. Synchronous communication,
00:18:26
for example, waiting for immediate
00:18:28
responses doesn't scale well. A message
00:18:30
queue enables services to communicate
00:18:32
asynchronously, allowing requests to be
00:18:34
processed without blocking other
00:18:35
operations. Here is how it works. There
00:18:37
is a producer which places a message in
00:18:39
the queue. The queue temporarily holds
00:18:41
the message. The consumer retrieves the
00:18:43
message and processes it.
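Here is a tiny in-process sketch of that producer/queue/consumer flow using Python's standard queue and threading modules; production systems would use a broker like RabbitMQ, Kafka, or SQS instead.

```python
import queue
import threading

q = queue.Queue()   # the queue temporarily holds messages between producer and consumer

def producer():
    for i in range(5):
        q.put(f"order-{i}")     # place a message and move on without waiting
    q.put(None)                 # signal that we're done

def consumer():
    while (msg := q.get()) is not None:
        print("processing", msg)   # handled asynchronously, at the consumer's own pace

threading.Thread(target=producer).start()
threading.Thread(target=consumer).start()
```

Using message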
00:18:45
queues, we can decouple services and
00:18:47
improve scalability, and we can
00:18:49
prevent overload on internal services
00:18:51
within our system. But how do we prevent
00:18:53
overload for the public APIs and
00:18:55
services that we deploy? For that we use
00:18:57
rate limiting. Imagine a bot starts
00:18:59
making thousands of requests per second
00:19:01
to your website. Without restrictions,
00:19:03
this could crash your servers by
00:19:04
consuming all available resources and
00:19:06
degrade performance for legitimate
00:19:08
users. Rate limiting restricts the
00:19:10
number of requests a client can send
00:19:11
within a specific time frame. Every user
00:19:14
or IP address is assigned a request
00:19:16
quota, for example, 100 requests per
00:19:18
minute. If they exceed this limit, the
00:19:20
server blocks additional requests
00:19:21
temporarily and returns an error. There
00:19:23
are various rate limiting algorithms.
00:19:25
Some of the popular ones are fixed window,
00:19:27
sliding window, and token bucket.
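As an illustration, here is a minimal fixed window rate limiter; the limit, window length, and client IP are arbitrary example values.

```python
import time
from collections import defaultdict

LIMIT = 100      # requests allowed per client per window
WINDOW = 60      # window length in seconds
counters = defaultdict(int)   # (client_id, window_number) -> requests seen

def allow_request(client_id):
    window_number = int(time.time() // WINDOW)
    counters[(client_id, window_number)] += 1
    return counters[(client_id, window_number)] <= LIMIT  # False => reject (e.g. HTTP 429)

print(allow_request("203.0.113.7"))   # True until this client exceeds 100 in the current minute
```

We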
00:19:29
don't need to implement our own rate
00:19:31
limiting system. This can be handled by
00:19:32
something called an API gateway. An API
00:19:34
gateway is a centralized service that
00:19:36
handles authentication, rate limiting,
00:19:38
logging, monitoring, request routing,
00:19:40
and much more. Imagine a
00:19:41
microservices-based application with
00:19:43
multiple services. Instead of exposing
00:19:45
each service directly, an API gateway
00:19:47
acts as a single entry point for all
00:19:49
client requests. It routes each request to
00:19:51
the appropriate microservice, and the
00:19:53
response is sent back through the
00:19:54
gateway to the client. An API gateway
00:19:56
simplifies API management and improves
00:19:58
scalability and security. In
00:20:00
distributed systems, network failures
00:20:02
and service retries are common. If a
00:20:04
user accidentally refreshes a payment
00:20:06
page, the system might receive two
00:20:07
payment requests instead of one.
00:20:09
Idempotency ensures that repeated requests
00:20:11
produce the same result as if the
00:20:13
request was made only once. Here is how
00:20:15
it works. Each request is assigned a
00:20:17
unique ID. Before processing, the system
00:20:19
checks if the request has already been
00:20:21
handled. If yes, it ignores the
00:20:23
duplicate request. If no, it processes
00:20:25
the request normally.
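Here is a minimal sketch of that idea, with an in-memory store of idempotency keys; a real system would keep these in a database or cache, and the payment logic is only a stand-in.

```python
processed = {}   # idempotency_key -> result of the first successful attempt

def charge(idempotency_key, amount_cents):
    if idempotency_key in processed:            # duplicate request: reuse the saved result
        return processed[idempotency_key]
    result = f"charged {amount_cents} cents"    # stand-in for the real payment logic
    processed[idempotency_key] = result
    return result

print(charge("key-123", 500))   # processes the payment
print(charge("key-123", 500))   # retry with the same key: no double charge
```

If you enjoyed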
00:20:27
this video, I think you will love my
00:20:28
weekly newsletter where I dive deeper
00:20:30
into system design concepts with real
00:20:32
world examples. I also share articles on
00:20:34
system design interview questions and
00:20:36
tips to help you prepare for interviews.
00:20:38
You can subscribe at
00:20:39
blog.algomaster.io. Thanks for watching
00:20:41
and I will see you in the next one.