System Design was HARD until I Learned these 30 Concepts

00:20:43
https://www.youtube.com/watch?v=s9Qh9fWeOAk

Summary

TL;DR: This video outlines 30 key system design concepts essential for developers looking to advance their careers. It begins with the client-server architecture, explaining how clients communicate with servers using IP addresses and DNS. The video then delves into HTTP/HTTPS protocols, the role of APIs, and compares REST and GraphQL. It discusses database types (SQL vs. NoSQL) and scaling techniques (vertical vs. horizontal), emphasizing the importance of load balancing, caching, indexing, replication, and sharding. The presenter also covers advanced topics like the CAP theorem, API gateways, rate limiting, and idempotency, providing insights from personal experience in software engineering.

Takeaways

  • 💻 Understand client-server architecture for web applications.
  • 🌐 Learn how DNS maps domain names to IP addresses.
  • 🔒 Use HTTPS for secure data transmission.
  • 📡 APIs facilitate communication between clients and servers.
  • 📊 REST vs. GraphQL: Choose based on data needs.
  • 🗄️ SQL vs. NoSQL: Select based on data structure and scalability.
  • 📈 Horizontal scaling improves system reliability.
  • ⚖️ Load balancers distribute traffic across servers.
  • 📚 Caching speeds up data retrieval from databases.
  • 🔄 Idempotency ensures consistent request handling.

Timeline

  • 00:00:00 - 00:05:00

    The video introduces the importance of system design for developers aiming to advance their careers. It outlines the core concepts necessary for mastering system design, emphasizing the client-server architecture as a foundational element. The client sends requests to a server, which processes them and returns responses. The video explains how clients locate servers using IP addresses and domain names, facilitated by the DNS system. It also discusses the role of proxy servers in managing requests and the impact of latency on application performance.

  • 00:05:00 - 00:10:00

    The video continues by discussing HTTP and HTTPS protocols for client-server communication, highlighting the importance of APIs in structuring requests and responses. It introduces REST and GraphQL as popular API styles, explaining their differences and use cases. The video then transitions to data storage, comparing SQL and NoSQL databases, and discussing their respective advantages for different application needs. It emphasizes the importance of choosing the right database type based on the application's requirements for consistency and scalability.

  • 00:10:00 - 00:15:00

    As user traffic increases, the video explores scaling strategies for application servers, contrasting vertical scaling with horizontal scaling. It introduces load balancers as a solution for distributing requests across multiple servers, enhancing reliability and performance. The video also addresses database scaling techniques, including indexing, replication, and sharding, to manage large volumes of data efficiently. It explains how these techniques improve read and write performance while maintaining data availability.

  • 00:15:00 - 00:20:43

    The final segment covers advanced concepts such as caching, denormalization, and the CAP theorem in distributed systems. It discusses the use of blob storage for unstructured data and the role of CDNs in optimizing content delivery. The video concludes with an overview of real-time communication using WebSockets, the benefits of microservices architecture, and the importance of rate limiting and API gateways in managing public APIs. It emphasizes the need for idempotency in handling duplicate requests and invites viewers to subscribe for more insights on system design.


Video Q&A

  • What is client-server architecture?

    Client-server architecture is a model where a client (like a web browser) requests services from a server that processes these requests and sends back responses.

  • What is DNS?

    DNS, or Domain Name System, translates human-friendly domain names into IP addresses that computers use to identify each other on the internet.

  • What is the difference between REST and GraphQL?

    REST APIs follow a set of rules for structured communication, while GraphQL allows clients to request exactly the data they need, reducing over-fetching.

  • What are SQL and NoSQL databases?

    SQL databases use structured tables and schemas, while NoSQL databases offer flexibility with various data models and are designed for high scalability.

  • What is horizontal scaling?

    Horizontal scaling involves adding more servers to distribute the load, improving capacity and reliability, as opposed to vertical scaling which upgrades a single server.

  • What is caching?

    Caching stores frequently accessed data in memory to speed up data retrieval and reduce database load.

  • What is the CAP theorem?

    The CAP theorem states that in a distributed system, you can only achieve two out of three guarantees: consistency, availability, and partition tolerance.

  • What is an API gateway?

    An API gateway is a centralized service that manages API requests, handling tasks like authentication, rate limiting, and request routing.

  • What is rate limiting?

    Rate limiting restricts the number of requests a client can make to a server within a specific time frame to prevent overload.

  • What is idempotency?

    Idempotency ensures that repeated requests produce the same result as a single request, preventing duplicate processing.

Transcript (en)

  • 00:00:00

    If you want to level up from a junior developer to a senior engineer or land a high-paying job at a big tech company, you need to learn system design. But where do you start? To master system design, you first need to understand the core concepts and fundamental building blocks that come up when designing real-world systems or tackling system design interview questions. In this video, I will break down the 30 most important system design concepts you need to know. Learning these concepts helped me land high-paying offers from multiple big tech companies, and in my 8 years as a software engineer, I've seen them used repeatedly when building and scaling large-scale systems. Let's get started.

  • 00:00:34

    Almost every web application that you use is built on this simple yet powerful concept called client-server architecture. Here is how it works. On one side, you have a client. This could be a web browser, a mobile app, or any other frontend application. On the other side, you have a server, a machine that runs continuously, waiting to handle incoming requests. The client sends a request to store, retrieve, or modify data. The server receives the request, processes it, performs the necessary operations, and sends back a response. This sounds simple, right? But there is a big question: how does the client even know where to find a server? A client doesn't magically know where a server is. It needs an address to locate and communicate with it. On the internet, computers identify each other using IP addresses, which work like phone numbers for servers. Every publicly deployed server has a unique IP address. When a client wants to interact with a service, it must send requests to the correct IP address. But there's a problem: when we visit a website, we don't type its IP address. We just enter the website name, right? Instead of relying on hard-to-remember IP addresses, we use something much more human-friendly: domain names. But we need a way to map a domain name to its corresponding IP address. This is where DNS, or Domain Name System, comes in. It maps easy-to-remember domain names like algomaster.io to their corresponding IP addresses. When you type algomaster.io into your browser, your computer asks a DNS server for the corresponding IP address. Once the DNS server responds with the IP, your browser uses it to establish a connection with the server and make a request. You can find the IP address of any domain name using the ping command.

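To make this concrete, here is a minimal sketch in Python (assuming network access) that resolves a domain name to an IP address via the system's DNS resolver, much like ping does before sending any packets:

```python
import socket

# Ask DNS for the IPv4 address behind a human-friendly domain name.
ip = socket.gethostbyname("algomaster.io")
print(ip)  # prints the server's IPv4 address (the actual value may vary)
```
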
  • 00:02:07

    When you visit a website, your request doesn't always go directly to the server. Sometimes it passes through a proxy or reverse proxy first. A proxy server acts as a middleman between your device and the internet. When you request a web page, the proxy forwards your request to the target server, retrieves the response, and sends it back to you. A proxy server hides your IP address, keeping your location and identity private. A reverse proxy works the other way around: it intercepts client requests and forwards them to the backend servers based on predefined rules. Whenever a client communicates with a server, there is always some delay. One of the biggest causes of this delay is physical distance. For example, if our server is in New York but a user in India sends a request, the data has to travel halfway across the world, and then the response has to make the same long trip back. This round-trip delay is called latency. High latency can make applications feel slow and unresponsive. One way to reduce latency is by deploying our service across multiple data centers worldwide. This way, users can connect to the nearest server instead of waiting for data to travel across the globe.

  • 00:03:10

    Once a connection is made, how do clients and servers actually communicate? Every time you visit a website, your browser and the server communicate using a set of rules called HTTP. That's why most URLs start with HTTP, or its secure version, HTTPS. The client sends a request to the server. This request includes a header containing details like the request type, browser type, and cookies, and sometimes a request body, which carries additional data like form inputs. The server processes the request and responds with an HTTP response, either returning the requested data or an error message if something goes wrong. HTTP has a major security flaw: it sends data in plain text. Modern websites use HTTPS. HTTPS encrypts all data using the SSL or TLS protocol, ensuring that even if someone intercepts the request, they can't read or alter it.

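A minimal sketch of this request-response exchange, using only Python's standard library (the User-Agent string is a hypothetical client name):

```python
import urllib.request

# Build an HTTPS GET request; headers carry metadata about the request.
req = urllib.request.Request(
    "https://algomaster.io",
    headers={"User-Agent": "example-client/1.0"},  # hypothetical client name
)
with urllib.request.urlopen(req) as resp:
    print(resp.status)                    # e.g. 200 on success
    print(resp.headers["Content-Type"])   # format of the response body
    body = resp.read()                    # the requested data itself
```
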
  • 00:03:57

    But clients and servers don't directly exchange raw HTTP requests and responses. HTTP is just a protocol for transferring data; it doesn't define how requests should be structured, what format responses should be in, or how different clients should interact with the server. This is where APIs, or application programming interfaces, come in. Think of an API as a middleman that allows clients to communicate with servers without worrying about low-level details. A client sends a request to an API. The API, hosted on a server, processes the request, interacts with databases or other services, and prepares a response. The API sends back the response in a structured format, usually JSON or XML, which the client understands and can display. There are different API styles to serve different needs. Two of the most popular ones are REST and GraphQL. Just a quick note: to keep this video concise, I'm covering these topics at a high level, but if you want to go deeper and learn these topics in more detail, check out my blog at blog.algomaster.io. Every week I publish in-depth articles on complex system design topics with clear explanations and real-world examples. Make sure to subscribe so that you don't miss my new articles. Among the different API styles, REST is the most widely used. A REST API follows a set of rules that defines how clients and servers communicate over HTTP in a structured way. REST is stateless: every request is independent. Everything is treated as a resource, for example users, orders, and products. It uses standard HTTP methods: GET to retrieve data, POST to create new data, PUT to update existing data, and DELETE to remove data. REST APIs are great because they are simple, scalable, and easy to cache, but they have limitations, especially when dealing with complex data retrieval. REST endpoints often return more data than needed, leading to inefficient network usage. To address these challenges, GraphQL was introduced in 2015 by Facebook. Unlike REST, GraphQL lets clients ask for exactly what they need, nothing more, nothing less. With a REST API, if you need a user's profile along with their recent posts, you might have to make multiple requests to different endpoints. With GraphQL, you can combine those requests into one and fetch exactly the data you need in a single query, as the sketch below illustrates. The server responds with only the requested fields. However, GraphQL also comes with trade-offs: it requires more processing on the server side, and it isn't as easy to cache as REST.

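Here is a hedged sketch contrasting the two styles; the endpoints, fields, and user ID are hypothetical, not from the video:

```python
import urllib.request, json

BASE = "https://api.example.com"  # hypothetical service

# REST: two round trips to two resource endpoints, each returning full objects.
profile = json.load(urllib.request.urlopen(f"{BASE}/users/42"))
posts = json.load(urllib.request.urlopen(f"{BASE}/users/42/posts"))

# GraphQL: one request that names exactly the fields the client needs.
query = """
{
  user(id: 42) {
    name
    posts(last: 3) { title }
  }
}
"""
req = urllib.request.Request(
    f"{BASE}/graphql",
    data=json.dumps({"query": query}).encode(),
    headers={"Content-Type": "application/json"},
)
result = json.load(urllib.request.urlopen(req))  # only the requested fields
```
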
  • 00:06:16

    Now, when a client makes a request, they usually want to store or retrieve data. But this brings up another question: where is the actual data stored? If our application deals with small amounts of data, we could store it as a variable or as a file and load it in memory. But modern applications handle massive volumes of data, far more than what memory can efficiently handle. That's why we need a dedicated server for storing and managing data: a database. A database is the backbone of any modern application. It ensures that data is stored, retrieved, and managed efficiently while keeping it secure, consistent, and durable. When a client requests to store or retrieve data, the server communicates with the database, fetches the required information, and returns it to the client. But not all databases are the same. In system design, we typically choose between SQL and NoSQL databases. SQL databases store data in tables with a strict predefined schema, and they follow ACID properties. Because of these guarantees, SQL databases are ideal for applications that require strong consistency and structured relationships, such as banking systems. NoSQL databases, on the other hand, are designed for high scalability and performance. They don't require a fixed schema and use different data models, including key-value stores, document stores, graph databases, and wide-column stores, which are optimized for large-scale distributed data. So, which one should you use? If you need structured, relational data with strong consistency, SQL is the better choice. If you need high scalability and a flexible schema, NoSQL is the better choice. Many modern applications use both SQL and NoSQL together.

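A minimal sketch of the contrast, using Python's built-in sqlite3 for the SQL side and a plain dict to stand in for a schemaless document (the fields are hypothetical):

```python
import sqlite3

# SQL: a strict, predefined schema enforced by the database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
db.execute("INSERT INTO users (id, name) VALUES (?, ?)", (1, "Alice"))

# NoSQL (document-store style), sketched as a JSON-like document:
# documents in the same collection may carry different fields.
user_doc = {"_id": 1, "name": "Alice", "preferences": {"theme": "dark"}}
```
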
  • 00:07:46

    As our user base grows, so does the number of requests hitting our application servers. One of the quickest solutions is to upgrade the existing server by adding more CPU, RAM, or storage. This approach is called vertical scaling, or scaling up, which makes a single machine more powerful. But there are some major limitations with this approach. You can't keep upgrading a server forever; every machine has a maximum capacity. More powerful servers become exponentially more expensive. And if this one server crashes, the entire system goes down. So, while vertical scaling is a quick fix, it's not a long-term solution for handling high traffic and ensuring system reliability. Let's look at a better approach, one that makes our system more scalable and fault-tolerant. Instead of upgrading a single server, what if we add more servers to share the load? This approach is called horizontal scaling, or scaling out, where we distribute the workload across multiple machines. More servers means more capacity, which means the system can handle increasing traffic more effectively. If one server goes down, others can take over, which improves reliability. But horizontal scaling introduces a new challenge: how do clients know which server to connect to? This is where a load balancer comes in. A load balancer sits between clients and backend servers, acting as a traffic manager that distributes requests across multiple servers. If one server crashes, the load balancer automatically redirects traffic to another healthy server. But how does a load balancer decide which server should handle the next request? It uses a load balancing algorithm, such as round robin, least connections, or IP hashing.

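Round robin is the simplest of these to sketch; here is a minimal version in Python (the backend addresses are hypothetical):

```python
import itertools

# Round robin: hand out servers in a repeating cycle, one per request.
servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical backends
rotation = itertools.cycle(servers)

def pick_server() -> str:
    return next(rotation)

for _ in range(5):
    print(pick_server())  # 10.0.0.1, 10.0.0.2, 10.0.0.3, 10.0.0.1, ...
```
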
  • 00:09:11

    So far, we have talked about scaling our application servers. But as traffic grows, the volume of data also increases. At first, we can scale a database vertically by adding more CPU, RAM, and storage, similar to application servers, but there is a limit to how much a single machine can handle. So let's explore other database scaling techniques that can help manage large volumes of data efficiently. One of the quickest and most effective ways to speed up database read queries is indexing. Think of it like the index page at the back of a book: instead of flipping through every page, you jump directly to the relevant section. A database index works the same way. It's a super-efficient lookup table that helps the database quickly locate the required data without scanning the entire table. An index stores column values along with pointers to the actual data rows in the table. Indexes are typically created on columns that are frequently queried, such as primary keys, foreign keys, and columns frequently used in WHERE conditions. While indexes speed up reads, they slow down writes, since the index needs to be updated whenever data changes. That's why we should only index the most frequently accessed columns. Indexing can significantly improve read performance.

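A minimal sketch with sqlite3 (table and column names are hypothetical) showing an index on a column used in WHERE conditions:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)")

# Index the column used in WHERE conditions so lookups skip a full table scan.
db.execute("CREATE INDEX idx_orders_user_id ON orders (user_id)")

# This query can now use idx_orders_user_id instead of scanning every row.
db.execute("SELECT total FROM orders WHERE user_id = ?", (42,))
```
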
  • 00:10:15

    But what if even indexing isn't enough, and our single database server can't handle the growing number of read requests? That's where our next database scaling technique, replication, comes in. Just like we added more application servers to handle increasing traffic, we can scale our database by creating copies of it across multiple servers. Here is how it works. We have one primary database, also called the primary replica, that handles all write operations, and multiple read replicas that handle read queries. Whenever data is written to the primary database, it gets copied to the read replicas so that they stay in sync. Replication improves read performance, since read requests are spread across multiple replicas, reducing the load on each one. It also improves availability, since if the primary replica fails, a read replica can take over as the new primary. Replication is great for scaling read-heavy applications. But what if we need to scale write operations or store huge amounts of data?

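A minimal sketch of how an application might route queries under replication, sending writes to the primary and spreading reads across replicas (the connection targets are hypothetical placeholders, not real database handles):

```python
import itertools

# Hypothetical connection targets; in practice these would be DB connections.
PRIMARY = "db-primary:5432"
REPLICAS = itertools.cycle(["db-replica-1:5432", "db-replica-2:5432"])

def route(query: str) -> str:
    """Send writes to the primary; spread reads across the replicas."""
    is_write = query.lstrip().upper().startswith(("INSERT", "UPDATE", "DELETE"))
    return PRIMARY if is_write else next(REPLICAS)

print(route("SELECT * FROM users"))             # a read replica
print(route("INSERT INTO users VALUES (1)"))    # the primary
```
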
  • 00:11:07

    Let's say our service became popular. It now has millions of users, and our database has grown to terabytes of data. A single database server will eventually struggle to handle all this data efficiently. Instead of keeping everything in one place, we split the database into smaller, more manageable pieces and distribute them across multiple servers. This technique is called sharding. Here is how it works. We divide the database into smaller parts called shards. Each shard contains a subset of the total data. Data is distributed based on a sharding key, for example, user ID. By distributing data this way, we reduce database load, since each shard handles only a portion of the queries, and speed up read and write performance, since queries are distributed across multiple shards instead of hitting a single database. Sharding is also referred to as horizontal partitioning, since it splits data by rows. But what if the issue isn't the number of rows but rather the number of columns? In such cases, we use vertical partitioning, where we split the database by columns. Imagine we have a user table that stores profile details, login history, and billing information. As this table grows, queries become slower because the database must scan many columns even when a request only needs a few specific fields. To optimize this, we use vertical partitioning, where we split the user table into smaller, more focused tables based on usage patterns. This improves query performance, since each request only scans the relevant columns instead of the entire table. It also reduces unnecessary disk I/O, making data retrieval quicker.

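A minimal sketch of hash-based sharding on a user ID (the shard count is hypothetical; real systems often use consistent hashing instead, so shards can be added without remapping everything):

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count

def shard_for(user_id: int) -> int:
    """Map a sharding key (here, a user ID) to a shard deterministically."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# The same user always lands on the same shard, so lookups know where to go.
print(shard_for(42), shard_for(42), shard_for(43))
```
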
  • 00:12:33

    However, no matter how much we optimize the database, retrieving data from disk is always slower than retrieving it from memory. What if we could store frequently accessed data in memory? This is called caching. Caching is used to optimize the performance of a system by storing frequently accessed data in memory instead of repeatedly fetching it from the database. One of the most common caching strategies is the cache-aside pattern. Here is how it works. When a user requests the data, the application first checks the cache. If the data is in the cache, it's returned instantly, avoiding a database call. If the data is not in the cache, the application retrieves it from the database, stores it in the cache for future requests, and returns it to the user. The next time the same data is requested, it's served directly from the cache, making the request much faster. To prevent outdated data from being served, we use a time-to-live value, or TTL.

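A minimal cache-aside sketch with a TTL, using an in-process dict as the cache and a stub standing in for the database (both hypothetical; production systems typically use a shared cache like Redis):

```python
import time

cache: dict = {}      # in-memory cache: key -> (value, expiry_timestamp)
TTL_SECONDS = 60      # hypothetical time-to-live

def fetch_from_db(key):
    return f"row-for-{key}"  # stand-in for a real database query

def get(key):
    """Cache-aside: check the cache first, fall back to the database."""
    entry = cache.get(key)
    if entry and entry[1] > time.time():   # cache hit, not yet expired
        return entry[0]
    value = fetch_from_db(key)             # cache miss: go to the database
    cache[key] = (value, time.time() + TTL_SECONDS)
    return value
```
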
  • 00:13:19

    Let's look at the next database scaling technique. Most relational databases use normalization to store data efficiently by breaking it into separate tables. While this reduces redundancy, it also introduces joins: when retrieving data from multiple tables, the database must combine them using join operations, which can slow down queries as the data set grows. Denormalization reduces the number of joins by combining related data into a single table, even if it means some data gets duplicated. For example, instead of keeping users and orders in separate tables, we create a user orders table that stores user details along with the latest orders. Now, when retrieving a user's order history, we don't need a join operation; the data is already stored together, leading to faster queries and better read performance. Denormalization is often used in read-heavy applications where speed is critical. But the downside is that it leads to increased storage and more complex update requests.

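A small sqlite3 sketch of the trade-off (table and column names are hypothetical): the normalized read needs a join, the denormalized read does not:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, item TEXT);

    -- Denormalized: user details duplicated next to each order.
    CREATE TABLE user_orders (user_id INTEGER, name TEXT, item TEXT);
""")

# Normalized read: combines two tables with a join at query time.
db.execute("SELECT u.name, o.item FROM users u JOIN orders o ON o.user_id = u.id")

# Denormalized read: one table, no join, at the cost of duplicated data.
db.execute("SELECT name, item FROM user_orders")
```
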
  • 00:14:11

    As we scale our system across multiple servers, databases, and data centers, we enter the world of distributed systems. One of the fundamental principles of distributed systems is the CAP theorem, which states that no distributed system can achieve all three of the following at the same time: consistency, availability, and partition tolerance. Since network failures are inevitable, we must choose between consistency plus partition tolerance, or availability plus partition tolerance. If you want to learn about the CAP theorem in more detail, you can check out the article on my blog called CAP Theorem Explained.

  • 00:14:42

    Most modern applications don't just store text records. They also need to handle images, videos, PDFs, and other large files. Traditional databases are not designed to store large unstructured files efficiently. So, what's the solution? We use blob storage, like Amazon S3. Blobs are individual files, such as images, videos, or documents. These blobs are stored inside logical containers, or buckets, in the cloud. Each file gets a unique URL, making it easy to retrieve and serve over the web. There are several advantages to using blob storage: scalability, a pay-as-you-go model, automatic replication, and easy access.

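A minimal upload sketch using the AWS SDK for Python (assumes boto3 is installed and AWS credentials are configured; the file and bucket names are hypothetical):

```python
import boto3

# Upload a local file as a blob (object) into an S3 bucket.
s3 = boto3.client("s3")
s3.upload_file("video.mp4", "my-media-bucket", "videos/video.mp4")

# The object can now be served over the web at a URL like:
# https://my-media-bucket.s3.amazonaws.com/videos/video.mp4
```
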
  • 00:15:18

    A common use case is to stream audio or video files to user applications in real time. But streaming a video file directly from blob storage can be slow, especially if the data is stored in a distant location. For example, imagine you are in India trying to watch a YouTube video that's hosted on a server in California. Since the video data has to travel across the world, this could lead to buffering and slow load times. A content delivery network, or CDN, solves this problem by delivering content faster to users based on their location. A CDN is a global network of distributed servers that work together to deliver web content, like HTML pages, JavaScript files, images, and videos, to users based on their geographic location. Since content is served from the closest CDN server, users experience faster load times with minimal buffering.

  • 00:16:02

    Let's move to the next system design concept, which can help us build real-time applications. Most web applications use HTTP, which follows a request-response model: the client sends a request, the server processes the request and sends a response. If the client needs new data, it must send another request. This works fine for static web pages, but it's too slow and inefficient for real-time applications like live chat applications, stock market dashboards, or online multiplayer games. With HTTP, the only way to get real-time updates is through frequent polling, sending repeated requests every few seconds. But polling is inefficient because it increases the server load and wastes bandwidth, as most responses are empty when there is no new data. WebSockets solve this problem by allowing continuous two-way communication between the client and the server over a single persistent connection. The client initiates a WebSocket connection with the server. Once established, the connection remains open. The server can push updates to the client at any time without waiting for a request, and the client can also send messages instantly to the server. This enables real-time interactions and eliminates the need for polling. WebSockets enable real-time communication between a client and a server.

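A minimal push-server sketch using the third-party `websockets` package (assumed installed via pip; recent versions pass a single connection argument to the handler; the port and payload are hypothetical):

```python
import asyncio
import websockets

async def handler(conn):
    # The connection stays open, so the server can push without being asked.
    while True:
        await conn.send("stock tick: AAPL 195.30")  # made-up real-time update
        await asyncio.sleep(1)

async def main():
    async with websockets.serve(handler, "localhost", 8765):
        await asyncio.Future()  # keep the server running

asyncio.run(main())
```
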
  • 00:17:10

    But what if a server needs to notify another server when an event occurs? For example, when a user makes a payment, the payment gateway needs to notify your application instantly. Instead of constantly polling an API to check if an event has occurred, webhooks allow a server to send an HTTP request to another server as soon as the event occurs. Here is how it works. The receiver, for example your app, registers a webhook URL with the provider. When an event occurs, the provider sends an HTTP POST request to the webhook URL with the event details. This saves server resources and reduces unnecessary API calls.

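A minimal webhook receiver sketch using only the standard library (the port is hypothetical; a real receiver would also verify the provider's signature):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # The provider POSTs event details to the registered URL.
        length = int(self.headers.get("Content-Length", 0))
        event = self.rfile.read(length)
        print("received event:", event)
        self.send_response(200)  # acknowledge so the provider stops retrying
        self.end_headers()

HTTPServer(("localhost", 8080), WebhookHandler).serve_forever()
```
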
  • 00:17:44

    Traditionally, applications were built using a monolithic architecture, where all features are inside one large codebase. This setup works fine for small applications, but for large-scale systems, monoliths become hard to manage, scale, and deploy. The solution is to break down your application into smaller, independent services, called microservices, that work together. Each microservice handles a single responsibility, has its own database and logic so it can scale independently, and communicates with other microservices using APIs or message queues. This way, services can be scaled and deployed individually without affecting the entire system. However, when multiple microservices need to communicate, direct API calls aren't always efficient. This is where message queues come in. Synchronous communication, for example waiting for immediate responses, doesn't scale well. A message queue enables services to communicate asynchronously, allowing requests to be processed without blocking other operations. Here is how it works. A producer places a message in the queue, the queue temporarily holds the message, and a consumer retrieves the message and processes it. Using message queues, we can decouple services, improve scalability, and prevent overload on internal services within our system.

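A minimal in-process sketch of the producer/queue/consumer flow using the standard library (real systems would use a broker like RabbitMQ or Kafka; the message is hypothetical):

```python
import queue
import threading

q: queue.Queue = queue.Queue()   # the queue temporarily holds messages

def producer():
    q.put("order-created:1042")  # hypothetical event; producer doesn't block

def consumer():
    msg = q.get()                # consumer processes messages at its own pace
    print("processing", msg)
    q.task_done()

threading.Thread(target=consumer).start()
producer()
q.join()  # wait until the message has been processed
```
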
  • 00:18:53

    But how do we prevent overload for the public APIs and services that we deploy? For that, we use rate limiting. Imagine a bot starts making thousands of requests per second to your website. Without restrictions, this could crash your servers by consuming all available resources and degrade performance for legitimate users. Rate limiting restricts the number of requests a client can send within a specific time frame. Every user or IP address is assigned a request quota, for example, 100 requests per minute. If they exceed this limit, the server temporarily blocks additional requests and returns an error. There are various rate limiting algorithms. Some of the popular ones are fixed window, sliding window, and token bucket.

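A minimal token bucket sketch (the rate and capacity are hypothetical): tokens refill at a steady rate, each request spends one token, and requests arriving with an empty bucket are rejected:

```python
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: int):
        self.rate = rate              # tokens added per second
        self.capacity = capacity     # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill based on time elapsed, capped at the bucket's capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=2, capacity=5)  # hypothetical quota
print(bucket.allow())                     # True until the bucket empties
```
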
  • 00:19:29

    We don't need to implement our own rate limiting system; this can be handled by something called an API gateway. An API gateway is a centralized service that handles authentication, rate limiting, logging, monitoring, request routing, and much more. Imagine a microservices-based application with multiple services. Instead of exposing each service directly, an API gateway acts as a single entry point for all client requests. It routes each request to the appropriate microservice, and the response is sent back through the gateway to the client. An API gateway simplifies API management and improves scalability and security.

  • 00:20:00

    In distributed systems, network failures and service retries are common. If a user accidentally refreshes a payment page, the system might receive two payment requests instead of one. Idempotency ensures that repeated requests produce the same result as if the request was made only once. Here is how it works. Each request is assigned a unique ID. Before processing, the system checks if the request has already been handled. If yes, it ignores the duplicate request; if no, it processes the request normally.

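A minimal idempotency-key sketch (the key, amount, and in-memory store are hypothetical; real systems persist the keys so retries survive restarts):

```python
processed: dict = {}  # idempotency key -> stored result

def handle_payment(idempotency_key: str, amount: int) -> str:
    """Process a payment at most once per idempotency key."""
    if idempotency_key in processed:     # duplicate: return the saved result
        return processed[idempotency_key]
    result = f"charged {amount}"         # stand-in for the real charge
    processed[idempotency_key] = result
    return result

# A page refresh resends the same key, so the charge happens only once.
print(handle_payment("req-123", 500))
print(handle_payment("req-123", 500))
```
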
  • 00:20:27

    If you enjoyed this video, I think you will love my weekly newsletter, where I dive deeper into system design concepts with real-world examples. I also share articles on system design interview questions and tips to help you prepare for interviews. You can subscribe at blog.algomaster.io. Thanks for watching, and I will see you in the next one.

Tags
  • System Design
  • Client-Server Architecture
  • DNS
  • HTTP
  • REST
  • GraphQL
  • SQL
  • NoSQL
  • Scaling
  • Caching