System Design was HARD until I Learned these 30 Concepts

00:20:43
https://www.youtube.com/watch?v=s9Qh9fWeOAk

Summary

TL;DR: This video outlines 30 key system design concepts essential for developers looking to advance their careers. It begins with the client-server architecture, explaining how clients communicate with servers using IP addresses and DNS. The video then delves into HTTP/HTTPS protocols, the role of APIs, and compares REST and GraphQL. It discusses database types (SQL vs. NoSQL) and scaling techniques (vertical vs. horizontal), emphasizing the importance of load balancing, caching, indexing, replication, and sharding. The presenter also covers advanced topics like the CAP theorem, API gateways, rate limiting, and idempotency, providing insights from personal experience in software engineering.

Takeaways

  • 💻 Understand client-server architecture for web applications.
  • 🌐 Learn how DNS maps domain names to IP addresses.
  • 🔒 Use HTTPS for secure data transmission.
  • 📡 APIs facilitate communication between clients and servers.
  • 📊 REST vs. GraphQL: Choose based on data needs.
  • 🗄️ SQL vs. NoSQL: Select based on data structure and scalability.
  • 📈 Horizontal scaling improves system reliability.
  • ⚖️ Load balancers distribute traffic across servers.
  • 📚 Caching speeds up data retrieval from databases.
  • 🔄 Idempotency ensures consistent request handling.

Timeline

  • 00:00:00 - 00:05:00

    The video introduces the importance of system design for developers aiming to advance their careers. It outlines the core concepts necessary for mastering system design, emphasizing the client-server architecture as a foundational element. The client sends requests to a server, which processes them and returns responses. The video explains how clients locate servers using IP addresses and domain names, facilitated by the DNS system. It also discusses the role of proxy servers in managing requests and the impact of latency on application performance.

  • 00:05:00 - 00:10:00

    The video continues by discussing HTTP and HTTPS protocols for client-server communication, highlighting the importance of APIs in structuring requests and responses. It introduces REST and GraphQL as popular API styles, explaining their differences and use cases. The video then transitions to data storage, comparing SQL and NoSQL databases, and discussing their respective advantages for different application needs. It emphasizes the importance of choosing the right database type based on the application's requirements for consistency and scalability.

  • 00:10:00 - 00:15:00

    As user traffic increases, the video explores scaling strategies for application servers, contrasting vertical scaling with horizontal scaling. It introduces load balancers as a solution for distributing requests across multiple servers, enhancing reliability and performance. The video also addresses database scaling techniques, including indexing, replication, and sharding, to manage large volumes of data efficiently. It explains how these techniques improve read and write performance while maintaining data availability.

  • 00:15:00 - 00:20:43

    The final segment covers advanced concepts such as caching, denormalization, and the CAP theorem in distributed systems. It discusses the use of blob storage for unstructured data and the role of CDNs in optimizing content delivery. The video concludes with an overview of real-time communication using WebSockets, the benefits of microservices architecture, and the importance of rate limiting and API gateways in managing public APIs. It emphasizes the need for idempotency in handling duplicate requests and invites viewers to subscribe for more insights on system design.


Video Q&A

  • What is client-server architecture?

    Client-server architecture is a model where a client (like a web browser) requests services from a server that processes these requests and sends back responses.

  • What is DNS?

    DNS, or Domain Name System, translates human-friendly domain names into IP addresses that computers use to identify each other on the internet.

  • What is the difference between REST and GraphQL?

    REST APIs follow a set of rules for structured communication, while GraphQL allows clients to request exactly the data they need, reducing over-fetching.

  • What are SQL and NoSQL databases?

    SQL databases use structured tables and schemas, while NoSQL databases offer flexibility with various data models and are designed for high scalability.

  • What is horizontal scaling?

    Horizontal scaling involves adding more servers to distribute the load, improving capacity and reliability, as opposed to vertical scaling which upgrades a single server.

  • What is caching?

    Caching stores frequently accessed data in memory to speed up data retrieval and reduce database load.

  • What is the CAP theorem?

    The CAP theorem states that in a distributed system, you can only achieve two out of three guarantees: consistency, availability, and partition tolerance.

  • What is an API gateway?

    An API gateway is a centralized service that manages API requests, handling tasks like authentication, rate limiting, and request routing.

  • What is rate limiting?

    Rate limiting restricts the number of requests a client can make to a server within a specific time frame to prevent overload.

  • What is idempotency?

    Idempotency ensures that repeated requests produce the same result as a single request, preventing duplicate processing.

Transcript (en)

  • 00:00:00

    If you want to level up from a junior developer to a senior engineer or land a high-paying job at a big tech company, you need to learn system design. But where do you start? To master system design, you first need to understand the core concepts and fundamental building blocks that come up when designing real-world systems or tackling system design interview questions. In this video, I will break down the 30 most important system design concepts you need to know. Learning these concepts helped me land high-paying offers from multiple big tech companies, and in my 8 years as a software engineer, I've seen them used repeatedly when building and scaling large-scale systems. Let's get started.

  • 00:00:34

    Almost every web application that you use is built on this simple yet powerful concept called client-server architecture. Here is how it works. On one side, you have a client. This could be a web browser, a mobile app, or any other frontend application. On the other side, you have a server, a machine that runs continuously, waiting to handle incoming requests. The client sends a request to store, retrieve, or modify data. The server receives the request, processes it, performs the necessary operations, and sends back a response. This sounds simple, right? But there is a big question: how does the client even know where to find a server? A client doesn't magically know where a server is. It needs an address to locate and communicate with it. On the internet, computers identify each other using IP addresses, which work like phone numbers for servers. Every publicly deployed server has a unique IP address. When a client wants to interact with a service, it must send requests to the correct IP address. But there's a problem: when we visit a website, we don't type its IP address. We just enter the website name, right? Instead of relying on hard-to-remember IP addresses, we use something much more human-friendly: domain names. But we need a way to map a domain name to its corresponding IP address. This is where DNS, or Domain Name System, comes in. It maps easy-to-remember domain names like algomaster.io to their corresponding IP addresses. When you type algomaster.io into your browser, your computer asks a DNS server for the corresponding IP address. Once the DNS server responds with the IP, your browser uses it to establish a connection with the server and make a request. You can find the IP address of any domain name using the ping command.

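To make this concrete, here is a minimal sketch in Python (assuming network access) that resolves a domain name to an IP address via the system's DNS resolver, much like ping does before sending any packets:

```python
import socket

# Ask DNS for the IPv4 address behind a human-friendly domain name.
ip = socket.gethostbyname("algomaster.io")
print(ip)  # prints the server's IPv4 address (the actual value may vary)
```
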
  • 00:02:07

    When you visit a website, your request doesn't always go directly to the server. Sometimes it passes through a proxy or reverse proxy first. A proxy server acts as a middleman between your device and the internet. When you request a web page, the proxy forwards your request to the target server, retrieves the response, and sends it back to you. A proxy server hides your IP address, keeping your location and identity private. A reverse proxy works the other way around: it intercepts client requests and forwards them to the backend servers based on predefined rules. Whenever a client communicates with a server, there is always some delay. One of the biggest causes of this delay is physical distance. For example, if our server is in New York but a user in India sends a request, the data has to travel halfway across the world, and then the response has to make the same long trip back. This round-trip delay is called latency. High latency can make applications feel slow and unresponsive. One way to reduce latency is by deploying our service across multiple data centers worldwide. This way, users can connect to the nearest server instead of waiting for data to travel across the globe.

  • 00:03:10

    Once a connection is made, how do clients and servers actually communicate? Every time you visit a website, your browser and the server communicate using a set of rules called HTTP. That's why most URLs start with HTTP, or its secure version, HTTPS. The client sends a request to the server. This request includes a header containing details like the request type, browser type, and cookies, and sometimes a request body, which carries additional data like form inputs. The server processes the request and responds with an HTTP response, either returning the requested data or an error message if something goes wrong. HTTP has a major security flaw: it sends data in plain text. Modern websites use HTTPS. HTTPS encrypts all data using the SSL or TLS protocol, ensuring that even if someone intercepts the request, they can't read or alter it.

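A minimal sketch of this request-response exchange, using only Python's standard library (the User-Agent string is a hypothetical client name):

```python
import urllib.request

# Build an HTTPS GET request; headers carry metadata about the request.
req = urllib.request.Request(
    "https://algomaster.io",
    headers={"User-Agent": "example-client/1.0"},  # hypothetical client name
)
with urllib.request.urlopen(req) as resp:
    print(resp.status)                    # e.g. 200 on success
    print(resp.headers["Content-Type"])   # format of the response body
    body = resp.read()                    # the requested data itself
```
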
  • 00:03:57

    But clients and servers don't directly exchange raw HTTP requests and responses. HTTP is just a protocol for transferring data; it doesn't define how requests should be structured, what format responses should be in, or how different clients should interact with the server. This is where APIs, or application programming interfaces, come in. Think of an API as a middleman that allows clients to communicate with servers without worrying about low-level details. A client sends a request to an API. The API, hosted on a server, processes the request, interacts with databases or other services, and prepares a response. The API sends back the response in a structured format, usually JSON or XML, which the client understands and can display. There are different API styles to serve different needs. Two of the most popular ones are REST and GraphQL. Just a quick note: to keep this video concise, I'm covering these topics at a high level, but if you want to go deeper and learn these topics in more detail, check out my blog at blog.algomaster.io. Every week I publish in-depth articles on complex system design topics with clear explanations and real-world examples. Make sure to subscribe so that you don't miss my new articles. Among the different API styles, REST is the most widely used. A REST API follows a set of rules that defines how clients and servers communicate over HTTP in a structured way. REST is stateless: every request is independent. Everything is treated as a resource, for example users, orders, and products. It uses standard HTTP methods: GET to retrieve data, POST to create new data, PUT to update existing data, and DELETE to remove data. REST APIs are great because they are simple, scalable, and easy to cache, but they have limitations, especially when dealing with complex data retrieval. REST endpoints often return more data than needed, leading to inefficient network usage. To address these challenges, GraphQL was introduced in 2015 by Facebook. Unlike REST, GraphQL lets clients ask for exactly what they need, nothing more, nothing less. With a REST API, if you need a user's profile along with their recent posts, you might have to make multiple requests to different endpoints. With GraphQL, you can combine those requests into one and fetch exactly the data you need in a single query, as the sketch below illustrates. The server responds with only the requested fields. However, GraphQL also comes with trade-offs: it requires more processing on the server side, and it isn't as easy to cache as REST.

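Here is a hedged sketch contrasting the two styles; the endpoints, fields, and user ID are hypothetical, not from the video:

```python
import urllib.request, json

BASE = "https://api.example.com"  # hypothetical service

# REST: two round trips to two resource endpoints, each returning full objects.
profile = json.load(urllib.request.urlopen(f"{BASE}/users/42"))
posts = json.load(urllib.request.urlopen(f"{BASE}/users/42/posts"))

# GraphQL: one request that names exactly the fields the client needs.
query = """
{
  user(id: 42) {
    name
    posts(last: 3) { title }
  }
}
"""
req = urllib.request.Request(
    f"{BASE}/graphql",
    data=json.dumps({"query": query}).encode(),
    headers={"Content-Type": "application/json"},
)
result = json.load(urllib.request.urlopen(req))  # only the requested fields
```
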
  • 00:06:16

    Now, when a client makes a request, they usually want to store or retrieve data. But this brings up another question: where is the actual data stored? If our application deals with small amounts of data, we could store it as a variable or as a file and load it in memory. But modern applications handle massive volumes of data, far more than what memory can efficiently handle. That's why we need a dedicated server for storing and managing data: a database. A database is the backbone of any modern application. It ensures that data is stored, retrieved, and managed efficiently while keeping it secure, consistent, and durable. When a client requests to store or retrieve data, the server communicates with the database, fetches the required information, and returns it to the client. But not all databases are the same. In system design, we typically choose between SQL and NoSQL databases. SQL databases store data in tables with a strict predefined schema, and they follow ACID properties. Because of these guarantees, SQL databases are ideal for applications that require strong consistency and structured relationships, such as banking systems. NoSQL databases, on the other hand, are designed for high scalability and performance. They don't require a fixed schema and use different data models, including key-value stores, document stores, graph databases, and wide-column stores, which are optimized for large-scale distributed data. So, which one should you use? If you need structured, relational data with strong consistency, SQL is the better choice. If you need high scalability and a flexible schema, NoSQL is the better choice. Many modern applications use both SQL and NoSQL together.

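A minimal sketch of the contrast, using Python's built-in sqlite3 for the SQL side and a plain dict to stand in for a schemaless document (the fields are hypothetical):

```python
import sqlite3

# SQL: a strict, predefined schema enforced by the database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
db.execute("INSERT INTO users (id, name) VALUES (?, ?)", (1, "Alice"))

# NoSQL (document-store style), sketched as a JSON-like document:
# documents in the same collection may carry different fields.
user_doc = {"_id": 1, "name": "Alice", "preferences": {"theme": "dark"}}
```
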
  • 00:07:46

    As our user base grows, so does the number of requests hitting our application servers. One of the quickest solutions is to upgrade the existing server by adding more CPU, RAM, or storage. This approach is called vertical scaling, or scaling up, which makes a single machine more powerful. But there are some major limitations with this approach. You can't keep upgrading a server forever; every machine has a maximum capacity. More powerful servers become exponentially more expensive. And if this one server crashes, the entire system goes down. So, while vertical scaling is a quick fix, it's not a long-term solution for handling high traffic and ensuring system reliability. Let's look at a better approach, one that makes our system more scalable and fault-tolerant. Instead of upgrading a single server, what if we add more servers to share the load? This approach is called horizontal scaling, or scaling out, where we distribute the workload across multiple machines. More servers means more capacity, which means the system can handle increasing traffic more effectively. If one server goes down, others can take over, which improves reliability. But horizontal scaling introduces a new challenge: how do clients know which server to connect to? This is where a load balancer comes in. A load balancer sits between clients and backend servers, acting as a traffic manager that distributes requests across multiple servers. If one server crashes, the load balancer automatically redirects traffic to another healthy server. But how does a load balancer decide which server should handle the next request? It uses a load balancing algorithm, such as round robin, least connections, or IP hashing.

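Round robin is the simplest of these to sketch; here is a minimal version in Python (the backend addresses are hypothetical):

```python
import itertools

# Round robin: hand out servers in a repeating cycle, one per request.
servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical backends
rotation = itertools.cycle(servers)

def pick_server() -> str:
    return next(rotation)

for _ in range(5):
    print(pick_server())  # 10.0.0.1, 10.0.0.2, 10.0.0.3, 10.0.0.1, ...
```
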
  • 00:09:11

    So far, we have talked about scaling our application servers. But as traffic grows, the volume of data also increases. At first, we can scale a database vertically by adding more CPU, RAM, and storage, similar to application servers, but there is a limit to how much a single machine can handle. So let's explore other database scaling techniques that can help manage large volumes of data efficiently. One of the quickest and most effective ways to speed up database read queries is indexing. Think of it like the index page at the back of a book: instead of flipping through every page, you jump directly to the relevant section. A database index works the same way. It's a super-efficient lookup table that helps the database quickly locate the required data without scanning the entire table. An index stores column values along with pointers to the actual data rows in the table. Indexes are typically created on columns that are frequently queried, such as primary keys, foreign keys, and columns frequently used in WHERE conditions. While indexes speed up reads, they slow down writes, since the index needs to be updated whenever data changes. That's why we should only index the most frequently accessed columns. Indexing can significantly improve read performance.

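A minimal sketch with sqlite3 (table and column names are hypothetical) showing an index on a column used in WHERE conditions:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)")

# Index the column used in WHERE conditions so lookups skip a full table scan.
db.execute("CREATE INDEX idx_orders_user_id ON orders (user_id)")

# This query can now use idx_orders_user_id instead of scanning every row.
db.execute("SELECT total FROM orders WHERE user_id = ?", (42,))
```
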
  • 00:10:15

    But what if even indexing isn't enough, and our single database server can't handle the growing number of read requests? That's where our next database scaling technique, replication, comes in. Just like we added more application servers to handle increasing traffic, we can scale our database by creating copies of it across multiple servers. Here is how it works. We have one primary database, also called the primary replica, that handles all write operations, and multiple read replicas that handle read queries. Whenever data is written to the primary database, it gets copied to the read replicas so that they stay in sync. Replication improves read performance, since read requests are spread across multiple replicas, reducing the load on each one. It also improves availability, since if the primary replica fails, a read replica can take over as the new primary. Replication is great for scaling read-heavy applications. But what if we need to scale write operations or store huge amounts of data?

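A minimal sketch of how an application might route queries under replication, sending writes to the primary and spreading reads across replicas (the connection targets are hypothetical placeholders, not real database handles):

```python
import itertools

# Hypothetical connection targets; in practice these would be DB connections.
PRIMARY = "db-primary:5432"
REPLICAS = itertools.cycle(["db-replica-1:5432", "db-replica-2:5432"])

def route(query: str) -> str:
    """Send writes to the primary; spread reads across the replicas."""
    is_write = query.lstrip().upper().startswith(("INSERT", "UPDATE", "DELETE"))
    return PRIMARY if is_write else next(REPLICAS)

print(route("SELECT * FROM users"))             # a read replica
print(route("INSERT INTO users VALUES (1)"))    # the primary
```
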
  • 00:11:07

    Let's say our service became popular. It now has millions of users, and our database has grown to terabytes of data. A single database server will eventually struggle to handle all this data efficiently. Instead of keeping everything in one place, we split the database into smaller, more manageable pieces and distribute them across multiple servers. This technique is called sharding. Here is how it works. We divide the database into smaller parts called shards. Each shard contains a subset of the total data. Data is distributed based on a sharding key, for example, user ID. By distributing data this way, we reduce database load, since each shard handles only a portion of the queries, and speed up read and write performance, since queries are distributed across multiple shards instead of hitting a single database. Sharding is also referred to as horizontal partitioning, since it splits data by rows. But what if the issue isn't the number of rows but rather the number of columns? In such cases, we use vertical partitioning, where we split the database by columns. Imagine we have a user table that stores profile details, login history, and billing information. As this table grows, queries become slower because the database must scan many columns even when a request only needs a few specific fields. To optimize this, we use vertical partitioning, where we split the user table into smaller, more focused tables based on usage patterns. This improves query performance, since each request only scans the relevant columns instead of the entire table. It also reduces unnecessary disk I/O, making data retrieval quicker.

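A minimal sketch of hash-based sharding on a user ID (the shard count is hypothetical; real systems often use consistent hashing instead, so shards can be added without remapping everything):

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count

def shard_for(user_id: int) -> int:
    """Map a sharding key (here, a user ID) to a shard deterministically."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# The same user always lands on the same shard, so lookups know where to go.
print(shard_for(42), shard_for(42), shard_for(43))
```
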
  • 00:12:33

    However, no matter how much we optimize the database, retrieving data from disk is always slower than retrieving it from memory. What if we could store frequently accessed data in memory? This is called caching. Caching is used to optimize the performance of a system by storing frequently accessed data in memory instead of repeatedly fetching it from the database. One of the most common caching strategies is the cache-aside pattern. Here is how it works. When a user requests the data, the application first checks the cache. If the data is in the cache, it's returned instantly, avoiding a database call. If the data is not in the cache, the application retrieves it from the database, stores it in the cache for future requests, and returns it to the user. The next time the same data is requested, it's served directly from the cache, making the request much faster. To prevent outdated data from being served, we use a time-to-live value, or TTL.

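A minimal cache-aside sketch with a TTL, using an in-process dict as the cache and a stub standing in for the database (both hypothetical; production systems typically use a shared cache like Redis):

```python
import time

cache: dict = {}      # in-memory cache: key -> (value, expiry_timestamp)
TTL_SECONDS = 60      # hypothetical time-to-live

def fetch_from_db(key):
    return f"row-for-{key}"  # stand-in for a real database query

def get(key):
    """Cache-aside: check the cache first, fall back to the database."""
    entry = cache.get(key)
    if entry and entry[1] > time.time():   # cache hit, not yet expired
        return entry[0]
    value = fetch_from_db(key)             # cache miss: go to the database
    cache[key] = (value, time.time() + TTL_SECONDS)
    return value
```
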
  • 00:13:19

    Let's look at the next database scaling technique. Most relational databases use normalization to store data efficiently by breaking it into separate tables. While this reduces redundancy, it also introduces joins: when retrieving data from multiple tables, the database must combine them using join operations, which can slow down queries as the data set grows. Denormalization reduces the number of joins by combining related data into a single table, even if it means some data gets duplicated. For example, instead of keeping users and orders in separate tables, we create a user orders table that stores user details along with the latest orders. Now, when retrieving a user's order history, we don't need a join operation; the data is already stored together, leading to faster queries and better read performance. Denormalization is often used in read-heavy applications where speed is critical. But the downside is that it leads to increased storage and more complex update requests.

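A small sqlite3 sketch of the trade-off (table and column names are hypothetical): the normalized read needs a join, the denormalized read does not:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, item TEXT);

    -- Denormalized: user details duplicated next to each order.
    CREATE TABLE user_orders (user_id INTEGER, name TEXT, item TEXT);
""")

# Normalized read: combines two tables with a join at query time.
db.execute("SELECT u.name, o.item FROM users u JOIN orders o ON o.user_id = u.id")

# Denormalized read: one table, no join, at the cost of duplicated data.
db.execute("SELECT name, item FROM user_orders")
```
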
  • 00:14:11

    As we scale our system across multiple servers, databases, and data centers, we enter the world of distributed systems. One of the fundamental principles of distributed systems is the CAP theorem, which states that no distributed system can achieve all three of the following at the same time: consistency, availability, and partition tolerance. Since network failures are inevitable, we must choose between consistency plus partition tolerance, or availability plus partition tolerance. If you want to learn about the CAP theorem in more detail, you can check out the article on my blog called CAP Theorem Explained.

  • 00:14:42

    Most modern applications don't just store text records. They also need to handle images, videos, PDFs, and other large files. Traditional databases are not designed to store large unstructured files efficiently. So, what's the solution? We use blob storage, like Amazon S3. Blobs are individual files, such as images, videos, or documents. These blobs are stored inside logical containers, or buckets, in the cloud. Each file gets a unique URL, making it easy to retrieve and serve over the web. There are several advantages to using blob storage: scalability, a pay-as-you-go model, automatic replication, and easy access.

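A minimal upload sketch using the AWS SDK for Python (assumes boto3 is installed and AWS credentials are configured; the file and bucket names are hypothetical):

```python
import boto3

# Upload a local file as a blob (object) into an S3 bucket.
s3 = boto3.client("s3")
s3.upload_file("video.mp4", "my-media-bucket", "videos/video.mp4")

# The object can now be served over the web at a URL like:
# https://my-media-bucket.s3.amazonaws.com/videos/video.mp4
```
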
  • 00:15:18

    A common use case is to stream audio or video files to user applications in real time. But streaming a video file directly from blob storage can be slow, especially if the data is stored in a distant location. For example, imagine you are in India trying to watch a YouTube video that's hosted on a server in California. Since the video data has to travel across the world, this could lead to buffering and slow load times. A content delivery network, or CDN, solves this problem by delivering content faster to users based on their location. A CDN is a global network of distributed servers that work together to deliver web content, like HTML pages, JavaScript files, images, and videos, to users based on their geographic location. Since content is served from the closest CDN server, users experience faster load times with minimal buffering.

  • 00:16:02

    Let's move to the next system design concept, which can help us build real-time applications. Most web applications use HTTP, which follows a request-response model: the client sends a request, the server processes the request and sends a response. If the client needs new data, it must send another request. This works fine for static web pages, but it's too slow and inefficient for real-time applications like live chat applications, stock market dashboards, or online multiplayer games. With HTTP, the only way to get real-time updates is through frequent polling, sending repeated requests every few seconds. But polling is inefficient because it increases the server load and wastes bandwidth, as most responses are empty when there is no new data. WebSockets solve this problem by allowing continuous two-way communication between the client and the server over a single persistent connection. The client initiates a WebSocket connection with the server. Once established, the connection remains open. The server can push updates to the client at any time without waiting for a request, and the client can also send messages instantly to the server. This enables real-time interactions and eliminates the need for polling. WebSockets enable real-time communication between a client and a server.

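A minimal push-server sketch using the third-party `websockets` package (assumed installed via pip; recent versions pass a single connection argument to the handler; the port and payload are hypothetical):

```python
import asyncio
import websockets

async def handler(conn):
    # The connection stays open, so the server can push without being asked.
    while True:
        await conn.send("stock tick: AAPL 195.30")  # made-up real-time update
        await asyncio.sleep(1)

async def main():
    async with websockets.serve(handler, "localhost", 8765):
        await asyncio.Future()  # keep the server running

asyncio.run(main())
```
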
  • 00:17:10

    But what if a server needs to notify another server when an event occurs? For example, when a user makes a payment, the payment gateway needs to notify your application instantly. Instead of constantly polling an API to check if an event has occurred, webhooks allow a server to send an HTTP request to another server as soon as the event occurs. Here is how it works. The receiver, for example your app, registers a webhook URL with the provider. When an event occurs, the provider sends an HTTP POST request to the webhook URL with the event details. This saves server resources and reduces unnecessary API calls.

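A minimal webhook receiver sketch using only the standard library (the port is hypothetical; a real receiver would also verify the provider's signature):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # The provider POSTs event details to the registered URL.
        length = int(self.headers.get("Content-Length", 0))
        event = self.rfile.read(length)
        print("received event:", event)
        self.send_response(200)  # acknowledge so the provider stops retrying
        self.end_headers()

HTTPServer(("localhost", 8080), WebhookHandler).serve_forever()
```
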
  • 00:17:44

    Traditionally, applications were built using a monolithic architecture, where all features are inside one large codebase. This setup works fine for small applications, but for large-scale systems, monoliths become hard to manage, scale, and deploy. The solution is to break down your application into smaller, independent services, called microservices, that work together. Each microservice handles a single responsibility, has its own database and logic so it can scale independently, and communicates with other microservices using APIs or message queues. This way, services can be scaled and deployed individually without affecting the entire system. However, when multiple microservices need to communicate, direct API calls aren't always efficient. This is where message queues come in. Synchronous communication, for example waiting for immediate responses, doesn't scale well. A message queue enables services to communicate asynchronously, allowing requests to be processed without blocking other operations. Here is how it works. A producer places a message in the queue, the queue temporarily holds the message, and a consumer retrieves the message and processes it. Using message queues, we can decouple services, improve scalability, and prevent overload on internal services within our system.

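A minimal in-process sketch of the producer/queue/consumer flow using the standard library (real systems would use a broker like RabbitMQ or Kafka; the message is hypothetical):

```python
import queue
import threading

q: queue.Queue = queue.Queue()   # the queue temporarily holds messages

def producer():
    q.put("order-created:1042")  # hypothetical event; producer doesn't block

def consumer():
    msg = q.get()                # consumer processes messages at its own pace
    print("processing", msg)
    q.task_done()

threading.Thread(target=consumer).start()
producer()
q.join()  # wait until the message has been processed
```
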
  • 00:18:53

    But how do we prevent overload for the public APIs and services that we deploy? For that, we use rate limiting. Imagine a bot starts making thousands of requests per second to your website. Without restrictions, this could crash your servers by consuming all available resources and degrade performance for legitimate users. Rate limiting restricts the number of requests a client can send within a specific time frame. Every user or IP address is assigned a request quota, for example, 100 requests per minute. If they exceed this limit, the server temporarily blocks additional requests and returns an error. There are various rate limiting algorithms. Some of the popular ones are fixed window, sliding window, and token bucket.

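A minimal token bucket sketch (the rate and capacity are hypothetical): tokens refill at a steady rate, each request spends one token, and requests arriving with an empty bucket are rejected:

```python
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: int):
        self.rate = rate              # tokens added per second
        self.capacity = capacity     # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill based on time elapsed, capped at the bucket's capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=2, capacity=5)  # hypothetical quota
print(bucket.allow())                     # True until the bucket empties
```
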
  • 00:19:29

    We don't need to implement our own rate limiting system; this can be handled by something called an API gateway. An API gateway is a centralized service that handles authentication, rate limiting, logging, monitoring, request routing, and much more. Imagine a microservices-based application with multiple services. Instead of exposing each service directly, an API gateway acts as a single entry point for all client requests. It routes each request to the appropriate microservice, and the response is sent back through the gateway to the client. An API gateway simplifies API management and improves scalability and security.

  • 00:20:00

    In distributed systems, network failures and service retries are common. If a user accidentally refreshes a payment page, the system might receive two payment requests instead of one. Idempotency ensures that repeated requests produce the same result as if the request was made only once. Here is how it works. Each request is assigned a unique ID. Before processing, the system checks if the request has already been handled. If yes, it ignores the duplicate request; if no, it processes the request normally.

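A minimal idempotency-key sketch (the key, amount, and in-memory store are hypothetical; real systems persist the keys so retries survive restarts):

```python
processed: dict = {}  # idempotency key -> stored result

def handle_payment(idempotency_key: str, amount: int) -> str:
    """Process a payment at most once per idempotency key."""
    if idempotency_key in processed:     # duplicate: return the saved result
        return processed[idempotency_key]
    result = f"charged {amount}"         # stand-in for the real charge
    processed[idempotency_key] = result
    return result

# A page refresh resends the same key, so the charge happens only once.
print(handle_payment("req-123", 500))
print(handle_payment("req-123", 500))
```
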
  • 00:20:27

    If you enjoyed this video, I think you will love my weekly newsletter, where I dive deeper into system design concepts with real-world examples. I also share articles on system design interview questions and tips to help you prepare for interviews. You can subscribe at blog.algomaster.io. Thanks for watching, and I will see you in the next one.

Tags
  • System Design
  • Client-Server Architecture
  • DNS
  • HTTP
  • REST
  • GraphQL
  • SQL
  • NoSQL
  • Scaling
  • Caching