I ACED my Technical Interviews knowing these System Design Basics

00:09:40
https://www.youtube.com/watch?v=FxAom29OEKE

Summary

TL;DR: The video provides a detailed overview of designing a scalable social media platform. It begins by highlighting the limitations of using a simple web server and database and emphasizes the importance of distributed systems. Key characteristics of these systems, such as scalability, reliability, and efficiency, are discussed. The presenter covers load balancing strategies, caching methods, and the considerations when choosing between SQL and NoSQL databases. It also details data partitioning techniques for managing the growing amount of data as the user base expands. The video equips viewers with the foundational knowledge necessary for system design.

Key takeaways

  • ⚙️ Start with a simple web server and database.
  • 🌍 Transition to a distributed system for scalability.
  • 📈 Understand scalability: horizontal vs vertical.
  • 🔄 Use load balancers to manage traffic efficiently.
  • ⚡ Implement caching to improve data retrieval speed.
  • 📊 Consider SQL for structured data and NoSQL for flexibility.
  • 📂 Choose an appropriate database based on requirements.
  • 🔍 Use indexing to speed up query performance.
  • 🔄 Partition databases to manage large volumes of data.
  • 🔁 Balance performance and reliability in system design.

Timeline

  • 00:00:00 - 00:09:40

    The video discusses the design of a scalable social media platform, starting with the basics of web servers and databases. It emphasizes the need for distributed systems to handle millions of user requests, highlighting key characteristics such as scalability, reliability, availability, and efficiency. The CAP theorem is introduced, explaining the trade-offs between consistency, availability, and partition tolerance in distributed systems. The importance of load balancers is also covered, detailing how they distribute requests across servers to prevent overload and ensure reliability. Caching strategies are discussed to improve data retrieval speed, along with the challenges of maintaining data consistency. The video compares SQL and NoSQL databases, outlining their structures, scalability, and use cases. Indexing is introduced as a method to enhance query performance, while data partitioning is suggested as a solution for managing large databases. The video concludes by summarizing the evolution of the platform's architecture and inviting viewers to learn more about system design.
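
The round robin and least connections strategies named in the video can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the video's implementation: the class names, the `pick`/`release` methods, and the server names are hypothetical, and a real load balancer would also run health checks.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Cycles through the server list in order, one request per server."""

    def __init__(self, servers):
        self._servers = cycle(servers)

    def pick(self):
        return next(self._servers)


class LeastConnectionsBalancer:
    """Sends each request to the server with the fewest active connections."""

    def __init__(self, servers):
        # Hypothetical bookkeeping: server name -> number of open connections.
        self.active = {server: 0 for server in servers}

    def pick(self):
        # min() returns the first server among ties, in insertion order.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when a request finishes so the count reflects live connections.
        self.active[server] -= 1


if __name__ == "__main__":
    rr = RoundRobinBalancer(["web-1", "web-2", "web-3"])
    print([rr.pick() for _ in range(4)])

    lc = LeastConnectionsBalancer(["web-1", "web-2"])
    first, second = lc.pick(), lc.pick()
    lc.release(second)
    print(lc.pick())
```

Round robin needs no state beyond its position in the list, while least connections has to be told when a request ends, which is the price of adapting to uneven request durations.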

Video Q&A

  • What is horizontal scaling?

    Horizontal scaling involves adding more servers to handle increased load.

  • What is vertical scaling?

    Vertical scaling means upgrading existing hardware to improve performance.

  • What does the CAP theorem state?

    The CAP theorem states that a distributed system can only guarantee two of the three properties: consistency, availability, and partition tolerance.

  • What is a load balancer?

    A load balancer distributes incoming requests across multiple servers to prevent any single server from being overwhelmed.

  • What are caching strategies?

    Caching strategies include write-through, write-around, write-back, and eviction policies to manage data consistency and retrieval efficiency.

  • What are the types of databases discussed?

    The video discusses SQL (relational) and NoSQL (non-relational) databases, comparing their structures, scalability, and use cases.

  • How does partitioning improve database performance?

    Partitioning breaks large databases into smaller parts, improving availability, performance, and load balancing.

  • What is consistent hashing?

    Consistent hashing minimizes data redistribution when scaling the number of servers, allowing for dynamic scaling.

  • What are primary and secondary indexes?

    Primary indexes uniquely identify records, while secondary indexes allow faster searches on non-primary key columns.

  • What challenges does caching present?

    Caching must maintain data consistency and ensure data is in sync with the original source.
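
The consistent hashing answer above can be made concrete with a small hash ring. The following is a rough Python sketch, assuming an MD5-based `_hash` helper, a hypothetical `HashRing` class, and a `replicas` parameter for virtual nodes (a common refinement that the video does not mention); it is not the video's implementation.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a position on the ring (assumed: MD5 as the hash)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, servers=(), replicas=100):
        self.replicas = replicas  # virtual nodes per server (an assumption)
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> server name
        for server in servers:
            self.add(server)

    def add(self, server):
        # Place several points per server so load spreads more evenly.
        for i in range(self.replicas):
            point = _hash(f"{server}#{i}")
            self._owner[point] = server
            bisect.insort(self._points, point)

    def remove(self, server):
        for i in range(self.replicas):
            point = _hash(f"{server}#{i}")
            del self._owner[point]
            self._points.remove(point)

    def lookup(self, key):
        """Return the server owning the first ring point clockwise of the key."""
        if not self._points:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

With a ring built as `HashRing(["db-1", "db-2", "db-3"])`, calling `add("db-4")` only reassigns the keys that now fall on the new server's points; every other key keeps its previous owner, which is the property the answer describes.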

Transcript (en)
  • 00:00:00
    design a social media platform that can
  • 00:00:02
    handle millions of user requests where
  • 00:00:04
    do you even start from here today we'll
  • 00:00:06
    walk through core fundamentals that you
  • 00:00:08
    need to get started we can start with a
  • 00:00:10
    simple web server and one database to
  • 00:00:12
    store your user data however this will
  • 00:00:14
    not scale as your user base grows so
  • 00:00:17
    distributed systems are the go-to
  • 00:00:19
    solution these are networks of
  • 00:00:20
    independent computers working as one
  • 00:00:22
    coherent system when we talk about
  • 00:00:24
    distributed systems we need to understand
  • 00:00:26
    the key characteristics scalability this
  • 00:00:29
    is the system's ability to handle growing
  • 00:00:31
    demands there are two ways to scale
  • 00:00:33
    horizontal scaling by adding more
  • 00:00:35
    servers and vertical scaling by
  • 00:00:37
    upgrading existing hardware reliability
  • 00:00:40
    a reliable system continues to function
  • 00:00:42
    correctly even when components fail
  • 00:00:45
    availability is the percentage of time a
  • 00:00:48
    system remains operational this is often
  • 00:00:50
    expressed in nines for example 99.9%
  • 00:00:53
    availability means the system is down
  • 00:00:55
    for no more than 8.76 hours per year
  • 00:00:58
    while 99.99% would only be down for
  • 00:01:00
    52.6 minutes per year efficiency
  • 00:01:03
    measured by two main factors latency
  • 00:01:06
    which is the delay in getting the first
  • 00:01:07
    response and throughput which is the
  • 00:01:09
    number of operations handled in a given
  • 00:01:12
    time these characteristics often involve
  • 00:01:14
    trade-offs your goal is to balance these
  • 00:01:16
    factors based on the given requirements
  • 00:01:18
    although ideal distributed systems face
  • 00:01:20
    inherent limitations the CAP theorem
  • 00:01:23
    states that a distributed system can
  • 00:01:24
    only guarantee two out of the three
  • 00:01:26
    properties consistency all nodes display
  • 00:01:29
    identical data guaranteeing that reads
  • 00:01:31
    always reflect the most recent write
  • 00:01:33
    availability every request receives a
  • 00:01:35
    response without guaranteeing that the
  • 00:01:38
    data is the most recent partition
  • 00:01:40
    tolerance the system continues to
  • 00:01:41
    function despite network failures
  • 00:01:43
    between nodes this trade-off is crucial
  • 00:01:45
    in designing distributed systems
  • 00:01:47
    influencing how systems handle data
  • 00:01:49
    updates and respond to failures now our
  • 00:01:52
    architecture uses multiple web servers
  • 00:01:54
    which is amazing because we can handle
  • 00:01:55
    more load by adding more servers but
  • 00:01:58
    what happens if one server ends up
  • 00:02:00
    receiving more requests than others to
  • 00:02:02
    manage distributed system load
  • 00:02:03
    efficiently we need a load balancer
  • 00:02:06
    which distributes incoming requests
  • 00:02:07
    across multiple servers to ensure that
  • 00:02:10
    no single server becomes overwhelmed if
  • 00:02:12
    one server goes down the load balancer
  • 00:02:14
    will only redirect traffic to healthy
  • 00:02:16
    servers a load balancer can be placed at
  • 00:02:18
    various levels between the users and web
  • 00:02:20
    servers between web servers and
  • 00:02:22
    application servers and between the
  • 00:02:24
    application servers and databases there
  • 00:02:26
    are several algorithms load balancers
  • 00:02:28
    use to distribute traffic such as least
  • 00:02:30
    connections method sends requests to the
  • 00:02:32
    server with the fewest active
  • 00:02:34
    connections round robin cycles through a
  • 00:02:36
    list of servers sequentially IP hash
  • 00:02:39
    uses the client's IP address to
  • 00:02:40
    determine which server receives the
  • 00:02:42
    request which one to use really depends
  • 00:02:44
    on the specific needs it's also worth
  • 00:02:47
    noting that the load balancer itself could
  • 00:02:49
    become a single point of failure to
  • 00:02:51
    prevent this we can add another load
  • 00:02:53
    balancer for standby if the primary one
  • 00:02:55
    fails the second one takes over
  • 00:02:57
    immediately things are going great so
  • 00:03:00
    far but we start to notice that these
  • 00:03:01
    servers often request the same data from
  • 00:03:04
    our database that's where caching comes
  • 00:03:06
    into play caching takes advantage of the
  • 00:03:08
    principle that recently requested data
  • 00:03:10
    is likely to be requested again
  • 00:03:12
    retrieving data from cache is typically
  • 00:03:14
    way faster than from the original
  • 00:03:16
    database aside from application cache
  • 00:03:18
    there is also a content delivery network
  • 00:03:20
    or CDN which is ideal for serving static
  • 00:03:23
    media CDNs cache content closer to the
  • 00:03:26
    user to reduce latency however caching
  • 00:03:28
    does come with its own set of challenges
  • 00:03:30
    which is maintaining data consistency
  • 00:03:32
    and making sure that the data is in sync
  • 00:03:34
    with the source of truth we don't want
  • 00:03:36
    to serve the data from cache if it's not
  • 00:03:38
    up to date this leads us to cache
  • 00:03:40
    invalidation strategies write-through data
  • 00:03:42
    is written to both cache and storage at
  • 00:03:44
    the same time ensuring consistency but
  • 00:03:47
    increasing write latency write-around
  • 00:03:50
    data bypasses the cache and goes directly
  • 00:03:52
    to the storage preventing cache flooding
  • 00:03:54
    but potentially increasing read latency
  • 00:03:56
    for new data write-back data is written
  • 00:03:59
    to cache first and later to storage
  • 00:04:01
    offering low latency but risking data
  • 00:04:04
    loss in case of system failures when a
  • 00:04:06
    cache reaches capacity we need an eviction
  • 00:04:09
    policy to make room for new data some
  • 00:04:11
    common ones are least recently used
  • 00:04:13
    removes the least recently accessed
  • 00:04:16
    data first in first out removes the
  • 00:04:19
    oldest item
  • 00:04:21
    first and least frequently used removes
  • 00:04:23
    the least often accessed items as our
  • 00:04:26
    platform grows we need to think about
  • 00:04:28
    storage strategy should we stick with a
  • 00:04:30
    traditional SQL database or go with
  • 00:04:32
    NoSQL SQL stores data in tables with
  • 00:04:35
    predefined schemas each row contains all
  • 00:04:37
    the information about a single record
  • 00:04:40
    if you want to add a new column the
  • 00:04:41
    changes would need to be applied to all
  • 00:04:43
    the records in the table popular SQL
  • 00:04:45
    databases include MySQL Oracle and
  • 00:04:47
    PostgreSQL NoSQL databases on the other hand are
  • 00:04:50
    non-relational databases that have a
  • 00:04:52
    more flexible data structure they come
  • 00:04:54
    in four main types key value stores like
  • 00:04:57
    Redis document databases like MongoDB wide
  • 00:05:00
    column like Cassandra graph databases
  • 00:05:03
    like Neo4j when comparing SQL versus
  • 00:05:06
    NoSQL we often look at the structure SQL
  • 00:05:08
    has a rigid schema while NoSQL has a
  • 00:05:11
    more flexible schema querying SQL
  • 00:05:14
    databases use standard structured query
  • 00:05:17
    language while NoSQL database queries
  • 00:05:19
    are more focused on collections of
  • 00:05:21
    documents in terms of scalability SQL
  • 00:05:24
    typically scales vertically although it can
  • 00:05:26
    be done horizontally through sharding
  • 00:05:28
    while NoSQL scales horizontally
  • 00:05:30
    reliability SQL is ACID compliant while
  • 00:05:33
    NoSQL often sacrifices this for
  • 00:05:35
    performance and scalability ACID refers
  • 00:05:38
    to a set of principles where atomicity
  • 00:05:40
    ensures that a transaction is fully
  • 00:05:42
    completed or not at all consistency
  • 00:05:44
    guarantees that a transaction takes a
  • 00:05:46
    database from one valid state to another
  • 00:05:49
    enforcing all defined rules isolation
  • 00:05:51
    keeps transactions separate so their
  • 00:05:53
    operations don't interfere with each
  • 00:05:55
    other durability ensures that once a
  • 00:05:57
    transaction is committed it remains
  • 00:05:59
    permanent even in case of failure so
  • 00:06:01
    which one to use we want to use SQL when
  • 00:06:04
    we need ACID compliance such as in financial
  • 00:06:06
    applications and when our data structure
  • 00:06:08
    doesn't change often NoSQL would be a
  • 00:06:10
    good option if we're dealing with large
  • 00:06:12
    volumes of unstructured data or if we're
  • 00:06:14
    in need of rapid development that
  • 00:06:16
    requires a lot of flexibility after
  • 00:06:18
    choosing our database we notice that
  • 00:06:20
    queries are really slow and we need to
  • 00:06:22
    fix this ASAP we notice that the data that
  • 00:06:24
    we're querying doesn't have an index so
  • 00:06:26
    we're constantly having to search
  • 00:06:28
    through the entire user table every
  • 00:06:30
    single time indexes work by creating a
  • 00:06:32
    separate data structure that points to
  • 00:06:34
    the location of the actual data speeding
  • 00:06:36
    up search operations the most common
  • 00:06:38
    types of indexes are primary key the
  • 00:06:40
    unique identifier for each record in the
  • 00:06:43
    table secondary index the additional
  • 00:06:45
    index on non-primary key columns for
  • 00:06:47
    faster search queries such as searching
  • 00:06:49
    for a user's first name composite index
  • 00:06:52
    which is created on multiple columns useful
  • 00:06:54
    for queries involving those columns
  • 00:06:56
    together such as first name and last
  • 00:06:58
    name not an index but worth mentioning
  • 00:07:01
    foreign key which is a constraint that
  • 00:07:03
    enforces a relationship between columns
  • 00:07:05
    in different tables while indexes
  • 00:07:07
    dramatically improve read performance
  • 00:07:09
    they can also slow down write operations
  • 00:07:11
    this is because every time you insert
  • 00:07:13
    update or delete data the index must
  • 00:07:16
    also be updated that's why it's very
  • 00:07:18
    important that we are decisive and
  • 00:07:20
    intentional when creating indexes
  • 00:07:22
    because we designed such an amazing
  • 00:07:23
    platform so many users decide to sign up
  • 00:07:26
    for this app that we're now facing
  • 00:07:28
    challenges with the sheer volume of data
  • 00:07:30
    our database is literally going to
  • 00:07:32
    explode so you try to beef up your
  • 00:07:34
    database by adding more Hardware but the
  • 00:07:36
    growth continues and it's just not
  • 00:07:38
    enough when our database can no longer
  • 00:07:40
    scale vertically we can look into data
  • 00:07:42
    partitioning which is a technique for
  • 00:07:44
    breaking large databases into smaller
  • 00:07:47
    more manageable parts this improves
  • 00:07:49
    performance availability and load
  • 00:07:50
    balancing as your application scales
  • 00:07:52
    there are three main partitioning
  • 00:07:54
    methods horizontal partitioning which
  • 00:07:56
    divides rows of a table across multiple
  • 00:07:59
    databases vertical partitioning
  • 00:08:01
    separates entire features or columns
  • 00:08:03
    into two different databases and
  • 00:08:05
    directory based partitioning uses a
  • 00:08:07
    lookup service to abstract the
  • 00:08:09
    partitioning scheme partitioning can be
  • 00:08:11
    done on various criteria key or hash
  • 00:08:13
    based applies a hash function to a key
  • 00:08:16
    attribute to determine which partition
  • 00:08:18
    the data belongs to a notable approach
  • 00:08:20
    here is consistent hashing which is a
  • 00:08:23
    technique that minimizes data
  • 00:08:24
    redistribution when scaling the number
  • 00:08:26
    of servers at a very high level it works
  • 00:08:29
    by distributing data across some number
  • 00:08:31
    of servers around a hash ring each data
  • 00:08:34
    is hashed to determine which server it
  • 00:08:36
    belongs to each server is also only
  • 00:08:39
    responsible for a portion of the hash
  • 00:08:41
    range when adding or removing servers
  • 00:08:43
    only a small fraction of data needs to
  • 00:08:45
    be remapped this makes it very easy to
  • 00:08:47
    scale dynamically and reduces the impact
  • 00:08:50
    of server changes list partitioning
  • 00:08:52
    assigns each partition a list of values
  • 00:08:54
    storing each record based on which list
  • 00:08:57
    its key belongs to round robin this
  • 00:08:59
    distributes data evenly across partitions
  • 00:09:01
    in a circular order composite
  • 00:09:03
    partitioning combines two or more
  • 00:09:05
    partitioning methods while partitioning
  • 00:09:07
    solves scaling issues it also introduces
  • 00:09:10
    its own challenges like difficulty in
  • 00:09:12
    joining across multiple partitions
  • 00:09:14
    and potentially tricky data
  • 00:09:17
    rebalancing we've taken our social media
  • 00:09:20
    platform from a simple single server
  • 00:09:22
    setup to a robust scalable architecture
  • 00:09:25
    I couldn't cover everything in this
  • 00:09:27
    introductory overview to system design
  • 00:09:29
    and there's just so much more to cover
  • 00:09:31
    if you're interested let me know if you
  • 00:09:32
    want to see more of this but I hope that
  • 00:09:34
    you are able to learn something new
  • 00:09:35
    today as always thank you so much for
  • 00:09:37
    watching and see you in the next one
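
The least recently used eviction policy described in the transcript can be illustrated with a short Python sketch. It assumes a hypothetical `LRUCache` class built on the standard library's `collections.OrderedDict`; it is one possible way to implement the policy, not the approach shown in the video.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently accessed key."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest access first, newest last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        # A read counts as a use, so move the key to the "most recent" end.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            # Drop the entry at the "least recent" end to make room.
            self._items.popitem(last=False)
```

For example, with `LRUCache(2)`, putting `a` and `b`, reading `a`, and then putting `c` evicts `b`, because `b` is the entry that has gone longest without being accessed.
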
Tags
  • scalability
  • distributed systems
  • load balancer
  • caching strategies
  • SQL
  • NoSQL
  • CAP theorem
  • data partitioning
  • indexing
  • system design