00:00:00
[Music]
00:00:16
hello everyone welcome to this lecture
00:00:19
in the building large language models
00:00:21
from scratch series we have covered five
00:00:25
lectures up till now and in the previous
00:00:28
lecture we looked at the GPT-3
00:00:30
architecture in a lot of detail we also
00:00:33
saw the progression from GPT to GPT-2 to
00:00:36
GPT-3 and finally to
00:00:39
GPT-4 we saw that the total pre-training
00:00:42
cost for GPT-3 is around $4.6 million
00:00:46
which is insanely
00:00:48
high and up till now we have also looked
00:00:51
at the data set which was used for
00:00:53
pre-training gpt3 and we have seen this
00:00:56
several times until
00:00:58
now in the previous lecture we
00:01:00
learned about the differences between
00:01:02
zero shot versus few shot learning as
00:01:05
well so if you have not been through the
00:01:07
previous lectures we have already
00:01:09
covered five lectures in this series and
00:01:11
all of them have received a
00:01:13
very good response on YouTube and I've
00:01:16
received a number of comments saying
00:01:18
that they have really helped people so I
00:01:21
encourage you to go through those
00:01:23
videos in today's lecture we are going
00:01:25
to be discussing what we will
00:01:28
exactly cover in the playlist in these
00:01:30
five lectures we have looked at some of
00:01:32
the theory modules some of the intuition
00:01:35
modules behind attention behind self
00:01:37
attention prediction of the next word
00:01:40
zero-shot versus few-shot learning
00:01:42
basics of the Transformer architecture
00:01:45
data sets used for LLM pre-training
00:01:48
difference between pre-training and fine
00:01:49
tuning Etc but now from the next lecture
00:01:54
onwards we are going to start with the
00:01:56
Hands-On aspects of actually building an
00:01:59
llm so I wanted to utilize this
00:02:01
particular lecture to give you a road
00:02:03
map of what all we will be doing in this
00:02:07
series and the stages we will
00:02:09
be covering during this
00:02:12
playlist so that is the title of today's
00:02:14
lecture stages of building a large
00:02:16
language model towards the end of this
00:02:18
lecture we will also do a recap of what
00:02:21
all we have learned until now so let's
00:02:24
get started with today's
00:02:26
lecture okay so we will break this
00:02:28
playlist into three stages stage
00:02:31
one stage two and stage three remember
00:02:35
before we get started that this material
00:02:37
which I am showing is heavily borrowed from
00:02:40
the book Build a Large Language
00:02:42
Model (From Scratch) which is written by
00:02:44
Sebastian Raschka so I'm very grateful to
00:02:47
the author for writing this book which
00:02:49
is allowing me to make this
00:02:51
playlist okay so we'll be dividing the
00:02:54
playlist into three stages stage one
00:02:56
stage two and stage number three
00:02:59
unfortunately all of the playlists
00:03:01
currently which are available on YouTube
00:03:03
only go through some of these stages and
00:03:06
that too they do not cover these stages
00:03:08
in detail my plan is to devote a number
00:03:11
of lectures to each stage in this uh
00:03:15
playlist so that you get a very detailed
00:03:18
understanding of how the nuts and bolts
00:03:20
really
00:03:21
work so in stage one we are going to be
00:03:24
looking at uh essentially building a
00:03:26
large language model and we are going to
00:03:29
look at the building blocks which are
00:03:31
necessary so before we go to train the
00:03:34
large language model we need to do the
00:03:36
data pre-processing and sampling in a
00:03:38
very specific manner we need to
00:03:40
understand the attention mechanism and
00:03:42
we will need to understand the LLM
00:03:44
architecture so in stage one we are
00:03:46
going to focus on these three things
00:03:49
understanding how the data is collected
00:03:51
from different data sets how the data is
00:03:54
processed how the data is sampled number
00:03:56
one then we will go to attention
00:03:58
mechanism how to code out the attention
00:04:00
mechanism completely from scratch in
00:04:02
Python what is meant by key, query, and value
00:04:05
what is the attention score what is
00:04:08
positional encoding what is Vector
00:04:10
embedding all of this will be covered in
00:04:12
this stage we'll also be looking at the
00:04:14
llm architecture such as how to stack
00:04:17
different layers on top of each other
00:04:19
where should the attention head go all
00:04:21
of these things essentially the
00:04:25
main part of
00:04:28
this stage will be to
00:04:29
understand the basic mechanism behind
00:04:32
the large language model so what exactly
00:04:34
we will cover in data preparation and
00:04:36
sampling first we'll see tokenization if
00:04:39
you are given sentences how to break
00:04:41
them down into individual tokens as we
00:04:44
have seen earlier a token can be thought
00:04:46
of as a unit of a sentence but there is
00:04:48
a particular way of doing tokenization
00:04:50
we'll cover that.
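To give a rough feel for what's coming, here is a minimal, purely illustrative sketch of a simple regex-based tokenizer in Python; the sample sentence and the toy vocabulary are made up for this example, and the lectures will build tokenization up step by step (and later move to byte-pair encoding).

```python
import re

text = "Hello, world. Is this-- a test?"

# Split on whitespace and punctuation, keeping the punctuation marks as tokens
tokens = re.split(r'([,.:;?_!"()\']|--|\s)', text)
tokens = [t.strip() for t in tokens if t.strip()]
print(tokens)  # ['Hello', ',', 'world', '.', 'Is', 'this', '--', 'a', 'test', '?']

# Build a toy vocabulary mapping each unique token to an integer ID
vocab = {tok: idx for idx, tok in enumerate(sorted(set(tokens)))}
token_ids = [vocab[t] for t in tokens]
print(token_ids)
```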
00:04:53
Then we will cover vector embedding essentially after we do
00:04:56
tokenization every word needs to be
00:04:59
transformed into a very high dimensional
00:05:01
Vector space so that the semantic
00:05:04
meaning between words is captured as you
00:05:07
can see here we want apple banana and
00:05:10
orange to be closer together which are
00:05:12
seen in this red circle over here we
00:05:14
want King man and woman to be closer
00:05:17
together which is shown in the blue
00:05:18
circle and we want Sports such as
00:05:20
football Golf and Tennis to be closer
00:05:22
together as shown in the green these are
00:05:25
just representative examples what I want
00:05:27
to explain is that before we give the
00:05:30
data set for training we need to encode
00:05:32
every word so that the semantic meaning
00:05:36
between the words is captured so words
00:05:38
which mean similar things lie closer
00:05:40
together so we will learn about vector
00:05:43
embeddings in a lot of detail.
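As a rough illustration of "similar words lie closer together", here is a tiny sketch with hand-made 3-dimensional vectors; real embeddings are learned and have hundreds or thousands of dimensions, and cosine similarity is one common way to measure closeness.

```python
import numpy as np

# Toy, hand-written 3-D vectors purely for illustration; real embeddings are learned
vectors = {
    "apple":  np.array([0.9, 0.1, 0.0]),
    "banana": np.array([0.8, 0.2, 0.1]),
    "king":   np.array([0.1, 0.9, 0.3]),
    "tennis": np.array([0.0, 0.2, 0.9]),
}

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(vectors["apple"], vectors["banana"]))  # high: same cluster
print(cosine_similarity(vectors["apple"], vectors["tennis"]))  # low: different clusters
```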
00:05:45
Here we'll also learn about positional encoding the
00:05:47
order in which the word appears in a
00:05:49
sentence is also very important and we
00:05:52
need to give that information to the
00:05:54
pre-training
00:05:55
model.
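Here is a minimal sketch of how token embeddings and learned positional embeddings can be combined in PyTorch; the vocabulary size, embedding dimension, context length, and token IDs below are placeholder numbers for illustration, not the ones we will actually use.

```python
import torch
import torch.nn as nn

vocab_size, embed_dim, context_length = 50257, 768, 4  # placeholder sizes

token_emb = nn.Embedding(vocab_size, embed_dim)    # one learned vector per token ID
pos_emb = nn.Embedding(context_length, embed_dim)  # one learned vector per position

token_ids = torch.tensor([[40, 367, 2885, 1464]])  # batch of 1 sequence, 4 token IDs
positions = torch.arange(context_length)           # 0, 1, 2, 3

# Input to the model = token embedding + positional embedding
x = token_emb(token_ids) + pos_emb(positions)
print(x.shape)  # torch.Size([1, 4, 768])
```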
00:05:58
After learning about tokenization and vector embedding we will learn how
00:06:01
to construct batches of the data so if
00:06:04
we have a huge data set how to
00:06:06
give the data in batches to GPT or to
00:06:09
the large language model which we are
00:06:11
going to build so we will be looking at
00:06:14
the next word prediction task so you
00:06:16
will be given a bunch of words and will then
00:06:16
predict the next word so we'll also
00:06:20
see the meaning of context how many
00:06:22
words should be taken for training to
00:06:25
predict the next output we'll see about
00:06:27
that and how to basically feed the data in
00:06:31
different sets of batches so that the
00:06:33
computation becomes much more efficient
00:06:36
so we'll be implementing a data batching
00:06:38
sequence before giving all of the data
00:06:41
set into the large language model for
00:06:44
pre-training.
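As a preview of the batching idea, here is a small sketch of a sliding-window dataset that turns a long list of token IDs into (input, target) pairs, where the target is the input shifted by one token; the names `NextWordDataset`, `max_length`, and `stride`, and the stand-in token IDs, are just illustrative choices.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class NextWordDataset(Dataset):
    """Slide a window over token IDs; target = input shifted by one position."""
    def __init__(self, token_ids, max_length, stride):
        self.inputs, self.targets = [], []
        for i in range(0, len(token_ids) - max_length, stride):
            self.inputs.append(torch.tensor(token_ids[i:i + max_length]))
            self.targets.append(torch.tensor(token_ids[i + 1:i + max_length + 1]))

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        return self.inputs[idx], self.targets[idx]

token_ids = list(range(100))                 # stand-in for a tokenized text
dataset = NextWordDataset(token_ids, max_length=8, stride=4)
loader = DataLoader(dataset, batch_size=4, shuffle=True)

x, y = next(iter(loader))
print(x.shape, y.shape)  # torch.Size([4, 8]) torch.Size([4, 8])
```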
00:06:46
After this the second point as I mentioned here is the attention
00:06:48
mechanism so here is the attention
00:06:50
mechanism for the Transformer model
00:06:52
we'll first understand what is meant by
00:06:54
every single thing here what is meant by
00:06:56
multi-head attention what is meant by masked
00:06:59
multi-head attention what is meant by
00:07:01
positional encoding input embedding
00:07:03
output embedding all of these things.
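Just as a rough preview, here is a minimal single-head sketch of scaled dot-product self-attention with a causal mask; it is a bare-bones illustration of the key/query/value idea under made-up sizes, not the multi-head implementation we will actually code from scratch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, d_in, d_out = 4, 8, 8
x = torch.randn(seq_len, d_in)            # one sequence of 4 token embeddings

W_q = nn.Linear(d_in, d_out, bias=False)  # query projection
W_k = nn.Linear(d_in, d_out, bias=False)  # key projection
W_v = nn.Linear(d_in, d_out, bias=False)  # value projection

q, k, v = W_q(x), W_k(x), W_v(x)

# Attention scores, scaled by sqrt(d_out), then causally masked so each token
# can only attend to itself and earlier tokens
scores = q @ k.T / d_out ** 0.5
mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
scores = scores.masked_fill(mask, float("-inf"))

weights = torch.softmax(scores, dim=-1)   # each row sums to 1
context = weights @ v                     # weighted sum of the values
print(weights)
print(context.shape)                      # torch.Size([4, 8])
```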
00:07:05
And then we will build our own LLM
00:07:08
architecture so these are the two
00:07:11
things: attention mechanism and LLM
00:07:13
architecture.
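And as a rough picture of the architecture side, here is a heavily simplified sketch of stacking decoder-style blocks on top of the embedding layers; the names `TinyBlock` and `TinyGPT` and all the sizes are made up for this illustration, and the real GPT-style blocks we build will add causal masking, layer normalization, dropout, and careful weight initialization.

```python
import torch
import torch.nn as nn

class TinyBlock(nn.Module):
    """One simplified block: self-attention + feed-forward, with residual connections."""
    def __init__(self, embed_dim, num_heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(embed_dim, 4 * embed_dim), nn.GELU(),
            nn.Linear(4 * embed_dim, embed_dim),
        )

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        x = x + attn_out          # residual connection around attention
        x = x + self.ff(x)        # residual connection around feed-forward
        return x

class TinyGPT(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=64, num_heads=4, num_layers=3, context=16):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, embed_dim)
        self.pos_emb = nn.Embedding(context, embed_dim)
        self.blocks = nn.Sequential(*[TinyBlock(embed_dim, num_heads) for _ in range(num_layers)])
        self.out_head = nn.Linear(embed_dim, vocab_size)  # logits over the vocabulary

    def forward(self, token_ids):
        pos = torch.arange(token_ids.shape[1], device=token_ids.device)
        x = self.tok_emb(token_ids) + self.pos_emb(pos)
        return self.out_head(self.blocks(x))

logits = TinyGPT()(torch.randint(0, 1000, (2, 16)))
print(logits.shape)  # torch.Size([2, 16, 1000])
```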
00:07:15
After we cover all of these aspects we are essentially ready with
00:07:17
stage one of this playlist and then we
00:07:20
can move to the stage two stage two of
00:07:23
this series is essentially going to be
00:07:25
pre-training which is after we have
00:07:27
assembled all the data after we have
00:07:29
constructed the large language model
00:07:31
architecture which we are going to use
00:07:33
we are going to write code which
00:07:35
trains the large language model on the
00:07:37
underlying data set that is also called
00:07:40
as pre-training so the outcome of stage
00:07:43
two is to build a foundational model on
00:07:45
unlabeled
00:07:47
data now I'll just show a schematic
00:07:50
from the book which we will be following
00:07:52
so this is how the training data set
00:07:53
will look we'll break it down into
00:07:56
epochs and we will compute the gradient
00:08:00
of the loss in each epoch and we'll
00:08:02
update the parameters towards the end
00:08:04
we'll generate sample text for visual
00:08:06
inspection this is what will happen
00:08:08
exactly in the training procedure of the
00:08:11
large language model.
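Here is a bare-bones sketch of what such a training loop tends to look like in PyTorch: loop over epochs and batches, compute the next-token cross-entropy loss, backpropagate, and update the parameters. The `model` and `train_loader` arguments are assumed to come from the earlier steps (for example the TinyGPT and DataLoader sketches above), and the hyperparameters are placeholders.

```python
import torch
import torch.nn.functional as F

def train(model, train_loader, num_epochs=3, lr=3e-4, device="cpu"):
    model.to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    for epoch in range(num_epochs):
        model.train()
        for inputs, targets in train_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            logits = model(inputs)            # (batch, seq_len, vocab_size)
            loss = F.cross_entropy(
                logits.flatten(0, 1),         # (batch * seq_len, vocab_size)
                targets.flatten(),            # (batch * seq_len,)
            )
            optimizer.zero_grad()
            loss.backward()                   # compute gradients of the loss
            optimizer.step()                  # update the parameters
        print(f"epoch {epoch + 1}: last batch loss = {loss.item():.3f}")
        # In the lectures we will also generate a short sample text here
        # for visual inspection of how the model is improving.
```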
00:08:13
Then we'll also do model evaluation and loading
00:08:15
pre-trained weights so let me show you the
00:08:17
schematic for that so we'll do text
00:08:19
generation evaluation training and
00:08:21
validation losses then we'll write the
00:08:24
LLM training function which I showed you
00:08:26
and then we'll do one more thing we
00:08:28
will implement functions to save and
00:08:30
load the large language model weights to
00:08:33
use or continue training the LLM later
00:08:35
so there is no point in training the LLM
00:08:38
from scratch every single time right
00:08:39
weight saving and loading essentially
00:08:41
saves you a ton of computational cost
00:08:43
and
00:08:44
memory.
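Saving and reloading in PyTorch is typically just a matter of storing the state dicts; a minimal sketch, where `model` and `optimizer` are assumed to exist from the training step above and the file name is a placeholder:

```python
import torch

# Save model and optimizer state so training can be resumed later
torch.save({
    "model_state_dict": model.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
}, "model_and_optimizer.pth")

# ...later, or in a fresh session with the same model/optimizer classes:
checkpoint = torch.load("model_and_optimizer.pth", map_location="cpu")
model.load_state_dict(checkpoint["model_state_dict"])
optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
```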
00:08:47
And then at the end of this we'll also load pre-trained weights from Open
00:08:49
AI into our large language model so open
00:08:52
AI has already made some of the weights
00:08:54
available they are pre-trained weights
00:08:56
so we'll be loading pre-trained
00:08:58
weights from OpenAI into our LLM model.
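As one hedged illustration of the idea (not necessarily the exact route we will take in the lectures), the GPT-2 weights that OpenAI released can be pulled in through the Hugging Face transformers library and then copied tensor by tensor into the matching layers of your own architecture:

```python
# Requires: pip install transformers torch
from transformers import GPT2LMHeadModel

# Download the published GPT-2 (124M) weights
hf_model = GPT2LMHeadModel.from_pretrained("gpt2")

# Inspect parameter names and shapes; loading them into our own LLM is then
# a matter of copying each tensor into the corresponding layer.
for name, param in list(hf_model.named_parameters())[:5]:
    print(name, tuple(param.shape))
```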
00:09:02
this is all we'll be covering in
00:09:04
stage two which is essentially
00:09:06
the training loop plus
00:09:09
model evaluation plus loading
00:09:10
pre-trained weights to build our
00:09:12
foundational model so the main goal of
00:09:15
stage two as I told you is
00:09:17
pre-training an LLM on unlabeled data
00:09:20
great but we will not stop here after
00:09:22
this we move to stage number three and
00:09:25
the main goal of stage number three is
00:09:27
fine tuning the large language model so
00:09:29
if we want to build specific
00:09:31
applications we will do fine tuning in
00:09:33
this playlist we are going to build two
00:09:35
applications which are mentioned in the
00:09:37
book I showed you at the start one is
00:09:39
building a classifier and one is
00:09:41
building your own personal assistant so
00:09:44
here are some schematics to show let's say
00:09:46
you have got a lot of
00:09:48
emails and you want to use your
00:09:50
LLM to classify spam or not spam for
00:09:54
example you are a winner you have been
00:09:56
specially selected to receive $1,000
00:09:58
cash now this should be classified as
00:10:01
spam whereas hey just wanted to check if
00:10:03
we are still on for dinner tonight let
00:10:05
me know this will be not spam so we will
00:10:08
build a large language model
00:10:10
application which classifies between
00:10:12
spam and not spam and we cannot just use
00:10:14
the pre-trained or foundational model
00:10:16
for this because we need to train with
00:10:17
labeled data to the pre-trained model we
00:10:20
need to give some more data and tell it
00:10:22
that hey this is usually spam and this
00:10:24
is not spam can you use the foundational
00:10:26
model plus this additional specific
00:10:28
labeled data set which I have given to
00:10:30
build a fine-tuned LLM application for
00:10:34
email classification so this is what
00:10:36
we'll be building as the first
00:10:38
application.
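Conceptually, classification fine-tuning means putting a small classification head on top of the pre-trained model and training it on labeled (text, spam/not-spam) pairs. Here is a very rough sketch of that idea; `SpamClassifier`, the dummy backbone, the sizes, and the two-example batch are all made up for illustration, and in practice the backbone would be the pre-trained LLM from stage two.

```python
import torch
import torch.nn as nn

class SpamClassifier(nn.Module):
    """Pre-trained backbone + a small, newly initialized classification head."""
    def __init__(self, backbone, embed_dim, num_classes=2):
        super().__init__()
        self.backbone = backbone                       # assumed to return (batch, seq, embed_dim)
        self.head = nn.Linear(embed_dim, num_classes)  # trainable spam / not-spam head

    def forward(self, token_ids):
        hidden = self.backbone(token_ids)
        return self.head(hidden[:, -1, :])             # classify from the last token's hidden state

# Stand-in backbone so the sketch runs on its own; in practice this would be
# the pre-trained LLM from stage two (with its language-modeling head removed).
embed_dim = 64
dummy_backbone = nn.Sequential(nn.Embedding(1000, embed_dim))

clf = SpamClassifier(dummy_backbone, embed_dim)
inputs = torch.randint(0, 1000, (2, 16))               # token IDs for two example emails
labels = torch.tensor([1, 0])                          # 1 = spam, 0 = not spam
loss = nn.functional.cross_entropy(clf(inputs), labels)
print(loss.item())
```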
00:10:40
The second application which we'll be building is a type of chat
00:10:42
bot which basically answers queries
00:10:44
so there is an instruction there is an
00:10:46
input and there is an output and we'll
00:10:48
be building this chatbot after fine
00:10:51
tuning the large language model.
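For this kind of instruction fine-tuning, the labeled data typically looks like (instruction, input, output) records that get formatted into a single prompt-plus-response text. Here is a small sketch of such a formatting step, using an Alpaca-style template as one common convention; the example entry is made up and the exact template we use in the lectures may differ.

```python
entry = {
    "instruction": "Classify the sentiment of the following sentence.",
    "input": "I really enjoyed this lecture series!",
    "output": "Positive",
}

def format_example(entry):
    """Turn one (instruction, input, output) record into a single training text."""
    prompt = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{entry['instruction']}\n\n"
    )
    if entry["input"]:
        prompt += f"### Input:\n{entry['input']}\n\n"
    prompt += "### Response:\n"
    return prompt, prompt + entry["output"]

prompt, full_text = format_example(entry)
print(full_text)
# During fine-tuning, full_text is tokenized and the model is trained to
# predict the response tokens that follow the prompt.
```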
00:10:54
So if you want to be a very serious LLM
00:10:56
engineer all the stages are equally
00:10:58
important what many students are
00:11:00
doing right now is that they just look
00:11:02
at stage number three and they either
00:11:04
use LangChain let's
00:11:06
say they use LangChain they use tools
00:11:09
like
00:11:10
Ollama and they directly deploy
00:11:13
applications but they do not understand
00:11:15
what's going on in stage one and stage
00:11:17
two at all so this leaves you a bit
00:11:19
under-confident and insecure about
00:11:21
whether you really know the nuts and bolts
00:11:23
whether you really know the details my
00:11:25
plan is to go over every single thing
00:11:27
without skipping even a single Concept
00:11:30
in stage one stage two and stage number
00:11:33
three so this is the plan which you'll
00:11:35
be following in this playlist and I hope
00:11:37
you are excited for this because at the
00:11:39
end of this really my vision for this
00:11:42
playlist is to make it the most detailed
00:11:44
LLM playlist which many people can
00:11:46
refer to not just students but working
00:11:48
professionals startup Founders managers
00:11:51
etc. and then once this playlist
00:11:53
is built over I think 2 to 3 months
00:11:56
later you can refer to whichever part
00:11:59
you are more interested in so people who
00:12:01
are following this in the early stages
00:12:03
of this journey it's awesome because
00:12:05
I'll reply to all the comments in the um
00:12:09
chat section and we'll build this
00:12:11
journey
00:12:13
together I want to end this lecture by
00:12:16
providing a recap of what all we have
00:12:18
learned so far this is
00:12:21
going to be very important because from
00:12:22
the next lecture we are going to start a
00:12:24
bit of the Hands-On
00:12:26
approach okay so number one large
00:12:29
language models have really transformed
00:12:31
uh the field of natural language
00:12:34
processing they have led to advancements
00:12:36
in generating understanding and
00:12:38
translating human language this is very
00:12:40
important so in the field of NLP before
00:12:43
you needed to train a separate algorithm
00:12:45
for each specific task but large
00:12:47
language models are pretty generic if
00:12:49
you train an llm for predicting the next
00:12:51
word it turns out that it develops
00:12:53
emergent properties which means it's not
00:12:55
only good at predicting the next word
00:12:57
but also at things like multiple
00:13:00
choice questions text summarization
00:13:03
emotion classification language
00:13:05
translation Etc it's useful for a wide
00:13:07
range of tasks and that has led to
00:13:10
its predominance as an amazing tool in a
00:13:13
variety of
00:13:15
fields secondly all modern large
00:13:18
language models are trained in two main
00:13:20
steps first we pre-train on an unlabeled
00:13:23
data this is called a foundational
00:13:25
model and for this very large data sets
00:13:28
are needed typically billions of words
00:13:31
and it costs a lot as we saw
00:13:33
pre-training GPT-3 costs $4.6 million so
00:13:37
you need access to huge amount of data
00:13:39
compute power and money to pre-train
00:13:42
such a foundational model now if you are
00:13:45
actually going to implement an llm
00:13:47
application at production level so let's
00:13:49
say you're an educational company
00:13:51
building multiple choice questions and
00:13:53
you think that the answers provided by
00:13:55
the pre-trained or foundational model
00:13:57
are not very good and they are a bit
00:13:58
generic
00:13:59
you can provide your own specific data
00:14:02
set and you can label the data set
00:14:04
saying that these are the right answers
00:14:06
and I want you to further train on this
00:14:07
refined data set uh to build a better
00:14:10
model this is called fine tuning usually
00:14:14
airline companies restaurants Banks
00:14:16
educational companies when they deploy
00:14:19
LLMs at production level they fine-
00:14:21
tune the pre-trained LLM nobody deploys
00:14:23
the pre-trained one directly you fine-tune
00:14:26
the LLM on your specific smaller
00:14:29
labeled data set this is very important
00:14:31
see for pre-training the data set which
00:14:33
we have is unlabeled it's Auto
00:14:35
regressive so the sentence structure
00:14:37
itself is used for creating the labels
00:14:39
as we are just predicting the next word
00:14:42
but when we fine-tune we have a labeled data
00:14:44
set such as remember the spam versus not
00:14:47
spam example which I showed you that is
00:14:49
a labeled data set we give labels like hey
00:14:51
this is Spam this is not spam this is a
00:14:53
good answer this is not a good answer
00:14:55
and this fine-tuning step is generally
00:14:57
needed for building production-ready
00:14:59
LLM
00:15:01
applications an important thing to remember
00:15:03
is that fine-tuned LLMs can outperform
00:15:06
pre-trained-only LLMs on specific tasks
00:15:09
so let's say you take two cases right in
00:15:11
one case you only have pre-trained llms
00:15:13
and in the second case you have pre-trained
00:15:15
plus fine-tuned LLMs so it turns out
00:15:18
that pre-trained plus fine-tuned does a
00:15:20
much better job at certain specific
00:15:22
tasks than just using pre-trained for
00:15:24
students who just want to interact for
00:15:26
getting their doubts solved or for
00:15:29
getting assistance in summarization
00:15:32
helping in writing a research paper
00:15:34
etc. GPT-4 Perplexity or such API tools or
00:15:39
such interfaces which are available work
00:15:41
perfectly fine but if you want to build
00:15:43
a specific application on your data set
00:15:46
and take it to production level you
00:15:48
definitely need fine
00:15:50
tuning okay now one more key thing is
00:15:54
that the secret sauce behind large
00:15:55
language models is the Transformer
00:15:57
architecture
00:15:59
so the key idea behind the Transformer
00:16:02
architecture is the attention mechanism
00:16:05
just to show you how the Transformer
00:16:07
architecture looks it looks like
00:16:08
this and the main thing behind the
00:16:10
Transformer architecture which really
00:16:12
makes it so
00:16:14
powerful are these attention
00:16:17
blocks we'll see what they mean so no
00:16:19
need to worry about this right
00:16:21
now but in a nutshell the attention
00:16:24
mechanism gives the llm selective access
00:16:26
to the whole input sequence when
00:16:28
generating output one word at a time
00:16:31
basically attention mechanism allows the
00:16:33
llm to understand the importance of
00:16:36
words and not just the words in the
00:16:39
current sentence but in the previous
00:16:41
sentences which have come long before
00:16:42
also because context is important in
00:16:45
predicting the next word the current
00:16:47
sentence is not the only one which
00:16:48
matters the attention mechanism gives the
00:16:51
LLM access to the entire context
00:16:53
and lets it select or give weightage to which
00:16:55
words are important in predicting the
00:16:57
next word this is a key idea and
00:17:00
we'll spend a lot of time on this
00:17:02
idea remember that the original
00:17:04
Transformer had the encoder
00:17:07
plus decoder so it had both of these
00:17:10
things it had the encoder as well as
00:17:11
the decoder but the generative pre-trained
00:17:15
Transformer only has the decoder it
00:17:17
does not have the encoder so
00:17:20
Transformer and GPT are not the same
00:17:22
the Transformer paper came in 2017 it had
00:17:24
encoder plus decoder the generative pre-trained
00:17:27
Transformer came one year later
00:17:29
2018 and that only had the decoder
00:17:32
architecture so even GPT-4 right now
00:17:34
only has a decoder no encoder so in 2018 came
00:17:38
GPT the first generative pre-trained
00:17:40
Transformer architecture in 2019 came GPT-2
00:17:43
in 2020 came GPT-3 which had 175 billion
00:17:47
parameters and that really changed the
00:17:49
game because no one had seen a model
00:17:51
this large before and now we are at the
00:17:53
GPT-4
00:17:55
stage one last point which is very
00:17:57
important is that llms are only trained
00:18:00
for predicting the next word right but
00:18:02
very surprisingly they develop emergent
00:18:04
properties which means that although
00:18:07
they are only trained to predict the
00:18:08
next word they show some amazing
00:18:11
properties like ability to classify text
00:18:14
translate text from one language into
00:18:16
another language and even summarize
00:18:17
texts so they were not trained for these
00:18:20
tasks but they developed these
00:18:22
properties and that was an awesome thing
00:18:23
to realize the pre-training stage works
00:18:26
so well that llms develop all of these
00:18:28
wonderful other properties which makes
00:18:30
them so impactful for a wide range of
00:18:33
tasks
00:18:35
currently okay so this brings us to the
00:18:37
end of the recap of what we have covered
00:18:39
up till now if you have not seen the
00:18:41
previous lectures I really encourage you
00:18:43
to go through them because these
00:18:45
lectures have really set the stage for
00:18:46
us to now dive into stage one so from
00:18:49
the next lecture we'll start going into
00:18:51
stage one and we'll start seeing the
00:18:53
first aspect which is data preparation
00:18:55
and sampling so the next lecture's title
00:18:58
will be Working with Text Data and
00:19:00
we'll be looking at the data sets how to
00:19:03
load a data set how to count the number
00:19:05
of characters how to break the data
00:19:07
into tokens and I'll start
00:19:10
sharing Jupyter notebooks from next time
00:19:12
onward so that we can begin
00:19:15
coding in parallel so thanks everyone I hope you are
00:19:17
liking these lectures so lectures 1 to
00:19:20
6 were kind of like an introductory
00:19:23
set to give you a feel of the entire
00:19:24
series and so that you understand
00:19:26
concepts at a fundamental level
00:19:28
from lecture 7 we'll be diving deep into
00:19:30
code and we'll be starting into stage
00:19:33
one so I follow this approach of writing
00:19:36
on a whiteboard and also
00:19:38
coding um so that you understand the
00:19:40
details plus the code at the same time
00:19:43
because I believe Theory plus practical
00:19:44
implementation both are important and
00:19:47
that is one of the philosophies of this
00:19:49
lecture Series so do let me know in the
00:19:51
comments how you are finding this teaching
00:19:53
style uh because I will take feedback
00:19:56
from that and we can build this series
00:19:58
together 3 to 4 months later this can
00:20:00
be an amazing and awesome series and I
00:20:03
will rely on your feedback to build this
00:20:05
thanks a lot everyone and I look forward
00:20:07
to seeing you in the next lecture