00:00:00
This is amazing. Now we're going to look
00:00:02
at Llama 3.1 fine-tuning. So why do we
00:00:05
need fine-tuning? If you have your own
00:00:07
custom data or your private company data,
00:00:10
Llama 3.1 doesn't know about it; you
00:00:13
need to teach Llama 3.1 to be able to
00:00:16
answer those specific questions. That's when
00:00:19
you need to train the model. In this video
00:00:21
we'll be training the 8 billion
00:00:23
parameter model. By the end of the video
00:00:25
you will learn how you can train using
00:00:26
custom data, that is, your own company
00:00:29
data, how to fine-tune, how to save that
00:00:32
to Hugging Face as you can see here, then
00:00:35
finally how to save that to Ollama as you
00:00:38
can see here. So I can even just copy
00:00:41
this command after uploading, just copy
00:00:43
it, run this locally on my computer, and
00:00:45
it will automatically pull the model, and
00:00:48
then I ask a question: create a function to
00:00:50
add 1 2 3 4 5, and you can see it
00:00:54
automatically created this function. This
00:00:55
is a custom model which I created,
00:00:58
trained and uploaded to Ollama. You
00:01:01
are going to learn that; that's exactly
00:01:03
what we're going to see today. Let's get
00:01:07
started. Hi everyone, I'm really excited
00:01:10
to show you Llama 3.1 fine-tuning.
00:01:12
In this we are going to fine-tune a
00:01:14
model and teach it Python. Generally, if
00:01:17
you ask the model to create a function
00:01:20
to add a few numbers, it is going to
00:01:22
randomly choose a programming language
00:01:24
and create that function for you, but I
00:01:26
want this model to be trained on Python
00:01:30
programming, so that whatever question I
00:01:32
ask, it should generate a Python program. So
00:01:35
fine-tuning or training a model is
00:01:36
nothing but teaching it how to respond to
00:01:39
a particular question. In this we'll
00:01:41
be seeing how to configure, what it looks
00:01:44
like before training, how to load the
00:01:46
data, how to train, then what it looks like
00:01:48
after training, and then how to save it.
00:01:51
But before that: I regularly create
00:01:53
videos in regards to Artificial
00:01:54
Intelligence on my YouTube channel, so do
00:01:56
subscribe and click the Bell icon to
00:01:57
stay tuned. Make sure you click the like
00:01:59
button so this video can be helpful
00:02:00
for many others like you. I'm going to
00:02:02
use Mass Compute and use the MervinPraison
00:02:04
coupon code to get 50% off. I'm inside
00:02:07
the Mass Compute machine now, and this is my
00:02:09
configuration: I'm using four NVIDIA RTX
00:02:12
A6000s, but just for fine-tuning Llama 3.1
00:02:16
one graphics card is enough. We'll be
00:02:19
using Unsloth to fine-tune our model;
00:02:21
Unsloth helps us fine-tune two
00:02:24
times faster and also with less memory.
00:02:26
So in your terminal, pip install huggingface
00:02:26
hub and then all these packages, the
00:02:29
Unsloth packages, and then press Enter.
00:02:31
I'll put all the code and the commands
00:02:34
in the description below. After this,
00:02:36
export your Hugging Face token like this
00:02:38
in your terminal. You can generate a
00:02:40
Hugging Face token from Hugging Face;
00:02:42
this is used to upload the fine-tuned
00:02:45
model to Hugging Face, and sometimes it's also
00:02:47
required to download a model. After this,
00:02:52
press Enter. Next let's create a file
00:02:54
called app.py and let's open it. Inside
00:02:56
the file: from unsloth import FastLanguageModel,
00:03:00
then importing torch, os,
00:03:03
TextStreamer, load_dataset, SFTTrainer,
00:03:07
TrainingArguments and is_bfloat16_supported.
00:03:09
Now we'll be using all these
00:03:13
packages to start training.
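Here is a minimal sketch of those imports, assuming the usual packages these names come from (unsloth, transformers, trl and datasets):

# A minimal sketch of the imports described above (package sources assumed).
from unsloth import FastLanguageModel, is_bfloat16_supported
import torch
import os
from transformers import TextStreamer, TrainingArguments
from datasets import load_dataset
from trl import SFTTrainer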
00:03:15
So the first step is configuration: the maximum
00:03:18
sequence length, dtype, loading in 4-bit,
00:03:22
and providing the Alpaca format, which is just
00:03:25
the instruction, input and the response.
00:03:28
These are just basic configuration. So
00:03:30
when we give an instruction and input, we
00:03:33
expect a response from the large
00:03:35
language model; that's what this prompt
00:03:36
template means.
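A rough sketch of that configuration and the Alpaca prompt template might look like this (the specific values are assumptions for illustration, not confirmed in the video):

# Rough sketch of the configuration and Alpaca-style prompt template
# (these particular values are assumptions for illustration).
max_seq_length = 2048          # maximum sequence length
dtype = None                   # auto-detect; bfloat16 on newer GPUs
load_in_4bit = True            # load the model in 4-bit to save memory

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""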
00:03:38
Next we are going to predefine some questions, such as an
00:03:41
instruction, "create a function to
00:03:42
calculate the sum of a sequence of numbers",
00:03:45
and we are providing the numbers. So when
00:03:47
we provide this instruction and input
00:03:48
here, what is going to be the response?
00:03:51
Before training, we're going to see that.
00:03:53
So step number two: before
00:03:56
training. This means
00:03:58
loading the model and tokenizer using
00:04:01
FastLanguageModel, which we defined
00:04:03
earlier here. Then FastLanguageModel.for_inference;
00:04:06
this is to speed up
00:04:09
the inference. Then we are converting the
00:04:12
input into tokens, that means numbers. So
00:04:16
why do we need to convert the instruction
00:04:18
and the input which we define here to
00:04:21
numbers or tokens using the tokenizer? The
00:04:23
reason is that generally all large
00:04:26
language models are trained with tokens,
00:04:30
or numbers, so these large language models
00:04:33
understand only numbers. That's why we
00:04:35
are converting the input text, that is the
00:04:38
instruction and input, to numbers. Next,
00:04:40
a TextStreamer to stream the output, and
00:04:43
finally printing out the response. So
00:04:46
this will allow us to compare how it
00:04:48
looked before training.
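Here is a hedged sketch of this "before training" check, assuming Unsloth's FastLanguageModel API, a CUDA GPU, and the alpaca_prompt defined above (the exact model id is an assumption):

# Sketch of the "before training" check (model id and generation settings are assumptions).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B",  # assumed model id
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)
FastLanguageModel.for_inference(model)  # enable faster inference

inputs = tokenizer(
    [alpaca_prompt.format(
        "Create a function to calculate the sum of a sequence of integers.",  # instruction
        "[1, 2, 3, 4, 5]",                                                    # input
        "",                                                                   # response left empty
    )],
    return_tensors="pt",
).to("cuda")  # assumes a CUDA GPU

text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer=text_streamer, max_new_tokens=128)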
00:04:52
Now, next is loading the data. We need to load the data;
00:04:54
that is step number three. This involves
00:04:56
defining the end-of-sequence (EOS) token, then
00:04:59
creating a function for formatting the
00:05:01
prompt, and after that using the load_dataset
00:05:04
function to load the data from
00:05:07
here. This is the Python code instruction
00:05:09
dataset. If we open this Python code
00:05:12
instruction dataset, here is the data
00:05:14
set. If you view the data, you can see
00:05:16
it consists of instruction, input, output
00:05:19
and then the prompt. So generally for
00:05:22
the Alpaca dataset, we'll be giving the
00:05:25
instruction and we'll be giving the
00:05:26
input. As we saw before, we are giving the
00:05:29
instruction like this and the input like
00:05:31
this, and we are expecting a response or
00:05:33
output like this. So in this way we are
00:05:35
teaching the large language model that if we
00:05:38
provide information like this instruction
00:05:40
and input, it should automatically give
00:05:42
you an output like this. Similarly, we
00:05:44
have a total of 18,000 rows, and we
00:05:47
are going to feed that in and train this
00:05:49
large language model; that's why we
00:05:51
defined that here. If you want to use
00:05:53
your own custom dataset, you can just create
00:05:55
a CSV file or Excel sheet with just
00:05:58
three columns: one is instruction, another
00:06:01
is input, and the last is output. In this case
00:06:04
I'm using this custom dataset and going
00:06:06
to train the Llama 3.1 8 billion parameter
00:06:09
model. So as the next step I need to convert
00:06:11
that file to the required format; that's
00:06:14
why we use the formatting prompts function
00:06:17
that we define here. We are
00:06:19
just taking the instruction column, the
00:06:22
input column and the output column, and
00:06:24
we are merging all those columns
00:06:27
together. In this way we are telling the
00:06:28
large language model: if we provide an
00:06:30
instruction and input like this, I need
00:06:32
an output like this. That's why we
00:06:34
created this function earlier. So that is
00:06:37
loading the data completed.
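A minimal sketch of step 3, assuming the dataset is iamtarun/python_code_instructions_18k_alpaca (the video only calls it the "python code instructions dataset" with about 18,000 rows, so treat the id as an assumption):

# Sketch of step 3: loading and formatting the data.
EOS_TOKEN = tokenizer.eos_token  # end-of-sequence token appended to every example

def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs = examples["input"]
    outputs = examples["output"]
    texts = []
    for instruction, inp, out in zip(instructions, inputs, outputs):
        # Merge the three columns into one Alpaca-style training text.
        texts.append(alpaca_prompt.format(instruction, inp, out) + EOS_TOKEN)
    return {"text": texts}

# Dataset id assumed; the video does not spell it out.
dataset = load_dataset("iamtarun/python_code_instructions_18k_alpaca", split="train")
# For your own data you could instead load a CSV with the same three columns, e.g.:
# dataset = load_dataset("csv", data_files="my_data.csv", split="train")
dataset = dataset.map(formatting_prompts_func, batched=True)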
00:06:40
Next, training the model. To do that,
00:06:42
we are using FastLanguageModel.get_peft_model,
00:06:45
which means we are not
00:06:48
training all the parameters in this
00:06:50
model; we are training only a few, using
00:06:53
the PEFT method. You can modify this
00:06:55
configuration based on your requirements.
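A sketch of that PEFT setup using Unsloth's get_peft_model; the LoRA hyperparameter values below are assumptions you can adjust:

# Sketch of the PEFT/LoRA setup (the specific hyperparameter values are assumptions).
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                     # LoRA rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # reduces memory use
    random_state=3407,
)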
00:06:57
Now next we need to define the trainer,
00:07:00
that is, SFTTrainer, and this is the main
00:07:02
training setup where we provide the
00:07:04
model, the tokenizer, the dataset, the
00:07:07
dataset's text field, the maximum
00:07:09
sequence length, the optimizer, the maximum
00:07:12
number of steps, and you can modify this
00:07:14
based on your requirements. Finally it's
00:07:16
going to save that in the outputs folder.
00:07:18
Next I'm going to add some optional
00:07:20
values just for monitoring the memory;
00:07:23
these are just optional, just for us to
00:07:25
understand the GPU and the memory usage.
00:07:28
Then the key part is this trainer.train();
00:07:32
this is the main training call to
00:07:35
train the model. After that I'm going
00:07:36
to print the stats again; even this is
00:07:39
optional, just to understand the
00:07:41
memory usage and other stats, so keep
00:07:45
this and this as optional.
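A sketch of the SFTTrainer setup and the training call; the argument values are assumptions, apart from max_steps=100, which matches the 100 steps mentioned later in the video:

# Sketch of step 4: the SFTTrainer and the main training call.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    args=TrainingArguments(
        per_device_train_batch_size=2,   # assumed value
        gradient_accumulation_steps=4,   # assumed value
        warmup_steps=5,
        max_steps=100,                   # matches the 100 steps mentioned in the video
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=1,
        optim="adamw_8bit",
        output_dir="outputs",            # saved in the outputs folder
    ),
)

trainer_stats = trainer.train()  # the main training call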
00:07:48
Next we need to see what it's going to
00:07:50
look like after training. So step number five: after
00:07:52
training. We're just making sure that we call
00:07:55
this function to enable fast inference;
00:07:58
next, inputs equals tokenizer, same as
00:08:00
before, providing it in the Alpaca format,
00:08:03
then the TextStreamer, and
00:08:06
model.generate will generate the response. So
00:08:09
this is how it will
00:08:11
look after training.
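A brief sketch of this "after training" check, reusing the same generation pattern as before training:

# Sketch of step 5: the same generation check, run after training.
FastLanguageModel.for_inference(model)  # switch the fine-tuned model to fast inference

inputs = tokenizer(
    [alpaca_prompt.format(
        "Create a function to calculate the sum of a sequence of integers.",
        "[1, 2, 3, 4, 5]",
        "",
    )],
    return_tensors="pt",
).to("cuda")

_ = model.generate(**inputs, streamer=TextStreamer(tokenizer), max_new_tokens=128)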
00:08:12
And the final step is saving the model; that is step six: save model.
00:08:15
model.save_pretrained, and then
00:08:18
mentioning the folder, that is the LoRA
00:08:20
folder, and tokenizer to the LoRA folder, then
00:08:23
pushing to Hub. This will automatically
00:08:25
push the model to the Hub and also the
00:08:27
tokenizer to the Hub. Generally, when you
00:08:29
upload the model and the tokenizer, it
00:08:32
includes only the adapter; generally
00:08:34
these adapters are the key files which we
00:08:36
fine-tuned, and we can merge them with
00:08:39
the main model using this merged
00:08:42
function. That's it.
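A sketch of step 6, with a placeholder repo name and assuming Unsloth's push_to_hub_merged is the merge helper being described (the HF_TOKEN variable name is also an assumption):

# Sketch of step 6: saving locally, pushing the LoRA adapter, and pushing a merged model.
# "your-username/llama-3.1-python" is a placeholder repo name.
model.save_pretrained("lora_model")
tokenizer.save_pretrained("lora_model")

# Push the LoRA adapter and tokenizer to the Hugging Face Hub.
hf_token = os.environ["HF_TOKEN"]  # assumed name of the exported token variable
model.push_to_hub("your-username/llama-3.1-python", token=hf_token)
tokenizer.push_to_hub("your-username/llama-3.1-python", token=hf_token)

# Merge the adapter into the base model and push the merged weights
# (assuming Unsloth's push_to_hub_merged helper is the "merge" function meant here).
model.push_to_hub_merged(
    "your-username/llama-3.1-python",
    tokenizer,
    save_method="merged_16bit",
    token=hf_token,
)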
00:08:45
As a quick overview: we defined our configuration and created
00:08:47
a question, or instruction, and the input
00:08:50
for us to compare before training and
00:08:51
after training. Step number two: setting
00:08:54
up before training, that is, loading the
00:08:55
model and testing what it looks like
00:08:57
before training. Step number three is
00:08:59
loading the dataset. Step number four:
00:09:01
training using SFTTrainer. Then step
00:09:04
number five: after training, what it's
00:09:06
going to look like. And step number six:
00:09:08
saving and pushing to the Hub. Now I'm
00:09:11
going to run this code: in your terminal,
00:09:13
python app.py, and then press Enter. Now
00:09:15
it is starting the training. Now if you
00:09:18
see the instruction, "create a function to
00:09:20
calculate the sum of a sequence of
00:09:21
integers", and the input is 1 2 3 4 5, the
00:09:25
response it gives is in JavaScript, but we
00:09:27
need Python as the output; that's the
00:09:30
ultimate goal. So now it's loading the
00:09:32
dataset, now the training is in progress,
00:09:36
and we gave 100 steps. You can see the
00:09:39
loss is going down; that's what we need.
00:09:42
Now it's near to complete. Now it is all
00:09:44
done. You can see the memory usage and
00:09:47
everything here. And as expected, when we asked
00:09:50
"create a function to calculate the sum
00:09:51
of a sequence of numbers", now it is
00:09:54
giving me the correct answer in Python;
00:09:56
this is exciting. Next we are saving the
00:09:59
model in this location, and you can see
00:10:02
the model here in Hugging Face. You can
00:10:04
see this got saved just a minute ago,
00:10:07
with the model and with the adapter file.
00:10:10
You can now use the model directly in
00:10:11
your own application. We have completed
00:10:13
the step of saving to Hugging Face. Now the
00:10:16
final step is to save to Ollama. Saving to Ollama
00:10:20
involves four different steps, simple and
00:10:22
easy steps: first, create the GGUF format;
00:10:22
second, create the model using a Modelfile;
00:10:25
third, ollama run to test the model;
00:10:27
finally, ollama push to save the model to
00:10:30
ollama.com. First let's see creating the GGUF
00:10:32
format. So first we need to save this in
00:10:36
GGUF format. As you can see here,
00:10:39
save_pretrained_gguf, and you're passing the
00:10:42
model and the tokenizer and the list of
00:10:45
quantization methods; similarly, push to
00:10:48
hub to save it on the Hub. That's it.
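A sketch of that GGUF export (this would be the local.py script), assuming Unsloth's save_pretrained_gguf and push_to_hub_gguf helpers, which accept a list of quantization methods as described:

# Sketch of the GGUF export. The repo name is a placeholder; the quantization list
# matches the three methods mentioned in the video (q4_k_m, q8_0, q5_k_m).
quant_methods = ["q4_k_m", "q8_0", "q5_k_m"]

model.save_pretrained_gguf("model", tokenizer, quantization_method=quant_methods)
model.push_to_hub_gguf(
    "your-username/llama-3.1-python",   # placeholder repo name
    tokenizer,
    quantization_method=quant_methods,
    token=os.environ["HF_TOKEN"],       # assumed token variable name
)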
00:10:53
Now we are going to run this code: python
00:10:55
local.py, and then press Enter. Now you can see
00:10:57
it's saving the tokenizer and the model in
00:11:00
GGUF format. It'll try to save in various
00:11:03
quantization methods; as you can see here,
00:11:06
it is going through various steps.
00:11:08
Currently it's working on Q4_K_M
00:11:11
quantization, and in our code we are
00:11:13
using three different quantizations, so
00:11:15
next it will go through Q8_0 and Q5_K_M. Now
00:11:18
we can see all the versions, such as Q5_K_M,
00:11:22
Q4_K_M, Q8_0; everything got uploaded to
00:11:27
this location, and you can see those GGUF
00:11:30
files here, uploaded in this location.
00:11:33
Next we are going to see how we can
00:11:34
create a model, that is, an Ollama model, using a
00:11:36
Modelfile. Next, create a model file named
00:11:39
Modelfile (M-o-d-e-l-f-i-l-e); this is for Ollama. Then inside
00:11:44
that file, you can see I mentioned the
00:11:46
path where my GGUF file got stored, which
00:11:50
you can see in my folder structure: the
00:11:53
GGUF got stored in this location, in this
00:11:55
path, and you can see the list of files
00:11:58
here. So you can even just
00:11:59
right-click, copy relative path, then paste that
00:12:02
here in this location. That's it. These
00:12:05
are default templates which I'm using;
00:12:08
you can also copy the same template and
00:12:10
create this Modelfile.
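As a rough illustration, the Modelfile contents might look something like this, written out from Python; the GGUF path and template text are assumptions, not the exact file from the video:

# Hypothetical sketch: writing an Ollama Modelfile from Python.
# The GGUF path and the prompt template below are assumptions for illustration.
modelfile = '''FROM ./model/unsloth.Q4_K_M.gguf

TEMPLATE """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{{ .Prompt }}

### Response:
"""

PARAMETER stop "### Instruction:"
PARAMETER stop "### Response:"
'''

with open("Modelfile", "w") as f:
    f.write(modelfile)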
00:12:12
Now, after this, in your terminal: I'm using Linux, so I'm
00:12:12
using this command to install Ollama, but
00:12:16
in your case it could be your own Mac
00:12:19
computer or desktop, so you can directly
00:12:22
download it from their own website. You
00:12:25
can see it's downloading Ollama and it is
00:12:27
running at this URL. Now after this:
00:12:29
ollama create, -f Modelfile (that is
00:12:32
the model file), and then the path to the
00:12:36
model. "me" is my username, which I created
00:12:39
on ollama.com; just go to ollama.com and you
00:12:42
should be able to sign in and create your
00:12:45
own account to publish and share models
00:12:48
on Ollama. So that is my username and the
00:12:51
model name, and then press Enter. Now the
00:12:57
model got created. Next we run ollama run
00:13:00
to test the model, that is, ollama run me/
00:13:04
llama3.1-python, and press Enter.
00:13:07
Now the model got loaded. I can say
00:13:09
"create a function to add these numbers 1
00:13:13
2 3 4 5", press Enter, and it's able to
00:13:16
generate the function. Now typing
00:13:18
/exit to exit. The final step is to
00:13:22
push that model to Ollama. To do that you
00:13:24
might need to generate your SSH key using
00:13:27
this, and then press Enter, Enter. So now
00:13:30
the key got created in this location. I
00:13:32
might need to move this to a different
00:13:34
location as well, so I'm going to type
00:13:37
sudo and copy the saved private
00:13:40
key to this location. After this, press
00:13:43
Enter; now it got copied. Now you need to
00:13:46
get the public key by typing this
00:13:49
command and then press Enter. So now
00:13:51
you're going to copy this public key, go
00:13:53
to Ollama, and after logging in go to Settings,
00:13:57
Ollama keys; there you should be able
00:13:59
to add your public key. Click that, add
00:14:02
the key and click Add. That's it. Now
00:14:04
coming back to our terminal: ollama push me
00:14:08
and the model name, and then press Enter.
00:14:10
This will automatically save the model
00:14:13
remotely in Ollama. Now it's all completed;
00:14:17
by going to My Models you should be able
00:14:20
to see your model listed there, updated
00:14:22
just now. Now you can run this command
00:14:25
anywhere, on any computer, and run the
00:14:29
model which you have just trained, as
00:14:31
simple as that. Now you are able to
00:14:33
create your own model, train your own
00:14:36
model with your own custom data, save
00:14:38
it to Hugging Face, save it to Ollama,
00:14:41
so that anyone can use it. I'm really
00:14:43
excited about this. I'm going to create
00:14:45
more videos similar to this, so stay
00:14:46
tuned. I hope you like this video; do like,
00:14:49
share and subscribe, and thanks for
00:14:50
watching.