00:00:00
This is amazing. Now we're going to look
00:00:02
at Llama 3.1 fine-tuning. So why do we
00:00:05
need fine-tuning? If you have your own
00:00:07
custom data or your private company data,
00:00:10
Llama 3.1 doesn't know about it; you
00:00:13
need to teach Llama 3.1 to be able to
00:00:16
answer those specific questions. That's when
00:00:19
you need to train the model. In this video
00:00:21
we'll be training the 8 billion
00:00:23
parameter model. By the end of the video
00:00:25
you will learn how you can train using
00:00:26
custom data, that is, your own company
00:00:29
data, how to fine-tune, how to save that
00:00:32
to Hugging Face as you can see here, then
00:00:35
finally how to save that to Ollama as you
00:00:38
can see here. So I can even just copy
00:00:41
this command after uploading, just copy
00:00:43
it, run this locally on my computer, and
00:00:45
it will automatically pull the model, and
00:00:48
then I ask a question: create a function to
00:00:50
add 1 2 3 4 5, and you can see it
00:00:54
automatically created this function. This
00:00:55
is a custom model which I created,
00:00:58
trained and uploaded to Ollama. You
00:01:01
are going to learn that; that's exactly
00:01:03
what we're going to see today. Let's get
00:01:07
started. Hi everyone, I'm really excited
00:01:10
to show you Llama 3.1 fine-tuning.
00:01:12
In this we are going to fine-tune a
00:01:14
model and teach it Python. Generally, if
00:01:17
you ask the model to create a function
00:01:20
to add a few numbers, it is going to
00:01:22
randomly choose a programming language
00:01:24
and create that function for you, but I
00:01:26
want this model to be trained on Python
00:01:30
programming, so that whatever question I
00:01:32
ask, it should generate a Python program. So
00:01:35
fine-tuning or training a model is
00:01:36
nothing but teaching it how to respond to
00:01:39
a particular question. In this we'll
00:01:41
be seeing how to configure, what it looks
00:01:44
like before training, how to load the
00:01:46
data, how to train, then what it looks like
00:01:48
after training, and then how to save it.
00:01:51
But before that: I regularly create
00:01:53
videos in regards to Artificial
00:01:54
Intelligence on my YouTube channel, so do
00:01:56
subscribe and click the Bell icon to
00:01:57
stay tuned. Make sure you click the like
00:01:59
button so this video can be helpful
00:02:00
for many others like you. I'm going to
00:02:02
use Mass Compute and use the MervinPraison
00:02:04
coupon code to get 50% off. I'm inside
00:02:07
the Mass Compute machine now, and this is my
00:02:09
configuration: I'm using four NVIDIA RTX
00:02:12
A6000s, but just for fine-tuning Llama 3.1
00:02:16
one graphics card is enough. We'll be
00:02:19
using Unsloth to fine-tune our model;
00:02:21
Unsloth helps us fine-tune two
00:02:24
times faster and also with less memory.
00:02:26
So in your terminal, pip install huggingface
00:02:26
hub and then all these packages, the
00:02:29
Unsloth packages, and then press Enter.
00:02:31
I'll put all the code and the commands
00:02:34
in the description below. After this,
00:02:36
export your Hugging Face token like this
00:02:38
in your terminal. You can generate a
00:02:40
Hugging Face token from Hugging Face;
00:02:42
this is used to upload the fine-tuned
00:02:45
model to Hugging Face, and sometimes it's also
00:02:47
required to download a model. After this,
00:02:52
press Enter. Next let's create a file
00:02:54
called app.py and let's open it. Inside
00:02:56
the file: from unsloth import FastLanguageModel,
00:03:00
then importing torch, os,
00:03:03
TextStreamer, load_dataset, SFTTrainer,
00:03:07
TrainingArguments and is_bfloat16_supported.
00:03:09
Now we'll be using all these
00:03:13
packages to start training.
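Here is a minimal sketch of those imports, assuming the usual packages these names come from (unsloth, transformers, trl and datasets):

# A minimal sketch of the imports described above (package sources assumed).
from unsloth import FastLanguageModel, is_bfloat16_supported
import torch
import os
from transformers import TextStreamer, TrainingArguments
from datasets import load_dataset
from trl import SFTTrainer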
00:03:15
So the first step is configuration: the maximum
00:03:18
sequence length, dtype, loading in 4-bit,
00:03:22
and providing the Alpaca format, which is just
00:03:25
the instruction, input and the response.
00:03:28
These are just basic configuration. So
00:03:30
when we give an instruction and input, we
00:03:33
expect a response from the large
00:03:35
language model; that's what this prompt
00:03:36
template means.
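A rough sketch of that configuration and the Alpaca prompt template might look like this (the specific values are assumptions for illustration, not confirmed in the video):

# Rough sketch of the configuration and Alpaca-style prompt template
# (these particular values are assumptions for illustration).
max_seq_length = 2048          # maximum sequence length
dtype = None                   # auto-detect; bfloat16 on newer GPUs
load_in_4bit = True            # load the model in 4-bit to save memory

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""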
00:03:38
Next we are going to predefine some questions, such as an
00:03:41
instruction, "create a function to
00:03:42
calculate the sum of a sequence of numbers",
00:03:45
and we are providing the numbers. So when
00:03:47
we provide this instruction and input
00:03:48
here, what is going to be the response?
00:03:51
Before training, we're going to see that.
00:03:53
So step number two: before
00:03:56
training. This means
00:03:58
loading the model and tokenizer using
00:04:01
FastLanguageModel, which we defined
00:04:03
earlier here. Then FastLanguageModel.for_inference;
00:04:06
this is to speed up
00:04:09
the inference. Then we are converting the
00:04:12
input into tokens, that means numbers. So
00:04:16
why do we need to convert the instruction
00:04:18
and the input which we define here to
00:04:21
numbers or tokens using the tokenizer? The
00:04:23
reason is that generally all large
00:04:26
language models are trained with tokens,
00:04:30
or numbers, so these large language models
00:04:33
understand only numbers. That's why we
00:04:35
are converting the input text, that is the
00:04:38
instruction and input, to numbers. Next,
00:04:40
a TextStreamer to stream the output, and
00:04:43
finally printing out the response. So
00:04:46
this will allow us to compare how it
00:04:48
looked before training.
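Here is a hedged sketch of this "before training" check, assuming Unsloth's FastLanguageModel API, a CUDA GPU, and the alpaca_prompt defined above (the exact model id is an assumption):

# Sketch of the "before training" check (model id and generation settings are assumptions).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B",  # assumed model id
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)
FastLanguageModel.for_inference(model)  # enable faster inference

inputs = tokenizer(
    [alpaca_prompt.format(
        "Create a function to calculate the sum of a sequence of integers.",  # instruction
        "[1, 2, 3, 4, 5]",                                                    # input
        "",                                                                   # response left empty
    )],
    return_tensors="pt",
).to("cuda")  # assumes a CUDA GPU

text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer=text_streamer, max_new_tokens=128)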
00:04:52
Now, next is loading the data. We need to load the data;
00:04:54
that is step number three. This involves
00:04:56
defining the end-of-sequence (EOS) token, then
00:04:59
creating a function for formatting the
00:05:01
prompt, and after that using the load_dataset
00:05:04
function to load the data from
00:05:07
here. This is the Python code instruction
00:05:09
dataset. If we open this Python code
00:05:12
instruction dataset, here is the data
00:05:14
set. If you view the data, you can see
00:05:16
it consists of instruction, input, output
00:05:19
and then the prompt. So generally for
00:05:22
the Alpaca dataset, we'll be giving the
00:05:25
instruction and we'll be giving the
00:05:26
input. As we saw before, we are giving the
00:05:29
instruction like this and the input like
00:05:31
this, and we are expecting a response or
00:05:33
output like this. So in this way we are
00:05:35
teaching the large language model that if we
00:05:38
provide information like this instruction
00:05:40
and input, it should automatically give
00:05:42
you an output like this. Similarly, we
00:05:44
have a total of 18,000 rows, and we
00:05:47
are going to feed that in and train this
00:05:49
large language model; that's why we
00:05:51
defined that here. If you want to use
00:05:53
your own custom dataset, you can just create
00:05:55
a CSV file or Excel sheet with just
00:05:58
three columns: one is instruction, another
00:06:01
is input, and the last is output. In this case
00:06:04
I'm using this custom dataset and going
00:06:06
to train the Llama 3.1 8 billion parameter
00:06:09
model. So as the next step I need to convert
00:06:11
that file to the required format; that's
00:06:14
why we use the formatting prompts function
00:06:17
that we define here. We are
00:06:19
just taking the instruction column, the
00:06:22
input column and the output column, and
00:06:24
we are merging all those columns
00:06:27
together. In this way we are telling the
00:06:28
large language model: if we provide an
00:06:30
instruction and input like this, I need
00:06:32
an output like this. That's why we
00:06:34
created this function earlier. So that is
00:06:37
loading the data completed.
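A minimal sketch of step 3, assuming the dataset is iamtarun/python_code_instructions_18k_alpaca (the video only calls it the "python code instructions dataset" with about 18,000 rows, so treat the id as an assumption):

# Sketch of step 3: loading and formatting the data.
EOS_TOKEN = tokenizer.eos_token  # end-of-sequence token appended to every example

def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs = examples["input"]
    outputs = examples["output"]
    texts = []
    for instruction, inp, out in zip(instructions, inputs, outputs):
        # Merge the three columns into one Alpaca-style training text.
        texts.append(alpaca_prompt.format(instruction, inp, out) + EOS_TOKEN)
    return {"text": texts}

# Dataset id assumed; the video does not spell it out.
dataset = load_dataset("iamtarun/python_code_instructions_18k_alpaca", split="train")
# For your own data you could instead load a CSV with the same three columns, e.g.:
# dataset = load_dataset("csv", data_files="my_data.csv", split="train")
dataset = dataset.map(formatting_prompts_func, batched=True)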
00:06:40
Next, training the model. To do that,
00:06:42
we are using FastLanguageModel.get_peft_model,
00:06:45
which means we are not
00:06:48
training all the parameters in this
00:06:50
model; we are training only a few, using
00:06:53
the PEFT method. You can modify this
00:06:55
configuration based on your requirements.
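A sketch of that PEFT setup using Unsloth's get_peft_model; the LoRA hyperparameter values below are assumptions you can adjust:

# Sketch of the PEFT/LoRA setup (the specific hyperparameter values are assumptions).
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                     # LoRA rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # reduces memory use
    random_state=3407,
)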
00:06:57
Now next we need to define the trainer,
00:07:00
that is, SFTTrainer, and this is the main
00:07:02
training setup where we provide the
00:07:04
model, the tokenizer, the dataset, the
00:07:07
dataset's text field, the maximum
00:07:09
sequence length, the optimizer, the maximum
00:07:12
number of steps, and you can modify this
00:07:14
based on your requirements. Finally it's
00:07:16
going to save that in the outputs folder.
00:07:18
Next I'm going to add some optional
00:07:20
values just for monitoring the memory;
00:07:23
these are just optional, just for us to
00:07:25
understand the GPU and the memory usage.
00:07:28
Then the key part is this trainer.train();
00:07:32
this is the main training call to
00:07:35
train the model. After that I'm going
00:07:36
to print the stats again; even this is
00:07:39
optional, just to understand the
00:07:41
memory usage and other stats, so keep
00:07:45
this and this as optional.
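A sketch of the SFTTrainer setup and the training call; the argument values are assumptions, apart from max_steps=100, which matches the 100 steps mentioned later in the video:

# Sketch of step 4: the SFTTrainer and the main training call.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    args=TrainingArguments(
        per_device_train_batch_size=2,   # assumed value
        gradient_accumulation_steps=4,   # assumed value
        warmup_steps=5,
        max_steps=100,                   # matches the 100 steps mentioned in the video
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=1,
        optim="adamw_8bit",
        output_dir="outputs",            # saved in the outputs folder
    ),
)

trainer_stats = trainer.train()  # the main training call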
00:07:48
Next we need to see what it's going to
00:07:50
look like after training. So step number five: after
00:07:52
training. We're just making sure that we call
00:07:55
this function to enable fast inference;
00:07:58
next, inputs equals tokenizer, same as
00:08:00
before, providing it in the Alpaca format,
00:08:03
then the TextStreamer, and
00:08:06
model.generate will generate the response. So
00:08:09
this is how it will
00:08:11
look after training.
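A brief sketch of this "after training" check, reusing the same generation pattern as before training:

# Sketch of step 5: the same generation check, run after training.
FastLanguageModel.for_inference(model)  # switch the fine-tuned model to fast inference

inputs = tokenizer(
    [alpaca_prompt.format(
        "Create a function to calculate the sum of a sequence of integers.",
        "[1, 2, 3, 4, 5]",
        "",
    )],
    return_tensors="pt",
).to("cuda")

_ = model.generate(**inputs, streamer=TextStreamer(tokenizer), max_new_tokens=128)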
00:08:12
And the final step is saving the model; that is step six: save model.
00:08:15
model.save_pretrained, and then
00:08:18
mentioning the folder, that is the LoRA
00:08:20
folder, and tokenizer to the LoRA folder, then
00:08:23
pushing to Hub. This will automatically
00:08:25
push the model to the Hub and also the
00:08:27
tokenizer to the Hub. Generally, when you
00:08:29
upload the model and the tokenizer, it
00:08:32
includes only the adapter; generally
00:08:34
these adapters are the key files which we
00:08:36
fine-tuned, and we can merge them with
00:08:39
the main model using this merged
00:08:42
function. That's it.
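A sketch of step 6, with a placeholder repo name and assuming Unsloth's push_to_hub_merged is the merge helper being described (the HF_TOKEN variable name is also an assumption):

# Sketch of step 6: saving locally, pushing the LoRA adapter, and pushing a merged model.
# "your-username/llama-3.1-python" is a placeholder repo name.
model.save_pretrained("lora_model")
tokenizer.save_pretrained("lora_model")

# Push the LoRA adapter and tokenizer to the Hugging Face Hub.
hf_token = os.environ["HF_TOKEN"]  # assumed name of the exported token variable
model.push_to_hub("your-username/llama-3.1-python", token=hf_token)
tokenizer.push_to_hub("your-username/llama-3.1-python", token=hf_token)

# Merge the adapter into the base model and push the merged weights
# (assuming Unsloth's push_to_hub_merged helper is the "merge" function meant here).
model.push_to_hub_merged(
    "your-username/llama-3.1-python",
    tokenizer,
    save_method="merged_16bit",
    token=hf_token,
)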
00:08:45
As a quick overview: we defined our configuration and created
00:08:47
a question, or instruction, and the input
00:08:50
for us to compare before training and
00:08:51
after training. Step number two: setting
00:08:54
up before training, that is, loading the
00:08:55
model and testing what it looks like
00:08:57
before training. Step number three is
00:08:59
loading the dataset. Step number four:
00:09:01
training using SFTTrainer. Then step
00:09:04
number five: after training, what it's
00:09:06
going to look like. And step number six:
00:09:08
saving and pushing to the Hub. Now I'm
00:09:11
going to run this code: in your terminal,
00:09:13
python app.py, and then press Enter. Now
00:09:15
it is starting the training. Now if you
00:09:18
see the instruction, "create a function to
00:09:20
calculate the sum of a sequence of
00:09:21
integers", and the input is 1 2 3 4 5, the
00:09:25
response it gives is in JavaScript, but we
00:09:27
need Python as the output; that's the
00:09:30
ultimate goal. So now it's loading the
00:09:32
dataset, now the training is in progress,
00:09:36
and we gave 100 steps. You can see the
00:09:39
loss is going down; that's what we need.
00:09:42
Now it's near to complete. Now it is all
00:09:44
done. You can see the memory usage and
00:09:47
everything here. And as expected, when we asked
00:09:50
"create a function to calculate the sum
00:09:51
of a sequence of numbers", now it is
00:09:54
giving me the correct answer in Python;
00:09:56
this is exciting. Next we are saving the
00:09:59
model in this location, and you can see
00:10:02
the model here in Hugging Face. You can
00:10:04
see this got saved just a minute ago,
00:10:07
with the model and with the adapter file.
00:10:10
You can now use the model directly in
00:10:11
your own application. We have completed
00:10:13
the step of saving to Hugging Face. Now the
00:10:16
final step is to save to Ollama. Saving to Ollama
00:10:20
involves four different steps, simple and
00:10:22
easy steps: first, create the GGUF format;
00:10:22
second, create the model using a Modelfile;
00:10:25
third, ollama run to test the model;
00:10:27
finally, ollama push to save the model to
00:10:30
ollama.com. First let's see creating the GGUF
00:10:32
format. So first we need to save this in
00:10:36
GGUF format. As you can see here,
00:10:39
save_pretrained_gguf, and you're passing the
00:10:42
model and the tokenizer and the list of
00:10:45
quantization methods; similarly, push to
00:10:48
hub to save it on the Hub. That's it.
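A sketch of that GGUF export (this would be the local.py script), assuming Unsloth's save_pretrained_gguf and push_to_hub_gguf helpers, which accept a list of quantization methods as described:

# Sketch of the GGUF export. The repo name is a placeholder; the quantization list
# matches the three methods mentioned in the video (q4_k_m, q8_0, q5_k_m).
quant_methods = ["q4_k_m", "q8_0", "q5_k_m"]

model.save_pretrained_gguf("model", tokenizer, quantization_method=quant_methods)
model.push_to_hub_gguf(
    "your-username/llama-3.1-python",   # placeholder repo name
    tokenizer,
    quantization_method=quant_methods,
    token=os.environ["HF_TOKEN"],       # assumed token variable name
)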
00:10:53
Now we are going to run this code: python
00:10:55
local.py, and then press Enter. Now you can see
00:10:57
it's saving the tokenizer and the model in
00:11:00
GGUF format. It'll try to save in various
00:11:03
quantization methods; as you can see here,
00:11:06
it is going through various steps.
00:11:08
Currently it's working on Q4_K_M
00:11:11
quantization, and in our code we are
00:11:13
using three different quantizations, so
00:11:15
next it will go through Q8_0 and Q5_K_M. Now
00:11:18
we can see all the versions, such as Q5_K_M,
00:11:22
Q4_K_M, Q8_0; everything got uploaded to
00:11:27
this location, and you can see those GGUF
00:11:30
files here, uploaded in this location.
00:11:33
Next we are going to see how we can
00:11:34
create a model, that is, an Ollama model, using a
00:11:36
Modelfile. Next, create a model file named
00:11:39
Modelfile (M-o-d-e-l-f-i-l-e); this is for Ollama. Then inside
00:11:44
that file, you can see I mentioned the
00:11:46
path where my GGUF file got stored, which
00:11:50
you can see in my folder structure: the
00:11:53
GGUF got stored in this location, in this
00:11:55
path, and you can see the list of files
00:11:58
here. So you can even just
00:11:59
right-click, copy relative path, then paste that
00:12:02
here in this location. That's it. These
00:12:05
are default templates which I'm using;
00:12:08
you can also copy the same template and
00:12:10
create this Modelfile.
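As a rough illustration, the Modelfile contents might look something like this, written out from Python; the GGUF path and template text are assumptions, not the exact file from the video:

# Hypothetical sketch: writing an Ollama Modelfile from Python.
# The GGUF path and the prompt template below are assumptions for illustration.
modelfile = '''FROM ./model/unsloth.Q4_K_M.gguf

TEMPLATE """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{{ .Prompt }}

### Response:
"""

PARAMETER stop "### Instruction:"
PARAMETER stop "### Response:"
'''

with open("Modelfile", "w") as f:
    f.write(modelfile)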
00:12:12
Now, after this, in your terminal: I'm using Linux, so I'm
00:12:12
using this command to install Ollama, but
00:12:16
in your case it could be your own Mac
00:12:19
computer or desktop, so you can directly
00:12:22
download it from their own website. You
00:12:25
can see it's downloading Ollama and it is
00:12:27
running at this URL. Now after this:
00:12:29
ollama create, -f Modelfile (that is
00:12:32
the model file), and then the path to the
00:12:36
model. "me" is my username, which I created
00:12:39
on ollama.com; just go to ollama.com and you
00:12:42
should be able to sign in and create your
00:12:45
own account to publish and share models
00:12:48
on Ollama. So that is my username and the
00:12:51
model name, and then press Enter. Now the
00:12:57
model got created. Next we run ollama run
00:13:00
to test the model, that is, ollama run me/
00:13:04
llama3.1-python, and press Enter.
00:13:07
Now the model got loaded. I can say
00:13:09
"create a function to add these numbers 1
00:13:13
2 3 4 5", press Enter, and it's able to
00:13:16
generate the function. Now typing
00:13:18
/exit to exit. The final step is to
00:13:22
push that model to Ollama. To do that you
00:13:24
might need to generate your SSH key using
00:13:27
this, and then press Enter, Enter. So now
00:13:30
the key got created in this location. I
00:13:32
might need to move this to a different
00:13:34
location as well, so I'm going to type
00:13:37
sudo and copy the saved private
00:13:40
key to this location. After this, press
00:13:43
Enter; now it got copied. Now you need to
00:13:46
get the public key by typing this
00:13:49
command and then press Enter. So now
00:13:51
you're going to copy this public key, go
00:13:53
to Ollama, and after logging in go to Settings,
00:13:57
Ollama keys; there you should be able
00:13:59
to add your public key. Click that, add
00:14:02
the key and click Add. That's it. Now
00:14:04
coming back to our terminal: ollama push me
00:14:08
and the model name, and then press Enter.
00:14:10
This will automatically save the model
00:14:13
remotely in Ollama. Now it's all completed;
00:14:17
by going to My Models you should be able
00:14:20
to see your model listed there, updated
00:14:22
just now. Now you can run this command
00:14:25
anywhere, on any computer, and run the
00:14:29
model which you have just trained, as
00:14:31
simple as that. Now you are able to
00:14:33
create your own model, train your own
00:14:36
model with your own custom data, save
00:14:38
it to Hugging Face, save it to Ollama,
00:14:41
so that anyone can use it. I'm really
00:14:43
excited about this. I'm going to create
00:14:45
more videos similar to this, so stay
00:14:46
tuned. I hope you like this video; do like,
00:14:49
share and subscribe, and thanks for
00:14:50
watching.