EASILY Train Llama 3 and Upload to Ollama.com (Must Know)

00:14:51
https://www.youtube.com/watch?v=V6LDl3Vjq-A

Overview

TL;DR: This video tutorial provides a comprehensive guide to fine-tuning the Llama 3.1 model on custom data, specifically for generating Python code. It explains why fine-tuning is needed to tailor model responses and demonstrates the entire training process, from configuration to saving the model on platforms like Hugging Face and Ollama. The tutorial highlights the benefits of purpose-built datasets and efficient tooling for fast, effective fine-tuning, leaving the model able to follow specific instructions and produce relevant output after training.

Key Takeaways

  • 🛠️ Learn how to fine-tune Llama 3.1 for specific tasks.
  • 📊 Understand the importance of custom datasets.
  • 🐍 Train the model to generate Python code consistently.
  • 🚀 Use efficient tools for speedier training.
  • 💾 Save the model to Hugging Face and Ollama easily.
  • 🔄 Test the model's outputs after training for accuracy.
  • 📁 Format custom datasets properly for training.
  • 🖥️ Utilize GPU resources effectively for fine-tuning.
  • 🔑 Manage tokens for Hugging Face and Ollama.
  • 📅 Stay updated with future tutorials on AI and ML.

Timeline

  • 00:00:00 - 00:05:00

    The video introduces fine-tuning Llama 3.1, emphasizing its importance for tailoring the model to answer specific questions using custom or private data. The presenter outlines the training process for an 8-billion-parameter model, detailing how to prepare and save it using Hugging Face and Ollama. Viewers learn how to have the model create a Python function and how to evaluate the model's abilities after training, making this a practical tutorial on fine-tuning AI models.

  • 00:05:00 - 00:14:51

    Following the introduction, the presenter walks through the configuration needed for fine-tuning (a minimal code sketch follows below). Steps include loading the dataset, defining training parameters, and monitoring the trainer. The need for tokenized data is emphasized, since the model requires numerical input. The process for saving the fine-tuned model to both Hugging Face and Ollama is also covered, so users can share and use their customized models in various applications.
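
A minimal sketch of the model-loading configuration described above, assuming Unsloth's FastLanguageModel API; the checkpoint id is an assumption, since the summary does not name the exact one used in the video.

```python
# Load Llama 3.1 8B in 4-bit via Unsloth (sketch; checkpoint id assumed).
from unsloth import FastLanguageModel

max_seq_length = 2048  # maximum sequence length used for training
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B",  # assumed checkpoint id
    max_seq_length=max_seq_length,
    dtype=None,          # auto-select (bfloat16 where supported)
    load_in_4bit=True,   # 4-bit loading, as described in the video
)
```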

Video Q&A

  • What is the purpose of fine-tuning Llama 3.1?

    Fine-tuning allows the model to learn from custom data so it can provide specific answers tailored to the user's needs.

  • What programming language will the model be trained to use?

    The model will be specifically trained to generate Python code.

  • What are some platforms used to save the fine-tuned model?

    The model can be saved to Hugging Face and Ollama.

  • What tools are required for fine-tuning?

    Key tools include Massed Compute for GPU access, Hugging Face, and Unsloth for faster training.

  • How can custom datasets be formatted for training?

    Custom datasets should be structured as a CSV or Excel file with three columns: instruction, input, and output (see the CSV sketch after this Q&A section).

  • Which GPU is recommended for fine-tuning?

    The presenter uses NVIDIA RTX A6000 cards, but a single graphics card is sufficient for fine-tuning Llama 3.1.

  • What is the expected outcome after fine-tuning?

    The model should correctly generate Python functions based on the given instructions.

  • How do you test the model after training?

    You can test the model by providing it with instructions and checking whether it generates the desired output (see the inference sketch after the transcript).

  • What is the final step of the training process?

    The final step involves pushing the trained model to Ollama for cloud hosting (see the Ollama commands after the transcript).

  • Can anyone use the final model?

    Yes, once saved, the model can be used by anyone who accesses it on Ollama.
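
A hedged sketch of loading such a custom CSV with Hugging Face datasets; the file name and its contents are hypothetical examples.

```python
# Hypothetical train.csv with the three required columns:
# instruction,input,output
# "Create a function to calculate the sum of a sequence of integers.","[1, 2, 3, 4, 5]","def sum_sequence(nums): return sum(nums)"
from datasets import load_dataset

dataset = load_dataset("csv", data_files="train.csv", split="train")
print(dataset.column_names)  # ['instruction', 'input', 'output']
```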

Transcript
  • 00:00:00
    this is amazing now we're going to see
  • 00:00:02
    about llama 3.1 finetuning so why we
  • 00:00:05
    need fine tuning so if you have your
  • 00:00:07
    custom data or your private company data
  • 00:00:10
    llama 3.1 doesn't know about that you
  • 00:00:13
    need to teach llama 3.1 to be able to
  • 00:00:16
    answer specific questions that's when
  • 00:00:19
    you need to train this model in this
  • 00:00:21
    we'll be training the 8 billion
  • 00:00:23
    parameter model by the end of the video
  • 00:00:25
    you will learn how you can train using
  • 00:00:26
    custom data that is your own company
  • 00:00:29
    data how to fine-tune how to save that
  • 00:00:32
    to hugging face as you can see here then
  • 00:00:35
    finally how to save that to Ollama as you
  • 00:00:38
    can see here so I can even just copy
  • 00:00:41
    this command after uploading just copy
  • 00:00:43
    it run this locally on my computer and
  • 00:00:45
    it will automatically pull the model and
  • 00:00:48
    then ask a question create a function to
  • 00:00:50
    add 1 2 3 4 5 and you can see it
  • 00:00:54
    automatically created this function this
  • 00:00:55
    is a custom model which I created
  • 00:00:58
    trained and uploaded in Ollama you
  • 00:01:01
    are going to learn that that's exactly
  • 00:01:03
    what we're going to see today let's get
  • 00:01:05
    [Music]
  • 00:01:07
    started hi everyone I'm really excited
  • 00:01:10
    to show you about llama 3.1 fine tuning
  • 00:01:12
    in this we are going to fine tune a
  • 00:01:14
    model and teach python so generally if
  • 00:01:17
    you ask the model to create a function
  • 00:01:20
    to add few numbers it is going to
  • 00:01:22
    randomly choose a programming language
  • 00:01:24
    and create that function for you but I
  • 00:01:26
    want this model to be trained on Python
  • 00:01:30
    Programming so that whatever question I
  • 00:01:32
    ask it should generate Python program so
  • 00:01:35
    fine-tuning or training a model is
  • 00:01:36
    nothing but teaching how to respond for
  • 00:01:39
    the particular question so in this we'll
  • 00:01:41
    be seeing how to configure how it looks
  • 00:01:44
    like before training how to load the
  • 00:01:46
    data how to train then how it looks
  • 00:01:48
    after training and then how to save it
  • 00:01:51
    but before that I regularly create
  • 00:01:53
    videos in regards to Artificial
  • 00:01:54
    Intelligence on my YouTube channel so do
  • 00:01:56
    subscribe and click the Bell icon to
  • 00:01:57
    stay tuned make sure you click the like
  • 00:01:59
    button so this this video can be helpful
  • 00:02:00
    for many others like you I'm going to
  • 00:02:02
    use Massed Compute and use the MervinPraison
  • 00:02:04
    coupon code to get 50% off I'm inside
  • 00:02:07
    the Massed Compute machine now and this is my
  • 00:02:09
    configuration I'm using four NVIDIA RTX
  • 00:02:12
    A6000 but just for fine-tuning llama 3.1
  • 00:02:16
    one graphic card is enough and we'll be
  • 00:02:19
    using Unsloth to fine-tune our model
  • 00:02:21
    Unsloth helps us to fine-tune two
  • 00:02:24
    times faster and also with less memory
  • 00:02:26
    so in your terminal pip install hugging
  • 00:02:29
    face hub and then all these packages
  • 00:02:31
    Unsloth packages and then click enter
  • 00:02:34
    I'll put all the code and the commands
  • 00:02:36
    in the description below after this
  • 00:02:38
    export your hugging face token like this
  • 00:02:40
    in your terminal you can generate
  • 00:02:42
    hugging face token from hugging face
  • 00:02:45
    this is used to upload the fine-tuned
  • 00:02:47
    model to hugging face sometime this also
  • 00:02:50
    required to download a model after this
  • 00:02:52
    click enter next let's create a file
  • 00:02:54
    called app.py and let's open it inside
  • 00:02:56
    the file from unsloth import Fast
  • 00:03:00
    LanguageModel then importing torch OS
  • 00:03:03
    text streamer load data sets sft trainer
  • 00:03:07
    training arguments is bfloat16
  • 00:03:09
    supported now we'll be using all these
  • 00:03:13
    packages to start training so first step
  • 00:03:15
    configuration so configuration maximum
  • 00:03:18
    sequence length dtype loading in 4 bits
  • 00:03:22
    and providing the Alpaca format is just
  • 00:03:25
    the instruction input and the response
  • 00:03:28
    these are just basic configuration so
  • 00:03:30
    when we give a instruction and input we
  • 00:03:33
    expect a response from the large
  • 00:03:35
    language model that's what this prompt
  • 00:03:36
    template mean next we are going to
  • 00:03:38
    predefine some questions such as an
  • 00:03:41
    instruction create a function to
  • 00:03:42
    calculate the sum of sequence of numbers
  • 00:03:45
    and we are providing the numbers so when
  • 00:03:47
    we provide this instruction and input
  • 00:03:48
    here what is going to be the response
  • 00:03:51
    before training we're going to see that
  • 00:03:53
    so step number two before training step
  • 00:03:56
    number two before training and then
  • 00:03:58
    loading the model and tokenizer using
  • 00:04:01
    fast language model which we defined
  • 00:04:03
    that earlier here then fast language
  • 00:04:06
    model.for_inference this is to speed up
  • 00:04:09
    the inference then we are converting the
  • 00:04:12
    input into tokens that means numbers so
  • 00:04:16
    why we need to convert the instruction
  • 00:04:18
    and the input which we Define here to
  • 00:04:21
    numbers or tokens using tokenizer the
  • 00:04:23
    reason is because generally all large
  • 00:04:26
    language models are trained with token
  • 00:04:30
    or numbers so these large language models
  • 00:04:33
    understands only numbers that's why we
  • 00:04:35
    are converting the input text that's the
  • 00:04:38
    instruction and input to numbers next
  • 00:04:40
    text streamer to stream the output and
  • 00:04:43
    finally printing out the response so
  • 00:04:46
    this will allow us to compare how it
  • 00:04:48
    looked before training now next loading
  • 00:04:52
    the data so we need to load the data
  • 00:04:54
    that is Step number three this involves
  • 00:04:56
    defining the end of sentence token then
  • 00:04:59
    create a function for formatting The
  • 00:05:01
    Prompt then after that using the load
  • 00:05:04
    data set function to load the data from
  • 00:05:07
    here this is the python code instruction
  • 00:05:09
    data set so if we open this python code
  • 00:05:12
    instruction data set here is the data
  • 00:05:14
    set so if you view the data you can see
  • 00:05:16
    it consists of instruction input output
  • 00:05:19
    and then the prompt so generally for
  • 00:05:22
    Alpaca data set we'll be giving the
  • 00:05:25
    instruction and we'll be giving the
  • 00:05:26
    input as we saw before we are giving the
  • 00:05:29
    instruction like this and the input like
  • 00:05:31
    this and we are expecting a response or
  • 00:05:33
    the output like this so in this way we
  • 00:05:35
    teaching a large language model that if we
  • 00:05:38
    provide any information like instruction
  • 00:05:40
    and input it should automatically give
  • 00:05:42
    you the output like this similarly we
  • 00:05:44
    are having totally 18,000 rows and we
  • 00:05:47
    are going to feed that and train this
  • 00:05:49
    large language model so that's why we
  • 00:05:51
    defined that here so if you want to use
  • 00:05:53
    your custom data set you can just create
  • 00:05:55
    a CSV file or Excel sheet with just
  • 00:05:58
    three columns one is instruction another
  • 00:06:01
    one is input and output so in this case
  • 00:06:04
    I'm using this custom data set and going
  • 00:06:06
    to train Llama 3.1 8 billion parameter
  • 00:06:09
    model so next step I need to convert
  • 00:06:11
    that file to the required format that's
  • 00:06:14
    why we use formatting prompt function
  • 00:06:17
    that's what we Define here so we are
  • 00:06:19
    just taking the instruction column the
  • 00:06:22
    input column and the output column and
  • 00:06:24
    we are merging all those columns
  • 00:06:27
    together in this way we are telling the
  • 00:06:28
    large language model if we provide an
  • 00:06:30
    instruction and input like this I need
  • 00:06:32
    an output like this so that's why we
  • 00:06:34
    created this function earlier so that is
  • 00:06:37
    loading data completed next training the
  • 00:06:40
    model to do that and training the model
  • 00:06:42
    here we are using fast language model
  • 00:06:45
    get_peft_model which means we are not
  • 00:06:48
    training all the parameters in this
  • 00:06:50
    model we are training only few using
  • 00:06:53
    this PEFT method you can modify this
  • 00:06:55
    configuration based on your requirement
  • 00:06:57
    now next we need to define the trainer
  • 00:07:00
    that is SFTTrainer and this is the main
  • 00:07:02
    training function where we provide the
  • 00:07:04
    model the tokenizer the data set the
  • 00:07:07
    data set is a text field maximum
  • 00:07:09
    sequence length the optimizer maximum
  • 00:07:12
    number of steps and you can modify this
  • 00:07:14
    based on your requirement finally it's
  • 00:07:16
    going to save that in the outputs folder
  • 00:07:18
    next I'm going to add some optional
  • 00:07:20
    values just for monitoring the memory so
  • 00:07:23
    these are just optional just for us to
  • 00:07:25
    understand the GPU and the memory usage
  • 00:07:28
    then the key area is this trainer.train
  • 00:07:32
    this is the main training function to
  • 00:07:35
    train the model so after that I'm going
  • 00:07:36
    to print the stats again even this is
  • 00:07:39
    optional so just to understand the
  • 00:07:41
    memory usage and other stats so keep
  • 00:07:45
    this and this as optional next we need
  • 00:07:48
    to see how after training is going to
  • 00:07:50
    look like so step number five after
  • 00:07:52
    training just making sure that we call
  • 00:07:55
    this function to make it fast inference
  • 00:07:58
    next inputs equals tokenizer same as
  • 00:08:00
    before providing in the Alpaca data set
  • 00:08:03
    format then text streamer and model.
  • 00:08:06
    generate will generate the response so
  • 00:08:09
    this is after training how it will be
  • 00:08:11
    looking like and the final step is
  • 00:08:12
    saving the model that is six save model
  • 00:08:15
    model.save_pretrained and then
  • 00:08:18
    mentioning the folder that is the LoRA
  • 00:08:20
    folder and tokenizer to the LoRA folder then
  • 00:08:23
    pushing to HUB this will automatically
  • 00:08:25
    push the model to HUB and also the
  • 00:08:27
    tokenizer to Hub generally when you upload
  • 00:08:29
    the model and the tokenizer it
  • 00:08:32
    includes only the adapter so generally
  • 00:08:34
    these adapters are the key files which we
  • 00:08:36
    fine-tuned and we can merge that with
  • 00:08:39
    the main model using this merged
  • 00:08:42
    function that's it as a quick overview
  • 00:08:45
    we Define our configuration and created
  • 00:08:47
    a question or instruction and the input
  • 00:08:50
    for us to compare before training and
  • 00:08:51
    after training step number two setting
  • 00:08:54
    up before training that is loading the
  • 00:08:55
    model and testing how it looks like
  • 00:08:57
    before training step number three is
  • 00:08:59
    loading the data set step number four
  • 00:09:01
    training using SFTTrainer then step
  • 00:09:04
    number five after training how it's
  • 00:09:06
    going to look like and step number six
  • 00:09:08
    saving and pushing into Hub now I'm
  • 00:09:11
    going to run this code in your terminal
  • 00:09:13
    python app.py and then click enter now
  • 00:09:15
    it is starting the training now if you
  • 00:09:18
    see the instruction create a function to
  • 00:09:20
    calculate the sum of sequence of
  • 00:09:21
    integers and the input is 1 2 3 4 5 the
  • 00:09:25
    response it is giving is JavaScript but we
  • 00:09:27
    need python as output that's the
  • 00:09:30
    ultimate goal so now it's loading the
  • 00:09:32
    data set now the training is in progress
  • 00:09:36
    and we gave 100 steps you can see the
  • 00:09:39
    loss is going down that's what we need
  • 00:09:42
    now it's near to complete now it is all
  • 00:09:44
    done you can see the memory usage
  • 00:09:47
    everything here and as expected we ask
  • 00:09:50
    create a function to calculate the sum
  • 00:09:51
    of sequence of numbers and now it is
  • 00:09:54
    giving me the correct answer in Python
  • 00:09:56
    this is exciting next we are saving the
  • 00:09:59
    model in this location and you can see
  • 00:10:02
    the model here in hugging face you can
  • 00:10:04
    see this got saved just a minute ago
  • 00:10:07
    with the model and with the adapter file
  • 00:10:10
    you can now use the model directly in
  • 00:10:11
    your own application we have completed
  • 00:10:13
    the step of saving to hugging face now
  • 00:10:16
    final step is to save to Ollama saving to Ollama
  • 00:10:20
    involves four different steps simple and
  • 00:10:22
    easy steps first to create GGUF format
  • 00:10:25
    second create model using a Modelfile
  • 00:10:27
    third ollama run to test the model
  • 00:10:30
    finally ollama push to save the model to
  • 00:10:32
    ollama.com first let's see creating GGUF
  • 00:10:36
    format so first we need to save this in
  • 00:10:39
    GGUF format as you can see here save
  • 00:10:42
    pretrained GGUF and you're passing the
  • 00:10:45
    model and the tokenizer and the list of
  • 00:10:48
    quantization method similarly push to
  • 00:10:50
    HUB to save that in Hub that's it now we
  • 00:10:53
    are going to run this code Python local.
  • 00:10:55
    py and then click enter now you can see
  • 00:10:57
    it's saving the tokenizer and the model in GG
  • 00:11:00
    UF format it'll try to save in various
  • 00:11:03
    quantization methods as you can see here
  • 00:11:06
    it is going through various steps
  • 00:11:08
    currently it's working on Q4_K_M
  • 00:11:11
    quantization and in our code we are
  • 00:11:13
    using three different quantizations so
  • 00:11:15
    next it will go through Q8_0 and Q5_K_M now
  • 00:11:18
    we can see all the versions such as Q5_K_M
  • 00:11:22
    Q4_K_M Q8_0 everything got uploaded to
  • 00:11:27
    this location and you can see those GGUF
  • 00:11:30
    format here uploaded in this location
  • 00:11:33
    next we are going to see how we can
  • 00:11:34
    create a model that is an Ollama model using a
  • 00:11:36
    Modelfile next create a model file M
  • 00:11:39
    o-d-e-l-f-i-l-e this is for Ollama then inside
  • 00:11:44
    that file you can see I mentioned the
  • 00:11:46
    path where my GGUF file got stored which
  • 00:11:50
    you can see in my folder structure the
  • 00:11:53
    GGUF got stored in this location in this
  • 00:11:55
    path and you can see the list of files
  • 00:11:58
    here so you can even just right
  • 00:11:59
    click copy relative path then paste that
  • 00:12:02
    here in this location that's it so these
  • 00:12:05
    are default templates which I'm using
  • 00:12:08
    you can also copy the same template and
  • 00:12:10
    create this model file now after this in
  • 00:12:12
    your terminal I'm using Linux so I'm
  • 00:12:16
    using this command to install Ollama but
  • 00:12:19
    in your case it could be your own Mac
  • 00:12:22
    computer or desktop so you can directly
  • 00:12:25
    download from their own website so you
  • 00:12:27
    can see it's downloading Ollama and it is
  • 00:12:29
    running in this URL now after this
  • 00:12:32
    ollama create hyphen f Modelfile that is
  • 00:12:36
    the Modelfile and then the path to the
  • 00:12:39
    model me is my username which I created
  • 00:12:42
    from ollama.com just go to ollama.com and you
  • 00:12:45
    should be able to sign in create your
  • 00:12:48
    own account to publish and share model
  • 00:12:51
    in Ollama so that is my username and the
  • 00:12:54
    model name and then click enter now the
  • 00:12:57
    model got created next we run ollama run
  • 00:13:00
    to test the model that is ollama run me/
  • 00:13:04
    llama 3.1 hyphen python and click enter
  • 00:13:07
    now the model Got Loaded I can say
  • 00:13:09
    create a function to add these numbers 1
  • 00:13:13
    2 3 4 5 click enter and it's able to
  • 00:13:16
    generate the function now clicking back
  • 00:13:18
    slash exit to exit the final step is to
  • 00:13:22
    push that model to Ollama to do that you
  • 00:13:24
    might need to generate your SSH key using
  • 00:13:27
    this and then click enter enter so now
  • 00:13:30
    the key got created in this location I
  • 00:13:32
    might need to move this to a different
  • 00:13:34
    location as well so I'm going to type
  • 00:13:37
    sudo and copying the saved private
  • 00:13:40
    key in this location after this click
  • 00:13:43
    enter now it got copied now you need to
  • 00:13:46
    get the public Key by typing this
  • 00:13:49
    command and then click enter so now
  • 00:13:51
    you're going to copy this public key go
  • 00:13:53
    to Ollama after logging in go to settings
  • 00:13:57
    Ollama keys there you should be able
  • 00:13:59
    to add your public key click that add
  • 00:14:02
    the key and click add that's it now
  • 00:14:04
    coming back to our terminal ollama push me
  • 00:14:08
    and the model name and then click enter
  • 00:14:10
    this will automatically save the model
  • 00:14:13
    remotely in Ollama now it's all completed
  • 00:14:17
    by going to my models you should be able
  • 00:14:20
    to see your model listed there updated
  • 00:14:22
    just now now you can run this command
  • 00:14:25
    anywhere on any computer and run the
  • 00:14:31
    model which you have just trained as
  • 00:14:31
    simple as that now you are able to
  • 00:14:33
    create your own model train your own
  • 00:14:36
    model with your own custom data save
  • 00:14:38
    that in hugging face save that in Ollama
  • 00:14:41
    so that anyone can use it I'm really
  • 00:14:43
    excited about this I'm going to create
  • 00:14:45
    more videos similar to this so stay
  • 00:14:46
    tuned I hope you like this video do like
  • 00:14:49
    share and subscribe and thanks for
  • 00:14:50
    watching
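
Code Sketches

The sketches below reconstruct, step by step, the code narrated in the transcript. They are minimal approximations under stated assumptions, not the presenter's verbatim code (which is linked in the video description), and they build on the model and tokenizer loaded in the Timeline sketch above.

Step 1, configuration (around 00:03:15): the Alpaca-style prompt template pairs an instruction and an input with an expected response. The video's exact wording is assumed to match this widely used form.

```python
# Standard Alpaca prompt template (assumed to match the video's template).
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""
```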
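
Steps 2 and 5, inference before and after training (00:03:56 and 00:07:48): the same instruction and input are tokenized (large language models operate on token ids, i.e. numbers), then generated with streaming so the before/after outputs can be compared.

```python
from transformers import TextStreamer

FastLanguageModel.for_inference(model)  # enable Unsloth's faster inference path

# The example instruction and input used in the video.
inputs = tokenizer(
    [alpaca_prompt.format(
        "Create a function to calculate the sum of a sequence of integers.",
        "[1, 2, 3, 4, 5]",
        "",  # response left empty; the model fills it in
    )],
    return_tensors="pt",
).to("cuda")

streamer = TextStreamer(tokenizer)  # stream tokens as they are generated
_ = model.generate(**inputs, streamer=streamer, max_new_tokens=128)
```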
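
Step 3, loading the data (00:04:52): the transcript describes an 18,000-row Python code instruction dataset with instruction, input, and output columns; the Hugging Face dataset id below is an assumption that matches that description. Each row is folded into the prompt template, with the tokenizer's end-of-sequence token appended so the model learns where a response ends.

```python
from datasets import load_dataset

EOS_TOKEN = tokenizer.eos_token  # end-of-sentence token, as in the transcript

def formatting_prompts_func(examples):
    # Merge the instruction, input, and output columns into one "text" field.
    texts = [
        alpaca_prompt.format(instruction, inp, out) + EOS_TOKEN
        for instruction, inp, out in zip(
            examples["instruction"], examples["input"], examples["output"]
        )
    ]
    return {"text": texts}

# Assumed dataset id; the transcript only says "python code instruction dataset".
dataset = load_dataset("iamtarun/python_code_instructions_18k_alpaca", split="train")
dataset = dataset.map(formatting_prompts_func, batched=True)
```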
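
Step 4, training (00:06:40): get_peft_model wraps the model so that only a small set of LoRA adapter weights is trained (PEFT), and SFTTrainer runs the job. The transcript confirms 100 steps and the outputs folder; the optimizer and remaining hyperparameters are Unsloth's commonly documented defaults, assumed here.

```python
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

# Train only LoRA adapter weights, not all model parameters.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank -- assumed value
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",     # the merged prompt column built in step 3
    max_seq_length=max_seq_length,
    args=TrainingArguments(
        per_device_train_batch_size=2,   # assumed
        gradient_accumulation_steps=4,   # assumed
        warmup_steps=5,                  # assumed
        max_steps=100,                   # the transcript uses 100 steps
        learning_rate=2e-4,              # assumed
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=1,
        optim="adamw_8bit",              # assumed default
        output_dir="outputs",            # the outputs folder, as in the video
    ),
)

trainer_stats = trainer.train()  # the main training call
```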
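
The optional GPU and memory monitoring mentioned around 00:07:20 can be approximated with torch's CUDA statistics:

```python
import torch

gpu = torch.cuda.get_device_properties(0)
reserved_gb = round(torch.cuda.max_memory_reserved() / 1024**3, 3)
total_gb = round(gpu.total_memory / 1024**3, 3)
print(f"{gpu.name}: {reserved_gb} GB reserved of {total_gb} GB total")
```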
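
Step 6, saving (00:08:12): save_pretrained and push_to_hub store only the LoRA adapter; Unsloth's merged variant folds the adapter into the base weights, as the transcript describes. The repository id is a placeholder.

```python
# Save the LoRA adapter locally and push it to the Hub.
model.save_pretrained("lora_model")
tokenizer.save_pretrained("lora_model")
model.push_to_hub("your-username/llama3.1-python")      # placeholder repo id
tokenizer.push_to_hub("your-username/llama3.1-python")

# Merge the adapter into the main model before uploading full weights.
model.push_to_hub_merged("your-username/llama3.1-python",
                         tokenizer, save_method="merged_16bit")
```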
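
Creating the GGUF files (00:10:36): the transcript lists three quantization methods, Q4_K_M, Q8_0, and Q5_K_M. Unsloth's GGUF helpers handle the conversion; the repo id is again a placeholder.

```python
# Export GGUF locally with one method, then push all three to the Hub.
model.save_pretrained_gguf("model", tokenizer, quantization_method="q4_k_m")
model.push_to_hub_gguf("your-username/llama3.1-python", tokenizer,
                       quantization_method=["q4_k_m", "q8_0", "q5_k_m"])
```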
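
Finally, the Ollama steps (00:11:36 onward). The Modelfile points at the exported GGUF file; the FROM path and TEMPLATE below are assumptions, so copy your actual relative path as shown in the video.

```
# Modelfile -- path and template are assumptions; use your own GGUF path.
FROM ./model/unsloth.Q4_K_M.gguf

TEMPLATE """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{{ .Prompt }}

### Response:
"""
```

The terminal commands then mirror the transcript; "me" is the presenter's Ollama username and "llama3.1-python" the model name, so substitute your own.

```
# Linux install, as in the video; on Mac or Windows, download from ollama.com.
curl -fsSL https://ollama.com/install.sh | sh

ollama create me/llama3.1-python -f Modelfile   # create the Ollama model
ollama run me/llama3.1-python                   # test it interactively
ollama push me/llama3.1-python                  # publish to ollama.com

# Before pushing, add your Ollama public key at ollama.com -> Settings -> Keys.
# A common per-user key location (it varies by install):
cat ~/.ollama/id_ed25519.pub
```
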
Tags
  • Llama 3.1
  • Fine-tuning
  • Custom data
  • Python generation
  • AI
  • Hugging Face
  • Ollama
  • Training process
  • Model saving
  • Artificial Intelligence