AWS re:Invent 2023 - Train and tune state-of-the-art ML models on Amazon SageMaker (AIM335)

00:54:18
https://www.youtube.com/watch?v=i2-M7x9dJXQ

Summary

TL;DR: The presentation highlights the evolution and challenges of training state-of-the-art machine learning models using Amazon SageMaker. Gal Oshri introduces the session and discusses the growing interest in deep learning applications, along with the benefits of using SageMaker to handle large-scale model training challenges, including infrastructure orchestration, data management, and cost efficiency. Emily Webber elaborates on fine-tuning and pre-training large language models with practical demonstrations, emphasizing the ease of use and scalability of SageMaker. Tom Kollar shares insights from the Toyota Research Institute, detailing their application of SageMaker for various machine learning tasks, particularly in robotics, and highlights the importance of robust training infrastructure. The session underscores advancements like smart sifting and distributed training, showcasing SageMaker as a versatile tool for model development.

Key Takeaways

  • 👤 Gal Oshri introduces SageMaker and its capabilities.
  • 📊 Emily discusses fine-tuning large language models with a live demo.
  • 🚗 Tom shares insights from Toyota Research on using SageMaker in robotics.
  • ⚙️ The challenges of training large models include hardware utilization and cost efficiency.
  • 📦 Amazon SageMaker streamlines the training process with its estimator API.
  • 📈 Smart sifting can reduce training time by filtering uninformative samples.
  • 💾 SageMaker allows seamless integration of various data sources for training.
  • 🔧 The importance of cluster repair features for uninterrupted training.
  • 🌐 Advances in transformer architecture improve model performance significantly.
  • 🔍 Customers can securely customize third-party models without exposing their data.

Timeline

  • 00:00:00 - 00:05:00

    Gal Oshri introduces himself and his colleagues, presenting on training and tuning state-of-the-art ML models on Amazon SageMaker. The audience is polled on how many are currently training ML models, especially at larger scales. The talk outlines the agenda, including challenges in large-scale model training, demonstrations, and research use cases.

  • 00:05:00 - 00:10:00

    Machine learning is showcased as a versatile tool across various applications like recommendations and autonomous driving, highlighting recent advances in image generation from ML models. Gal discusses how algorithmic improvements, particularly the transformer architecture, and increased availability of data and compute power have contributed to the enhancement of model outputs over recent years.

  • 00:10:00 - 00:15:00

    Challenges in training large-scale ML models are detailed, including the need for efficient hardware, proper orchestration, dataset management, scaling infrastructure, and cost management. Gal emphasizes the importance of optimizing both financial costs and team resources, advocating for effective use of SageMaker in addressing these challenges.

  • 00:15:00 - 00:20:00

    Amazon SageMaker is presented as a solution to the aforementioned challenges. A high-level overview of SageMaker's operation is provided, detailing how it streamlines the training process through a structured API that manages compute resources, health checks, and data integration, ensuring efficient model training and cost-effectiveness.

  • 00:20:00 - 00:25:00

    SageMaker's capabilities for loading data, using built-in algorithms, and providing distributed training options are discussed. Features like logging, checkpoint synchronization, and instance management further highlight how SageMaker facilitates efficient training while ensuring resiliency against failures during training jobs.

  • 00:25:00 - 00:30:00

    Model training efficiency and performance tracking with SageMaker tools like the profiler are discussed. The profiler helps diagnose GPU usage and performance issues, which is critical for minimizing overall training cost and time through better hardware utilization.

  • 00:30:00 - 00:35:00

    Gal introduces the concept of smart sifting, a new feature in SageMaker aimed at improving training efficiency by refining data during training, thus reducing overall training time and costs by up to 35% without negatively impacting model accuracy. He shows how to implement this feature simply using SageMaker's framework.

  • 00:35:00 - 00:40:00

    Amazon SageMaker HyperPod is announced for managing large-scale training clusters. It provides SageMaker's managed benefits, such as resilience and reduced setup time, without sacrificing performance, while allowing easier direct access to and control over the training environment.

  • 00:40:00 - 00:45:00

    The discussion shifts to fine-tuning models, emphasizing how it can be executed on proprietary models without exposing sensitive data. New security enhancements in SageMaker facilitate this process while maintaining the model’s confidentiality, showcasing an efficient end-to-end workflow.

  • 00:45:00 - 00:54:18

    Emily transitions to her segment focused on fine-tuning and pre-training large language models (LLMs) on SageMaker. She explains various customization techniques for LLMs, from simpler adjustments like prompt engineering to more complex methods like retrieval-augmented generation and pre-training new foundation models.


Video Q&A

  • What is Amazon SageMaker?

    Amazon SageMaker is a fully managed service that provides tools to build, train, and deploy machine learning models quickly and efficiently.

  • What are the benefits of using SageMaker for training models?

    SageMaker helps automate the orchestration of training jobs, monitors hardware health, offers distributed training libraries, and provides tools for monitoring and managing costs.

  • How does SageMaker assist with large model training?

    It offers tools for easy scaling of infrastructure, enables distributed training, and ensures fault tolerance through cluster repair features.

  • What is smart sifting in the context of SageMaker?

    Smart sifting is a technique that filters out less informative training data, potentially reducing training time and costs without impacting model accuracy.

  • How does Toyota Research Institute utilize SageMaker?

    TRI uses SageMaker for various applications including training large language models, robotics projects, and serving models in production.

  • What advancements have impacted deep learning in recent years?

    Significant algorithmic improvements, particularly the introduction of transformer architecture, along with increased datasets, model sizes, and computational resources.

  • How can one get started with SageMaker?

    Users can easily initiate a training job by using the estimator API, which requires minimal setup and configuration.

  • What types of customization can be done on large language models?

    Models can be customized through techniques like prompt engineering, retrieval augmented generation, fine-tuning, and pre-training.

  • What role does model parallel training play in SageMaker?

    Model parallel training allows the distribution of a neural network over multiple GPUs or accelerators to optimize performance.

  • Can existing models be fine-tuned on SageMaker?

    Yes, SageMaker allows users to fine-tune third-party models on their data securely without exposing their data to the model provider.

Transcript (en)
  • 00:00:00
    - Good afternoon everyone.
  • 00:00:01
    My name is Gal Oshri, I'm a product manager
  • 00:00:03
    at AWS working on SageMaker.
  • 00:00:06
    I'm here with Emily Webber and Thomas Kollar,
  • 00:00:08
    to talk to you about training and tuning state-of-the-art
  • 00:00:11
    machine learning models on Amazon SageMaker.
  • 00:00:14
    Before we start, how many of you are already training
  • 00:00:16
    machine learning models today?
  • 00:00:20
    Awesome.
  • 00:00:20
    How many of you are training models
  • 00:00:21
    with more than 10 GPUs or accelerators?
  • 00:00:25
    All right, anyone with more than a hundred?
  • 00:00:29
    Alright, a thousand?
  • 00:00:31
    No.
  • 00:00:32
    Alright, cool, well, today we'll learn a bit about that.
  • 00:00:37
    So we'll talk about the challenges,
  • 00:00:39
    for training large scale machine learning models.
  • 00:00:42
    And then we'll talk about how SageMaker
  • 00:00:43
    can help you train those models.
  • 00:00:45
    Emily will then talk to you about fine tuning
  • 00:00:47
    and pre-training large language models,
  • 00:00:50
    and show you a demo, training Llama 7B on SageMaker.
  • 00:00:54
    And then we'll hear from Tom,
  • 00:00:55
    about Toyota Research Institute
  • 00:00:57
    and their machine learning use cases.
  • 00:01:03
    So machine learning has already proven itself useful
  • 00:01:05
    across a wide range of applications.
  • 00:01:07
    From recommendations, to credit risk prediction,
  • 00:01:10
    and autonomous driving, to document analysis.
  • 00:01:14
    But recently there's been an explosion in interest
  • 00:01:16
    in deep learning models for computer vision
  • 00:01:19
    and natural language processing.
  • 00:01:22
    Just a few years ago, this is the type of image you would
  • 00:01:25
    get if you tried to generate a very clean living room
  • 00:01:28
    with a machine learning model.
  • 00:01:30
    You can tell that it's fake, it's not really coherent,
  • 00:01:32
    you can't tell that it's a living room.
  • 00:01:36
    And just a few years later, we can now generate images
  • 00:01:39
    like this, where you'd have to look really closely
  • 00:01:42
    to tell that it's not a real image.
  • 00:01:44
    And I showed something similar a year ago at re:Invent,
  • 00:01:47
    and at that time this was kind of shocking, right,
  • 00:01:49
    seeing this type of image get generated from a model.
  • 00:01:52
    But a year later, and I think many people in the audience
  • 00:01:54
    have already seen these types of images get generated,
  • 00:01:57
    and the quality that you can get
  • 00:01:58
    with machine learning models.
  • 00:02:01
    So how did this happen?
  • 00:02:03
    Well first, there were notable algorithmic improvements
  • 00:02:05
    over the last few years.
  • 00:02:07
    Specifically the transformer architecture,
  • 00:02:09
    which is used in many of the large scale models
  • 00:02:12
    that you hear about today.
  • 00:02:14
    However, there's also an increase in the data sets,
  • 00:02:17
    the model sizes, and the amount of compute that is used to
  • 00:02:20
    train these models.
  • 00:02:22
    And a lot of the research shows that we can continue
  • 00:02:24
    increasing these dimensions,
  • 00:02:26
    to get better and better results.
  • 00:02:29
    So to be competitive, you really have to think about
  • 00:02:31
    how do you leverage these advancements
  • 00:02:34
    to provide the best experiences
  • 00:02:35
    for your customers with machine learning.
  • 00:02:39
    Okay, so training large scale models is awesome.
  • 00:02:41
    Let's just train the biggest one immediately
  • 00:02:43
    and be done with it, right?
  • 00:02:45
    But, it's a bit more complicated than that.
  • 00:02:47
    There's some challenges.
  • 00:02:49
    The first, is that you want to use the latest hardware.
  • 00:02:52
    Every few years there are innovations in hardware
  • 00:02:54
    that lead to two to nine x improvements in training
  • 00:02:57
    efficiency.
  • 00:02:59
    But it's not enough to get access to the latest hardware,
  • 00:03:02
    you have to think about how well it works.
  • 00:03:04
    Is it fault resistant enough to let you
  • 00:03:08
    continue your training with minimal interruptions
  • 00:03:10
    to the machine learning team?
  • 00:03:13
    You have to think about orchestration,
  • 00:03:15
    and how to most effectively use the
  • 00:03:16
    resources you have available.
  • 00:03:18
    Especially if you have a large team of data scientists
  • 00:03:21
    who want to train many models in parallel.
  • 00:03:24
    We talked about how you want to have larger data sets,
  • 00:03:27
    and being able to store, load, and process
  • 00:03:29
    these large data sets, can require a lot of work.
  • 00:03:33
    And there are a lot of pitfalls in doing that.
  • 00:03:36
    You wanna think about scaling up.
  • 00:03:38
    Both the infrastructure, to get more compute for training
  • 00:03:41
    the model, as well as the algorithms that you use.
  • 00:03:44
    The models that we train today, for these use cases,
  • 00:03:47
    do not fit on a single accelerator.
  • 00:03:49
    So you have to think about the algorithms
  • 00:03:50
    that you need to use to scale up.
  • 00:03:53
    And finally, we have to think about cost.
  • 00:03:56
    Training these models can cost hundreds of thousands,
  • 00:03:58
    or millions of dollars.
  • 00:04:00
    So you need to think about efficiency when you're training
  • 00:04:02
    those models.
  • 00:04:03
    Especially at the beginning, when you're doing
  • 00:04:04
    sporadic experimentation, you're trying out different ideas,
  • 00:04:08
    and you don't use the hardware all the time.
  • 00:04:11
    Right, so you want to think about how you use that
  • 00:04:12
    efficiently.
  • 00:04:14
    And it's not just the financial cost,
  • 00:04:15
    but the team's time, right?
  • 00:04:17
    A lot of customers tell us, that making sure that their
  • 00:04:20
    ML engineers are not spending time dealing
  • 00:04:22
    with infrastructure, is one of their top priorities.
  • 00:04:27
    But not all hope is lost.
  • 00:04:29
    Amazon SageMaker can help with many of these challenges.
  • 00:04:34
    I'll give a high level overview of how SageMaker works,
  • 00:04:37
    but you'll see it in a lot more detail during Emily's demo.
  • 00:04:41
    We start, by calling the create training job API.
  • 00:04:45
    This SageMaker API captures information about your dataset,
  • 00:04:48
    your compute resource configuration,
  • 00:04:50
    as well as the training algorithm that you want to use.
  • 00:04:54
    SageMaker will then set up the cluster
  • 00:04:56
    for training the model,
  • 00:04:58
    with the right VPC and networking configurations,
  • 00:05:01
    by default, to save you a lot of time,
  • 00:05:03
    but you can configure all of it yourself as well
  • 00:05:06
    and add the flexibility that you need.
  • 00:05:09
    As part of spinning up the cluster,
  • 00:05:11
    SageMaker will also run health checks on the hardware,
  • 00:05:14
    to make sure that everything is working effectively,
  • 00:05:16
    before the job even begins, and before the billing starts.
  • 00:05:20
    So this saves you time and money,
  • 00:05:22
    to make sure that the training can continue efficiently.
  • 00:05:28
    SageMaker will then load data from S3, EFS,
  • 00:05:31
    or FSx for Lustre, and you have options to either copy
  • 00:05:35
    or stream the data.
  • 00:05:36
    Depending on your dataset size, one of those might be
  • 00:05:38
    more applicable.
  • 00:05:40
    But again, and you'll hear this theme again and again,
  • 00:05:42
    while SageMaker provides great options for
  • 00:05:45
    getting started and moving quickly,
  • 00:05:47
    you also have the flexibility to do what you want,
  • 00:05:50
    and load data from other sources.
  • 00:05:54
    You can then download the training image from ECR,
  • 00:05:57
    and you have options of built-in algorithms in SageMaker.
  • 00:06:01
    You can use one of the SageMaker deep learning containers
  • 00:06:03
    to quickly use PyTorch, TensorFlow, or Hugging Face,
  • 00:06:06
    or you can bring your own training image,
  • 00:06:09
    with your own algorithms completely.
  • 00:06:12
    SageMaker also offers distributed training libraries
  • 00:06:14
    that can help accelerate both data, and model parallel
  • 00:06:17
    training, and you'll hear more about that later.
  • 00:06:22
    SageMaker then starts the training, and streams the logs
  • 00:06:24
    to CloudWatch throughout training.
  • 00:06:27
    It stores the metadata and hyperparameters,
  • 00:06:29
    so you can view it later.
  • 00:06:30
    And you have, again, options for using TensorBoard,
  • 00:06:32
    and other tools, to visualize your experiments.
  • 00:06:37
    It will synchronize your checkpoints throughout training,
  • 00:06:39
    to your storage, which is critical if you
  • 00:06:43
    want to be fault resistant in case something fails
  • 00:06:46
    during training, you don't want to lose your progress
  • 00:06:48
    until that point.
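That checkpoint flow can be sketched in plain Python. A minimal sketch, assuming checkpoints are written under /opt/ml/checkpoints (the local directory SageMaker syncs to your S3 storage during training); the JSON format and file naming here are illustrative, not the service's:

```python
import json
import os

# SageMaker continuously syncs this local directory to the S3 location
# configured on the training job, so files written here survive failures.
CHECKPOINT_DIR = "/opt/ml/checkpoints"


def save_checkpoint(state, epoch, checkpoint_dir=CHECKPOINT_DIR):
    """Write one checkpoint file per epoch for SageMaker to upload."""
    os.makedirs(checkpoint_dir, exist_ok=True)
    path = os.path.join(checkpoint_dir, f"epoch-{epoch:04d}.json")
    with open(path, "w") as f:
        json.dump({"epoch": epoch, "state": state}, f)
    return path


def load_latest_checkpoint(checkpoint_dir=CHECKPOINT_DIR):
    """On restart (e.g. after a cluster repair), resume from the newest file."""
    if not os.path.isdir(checkpoint_dir):
        return None
    files = sorted(f for f in os.listdir(checkpoint_dir) if f.endswith(".json"))
    if not files:
        return None
    with open(os.path.join(checkpoint_dir, files[-1])) as f:
        return json.load(f)
```

On restart, the training script calls load_latest_checkpoint first and continues from that epoch instead of starting over.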
  • 00:06:51
    At the end of the training, SageMaker will save the model
  • 00:06:54
    and other output data so you can revisit it later.
  • 00:06:58
    At the end of the training, SageMaker spins down all the
  • 00:07:01
    compute, so that if the job fails at 3:00 AM,
  • 00:07:04
    no one has to wake up to turn anything off
  • 00:07:06
    and make sure that you're not paying for all that hardware,
  • 00:07:09
    all those instances running,
  • 00:07:10
    without being used to train a model.
  • 00:07:13
    And with the same paradigm,
  • 00:07:15
    we can actually scale up our training
  • 00:07:17
    to many more instances really easily,
  • 00:07:19
    to get those large scale models.
  • 00:07:23
    One really awesome feature that we launched this year,
  • 00:07:26
    is a cluster repair feature.
  • 00:07:28
    So if during the training, one of the instances fails,
  • 00:07:31
    we look at what happened to that instance
  • 00:07:34
    and decide whether we need to reboot it
  • 00:07:36
    or replace it with a different instance,
  • 00:07:38
    and then restart the training within a few minutes.
  • 00:07:41
    So there are all these resiliency capabilities
  • 00:07:43
    to ensure the training continues as quickly as possible
  • 00:07:46
    and without manual intervention.
  • 00:07:52
    In case any of that sounds intimidating,
  • 00:07:54
    the good news, it's actually really easy to get started.
  • 00:07:57
    The most important code for converting model training
  • 00:07:59
    to a SageMaker training job, is the estimator API.
  • 00:08:04
    You'll see more in the demo later,
  • 00:08:05
    but at a high level, the API takes a Python file
  • 00:08:09
    or an entry point, in this case cifar10.py,
  • 00:08:13
    which is very similar to how I would do the model training
  • 00:08:16
    on my laptop.
  • 00:08:18
    I also provide the instance type I want to use,
  • 00:08:21
    how many of them, and hyperparameters,
  • 00:08:24
    which I can easily change later
  • 00:08:25
    to try additional training jobs.
  • 00:08:28
    I also add metric definitions, so that I can view
  • 00:08:31
    those metrics in CloudWatch during the training.
  • 00:08:35
    Finally, I provide a path to my data and call estimator.fit.
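The pieces Gal lists map onto the estimator call roughly as follows. This is a minimal sketch that only assembles the configuration into one place so the shape is easy to see; the real call uses the SageMaker Python SDK's estimator classes (e.g. a PyTorch estimator with entry_point, instance settings, hyperparameters, and metric_definitions, followed by estimator.fit on the data path), and the instance type, bucket, and metric regex below are illustrative placeholders:

```python
def build_training_config(entry_point, instance_type, instance_count,
                          hyperparameters, metric_definitions, data_path):
    """Collect the arguments the estimator API takes into one request."""
    return {
        "entry_point": entry_point,                # your existing training script
        "instance_type": instance_type,            # which hardware to use
        "instance_count": instance_count,          # scale out by raising this
        "hyperparameters": hyperparameters,        # easy to vary across jobs
        "metric_definitions": metric_definitions,  # regexes surfaced in CloudWatch
        "inputs": {"training": data_path},         # S3/EFS/FSx data location
    }


# Illustrative values only; the talk does not specify these.
config = build_training_config(
    entry_point="cifar10.py",
    instance_type="ml.p3.2xlarge",
    instance_count=2,
    hyperparameters={"epochs": 10, "lr": 0.01},
    metric_definitions=[{"Name": "train:loss",
                         "Regex": "loss: ([0-9\\.]+)"}],
    data_path="s3://my-bucket/cifar10",
)
```

Changing hyperparameters or instance_count and resubmitting is how you try additional training jobs without touching the training script itself.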
  • 00:08:42
    Now the even better news, is that we recently made it
  • 00:08:44
    even easier to get started.
  • 00:08:47
    Now, you can take your existing Python code
  • 00:08:50
    and add the remote Python decorator to it,
  • 00:08:53
    to immediately serialize the runtime,
  • 00:08:56
    the packages, functions, and everything else,
  • 00:08:58
    so that it runs as a SageMaker training job
  • 00:09:01
    without even having to learn about the estimator API.
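The decorator pattern described here can be mimicked in a few lines. This is a toy stand-in, not the SageMaker SDK's @remote decorator: it only serializes the invocation the way a remote runner would and then executes it locally, whereas the real decorator also captures the runtime, packages, and dependencies and runs the function as a SageMaker training job:

```python
import pickle


def remote(func):
    """Toy stand-in for a remote-execution decorator."""
    def wrapper(*args, **kwargs):
        # Serialize the call (function name + arguments), as a remote
        # runner would before shipping it to a training job.
        payload = pickle.dumps((func.__name__, args, kwargs))
        # ... in the real service, the payload runs on remote compute ...
        _name, a, kw = pickle.loads(payload)
        return func(*a, **kw)
    return wrapper


@remote
def train(epochs, lr=0.1):
    # Placeholder for your existing training function.
    return {"epochs": epochs, "lr": lr, "status": "completed"}
```

The appeal is that train() is called exactly as before; only the decorator line changes where it runs.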
  • 00:09:07
    Once the training job begins,
  • 00:09:08
    you can easily view the metadata and reproduce it later,
  • 00:09:12
    or clone that training job.
  • 00:09:14
    And you know, like tracking experiments
  • 00:09:15
    is extremely important.
  • 00:09:17
    You want to learn from what you've done before.
  • 00:09:19
    And we often see people who keep the training results
  • 00:09:22
    in a spreadsheet, or even in a document,
  • 00:09:24
    and pass it around the team,
  • 00:09:25
    but that makes it more difficult to collaborate
  • 00:09:28
    and learn from past experiments.
  • 00:09:30
    So by automatically keeping all of this in one place,
  • 00:09:33
    it becomes much easier to learn from your mistakes
  • 00:09:36
    and build better models.
  • 00:09:42
    Now let's move on from tracking the training,
  • 00:09:44
    to improving the performance,
  • 00:09:46
    specifically the training speed, which impacts how much time
  • 00:09:49
    you end up requiring to use the instances
  • 00:09:52
    to train the model, and the overall project completion time,
  • 00:09:56
    as well as the cost.
  • 00:09:58
    Now, the SageMaker profiler, is an ML observability tool
  • 00:10:01
    that enables you to understand hardware utilization
  • 00:10:04
    and root cause performance issues
  • 00:10:06
    to maximize the efficiency of your model training.
  • 00:10:09
    On the dashboard shown here, we can see some overall metrics
  • 00:10:13
    around the GPU usage, and you want that to be as high
  • 00:10:16
    as possible, as well as the GPU usage throughout
  • 00:10:19
    the training job, across each individual node
  • 00:10:22
    within your cluster.
  • 00:10:24
    So you can see that even if your utilization overall might
  • 00:10:27
    be high, within some intervals, there might be
  • 00:10:29
    low utilization that you want to check out a bit more.
  • 00:10:34
    Lower down on the dashboard,
  • 00:10:35
    there are other metrics.
  • 00:10:36
    For example, the total time spent on each GPU kernel.
  • 00:10:40
    So that gives you additional hints about what you want
  • 00:10:42
    to optimize next, to further improve your training.
  • 00:10:48
    There's another page in the profiler,
  • 00:10:49
    showing a more detailed timeline view,
  • 00:10:52
    that allows you to get data from your host and devices
  • 00:10:56
    at all the different levels,
  • 00:10:57
    so you can dig deeper to understand
  • 00:10:59
    what is happening at each point.
  • 00:11:04
    Now I'm excited to announce a preview of a new capability
  • 00:11:07
    in SageMaker, smart sifting of data.
  • 00:11:10
    Smart sifting is an online data refinement technique
  • 00:11:12
    that can reduce your deep learning training time and cost
  • 00:11:15
    by up to 35%.
  • 00:11:18
    Now when you train your models,
  • 00:11:20
    and we talked about wanting to use larger data sets,
  • 00:11:23
    it also matters about the quality of the data sets.
  • 00:11:26
    And, some samples in your data might be less informative
  • 00:11:30
    to your model training,
  • 00:11:31
    or you might have seen those samples already.
  • 00:11:33
    There might be duplicate data or similar data.
  • 00:11:36
    And it's often difficult to pre-process that data
  • 00:11:39
    and remove the data that you don't want
  • 00:11:41
    in the training anymore.
  • 00:11:42
    So smart sifting helps, because it analyzes your data
  • 00:11:45
    during the training job, and filters out
  • 00:11:47
    the low loss samples, which are less informative
  • 00:11:50
    to the model.
  • 00:11:52
    By training on a subset of your data,
  • 00:11:53
    you can reduce the time and cost of the training
  • 00:11:56
    by up to 35%.
  • 00:11:58
    And because it only filters out the low loss samples,
  • 00:12:02
    it has minimal or no impact on the final training accuracy.
  • 00:12:08
    And it's easy to get started,
  • 00:12:09
    because it does not require you to make changes
  • 00:12:11
    to your data or training pipeline.
  • 00:12:16
    Here's a simple example that uses smart sifting.
  • 00:12:20
    We use the SageMaker deep learning container,
  • 00:12:22
    we load the sifting data loader,
  • 00:12:25
    and then we wrap whatever existing data loader we use
  • 00:12:28
    with the sifting data loader,
  • 00:12:29
    and provide a bit more configuration to start using it.
  • 00:12:33
    I don't need to change the rest of my model
  • 00:12:35
    or data pipeline.
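The wrapping pattern described here can be sketched in plain Python. A toy stand-in, not the actual SageMaker feature (which ships in the deep learning container and estimates sample loss internally); loss_fn and the threshold below are illustrative assumptions:

```python
class SiftingDataLoader:
    """Wrap an existing data loader and drop low-loss samples on the fly."""

    def __init__(self, base_loader, loss_fn, loss_threshold):
        self.base_loader = base_loader      # any iterable of samples
        self.loss_fn = loss_fn              # scores how informative a sample is
        self.loss_threshold = loss_threshold

    def __iter__(self):
        for sample in self.base_loader:
            # Keep only samples the model still finds hard (high loss);
            # low-loss samples teach it little and are sifted out.
            if self.loss_fn(sample) >= self.loss_threshold:
                yield sample


# Toy usage: each sample carries a precomputed "loss" score.
samples = [{"x": 1, "loss": 0.9}, {"x": 2, "loss": 0.05},
           {"x": 3, "loss": 0.7}, {"x": 4, "loss": 0.01}]
sifted = list(SiftingDataLoader(samples, lambda s: s["loss"], 0.1))
```

Because the wrapper exposes the same iteration interface as the loader it wraps, the rest of the training loop is untouched, which is the point Gal makes above.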
  • 00:12:38
    And we already have customers who are seeing great results
  • 00:12:40
    with this capability.
  • 00:12:42
    For example, LG AI research,
  • 00:12:44
    use it to get meaningful increase in training performance
  • 00:12:47
    without any changes to the model accuracy.
  • 00:12:54
    At re:Invent, we also announced Amazon SageMaker HyperPod.
  • 00:12:58
    This enables customers who are training large scale models
  • 00:13:00
    to get all the managed benefits of SageMaker
  • 00:13:03
    that we've been discussing today,
  • 00:13:05
    with a UX that they might be more familiar with,
  • 00:13:07
    of being able to access the instances directly,
  • 00:13:10
    using Slurm and so on.
  • 00:13:13
    It has similar resilient capabilities to what we discussed
  • 00:13:15
    for training, in terms of replacing faulty instances
  • 00:13:20
    and enabling the training to begin a bit more quickly,
  • 00:13:24
    saving up to 20% in time.
  • 00:13:27
    It also benefits from the optimized distributed training
  • 00:13:30
    libraries, that also improve performance,
  • 00:13:32
    for both model and data parallel training.
  • 00:13:37
    And I mentioned it provides more granular control
  • 00:13:40
    over the cluster in what you're doing,
  • 00:13:42
    being able to access the instances directly,
  • 00:13:45
    install additional software, and make any changes
  • 00:13:47
    that you want to the cluster, to be able to fine tune
  • 00:13:50
    your training a bit more.
  • 00:13:54
    Now, we've talked about training really large scale models,
  • 00:13:57
    but sometimes you don't need to do that.
  • 00:13:58
    Sometimes you just want to fine tune an existing model.
  • 00:14:01
    And that's beneficial if you have an existing model,
  • 00:14:04
    a foundation model, and you want to bring in your own data
  • 00:14:08
    to fine tune that model to a particular use case.
  • 00:14:12
    So, by bringing in your own data,
  • 00:14:14
    you're making the model better than if you were just using
  • 00:14:16
    an off the shelf foundation model.
  • 00:14:18
    But, it saves you a lot of time and money,
  • 00:14:20
    because you don't have to train
  • 00:14:21
    that whole model from scratch.
  • 00:14:24
    Now the challenge is, that some models are not open sourced.
  • 00:14:27
    Right, you can't download the model weights
  • 00:14:29
    and fine tune them yourselves in an existing
  • 00:14:31
    SageMaker training job.
  • 00:14:33
    But, this has changed with enhancements we've made
  • 00:14:35
    to SageMaker algorithms and model packages.
  • 00:14:39
    You can now easily and securely customize third party models
  • 00:14:42
    by fine tuning them on your private data.
  • 00:14:45
    This provides end-to-end security.
  • 00:14:47
    The model provider can provide their model without revealing
  • 00:14:50
    their model weights, and you as a customer,
  • 00:14:53
    can fine tune on that model by bringing in your own data,
  • 00:14:56
    without exposing that data to the model provider.
  • 00:14:59
    And the final model weights, after your fine tuning,
  • 00:15:02
    are also only available to you.
  • 00:15:06
    Now this can be done with a variety of models
  • 00:15:07
    and algorithms, for example, Cohere models.
  • 00:15:12
    And all this is easy to use, and done through
  • 00:15:14
    the Python SageMaker SDK, that we were discussing earlier,
  • 00:15:18
    and integrates with other SageMaker capabilities,
  • 00:15:20
    like SageMaker experiments and pipelines.
  • 00:15:25
    And of course, with SageMaker inference,
  • 00:15:27
    you can deploy the models at the end,
  • 00:15:29
    to use them for inference in production scenarios,
  • 00:15:33
    in a secure way.
  • 00:15:36
    I'll now hand it over to Emily, to talk about fine tuning
  • 00:15:38
    and pre-training LLMs on SageMaker.
  • 00:15:41
    - Alright, thanks Gal.
  • 00:15:43
    Great, so I hope you're as excited as I am about a lot
  • 00:15:46
    of these new launches, new features.
  • 00:15:48
    I should introduce myself.
  • 00:15:50
    My name's Emily Webber, I lead our generative AI
  • 00:15:53
    foundations technical field community here at AWS.
  • 00:15:57
    And in particular, some of those launches,
  • 00:16:00
    many of them came directly from conversations with you.
  • 00:16:03
    Actually, we were listening to customers
  • 00:16:05
    and chatting with you to understand key capabilities
  • 00:16:08
    that you wanted to see in our training stack,
  • 00:16:11
    and this led to a number of the features
  • 00:16:12
    that you've just learned about.
  • 00:16:16
    In any case, there are many ways to customize
  • 00:16:20
    a large language model.
  • 00:16:22
    Here I'm presenting them on two axes, right?
  • 00:16:25
    So at the bottom you have roughly complexity and then cost.
  • 00:16:29
    Obviously you wanna be closer to the left.
  • 00:16:32
    You want your LLM customization techniques
  • 00:16:35
    to be roughly easy, because then of course that's faster
  • 00:16:39
    for you to get started, and then that's less expensive.
  • 00:16:42
    However, there is a progression of these techniques.
  • 00:16:46
    Most customers will start with prompt engineering,
  • 00:16:51
    which is a nice way to easily improve
  • 00:16:54
    and then customize your large language model.
  • 00:16:56
    However, it's not as accurate as some of these
  • 00:16:59
    extra techniques that you can use.
  • 00:17:02
    Most customers will move from prompt engineering into
  • 00:17:06
    what we call a retrieval augmented generation stack,
  • 00:17:10
    where you have some set of data, you're converting
  • 00:17:13
    that data into embeddings, or that dense representation,
  • 00:17:17
    and then retrieving those documents
  • 00:17:19
    to interact with your consumers.
  • 00:17:22
    This then can transform, if you will,
  • 00:17:25
    into a fine tuning stack.
  • 00:17:27
    Actually, there's a bit of an overlap there,
  • 00:17:30
    but in any case, you can take, as Gal mentioned,
  • 00:17:33
    custom data, fine tune your model to add
  • 00:17:37
    that extra knowledge.
  • 00:17:39
    All of these techniques, however, pale in comparison
  • 00:17:43
    to the holy grail, which is pre-training,
  • 00:17:45
    which is creating a net new foundation model.
  • 00:17:49
    And so all of these techniques are available on SageMaker
  • 00:17:53
    and well supported by the stack.
  • 00:17:55
    So we're gonna learn how to move from
  • 00:17:59
    fine tuning into pre-training,
  • 00:18:01
    during our session here today.
  • 00:18:04
    Now, fine tuning small models is really impactful.
  • 00:18:09
    Here are a couple reasons why you would consider
  • 00:18:11
    fine tuning a small model.
  • 00:18:13
    The first is, of course, it's less expensive.
  • 00:18:16
    You're going to use a smaller dataset,
  • 00:18:18
    possibly a smaller model, and then still improve accuracy
  • 00:18:24
    because you're fine tuning this model,
  • 00:18:25
    but you're keeping your costs down.
  • 00:18:28
    When you're working with a smaller model,
  • 00:18:30
    such as something with 7 billion parameters,
  • 00:18:33
    this is inherently faster, because the model itself
  • 00:18:36
    is just physically smaller than some of those larger ones,
  • 00:18:40
    and so the training time is faster,
  • 00:18:43
    the inferencing time is faster,
  • 00:18:45
    which means you can train more models
  • 00:18:48
    and you can do more inferencing,
  • 00:18:50
    again, with that smaller object.
  • 00:18:53
    Because the object is smaller,
  • 00:18:55
    it's easier for you to manage.
  • 00:18:57
    And so, again, the storage requirements are smaller,
  • 00:19:01
    so it's easier for you to copy the model.
  • 00:19:03
    It's easier for you to put the model into your applications,
  • 00:19:07
    and your packages, and your CICD pipelines,
  • 00:19:10
    and your repositories.
  • 00:19:12
    Many customers inherently prefer the ownership that comes
  • 00:19:16
    with creating new models, particularly through fine tuning,
  • 00:19:20
    and then again, pre-training.
  • 00:19:21
    This allows you to increase the IP of your firm.
  • 00:19:25
    And then of course you have more deployment options when
  • 00:19:28
    you're fine tuning, again, that small model.
  • 00:19:32
    The more deployment options include serverless,
  • 00:19:34
    actually I have customers who create and then fine tune
  • 00:19:39
    these small 7 billion parameter models, compile them,
  • 00:19:43
    and then host them on Lambda, (chuckling)
  • 00:19:45
    and run them on serverless inferencing.
  • 00:19:47
    And so, absolutely, when you're working with these
  • 00:19:50
    tiny models, that are knowledgeable in small domains,
  • 00:19:55
    you have a lot of flexibility.
  • 00:19:58
    Pre-training is really best for extremely large data sets.
  • 00:20:03
    So when you have hundreds of GBs, or multiple terabytes
  • 00:20:07
    of custom language data that just really is not online,
  • 00:20:11
    if the language data that you have,
  • 00:20:15
    if it's not in Wikipedia, if it's not on Reddit,
  • 00:20:20
    if it's the core language that you're using,
  • 00:20:23
    if, when you take a sentence and try to put
  • 00:20:26
    that sentence into Wikipedia, for example,
  • 00:20:29
    if Wikipedia doesn't understand what you're trying to say,
  • 00:20:31
    you may wanna consider seriously customizing a language
  • 00:20:35
    model, and then possibly creating a new one from scratch.
  • 00:20:39
    Now, why is this the case?
  • 00:20:41
    Why is pre-training so powerful?
  • 00:20:44
    Part of this is because the pre-training loss function
  • 00:20:48
    is more generalizable.
  • 00:20:49
    So when you're creating that new foundation model
  • 00:20:53
    from scratch, the learning is slightly different.
  • 00:20:56
    It's more general, and it's deeper
  • 00:20:58
    in the neural network, actually.
  • 00:21:01
    Also, when you're creating a new foundation model,
  • 00:21:04
    you can do this without supervised data.
  • 00:21:07
    So you don't need to go label, you know, millions of records
  • 00:21:11
    in pre-training, you can just capture and tokenize
  • 00:21:14
    a terabyte of your own language data
  • 00:21:18
    and then throw that into the network.
  • 00:21:20
    There's no need to add additional supervision on top of
  • 00:21:22
    that, which makes it very attractive.
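Emily's "no supervision needed" point can be made concrete in a few lines: in causal language modeling, the pre-training target at each position is simply the next token of the raw text itself, so tokenized text is all you need. The tokens below are illustrative, not from the talk:

```python
# Toy illustration of self-supervised pre-training data: the "labels"
# are just the input sequence shifted by one token, so no human
# annotation is needed; raw tokenized text is enough.
tokens = ["the", "robot", "stocked", "the", "shelf"]

inputs, targets = tokens[:-1], tokens[1:]

# At each position the model is trained to predict the next token.
pairs = list(zip(inputs, targets))
assert pairs[0] == ("the", "robot")
assert pairs[-1] == ("the", "shelf")
```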
  • 00:21:26
    Also, I love to see the efficiency gains
  • 00:21:29
    of pre-training, actually.
  • 00:21:30
    We all have small teams, we all have few resources
  • 00:21:34
    for data science and modeling, and so,
  • 00:21:36
    when we take our small teams and focus them on one project,
  • 00:21:40
    and create this one massive, you know,
  • 00:21:42
    powerful foundation model,
  • 00:21:44
    and then use the foundation model
  • 00:21:46
    in many, many applications,
  • 00:21:48
    it actually, I find, is more efficient than optimizing
  • 00:21:52
    and then maintaining our tiny ML ops workloads,
  • 00:21:56
    which is what many of us were doing,
  • 00:21:58
    prior to transformers.
  • 00:22:01
    So, what does it take, to pre-train a new foundation model?
  • 00:22:07
    It sounds scary, it sounds like only, you know,
  • 00:22:09
    the best can do this, but in fact,
  • 00:22:13
    in large part, due to, you know, a very sophisticated
  • 00:22:17
    and very mature training infrastructure that you're here
  • 00:22:19
    to learn about, it's actually pretty accessible.
  • 00:22:22
    So how are we gonna do this?
  • 00:22:24
    So here are three example models that were pre-trained
  • 00:22:28
    and created from scratch on Amazon SageMaker,
  • 00:22:31
    specifically on our training infrastructure.
  • 00:22:33
    Stable Diffusion, clocking in at 5 billion images
  • 00:22:38
    and 240 terabytes of image data.
  • 00:22:41
    And so of course, that's a lot.
  • 00:22:43
    And so, image models tend to take a lot of data,
  • 00:22:47
    but the models themselves are a bit smaller.
  • 00:22:50
    And so you can use smaller cluster sizes.
  • 00:22:54
    The Falcon model of course,
  • 00:22:56
    from Technology Innovation Institute,
  • 00:22:58
    is a very large language model, the largest open source
  • 00:23:01
    language model.
  • 00:23:03
    1 trillion tokens, just under three terabytes
  • 00:23:06
    of language data, 40 billion parameters,
  • 00:23:09
    and then 48 p4d instances.
  • 00:23:13
    So sizable cluster, and that is two months
  • 00:23:16
    to train this model.
  • 00:23:19
    And then we have another financial large language model
  • 00:23:22
    trained on SageMaker with just under two terabytes
  • 00:23:25
    of language data.
  • 00:23:27
    And so all of these requirements,
  • 00:23:30
    are surprisingly accessible.
  • 00:23:32
    Actually, I think there are quite a few companies
  • 00:23:34
    with that volume of language data.
  • 00:23:38
    And then the capabilities that we provide on SageMaker,
  • 00:23:41
    make the training experience, again, very accessible
  • 00:23:45
    to a wide variety of companies.
  • 00:23:49
    So how do we do this?
  • 00:23:51
    If we know we have, we meet the requirements,
  • 00:23:54
    how are we gonna go about creating
  • 00:23:56
    and pre-training these foundation models on AWS?
  • 00:23:59
    So the first step is just gathering and accessing that data.
  • 00:24:03
    And again, we want at least, I'd say one terabyte
  • 00:24:07
    of your own language data.
  • 00:24:09
    So this is documents, digitized PDFs, conversations,
  • 00:24:15
    you know, language streams, rich,
  • 00:24:18
    robust language data.
  • 00:24:20
    So you wanna gather about one terabyte
  • 00:24:22
    of this language data.
  • 00:24:23
    Many firms will then pair that with open source data
  • 00:24:27
    actually, so that your model understands both
  • 00:24:30
    the nuances of your company's acronyms, and history,
  • 00:24:35
    and phrasing, and domain expertise,
  • 00:24:37
    but also knows what time the sun rises in Honolulu.
  • 00:24:41
    Because of course we want that mix of the general,
  • 00:24:44
    sort of open source knowledge,
  • 00:24:46
    but also what's specific to your company.
  • 00:24:49
    And so, that's gathering and storing the information.
  • 00:24:52
    After that, you'll pre-process your data.
  • 00:24:56
    SageMaker also has a really nice capability
  • 00:24:59
    for pre-processing datasets.
  • 00:25:01
    Actually one of our builders, Jenny over here,
  • 00:25:04
    helped me run many pre-processing
  • 00:25:06
    and data transformation jobs on SageMaker.
  • 00:25:10
    And so you can use our training job API,
  • 00:25:14
    including that remote function that we just learned about,
  • 00:25:17
    to run jobs in parallel, which are then tokenizing
  • 00:25:21
    and pre-processing.
  • 00:25:22
    So this core, sort of training job construct,
  • 00:25:26
    is applicable both for creating new models from scratch,
  • 00:25:29
    and also for general data transformation
  • 00:25:32
    and general processing.
  • 00:25:33
    So you'll pre-process your data sets,
  • 00:25:36
    and then you'll optimize those data sets using
  • 00:25:38
    your preferred data storage.
  • 00:25:40
    We see a lot of customers using FSx for Lustre.
  • 00:25:44
    This is because you can store your data in one place
  • 00:25:46
    and then easily attach this volume to training job runs.
  • 00:25:51
    So as you're iterating through different model sizes,
  • 00:25:54
    and different infrastructure, and experimental choices,
  • 00:25:58
    you can use and store your data in the same place.
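The "store once, attach to many runs" pattern looks roughly like this. In the SageMaker Python SDK the real object is `sagemaker.inputs.FileSystemInput`; the sketch below uses a plain dict with the same field names so the shape is visible without an AWS account, and the file system id and paths are made up:

```python
# Sketch of an FSx for Lustre input channel for a training job.
# Mirrors the fields of sagemaker.inputs.FileSystemInput; the id and
# directory path here are hypothetical placeholders.
def lustre_input(file_system_id: str, directory_path: str) -> dict:
    return {
        "FileSystemId": file_system_id,      # e.g. "fs-0123456789abcdef0"
        "FileSystemType": "FSxLustre",
        "DirectoryPath": directory_path,     # path inside the mounted volume
        "FileSystemAccessMode": "ro",        # read-only is typical for training
    }

# Two different experiments mount the same volume: no data copies needed.
run_a = lustre_input("fs-0123456789abcdef0", "/fsx/llama-tokenized")
run_b = lustre_input("fs-0123456789abcdef0", "/fsx/llama-tokenized")
assert run_a == run_b
```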
  • 00:26:03
    After this, customers will then need to develop and iterate
  • 00:26:06
    over their training scripts.
  • 00:26:08
    And the elasticity that you get with the infrastructure
  • 00:26:11
    on SageMaker is beautiful.
  • 00:26:13
    You can use and run tiny instances.
  • 00:26:16
    So the T3 medium and the T2, that Werner shared with us
  • 00:26:20
    this morning.
  • 00:26:21
    So the T3 medium is a great choice for notebook instances,
  • 00:26:25
    very cost effective, very small machine.
  • 00:26:28
    And then you can scale that up with a click
  • 00:26:30
    of a couple buttons, to a small GPU, for example,
  • 00:26:34
    the G4 or the G5 series,
  • 00:26:38
    which your teams can then develop on,
  • 00:26:40
    and get the nuances working in their training loop.
  • 00:26:44
    And then ultimately scale out in the same platform,
  • 00:26:47
    in the same service, to hundreds and thousands of GPUs.
  • 00:26:53
    And so that's the move from step four
  • 00:26:56
    to step five, where you're developing and testing
  • 00:26:59
    on increasingly larger instances,
  • 00:27:02
    and then ultimately scaling up and using the massive
  • 00:27:06
    training infrastructure that SageMaker provides.
  • 00:27:10
    And then of course you'll evaluate the model artifact
  • 00:27:13
    step by step, and the way that SageMaker holds onto
  • 00:27:17
    the metadata, holds onto your scripts,
  • 00:27:20
    holds onto your hyperparameters, stores all of your
  • 00:27:23
    artifacts in S3, makes it so easy to just look up
  • 00:27:27
    your previous work.
  • 00:27:29
    So, I know if you're trying to capture an experiment
  • 00:27:32
    that you ran six months ago,
  • 00:27:34
    or even three years ago, as long as it was in AWS,
  • 00:27:38
    then you can easily go look up the results of that job,
  • 00:27:42
    capture some of the artifacts,
  • 00:27:44
    and then run a new experiment.
  • 00:27:46
    And so, at a high level, that's how you can pre-train
  • 00:27:49
    foundation models on AWS.
  • 00:27:53
    And again, all of this is possible because of the
  • 00:27:56
    distributed training libraries that we provide
  • 00:27:59
    on Amazon SageMaker.
  • 00:28:00
    So these are capabilities that we've been building
  • 00:28:03
    for many years.
  • 00:28:04
    Including data parallel and model parallel
  • 00:28:07
    distributed training libraries that give you efficiency
  • 00:28:10
    and enhancements.
  • 00:28:11
    So model parallel, is a way to distribute a neural network
  • 00:28:16
    over multiple accelerators and GPUs,
  • 00:28:19
    providing optimized performance.
  • 00:28:21
    And then our data parallel package,
  • 00:28:23
    will let you actually make copies of your model
  • 00:28:26
    across a large cluster.
  • 00:28:28
    And then we're delivering custom communication collectives
  • 00:28:31
    actually, that are optimized for the AWS network topology,
  • 00:28:35
    to save you up to 40% in the overall training time.
  • 00:28:39
    And so this is after many years
  • 00:28:41
    of innovation at this layer in the stack.
  • 00:28:44
    And again, all of this is available through SageMaker.
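The data parallel idea described here can be sketched without any GPUs at all: each replica computes gradients on its own data shard, and an all-reduce averages them so every model copy applies the same update. SageMaker's library implements this step with collectives tuned for the AWS network topology; the code below is only a single-process illustration of the math:

```python
# Toy data parallelism: gradients from each worker's shard are averaged
# (an all-reduce) before every update, so all model replicas stay in sync.
def allreduce_mean(grads_per_worker):
    n = len(grads_per_worker)
    # Element-wise mean across workers for each gradient component.
    return [sum(g) / n for g in zip(*grads_per_worker)]

# Three workers, each with a gradient vector from its own data shard.
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
avg = allreduce_mean(grads)
assert avg == [3.0, 4.0]  # every replica applies the same averaged update
```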
  • 00:28:48
    And customers agree with us.
  • 00:28:50
    So as you heard from Swami's keynote yesterday,
  • 00:28:54
    Aravind Srinivas, CEO of Perplexity AI,
  • 00:28:58
    is happily using SageMaker, and in particular
  • 00:29:02
    the data and model parallel training libraries,
  • 00:29:06
    again, to get that optimized performance
  • 00:29:08
    in particular in the HyperPod mode.
  • 00:29:13
    Another feature of SageMaker that I find really handy,
  • 00:29:17
    is Warm Pools.
  • 00:29:18
    And so, the training job API, again,
  • 00:29:22
    is creating infrastructure when you train a model.
  • 00:29:26
    So when you call model.fit, or when you run
  • 00:29:29
    that Python training script,
  • 00:29:31
    we actually turn on our instances at the same time.
  • 00:29:34
    And so that call to create the cluster,
  • 00:29:37
    and to execute the scripts, are coupled,
  • 00:29:40
    they happen together.
  • 00:29:41
    And now again, this is really useful for cost efficiency,
  • 00:29:46
    so that when the job fails, because I forgot to point
  • 00:29:50
    to the right Lustre volume,
  • 00:29:52
    that instance isn't sitting up there charging me money,
  • 00:29:54
    right, it turns off.
  • 00:29:56
    So it's extremely compute efficient.
  • 00:29:58
    However, as a dev, that can be challenging,
  • 00:30:01
    because I don't wanna wait eight minutes
  • 00:30:03
    just to ship a new line of code.
  • 00:30:06
    And so we launched, last year, our Warm Pools feature,
  • 00:30:10
    that lets you run new jobs, using the same image,
  • 00:30:14
    in seconds.
  • 00:30:15
    And so as a developer it's extremely handy,
  • 00:30:19
    because you can make just one, two, three line edits in your
  • 00:30:22
    training script, and then just run the job in seconds.
  • 00:30:25
    And so the Warm Pool feature is incredibly useful
  • 00:30:28
    for developing with the SageMaker training API.
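In the SageMaker Python SDK, Warm Pools are enabled with the `keep_alive_period_in_seconds` estimator argument, so the next job with a matching image reuses the retained instances and starts in seconds. A minimal sketch of that configuration, written as a plain dict so it runs anywhere; the image URI and instance choice are placeholders:

```python
# Sketch of a Warm Pool training-job configuration. In the real SDK,
# keep_alive_period_in_seconds is an estimator argument; the service
# caps it (3600 seconds at the time of writing).
def training_job_config(image_uri: str, keep_alive_s: int) -> dict:
    assert 0 <= keep_alive_s <= 3600, "warm pool keep-alive is capped at 1 hour"
    return {
        "image_uri": image_uri,
        "instance_type": "ml.trn1.32xlarge",       # placeholder choice
        "instance_count": 4,
        "keep_alive_period_in_seconds": keep_alive_s,
    }

# Hypothetical ECR image URI; a second job with the same image reuses
# the warm instances instead of provisioning a new cluster.
cfg = training_job_config(
    "123456789012.dkr.ecr.us-east-1.amazonaws.com/llama-pretrain:latest", 1800
)
```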
  • 00:30:33
    Another core feature of SageMaker,
  • 00:30:36
    is the ability to use many different types of instances
  • 00:30:39
    and have a lot of flexibility with the underlying
  • 00:30:42
    infrastructure, where you're trying to run your scripts.
  • 00:30:45
    One of these, of course, is custom accelerators from AWS.
  • 00:30:49
    And so the Trainium and Inferentia capabilities
  • 00:30:52
    are both available on SageMaker.
  • 00:30:55
    And you're seeing a lot of cost performance,
  • 00:30:59
    relative to comparable Amazon EC2 instances,
  • 00:31:02
    up to 46% with Trainium 1 training Llama 2,
  • 00:31:07
    and so you'll see even better performance with Trainium two,
  • 00:31:10
    which was just recently announced.
  • 00:31:12
    And so, in the demo today, actually, we're gonna take a look
  • 00:31:15
    at Trainium on SageMaker.
  • 00:31:19
    So what is this demo?
  • 00:31:20
    So we're gonna be pre-training Llama two,
  • 00:31:23
    of course I got a little visual for you.
  • 00:31:25
    So this is a cartoon Llama with sunglasses
  • 00:31:28
    on the Las Vegas strip from Bedrock.
  • 00:31:30
    And so we're gonna pre-train a 7 billion parameter
  • 00:31:33
    Llama on SageMaker.
  • 00:31:36
    Now, why are we going to do this?
  • 00:31:38
    Why is this a useful exercise?
  • 00:31:40
    Again, this is assuming I have at least a few hundred
  • 00:31:45
    gigabytes of custom data.
  • 00:31:47
    So really a sizable data set
  • 00:31:49
    of my own language data.
  • 00:31:52
    And then again, this is knowledge
  • 00:31:54
    that's not generally available online.
  • 00:31:57
    And so it's my own proprietary data set.
  • 00:32:00
    So it's knowledge that wouldn't generally be found
  • 00:32:02
    in say, a Wikipedia archive.
  • 00:32:05
    This then drives in-domain accuracy.
  • 00:32:09
    And so this small model will be very surprisingly accurate
  • 00:32:13
    within that domain.
  • 00:32:15
    Again, it won't know everything under the sun,
  • 00:32:17
    but it will have a surprising amount of accuracy,
  • 00:32:21
    again, in that dataset, and in that domain
  • 00:32:24
    where you're training it.
  • 00:32:25
    This of course then drives, as I mentioned earlier,
  • 00:32:28
    ownership, it drives flexibility, and then that lets you,
  • 00:32:32
    again, use serverless hosting,
  • 00:32:34
    and then ultimately cost reduction opportunities.
  • 00:32:37
    And again, how are we gonna do this?
  • 00:32:39
    So I have some example notebooks, I'm gonna walk
  • 00:32:41
    through different instances, again, that T3 medium,
  • 00:32:45
    the trn1, optimized large scale data
  • 00:32:48
    stored on FSx for Lustre, and then some Warm Pools,
  • 00:32:51
    and then, again, that distributed training infrastructure.
  • 00:32:54
    So let's check this out.
  • 00:33:03
    All right, so here we are.
  • 00:33:06
    I'm starting with the example notebook.
  • 00:33:09
    So this is a publicly available GitHub repository
  • 00:33:13
    that is using NeuronX,
  • 00:33:15
    which is from the Amazon Annapurna ML team,
  • 00:33:18
    and then NeMo Megatron actually, which is from NVIDIA.
  • 00:34:21
    So this is the NVIDIA distributed training framework,
  • 00:34:25
    NeMo Megatron.
  • 00:33:26
    And then we're gonna be running that
  • 00:33:27
    on Trainium accelerators.
  • 00:33:31
    This also uses PyTorch and then the core
  • 00:33:34
    torchrun framework.
  • 00:33:39
    All right, so first I am running this on a notebook instance
  • 00:33:44
    actually, so this is a SageMaker notebook instance.
  • 00:33:47
    This is my sturdy m4 instance.
  • 00:33:51
    And it's handy because again, you can create a docker image.
  • 00:33:56
    So what I'm gonna do here on my notebook instance,
  • 00:33:58
    is I'm pointing to a deep learning container.
  • 00:34:03
    So that's a fully managed deep learning container
  • 00:34:06
    that we build at AWS, and track, and manage, and update,
  • 00:34:10
    as the software frameworks change.
  • 00:34:12
    We manage these deep learning containers,
  • 00:34:14
    and then you can inherit our container in your docker files
  • 00:34:19
    and build on top of it.
  • 00:34:20
    And so it's an easy way for you to make sure
  • 00:34:22
    that your scripts will work in the AWS framework.
  • 00:34:25
    So, first I'm gonna build that docker image.
  • 00:34:29
    Then I'm setting up FSx for Lustre for my optimized runs.
  • 00:34:33
    I'm gonna prepare my data sets using tokenization
  • 00:34:37
    on the SageMaker training API,
  • 00:34:40
    converting the Hugging Face weights to the NeMo format.
  • 00:34:43
    Then I'm gonna train Llama two.
  • 00:34:46
    So again, just a really simple docker image,
  • 00:34:49
    I'm just importing SageMaker,
  • 00:34:54
    pointing to the deep learning container accounts,
  • 00:34:57
    grabbing that image,
  • 00:34:59
    and then this is a nice sh script that just builds
  • 00:35:02
    a docker image.
  • 00:35:03
    So I've got my docker image locally,
  • 00:35:05
    and then this is pushed to,
  • 00:35:07
    what's called the Elastic Container Registry,
  • 00:35:09
    so another AWS service that just hosts your docker images.
  • 00:35:13
    And so I'm pushing my docker image,
  • 00:35:16
    which is on my notebook instance locally,
  • 00:35:18
    up to ECR in AWS.
  • 00:35:21
    And so that's this process.
  • 00:35:23
    Great.
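The build-and-push flow Emily runs is the standard one: `docker build`, `docker tag`, then `docker push` to an Elastic Container Registry repository whose URI follows a fixed pattern. A small sketch of that URI construction; the account id, repository name, and tag are placeholders:

```python
# ECR image URIs follow <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>.
# The push in the demo uploads the locally built image to such a URI.
def ecr_image_uri(account: str, region: str, repo: str, tag: str) -> str:
    return f"{account}.dkr.ecr.{region}.amazonaws.com/{repo}:{tag}"

# Hypothetical values: the demo's real account and repo names differ.
uri = ecr_image_uri("123456789012", "us-east-1", "llama-neuronx", "latest")
assert uri == "123456789012.dkr.ecr.us-east-1.amazonaws.com/llama-neuronx:latest"
```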
  • 00:35:25
    So I have my docker image hosted.
  • 00:35:28
    My second step is setting up FSx for Lustre.
  • 00:35:32
    So this is using a CloudFormation template
  • 00:35:35
    to deploy Lustre in my account.
  • 00:35:38
    You can also do this through the console.
  • 00:35:41
    Lustre sets up a two-way data repository
  • 00:35:44
    with your S3 bucket.
  • 00:35:46
    So as long as you have data sitting in your S3 bucket,
  • 00:35:49
    you can enable a two-way data repository.
  • 00:35:52
    So data will be copied through the service.
  • 00:35:56
    The metadata will be copied through the service,
  • 00:35:58
    from the bucket, to Lustre,
  • 00:36:00
    and then as you write files to Lustre,
  • 00:36:04
    those are then copied back to S3.
  • 00:36:06
    So you can download it from S3.
  • 00:36:08
    You can also mount the Lustre volume
  • 00:36:10
    to your notebook instance, as long as the networking
  • 00:36:14
    is aligned.
  • 00:36:14
    And then you can view the contents of that volume directly
  • 00:36:19
    from your notebook instance.
  • 00:36:20
    So I'll set this up here.
  • 00:36:22
    Again, just downloading the CloudFormation template,
  • 00:36:26
    and then creating the stack in the relevant AZ.
  • 00:36:31
    Once the stack has been created,
  • 00:36:33
    then I'm gonna prep my dataset.
  • 00:36:35
    And so my dataset, again, is for LLM training,
  • 00:36:38
    using that docker image, using the SageMaker SDK,
  • 00:36:42
    and creating a lot of hyperparameters.
  • 00:36:45
    Some hyperparameters for the dataset pointer,
  • 00:36:49
    the model pointer, some of my keys,
  • 00:36:52
    I'm pointing to FSx for Lustre right here,
  • 00:36:56
    and then setting up the file system input.
  • 00:36:59
    And then as Gal mentioned, here is the PyTorch API.
  • 00:37:03
    Now this is, again, a very complex example.
  • 00:37:07
    If you are new to SageMaker, I would give you
  • 00:37:09
    a really easy one. (chuckling)
  • 00:37:10
    And so there are a lot of very accessible examples,
  • 00:37:15
    where you can train a model in like two lines of code,
  • 00:37:18
    basically, using Hugging Face, actually.
  • 00:37:21
    But certainly the Python training scripts,
  • 00:37:24
    you can bring your own packages and requirements.txt,
  • 00:37:27
    and be on your way very, very quickly.
  • 00:37:30
    This is a complex example,
  • 00:37:31
    but we have very simple ones that you can use
  • 00:37:33
    to get started.
  • 00:37:34
    So in any case, I'm importing a pointer to that container.
  • 00:37:39
    Actually, this Python object is pointing
  • 00:37:43
    to the deep learning containers for PyTorch.
  • 00:37:46
    And then this API is letting me define my own
  • 00:37:50
    pre-processing function right here, in the script.
  • 00:37:54
    I'm pointing to a source directory,
  • 00:37:57
    which can be a local file, can also be a Git repository.
  • 00:38:01
    And then I'm defining my infrastructure right here.
  • 00:38:05
    So this is one trn1.32xlarge, actually.
  • 00:38:10
    And so I'm running this job here.
  • 00:38:13
    And again, that's my pre-processing dataset.
  • 00:38:15
    And I'm gonna jump to Llama here,
  • 00:38:18
    'cause we're a little short on time.
  • 00:38:19
    And so, once I've pre-processed the data
  • 00:38:22
    and stored it on Lustre,
  • 00:38:25
    which again, then will replicate back in S3,
  • 00:38:27
    after that, I'm going to train my model
  • 00:38:31
    on that same Lustre volume.
  • 00:38:32
    And Lustre is useful because it mounts in seconds.
  • 00:38:36
    So when you're working with large scale data sets,
  • 00:38:38
    of course it can be very computationally and time intensive
  • 00:38:41
    to copy them.
  • 00:38:43
    And streaming can be a little bit challenging to set up.
  • 00:38:46
    So Lustre is a nice way to store your data
  • 00:38:49
    in a high performance file system,
  • 00:38:52
    which you can then mount in seconds using as many jobs
  • 00:38:56
    as you want to, because the bandwidth actually
  • 00:38:59
    scales as a function of mounts.
  • 00:39:01
    So Lustre is a great high performance data store.
  • 00:39:03
    So in any case, we're gonna set this up.
  • 00:39:07
    And then same as last time, pointing to all
  • 00:39:10
    of my hyperparameters, specifying that I'm using,
  • 00:39:14
    again, four instances.
  • 00:39:16
    Each of these instances has 32 accelerators.
  • 00:39:20
    Again, the Trainium accelerators.
  • 00:39:22
    All of the hyperparameters for the 7 billion parameter
  • 00:39:27
    model that we're gonna train here, my FSx for Lustre pointer,
  • 00:39:32
    and then I'm gonna launch my training job.
  • 00:39:34
    So, setting the rest of my hyperparameters,
  • 00:39:37
    again pointing to that PyTorch API, loading in my scripts,
  • 00:39:42
    and then I call model.fit.
  • 00:39:44
    And as promised, all of the content for this job,
  • 00:39:49
    is loaded directly in the SageMaker control plane.
  • 00:39:52
    So I see exactly when I started this, what the status was,
  • 00:39:56
    where the artifacts are, when I ran this.
  • 00:39:59
    I can view the outputs, I can step through the logs
  • 00:40:03
    and see every piece of information that I need for my model,
  • 00:40:07
    which then of course, I can download it and build
  • 00:40:09
    an entire app on top of this.
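Putting the demo's pieces together, the launch is roughly: an estimator pointed at the custom container, four trn1.32xlarge instances with 32 accelerators each, the Lustre-backed data channel, the hyperparameters, then a `fit()` call. The sketch below keeps this as plain data rather than real `sagemaker.pytorch.PyTorch` calls; the script name and hyperparameter keys are invented for illustration:

```python
# Hypothetical description of the pre-training launch; in the real SDK
# these would be arguments to a PyTorch estimator, followed by
# estimator.fit() with the FSx for Lustre channel attached.
job = {
    "entry_point": "train_llama.py",      # hypothetical training script
    "source_dir": "scripts/",             # local directory or a Git repository
    "image_uri": "<account>.dkr.ecr.us-east-1.amazonaws.com/llama-neuronx:latest",
    "instance_type": "ml.trn1.32xlarge",  # 32 Trainium accelerators each
    "instance_count": 4,
    "hyperparameters": {"model_size": "7b", "tokenizer": "llama2"},
}

# Four instances x 32 accelerators = the cluster the demo trains on.
total_accelerators = job["instance_count"] * 32
assert total_accelerators == 128
```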
  • 00:40:12
    And so with that, I'm gonna hand over the stage
  • 00:40:15
    to Tom, and he's gonna share some information
  • 00:40:18
    with you about Toyota.
  • 00:40:26
    - Great, thank you Emily.
  • 00:40:29
    So today I am going to really tell you about
  • 00:40:31
    how we're using SageMaker,
  • 00:40:33
    to accelerate machine learning at TRI.
  • 00:40:36
    And first, maybe I should tell you a little bit
  • 00:40:38
    about what TRI actually is.
  • 00:40:41
    I'll give you a couple of examples of projects
  • 00:40:42
    that we have, ongoing right now.
  • 00:40:46
    The first, is a project around autonomous drift driving,
  • 00:40:56
    with a Toyota Supra.
  • 00:40:57
    And I'll just let the video play here.
  • 00:40:59
    (electronic music)
  • 00:41:03
    (tires screeching)
  • 00:41:07
    (indistinct chatter)
  • 00:41:09
    (electronic music continues)
  • 00:41:22
    (engine revving)
  • 00:41:25
    (tires screeching)
  • 00:41:29
    So that's one example.
  • 00:41:31
    And of course AI, here, helps lay the foundation,
  • 00:41:34
    for all of this work.
  • 00:41:35
    The second is, we work on a lot of challenge problems.
  • 00:41:38
    And so, there's a big robotics group
  • 00:41:40
    that focuses on challenge problems,
  • 00:41:44
    that we start in the lab, but also go out
  • 00:41:46
    to the real world environment,
  • 00:41:48
    and evaluate our systems in.
  • 00:41:50
    And so, this is an example of where we have a robotic system
  • 00:41:55
    that we built in-house, from the ground up,
  • 00:41:57
    and we're able to retrieve and stock grocery store shelves.
  • 00:42:03
    This has evolved more into the factory setting as well,
  • 00:42:05
    more recently.
  • 00:42:09
    And, we're 250 people across a few different locations.
  • 00:42:12
    So there's a team in Los Altos
  • 00:42:15
    and a team in Cambridge, Mass.
  • 00:42:17
    And there's teams also in human-centered AI,
  • 00:42:19
    and also material science as well.
  • 00:42:23
    And most recently, one of the things about generative AI
  • 00:42:25
    that we found anyway, is, in the context of robotics,
  • 00:42:28
    is that it can now be applied
  • 00:42:31
    to robotics, to be able to do a wide variety of tasks
  • 00:42:36
    that we never thought were possible.
  • 00:42:38
    And this is a technique called diffusion policy,
  • 00:42:41
    that is now able to learn from a few examples of a human,
  • 00:42:45
    from a human, how to perform very complicated tasks.
  • 00:42:50
    And so, building on this, the machine learning team at TRI
  • 00:42:54
    tries to build a foundation across language, vision,
  • 00:42:57
    and action.
  • 00:42:58
    Language, in the sense of both common sense knowledge,
  • 00:43:06
    and also a wider variety of applications.
  • 00:43:09
    So like, language has applications across Toyota
  • 00:43:12
    more generally, in the context of enterprise applications,
  • 00:43:15
    but also in terms of code generation as well.
  • 00:43:19
    Vision, feeds into language to give robots eyes,
  • 00:43:22
    for example, and then action, to perform a wide variety
  • 00:43:26
    of tasks across a number of different platforms.
  • 00:43:30
    But, this talk is more about SageMaker. (chuckling)
  • 00:43:33
    And so I wanna tell you about how we're using SageMaker,
  • 00:43:36
    at TRI, to really accelerate our progress.
  • 00:43:40
    And the first is sort of general experimentation.
  • 00:43:43
    Where we use sort of, one to eight instances,
  • 00:43:46
    to scale up our training jobs.
  • 00:43:49
    And, the second, is how we can take some of these ideas
  • 00:43:53
    and really scale this up very, very quickly.
  • 00:43:56
    To not just a few GPUs, but to hundreds of GPUs at a time.
  • 00:44:01
    And finally, we're also able
  • 00:44:04
    to use SageMaker for even more broad applications,
  • 00:44:06
    such as, like, just serving models.
  • 00:44:09
    As these are hard to serve sort of locally,
  • 00:44:11
    on a device.
  • 00:44:15
    So lemme tell you a little bit about the experimentation
  • 00:44:17
    that we do on SageMaker at TRI.
  • 00:44:22
    First, you know, here's I guess,
  • 00:44:24
    the high level, is that we have a wide variety
  • 00:44:26
    of models that we're training
  • 00:44:29
    on SageMaker.
  • 00:44:30
    The first, is large language models.
  • 00:44:34
    The second is a monocular depth model.
  • 00:44:35
    So taking RGB images and inferring depth, for example.
  • 00:44:39
    A third is Stable Diffusion.
  • 00:44:42
    Language to image generation,
  • 00:44:44
    for better feature representations.
  • 00:44:46
    And a fourth is 3D representations.
  • 00:44:50
    Such as, language, to 3D sort of structures, as well.
  • 00:44:53
    That's useful for robotics and a number
  • 00:44:54
    of other applications.
  • 00:44:57
    And across all of these, we've found
  • 00:45:00
    SageMaker to be very useful for a number of reasons.
  • 00:45:03
    Some of the challenges that come up for us,
  • 00:45:07
    include a few things.
  • 00:45:09
    First, we wanna be able to reuse
  • 00:45:11
    existing training infrastructure and clusters
  • 00:45:13
    that we create.
  • 00:45:15
    And so, you know, Warm Pools, that you heard about earlier,
  • 00:45:19
    are one way in which to do that.
  • 00:45:21
    And we take advantage of that on a daily basis,
  • 00:45:24
    to pull back those resources
  • 00:45:25
    and continue iterating on our training jobs.
  • 00:45:29
    The second is for scaling.
  • 00:45:31
    We need to be able to go from one to many instances
  • 00:45:33
    very quickly, and also to change instance types
  • 00:45:36
    very quickly as well.
  • 00:45:40
    We also need high performance systems.
  • 00:45:41
    So, you know, SageMaker is very well optimized
  • 00:45:46
    in the backend.
  • 00:45:48
    And finally, we need a sort of flexibility,
  • 00:45:50
    we need to run a number of different jobs
  • 00:45:54
    across all of the, sort of, science group.
  • 00:45:59
    And so, you know, I'll echo like,
  • 00:46:02
    this is the code you saw earlier.
  • 00:46:06
    I'll echo how easy it is to sort of like,
  • 00:46:08
    scale these things up.
  • 00:46:10
    You can start with one instance,
  • 00:46:11
    and you can iterate with your training job,
  • 00:46:14
    with that instance for example.
  • 00:46:16
    And, if you need to scale, it's a very simple change
  • 00:46:21
    to enable that.
  • 00:46:23
    In this case, you can change the instance count
  • 00:46:26
    from one to eight, for example,
  • 00:46:28
    and then start just scaling your runs very quickly.
  • 00:46:32
    The second, is to, as new hardware comes out,
  • 00:46:34
    as Gal mentioned, we're able to,
  • 00:46:37
    quickly change the hardware types, as well.
  • 00:46:40
    And so in this case, we can change,
  • 00:46:43
    say, from a p4 instance to a p5,
  • 00:46:46
    which will give twice the throughput for our training jobs,
  • 00:46:49
    and reduce, sort of, the training times for us.
  • 00:46:52
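The two changes described above — raising the instance count and swapping instance types as new hardware arrives — really are one-line edits to the job definition. A minimal sketch, using a hypothetical `scale_job` helper and dict keys that mirror the SageMaker Estimator arguments:

```python
# Sketch of the two one-line changes described above: raising instance_count
# (e.g. 1 -> 8) and switching instance types (e.g. p4d -> p5). The helper
# name and values are illustrative, not an official API.
def scale_job(config: dict, instance_count: int, instance_type: str) -> dict:
    scaled = dict(config)  # copy, so the original job definition is untouched
    scaled["instance_count"] = instance_count
    scaled["instance_type"] = instance_type
    return scaled

single_node = {"instance_count": 1, "instance_type": "ml.p4d.24xlarge"}
multi_node = scale_job(single_node, 8, "ml.p5.48xlarge")
```

Everything else about the job — the training script, the data channels — stays the same, which is what makes the iteration loop fast.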
    And just to give some evidence of how this looks,
  • 00:46:55
    how performant these systems are, if you look at sort of,
  • 00:46:59
    scaling across a number of instances,
  • 00:47:01
    it's almost linear, in terms of the scalability here.
  • 00:47:04
    So SageMaker has been very performant for us,
  • 00:47:07
    in terms of scaling up our training jobs as well.
  • 00:47:13
    As Emily mentioned earlier, you know, using,
  • 00:47:16
    these data sets are huge too,
  • 00:47:18
    and when we started, we were using
  • 00:47:21
    data sets of a few terabytes.
  • 00:47:25
    And, it's nice to be able to quickly start up
  • 00:47:28
    with FSx for Lustre.
  • 00:47:30
    However, as we scaled our training jobs,
  • 00:47:32
    the amount of data that we need
  • 00:47:34
    grew from, you know, a few terabytes,
  • 00:47:36
    to half a petabyte or more.
  • 00:47:39
    The flexibility in SageMaker, to pull in other resources
  • 00:47:44
    like WebDataset, has been really, really great,
  • 00:47:47
    and has really accelerated, sort of,
  • 00:47:49
    the training runs that we have as well.
  • 00:47:56
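For context on the WebDataset approach mentioned here: the webdataset library streams training data from tar-file shards, typically addressed with a brace-expanded URL pattern rather than a mounted file system. The hypothetical helper below (bucket path and naming scheme made up) shows how such a pattern expands into the individual shard URLs a loader would stream:

```python
# Hypothetical helper: expand a WebDataset-style brace pattern like
# "s3://my-bucket/shards/train-{0000..0003}.tar" into individual shard URLs.
# The bucket and naming scheme are made-up illustrations.
import re

def expand_shards(pattern: str) -> list[str]:
    match = re.search(r"\{(\d+)\.\.(\d+)\}", pattern)
    if not match:
        return [pattern]  # no brace range: a single literal shard
    lo, hi = match.group(1), match.group(2)
    width = len(lo)  # preserve zero-padding, e.g. 0 -> "0000"
    return [
        pattern[:match.start()] + str(i).zfill(width) + pattern[match.end():]
        for i in range(int(lo), int(hi) + 1)
    ]

shards = expand_shards("s3://my-bucket/shards/train-{0000..0003}.tar")
```

Because shards are just objects streamed on demand, this style of input scales to half-petabyte datasets without waiting for a full file-system sync.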
    And so, just to reiterate, I mean there's,
  • 00:47:59
    the group is running jobs from, you know,
  • 00:48:02
    one instance, eight instances,
  • 00:48:04
    and these are a few of the applications.
  • 00:48:07
    But, going beyond this, we're also able to scale
  • 00:48:11
    our training runs,
  • 00:48:13
    to a much larger scale as well.
  • 00:48:16
    And so, you know, just to highlight one
  • 00:48:18
    of the ways in which we're doing that at TRI,
  • 00:48:20
    we're building state-of-the-art models.
  • 00:48:23
    The question is:
  • 00:48:24
    Can you build state-of-the-art
  • 00:48:26
    LLMs with SageMaker?
  • 00:48:28
    And,
  • 00:48:29
    we at TRI have been doing this.
  • 00:48:31
    We've been reproducing some of the Llama 2
  • 00:48:35
    models initially,
  • 00:48:36
    to validate all of our systems.
  • 00:48:41
    And for this, we need
  • 00:48:45
    scalability and performance across all of these instances.
  • 00:48:48
    And what SageMaker has really provided for us,
  • 00:48:51
    is that scalability.
  • 00:48:52
    So this is the newest hardware, the H100s.
  • 00:48:55
    And we see near-linear scaling as the
  • 00:48:59
    number of nodes increases.
  • 00:49:01
    And this ends up being around 256
  • 00:49:06
    H100s.
  • 00:49:09
    If you run it out, a training job like this,
  • 00:49:12
    for pre-training a Llama 2 model,
  • 00:49:15
    can take about a week,
  • 00:49:17
    when you scale out to 30 instances.
  • 00:49:20
    And this is just to say, you know,
  • 00:49:22
    with more than a trillion tokens,
  • 00:49:25
    we can reproduce
  • 00:49:28
    the state-of-the-art models here.
  • 00:49:29
    And we're scaling, not only from the 7 billion parameter
  • 00:49:32
    models, up to 13, 34, and 70 billion as well, on SageMaker right now.
  • 00:49:40
    One of the key features, or one of the nice features,
  • 00:49:43
    of SageMaker, so that you don't lose any time,
  • 00:49:46
    has also been some of the repair work.
  • 00:49:47
    I think Gal may have mentioned this earlier.
  • 00:49:50
    As you scale these jobs,
  • 00:49:53
    it's often the case that like, hardware will fail.
  • 00:49:56
    And when hardware fails, you have downtime.
  • 00:49:59
    And if you have downtime,
  • 00:50:01
    it costs you money.
  • 00:50:03
    You're not training your models.
  • 00:50:06
    One of the great parts of SageMaker is that,
  • 00:50:09
    it has this option for cluster repair.
  • 00:50:11
    And so for us, this took about 10 minutes:
  • 00:50:14
    one of the machines failed,
  • 00:50:15
    and the cluster came right back up and we were able to
  • 00:50:18
    continue our training run very quickly.
  • 00:50:23
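One reason a repaired cluster can resume so quickly is regular checkpointing: after the failed node is replaced, the training script picks up from the latest checkpoint instead of restarting from scratch. A minimal sketch, assuming a hypothetical `step-NNNNNN.pt` naming scheme (not TRI's actual layout):

```python
# Minimal sketch of checkpoint-based resumption after cluster repair.
# The "step-NNNNNN.pt" naming scheme is a made-up illustration; zero-padded
# step numbers make lexicographic sort order match numeric order.
import os

def latest_checkpoint(checkpoint_dir):
    """Return the path of the highest-numbered checkpoint, or None."""
    if not os.path.isdir(checkpoint_dir):
        return None
    ckpts = sorted(f for f in os.listdir(checkpoint_dir) if f.startswith("step-"))
    return os.path.join(checkpoint_dir, ckpts[-1]) if ckpts else None
```

On startup, the training loop would call `latest_checkpoint(...)` and, if it returns a path, restore model and optimizer state from it before continuing.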
    So that's pre-training.
  • 00:50:25
    And, you know, the other thing
  • 00:50:29
    is more on the side of up-training.
  • 00:50:31
    Where you have a large data set,
  • 00:50:33
    but not quite the size you'd need
  • 00:50:35
    for pre-training, and you want to
  • 00:50:38
    focus on a particular domain.
  • 00:50:39
    At TRI, you know,
  • 00:50:41
    because we're a Toyota-centric entity,
  • 00:50:45
    Japanese was one of the areas that we were
  • 00:50:48
    very interested in.
  • 00:50:50
    And so, you know, you can take,
  • 00:50:52
    some of the state-of-the-art models,
  • 00:50:53
    which aren't actually trained for it.
  • 00:50:56
    They do have a little bit of, say for example,
  • 00:50:57
    Japanese training data,
  • 00:50:58
    but they don't have that much.
  • 00:51:00
    If you go out there and acquire all of,
  • 00:51:02
    say, the open source data available,
  • 00:51:04
    you get to
  • 00:51:06
    10 to a hundred billion tokens here,
  • 00:51:08
    which is enough to up-train a model
  • 00:51:11
    for a domain such as Japanese.
  • 00:51:14
    And what we found, is that, you know, taking Llama 2,
  • 00:51:17
    with 13 billion parameters, and up-training,
  • 00:51:22
    you gain some performance.
  • 00:51:23
    This is a win rate metric, against some of the best
  • 00:51:27
    closed-source models.
  • 00:51:29
    But the next step is to actually instruction
  • 00:51:32
    fine-tune the model. This is how you get
  • 00:51:34
    large language models to follow instructions,
  • 00:51:38
    to be sort of chatty: you fine-tune them using
  • 00:51:40
    instruction fine-tuning,
  • 00:51:42
    with data of this type, where the instruction
  • 00:51:44
    would be in the first part,
  • 00:51:45
    and the second part would be
  • 00:51:48
    the sort of response you would expect.
  • 00:51:51
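The instruction/response pairs described here can be sketched as records with two fields, rendered into a single training string by a prompt template. Both the records and the template below are made-up illustrations, not TRI's actual data or format:

```python
# Sketch of the instruction-tuning data shape: the first part is the
# instruction, the second the expected response. Records and the prompt
# template are hypothetical examples, not a real dataset.
def format_example(record: dict) -> str:
    return (
        f"### Instruction:\n{record['instruction']}\n"
        f"### Response:\n{record['response']}"
    )

examples = [
    {"instruction": "Translate 'hello' into Japanese.",
     "response": "こんにちは"},
    {"instruction": "Summarize: SageMaker trains models at scale.",
     "response": "SageMaker enables large-scale model training."},
]

prompts = [format_example(r) for r in examples]
```

Fine-tuning on strings shaped like this is what teaches the model to treat the first part as a request and produce the second part as its answer.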
    And if you do that, with the additional pre-training,
  • 00:51:54
    and the additional instruction fine-tuning in Japanese,
  • 00:51:57
    on some of the more performant models out there,
  • 00:51:59
    you can get state-of-the-art performance,
  • 00:52:01
    in Japanese here.
  • 00:52:03
    And so this is a much smaller model compared to say,
  • 00:52:05
    a Llama 2 70B, yet still more performant, for example.
  • 00:52:11
    And so SageMaker's really enabled us to do a lot
  • 00:52:13
    of this experimentation very rapidly, at TRI.
  • 00:52:19
    The final one I just wanna mention, it's not covered
  • 00:52:21
    as much in this talk, but, you know,
  • 00:52:24
    there's also the ability to do other sorts of workloads,
  • 00:52:27
    such as serving models as well.
  • 00:52:28
    And we've been leveraging SageMaker endpoints to actually,
  • 00:52:33
    you know, serve both open source models,
  • 00:52:36
    as well as the models that we have in-house,
  • 00:52:40
    internally, across TRI and maybe eventually
  • 00:52:42
    externally as well.
  • 00:52:46
    So with that, I just wanted to say,
  • 00:52:47
    there's sort of three primary areas
  • 00:52:51
    we're focused on in using SageMaker for.
  • 00:52:55
    Small scale experiments, such as,
  • 00:52:57
    one to eight nodes, for example.
  • 00:52:59
    Large scale training,
  • 00:53:00
    up to, you know, 32, 64, or more instances.
  • 00:53:07
    As well as serving.
  • 00:53:10
    And so, you know, SageMaker has been very critical,
  • 00:53:13
    and important for our training
  • 00:53:18
    of this variety of models, and experimentation generally.
  • 00:53:24
    And I just wanted to close, I guess, with saying like,
  • 00:53:28
    you know, it's been great working with SageMaker,
  • 00:53:31
    for training all of these models.
  • 00:53:32
    Next time, hopefully when we come back
  • 00:53:35
    to AWS re:Invent, maybe we will have a foundation model,
  • 00:53:41
    that can be trained once and do a whole lot,
  • 00:53:43
    do many different robotics tasks,
  • 00:53:46
    in response to language and other things as well.
  • 00:53:50
    So with that, I'll end, and maybe I'll hand it back to Gal.
  • 00:53:56
    - Thank you Tom.
  • 00:53:58
    Yeah, oh.
  • 00:53:59
    We just wanted to end by showing you a couple of links,
  • 00:54:01
    QR codes to learn more about SageMaker and how to use it.
  • 00:54:06
    And thank you all for your time.
  • 00:54:08
    We'll all stand around here for a little bit longer
  • 00:54:11
    if you have any questions.
  • 00:54:12
    I actually think some members of the Smart Sifting team
  • 00:54:14
    are also here, if you have questions about that
  • 00:54:16
    and want to learn more.
Tags
  • Machine Learning
  • Amazon SageMaker
  • Deep Learning
  • Model Training
  • Artificial Intelligence
  • Fine-tuning
  • Pre-training
  • Robotics
  • Data Management
  • Distributed Training