AWS re:Invent 2023 - Train and tune state-of-the-art ML models on Amazon SageMaker (AIM335)
Summary
TL;DR: The presentation highlights the evolution and challenges of training state-of-the-art machine learning models using Amazon SageMaker. Gal Oshri introduces the session and discusses the growing interest in deep learning applications, along with the benefits of using SageMaker to handle large-scale model training challenges, including infrastructure orchestration, data management, and cost efficiency. Emily Webber elaborates on fine-tuning and pre-training large language models with practical demonstrations, emphasizing the ease of use and scalability of SageMaker. Tom Kollar shares insights from the Toyota Research Institute, detailing their application of SageMaker for various machine learning tasks, particularly in robotics, and highlights the importance of robust training infrastructure. The session underscores advancements like smart data sifting and distributed training, showcasing SageMaker as a versatile tool for model development.
Key takeaways
- 👤 Gal Oshri introduces SageMaker and its capabilities.
- 📊 Emily discusses fine-tuning large language models with a live demo.
- 🚗 Tom shares insights from Toyota Research on using SageMaker in robotics.
- ⚙️ The challenges of training large models include hardware utilization and cost efficiency.
- 📦 Amazon SageMaker streamlines the training process with its estimator API.
- 📈 Smart sifting can reduce training time by filtering uninformative samples.
- 💾 SageMaker allows seamless integration of various data sources for training.
- 🔧 The importance of cluster repair features for uninterrupted training.
- 🌐 Advances in transformer architecture improve model performance significantly.
- 🔍 Customers can securely customize third-party models without exposing their data.
Timeline
- 00:00:00 - 00:05:00
Gal Oshri introduces himself and his colleagues, presenting on training and tuning state-of-the-art ML models on Amazon SageMaker. He polls the audience on how many are currently training ML models, especially at larger scales, then outlines the agenda: challenges in large-scale model training, demonstrations, and research use cases.
- 00:05:00 - 00:10:00
Machine learning is showcased as a versatile tool across various applications like recommendations and autonomous driving, highlighting recent advances in image generation from ML models. Gal discusses how algorithmic improvements, particularly the transformer architecture, and increased availability of data and compute power have contributed to the enhancement of model outputs over recent years.
- 00:10:00 - 00:15:00
Challenges in training large-scale ML models are detailed, including the need for efficient hardware, proper orchestration, dataset management, scaling infrastructure, and cost management. Gal emphasizes the importance of optimizing both financial costs and team resources, advocating for effective use of SageMaker in addressing these challenges.
- 00:15:00 - 00:20:00
Amazon SageMaker is presented as a solution to the aforementioned challenges. A high-level overview of SageMaker's operation is provided, detailing how it streamlines the training process through a structured API that manages compute resources, health checks, and data integration, ensuring efficient model training and cost-effectiveness.
- 00:20:00 - 00:25:00
SageMaker's capabilities for loading data, using built-in algorithms, and providing distributed training options are discussed. Features like logging, checkpoint synchronization, and instance management further highlight how SageMaker facilitates efficient training while ensuring resiliency against failures during training jobs.
- 00:25:00 - 00:30:00
Model training efficiency and performance tracking are discussed using SageMaker tools like the profiler. The profiler helps optimize GPU usage and diagnose performance issues, which is critical for minimizing overall training cost and time through better hardware utilization.
- 00:30:00 - 00:35:00
Emily introduces the concept of smart sifting, a new feature in SageMaker aimed at improving training efficiency by refining data during training, reducing overall training time and costs by up to 35% without negatively impacting model accuracy. She presents how to implement this feature simply using SageMaker's framework.
- 00:35:00 - 00:40:00
Amazon SageMaker HyperPod is announced for managing large-scale training clusters, giving easier access to and control over the training environment while keeping SageMaker's managed benefits, with features such as resilience and reduced setup time.
- 00:40:00 - 00:45:00
The discussion shifts to fine-tuning models, emphasizing how it can be executed on proprietary models without exposing sensitive data. New security enhancements in SageMaker facilitate this process while maintaining the model’s confidentiality, showcasing an efficient end-to-end workflow.
- 00:45:00 - 00:54:18
Emily transitions to her segment focused on fine-tuning and pre-training large language models (LLMs) on SageMaker. She explains various customization techniques for LLMs, from simpler adjustments like prompt engineering to more complex methods like retrieval-augmented generation and pre-training new foundation models.
Q&A
What is Amazon SageMaker?
Amazon SageMaker is a fully managed service that provides tools to build, train, and deploy machine learning models quickly and efficiently.
What are the benefits of using SageMaker for training models?
SageMaker helps automate the orchestration of training jobs, monitors hardware health, offers distributed training libraries, and provides tools for monitoring and managing costs.
How does SageMaker assist with large model training?
It offers tools for easy scaling of infrastructure, enables distributed training, and ensures fault tolerance through cluster repair features.
What is smart sifting in the context of SageMaker?
Smart sifting is a technique that filters out less informative training data, potentially reducing training time and costs without impacting model accuracy.
How does Toyota Research Institute utilize SageMaker?
TRI uses SageMaker for various applications including training large language models, robotics projects, and serving models in production.
What advancements have impacted deep learning in recent years?
Significant algorithmic improvements, particularly the introduction of the transformer architecture, along with larger datasets, model sizes, and computational resources.
How can one get started with SageMaker?
Users can easily initiate a training job by using the estimator API, which requires minimal setup and configuration.
What types of customization can be done on large language models?
Models can be customized through techniques like prompt engineering, retrieval augmented generation, fine-tuning, and pre-training.
What role does model parallel training play in SageMaker?
Model parallel training allows the distribution of a neural network over multiple GPUs or accelerators to optimize performance.
Can existing models be fine-tuned on SageMaker?
Yes, SageMaker allows users to fine-tune third-party models on their data securely without exposing their data to the model provider.
- 00:00:00- Good afternoon everyone.
- 00:00:01My name is Gal Oshri, I'm a product manager
- 00:00:03at AWS working on SageMaker.
- 00:00:06I'm here with Emily Webber and Thomas Kollar,
- 00:00:08to talk to you about training and tuning state-of-the-art
- 00:00:11machine learning models on Amazon SageMaker.
- 00:00:14Before we start, how many of you're already training
- 00:00:16machine learning models today?
- 00:00:20Awesome.
- 00:00:20How many of you're training models
- 00:00:21with more than 10 GPUs or accelerators?
- 00:00:25All right, anyone with more than a hundred?
- 00:00:29Alright, a thousand?
- 00:00:31No.
- 00:00:32Alright, cool, well, today we'll learn a bit about that.
- 00:00:37So we'll talk about the challenges,
- 00:00:39for training large scale machine learning models.
- 00:00:42And then we'll talk about how SageMaker
- 00:00:43can help you train those models.
- 00:00:45Emily will then talk to you about fine tuning
- 00:00:47and pre-training large language models,
- 00:00:50and show you a demo, training Llama 7B on SageMaker.
- 00:00:54And then we'll hear from Tom,
- 00:00:55about Toyota Research Institute
- 00:00:57and their machine learning use cases.
- 00:01:03So machine learning has already proven itself useful
- 00:01:05across a wide range of applications.
- 00:01:07From recommendations, to credit risk prediction,
- 00:01:10and autonomous driving, to document analysis.
- 00:01:14But recently there's been an explosion in interest
- 00:01:16in deep learning models for computer vision
- 00:01:19and natural language processing.
- 00:01:22Just a few years ago, this is the type of image you would
- 00:01:25get if you tried to generate a very clean living room
- 00:01:28with a machine learning model.
- 00:01:30You can tell that it's fake, it's not really coherent,
- 00:01:32you can't tell that it's a living room.
- 00:01:36And just a few years later, we can now generate images
- 00:01:39like this, where you'd have to look really closely
- 00:01:42to tell that it's not a real image.
- 00:01:44And I showed something similar a year ago at re:Invent,
- 00:01:47and at that time this was kind of shocking, right,
- 00:01:49seeing this type of image get generated from a model.
- 00:01:52But a year later, and I think many people in the audience
- 00:01:54have already seen these types of images get generated,
- 00:01:57and the quality that you can get
- 00:01:58with machine learning models.
- 00:02:01So how did this happen?
- 00:02:03Well first, there were notable algorithmic improvements
- 00:02:05over the last few years.
- 00:02:07Specifically the transformer architecture,
- 00:02:09which is used in many of the large scale models
- 00:02:12that you hear about today.
- 00:02:14However, there's also an increase in the data sets,
- 00:02:17the model sizes, and the amount of compute that is used to
- 00:02:20train these models.
- 00:02:22And a lot of the research shows that we can continue
- 00:02:24increasing these dimensions,
- 00:02:26to get better and better results.
- 00:02:29So to be competitive, you really have to think about
- 00:02:31how do you leverage these advancements
- 00:02:34to provide the best experiences
- 00:02:35for your customers with machine learning.
- 00:02:39Okay, so training large scale models is awesome.
- 00:02:41Let's just train the biggest one immediately
- 00:02:43and be done with it, right?
- 00:02:45But, it's a bit more complicated than that.
- 00:02:47There's some challenges.
- 00:02:49The first, is that you want to use the latest hardware.
- 00:02:52Every few years there are innovations in hardware
- 00:02:54that lead to two to nine x improvements in training
- 00:02:57efficiency.
- 00:02:59But it's not enough to get access to the latest hardware,
- 00:03:02you have to think about how well it works.
- 00:03:04Is it, like, fault resistant enough to let you
- 00:03:08continue your training with minimal interruptions
- 00:03:10to the machine learning team?
- 00:03:13You have to think about orchestration,
- 00:03:15and how to most effectively use the
- 00:03:16resources you have available.
- 00:03:18Especially if you have a large team of data scientists
- 00:03:21who want to train many models in parallel.
- 00:03:24We talked about how you want to have larger data sets,
- 00:03:27and being able to store, load, and process
- 00:03:29these large data sets, can require a lot of work.
- 00:03:33And there are a lot of pitfalls in doing that.
- 00:03:36You wanna think about scaling up.
- 00:03:38Both the infrastructure, to get more compute for training
- 00:03:41the model, as well as the algorithms that you use.
- 00:03:44The models that we train today, for these use cases,
- 00:03:47do not fit on a single accelerator.
- 00:03:49So you have to think about the algorithms
- 00:03:50that you need to use to scale up.
- 00:03:53And finally, we have to think about cost.
- 00:03:56Training these models can cost hundreds of thousands,
- 00:03:58or millions of dollars.
- 00:04:00So you need to think about efficiency when you're training
- 00:04:02those models.
- 00:04:03Especially at the beginning, when you're doing
- 00:04:04sporadic experimentation, you're trying out different ideas,
- 00:04:08and you don't use the hardware all the time.
- 00:04:11Right, so you want to think about how you use that
- 00:04:12efficiently.
- 00:04:14And it's not just the financial cost,
- 00:04:15but the team's time, right?
- 00:04:17A lot of customers tell us, that making sure that their
- 00:04:20ML engineers are not spending time dealing
- 00:04:22with infrastructure, is one of their top priorities.
- 00:04:27But not all hope is lost.
- 00:04:29Amazon SageMaker can help with many of these challenges.
- 00:04:34I'll give a high level overview of how SageMaker works,
- 00:04:37but you'll see it in a lot more detail during Emily's demo.
- 00:04:41We start, by calling the create training job API.
- 00:04:45This SageMaker API captures information about your dataset,
- 00:04:48your compute resource configuration,
- 00:04:50as well as the training algorithm that you want to use.
- 00:04:54SageMaker will then set up the cluster
- 00:04:56for training the model,
- 00:04:58with the right VPC and networking configurations,
- 00:05:01by default, to save you a lot of time,
- 00:05:03but you can configure all of it yourself as well
- 00:05:06and add the flexibility that you need.
- 00:05:09As part of spinning up the cluster,
- 00:05:11SageMaker will also run health checks on the hardware,
- 00:05:14to make sure that everything is working effectively,
- 00:05:16before the job even begins, and before the billing starts.
- 00:05:20So this saves you time and money,
- 00:05:22to make sure that the training can continue efficiently.
- 00:05:28SageMaker will then load data from S3, EFS,
- 00:05:31or FSx for Lustre, and you have options to either copy
- 00:05:35or stream the data.
- 00:05:36Depending on your dataset size, one of those might be
- 00:05:38more applicable.
- 00:05:40But again, and you'll hear this theme again and again,
- 00:05:42while SageMaker provides great options for
- 00:05:45getting started and moving quickly,
- 00:05:47you also have the flexibility to do what you want,
- 00:05:50and load data from other sources.
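As a rough illustration of those input options, here is what the two modes might look like with the SageMaker Python SDK (bucket names and channel names below are placeholders, not values from the session):

```python
from sagemaker.inputs import TrainingInput

# "File" mode copies the whole dataset onto the training instances before the
# job starts -- simple, and fine for smaller datasets.
copied_input = TrainingInput(
    s3_data="s3://my-bucket/train/",   # illustrative bucket/prefix
    input_mode="File",
)

# "FastFile" mode streams objects from S3 on demand instead of copying them
# up front -- often a better fit for very large datasets.
streamed_input = TrainingInput(
    s3_data="s3://my-bucket/train/",
    input_mode="FastFile",
)

# Either one is passed to the estimator, e.g.:
# estimator.fit({"training": streamed_input})
```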
- 00:05:54You can then download the training image from ECR,
- 00:05:57and you have options of built-in algorithms in SageMaker.
- 00:06:01You can use one of the SageMaker deep learning containers
- 00:06:03to quickly use PyTorch, TensorFlow, or Hugging Face,
- 00:06:06or you can bring your own training image,
- 00:06:09with your own algorithms completely.
- 00:06:12SageMaker also offers distributed training libraries
- 00:06:14that can help accelerate both data, and model parallel
- 00:06:17training, and you'll hear more about that later.
- 00:06:22SageMaker then starts the training, and streams the logs
- 00:06:24to CloudWatch throughout training.
- 00:06:27It stores the metadata and hyper parameters,
- 00:06:29so you can view it later.
- 00:06:30And you have, again, options for using TensorBoard,
- 00:06:32and other tools, to visualize your experiments.
- 00:06:37It will synchronize your checkpoints throughout training,
- 00:06:39to your storage, which is critical, if you are, you know,
- 00:06:43you want to be fault resistant in case something fails
- 00:06:46during training, you don't want to lose your progress
- 00:06:48until that point.
- 00:06:51At the end of the training, SageMaker will save the model
- 00:06:54and other output data so you can revisit it later.
- 00:06:58At the end of the training, SageMaker spins down all the
- 00:07:01compute, so that if the job fails at 3:00 AM,
- 00:07:04no one has to wake up to turn anything off
- 00:07:06and make sure that you're not paying for all that hardware,
- 00:07:09all those instances running,
- 00:07:10without being used to train a model.
- 00:07:13And with the same paradigm,
- 00:07:15we can actually scale up our training
- 00:07:17to many more instances really easily,
- 00:07:19to get those large scale models.
- 00:07:23One really awesome feature that we launched this year,
- 00:07:26is a cluster repair feature.
- 00:07:28So if during the training, one of the instances fails,
- 00:07:31we look at what happened to that instance
- 00:07:34and decide whether we need to reboot it
- 00:07:36or replace it with a different instance,
- 00:07:38and then restart the training within a few minutes.
- 00:07:41So there are all these resiliency capabilities
- 00:07:43to ensure the training continues as quickly as possible
- 00:07:46and without manual intervention.
- 00:07:52In case any of that sounds intimidating,
- 00:07:54the good news, it's actually really easy to get started.
- 00:07:57The most important code for converting model training
- 00:07:59to a SageMaker training job, is the estimator API.
- 00:08:04You'll see more in the demo later,
- 00:08:05but at a high level, the API takes a Python file
- 00:08:09or an entry point, in this case cifar10.py,
- 00:08:13which is very similar to how I would do the model training
- 00:08:16on my laptop.
- 00:08:18I also provide the instance type I want to use,
- 00:08:21how many of them, and hyper parameters,
- 00:08:24which I can easily change later
- 00:08:25to try additional training jobs.
- 00:08:28I also add metric definitions, so that I can view
- 00:08:31those metrics in CloudWatch during the training.
- 00:08:35Finally, I provide a path to my data and call estimator.fit.
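A minimal sketch of the estimator call being described, using the SageMaker Python SDK; the role, instance type, hyperparameters, and metric regex here are illustrative placeholders rather than the exact values shown on the slide:

```python
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="cifar10.py",          # the same script you would run locally
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # placeholder
    framework_version="2.0",
    py_version="py310",
    instance_type="ml.g5.2xlarge",
    instance_count=1,
    hyperparameters={"epochs": 10, "batch-size": 256, "lr": 0.001},
    metric_definitions=[
        # Regex applied to the job's log lines so the metric shows up in CloudWatch.
        {"Name": "train:loss", "Regex": "loss: ([0-9\\.]+)"},
    ],
)

# SageMaker provisions the cluster, runs health checks, streams logs to
# CloudWatch, and tears everything down when the job finishes.
estimator.fit({"training": "s3://my-bucket/cifar10/"})
```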
- 00:08:42Now the even better news, is that we recently made it
- 00:08:44even easier to get started.
- 00:08:47Now, you can take your existing Python code
- 00:08:50and add the remote python decorator to it,
- 00:08:53to immediately serialize the runtime,
- 00:08:56the packages, functions, and everything else,
- 00:08:58so that it runs as a SageMaker training job
- 00:09:01without even having to learn about the estimator API.
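A small sketch of that decorator-based path, assuming the SageMaker Python SDK's remote function feature and a default execution role and region already configured; the instance type and requirements file are illustrative:

```python
from sagemaker.remote_function import remote

@remote(instance_type="ml.m5.xlarge", dependencies="./requirements.txt")
def train(lr: float, epochs: int) -> float:
    # ...existing local training code, unchanged...
    # The function, its arguments, and the local environment are serialized
    # and executed as a SageMaker training job.
    return 0.0  # e.g. return the final validation loss

# Runs remotely on the requested instance; the return value comes back locally.
final_loss = train(lr=0.001, epochs=10)
```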
- 00:09:07Once the training job begins,
- 00:09:08you can easily view the metadata and reproduce it later,
- 00:09:12or clone that training job.
- 00:09:14And you know, like tracking experiments
- 00:09:15is extremely important.
- 00:09:17You want to learn from what you've done before.
- 00:09:19And we often see people who keep the training results
- 00:09:22in a spreadsheet, or even in a document,
- 00:09:24and pass it around the team,
- 00:09:25but that makes it more difficult to collaborate
- 00:09:28and learn from past experiments.
- 00:09:30So by automatically keeping all of this in one place,
- 00:09:33it becomes much easier to learn from your mistakes
- 00:09:36and build better models.
- 00:09:42Now let's move on from tracking the training,
- 00:09:44to improving the performance,
- 00:09:46specifically the training speed, which impacts how much time
- 00:09:49you end up requiring to like use the instances
- 00:09:52to train the model, and the overall project completion time,
- 00:09:56as well as the cost.
- 00:09:58Now, the SageMaker profiler, is an ML observability tool
- 00:10:01that enables you to understand hardware utilization
- 00:10:04and root cause performance issues
- 00:10:06to maximize the efficiency of your model training.
- 00:10:09On the dashboard shown here, we can see some overall metrics
- 00:10:13around the GPU usage, and you want that to be as high
- 00:10:16as possible, as well as the GPU usage throughout
- 00:10:19the training job, across each individual node
- 00:10:22within your cluster.
- 00:10:24So you can see that even if your utilization overall might
- 00:10:27be high, within some intervals, there might be
- 00:10:29low utilization that you want to check out a bit more.
- 00:10:34Lower down on the dashboard,
- 00:10:35there are other metrics.
- 00:10:36For example, the total time spent on each GPU kernel.
- 00:10:40So that gives you additional hints about what you want
- 00:10:42to optimize next, to further improve your training.
- 00:10:48There's another page in the profiler,
- 00:10:49showing a more detailed timeline view,
- 00:10:52that allows you to get data from your host and devices
- 00:10:56at all the different levels,
- 00:10:57so you can dig deeper to understand
- 00:10:59what is happening at each point.
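For context, profiling has long been attached to a training job through a profiler_config on the estimator. The sketch below uses the Debugger-based configuration that predates the profiler UI described here; the newer SageMaker Profiler may have its own enablement flow, so treat these parameters as an assumption and check the current documentation:

```python
from sagemaker.debugger import ProfilerConfig, FrameworkProfile
from sagemaker.pytorch import PyTorch

profiler_config = ProfilerConfig(
    system_monitor_interval_millis=500,           # hardware metrics sampling interval
    framework_profile_params=FrameworkProfile(),  # step/kernel-level framework profiling
)

estimator = PyTorch(
    entry_point="train.py",                       # illustrative entry point
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # placeholder
    framework_version="2.0",
    py_version="py310",
    instance_type="ml.p4d.24xlarge",
    instance_count=2,
    profiler_config=profiler_config,
)
```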
- 00:11:04Now I'm excited to announce a preview of a new capability
- 00:11:07in SageMaker, smart sifting of data.
- 00:11:10Smart sifting is an online data refinement technique
- 00:11:12that can reduce your deep learning training time and cost
- 00:11:15by up to 35%.
- 00:11:18Now when you train your models,
- 00:11:20and we talked about wanting to use larger data sets,
- 00:11:23it also matters about the quality of the data sets.
- 00:11:26And, some samples in your data might be less informative
- 00:11:30to your model training,
- 00:11:31or you might have seen those samples already.
- 00:11:33There might be duplicate data or similar data.
- 00:11:36And it's often difficult to pre-process that data
- 00:11:39and remove the data that you don't want
- 00:11:41in the training anymore.
- 00:11:42So smart sifting helps, because it analyzes your data
- 00:11:45during the training job, and filters out
- 00:11:47the low loss samples, which are less informative
- 00:11:50to the model.
- 00:11:52By training on a subset of your data,
- 00:11:53you can reduce the time and cost of the training
- 00:11:56by up to 35%.
- 00:11:58And because it only filters out the low loss samples,
- 00:12:02it has minimal or no impact on the final training accuracy.
- 00:12:08And it's easy to get started,
- 00:12:09because it does not require you to make changes
- 00:12:11to your data or training pipeline.
- 00:12:16Here's a simple example that uses smart sifting.
- 00:12:20We use the SageMaker deep learning container,
- 00:12:22we load the sifting data loader,
- 00:12:25and then we wrap whatever existing data loader we use
- 00:12:28with the sifting data loader,
- 00:12:29and provide a bit more configuration to start using it.
- 00:12:33I don't need to change the rest of my model
- 00:12:35or data pipeline.
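The wrapping pattern being described looks roughly like the sketch below. It assumes you already have a PyTorch DataLoader, a model, and a thin loss wrapper; the smart_sifting import paths, class names, and parameter values are written from memory of the feature's documentation and should be treated as assumptions:

```python
# Assumed import paths -- verify against the smart sifting documentation.
from smart_sifting.dataloader.sift_dataloader import SiftingDataloader
from smart_sifting.sift_config.sift_configs import (
    RelativeProbabilisticSiftConfig, LossConfig, SiftingBaseConfig,
)

sift_config = RelativeProbabilisticSiftConfig(
    beta_value=3,                 # how aggressively low-loss samples are filtered
    loss_history_length=500,      # window of recent losses used for the decision
    loss_based_sift_config=LossConfig(
        sift_config=SiftingBaseConfig(sift_delay=10)  # warm-up steps before sifting
    ),
)

# Wrap the existing loader; the model and the rest of the training loop stay unchanged.
train_dataloader = SiftingDataloader(
    sift_config=sift_config,
    orig_dataloader=train_dataloader,   # your existing torch DataLoader
    loss_impl=sifting_loss,             # thin wrapper around your loss (assumed interface)
    model=model,
)
```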
- 00:12:38And we already have customers who are seeing great results
- 00:12:40with this capability.
- 00:11:42For example, LG AI Research
- 00:11:44used it to get a meaningful increase in training performance
- 00:12:47without any changes to the model accuracy.
- 00:12:54At re:Invent, we also announced Amazon SageMaker HyperPod.
- 00:12:58This enables customers who are training large scale models
- 00:13:00to get all the managed benefits of SageMaker
- 00:13:03that we've been discussing today,
- 00:13:05with a UX that they might be more familiar with,
- 00:13:07of being able to access the instances directly,
- 00:13:10using Slurm and so on.
- 00:13:13It has similar resilient capabilities to what we discussed
- 00:13:15for training, in terms of replacing faulty instances
- 00:13:20and enabling the training to begin a bit more quickly,
- 00:13:24saving up to 20% in time.
- 00:13:27It also benefits from the optimized distributed training
- 00:13:30libraries, that also improve performance,
- 00:13:32for both model and data parallel training.
- 00:13:37And I mentioned it provides more granular control
- 00:13:40over the cluster in what you're doing,
- 00:13:42being able to access the instances directly,
- 00:13:45install additional software, and make any changes
- 00:13:47that you want to the cluster, to be able to fine tune
- 00:13:50your training a bit more.
- 00:13:54Now, we've talked about training really large scale models,
- 00:13:57but sometimes you don't need to do that.
- 00:13:58Sometimes you just want to fine tune an existing model.
- 00:14:01And that's beneficial if you have an existing model,
- 00:14:04a foundation model, and you want to bring in your own data
- 00:14:08to fine tune that model to a particular use case.
- 00:14:12So, by bringing in your own data,
- 00:14:14you're making the model better than if you were just using
- 00:14:16an off the shelf foundation model.
- 00:14:18But, it saves you a lot of time and money,
- 00:14:20because you don't have to train
- 00:14:21that whole model from scratch.
- 00:14:24Now the challenge is, that some models are not open sourced.
- 00:14:27Right, you can't download the model weights
- 00:14:29and fine tune them yourselves in an existing
- 00:14:31SageMaker training job.
- 00:14:33But, this has changed with enhancements we've made
- 00:14:35to SageMaker algorithms and model packages.
- 00:14:39You can now easily and securely customize third party models
- 00:14:42by fine tuning them on your private data.
- 00:14:45This provides end-to-end security.
- 00:14:47The model provider can provide their model without revealing
- 00:14:50their model weights, and you as a customer,
- 00:14:53can fine tune on that model by bringing in your own data,
- 00:14:56without exposing that data to the model provider.
- 00:14:59And the final model weights, after your fine tuning,
- 00:15:02are also only available to you.
- 00:15:06Now this can be done with a variety of models
- 00:15:07and algorithms, for example, Cohere models.
- 00:15:12And all this is easy to use, and done through
- 00:15:14the Python SageMaker SDK, that we were discussing earlier,
- 00:15:18and integrates with other SageMaker capabilities,
- 00:15:20like SageMaker experiments and pipelines.
- 00:15:25And of course, with SageMaker inference,
- 00:15:27you can deploy the models at the end,
- 00:15:29to use them for inference in production scenarios,
- 00:15:33in a secure way.
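A hedged sketch of what that flow can look like with the SageMaker Python SDK's AlgorithmEstimator; the algorithm ARN, hyperparameters, and channel name are placeholders, and the real values come from the model provider's listing:

```python
from sagemaker.algorithm import AlgorithmEstimator

estimator = AlgorithmEstimator(
    # Placeholder ARN for a provider's SageMaker algorithm listing.
    algorithm_arn="arn:aws:sagemaker:us-east-1:123456789012:algorithm/example-proprietary-llm",
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # placeholder
    instance_count=1,
    instance_type="ml.p4d.24xlarge",
    hyperparameters={"epochs": "1"},   # defined by the provider's listing
)

# Your training data never leaves your account, the provider's weights are never
# exposed to you, and the fine-tuned artifact lands in your own S3 bucket.
estimator.fit({"training": "s3://my-bucket/fine-tuning-data/"})
```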
- 00:15:36I'll now hand it over to Emily, to talk about fine tuning
- 00:15:38and pre-training LLMs on SageMaker.
- 00:15:41- Alright, thanks Gal.
- 00:15:43Great, so I hope you're as excited as I am about a lot
- 00:15:46of these new launches, new features.
- 00:15:48I should introduce myself.
- 00:15:50My name's Emily Webber, I lead our generative AI
- 00:15:53foundation's technical field community here at AWS.
- 00:15:57And in particular, some of those launches,
- 00:16:00many of them came directly from conversations with you.
- 00:16:03Actually, we were listening with customers
- 00:16:05and chatting with you to understand key capabilities
- 00:16:08that you wanted to see in our training stack,
- 00:16:11and this led to a number of the features
- 00:16:12that you've just learned about.
- 00:16:16In any case, there are many ways to customize
- 00:16:20a large language model.
- 00:16:22Here I'm presenting them on two axis, right?
- 00:16:25So at the bottom you have roughly complexity and then cost.
- 00:16:29Obviously you wanna be closer to the left.
- 00:16:32You want your LLM customization techniques
- 00:16:35to be roughly easy, because then of course that's faster
- 00:16:39for you to get started, and then that's less expensive.
- 00:16:42However, there is a progression of these techniques.
- 00:16:46Most customers will start with prompt engineering,
- 00:16:51which is a nice way to easily improve
- 00:16:54and then customize your large language model.
- 00:16:56However, it's not as accurate as some of these
- 00:16:59extra techniques that you can use.
- 00:17:02Most customers will move from prompt engineering into
- 00:17:06what we call a retrieval augmented generation stack,
- 00:17:10where you have some set of data, you're converting
- 00:17:13that data into embeddings, or that dense representation,
- 00:17:17and then retrieving those documents
- 00:17:19to interact with your consumers.
- 00:17:22This then can transform, if you will,
- 00:17:25into a fine tuning stack.
- 00:17:27Actually, there's a bit of an overlap there,
- 00:17:30but in any case, you can take, as Gal mentioned,
- 00:17:33custom data, fine tune your model to add
- 00:17:37that extra knowledge.
- 00:17:39All of these techniques, however, pale in comparison
- 00:17:43to the holy grail, which is pre-training,
- 00:17:45which is creating a net new foundation model.
- 00:17:49And so all of these techniques are available on SageMaker
- 00:17:53and well supported by the stack.
- 00:17:55So we're gonna learn how to move from
- 00:17:59fine tuning into pre-training,
- 00:18:01during our session here today.
- 00:18:04Now, fine tuning small models is really impactful.
- 00:18:09Here are a couple reasons why you would consider
- 00:18:11fine tuning a small model.
- 00:18:13The first is, of course, it's less expensive.
- 00:18:16You're going to use a smaller dataset,
- 00:18:18possibly a smaller model, and then still improve accuracy
- 00:18:24because you're fine tuning this model,
- 00:18:25but you're keeping your costs down.
- 00:18:28When you're working with a smaller model,
- 00:18:30such as something in the 7 billion parameter range,
- 00:18:33this is inherently faster, because the model itself
- 00:18:36is just physically smaller than some of those larger ones,
- 00:18:40and so the training time is faster,
- 00:18:43the inferencing time is faster,
- 00:18:45which means you can train more models
- 00:18:48and you can do more inferencing,
- 00:18:50again, with that smaller object.
- 00:18:53Because the object is smaller,
- 00:18:55it's easier for you to manage.
- 00:18:57And so, again, the storage requirements are smaller,
- 00:19:01so it's easier for you to copy the model.
- 00:19:03It's easier for you to put the model into your applications,
- 00:19:07and your packages, and your CICD pipelines,
- 00:19:10and your repositories.
- 00:19:12Many customers inherently prefer the ownership that comes
- 00:19:16with creating new models, particular through fine tuning,
- 00:19:20and then again, pre-training.
- 00:19:21This allows you to increase the IP of your firm.
- 00:19:25And then of course you have more deployment options when
- 00:19:28you're fine tuning, again, that small model.
- 00:19:32The more deployment options include serverless,
- 00:19:34actually I have customers who create and then fine tune
- 00:19:39these small 7 billion parameter models, compile them,
- 00:19:43and then host them on Lambda, (chuckling)
- 00:19:45and run them on serverless inferencing.
- 00:19:47And so, absolutely, when you're working with these
- 00:19:50tiny models, that are knowledgeable in small domains,
- 00:19:55you have a lot of flexibility.
- 00:19:58Pre-training is really best for extremely large data sets.
- 00:20:03So when you have hundreds of GBs, or multiple terabytes
- 00:20:07of custom language data that just really is not online,
- 00:20:11if it, the language data that you have,
- 00:20:15if it's not in Wikipedia, if it's not on Reddit,
- 00:20:20if it's the core language that you're using,
- 00:20:23if when you, you know, take a sentence and try and put
- 00:20:26that sentence into Wikipedia, for example,
- 00:20:29if Wikipedia doesn't understand what you're trying to say,
- 00:20:31you may wanna consider seriously customizing a language
- 00:20:35model, and then possibly creating a new one from scratch.
- 00:20:39Now, why is this the case?
- 00:20:41Why is pre-training so powerful?
- 00:20:44Part of this is because the pre-training loss function
- 00:20:48is more generalizable.
- 00:20:49So when you're creating that new foundation model
- 00:20:53from scratch, the learning is slightly different.
- 00:20:56It's more general, and it's deeper
- 00:20:58in the neural network, actually.
- 00:21:01Also, when you're creating a new foundation model,
- 00:21:04you can do this without supervised data.
- 00:21:07So you don't need to go label, you know, millions of records
- 00:21:11in pre-training, you can just capture and tokenize
- 00:21:14a terabyte of your own language data
- 00:21:18and then throw that into the network.
- 00:21:20There's no need to add additional supervision on top of
- 00:21:22that, which makes it very attractive.
- 00:21:26Also, I love to see the efficiency gains
- 00:21:29of pre-training, actually.
- 00:21:30We all have small teams, we all have have few resources
- 00:21:34for data science and modeling, and so,
- 00:21:36when we take our small teams and focus them on one project,
- 00:21:40and create this one massive, you know,
- 00:21:42powerful foundation model,
- 00:21:44and then use the foundation model
- 00:21:46in many, many applications,
- 00:21:48it actually, I find, is more efficient than optimizing
- 00:21:52and then maintaining our tiny ML ops workloads,
- 00:21:56which is what many of us were doing,
- 00:21:58prior to transformers.
- 00:22:01So, what does it take, to pre-train a new foundation model?
- 00:22:07It sounds scary, it sounds like only, you know,
- 00:22:09the best can do this, but in fact,
- 00:22:13in large part, due to, you know, a very sophisticated
- 00:22:17and very mature training infrastructure that you're here
- 00:22:19to learn about, it's actually pretty accessible.
- 00:22:22So how are we gonna do this?
- 00:22:24So here are three example models that were pre-trained
- 00:22:28and created from scratch on Amazon SageMaker,
- 00:22:31specifically on our training infrastructure.
- 00:22:33Stable Diffusion, clocking in at 5 billion images
- 00:22:38and 240 terabytes of image data.
- 00:22:41And so of course, that's a lot.
- 00:22:43And so, image models tend to take a lot of data,
- 00:22:47but the models themselves are a bit smaller.
- 00:22:50And so you can use smaller cluster sizes.
- 00:22:54The Falcon model of course,
- 00:22:56from Technology Innovation Institute,
- 00:22:58is a very large language model, the largest open source
- 00:23:01language model.
- 00:23:031 trillion tokens, just under three terabytes
- 00:23:06of language data, 40 billion parameters,
- 00:23:09and then 48 p4d instances.
- 00:23:13So sizable cluster, and that is two months
- 00:23:16to train this model.
- 00:23:19And then we have another financial large language model
- 00:23:22trained on SageMaker with just under two terabytes
- 00:23:25of language data.
- 00:23:27And so all of these requirements,
- 00:23:30are surprisingly accessible.
- 00:23:32Actually, I think there are quite a few companies
- 00:23:34with that volume of language data.
- 00:23:38And then the capabilities that we provide on SageMaker,
- 00:23:41make the training experience, again, very accessible
- 00:23:45to a wide variety of companies.
- 00:23:49So how do we do this?
- 00:23:51If we know we have, we meet the requirements,
- 00:23:54how are we gonna go about creating
- 00:23:56and pre-training these foundation models on AWS?
- 00:23:59So the first step is just gathering and accessing that data.
- 00:24:03And again, we want at least, I'd say one terabyte
- 00:24:07of your own language data.
- 00:24:09So this is documents, digitized PDFs, conversations,
- 00:24:15you know, language streams like rich,
- 00:24:18rich, robust language data.
- 00:24:20So you wanna gather about one terabyte
- 00:24:22of this language data.
- 00:24:23Many firms will then pair that with open source data
- 00:24:27actually, so that your model understands both
- 00:24:30the nuances of your company's acronyms, and history,
- 00:24:35and phrasing, and domain expertise,
- 00:24:37but also knows what time the sun rises in Honolulu.
- 00:24:41Because of course we want that mix of the general,
- 00:24:44sort of open source knowledge,
- 00:24:46but also what's specific to your company.
- 00:24:49And so, that's gathering and storing the information.
- 00:24:52After that, you'll pre-process your data.
- 00:24:56SageMaker also has a really nice capability
- 00:24:59for pre-processing datasets.
- 00:25:01Actually one of our builders, Jenny over here,
- 00:25:04helped me run many pre-processing
- 00:25:06and data transformation jobs on SageMaker.
- 00:25:10And so you can use our training job API,
- 00:25:14including that remote function that we just learned about,
- 00:25:17to run jobs in parallel, which are then tokenizing
- 00:25:21and pre-processing.
- 00:25:22So this core, sort of training job construct,
- 00:25:26is applicable both for creating new models from scratch,
- 00:25:29and also for general data transformation
- 00:25:32and general processing.
- 00:25:33So you'll pre-process your data sets,
- 00:25:36and then you'll optimize those data sets using
- 00:25:38your preferred data storage.
- 00:25:40We see a lot of customers using FSx for Lustre.
- 00:25:44This is because you can store your data in one place
- 00:25:46and then easily attach this volume to training job runs.
- 00:25:51So as you're iterating through different model sizes,
- 00:25:54and different infrastructure, and experimental choices,
- 00:25:58you can use and store your data in the same place.
- 00:26:03After this, customers will then need to develop and iterate
- 00:26:06over their training scripts.
- 00:26:08And the elasticity that you get with the infrastructure
- 00:26:11on SageMaker is beautiful.
- 00:26:13You can use and run tiny instances.
- 00:26:16So the T3 medium and the T2, that Werner shared with us
- 00:26:20this morning.
- 00:26:21So the T3 medium is a great choice for notebook instances,
- 00:26:25very cost effective, very small machine.
- 00:26:28And then you can scale that up with a click
- 00:26:30of a couple buttons, to a small GPU, for example,
- 00:26:34the G4 or the G5 series,
- 00:26:38which your teams can then develop on,
- 00:26:40and get the nuances working in their training loop.
- 00:26:44And then ultimately scale out in the same platform,
- 00:26:47in the same service, to hundreds and thousands of GPUs.
- 00:26:53And so that's that step from, that move from step four
- 00:26:56to step five, where you're developing and testing
- 00:26:59on increasingly larger instances,
- 00:27:02and then ultimately scaling up and using the massive
- 00:27:06training infrastructure that SageMaker provides.
- 00:27:10And then of course you'll evaluate the model artifact
- 00:27:13step by step, and the way that SageMaker holds onto
- 00:27:17the metadata, holds onto your scripts,
- 00:27:20holds onto your hyper parameters, stores all of your
- 00:27:23artifacts in S3, makes it so easy to just look up
- 00:27:27your previous work.
- 00:27:29So, I know if you're trying to capture an experiment
- 00:27:32that you ran six months ago,
- 00:27:34or even three years ago, as long as it was in AWS,
- 00:27:38then you can easily go look up the results of that job,
- 00:27:42capture some of the artifacts,
- 00:27:44and then run a new experiment.
- 00:27:46And so, at a high level, that's how you can pre-train
- 00:27:49foundation models on AWS.
- 00:27:53And again, all of this is possible because of the
- 00:27:56distributed training libraries that we provide
- 00:27:59on Amazon SageMaker.
- 00:28:00So these are capabilities that we've been building
- 00:28:03for many years.
- 00:28:04Including data parallel and model parallel
- 00:28:07distributed training libraries that give you efficiency
- 00:28:10and enhancements.
- 00:28:11So model parallel, is a way to distribute a neural network
- 00:28:16over multiple accelerators and GPUs,
- 00:28:19providing optimized performance.
- 00:28:21And then our data parallel package,
- 00:28:23will let you actually make copies of your model
- 00:28:26across a large cluster.
- 00:28:28And then we're delivering custom communication collectives
- 00:28:31actually, that are optimized for the AWS network topology,
- 00:28:35to save you up to 40% in the overall training time.
- 00:28:39And so this is after many years
- 00:28:41of innovation at this layer in the stack.
- 00:28:44And again, all of this is available through SageMaker.
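Enabling these libraries is typically a matter of passing a distribution argument to the estimator. The sketch below shows the data parallel library; the model parallel library is enabled through the same argument with its own parameters, which vary by version, so the values here are illustrative:

```python
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # placeholder
    framework_version="2.0",
    py_version="py310",
    instance_type="ml.p4d.24xlarge",
    instance_count=8,
    # SageMaker distributed data parallel: replicate the model across the cluster
    # and use the AWS-optimized collectives for gradient exchange.
    # (Model parallelism is configured via the same argument, e.g.
    #  {"smdistributed": {"modelparallel": {...}}}, with version-specific parameters.)
    distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
)
```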
- 00:28:48And customers agree with us.
- 00:28:50So as you heard from Swami's keynote yesterday,
- 00:28:54Aravind Srinivas, CEO of Perplexity AI,
- 00:28:58is happily using SageMaker, and in particular
- 00:29:02the data and model parallel training libraries,
- 00:29:06again, to get that optimized performance
- 00:29:08in particular in the HyperPod mode.
- 00:29:13Another feature of SageMaker that I find really handy,
- 00:29:17is Warm Pools.
- 00:29:18And so, the training job API, again,
- 00:29:22is creating infrastructure when you train a model.
- 00:29:26So when you call model.fit, or when you run
- 00:29:29that Python training script,
- 00:29:31we actually turn on our instances at the same time.
- 00:29:34And so that call to create the cluster,
- 00:29:37and to execute the scripts, are coupled,
- 00:29:40they happen together.
- 00:29:41And now again, this is really useful for cost efficiency,
- 00:29:46so that when the job fails, because I forgot to point
- 00:29:50to the right Luster volume,
- 00:29:52that instance isn't sitting up there charging me money,
- 00:29:54right, it turns off.
- 00:29:56So it's extremely compute efficient.
- 00:29:58However, as a dev, that can be challenging,
- 00:30:01because I don't wanna wait eight minutes
- 00:30:03just to ship a new line of code.
- 00:30:06And so we launched, last year, our Warm Pools feature,
- 00:30:10that lets you run new jobs, using the same image,
- 00:30:14in seconds.
- 00:30:15And so as a developer it's extremely handy,
- 00:30:19because you can make just one, two, three line edits in your
- 00:30:22training script, and then just run the job in seconds.
- 00:30:25And so the Warm Pool feature is incredibly useful
- 00:30:28for developing with the SageMaker training API.
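A short sketch of how a warm pool is requested, assuming the account has the corresponding warm pool quota; the keep-alive window and instance settings are illustrative:

```python
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # placeholder
    framework_version="2.0",
    py_version="py310",
    instance_type="ml.g5.12xlarge",
    instance_count=1,
    keep_alive_period_in_seconds=3600,   # keep the instances warm for an hour
)

estimator.fit({"training": "s3://my-bucket/data/"})
# Edit the training script and call fit() again within the keep-alive window:
# the next job reuses the warm instances and starts in seconds.
```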
- 00:30:33Another core feature of SageMaker,
- 00:30:36is the ability to use many different types of instances
- 00:30:39and have a lot of flexibility with the underlying
- 00:30:42infrastructure, where you're trying to run your scripts.
- 00:30:45One of these, of course, is custom accelerators from AWS.
- 00:30:49And so the Trainium and Inferentia capabilities
- 00:30:52are both available on SageMaker.
- 00:30:55And you're seeing a lot of cost performance,
- 00:30:59relative to comparable Amazon EC2 instances,
- 00:31:02up to 46% with Trainium one, relative to Llama two,
- 00:31:07and so you'll see even better performance with Trainium two,
- 00:31:10which was just recently announced.
- 00:31:12And so, in the demo today, actually, we're gonna take a look
- 00:31:15at Trainium on SageMaker.
- 00:31:19So what is this demo?
- 00:31:20So we're gonna be pre-training Llama two,
- 00:31:23of course I got a little visual for you.
- 00:31:25So this is a cartoon Llama with sunglasses
- 00:31:28on the Las Vegas strip from Bedrock.
- 00:31:30And so we're gonna pre-train a 7 billion parameter
- 00:31:33Llama on SageMaker.
- 00:31:36Now, why are we going to do this?
- 00:31:38Why is this a useful exercise?
- 00:31:40Again, this is assuming I have at least a few hundred
- 00:31:45gigabytes of custom data.
- 00:31:47So really a sizable data set
- 00:31:49of my own language data.
- 00:31:52And then again, this is knowledge
- 00:31:54that's not generally available online.
- 00:31:57And so it's my own proprietary data set.
- 00:32:00So it's knowledge that wouldn't generally be found
- 00:32:02in say, a Wikipedia archive.
- 00:32:05This then drives in-domain accuracy.
- 00:32:09And so this small model will be very surprisingly accurate
- 00:32:13within that domain.
- 00:32:15Again, it won't know everything under the sun,
- 00:32:17but it will have a surprising amount of accuracy,
- 00:32:21again, in that dataset, and in that domain
- 00:32:24where you're training it.
- 00:32:25This of course then drives, as I mentioned earlier,
- 00:32:28ownership, it drives flexibility, and then that lets you,
- 00:32:32again, use serverless hosting,
- 00:32:34and then ultimately cost reduction opportunities.
- 00:32:37And again, how are we gonna do this?
- 00:32:39So I have some example notebooks, I'm gonna walk
- 00:32:41through different instances, again, that T3 medium,
- 00:32:45the Trn1, optimized large scale data
- 00:32:48stored on FSx for Lustre, and then some Warm Pools,
- 00:32:51and then, again, that distributed training infrastructure.
- 00:32:54So let's check this out.
- 00:33:03All right, so here we are.
- 00:33:06I'm starting with the example notebook.
- 00:33:09So this is a publicly available GitHub repository
- 00:33:13that is using Neuron X,
- 00:33:15which is from the Amazon Annapurna ML team,
- 00:33:18and then NeMo Megatron actually, which is from Nvidia.
- 00:33:21So this is the Nvidia distributed training framework,
- 00:33:25NeMo Megatron.
- 00:33:26And then we're gonna be running that
- 00:33:27on Trainium accelerators.
- 00:33:31This also uses PyTorch and then the core
- 00:33:34torchrun framework.
- 00:33:39All right, so first I am running this on a notebook instance
- 00:33:44actually, so this is a SageMaker notebook instance.
- 00:33:47This is my sturdy M four instance.
- 00:33:51And it's handy because again, you can create a docker image.
- 00:33:56So what I'm gonna do here on my notebook instance,
- 00:33:58is I'm pointing to a deep learning container.
- 00:34:03So that's a fully managed deep learning container
- 00:34:06that we build at AWS, and track, and manage, and update,
- 00:34:10as the software frameworks change.
- 00:34:12We manage these deep learning containers,
- 00:34:14and then you can inherit our container in your docker files
- 00:34:19and build on top of it.
- 00:34:20And so it's an easy way for you to make sure
- 00:34:22that your scripts will work in the AWS framework.
- 00:34:25So, first I'm gonna build that docker image.
- 00:34:29Then I'm setting up FSx for Lustre for my optimized runs.
- 00:34:33I'm gonna prepare my data sets using tokenization
- 00:34:37on the SageMaker training API,
- 00:34:40converting the Hugging Face weights to the NeMo format.
- 00:34:43Then I'm gonna train Llama two.
- 00:34:46So again, just a really simple docker image,
- 00:34:49I'm just importing SageMaker,
- 00:34:54pointing to the deep learning container accounts,
- 00:34:57grabbing that image,
- 00:34:59and then this is a nice sh script that just builds
- 00:35:02a docker image.
- 00:35:03So I've got my docker image locally,
- 00:35:05and then this is pushed to,
- 00:35:07what's called the Elastic Container Registry,
- 00:35:09so another AWS service that just hosts your docker images.
- 00:35:13And so I'm pushing my docker image,
- 00:35:16which is on my notebook instance locally,
- 00:35:18up to ECR in AWS.
- 00:35:21And so that's this process.
- 00:35:23Great.
- 00:35:25So I have my docker image hosted.
- 00:35:28My second step is setting up FSx for Lustre.
- 00:35:32So this is using a CloudFormation template
- 00:35:35to deploy Lustre in my account.
- 00:35:38You can also do this through the console.
- 00:35:41Luster sets up a two-way data repository
- 00:35:44with your S3 bucket.
- 00:35:46So as long as you have data sitting in your S3 bucket,
- 00:35:49you can enable a two-way data repository.
- 00:35:52So data will be copied through the service.
- 00:35:56The metadata will be copied through the service,
- 00:35:58from the bucket, to Luster,
- 00:36:00and then as you write files to Luster,
- 00:36:04those are then copied back to S3.
- 00:36:06So you can download it from S3.
- 00:36:08You can also mount the Luster volume
- 00:36:10to your notebook instance, as long as the networking
- 00:36:14is aligned.
- 00:36:14And then you can view the contents of that volume directly
- 00:36:19from your notebook instance.
- 00:36:20So I'll set this up here.
- 00:36:22Again, just downloading the CloudFormation template,
- 00:36:26and then creating the stack in the relevant AZ.
- 00:36:31Once the stack has been created,
- 00:36:33then I'm gonna prep my dataset.
- 00:36:35And so my dataset, again, is for LLM training,
- 00:36:38using that docker image, using the SageMaker SDK,
- 00:36:42and creating a lot of hyper parameters.
- 00:36:45Some hyper parameters for the dataset pointer,
- 00:36:49the model pointer, some of my keys,
- 00:36:52I'm pointing to FSx for Lustre right here,
- 00:36:56and then setting up the file system input.
- 00:36:59And then as Gal mentioned, here is the PyTorch API.
- 00:37:03Now this is, again, a very complex example.
- 00:37:07If you are new to SageMaker, I would give you
- 00:37:09a really easy one. (chuckling)
- 00:37:10And so there are a lot of very accessible examples,
- 00:37:15where you can train a model in like two lines of code,
- 00:37:18basically, using Hugging Face, actually.
- 00:37:21But certainly the Python training scripts,
- 00:37:24you can bring your own packages and requirements.txt,
- 00:37:27and be on your way very, very quickly.
- 00:37:30This is a complex example,
- 00:37:31but we have very simple ones that you can use
- 00:37:33to get started.
- 00:37:34So in any case, I'm importing a pointer to that container.
- 00:37:39Actually, this is, this Python object is pointing
- 00:37:43to the deep learning containers for PyTorch.
- 00:37:46And then this API is letting me define my own
- 00:37:50pre-processing function right here, in the script.
- 00:37:54I'm pointing to a source directory,
- 00:37:57which can be a local file, can also be a Git repository.
- 00:38:01And then I'm defining my infrastructure right here.
- 00:38:05So this is one, Trainium actually, dot 32 xl.
- 00:38:10And so I'm running this job here.
- 00:38:13And again, that's my pre-processing dataset.
- 00:38:15And I'm gonna jump to Llama here,
- 00:38:18'cause we're a little short on time.
- 00:38:19And so, once I've pre-processed the data
- 00:38:22and stored it on Luster,
- 00:38:25which again, then will replicate back in S3,
- 00:38:27after that, I'm going to train my model
- 00:38:31on that same Luster volume.
- 00:38:32And Luster is useful because it mounts in seconds.
- 00:38:36So when you're working with large scale data sets,
- 00:38:38of course it can be very computationally and time intensive
- 00:38:41to copy them.
- 00:38:43And streaming can be a little bit challenging to set up.
- 00:38:46So Lustre is a nice way to store your data
- 00:38:49in a high performance file system,
- 00:38:52which you can then mount in seconds using as many jobs
- 00:38:56as you want to, because the bandwidth scales,
- 00:38:59actually, is a function of mounts.
- 00:39:01So Lustre is a great high performance data store.
- 00:39:03So in any case, we're gonna set this up.
- 00:39:07And then same as last time, pointing to all
- 00:39:10of my hyper parameters, specifying that I'm using,
- 00:39:14again, four instances.
- 00:39:16Each of these instances has 32 accelerators.
- 00:39:20Again, the Trainium accelerators.
- 00:39:22All of the hyper parameters for the 7 billion parameters
- 00:39:27that we're gonna train here, my FSX for Luster pointer,
- 00:39:32and then I'm gonna launch my training job.
- 00:39:34So, setting in the rest of my hyper parameters,
- 00:39:37again pointing to that PyTorch API, loading in my scripts,
- 00:39:42and then I call model.fits.
- 00:39:44And as promised, all of the content for this job,
- 00:39:49is loaded directly in the SageMaker control plane.
- 00:39:52So I see exactly when I started this, what the status was,
- 00:39:56where the artifacts are, when I ran this.
- 00:39:59I can view the outputs, I can step through the logs
- 00:40:03and see every piece of information that I need for my model,
- 00:40:07which then of course, I can download it and build
- 00:40:09an entire app on top of this.
- 00:40:12And so with that, I'm gonna hand over the stage
- 00:40:15to Tom, and he's gonna share some information
- 00:40:18with you about Toyota.
- 00:40:26- Great, thank you Emily.
- 00:40:29So today I am going to really tell you about
- 00:40:31how we're using SageMaker,
- 00:40:33to accelerate machine learning at TRI.
- 00:40:36And first, maybe I should tell you a little bit
- 00:40:38about what TRI actually is.
- 00:40:41I'll give you a couple of examples of projects
- 00:40:42that we have, ongoing right now.
- 00:40:46The first, is a project around autonomous drift driving,
- 00:40:56with a Toyota Supra.
- 00:40:57And I'll just let the video play here.
- 00:40:59(electronic music)
- 00:41:03(tires screeching)
- 00:41:07(indistinct chatter)
- 00:41:09(electronic music continues)
- 00:41:22(engine revving)
- 00:41:25(tires screeching)
- 00:41:29So that's one example.
- 00:41:31And of course AI, here, helps lay the foundation,
- 00:41:34for all of this work.
- 00:41:35The second is, we work on a lot of challenge problems.
- 00:41:38And so, there's a big robotics group
- 00:41:40that focuses on challenge problems,
- 00:41:44that we start in the lab, but also go out
- 00:41:46to the real world environment,
- 00:41:48and evaluate our systems in.
- 00:41:50And so, this is an example of where we have a robotic system
- 00:41:55that we built in-house, from the ground up,
- 00:41:57and we're able to retrieve and stock grocery store shelves.
- 00:42:03This has evolved more into the factory setting as well,
- 00:42:05more recently.
- 00:42:09And, we're 250 people across a few different locations.
- 00:42:12So there's a team in Los Altos
- 00:42:15and a team in Cambridge, Mass.
- 00:42:17And there's teams also in human-centered AI,
- 00:42:19and also material science as well.
- 00:42:23And most recently, one of the things about generative AI
- 00:42:25that we found anyway, in the context of robotics,
- 00:42:28is that it can now be applied to robotics,
- 00:42:31to do a wide variety of tasks
- 00:42:36that we never thought were possible.
- 00:42:38And this is a technique called diffusion policy,
- 00:42:41that is now able to learn from a few examples of a human,
- 00:42:45from a human, how to perform very complicated tasks.
- 00:42:50And so, building on this, the machine learning team at TRI
- 00:42:54tries to build a foundation across language, vision,
- 00:42:57and action.
- 00:42:58Language in the sense, that both common sense knowledge,
- 00:43:06and also a wider variety of applications.
- 00:43:09So like, language has applications across Toyota
- 00:43:12more generally, in the context of enterprise applications,
- 00:43:15but also in terms of code generation as well.
- 00:43:19Vision, feeds into language to give robots eyes,
- 00:43:22for example, and then action, to perform a wide variety
- 00:43:26of tasks across a number of different platforms.
- 00:43:30But, this talk is more about SageMaker. (chuckling)
- 00:43:33And so I wanna tell you about how we're using SageMaker,
- 00:43:36at TRI, to really accelerate our progress.
- 00:43:40And the first is sort of general experimentation.
- 00:43:43Where we use sort of, one to eight instances,
- 00:43:46to scale up our training jobs.
- 00:43:49And, the second, is how we can take some of these ideas
- 00:43:53and really scale this up very, very quickly.
- 00:43:56To not just a few GPUs, but to hundreds of GPUs at a time.
- 00:44:01And finally, we're also looking, we're also able
- 00:44:04to use SageMaker for even more broad applications,
- 00:44:06such as, like, just serving models.
- 00:44:09As these are hard to serve sort of locally,
- 00:44:11on a device.
- 00:44:15So lemme tell you a little bit about the experimentation
- 00:44:17that we do on SageMaker at TRI.
- 00:44:22First, you know, here's I guess,
- 00:44:24the high level, is that we have a wide variety
- 00:44:26of applications that we're, models that we're training
- 00:44:29on SageMaker.
- 00:44:30The first, is large language models.
- 00:44:34Second, is a mono depth model.
- 00:44:35So taking RGB images and inferring depth, for example.
- 00:44:39A third, is sort of stable diffusion.
- 00:44:42Language to image generation,
- 00:44:44for better feature representations.
- 00:44:46And a fourth, is 3D representations.
- 00:44:50Such as, language, to 3D sort of structures, as well.
- 00:44:53That's useful for robotics and a number
- 00:44:54of other applications.
- 00:44:57But, and across all of these we found,
- 00:45:00SageMaker to be very useful for a number of reasons.
- 00:45:03Some of the challenges that come up for us,
- 00:45:07include a few things.
- 00:45:09First, we wanna be able to reuse
- 00:45:11existing training infrastructure and clusters
- 00:45:13that we create.
- 00:45:15And so, you know, Warm Pools, that you heard about earlier,
- 00:45:19are one way in which to do that.
- 00:45:21And we take advantage of that on a daily basis,
- 00:45:24to pull back those resources
- 00:45:25and continue iterating on our training jobs.
- 00:45:29The second is scaling.
- 00:45:31We need to be able to go from one to many instances
- 00:45:33very quickly, and also to change instance types
- 00:45:36quickly.
- 00:45:40We also need high-performance systems,
- 00:45:41and SageMaker is very well optimized
- 00:45:46on the backend.
- 00:45:48And finally, we need flexibility:
- 00:45:50we need to run a number of different jobs
- 00:45:54across the whole science group.
- 00:45:59This is the code you saw earlier,
- 00:46:02and I'll echo
- 00:46:06how easy it is
- 00:46:08to scale these things up.
- 00:46:10You can start with one instance
- 00:46:11and iterate on your training job
- 00:46:14with that single instance.
- 00:46:16And if you need to scale, it's a very simple change
- 00:46:21to enable that:
- 00:46:23in this case you can change the instance count
- 00:46:26from one to eight, for example,
- 00:46:28and start scaling your runs very quickly.
- 00:46:32The second is that, as new hardware comes out,
- 00:46:34as Gal mentioned, we're able to
- 00:46:37quickly change the hardware types as well.
- 00:46:40In this case, we can change,
- 00:46:43say, from a p4 instance to a p5,
- 00:46:46which gives twice the throughput for our training jobs
- 00:46:49and reduces our training times.
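
To make the kind of change being described concrete, here is a minimal sketch, assuming the SageMaker Python SDK; the script name, role ARN, and data path are placeholders, and the values simply illustrate raising `instance_count` and swapping `instance_type`.

```python
# Minimal sketch: the same estimator, scaled out and moved to newer hardware.
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",                                 # hypothetical training script
    role="arn:aws:iam::123456789012:role/SageMakerRole",    # placeholder execution role
    framework_version="2.0",
    py_version="py310",
    instance_count=8,                    # was 1 while iterating; raise it to scale out
    instance_type="ml.p5.48xlarge",      # was ml.p4d.24xlarge; swap as H100s become available
)

estimator.fit({"training": "s3://my-bucket/dataset/"})      # placeholder data location
```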
- 00:46:52And just to give some evidence of how
- 00:46:55performant these systems are: if you look at
- 00:46:59scaling across a number of instances,
- 00:47:01the scaling is almost linear.
- 00:47:04So SageMaker has been very performant for us
- 00:47:07in terms of scaling up our training jobs.
- 00:47:13As Emily mentioned earlier,
- 00:47:16these data sets are huge too.
- 00:47:18When we started, we were using
- 00:47:21data sets of a few terabytes,
- 00:47:25and it's nice to be able to quickly start up
- 00:47:28with FSx for Lustre.
- 00:47:30However, as we scaled our training jobs,
- 00:47:32the amount of data that we need
- 00:47:34grew from a few terabytes
- 00:47:36to half a petabyte or more,
- 00:47:39and the flexibility in SageMaker to pull in other resources,
- 00:47:44like WebDataset, has been really great
- 00:47:47and has really accelerated
- 00:47:49our training runs.
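
As one illustration of mixing data sources, here is a minimal sketch of attaching an FSx for Lustre file system as a training input channel via the SDK's `FileSystemInput`; the file system ID and directory path are placeholders, and the estimator is assumed to be configured as in the earlier sketches, with VPC settings that can reach the file system.

```python
# Minimal sketch: feed training data from FSx for Lustre instead of S3.
from sagemaker.inputs import FileSystemInput

fsx_input = FileSystemInput(
    file_system_id="fs-0123456789abcdef0",    # placeholder FSx for Lustre file system ID
    file_system_type="FSxLustre",
    directory_path="/fsx/datasets/robotics",  # hypothetical mount-name/path on the file system
    file_system_access_mode="ro",             # read-only access is enough for training data
)

# The estimator must also be given subnets and security groups that can reach
# the file system; it is assumed to be defined as in the earlier sketches.
estimator.fit(inputs={"training": fsx_input})
```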
- 00:47:56So, just to reiterate,
- 00:47:59the group is running jobs on anywhere from
- 00:48:02one instance to eight instances,
- 00:48:04and these are a few of the applications.
- 00:48:07But going beyond this, we're also able to take
- 00:48:11our training runs
- 00:48:13to a much larger scale.
- 00:48:16Just to highlight one
- 00:48:18of the ways in which we're doing that at TRI,
- 00:48:20we're building state-of-the-art models,
- 00:48:23and the question is:
- 00:48:24can you build state-of-the-art
- 00:48:26LLMs with SageMaker?
- 00:48:28And
- 00:48:29we at TRI have been doing this.
- 00:48:31We've been reproducing some of the Llama 2
- 00:48:35models initially,
- 00:48:36to validate all of our systems.
- 00:48:41For this, we need
- 00:48:45scalability and performance across all of these instances,
- 00:48:48and what SageMaker has really provided for us
- 00:48:51is that scalability.
- 00:48:52This is on the newest hardware, the H100s,
- 00:48:55and we see roughly linear scaling as the
- 00:48:59number of nodes increases.
- 00:49:01This ends up being around 256
- 00:49:06H100s,
- 00:49:09and if you run out a training job like this,
- 00:49:12pre-training a Llama 2 model
- 00:49:15can take about a week
- 00:49:17when you scale out to 30 instances.
- 00:49:20This is just to say that with
- 00:49:22more than a trillion tokens,
- 00:49:25we can reproduce
- 00:49:28the state-of-the-art models here.
- 00:49:29And we're scaling not only the 7-billion-parameter
- 00:49:32models, but also 13, 34, and 70 billion, on SageMaker right now.
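
As a rough illustration, and not TRI's actual setup, here is a minimal sketch of how a multi-node pre-training job on H100 instances might be launched with the SageMaker Python SDK's `torch_distributed` launcher; the script, hyperparameters, and data location are placeholders.

```python
# Minimal sketch: a multi-node pre-training job across 256 H100 GPUs.
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="pretrain_llm.py",                          # hypothetical pre-training script
    role="arn:aws:iam::123456789012:role/SageMakerRole",    # placeholder execution role
    framework_version="2.0",
    py_version="py310",
    instance_count=32,                   # 32 x ml.p5.48xlarge = 256 H100 GPUs
    instance_type="ml.p5.48xlarge",
    # Launch one worker per GPU on every node via torchrun.
    distribution={"torch_distributed": {"enabled": True}},
    hyperparameters={"model_size": "7b"},                   # illustrative only
)

estimator.fit({"training": "s3://my-bucket/tokenized-corpus/"})  # placeholder corpus location
```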
- 00:49:40One of the nice features
- 00:49:43of SageMaker, so that you don't lose any time,
- 00:49:46has also been the cluster repair work.
- 00:49:47I think Gal mentioned this earlier.
- 00:49:50As you scale these jobs,
- 00:49:53it's often the case that hardware will fail.
- 00:49:56And when hardware fails, you have downtime.
- 00:49:59And if you have downtime,
- 00:50:01it costs you money:
- 00:50:03you're not training your models.
- 00:50:06One of the great parts of SageMaker is that
- 00:50:09it has this option for cluster repair.
- 00:50:11For us, this took about 10 minutes:
- 00:50:14one of the machines failed,
- 00:50:15the cluster came right back up, and we were able to
- 00:50:18continue our training run very quickly.
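
Cluster repair is most useful when the job checkpoints regularly so it can resume where it left off. Below is a minimal, hypothetical sketch of the script-side logic that pairs with the estimator's `checkpoint_s3_uri` and `checkpoint_local_path` options, which sync the local checkpoint directory to S3; the file names and save format are assumptions.

```python
# Minimal sketch of script-side checkpointing; SageMaker syncs CKPT_DIR to the
# S3 URI given as checkpoint_s3_uri and restores it when the cluster is repaired.
import glob
import os

import torch

CKPT_DIR = "/opt/ml/checkpoints"  # default checkpoint_local_path on SageMaker


def load_latest_checkpoint(model, optimizer):
    """Resume from the newest synced checkpoint, if any; return the step to resume at."""
    candidates = sorted(glob.glob(os.path.join(CKPT_DIR, "step_*.pt")))
    if not candidates:
        return 0  # fresh start
    state = torch.load(candidates[-1], map_location="cpu")
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    return state["step"]


def save_checkpoint(model, optimizer, step):
    """Write a checkpoint that SageMaker will upload to checkpoint_s3_uri."""
    os.makedirs(CKPT_DIR, exist_ok=True)
    torch.save(
        {"model": model.state_dict(), "optimizer": optimizer.state_dict(), "step": step},
        os.path.join(CKPT_DIR, f"step_{step:08d}.pt"),
    )
```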
- 00:50:23So that's pre-training.
- 00:50:25The other thing is more
- 00:50:29on the side of up-training,
- 00:50:31where you have a large data set,
- 00:50:33but not quite the size you'd need
- 00:50:35for pre-training, and you want to
- 00:50:38focus on a particular domain.
- 00:50:39At TRI,
- 00:50:41because we're a Toyota-centric entity,
- 00:50:45Japanese was one of the areas that we were
- 00:50:48very interested in.
- 00:50:50You can take
- 00:50:52some of the state-of-the-art models,
- 00:50:53which aren't really trained for Japanese:
- 00:50:56they do have a little bit of
- 00:50:57Japanese training data,
- 00:50:58but not that much.
- 00:51:00If you go out and acquire all of,
- 00:51:02say, the open-source data available,
- 00:51:04you get to
- 00:51:0610 to a hundred billion tokens,
- 00:51:08which is enough to up-train a model
- 00:51:11in a language such as Japanese.
- 00:51:14And what we found is that, taking Llama 2
- 00:51:17with 13 billion parameters and up-training it,
- 00:51:22you gain some performance.
- 00:51:23This is a win-rate metric against some of the best
- 00:51:27closed-source models.
- 00:51:29But the next step is to instruction
- 00:51:32fine-tune the model. This is how you get
- 00:51:34large language models to follow instructions,
- 00:51:38to be chatty: you fine-tune them using
- 00:51:40instruction fine-tuning,
- 00:51:42with data of this type, where the instruction
- 00:51:44is in the first part
- 00:51:45and the second part is
- 00:51:48the response you would expect.
- 00:51:51And if you do that, with the additional pre-training
- 00:51:54and the additional instruction fine-tuning in Japanese
- 00:51:57on some of the more performant models out there,
- 00:51:59you can get state-of-the-art performance
- 00:52:01in Japanese.
- 00:52:03And this is a much smaller model compared to, say,
- 00:52:05a Llama 2 70B, yet still more performant, for example.
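
To make the data format concrete, here is a small, illustrative sketch of joining an instruction and its expected response into a single training sequence; the template and field names are assumptions, since the exact format used at TRI is not described in the talk.

```python
# Minimal sketch: turn an instruction/response pair into one training sequence.
INSTRUCTION_TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n{response}"


def format_example(example: dict) -> str:
    """Join the instruction (first part) and the expected response (second part)."""
    return INSTRUCTION_TEMPLATE.format(
        instruction=example["instruction"],
        response=example["response"],
    )


# Hypothetical Japanese instruction-tuning example.
sample = {
    "instruction": "次の文章を一文で要約してください。",  # "Summarize the following text in one sentence."
    "response": "ここに期待される要約が入ります。",        # "The expected summary goes here."
}
print(format_example(sample))
```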
- 00:52:11And so SageMaker has really enabled us to do a lot
- 00:52:13of this experimentation very rapidly at TRI.
- 00:52:19The final thing I just wanna mention, which isn't covered
- 00:52:21as much in this talk, is that
- 00:52:24there's also the ability to run other sorts of workloads,
- 00:52:27such as serving models.
- 00:52:28We've been leveraging SageMaker endpoints to
- 00:52:33serve both open-source models
- 00:52:36and the models that we have in-house,
- 00:52:40internally across TRI and maybe eventually
- 00:52:42externally as well.
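
As a rough illustration of the serving side, here is a minimal sketch of deploying an open-source model to a real-time SageMaker endpoint with the SDK's `HuggingFaceModel`; the model ID, container versions, and instance type are placeholders, not TRI's actual deployment.

```python
# Minimal sketch: host an open-source model on a real-time SageMaker endpoint.
from sagemaker.huggingface import HuggingFaceModel

model = HuggingFaceModel(
    role="arn:aws:iam::123456789012:role/SageMakerRole",    # placeholder execution role
    env={"HF_MODEL_ID": "my-org/japanese-instruct-model"},  # hypothetical Hugging Face Hub ID
    transformers_version="4.28",                            # illustrative container versions
    pytorch_version="2.0",
    py_version="py310",
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",      # placeholder GPU serving instance
)

print(predictor.predict({"inputs": "こんにちは、自己紹介してください。"}))
```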
- 00:52:46So with that, I just wanted to say
- 00:52:47there are three primary areas
- 00:52:51we're focused on in using SageMaker for:
- 00:52:55small-scale experiments, on
- 00:52:57one to eight nodes, for example;
- 00:52:59large-scale training,
- 00:53:00up to 32, 64, or more instances;
- 00:53:07as well as serving.
- 00:53:10SageMaker has been very critical
- 00:53:13and important for our training
- 00:53:18of this variety of models, and for experimentation generally.
- 00:53:24And I just wanted to close by saying
- 00:53:28it's been great working with SageMaker
- 00:53:31for training all of these models.
- 00:53:32Next time, hopefully when we come back
- 00:53:35to AWS re:Invent, maybe we will have a foundation model
- 00:53:41that can be trained once and
- 00:53:43do many different robotics tasks
- 00:53:46in response to language and other inputs as well.
- 00:53:50So with that, I'll end, and hand it back to Gal.
- 00:53:56- Thank you Tom.
- 00:53:58Yeah, oh.
- 00:53:59We just wanted to end by showing you a couple of links,
- 00:54:01QR codes to learn more about SageMaker and how to use it.
- 00:54:06And thank you all for your time.
- 00:54:08We'll all stand around here for a little bit longer
- 00:54:11if you have any questions.
- 00:54:12I actually think some members of the Smart Sifting team
- 00:54:14are also here, if you have questions about that
- 00:54:16and want to learn more.
- Machine Learning
- Amazon SageMaker
- Deep Learning
- Model Training
- Artificial Intelligence
- Fine-tuning
- Pre-training
- Robotics
- Data Management
- Distributed Training