00:00:00
So there's a lot of public interest in this
recently and it feels like hype.
00:00:08
Is this the same, or is this something where
we can see that this is a real foundation
00:00:14
for future application development?
00:00:16
We are living in very exciting times with
machine learning.
00:00:20
The speed of ML model development will really increase.
00:00:26
But you won't get to that end state that we want in the coming years
00:00:31
unless we actually make these models more
accessible to everybody.
00:00:51
Swami Sivasubramanian oversees database, analytics
and machine learning at AWS.
00:00:58
For the past 15 years, he has helped lead
the way on AI and ML in the industry.
00:01:03
Swami’s teams have a strong track record of taking new technologies and turning them
00:01:08
into viable tools.
00:01:11
Today, Generative AI is dominating news feeds
and conversations.
00:01:15
Consumers are interacting with it and brands
are trying to understand how to best harness
00:01:20
its potential for their customers.
00:01:23
So, I sat down with Swami to better understand
the broad landscape of this technology.
00:01:29
Swami, we go back a long time.
00:01:36
Tell me a bit.
Do you remember your first day at Amazon?
00:01:39
I still remember, because it was not very common for PhD students to join Amazon at that time,
00:01:48
because you were known as a retailer, an e-commerce company.
00:01:53
We were building things.
00:01:55
And so that's also quite a departure for an academic.
00:01:59
Definitely, for a PhD student to go from thinking about it to, actually, how do I build this?
00:02:04
So you actually brought DynamoDB to the world, and quite a few other databases since then,
00:02:11
but under your purview now is also AI and
machine learning.
00:02:18
So tell me a bit: what does your world of AI look like?
00:02:21
After building a bunch of these databases
and analytics services,
00:02:27
I got fascinated by AI and machine learning, because they literally put
00:02:32
data to work.
00:02:33
And if you look at machine learning technology
itself broadly, it's not necessarily new.
00:02:38
In fact, some of the first papers on deep learning were written like 30 years ago.
00:02:43
But even in those papers, they explicitly called out that for it to get large-scale adoption,
00:02:49
it would require a massive amount of compute and a massive amount of data to actually succeed.
00:02:54
And that's what the cloud enabled us to do: actually unlock the power of deep learning technologies.
00:03:01
So that led me, early on, like six or seven years ago, to start the machine learning
00:03:07
organization because we wanted to take machine
learning, especially deep learning style technologies,
00:03:13
out of the hands of just scientists and into the hands of everyday developers.
00:03:17
If you think about the early days of Amazon,
the retailer with similarities and recommendations
00:03:23
and things like that, were they the same algorithms
that we're seeing being used today or is that,
00:03:30
I mean that's a long time ago, 30 years.
00:03:35
Machine learning has really gone through huge growth in the complexity of the algorithms
00:03:41
and applicability of the use cases.
00:03:44
Early on, the algorithms were a lot simpler, a lot more like linear models or
00:03:49
gradient boosting.
00:03:51
If you look at the last decade, the early part of it was all about deep learning,
00:03:57
which was essentially a step up in the ability of neural nets to actually understand and
00:04:03
learn from patterns, which is effectively where all the image-based image-processing
00:04:08
algorithms come from.
00:04:10
And then also personalization with different
types of neural nets and so forth.
00:04:15
And that's what led to inventions like Alexa,
00:04:18
which has remarkable accuracy compared to others.
00:04:21
So neural nets and deep learning have really been a step up.
00:04:25
And the next big step up is what is happening
today in machine learning.
00:04:30
So a lot of the talk these days is around
generative AI,
00:04:34
large language models, foundation models.
00:04:37
Tell me a bit about why that is different from, let's say, the more task-based models, like vision
00:04:43
algorithms and things like that?
00:04:45
I mean, if you take a step back and look at what
00:04:49
these foundation models - large language models - are all about:
00:04:54
These are big models which are trained with
00:04:57
hundreds of millions of parameters, if not billions.
00:05:00
A parameter, just to give context, is like an
internal variable
00:05:05
that the ML algorithm has learned from its data set.
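To make the idea of a parameter concrete, here is a tiny back-of-the-envelope sketch in Python. The layer width of 768 is just an illustrative choice, not a figure from the interview.

```python
# Tiny sketch of what "parameters" are: a single dense layer mapping a
# 768-number input to a 768-number output already holds 768*768 learned
# weights plus 768 learned biases. Foundation models stack many such layers,
# which is how the counts reach into the billions.
input_dim, output_dim = 768, 768
weights = input_dim * output_dim   # learned internal variables
biases = output_dim                # also learned from data
print(weights + biases)            # 590,592 parameters for just this one layer
```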
00:05:08
Now, to give a sense of what this big thing is that has suddenly happened:
00:05:14
a few things.
00:05:15
One, if you take a look at Transformers, they have been a big change.
00:05:22
The Transformer is a kind of neural net technology that is remarkably more scalable than the previous
00:05:30
versions like RNNs and various others.
00:05:33
So what does this mean?
00:05:34
Why did this suddenly lead to this transformation?
00:05:38
Because it is actually scalable and you can train these models a lot faster; now you can throw
00:05:42
a lot of hardware and a lot of data at them.
00:05:45
Now that means I can actually crawl the entire World Wide Web and feed it
00:05:53
into these kinds of algorithms and start building models that can actually understand
00:06:00
human knowledge.
00:06:02
At a high level, a generative AI text model
is good at using natural language processing
00:06:09
to analyze text and predict the next word
that comes in a sequence of words.
00:06:14
By paying attention to certain words or phrases
in the input, these models can infer context.
00:06:21
And they can use that context to find the
words that have the highest probability of
00:06:26
following the words that came before it.
00:06:29
Structuring inputs as instructions with relevant
context can prompt a model to generate answers
00:06:35
for language understanding, knowledge, and composition.
00:06:39
Foundation Models are also capable of what
is called “in-context learning,” which
00:06:44
is what happens when you include a handful
of demonstration examples
00:06:48
as part of a prompt to improve
the model’s output on the fly.
00:06:53
We supply examples to further explain the instruction.
00:06:56
And this helps the model adjust the output based
on the pattern and style in the examples.
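As a concrete illustration of in-context learning, here is a minimal sketch of how a few-shot prompt might be assembled. The task, reviews, and labels are hypothetical; the point is that the demonstrations live in the prompt itself and no model weights are updated.

```python
# A hypothetical few-shot prompt: two demonstrations followed by a new input.
# The model infers the pattern from the examples in the prompt on the fly.
examples = [
    ("The checkout flow was effortless.", "positive"),
    ("The package arrived damaged and late.", "negative"),
]

prompt = "Classify the sentiment of each review.\n\n"
for review, label in examples:
    prompt += f"Review: {review}\nSentiment: {label}\n\n"
prompt += "Review: Setup took five minutes and everything just worked.\nSentiment:"

print(prompt)  # this string is what would be sent to a text-generation model
```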
00:07:03
When the models use billions of parameters
and their training corpus is the entire internet,
00:07:10
the results can be remarkable.
00:07:12
The training is unsupervised and task agnostic.
00:07:16
And the mountains of web data used for training
let it respond to natural language instructions
00:07:21
for many different tasks.
00:07:24
So the task based models that we had before
and that we were already really good at, could
00:07:29
you build them based on these foundation
models?
00:07:33
Do you no longer need these task-specific models, or do we still need them?
00:07:38
The way to think about it is that the need for task-specific models is not going away.
00:07:44
But what essentially changes is how we go about building them.
00:07:47
You still need a model to translate from one
language to another
00:07:52
or to generate code and so forth.
00:07:55
But how easily you can now build them is essentially a big change, because with foundation models,
00:08:01
which are trained on an entire corpus of knowledge, let's say a huge amount of data, it is now simply
00:08:08
a matter of actually building on top of them with fine-tuning, with specific examples.
00:08:13
Think about it: if you're running, say, a recruiting firm and you want to ingest
00:08:21
all your resumes and store them in a standard format that you can search and index on,
00:08:26
instead of building a custom NLP model
to do all that.
00:08:30
Now you use foundation models and give a few examples of: here is an input resume in this
00:08:35
format and here is the output resume.
00:08:37
Now you can even fine-tune these models by just giving a few specific examples, and then
00:08:45
you essentially are good to go.
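As a rough sketch of what "a few specific examples" for fine-tuning could look like in the resume scenario, here is some hypothetical data-preparation code. The field names and schema are illustrative assumptions, not any particular service's format.

```python
import json

# Hypothetical fine-tuning examples for resume standardization, written as
# JSONL (one JSON object per line), a common format for fine-tuning jobs.
examples = [
    {
        "input": "Jane Doe | jane@example.com | 5 yrs Java, led payments team",
        "output": {"name": "Jane Doe", "email": "jane@example.com",
                   "years_experience": 5, "skills": ["Java"],
                   "highlights": ["led payments team"]},
    },
    {
        "input": "John Roe - data analyst, SQL/Python, 2 years at a startup",
        "output": {"name": "John Roe", "email": None,
                   "years_experience": 2, "skills": ["SQL", "Python"],
                   "highlights": ["data analyst at a startup"]},
    },
]

with open("resume_finetune.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps({"input": ex["input"],
                            "output": json.dumps(ex["output"])}) + "\n")
```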
00:08:47
So in the past, most of the work went into
probably labeling the data and that was also
00:08:54
the hardest part because that drives the accuracy.
00:08:57
Exactly.
00:08:58
So in this particular case, with these foundation models, labeling is no longer needed?
00:09:05
Essentially, I mean, yes and no.
00:09:07
As always with these things, there is a nuance.
00:09:10
But the majority of what makes these large-scale models remarkable is that they actually can be
00:09:16
trained on a lot of unlabeled data.
00:09:20
You actually go through what I call a pretraining phase, which is essentially where you collect data
00:09:25
sets from, let's say, the World Wide Web, like Common Crawl data, or code data and various
00:09:30
other data sets, Wikipedia, whatnot.
00:09:32
And then you don't even label them; you kind of feed them in as they are.
00:09:37
But you do, of course, have to go through a sanitization step, in terms of making sure you cleanse the data
00:09:42
of PII and other stuff like negative content or hate speech and whatnot.
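As a deliberately simplified sketch of what a sanitization step might involve, here is a small Python pass that masks obvious PII patterns. The regexes and sample records are illustrative; real pretraining pipelines use far more sophisticated PII detection plus toxicity and quality filters.

```python
import re

# Simplistic sanitization pass: mask obvious email and phone-number patterns.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def scrub(record: str) -> str:
    record = EMAIL.sub("[EMAIL]", record)
    record = PHONE.sub("[PHONE]", record)
    return record

raw_docs = [
    "Contact me at jane@example.com or +1 (555) 123-4567 about the role.",
    "The transformer architecture scales well with data and compute.",
]
print([scrub(doc) for doc in raw_docs])
```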
00:09:50
But then you actually start training on a large number of hardware clusters, because training these
00:09:57
models can take tens of millions of dollars to actually go through that training.
00:10:03
And then finally you actually get a model, and then you go through the next
00:10:09
step of what is called inference.
00:10:12
When it comes to building these LLMs, the
easy part is the training.
00:10:19
The hardest part is the data.
00:10:22
Training models with poor data quality will
lead to poor results.
00:10:26
You’ll need to filter out bias, hate speech,
and toxicity.
00:10:30
You’ll need to make sure that the data is
free of PII or sensitive data.
00:10:36
You’ll need to make sure your data is deduplicated,
balanced, and doesn’t lead to oversampling.
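As a minimal sketch of the deduplication step just mentioned, here is an exact-duplicate filter based on hashing normalized text. The sample documents are made up, and production pipelines typically add near-duplicate detection (for example MinHash), which this sketch does not cover.

```python
import hashlib

# Exact deduplication: hash a normalized form of each document and keep
# only the first occurrence.
def dedup(docs):
    seen, unique = set(), []
    for doc in docs:
        key = hashlib.sha256(" ".join(doc.lower().split()).encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique

docs = [
    "Foundation models are trained on unlabeled data.",
    "Foundation  models are trained on unlabeled data.",   # duplicate after normalization
    "Quantization shrinks models for cheaper inference.",
]
print(dedup(docs))   # the near-identical second document is dropped
```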
00:10:43
Because the whole process can be so expensive and requires access to large amounts of compute
00:10:48
and storage, many companies feel lost on where
to even start.
00:10:54
Let's say object detection in video; that would be a smaller model than what we see
00:11:03
now with the foundation models.
00:11:06
What's the cost of running a model like that?
00:11:09
Because now these models, with hundreds of billions of parameters, are probably very
00:11:15
large pieces of data.
00:11:18
That's a great question, because there is so much talk happening only around training these
00:11:23
models, but very little talk on the cost of
running these models to make predictions,
00:11:29
which is inference. That is a signal that very few people are actually deploying these models
00:11:34
at runtime for actual production.
00:11:36
Or once they actually deploy in production,
they will realize oh no, these models are
00:11:41
very expensive to run, and that is where a few important techniques really come
00:11:47
into play.
00:11:48
So one, once you build these large models, to run them in production you need to do
00:11:54
a few things to make them affordable to run, run at scale, and run
00:12:01
in a very economical fashion.
00:12:03
One is what we call quantization.
00:12:05
The other one is what I call distillation, which is where you have these large teacher
00:12:11
models, and even though they are trained with hundreds of billions of parameters, they get distilled
00:12:17
down to a smaller, fine-grained model. I'm speaking in super abstract terms, but that is the
00:12:22
essence of these techniques.
00:12:26
Of course, there’s a lot that goes into
training the model, but what about inference?
00:12:32
It turns out that the sheer size of these
models can make inference expensive to run.
00:12:38
To reduce model size, we can do “quantization,”
which is approximating a neural network by
00:12:44
using smaller, 8-bit integers instead of 32-
or 16-bit floating point numbers.
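As a rough illustration of that idea, here is a minimal sketch of symmetric 8-bit quantization of a single weight tensor. The tensor is random toy data; real frameworks handle per-channel scales, activations, and calibration.

```python
import numpy as np

# Symmetric 8-bit quantization: store int8 values plus one float scale,
# and dequantize at compute time.
weights = np.random.randn(4, 4).astype(np.float32)

scale = np.abs(weights).max() / 127.0                     # map the largest weight to 127
q_weights = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

dequantized = q_weights.astype(np.float32) * scale
print("max abs error:", np.abs(weights - dequantized).max())
print("bytes: fp32 =", weights.nbytes, " int8 =", q_weights.nbytes)
```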
00:12:50
We can also use “distillation”, which
is effectively a transferring of knowledge
00:12:55
from a larger “teacher” model to a smaller
and faster “student” model.
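And here is a minimal sketch of the distillation objective: the student is nudged to match the teacher's softened output distribution. The logits and temperature are made-up toy values; real distillation also includes the usual task loss and runs over a full dataset.

```python
import numpy as np

# Knowledge-distillation sketch: train the small "student" to match the
# large "teacher"'s softened probabilities on one toy example.
def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max()                  # for numerical stability
    p = np.exp(z)
    return p / p.sum()

teacher_logits = np.array([3.2, 1.1, 0.3, -0.8])   # from the large teacher model
student_logits = np.array([2.5, 1.4, 0.1, -0.2])   # from the small student model

T = 2.0                                            # temperature softens both distributions
p_teacher = softmax(teacher_logits, T)
p_student = softmax(student_logits, T)

# KL divergence between teacher and student is the distillation loss to minimize.
kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)))
print("distillation loss (KL):", kl)
```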
00:13:01
These techniques have reduced the model size
significantly for us, while providing similar
00:13:06
accuracy and improved latency.
00:13:09
So we do have this custom hardware to help out with this. I mean, normally
00:13:17
this is all GPU-based, and GPUs are expensive, energy-hungry beasts.
00:13:23
Tell us what we can do with custom silicon
that makes it so much cheaper
00:13:29
both in terms of cost as well as, let's say, the carbon footprint of the energy used.
00:13:37
When it comes to custom silicon, as mentioned,
the cost is becoming a big issue in these
00:13:43
foundation models because they are very
expensive to train
00:13:46
and also very expensive to run at scale.
00:13:49
You can actually build a playground and test your chatbot at low scale, and
00:13:54
it may not be that big a deal, but once you
start deploying at scale
00:13:59
as part of your core business operation,
then these things add up.
00:14:03
So at AWS we did invest in our own custom silicon: Trainium for training and
00:14:11
Inferentia for inference.
00:14:13
And all these things are ways for us to actually understand which
00:14:19
operators are involved in making these prediction decisions, and to optimize
00:14:25
them at the core silicon level and software
stack level.
00:14:28
I mean, if cost is also a reflection of energy
used because in essence, that's what you're
00:14:35
paying for, you can also see that they are,
from a sustainability point of view, much
00:14:40
better than running it on general-purpose GPUs.
00:14:44
So there's a lot of public interest in this
recently and it feels like hype.
00:14:52
Is this the same or is this something where
we can see that this is a real foundation
00:14:58
for future application development?
00:15:00
First of all, we are living in very exciting
times with machine learning.
00:15:05
I have probably said this now every year.
00:15:07
But this year is even more special because
00:15:11
these large language models and foundation
models truly can actually enable so many use
00:15:18
cases where people don't have to staff separate teams to go build task-specific
00:15:24
models.
00:15:25
The speed of ML model development will really increase.
00:15:31
But you won't get to that end state that we want in the coming years unless we actually
00:15:38
make these models more accessible to everybody.
00:15:43
And this is what we did with SageMaker early
on with machine learning and that's what we
00:15:48
need to do with Bedrock and all its applications
as well.
00:15:52
But we do think that while the hype cycle will subside, like with any technology, these
00:15:58
are going to become a core part of every application
in the coming years.
00:16:04
And they will be done in a grounded way and in a responsible fashion too, because there
00:16:10
is a lot more stuff that people need to think
through in a generative AI context.
00:16:16
What kind of data did it learn from, and what response does it actually generate?
00:16:21
How truthful is it, as well?
00:16:23
These are things we are excited to actually help our customers with.
00:16:27
So when you say that this is the most exciting
time in machine learning,
00:16:33
what are you going to say next year?
00:16:36
Well, Swami, thank you for talking to me.
00:16:40
I mean, you educated me quite a bit on what
the current state of the field is.
00:16:45
So I'm very grateful for that.
00:16:46
My pleasure. Thanks again for having me, sir.
00:16:51
I'm excited to see how builders use this technology
00:16:54
and continue to push the possibilities forward.
00:16:57
I want to say thanks to Swami.
His insights and understanding of the space
00:17:02
are a great way to begin this conversation.
00:17:05
I'm looking forward to diving even deeper
and exploring the architectures
00:17:09
behind some of this,
00:17:11
and how large models can be used by engineers and developers
00:17:14
to create meaningful experiences.