00:00:00
So there's a lot of public interest in this
recently and it feels like hype.
00:00:08
Is this the same, or is this something where
we can see that this is a real foundation
00:00:14
for future application development?
00:00:16
We are living in very exciting times with
machine learning.
00:00:20
The speed of ML model development will really increase.
00:00:26
But you won't get to that end state that we want in the coming years
00:00:31
unless we actually make these models more
accessible to everybody.
00:00:51
Swami Sivasubramanian oversees database, analytics
and machine learning at AWS.
00:00:58
For the past 15 years, he has helped lead
the way on AI and ML in the industry.
00:01:03
Swami’s teams have a strong track record of taking new technologies and turning them
00:01:08
into viable tools.
00:01:11
Today, Generative AI is dominating news feeds
and conversations.
00:01:15
Consumers are interacting with it and brands
are trying to understand how to best harness
00:01:20
its potential for their customers.
00:01:23
So, I sat down with Swami to better understand
the broad landscape of this technology.
00:01:29
Swami, we go back a long time.
00:01:36
Tell me a bit.
Do you remember your first day at Amazon?
00:01:39
I still remember, because it was not very common for PhD students to join Amazon at that time,
00:01:48
because you were known as a retailer, an e-commerce company.
00:01:53
We were building things.
00:01:55
And so that's also quite a departure for an academic.
00:01:59
Definitely, for a PhD student to go from thinking about it to, actually, how do I build this?
00:02:04
So you actually brought DynamoDB to the world, and quite a few other databases since then,
00:02:11
but under your purview now is also AI and
machine learning.
00:02:18
So tell me a bit: what does your world of AI look like?
00:02:21
After building a bunch of these databases
and analytics services,
00:02:27
I got fascinated by AI and machine learning, because they literally put
00:02:32
data to work.
00:02:33
And if you look at machine learning technology
itself broadly, it's not necessarily new.
00:02:38
In fact, some of the first papers on deep learning were written like 30 years ago.
00:02:43
But even in those papers, they explicitly called out that for it to get large-scale adoption,
00:02:49
it would require a massive amount of compute and a massive amount of data to actually succeed.
00:02:54
And that's what the cloud enabled us to do: actually unlock the power of deep learning technologies.
00:03:01
So that led me, early on, like six or seven years ago, to start the machine learning
00:03:07
organization because we wanted to take machine
learning, especially deep learning style technologies,
00:03:13
out of the hands of just scientists and into the hands of everyday developers.
00:03:17
If you think about the early days of Amazon,
the retailer with similarities and recommendations
00:03:23
and things like that, were they the same algorithms
that we're seeing being used today or is that,
00:03:30
I mean that's a long time ago, 30 years.
00:03:35
Machine learning has really gone through huge growth in the complexity of the algorithms
00:03:41
and applicability of the use cases.
00:03:44
Early on, the algorithms were a lot simpler, a lot more like linear models or
00:03:49
gradient boosting.
00:03:51
If you look at the last decade, the early part of it was all about deep learning,
00:03:57
which was essentially a step up in the ability of neural nets to actually understand and
00:04:03
learn from patterns, which is effectively where all the image-based image-processing
00:04:08
algorithms come from.
00:04:10
And then also personalization with different
types of neural nets and so forth.
00:04:15
And that's what led to inventions like Alexa,
00:04:18
which has remarkable accuracy compared to others.
00:04:21
So neural nets and deep learning have really been a step up.
00:04:25
And the next big step up is what is happening
today in machine learning.
00:04:30
So a lot of the talk these days is around
generative AI,
00:04:34
large language models, foundation models.
00:04:37
Tell me a bit about why that is different from, let's say, the more task-based models, like vision
00:04:43
algorithms and things like that?
00:04:45
I mean, if you take a step back and look at what
00:04:49
these foundation models - large language models - are all about:
00:04:54
These are big models which are trained with
00:04:57
hundreds of millions of parameters, if not billions.
00:05:00
A parameter, just to give context, is like an
internal variable
00:05:05
that the ML algorithm has learned from its data set.
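To make the idea of a parameter concrete, here is a tiny back-of-the-envelope sketch in Python. The layer width of 768 is just an illustrative choice, not a figure from the interview.

```python
# Tiny sketch of what "parameters" are: a single dense layer mapping a
# 768-number input to a 768-number output already holds 768*768 learned
# weights plus 768 learned biases. Foundation models stack many such layers,
# which is how the counts reach into the billions.
input_dim, output_dim = 768, 768
weights = input_dim * output_dim   # learned internal variables
biases = output_dim                # also learned from data
print(weights + biases)            # 590,592 parameters for just this one layer
```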
00:05:08
Now, to give a sense of what this big thing is that has suddenly happened:
00:05:14
a few things.
00:05:15
One, if you take a look at Transformers, they have been a big change.
00:05:22
The Transformer is a kind of neural net technology that is remarkably more scalable than the previous
00:05:30
versions like RNNs and various others.
00:05:33
So what does this mean?
00:05:34
Why did this suddenly lead to this transformation?
00:05:38
Because it is actually scalable and you can train these models a lot faster; now you can throw
00:05:42
a lot of hardware and a lot of data at them.
00:05:45
Now that means I can actually crawl the entire World Wide Web and feed it
00:05:53
into these kinds of algorithms and start building models that can actually understand
00:06:00
human knowledge.
00:06:02
At a high level, a generative AI text model
is good at using natural language processing
00:06:09
to analyze text and predict the next word
that comes in a sequence of words.
00:06:14
By paying attention to certain words or phrases
in the input, these models can infer context.
00:06:21
And they can use that context to find the
words that have the highest probability of
00:06:26
following the words that came before it.
00:06:29
Structuring inputs as instructions with relevant
context can prompt a model to generate answers
00:06:35
for language understanding, knowledge, and composition.
00:06:39
Foundation Models are also capable of what
is called “in-context learning,” which
00:06:44
is what happens when you include a handful
of demonstration examples
00:06:48
as part of a prompt to improve
the model’s output on the fly.
00:06:53
We supply examples to further explain the instruction.
00:06:56
And this helps the model adjust the output based
on the pattern and style in the examples.
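As a concrete illustration of in-context learning, here is a minimal sketch of how a few-shot prompt might be assembled. The task, reviews, and labels are hypothetical; the point is that the demonstrations live in the prompt itself and no model weights are updated.

```python
# A hypothetical few-shot prompt: two demonstrations followed by a new input.
# The model infers the pattern from the examples in the prompt on the fly.
examples = [
    ("The checkout flow was effortless.", "positive"),
    ("The package arrived damaged and late.", "negative"),
]

prompt = "Classify the sentiment of each review.\n\n"
for review, label in examples:
    prompt += f"Review: {review}\nSentiment: {label}\n\n"
prompt += "Review: Setup took five minutes and everything just worked.\nSentiment:"

print(prompt)  # this string is what would be sent to a text-generation model
```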
00:07:03
When the models use billions of parameters
and their training corpus is the entire internet,
00:07:10
the results can be remarkable.
00:07:12
The training is unsupervised and task agnostic.
00:07:16
And the mountains of web data used for training
let it respond to natural language instructions
00:07:21
for many different tasks.
00:07:24
So the task based models that we had before
and that we were already really good at, could
00:07:29
you build them based on these foundation
models?
00:07:33
Do you no longer need these task-specific models, or do we still need them?
00:07:38
The way to think about it is that the need for task-specific models is not going away.
00:07:44
But what essentially changes is how we go about building them.
00:07:47
You still need a model to translate from one
language to another
00:07:52
or to generate code and so forth.
00:07:55
But how easily you can now build them is essentially a big change, because with foundation models,
00:08:01
which are trained on an entire corpus of knowledge, let's say a huge amount of data, it is now simply
00:08:08
a matter of actually building on top of them with fine-tuning, with specific examples.
00:08:13
Think about it: if you're running, say, a recruiting firm and you want to ingest
00:08:21
all your resumes and store them in a standard format that you can search and index on,
00:08:26
instead of building a custom NLP model
to do all that.
00:08:30
Now you use foundation models and give a few examples of: here is an input resume in this
00:08:35
format and here is the output resume.
00:08:37
Now you can even fine-tune these models by just giving a few specific examples, and then
00:08:45
you essentially are good to go.
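As a rough sketch of what "a few specific examples" for fine-tuning could look like in the resume scenario, here is some hypothetical data-preparation code. The field names and schema are illustrative assumptions, not any particular service's format.

```python
import json

# Hypothetical fine-tuning examples for resume standardization, written as
# JSONL (one JSON object per line), a common format for fine-tuning jobs.
examples = [
    {
        "input": "Jane Doe | jane@example.com | 5 yrs Java, led payments team",
        "output": {"name": "Jane Doe", "email": "jane@example.com",
                   "years_experience": 5, "skills": ["Java"],
                   "highlights": ["led payments team"]},
    },
    {
        "input": "John Roe - data analyst, SQL/Python, 2 years at a startup",
        "output": {"name": "John Roe", "email": None,
                   "years_experience": 2, "skills": ["SQL", "Python"],
                   "highlights": ["data analyst at a startup"]},
    },
]

with open("resume_finetune.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps({"input": ex["input"],
                            "output": json.dumps(ex["output"])}) + "\n")
```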
00:08:47
So in the past, most of the work went into
probably labeling the data and that was also
00:08:54
the hardest part because that drives the accuracy.
00:08:57
Exactly.
00:08:58
So in this particular case, with these foundation models, labeling is no longer needed?
00:09:05
Essentially, I mean, yes and no.
00:09:07
As always with these things, there is a nuance.
00:09:10
But the majority of what makes these large-scale models remarkable is that they actually can be
00:09:16
trained on a lot of unlabeled data.
00:09:20
You actually go through what I call a pretraining phase, which is essentially where you collect data
00:09:25
sets from, let's say, the World Wide Web, like Common Crawl data, or code data and various
00:09:30
other data sets, Wikipedia, whatnot.
00:09:32
And then you don't even label them; you kind of feed them in as they are.
00:09:37
But you do, of course, have to go through a sanitization step, in terms of making sure you cleanse the data
00:09:42
of PII and other stuff like negative content or hate speech and whatnot.
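As a deliberately simplified sketch of what a sanitization step might involve, here is a small Python pass that masks obvious PII patterns. The regexes and sample records are illustrative; real pretraining pipelines use far more sophisticated PII detection plus toxicity and quality filters.

```python
import re

# Simplistic sanitization pass: mask obvious email and phone-number patterns.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def scrub(record: str) -> str:
    record = EMAIL.sub("[EMAIL]", record)
    record = PHONE.sub("[PHONE]", record)
    return record

raw_docs = [
    "Contact me at jane@example.com or +1 (555) 123-4567 about the role.",
    "The transformer architecture scales well with data and compute.",
]
print([scrub(doc) for doc in raw_docs])
```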
00:09:50
But then you actually start training on a large number of hardware clusters, because training these
00:09:57
models can take tens of millions of dollars to actually go through that training.
00:10:03
And then finally you actually get a model, and then you go through the next
00:10:09
step of what is called inference.
00:10:12
When it comes to building these LLMs, the
easy part is the training.
00:10:19
The hardest part is the data.
00:10:22
Training models with poor data quality will
lead to poor results.
00:10:26
You’ll need to filter out bias, hate speech,
and toxicity.
00:10:30
You’ll need to make sure that the data is
free of PII or sensitive data.
00:10:36
You’ll need to make sure your data is deduplicated,
balanced, and doesn’t lead to oversampling.
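As a minimal sketch of the deduplication step just mentioned, here is an exact-duplicate filter based on hashing normalized text. The sample documents are made up, and production pipelines typically add near-duplicate detection (for example MinHash), which this sketch does not cover.

```python
import hashlib

# Exact deduplication: hash a normalized form of each document and keep
# only the first occurrence.
def dedup(docs):
    seen, unique = set(), []
    for doc in docs:
        key = hashlib.sha256(" ".join(doc.lower().split()).encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique

docs = [
    "Foundation models are trained on unlabeled data.",
    "Foundation  models are trained on unlabeled data.",   # duplicate after normalization
    "Quantization shrinks models for cheaper inference.",
]
print(dedup(docs))   # the near-identical second document is dropped
```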
00:10:43
Because the whole process can be so expensive and requires access to large amounts of compute
00:10:48
and storage, many companies feel lost on where
to even start.
00:10:54
Let's say object detection in video; that would be a smaller model than what we see
00:11:03
now with the foundation models.
00:11:06
What's the cost of running a model like that?
00:11:09
Because now these models, with hundreds of billions of parameters, are probably very
00:11:15
large pieces of data.
00:11:18
That's a great question, because there is so much talk happening only around training these
00:11:23
models, but very little talk on the cost of
running these models to make predictions,
00:11:29
which is inference. That is a signal that very few people are actually deploying these models
00:11:34
at runtime for actual production.
00:11:36
Or once they actually deploy in production,
they will realize oh no, these models are
00:11:41
very expensive to run, and that is where a few important techniques really come
00:11:47
into play.
00:11:48
So one, once you build these large models, to run them in production you need to do
00:11:54
a few things to make them affordable to run, run at scale, and run
00:12:01
in a very economical fashion.
00:12:03
One is what we call quantization.
00:12:05
The other one is what I call distillation, which is where you have these large teacher
00:12:11
models, and even though they are trained with hundreds of billions of parameters, they get distilled
00:12:17
down to a smaller, fine-grained model. I'm speaking in super abstract terms, but that is the
00:12:22
essence of these techniques.
00:12:26
Of course, there’s a lot that goes into
training the model, but what about inference?
00:12:32
It turns out that the sheer size of these
models can make inference expensive to run.
00:12:38
To reduce model size, we can do “quantization,”
which is approximating a neural network by
00:12:44
using smaller, 8-bit integers instead of 32-
or 16-bit floating point numbers.
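As a rough illustration of that idea, here is a minimal sketch of symmetric 8-bit quantization of a single weight tensor. The tensor is random toy data; real frameworks handle per-channel scales, activations, and calibration.

```python
import numpy as np

# Symmetric 8-bit quantization: store int8 values plus one float scale,
# and dequantize at compute time.
weights = np.random.randn(4, 4).astype(np.float32)

scale = np.abs(weights).max() / 127.0                     # map the largest weight to 127
q_weights = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

dequantized = q_weights.astype(np.float32) * scale
print("max abs error:", np.abs(weights - dequantized).max())
print("bytes: fp32 =", weights.nbytes, " int8 =", q_weights.nbytes)
```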
00:12:50
We can also use “distillation”, which
is effectively a transferring of knowledge
00:12:55
from a larger “teacher” model to a smaller
and faster “student” model.
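And here is a minimal sketch of the distillation objective: the student is nudged to match the teacher's softened output distribution. The logits and temperature are made-up toy values; real distillation also includes the usual task loss and runs over a full dataset.

```python
import numpy as np

# Knowledge-distillation sketch: train the small "student" to match the
# large "teacher"'s softened probabilities on one toy example.
def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max()                  # for numerical stability
    p = np.exp(z)
    return p / p.sum()

teacher_logits = np.array([3.2, 1.1, 0.3, -0.8])   # from the large teacher model
student_logits = np.array([2.5, 1.4, 0.1, -0.2])   # from the small student model

T = 2.0                                            # temperature softens both distributions
p_teacher = softmax(teacher_logits, T)
p_student = softmax(student_logits, T)

# KL divergence between teacher and student is the distillation loss to minimize.
kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)))
print("distillation loss (KL):", kl)
```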
00:13:01
These techniques have reduced the model size
significantly for us, while providing similar
00:13:06
accuracy and improved latency.
00:13:09
So we do have this custom hardware to help out with this. I mean, normally
00:13:17
this is all GPU-based, and GPUs are expensive, energy-hungry beasts.
00:13:23
Tell us what we can do with custom silicon
that makes it so much cheaper
00:13:29
both in terms of cost as well as, let's say, the carbon footprint of the energy used.
00:13:37
When it comes to custom silicon, as mentioned,
the cost is becoming a big issue in these
00:13:43
foundation models because they are very
expensive to train
00:13:46
and also very expensive to run at scale.
00:13:49
You can actually build a playground and test your chatbot at low scale, and
00:13:54
it may not be that big a deal, but once you
start deploying at scale
00:13:59
as part of your core business operation,
then these things add up.
00:14:03
So at AWS we did invest in our own custom silicon: Trainium for training and
00:14:11
Inferentia for inference.
00:14:13
And all these things are ways for us to actually understand which
00:14:19
operators are involved in making these prediction decisions, and to optimize
00:14:25
them at the core silicon level and software
stack level.
00:14:28
I mean, if cost is also a reflection of energy
used because in essence, that's what you're
00:14:35
paying for, you can also see that they are,
from a sustainability point of view, much
00:14:40
better than running it on general-purpose GPUs.
00:14:44
So there's a lot of public interest in this
recently and it feels like hype.
00:14:52
Is this the same or is this something where
we can see that this is a real foundation
00:14:58
for future application development?
00:15:00
First of all, we are living in very exciting
times with machine learning.
00:15:05
I have probably said this now every year.
00:15:07
But this year is even more special because
00:15:11
these large language models and foundation
models truly can actually enable so many use
00:15:18
cases where people don't have to staff separate teams to go build task-specific
00:15:24
models.
00:15:25
The speed of ML model development will really increase.
00:15:31
But you won't get to that end state that we want in the coming years unless we actually
00:15:38
make these models more accessible to everybody.
00:15:43
And this is what we did with SageMaker early
on with machine learning and that's what we
00:15:48
need to do with Bedrock and all its applications
as well.
00:15:52
But we do think that while the hype cycle will subside, like with any technology, these
00:15:58
are going to become a core part of every application
in the coming years.
00:16:04
And they will be done in a grounded way and in a responsible fashion too, because there
00:16:10
is a lot more stuff that people need to think
through in a generative AI context.
00:16:16
What kind of data did it learn from, and what response does it actually generate?
00:16:21
How truthful is it, as well?
00:16:23
These are things we are excited to actually help our customers with.
00:16:27
So when you say that this is the most exciting
time in machine learning,
00:16:33
what are you going to say next year?
00:16:36
Well, Swami, thank you for talking to me.
00:16:40
I mean, you educated me quite a bit on what
the current state of the field is.
00:16:45
So I'm very grateful for that.
00:16:46
My pleasure. Thanks again for having me, sir.
00:16:51
I'm excited to see how builders use this technology
00:16:54
and continue to push the possibilities forward.
00:16:57
I want to say thanks to Swami.
His insights and understanding of the space
00:17:02
are a great way to begin this conversation.
00:17:05
I'm looking forward to diving even deeper
and exploring the architectures
00:17:09
behind some of this,
00:17:11
and how large models can be used by engineers and developers
00:17:14
to create meaningful experiences.