Hello World: Meet Generative AI | Amazon Web Services

00:17:24
https://www.youtube.com/watch?v=dBzCGcwYCJo

Summary

TL;DR: This conversation focuses on advancements in machine learning, particularly generative AI and foundation models, and their transformative potential for future applications. Swami Sivasubramanian from AWS explains how cloud technology enabled machine learning to scale by providing extensive compute and data resources, and how transformer-based models and foundation models have revolutionized AI. Unlike earlier task-based models, these models can apply their training on broad datasets to many different tasks with minimal additional training on task-specific data. The discussion highlights quantization and distillation as techniques for managing model size and reducing inference cost, and the role of custom silicon in improving efficiency and sustainability. Finally, it addresses making ML models accessible to a broader audience, which is crucial for realizing their potential in responsible and grounded ways across applications.

Takeaways

  • 🤖 Machine learning is advancing rapidly, notably through generative AI.
  • 🧠 Transformer models are a breakthrough in scalability and data processing.
  • 📊 Foundation models simplify the creation of custom models for various tasks.
  • ⚙️ Quantization and distillation make inference more cost-effective.
  • 🌐 Foundation models, trained on broad datasets, make task-specific models much easier to build.
  • 💻 Custom silicon, like AWS Trainium, optimizes operational cost and energy use.
  • 🔍 Data cleaning and preparation remain critical in ML model success.
  • 📈 Platforms like SageMaker are vital for democratizing access to and use of ML.
  • 🔥 Generative AI and large models are foundational despite hype cycles.
  • 🌱 Sustainability is increasingly significant in AI model deployment.

Timeline

  • 00:00:00 - 00:05:00

    Swami Sivasubramanian discusses the evolution of machine learning at AWS, emphasizing that making models accessible is key to speeding up model development. He gives historical context on Amazon's involvement in AI and machine learning, highlighting the transition from simple algorithms to sophisticated deep learning applications such as Alexa. This sets the stage for understanding advancements in generative AI technologies.

  • 00:05:00 - 00:10:00

    Swami explains the significance of large language models and transformers in generative AI. Unlike task-specific models, these models, with their capacity for in-context learning, utilize enormous datasets, revolutionizing how AI understands and predicts language. He highlights the shift towards using foundation models for versatility across applications and suggests they have simplified building task-specific models, reducing the need for extensive labeled data (a small fine-tuning data sketch follows this timeline).

  • 00:10:00 - 00:17:24

    The discussion covers the cost implications and techniques for optimizing the deployment of large AI models, including quantization and distillation. Swami mentions AWS's investment in custom silicon, like Trainium and Inferentia, to enhance efficiency and sustainability. Despite public hype, Swami stresses the foundational significance of these technologies for future application development, acknowledging potential challenges in data integrity and responsible AI usage.
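
To make the timeline's point about building task-specific models on top of foundation models concrete (see the 00:05:00 - 00:10:00 entry above), here is a minimal sketch of preparing "a few specific examples" for fine-tuning, in Python. The translation pairs, the "prompt"/"completion" field names, and the JSONL layout are illustrative assumptions, not the format of any particular fine-tuning service.

```python
import json

# Hypothetical translation examples (the interview mentions translation as a task
# you still need a specific model for). Field names and file format are assumptions;
# check what your fine-tuning tool or service actually expects.
examples = [
    {"prompt": "Translate to German: Good morning", "completion": "Guten Morgen"},
    {"prompt": "Translate to German: How are you?", "completion": "Wie geht es dir?"},
    {"prompt": "Translate to German: Thank you very much", "completion": "Vielen Dank"},
]

# A handful of task-specific examples like these, rather than a large labeled
# corpus, is what adapting a foundation model to a task typically requires.
with open("fine_tune_examples.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```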

Video Q&A

  • Who is Swami Sivasubramanian?

    Swami Sivasubramanian oversees database, analytics, and machine learning at AWS and has been a key figure in AI and ML development for 15 years.

  • What is generative AI?

    Generative AI refers to advanced models like large language and foundation models capable of creating content, making predictions, and learning tasks from vast datasets.

  • What makes transformer models significant?

    Transformer models are significant because they scale well, train faster, and can be trained on web-scale data such as a crawl of the internet, which lets them capture broad human knowledge.

  • How do foundation models differ from task-specific models?

    Foundation models are trained on extensive datasets and can adapt to multiple tasks, whereas task-specific models are created for particular tasks.

  • What is 'in-context learning'?

    In-context learning involves providing a handful of demonstration examples in a prompt, allowing the model to adjust its responses based on the pattern in those examples (a minimal prompt sketch follows this Q&A section).

  • What are the challenges with large language models?

    Challenges include the high cost of training and inference, data quality issues, and ensuring the models are free of bias and sensitive information.

  • What techniques help reduce inference cost?

    Techniques like quantization and distillation reduce model size and inference cost while largely preserving accuracy (see the sketches after this Q&A section).

  • What is needed to make machine learning models more accessible?

    Making models accessible involves developing platforms and tools like SageMaker and Bedrock to simplify using and deploying these models.

  • What role does custom silicon play in ML?

    Custom silicon like AWS's Trainium and Inferentia helps reduce the cost and energy use of running large models, making them more sustainable.
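
To make the in-context learning answer above concrete, here is a minimal sketch of a few-shot prompt, using the recruiting-firm resume example from the interview. The resume strings are invented for illustration, and the commented-out generate() call is a hypothetical placeholder rather than any specific model's API.

```python
# Build a few-shot prompt: a couple of demonstration examples plus the new input.
# The model infers the pattern from the examples at inference time; no weights change.
def build_few_shot_prompt(new_resume: str) -> str:
    demonstrations = [
        ("Jane Doe, 5 years Java, New York",
         '{"name": "Jane Doe", "skills": ["Java"], "location": "New York"}'),
        ("Raj P., data engineer, Austin",
         '{"name": "Raj P.", "skills": ["data engineering"], "location": "Austin"}'),
    ]
    parts = ["Convert each resume line to JSON."]
    for raw, structured in demonstrations:
        parts.append(f"Resume: {raw}\nJSON: {structured}")
    parts.append(f"Resume: {new_resume}\nJSON:")
    return "\n\n".join(parts)

prompt = build_few_shot_prompt("Ana Silva, ML researcher, Lisbon")
print(prompt)
# response = generate(prompt)  # hypothetical call to whichever foundation model endpoint you use
```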
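
Similarly, the quantization and distillation answer can be illustrated with a minimal PyTorch sketch. It assumes a recent PyTorch installation; the layer sizes and temperature value are illustrative choices, not numbers from the video.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Quantization: approximate 32-bit floating point weights with 8-bit integers.
float_model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 128))
quantized_model = torch.quantization.quantize_dynamic(
    float_model, {nn.Linear}, dtype=torch.qint8
)  # smaller model, cheaper CPU inference, usually similar accuracy

# Distillation: a smaller "student" learns to match a larger "teacher".
def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between the teacher's and student's softened output distributions."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(log_student, soft_targets, reduction="batchmean") * temperature ** 2

# In a training loop the student minimizes distillation_loss, often combined
# with the ordinary task loss on hard labels.
```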

Subtitles

  • 00:00:00
    So there's a lot of public interest in this recently and it feels like hype.
  • 00:00:08
    Is this the same, or is this something where we can see that this is a real foundation
  • 00:00:14
    for future application development?
  • 00:00:16
    We are living in very exciting times with machine learning.
  • 00:00:20
    The speed of ML model development will really actually increase.
  • 00:00:26
    But you won't get to that end state that we want in the next coming years
  • 00:00:31
    unless we actually make these models more accessible to everybody.
  • 00:00:51
    Swami Sivasubramanian oversees database, analytics and machine learning at AWS.
  • 00:00:58
    For the past 15 years, he has helped lead the way on AI and ML in the industry.
  • 00:01:03
    Swami’s teams have a strong track record of taking new technologies and turning these
  • 00:01:08
    into viable tools.
  • 00:01:11
    Today, Generative AI is dominating news feeds and conversations.
  • 00:01:15
    Consumers are interacting with it and brands are trying to understand how to best harness
  • 00:01:20
    its potential for their customers.
  • 00:01:23
    So, I sat down with Swami to better understand the broad landscape of this technology.
  • 00:01:29
    Swami, we go back a long time.
  • 00:01:36
    Tell me a bit. Do you remember your first day at Amazon?
  • 00:01:39
    I still remember because it's not very common for PhD students to join Amazon at that time
  • 00:01:48
    because you were known as a retailer or ecommerce company.
  • 00:01:53
    We were building things.
  • 00:01:55
    And so that's also quite a departure from academia.
  • 00:01:59
    Definitely, for a PhD student to go from thinking to, actually, how do I build this?
  • 00:02:04
    So you brought actually DynamoDB to the world and quite a few other databases since then,
  • 00:02:11
    but under your purview now is also AI and machine learning.
  • 00:02:18
    So tell me a bit about how does your world of AI look like?
  • 00:02:21
    After building a bunch of these databases and analytics services,
  • 00:02:27
    I got fascinated by AI and machine learning because that literally puts
  • 00:02:32
    data to work.
  • 00:02:33
    And if you look at machine learning technology itself broadly, it's not necessarily new.
  • 00:02:38
    In fact, some of the first papers on deep learning were written even like 30 years ago.
  • 00:02:43
    But even in those papers, they explicitly called out that for it to get large-scale adoption,
  • 00:02:49
    it required a massive amount of compute and a massive amount of data to actually succeed.
  • 00:02:54
    And that's what the cloud got us: the ability to actually unlock the power of deep learning technologies.
  • 00:03:01
    So that led me early on, this is like six, seven years ago, to start the machine learning
  • 00:03:07
    organization because we wanted to take machine learning, especially deep learning style technologies,
  • 00:03:13
    beyond just the hands of scientists, to everyday developers.
  • 00:03:17
    If you think about the early days of Amazon, the retailer with similarities and recommendations
  • 00:03:23
    and things like that, were they the same algorithms that we're seeing being used today or is that,
  • 00:03:30
    I mean that's a long time ago, 30 years.
  • 00:03:35
    Machine learning has really gone through huge growth in actually the complexity of the algorithms
  • 00:03:41
    and applicability of the use cases.
  • 00:03:44
    Early on the algorithms were a lot more simple, a lot more like linear algorithms based or
  • 00:03:49
    gradient boosting.
  • 00:03:51
    If you see last decade, it was all around like deep learning early part of last decade,
  • 00:03:57
    which was essentially a step up in the ability for neural nets to actually understand and
  • 00:04:03
    learn from the patterns, which is effectively where all the image processing
  • 00:04:08
    algorithms come from.
  • 00:04:10
    And then also personalization with different types of neural nets and so forth.
  • 00:04:15
    And that's what led to inventions like Alexa,
  • 00:04:18
    which has a remarkable accuracy compared to others.
  • 00:04:21
    So the neural nets and deep learning have really been a step up.
  • 00:04:25
    And the next big step up is what is happening today in machine learning.
  • 00:04:30
    So a lot of the talk these days is around generative AI,
  • 00:04:34
    large language models, foundation models.
  • 00:04:37
    Tell me a bit why is that different from, let's say the more task based like vision
  • 00:04:43
    algorithms and things like that?
  • 00:04:45
    I mean, if you take a step back and look at what
  • 00:04:49
    these foundation models - large language models - are all about.
  • 00:04:54
    These are big models which are trained with
  • 00:04:57
    hundreds of millions of parameters, if not billions.
  • 00:05:00
    A parameter, just to give context, is like an internal variable
  • 00:05:05
    that the ML algorithm has learned from its data set.
  • 00:05:08
    Now, to give a sense, what is this big thing suddenly that has happened?
  • 00:05:14
    Few things -
  • 00:05:15
    One, if you take a look at transformers, that has been a big change.
  • 00:05:22
    Transformer is a kind of neural net technology that is remarkably more scalable than the previous
  • 00:05:30
    versions like RNNs or various others.
  • 00:05:33
    So what does this mean?
  • 00:05:34
    Why did this suddenly lead to this transformation?
  • 00:05:38
    Because it is actually scalable and you can train them a lot faster; now you can throw
  • 00:05:42
    a lot of hardware and a lot of data at them.
  • 00:05:45
    Now that means I can actually crawl the entire World Wide Web and actually feed it
  • 00:05:53
    into these kind of algorithms and start actually building models that can actually understand
  • 00:06:00
    human knowledge.
  • 00:06:02
    At a high level, a generative AI text model is good at using natural language processing
  • 00:06:09
    to analyze text and predict the next word that comes in a sequence of words.
  • 00:06:14
    By paying attention to certain words or phrases in the input, these models can infer context.
  • 00:06:21
    And they can use that context to find the words that have the highest probability of
  • 00:06:26
    following the words that came before it.
  • 00:06:29
    Structuring inputs as instructions with relevant context can prompt a model to generate answers
  • 00:06:35
    for language understanding, knowledge, and composition.
  • 00:06:39
    Foundation Models are also capable of what is called “in-context learning,” which
  • 00:06:44
    is what happens when you include a handful of demonstration examples
  • 00:06:48
    as part of a prompt to improve the model’s output on the fly.
  • 00:06:53
    We supply examples to further explain the instruction
  • 00:06:56
    And this helps the model adjust the output based on the pattern and style in the examples.
  • 00:07:03
    When the models use billions of parameters and their training corpus is the entire internet,
  • 00:07:10
    the results can be remarkable.
  • 00:07:12
    The training is unsupervised and task agnostic.
  • 00:07:16
    And the mountains of web data used for training let it respond to natural language instructions
  • 00:07:21
    for many different tasks.
  • 00:07:24
    So the task based models that we had before and that we were already really good at, could
  • 00:07:29
    you build them based on these foundation models?
  • 00:07:33
    You no longer need these task specific models or do we still need them?
  • 00:07:38
    The way to think about it is: the need for task-specific models is not going away.
  • 00:07:44
    But what essentially changes is how we go about building them.
  • 00:07:47
    You still need a model to translate from one language to another
  • 00:07:52
    or to generate code and so forth.
  • 00:07:55
    But how easily you can now build them is essentially a big change, because with foundation models,
  • 00:08:01
    which cover an entire corpus of knowledge, let's say a huge amount of data, now it is simply
  • 00:08:08
    a matter of actually building on top of this with fine tuning, with specific examples.
  • 00:08:13
    Think about if you're running like a recruiting firm as an example and you want to ingest
  • 00:08:21
    all your resumes and store them in a format that is standard for you to search and index on,
  • 00:08:26
    instead of building a custom NLP model to do all that.
  • 00:08:30
    Now, using foundation models, you give a few examples of: here is an input resume in this
  • 00:08:35
    format and here is the output resume.
  • 00:08:37
    Now you can even fine-tune these models by just giving a few specific examples, and then
  • 00:08:45
    you essentially are good to go.
  • 00:08:47
    So in the past, most of the work went into probably labeling the data and that was also
  • 00:08:54
    the hardest part because that drives the accuracy.
  • 00:08:57
    Exactly.
  • 00:08:58
    So in this particular case, with these foundation models, labeling is no longer needed?
  • 00:09:05
    Essentially, I mean, yes and no.
  • 00:09:07
    As always with these things, there is a nuance.
  • 00:09:10
    But the majority of what makes these large-scale models remarkable is they actually can be
  • 00:09:16
    trained on a lot of unlabeled data.
  • 00:09:20
    You actually go through what I call a pretraining phase, which is essentially where you collect data
  • 00:09:25
    sets from, let's say, the World Wide Web, like Common Crawl data, or code data and various
  • 00:09:30
    other data sets, Wikipedia, whatnot.
  • 00:09:32
    And then you don't even label them, you kind of feed them as it is.
  • 00:09:37
    But you have to of course go through a sanitization step in terms of making sure you cleanse data
  • 00:09:42
    from PII or actually all other stuff like negative things or hate speech and whatnot.
  • 00:09:50
    But then you actually start training on a large number of hardware clusters, because these
  • 00:09:57
    models can take tens of millions of dollars to actually go through that training.
  • 00:10:03
    And then finally you actually get a notion of a model, and then you go through the next
  • 00:10:09
    step of what is called inference.
  • 00:10:12
    When it comes to building these LLMs, the easy part is the training.
  • 00:10:19
    The hardest part is the data.
  • 00:10:22
    Training models with poor data quality will lead to poor results.
  • 00:10:26
    You’ll need to filter out bias, hate speech, and toxicity.
  • 00:10:30
    You’ll need to make sure that the data is free of PII or sensitive data.
  • 00:10:36
    You’ll need to make sure your data is deduplicated, balanced, and doesn’t lead to oversampling.
  • 00:10:43
    Because the whole process can be so expensive and requires access to large amounts of compute
  • 00:10:48
    and storage, many companies feel lost on where to even start.
  • 00:10:54
    Let's say object detection in video, that would be a smaller model than what we see
  • 00:11:03
    now with the foundation models.
  • 00:11:06
    What's the cost of running a model like that?
  • 00:11:09
    Because now these models with these hundreds of billions of parameters are probably very
  • 00:11:15
    large pieces of data.
  • 00:11:18
    That's a great question, because there is so much talk happening only around training these
  • 00:11:23
    models, but very little talk on the cost of running these models to make predictions,
  • 00:11:29
    which is inference, which is a signal that very few people are actually deploying it
  • 00:11:34
    and running it in actual production.
  • 00:11:36
    Or once they actually deploy in production, they will realize oh no, these models are
  • 00:11:41
    very expensive to run, and that is where a few important techniques actually really come
  • 00:11:47
    into play.
  • 00:11:48
    So one, once you build these large models to run them in production, you need to do
  • 00:11:54
    a few things to make them affordable to run at cost, run at scale, and run
  • 00:12:01
    in an economical fashion.
  • 00:12:03
    One is what we call quantization.
  • 00:12:05
    The other one is what I call distillation, which is that you have these large teacher
  • 00:12:11
    models, and even though they are trained with hundreds of billions of parameters, they kind of get distilled
  • 00:12:17
    to a smaller, fine-grained model, speaking in super abstract terms, but that is the
  • 00:12:22
    essence of these models.
  • 00:12:26
    Of course, there’s a lot that goes into training the model, but what about inference?
  • 00:12:32
    It turns out that the sheer size of these models can make inference expensive to run.
  • 00:12:38
    To reduce model size, we can do “quantization,” which is approximating a neural network by
  • 00:12:44
    using smaller, 8-bit integers instead of 32- or 16-bit floating point numbers.
  • 00:12:50
    We can also use “distillation”, which is effectively a transferring of knowledge
  • 00:12:55
    from a larger “teacher” model to a smaller and faster “student” model.
  • 00:13:01
    These techniques have reduced the model size significantly for us, while providing similar
  • 00:13:06
    accuracy and improved latency.
  • 00:13:09
    So we do have custom hardware to help out with this. I mean, normally
  • 00:13:17
    this is all GPU based, which are expensive, energy-hungry beasts.
  • 00:13:23
    Tell us what we can do with custom silicon that makes it so much cheaper
  • 00:13:29
    both in terms of cost as well as in, let's say, your carbon footprint of the energy.
  • 00:13:37
    When it comes to custom silicon, as mentioned, the cost is becoming a big issue in these
  • 00:13:43
    foundation models because they are very expensive to train
  • 00:13:46
    and very expensive also to run at scale.
  • 00:13:49
    You can actually, like, build a playground and test your chatbot at low scale, and
  • 00:13:54
    it may not be that big a deal, but once you start deploying at scale
  • 00:13:59
    as part of your core business operation, then these things add up.
  • 00:14:03
    So in AWS we did invest in our custom silicon: for training with Trainium, and with
  • 00:14:11
    Inferentia for inference.
  • 00:14:13
    And all these things are like ways for us to actually understand the essence of which
  • 00:14:19
    operators are involved in making these prediction decisions and optimizing
  • 00:14:25
    them at the core silicon level and software stack level.
  • 00:14:28
    I mean, if cost is also a reflection of energy used because in essence, that's what you're
  • 00:14:35
    paying for, you can also see that they are, from a sustainability point of view, much
  • 00:14:40
    more important than running it on general purpose GPUs.
  • 00:14:44
    So there's a lot of public interest in this recently and it feels like hype.
  • 00:14:52
    Is this the same or is this something where we can see that this is a real foundation
  • 00:14:58
    for future application development?
  • 00:15:00
    First of all, we are living in very exciting times with machine learning.
  • 00:15:05
    I have probably said this now every year.
  • 00:15:07
    But this year is even more special because
  • 00:15:11
    these large language models and foundation models truly can actually enable so many use
  • 00:15:18
    cases where people don't have to staff separate teams to go build task-specific
  • 00:15:24
    models.
  • 00:15:25
    The speed of ML model development will really actually increase.
  • 00:15:31
    But you won't get to that end state that we want in the next coming years unless we actually
  • 00:15:38
    make these models more accessible to everybody.
  • 00:15:43
    And this is what we did with SageMaker early on with machine learning and that's what we
  • 00:15:48
    need to do with Bedrock and all its applications as well.
  • 00:15:52
    But we do think, while the hype cycle will subside like with any technology, these
  • 00:15:58
    are going to become a core part of every application in the coming years.
  • 00:16:04
    And they will be done in a grounded way, but in a responsible fashion too, because there
  • 00:16:10
    is a lot more stuff that people need to think through in a generative AI context.
  • 00:16:16
    Because what kind of data did it learn from, and what response does it actually generate?
  • 00:16:21
    How truthful is it as well?
  • 00:16:23
    These are things we are excited to actually help our customers with.
  • 00:16:27
    So when you say that this is the most exciting time in machine learning,
  • 00:16:33
    what are you going to say next year?
  • 00:16:36
    Well, Swami, thank you for talking to me.
  • 00:16:40
    I mean, you educated me quite a bit on what the current state of the field is.
  • 00:16:45
    So I'm very grateful for that.
  • 00:16:46
    My pleasure. Thanks again for having me, sir.
  • 00:16:51
    I'm excited to see how builders use this technology
  • 00:16:54
    and continue to push the possibilities forward.
  • 00:16:57
    I want to say thanks to Swami. His insights and understanding of the space
  • 00:17:02
    are a great way to begin this conversation.
  • 00:17:05
    I'm looking forward to diving even deeper and exploring the architectures
  • 00:17:09
    behind some of this.
  • 00:17:11
    And how large models can be used by engineers and developers
  • 00:17:14
    to create meaningful experiences.
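
The voice-over in the transcript describes a generative text model as repeatedly picking the word with the highest probability of following the words that came before it. Below is a minimal sketch of that greedy decoding loop; it assumes the Hugging Face transformers library and the public gpt2 checkpoint purely for illustration, since the video does not name a specific model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

input_ids = tokenizer("Machine learning puts data to", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(10):
        logits = model(input_ids).logits       # a score for every vocabulary token
        next_id = logits[0, -1].argmax()       # pick the highest-probability next token
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))
```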
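
The voice-over also stresses that the hardest part is the data: it needs to be deduplicated and scrubbed of PII and toxic content. The sketch below is a deliberately naive illustration of two of those steps; real pipelines rely on much stronger filters, and the regular expressions and sample corpus here are assumptions for demonstration only.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")  # US-style numbers only

def scrub_pii(text: str) -> str:
    """Replace obvious email addresses and phone numbers with placeholders."""
    return PHONE.sub("[PHONE]", EMAIL.sub("[EMAIL]", text))

def deduplicate(docs):
    """Drop exact duplicates after light whitespace and case normalization."""
    seen, kept = set(), []
    for doc in docs:
        key = " ".join(doc.lower().split())
        if key not in seen:
            seen.add(key)
            kept.append(doc)
    return kept

corpus = [
    "Contact me at jane@example.com or 555-123-4567.",
    "Contact me at jane@example.com or 555-123-4567.",   # exact duplicate
    "A completely different training document.",
]
print([scrub_pii(doc) for doc in deduplicate(corpus)])
```
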
Tags
  • AI
  • Machine Learning
  • Generative AI
  • Foundation Models
  • Transformer Models
  • Compute Scalability
  • AWS
  • Swami Sivasubramanian
  • Quantization
  • Distillation