AWS re:Invent 2024 - Leverage Anthropic's Claude models for AI's evolving landscape (AIM123)
Summary
TLDR: The presentation covered a range of topics, from an introduction to the company Anthropic and its mission to a detailed walkthrough of its AI model, Claude. Founded in 2021, Anthropic seeks to advance safe AI development with research focused on alignment and interpretability. Claude 3.5 Sonnet launched in October 2024 with features such as computer use and improved agentic capabilities that make it easier to navigate complex workflows. Companies such as Jane Street and DoorDash use Claude to improve the efficiency of their workflows. Claude's computer-use features were demonstrated in a video showing how it can be used to build websites and handle more complex coding tasks. The features support manual QA and can analyze screenshots to determine the necessary actions. The session also covered prompt engineering, including the prompt generator, RAG (Retrieval Augmented Generation), and fine-tuning. The prompt generator helps construct prompts quickly using best practices. RAG uses external data to improve language model responses, while tool use extends Claude's functionality through external tools and functions. Fine-tuning can change a model's behavior for specific needs but requires carefully chosen data to avoid degrading the model.
Key takeaways
- 🤖 Anthropic was founded in 2021 to advance safe AI development.
- 🆕 Claude 3.5 Sonnet includes new computer-use functionality and agentic capabilities.
- 🏢 Companies such as Jane Street and DoorDash use Claude for greater efficiency.
- 👨‍💻 Claude's computer-use skills were showcased through coding demonstrations.
- 🔧 The prompt generator helps create correct and effective prompts.
- 🔍 RAG integrates external knowledge into AI models' responses.
- 🧠 Fine-tuning adjusts a model's behavior for specific applications.
- 🛠 Tool use extends Claude's capabilities through external tools.
- 📈 Trained models can achieve improvements in instruction following and taking action.
- 🖥️ Claude can analyze screenshots and determine the necessary next step.
Timeline
- 00:00:00 - 00:05:00
The speaker thanks the attendees for coming and presents the session agenda, which includes topics such as prompt engineering, fine-tuning, and retrieval architectures. Maggie Vo introduces herself and her colleague Ellie Shopic as the main presenters and states Anthropic's mission of ensuring a safe transition through transformative AI.
- 00:05:00 - 00:10:00
Maggie presents the Claude 3.5 Sonnet model, launched in October 2024, highlighting its improvements, especially in code generation and computer use. Claude can navigate computer interfaces and act as an agent for multi-step reasoning, supported by customer examples such as Jane Street.
- 00:10:00 - 00:15:00
The Claude 3.5 Haiku model is highlighted, along with the success of companies such as DoorDash, which uses the model to route support tickets. Maggie hands the presentation over to Ellie, who will discuss computer use in more detail and show practical applications.
- 00:15:00 - 00:20:00
Ellie, focusing on prompt engineering and computer use, introduces the screenshot-based computer-use capability, demonstrated through a demo by Alex. This feature enables Claude to control computer environments by interacting with screenshots to carry out complex tasks.
- 00:20:00 - 00:25:00
A demo by Alex shows Claude interacting with computer screens to perform coding and debugging, illustrating the potential of agents. Ellie encourages attendees to join the demos and notes the opportunity to watch Claude's abilities live at the booths.
- 00:25:00 - 00:30:00
Ellie discusses using the models via Amazon Bedrock. The goal is to provide safeguards and services such as embeddings and fine-tuning. In-depth information on tool use, RAG architecture, and prompt engineering is given to help attendees understand and apply the techniques.
- 00:30:00 - 00:35:00
Tools for prompt generation and improving prompt quality are introduced, with examples and best practices highlighted. Step-by-step reasoning and the use of XML tags for better prompt structuring are taught.
- 00:35:00 - 00:40:00
The importance of prompt engineering is highlighted by encouraging revision and version control for efficacy. Examples and XML tags are emphasized as powerful tools to maintain clarity.
- 00:40:00 - 00:45:00
Ellie explains the concept of tool use as an extension of Claude's functionality, comparing it to everyday tool tasks such as weather lookups. Claude's computer use is treated as a complex extension of tool use, enabling the AI to interact with screenshots and on-screen text. The RAG architecture is covered thoroughly, with a focus on externally stored data.
- 00:45:00 - 00:50:00
The use of embeddings to improve search, storing data in a vector database, and retrieving content into the prompt for optimized execution were explained. These techniques help organize information for more efficient and precise data retrieval.
- 00:50:00 - 00:59:21
Ellie concludes with a discussion of fine-tuning and its use for changing behavior rather than teaching new information. The importance of well-structured evaluation and of varying workflows to reach optimal model performance was also discussed.
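The embedding-and-retrieval flow summarized in the 00:45:00 segment can be sketched in miniature. This is an illustrative sketch only: the toy three-dimensional vectors stand in for a real embedding model, and the in-memory dict stands in for a vector database.

```python
import math

# Toy in-memory "vector database": document text -> embedding vector.
# In practice these vectors would come from an embedding model and the
# documents would live in a real vector store; everything here is made up.
DOCS = {
    "Claude 3.5 Sonnet launched in October 2024.": [0.9, 0.1, 0.0],
    "DoorDash routes support tickets with Claude.": [0.1, 0.8, 0.2],
    "Fine-tuning changes model behavior, not knowledge.": [0.0, 0.2, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, k=1):
    """Return the k documents whose embeddings are closest to the query."""
    ranked = sorted(DOCS, key=lambda d: cosine(query_vec, DOCS[d]), reverse=True)
    return ranked[:k]

def build_prompt(question, query_vec):
    """RAG step: fetch relevant context and place it at the top of the prompt."""
    context = "\n".join(retrieve(query_vec, k=1))
    return f"<context>\n{context}\n</context>\n\nQuestion: {question}"
```

The assembled prompt, with retrieved context first and the question after, is what would then be sent to the model.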
Video Q&A
When was Anthropic founded, and what is its mission?
Anthropic was founded in 2021 with the mission of ensuring a safe transition through transformative AI, through research focused on alignment and interpretability.
When was Claude 3.5 Sonnet launched, and what features does it have?
Claude 3.5 Sonnet launched in October 2024 and offers improvements in areas including computer use and agentic capabilities.
How do companies such as Jane Street and DoorDash use Claude?
Claude is used to improve coding efficiency and developer productivity at companies such as Jane Street and DoorDash.
What kinds of tasks were demonstrated with Claude during the presentation?
The demo videos showed Claude being used for tasks such as web development via its computer-use features.
Who are Maggie Vo and Ellie Shopic?
Maggie Vo leads the technical education team at Anthropic, and Ellie Shopic is the head of technical training.
What computer-use capabilities were presented for Claude?
Claude can analyze screenshots and determine the necessary commands, which can be used for tasks such as testing apps or manipulating Excel spreadsheets.
What is the purpose of Anthropic's prompt generator?
Anthropic's prompt generator is designed to help create prompts quickly while following best practices.
What is RAG, and how is it used?
RAG stands for "retrieval augmented generation," which integrates external knowledge with language models by retrieving relevant data and including it in a prompt.
When is it appropriate to use fine-tuning?
Fine-tuning can be used to change model behavior toward specific formats or schemas, but it can degrade the model if the data is not chosen carefully.
What is tool use, and how is it used with Claude?
Tool use is a method of extending Claude's capabilities by giving it tools to perform specific tasks based on requests.
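The tool-use pattern described above can be sketched with a stubbed model decision and a hypothetical weather tool. Both the tool and the stub are illustrative assumptions; real tool use goes through Claude's tool-calling interface, where the model returns which tool to invoke and with what arguments.

```python
# Minimal tool-use loop: the "model" (stubbed here) picks a tool and
# arguments, the application executes the tool, and the result is used.
# The tool name and the stub are illustrative, not real API calls.

def get_weather(city):
    """A hypothetical tool the model can ask to call."""
    fake_data = {"Las Vegas": "sunny, 21C"}  # made-up data
    return fake_data.get(city, "unknown")

TOOLS = {"get_weather": get_weather}

def stub_model(query):
    """Stand-in for Claude deciding which tool (if any) to invoke."""
    if "weather" in query.lower():
        return {"tool": "get_weather", "args": {"city": "Las Vegas"}}
    return None

def answer(query):
    decision = stub_model(query)
    if decision:  # the model asked for a tool: run it and use the result
        result = TOOLS[decision["tool"]](**decision["args"])
        return f"Tool result: {result}"
    return "No tool needed."
```

The key design point is the division of labor: the model only chooses the tool and its arguments; the application owns the actual execution.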
- 00:00:00 Hi everybody, thank you so much for spending some of your day coming to this session. I hope it's a session that's super useful for you, jam-packed full of tips and tricks, everything from prompt engineering to fine-tuning to retrieval architectures and so on. My name is Maggie Vo, I lead the technical education team at Anthropic, and my colleague is Ellie Shopic, who will be doing the majority of this presentation; he is much more knowledgeable than I am, and he's the head of technical training. Again, pictures of our faces, I guess.
- 00:00:31 So, our agenda today: I'll go lightly into an introduction about Anthropic, an overview of our latest models and what's exciting about them, including some of our latest features like computer use. Then we'll go into techniques and best practices, all sorts of things top to bottom, prompt engineering to agents and so on.
- 00:00:49 So to begin with, I'm curious: how many of you have heard of Claude? Great, love that. How many of you have used Claude? And how many of you have Claude in production somewhere? Great, okay, so pretty good familiarity with Claude, and hopefully with Anthropic as well. For those of you who don't know, we're a pretty young company; we were founded in 2021, and Anthropic's mission is to help ensure that the world safely makes the transition through transformative AI. We do this in a wide variety of ways, with alignment research and interpretability research, as well as, of course, making some of the world's most intelligent models while ensuring that they remain trustworthy. We also push other companies to produce trustworthy systems, and in general we try to support businesses with generative AI, with foundational, transformative features, while also making sure that they are safe.
- 00:01:46 For us, though, safety is not something theoretical. It's not about safety guardrails that you're not sure of during training and so on. We are at the frontier of research that makes Claude easier to steer, harder to jailbreak, and also the least likely model on the market to hallucinate, according to Galileo's LLM Hallucination Index. We're very proud of these, as we believe they are critical pieces to making sure you can trust the models you have in deployment, or even just in your personal life and personal use.
- 00:02:15 In October 2024 we launched the latest version of Claude 3.5 Sonnet, and we upgraded Claude 3.5 Sonnet basically to a state-of-the-art model in quite a variety of areas, especially coding, where we put it on a wide variety of actual task-oriented benchmarks, not just theoretical benchmarks removed from everyday life, and it showed some vast improvements over whatever was leading before. We also released a Claude 3.5 Haiku model, including a 3.5 Haiku fast model on Bedrock specifically, which is inference-optimized in order to serve you the best speeds for Haiku-level intelligence.
- 00:02:57 And then, something we'll talk about briefly, there's also computer use, an experimental beta capability that we released with the latest Claude 3.5 Sonnet model, where Claude is able to use computer interfaces through a combination of screenshots and so on. Claude can navigate around your whole computer. You don't need APIs and so on, because Claude can understand the whole computer interface and start interacting with it directly.
- 00:03:24 A little bit about the Claude 3.5 Sonnet model specifically. You can see some of our results here on the standard benchmarks, but I think what's more interesting is that, first, its computer vision has been vastly improved, so it can do something like computer use with greater accuracy. We trained it for strong agentic capabilities too, which is how computer use actually works: it can do much more nuanced thinking and decision-making. And it's also gotten much better at code generation, in terms of accuracy, readability, and converting legacy code into modern code.
- 00:04:01 All of these combine into a wide variety of use cases that I encourage you to explore. Things like, as I mentioned, code generation, that's a big one, but also great visual analysis. In combination with computer use, Claude is able to do things like read Excel spreadsheets and do some analysis for you, even manipulate the spreadsheets and then write some functions and so on to help you with financial analysis, for example. We've also trained Claude very well to handle really complex queries: lots of instructions, multi-step situations, if-this-then-that sorts of situations. So Claude's really able to reason in multi-step ways quite well at this point.
- 00:04:44 Here's an example I want to highlight from one of our customers, Jane Street. They're basically using Claude to scale their coding to be much more efficient. Their quality has increased, developer productivity has increased, and the general time spent writing PRs, improving their codebase, and fixing code has vastly decreased. We're very proud of this.
- 00:05:10 Then there's Claude 3.5 Haiku, which I won't spend too much time on. It has a similar speed to Claude 3 Haiku but with much greater intelligence, and it's really, really good at coding as well, especially fast coding, where it is currently best in class; it's able to beat the previous Claude 3.5 Sonnet model on SWE-bench and other coding benchmarks.
- 00:05:38 This is an example of a use case that we found great success with, where DoorDash is using Claude 3.5 Haiku to route tickets in their support system, which has increased the accuracy and response time of their ticket routing and customer service, decreasing the average time to resolution and improving rerouting and accuracy rates.
- 00:06:05 The last thing I will talk about is computer use, and here I'll actually transfer over to Ellie. Just to introduce it: computer use is this ability that Claude has to perform tasks by interpreting screenshots and then taking actions based on those screenshots. So Claude does all sorts of things now: it can test your apps for you, as I said; it can manipulate Excel spreadsheets; you can plan vacations with Claude or ask Claude all sorts of questions that it can then go browse the internet for and decide things. Ellie, if you don't mind coming up, I'll pass it over for you to further explain the use cases and to show some things that computer use can do. Awesome.
- 00:06:42 Thanks! Hopefully you all can hear me... there we go, awesome. Hello everyone, my name is Ellie Shopic, I'm the head of technical training here at Anthropic, and I'm super excited to get a chance to talk to you about computer use, prompt engineering, and so on. Just a quick show of hands, to piggyback on what Maggie was saying: how many of you had a chance to check out our booth and actually see any of these demos in action? All right, awesome. So I'm going to give a theoretical overview of some of these ideas: we're going to talk about computer use, prompt engineering, RAG, tool use, and fine-tuning. But if you want to actually see these in action, I definitely recommend you come check us out, because we've got plenty of demos going on at our booth, and a really wonderful team to guide you through any technical or non-technical questions you may have; we're happy to answer.
- 00:07:22 So, Maggie talked a little bit about computer use. Computer use is this capability that our upgraded 3.5 Sonnet has to basically analyze a screenshot and figure out the necessary computer command to take. What that means is: if we give Claude a screenshot of a desktop and a prompt like "I need you to go and figure out some really awesome hiking trails near Vegas, because I need to see the sun; I've been at this conference for too long and I need to get outside," Claude is going to take a look at that screenshot and say something like, "Okay, in order to figure out the best hikes near the convention, I'm going to go to the browser and do that research. I see, based on this screenshot, that the Firefox (or Chrome, or Safari, or whatever browser you have) icon is at this coordinate, and you should execute a click command." It's then up to you as the developer to write the code necessary to execute that click, and you can do that in your language of choice; the reference implementation that we have is in Python. Once that click happens, we take a screenshot and feed it again to Claude, and Claude can look at it and say, "Great, this looks like Firefox. In the browser bar we should probably type something like 'good hikes near Vegas' and press enter." You as the developer execute that command; repeat, repeat, repeat.
- 00:08:34 That idea of giving a model some kind of tools, some ability to interact with some kind of system, and then just running a loop over and over and over again is a very high-level way of explaining what an agent is. You're going to hear this term "agents" and "agentic workflows" and so on; we'll talk about that in a little bit. But when you hear the term "agent," think: an LLM, something like Claude; a set of tools; some ability to perform actions, like interpreting screenshots; and then just a loop: do it again, do it again, do it again.
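That screenshot-in, action-out loop can be sketched as follows. This is a toy sketch: the stubbed model and string "screenshots" stand in for Claude's computer-use capability, and a real harness would call the API and actually execute clicks and keystrokes.

```python
# Agent loop sketch: the model sees an observation (a screenshot), returns
# an action, the harness executes it and captures the next observation.
# The model and screen states are stubs, not real computer-use API calls.

def stub_model(observation):
    """Stand-in for Claude mapping a screenshot to the next action."""
    plan = {
        "desktop": {"type": "click", "target": "browser_icon"},
        "browser": {"type": "type", "text": "good hikes near Vegas"},
        "results": {"type": "done"},
    }
    return plan[observation]

def execute(action):
    """Pretend to execute the action and return the next 'screenshot'."""
    transitions = {"click": "browser", "type": "results"}
    return transitions[action["type"]]

def run_agent(observation, max_steps=10):
    """The agentic loop: observe, act, observe, act, until done."""
    actions = []
    for _ in range(max_steps):
        action = stub_model(observation)
        actions.append(action["type"])
        if action["type"] == "done":
            break
        observation = execute(action)
    return actions
```

The `max_steps` cap is the one non-toy detail worth keeping: real agent loops bound their iterations so a confused model cannot spin forever.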
- 00:09:01 What's really interesting about computer use is that while we are so excited about this technology, it's still relatively early; but in a short period of time we've been able to basically double the performance on evaluation benchmarks over previous state-of-the-art models. Some of the use cases we're exploring, just like Maggie mentioned, are in manual QA: any time there's a human in the mix doing very tedious, time-intensive, laborious tasks, things like manually QAing a particular piece of software and thinking about every single interaction that might need to be taken. Instead of having to worry about all those interactions and taking them programmatically, we can say: "Here, Claude, this is the flow that I'd like you to go through, and here are all the permutations that I want you to analyze. I might even miss some of them, so if there is a flow in this application or architecture that I've missed, why don't you go ahead and test it for me. Take those results, write the output to some file; I'll give you an expected output, and you let me know if those match." So you can think about QA workflows in that particular capacity.
- 00:09:55 I also want to show you a little demo. This is from a colleague of mine, Alex, a wonderful, wonderful demo talking a little bit about computer use with coding. I'll give the video a second to play.
- 00:10:05 I'm Alex, I lead developer relations at Anthropic, and today I'm going to be showing you a coding task with computer use. We're going to be showing Claude doing a website coding task by actually controlling my laptop. But before we start coding, we need an actual website for Claude to make changes to, so let's ask Claude to navigate to claude.ai within my Chrome browser and ask Claude within claude.ai to create a fun '90s-themed personal homepage for itself. Claude opens Chrome, searches for claude.ai, and then types in a prompt asking the other Claude to create a personal homepage for itself. Claude.ai returns some code, and that gets nicely rendered in an artifact on the right-hand side.
- 00:11:14 That looks great, but I want to make a few changes to the website locally on my own computer, so let's ask Claude to download the file and then open it up in VS Code. Claude clicks the "save to file" button, opens up VS Code, and then finds the file within my Downloads folder and opens it up. Perfect. Now that the file is up and running, let's ask Claude to start up a server so that we can actually view the file within our browser. Claude opens up the VS Code terminal and tries to start a server, but it hits an error: we don't actually have `python` installed on our machine. That's all right, because Claude realizes this by looking at the terminal output and tries again with `python3`, which we do have installed. That works, so now the server is up and running.
- 00:12:13 With the local server started, we can go manually take a look at the website within the browser, and it looks pretty good. But I notice that there's actually an error in the terminal output, and we also have this missing-file icon at the top here. Let's ask Claude to identify this error and then fix it within the file. Claude visually reads the terminal output and then opens up the find-and-replace tool in VS Code to find the line that's throwing the actual error. In this case we just ask Claude to get rid of the error entirely, so it deletes the whole line; then Claude saves the file and automatically reruns the website. Now that the error is gone, let's take a final look at our website, and we can see that the file icon has disappeared and the error is gone as well. Perfect. So that's coding with computer use and Claude. This took a few prompts for now, but we can imagine that in the future Claude will be able to do tasks like this end to end.
- 00:13:08 Awesome, thank you so much, Alex. I could watch that over and over again, but I'll have to talk to you for the next 40 or so minutes, so bear with me. What's really interesting about that is we saw screenshots: screenshots of a terminal, of a browser, of a text editor. We fed those screenshots to Claude, and Claude had the ability to analyze each screenshot and figure out the necessary command, whether that was inputting text or looking at a terminal and realizing there's an error. What you just looked at, that idea of a prompt, then a screenshot and an action, and a screenshot and an action, and a screenshot and an action, is that agentic workflow. Again, if you'd like to see it in action, come check it out at the booth; we've got that demo running, you can play around for yourself, put in your own prompts, and explore all the fun things you can do with computer use.
- 00:13:51 As we shift gears a little bit: the models that we have in our ecosystem, Claude 3.5 Sonnet, 3.5 Haiku, and the others as well, are all supported in the Bedrock environment. What we're really, really proud of is the ability to leverage some of the functionality that Bedrock has out of the box, from security to the embeddings and fine-tuning services that Amazon Bedrock offers, combined with our models, to produce really high-powered generative AI applications.
- 00:14:14 As we start thinking about some of the high-level ideas in the generative AI space, and some of the most essential building blocks for building successful applications, the first thing I want to start with is prompt engineering. I'm sure you're all very familiar with a prompt, and if not, I will gladly define it for you: a prompt is the information you pass into a large language model to get a response. When you're working with a large language model in a more conversational way, something like Claude.ai or just a chatbot, you have a little more luxury of going back and forth: "Oh, you didn't really understand what I meant, so here's what I want," or "No, actually, here's what I'm looking for, give me this kind of thing." But when you're dealing with prompts in more of an enterprise or API context, you really have just one opportunity to create a very high-quality prompt, consisting of your context, your data, your conversation history, examples, and so on.
- 00:15:01 Many times that leads to creating prompts that look like this, and it's relatively scary. Fortunately it's very colorful, so at least you have that part, but if you look at this, it's pretty intimidating to figure out all the parts and what goes where, and you might be looking at this, furiously taking a picture, and trying to jot it down. What's really challenging about getting these prompts ready for production is actually going from zero to one: starting with this idea you have for an application, "I'm going to build a classifier, I'm going to build a summarizer," and then turning it into this, leveraging the best practices we have around where the task content goes, where the dynamic data goes, where the pre-filled response goes, and so on.
- 00:15:33 To solve that problem, we have a really lovely tool that we call the prompt generator. We're firm believers that as prompt engineering grows and evolves, there's going to be a really strong combination of manual and programmatic work. Initially this was all done manually, with lots and lots of iteration. We don't believe this is something that is going to be 100% programmatic, because there's really a bit of an art and a science to it, but if there's a situation where you need to go from zero to one and generate a prompt, come take a look at this particular tool that we have at console.anthropic.com. I also recommend you check out the demos that we have for the Workbench to get up to speed on the prompt generator as well. Again, this is not going to solve all of your problems: the prompt generator is a way for you to put in the task at hand and have a prompt automatically generated with best practices in mind, but just like with any software that is generated for you, there are still things you'll have to go in and tweak and edit to fit your particular use case.
- 00:16:24 Something I do recommend you really think about when doing prompt engineering is to leverage some of the best practices you may have from software. For those of you familiar with software engineering, you're probably familiar with the idea of version control: keeping track of the changes you've made to files, edits, deletions, modifications, new file creations. Do the same with your prompts. As you write new prompts and iterate on prompts, make sure you're keeping track of the previous ones you have, so that you can iterate on them appropriately instead of just redoing and redoing and redoing again. We've got a lot of really good material in our docs around this, and I recommend you all take a look at the prompt generator to really get up and running with high-quality prompts.
- 00:16:59quality prompts in general some best
- 00:17:01practices that we have around prompt
- 00:17:02engineering I'm going to show you some
- 00:17:03highle ones but the big three that I
- 00:17:05really want to focus on again the first
- 00:17:08one's going to seem a bit simplistic but
- 00:17:09you'd be shocked how many times this
- 00:17:10goes ay be clear and direct at the end
- 00:17:13of the day all that these models are are
- 00:17:16tools for predicting the next token the
- 00:17:18next word the next series of text so the
- 00:17:21more that you can give to the model to
- 00:17:23give it some context to have it pay
- 00:17:25attention to what it's seen before to
- 00:17:26figure out what's next the better a
- 00:17:28response you're going to get I really
- 00:17:30like to draw the parallel to talking to
- 00:17:31a human as well I want you to think of
- 00:17:33the llm that you're working with as
- 00:17:35someone who has a very very very very
- 00:17:37large broad range of knowledge but
- 00:17:39absolutely no idea what the task is that
- 00:17:40you want to do it's up to you to tell
- 00:17:43the large language model what the task
- 00:17:44is at hand and explain it succinctly
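That "knowledgeable colleague with no context" framing can be sketched in code: a small helper that assembles a prompt from an explicit role, any dynamic documents, and the task instructions. The helper, tag names, and sample text below are all illustrative, not an official template.

```python
# Sketch: assemble a clear, direct prompt from explicit parts --
# a role up front, dynamic data near the top, then detailed task
# instructions. All names and text here are illustrative.
def build_prompt(role: str, documents: list[str], instructions: str) -> str:
    doc_block = "\n".join(f"<document>{doc}</document>" for doc in documents)
    return (
        f"{role}\n\n"
        f"<documents>\n{doc_block}\n</documents>\n\n"
        f"<instructions>\n{instructions}\n</instructions>"
    )

prompt = build_prompt(
    role="You are a support assistant for an internal HR portal.",
    documents=["Benefits enrollment closes on Nov 15."],
    instructions="Answer only from the documents above and cite which one you used.",
)
```

The point is only the ordering and explicitness; real prompts will carry much richer instructions.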
- 00:17:47what does that mean start with a couple
- 00:17:48high-level sentences of the role and the
- 00:17:50high-level task description we recommend if
- 00:17:53there's Dynamic data coming in whether
- 00:17:54that's from something like retrieval augmented
- 00:17:56generation or any variables coming in
- 00:17:58we'll talk about that rag in a little
- 00:17:59bit put that content at the top make
- 00:18:01sure you've got some detailed task
- 00:18:03instructions and then we'll talk a
- 00:18:04little bit about if your particular use
- 00:18:07case requires a little bit more in-depth
- 00:18:09explanation examples are incredibly
- 00:18:11powerful so be clear and direct provide
- 00:18:13examples and then you want to be really
- 00:18:15intentional about the process that you
- 00:18:17want the model to think through I'm sure
- 00:18:19many of you have heard this idea of
- 00:18:20Chain of Thought or thinking step by
- 00:18:21step thinking step by step is a great start
- 00:18:24but just like if I told you I want you
- 00:18:25to think step by step about what's going
- 00:18:27on here you might say all right I'll
- 00:18:28take a few extra seconds and think about it
- 00:18:30but what's even more powerful than the
- 00:18:31think step by step is actually telling
- 00:18:33Claude how to do that thinking if you
- 00:18:35were to explain this to someone new at
- 00:18:37your company even someone senior or an
- 00:18:39intern how would you go ahead and think
- 00:18:40about performing that task really try to
- 00:18:42draw that parallel when thinking about
- 00:18:44prompt
- 00:18:45engineering something you're going to
- 00:18:46see quite a bit is this idea of XML tags
- 00:18:49if you are familiar with HTML it's a
- 00:18:50very similar kind of idea you have an
- 00:18:51open tag and a closed tag with XML you
- 00:18:53can pick the name of that tag the name
- 00:18:56of the tag does not really have any
- 00:18:58significance but you want to be
- 00:18:59intentional about the semantic meaning
- 00:19:01for what that is the purpose of using
- 00:19:03XML tags is to help create organization
- 00:19:06when you have a very very large prompt
- 00:19:07if you have a relatively short prompt
- 00:19:09you don't need to move the needle too
- 00:19:10much with XML tags but as your prompts
- 00:19:12get longer and longer and longer the
- 00:19:14same way that we as humans like
- 00:19:15indentation and whitespace and so on
- 00:19:18Claude likes XML tags you can use other
- 00:19:20delimiters other formats and so on but
- 00:19:22we prefer XML because it's clear and
- 00:19:24token
- 00:19:25efficient we talked a little bit about
- 00:19:27examples when you think about
- 00:19:29providing your examples these are
- 00:19:31essentially one of the most powerful
- 00:19:32tools you can provide because Claude is
- 00:19:35very very good at pattern matching
- 00:19:36especially if there's a particular
- 00:19:37format that you want Claude to adhere to
- 00:19:39you really want to give Claude
- 00:19:40essentially as much information as
- 00:19:42possible that it can figure out what to
- 00:19:44do with its output in general just like
- 00:19:46with humans it's much easier to just
- 00:19:48tell me how to do it and show me what it
- 00:19:50looks like as opposed to long-winded
- 00:19:52explanations of all the things at hand
- 00:19:54so providing examples but being
- 00:19:55intentional with your examples the
- 00:19:57relevance the diversity the quantity you
- 00:19:59really want to make sure that's
- 00:19:59something you have in all of your
- 00:20:01prompts something that's a bit unique
- 00:20:03about Claude something that other large
- 00:20:04language model providers don't
- 00:20:05necessarily offer is the idea of
- 00:20:07pre-filling Cloud's response if you're
- 00:20:10familiar with communicating with apis
- 00:20:12with users and assistants the basic flow
- 00:20:14is that you have a user message that
- 00:20:16could be something that the user types
- 00:20:17that could be a prompt that you have
- 00:20:19programmatically and the assistant is
- 00:20:21what you are getting back from Claude that
- 00:20:22is the response you are getting back
- 00:20:24from the large language model what you
- 00:20:26can do by pre-filling the response is
- 00:20:28essentially put some words into claude's
- 00:20:30mouth and what that basically means is
- 00:20:32every single response that Claude gives
- 00:20:34can start with something that you as the
- 00:20:37human have
- 00:20:38dictated why might that be useful if
- 00:20:40there's a situation where you want to
- 00:20:42steer Claude's behavior a bit there might
- 00:20:43be a situation where Claude again like
- 00:20:45Maggie mentioned we are a safety first
- 00:20:47company that is a priority to what we do
- 00:20:49and there may be situations where your
- 00:20:50use case is actually getting blocked by
- 00:20:53some of the safety considerations that
- 00:20:54we have Again by using pre-filling the
- 00:20:57response you are not going to jailbreak
- 00:20:58everything by any means but you can put
- 00:21:00some words into Claude's mouth to try to
- 00:21:01give it some context for what it is that
- 00:21:03you're trying to do you can be
- 00:21:05intentional with that to make sure that
- 00:21:06you have a little bit more control over
- 00:21:08the behavior and the formatting so
- 00:21:09pre-filling the response is a very
- 00:21:11common way that you can put some words
- 00:21:12into Claude's mouth and that way Claude can
- 00:21:14essentially just pick up where you as
- 00:21:16the human left off it's really an
- 00:21:18underrated aspect to some of the
- 00:21:19prompting that we can do so we've got a
- 00:21:22good sense of prompt engineering we got
- 00:21:24a good sense of the data that we passed
- 00:21:25to our model to elicit a response we
- 00:21:27talked about being clear and direct we
- 00:21:29talked about some of the most important
- 00:21:30ideas with prompt engineering I want to
- 00:21:32shift gears a little bit to Tool use
- 00:21:33just raise your hand if you're familiar
- 00:21:34with the idea of tool
- 00:21:36use right excellent you can go and come
- 00:21:38back in five minutes so tool use the
- 00:21:40idea here is simply to extend Claude's
- 00:21:42functionality the classic example for
- 00:21:44Tool use is a situation where I might
- 00:21:46ask you something like hey Claude what
- 00:21:47is the weather right now in Las Vegas
- 00:21:50the response that Claude is going to
- 00:21:51give me is most certainly going to be
- 00:21:52something like I'm sorry I don't have
- 00:21:54that information right now thank you or
- 00:21:58maybe I can tell the weather you know in
- 00:22:00August or April of last year or so on
- 00:22:02probably not going to get that but you
- 00:22:03can imagine the information that I want
- 00:22:05at this exact moment is not something
- 00:22:06that Claude has out of the box so tool
- 00:22:09use allows us to instead give our
- 00:22:12application a little bit more awareness
- 00:22:15of other things that we might want to do
- 00:22:17we're not going to give Claude the
- 00:22:18ability to go and find the weather and
- 00:22:20return it to us that is not the purpose
- 00:22:21of tool use the purpose of tool use is
- 00:22:23to Simply extend claude's capability so
- 00:22:27that instead of saying something like I
- 00:22:28don't know what the weather is Claude is
- 00:22:30actually going to say something like hey
- 00:21:32looks like you're trying to find the
- 00:22:33weather looks like your location is Las
- 00:22:36Vegas and since you're based in the US
- 00:22:38I'm going to assume you want that in
- 00:22:40Fahrenheit it's then up to the developer
- 00:22:42of the application to take that response
- 00:22:45and do what they want with it so I'll
- 00:22:47give you an example here you might have
- 00:21:48a prompt what was the final score of an
- 00:22:49SF Giants game on October 28th 2024 this
- 00:22:52is something out of the box that Claude
- 00:22:53does not know this is past our training
- 00:22:54cutoff date so what is Claude normally
- 00:22:56going to say I don't know but instead
- 00:22:59if we give Claude a list of tools and
- 00:23:00I'm going to show you what a tool looks
- 00:23:01like again it's actually not that
- 00:23:03difficult to do and there are many tools
- 00:23:04that can even help you generate and
- 00:23:05validate the tools that you make so with
- 00:23:07this particular list of tools every tool
- 00:23:09has a name every tool has a description
- 00:23:12you might wonder why do you need a name
- 00:23:13why do you need a
- 00:23:15description because when a prompt comes
- 00:23:16in asking about something like a final
- 00:23:18score of a game if we have a tool that
- 00:23:21is called get score that's related to
- 00:23:23some kind of baseball game Claude is
- 00:23:24going to be able to infer oh that's the
- 00:23:26one that you probably want to use so
- 00:23:28when you start making use of tool use
- 00:23:29you want to have a really good name and
- 00:23:30a really good description because that's
- 00:23:32what Claude is going to use to infer
- 00:23:33what action to take again Claude is not
- 00:23:36going to go to some sports application
- 00:23:39or to a database and get you the score
- 00:23:41all that Claude is going to do is return
- 00:23:42to the developer of the application the
- 00:23:45particular tool and the inputs and then
- 00:23:47Claude just kind of walks away like I've
- 00:23:48done my work another example here you
- 00:23:51might have a situation where you're
- 00:23:52building a chatbot and in this
- 00:23:54particular chatbot you want to have the
- 00:23:55ability for a user to look up something
- 00:23:57in the inventory
- 00:23:59well if you ask Claude for example you
- 00:24:00know how many units of this particular
- 00:24:02SKU or item or so do we have Claude's
- 00:24:04going to say I have no idea but if you
- 00:24:05give it a tool like inventory lookup
- 00:24:08Claude will respond and say something
- 00:24:09like oh it looks like this person is
- 00:24:10trying to find this item and they want
- 00:24:12to know the particular quantity again
- 00:24:14it's up to you as the developer to go to
- 00:24:16a database go to an API go to a service
- 00:24:18do whatever it is that you want why do I
- 00:24:20really want to hammer home this idea of
- 00:24:22tool use because computer use is
- 00:24:24actually just an extension of tool use
- 00:24:27computer use is simply just a bunch of
- 00:24:29tools that we at anthropic have defined
- 00:24:32for you to use those tools include
- 00:24:35things like taking a look at a
- 00:24:37screenshot and figuring out the
- 00:24:38necessary command taking a look at a
- 00:24:40terminal and figuring out where things
- 00:24:42are and what commands might need to be
- 00:24:44input and executed taking a look at some
- 00:24:47text and figuring out where text needs
- 00:24:49to be written or copied or modified or
- 00:24:50so on so if you have an understanding of
- 00:24:53tool use understanding computer use is
- 00:24:55actually just a very very small jump
- 00:24:57conceptually
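Concretely, the computer-use beta exposes exactly that kind of tool list: Anthropic-defined tool types you pass like any other tools. The shapes below follow the October 2024 beta documentation and may change, so treat this as a sketch and check the current docs before relying on it.

```python
# The computer-use beta (anthropic-beta: computer-use-2024-10-22) is
# "just tools": Anthropic-defined tool types passed in the tools list.
# Exact type strings follow the October 2024 beta docs; verify against
# current documentation.
computer_use_tools = [
    {
        "type": "computer_20241022",      # screenshots, mouse, keyboard
        "name": "computer",
        "display_width_px": 1024,
        "display_height_px": 768,
    },
    {"type": "bash_20241022", "name": "bash"},  # terminal commands
    {"type": "text_editor_20241022",            # viewing and editing text
     "name": "str_replace_editor"},
]
```

Note there is no single "do all the computer stuff" tool; each entry has one narrow concern.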
- 00:24:58one more visualization for this idea
- 00:25:00again I'll walk you through it you might
- 00:25:02have a situation where you ask how many
- 00:25:03shares of GM can I buy with $500 well
- 00:25:05Claude is going to say I mean I can tell
- 00:25:06you GM maybe a year ago but I don't have
- 00:25:08it right now instead if we give our
- 00:25:10application a tool Claude can then say
- 00:25:12something like oh looks like you're
- 00:25:14trying to use that get stock price tool
- 00:25:16and you've passed in General Motors as
- 00:25:18the argument great our application can
- 00:25:20now go to an API go to some external
- 00:25:22data source fetch that price figure out
- 00:25:25what's necessary and then Claude can
- 00:25:26take it the rest of the way
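That get_stock_price round trip might look like the sketch below: a tool definition plus the developer-side dispatch that runs when Claude hands back a tool name and input. The price lookup is a stub standing in for a real market-data API, and the tool name is the talk's example, not a built-in.

```python
# Sketch of the round trip: a tool definition (name, description, input
# schema) plus the developer-side dispatch. The price source is a stub.
get_stock_price_tool = {
    "name": "get_stock_price",
    "description": "Get the current share price for a stock ticker symbol.",
    "input_schema": {
        "type": "object",
        "properties": {"ticker": {"type": "string"}},
        "required": ["ticker"],
    },
}

def get_stock_price(ticker: str) -> float:
    return {"GM": 54.50}[ticker]   # stand-in for a real market-data API

HANDLERS = {"get_stock_price": get_stock_price}

# Simulated tool_use content from Claude for
# "how many shares of GM can I buy with $500?"
tool_call = {"name": "get_stock_price", "input": {"ticker": "GM"}}
price = HANDLERS[tool_call["name"]](**tool_call["input"])
shares = 500 / price
```

Claude picks the tool and fills in the input; fetching the price and finishing the math stays in your application code.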
- 00:25:29tool use is something that you can do
- 00:25:31with quite high accuracy with a
- 00:25:33tremendous number of tools in fact this
- 00:25:34is actually much closer to 95-plus percent and
- 00:25:36hundreds of tools is something that is
- 00:25:38no problem here you just want to be
- 00:25:40mindful that you have accurate and
- 00:25:41reasonable tool definitions just to show
- 00:25:43you what a tool looks like again it
- 00:25:45shouldn't be too scary looking because
- 00:25:47this is really all it is you give it a
- 00:25:49name you give it a description for those
- 00:25:51of you familiar with Json schema whether
- 00:25:52it's through typing or API validation or
- 00:25:54so on you can see here in this input
- 00:25:56schema we basically just put in any
- 00:25:59Properties or parameters or arguments
- 00:26:01that that tool should have so the name
- 00:26:03of the tool get weather the description
- 00:26:05get the current weather and then we
- 00:26:07simply just give it some properties of
- 00:26:08things you should look for when using
- 00:26:10that tool so if a prompt comes in and
- 00:26:12someone says what is the weather right
- 00:26:14now Claude's going to basically say
- 00:26:16something like oh it looks like you're
- 00:26:17trying to use the get weather tool but
- 00:26:19the location is required so could you
- 00:26:21tell me what location you are in then
- 00:26:23the user might say something like oh
- 00:26:24sorry I'm in Vegas great Claude has that
- 00:26:27information and then says excellent I
- 00:26:29know the location I know the name of
- 00:26:30this tool okay application here is the
- 00:26:34tool you're trying to use here is the
- 00:26:35input go and do what you want with it so
- 00:26:38if you can kind of build that flow that
- 00:26:39kind of mental model for how tool use
- 00:26:42Works how tools are defined how we work
- 00:26:45with those getting up to the
- 00:26:47understanding of computer use and agents
- 00:26:48and so on it's not too much of a leap
- 00:26:50but you really want to make sure at the
- 00:26:51end of the day you understand what a
- 00:26:53large language model is which hopefully
- 00:26:54we've got that Foundation you want to
- 00:26:56make sure you understand tool use this
- 00:26:58extension passing in these object-like
- 00:27:02ideas to Claude to basically interpret
- 00:27:04analyze and then execute a particular
- 00:27:07command we talked a little bit about
- 00:27:09some just good practice for Tool use
- 00:27:11again a simple and accurate tool name
- 00:27:12being clear and direct same exact idea
- 00:27:14from prompt engineering coming right
- 00:27:16back to Tool use being mindful of the
- 00:27:18description what the tool returns how
- 00:27:20the tool is used examples are actually
- 00:27:23less important than having a clear and
- 00:27:25comprehensive explanation so you might
- 00:27:26be tempted to show all these examples
- 00:27:28again with many ideas in prompt
- 00:27:30engineering start with something
- 00:27:31relatively simple just see what
- 00:27:33you get see how it works and then
- 00:27:35iterate from there again if you want to
- 00:27:37take a look at all of these ideas we're
- 00:27:38talking about rag we talked about tool
- 00:27:40use computer use we talked a little bit
- 00:27:41about the prompt generator and so on
- 00:27:43we've got tools for improving prompts
- 00:27:44come take a look at those in the booth
- 00:27:46we've got lots of really fun uh quick
- 00:27:47starts and ways for you to get
- 00:27:48started with
- 00:27:50that in general you might think of a
- 00:27:53large application with hundreds and
- 00:27:54hundreds of tools to make sure this is
- 00:27:57working properly to make sure you're
- 00:27:58being really intentional about how this
- 00:28:00is all done you want to think a lot
- 00:28:02about each tool having one particular
- 00:28:04meaning and one particular concern so if
- 00:28:06you're familiar in software engineering
- 00:28:07of the single responsibility principle
- 00:28:09or such you want to kind of follow the
- 00:28:11same idea you really don't want to have
- 00:28:13a tool that tries to do 10 things all at
- 00:28:14once that's why even with computer use
- 00:28:16we don't have one tool called do all the
- 00:28:18computer stuff we have things like a
- 00:28:20computer tool and a bash tool and a
- 00:28:22string edit tool and so on so again you
- 00:28:24want to have one tool that just does one
- 00:28:26thing well
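One way to keep yourself honest about these practices is a small lint pass over your tool definitions. The checks and thresholds below are our own illustration of the advice, not an official rule set, and the sample tools are hypothetical.

```python
# Sketch: a lightweight lint pass enforcing the practices above --
# a simple accurate name, a real description, and an object input
# schema, with each tool scoped to one concern. Rules are our own.
def lint_tool(tool: dict) -> list[str]:
    problems = []
    if not tool.get("name", "").replace("_", "").isalnum():
        problems.append("name should be a simple identifier")
    if len(tool.get("description", "")) < 20:
        problems.append("description is too short to steer tool choice")
    if tool.get("input_schema", {}).get("type") != "object":
        problems.append("input_schema should be a JSON Schema object")
    return problems

ok_tool = {
    "name": "inventory_lookup",
    "description": "Look up the quantity in stock for a given item SKU.",
    "input_schema": {"type": "object", "properties": {"sku": {"type": "string"}}},
}
bad_tool = {"name": "do everything!", "description": "stuff"}
```

Running `lint_tool` over a large tool list is a cheap way to catch vague names and throwaway descriptions before Claude has to guess around them.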
- 00:28:30as we shift gears we talked a little bit
- 00:28:31about tool use it's important that we
- 00:28:33understand tool use actually before we
- 00:28:34talk about rag because it is possible
- 00:28:36that you might want to use a tool to
- 00:28:38figure out whether you should do rag or
- 00:28:40not quick show of hands how many of you
- 00:28:42are familiar with the idea of rag or the
- 00:28:43acronym or so on or if I were to call on
- 00:28:45you you could give me a definition all
- 00:28:47those hands went down very quickly but
- 00:28:49that's okay um that's all right
- 00:28:50I'll do that for you so talk about rag
- 00:28:53architecture and tips here rag or
- 00:28:54retrieval augmented generation the
- 00:28:56really cool thing about rag is it's a
- 00:28:58short and fun-sounding acronym the even
- 00:29:00more exciting thing is 99% of the work
- 00:29:02is just that letter r the A and the g
- 00:29:04are just a nice way to make it sound
- 00:29:05kind of cool so retrieval is really
- 00:29:07where all the hard stuff is going to
- 00:29:08happen so we'll talk about what this
- 00:29:10idea is we'll walk through it step by
- 00:29:12step rag is the idea of searching for
- 00:29:14retrieving and adding context what does
- 00:29:16that mean you might have data that is
- 00:29:18proprietary you might have data internal
- 00:29:20to your company you might have data that
- 00:29:22is past the training cutoff date
- 00:29:24information that Claude is not aware
- 00:29:25about you also might have lots and lots
- 00:29:27and lots of documents
- 00:29:29that you can't just stuff into one
- 00:29:30prompt in the context window because the
- 00:29:33context window is not big enough so how
- 00:29:35do we augment language models with
- 00:29:37external knowledge well we take that
- 00:29:39external knowledge and we put it
- 00:29:41somewhere else we then go ahead and
- 00:29:44retrieve that external
- 00:29:46knowledge in order to set up more of a
- 00:29:48professional grade rag Pipeline and so
- 00:29:51on there's a little bit of work that
- 00:29:52needs to happen this idea of
- 00:29:55pre-processing and working and chunking
- 00:29:56with your data is how this all kicks off
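A minimal version of that chunking step might look like this: fixed-size character chunks with a little overlap. Real pipelines use smarter, semantically aware splitting, and the sizes here are arbitrary.

```python
# Toy chunking sketch: split a long document into overlapping
# fixed-size chunks before embedding. Sizes are arbitrary; production
# chunkers split on semantic boundaries (sections, paragraphs).
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

doc = "x" * 500   # stand-in for a 500-character onboarding document
chunks = chunk_text(doc)
```

Each chunk is what later gets embedded and stored.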
- 00:29:59so I want you to imagine a situation of
- 00:30:00you have all of your company's internal
- 00:30:02documents for onboarding new hires and
- 00:30:05so on you can imagine this might be you
- 00:30:07know 50 100 200 500 pages all kinds of
- 00:30:10things of benefits and insurance plans
- 00:30:11and all kinds of internal information
- 00:30:13that Claude is not aware of again we
- 00:30:16can't take all that stuff and stuff it
- 00:30:17into a prompt and ask it questions the
- 00:30:19context window is not big enough so what
- 00:30:21do we do we take that data and we break
- 00:30:24up all of that data into smaller chunks
- 00:30:27this is a process very commonly called
- 00:30:28chunking you take your data whether it's
- 00:30:30text whether it's image whether it's
- 00:30:32video whether it's audio however it may
- 00:30:33be and we take that data and we turn it
- 00:30:35into what's called an embedding an
- 00:30:37embedding is just a really really fancy
- 00:31:39way of saying a long list of
- 00:31:42floating-point numbers the
- 00:30:44purpose of this is at the end of the day
- 00:30:46we have to take this text and get down
- 00:30:48to numbers that's how models work as
- 00:30:49well we take these text get them down to
- 00:30:51numbers make our way up to a bunch of
- 00:30:53probabilities and then figure out the
- 00:30:55next
- 00:30:56token the reason why we use these
- 00:30:58embeddings is because embeddings allow
- 00:31:01us to perform semantic or similar
- 00:31:04searches why do we want to do something
- 00:31:06like that well let's imagine you go
- 00:31:08ahead and you take some internal
- 00:31:09anthropic documents and they're about me
- 00:31:12you might ask a question like who is
- 00:31:13Ellie shopic you might ask a question
- 00:31:15like who is that Ellie person whose last
- 00:31:17name I can't pronounce or who is that
- 00:31:19Ellie person who does not stop talking
- 00:31:20about tool use and rag they're all
- 00:31:22actually the same question but they're
- 00:31:23all somewhat similar which means that
- 00:31:25when we go ahead and we try to retrieve
- 00:31:27documents related to whatever that may
- 00:31:29be we can't just use an exact search so
- 00:31:32what we do is we take our text we break
- 00:31:34it down into embeddings in order to do
- 00:31:37that we make use of an embedding model
- 00:31:39which is actually a different kind of
- 00:31:40model that instead of trying to Output
- 00:31:42the next token or so on it actually
- 00:31:43outputs a bunch of embeddings those
- 00:31:45embeddings those numerical
- 00:31:47representations refer to a particular
- 00:31:49piece of meaning in a certain
- 00:31:51dimensional space so you might have 500
- 00:31:54300 2,000 5,000 Dimensions with each
- 00:31:57particular vector representing a
- 00:31:59particular part of a semantic meaning of
- 00:32:01some text we take all that text and we
- 00:32:04store it externally very commonly that
- 00:32:06is done in a vector store so you might
- 00:32:08have heard of vector databases or so on
- 00:32:10these are essentially data stores for
- 00:32:11all of our
- 00:32:13embeddings before we can do any of the
- 00:32:15retrieval and so on we got to do that
- 00:32:16work very commonly this is part of your
- 00:32:18rag pipeline or your pre-processing part
- 00:32:21of data this is where there's actually a
- 00:32:22decent amount of engineering and
- 00:32:24trickery that happens how do I make sure
- 00:32:26I'm using the right embedding model for my
- 00:32:27use case how do I make sure I'm chunking
- 00:32:29appropriately and so on that's a lot of
- 00:32:30work fortunately there are tools
- 00:32:32especially in the Amazon Bedrock
- 00:32:34ecosystem like Amazon Bedrock knowledge
- 00:32:36bases which are wonderful tools for
- 00:32:38helping you with a lot of that work so
- 00:32:40you could do that engineering Yourself
- 00:32:41by all means but there are also many
- 00:32:43kind of infrastructures of service
- 00:32:44platforms to help you with
- 00:32:46that so before we do any of that
- 00:32:48retrieval we got to do that work that
- 00:32:50chunking that pre-processing getting
- 00:32:51that stuff in a vector database once
- 00:32:53we've got it all in a vector database
- 00:32:54and we feel great about our embeddings
- 00:32:57now we can start going ahead and
- 00:32:58retrieving
- 00:33:00information and here is how that
- 00:33:02traditionally works we might have a
- 00:33:03situation where a user asks a question I
- 00:33:05want to get my daughter more interested
- 00:33:06in science what kind of gifts should I
- 00:33:08get her we then take that query we then
- 00:33:11take that information and we embed it we
- 00:33:13turn it into an embedding we turn it
- 00:33:14into a list of a bunch of floating-point
- 00:33:16numbers and then we go ahead and we take
- 00:33:18that embedding and we go back to our
- 00:33:20Vector database and we say what similar
- 00:33:22results do I have what similar search
- 00:33:24results do I have for that particular
- 00:33:26thing that is the retrieval part once we
- 00:33:29have those results we then augment our
- 00:33:32prompt a little bit with some Dynamic
- 00:33:34information and generate a response so
- 00:33:37that's the A and the g in rag that's the
- 00:33:39quick and easy part but that
- 00:33:40r that retrieval that process of making
- 00:33:42sure that the data somewhere is living
- 00:33:45in some store where I can search by
- 00:33:46similar meanings or hybrid searches or
- 00:33:48so on and I can get a result that's
- 00:33:50where it gets a little bit more
- 00:33:52interesting to give you a little bit of
- 00:33:54a visualization of this rag architecture
- 00:33:55and again if you want to see this in
- 00:33:56action come take a look at our demo booth
- 00:33:58we got some really cool quick starts of
- 00:33:59actually rag in application using tools
- 00:34:01like knowledge bases we're going to take
- 00:34:03that question we're going to embed it
- 00:34:05we're going to turn it into that long
- 00:34:07list of floating-point numbers we're
- 00:34:09then going to go ahead and execute some
- 00:34:10kind of similarity search that
- 00:34:12similarity search can be a variation of
- 00:34:14a couple algorithms you might be
- 00:34:15familiar with Manhattan distance or
- 00:34:17Euclidean distance or cosine similarity or
- 00:34:19dot product it's really just trying to
- 00:34:21find two similar points in some kind of
- 00:34:23dimensional space once we find those
- 00:34:26similar results we can go ahead and take
- 00:34:29those results that we got back and
- 00:34:30augment our prompt and generate a
- 00:34:32completion so again that R in retrieval
- 00:34:35is really where the tricky part
- 00:34:37happens the reason why I brought up tool
- 00:34:39use earlier is because you could imagine
- 00:34:42a situation where you ask Claude
- 00:34:44something like who was the president of
- 00:34:46the United States in
- 00:34:481776 well Claude doesn't need to do
- 00:34:50something like ah let me go ahead and
- 00:34:51embed that and go to the vector database
- 00:34:53and so on that's something that Claude
- 00:34:54should hopefully know and I feel pretty
- 00:34:55good about that one so Claude's just going
- 00:34:57to give you a
- 00:34:58response so instead of jumping to
- 00:35:00something like rag maybe you just want
- 00:35:02to use Claude's knowledge out of the box
- 00:35:05this is where tool use can be very
- 00:35:06helpful you might have a situation where
- 00:35:08a query comes in and Claude basically
- 00:35:11tries to figure out if it knows the
- 00:35:12answer or not and you can even through
- 00:35:15prompt engineering be very intentional
- 00:35:16about what it means to know or not know
- 00:35:18and if Claude says something like I
- 00:35:20don't know well then go ahead let's go
- 00:35:22use that tool and we probably got to go
- 00:35:23find something or hey if a question
- 00:35:25comes in and it's past your knowledge
- 00:35:26cutoff date or if a question comes in
- 00:35:28about some document that you do not
- 00:35:30understand go ahead and use that tool
- 00:35:32then we'll go do all the retrieval part
- 00:35:34of things because you can imagine taking
- 00:35:36your data embedding it turning into a
- 00:35:38list and so on it's going to be a little
- 00:35:39bit time-consuming it's going to
- 00:35:40require some engineering and so on so if
- 00:35:42you don't have to do that that would be
- 00:35:44ideal what we also see a lot with
- 00:35:46production grade rag pipelines is it's
- 00:35:48not as simple as just having one vector
- 00:35:50data store for one particular set of
- 00:35:52data you can imagine a very very large
- 00:35:54scale system is doing quite a bit of
- 00:35:56embeddings over quite a few different
- 00:35:57kinds of databases and data stores so
- 00:36:00depending on your particular use case
- 00:36:02you might not just stuff everything in
- 00:36:04one vector data store you might actually
- 00:36:05have multiple ones with multiple tools
- 00:36:07trying to figure out how to interact
- 00:36:09with your application and generate the
- 00:36:11correct
- 00:36:13completion another really powerful tool
- 00:36:15that I want to point you towards and
- 00:36:16we'll talk a little bit more about this
- 00:36:16with the idea of contextual retrieval is
- 00:36:18the idea of actually being able to use a
- 00:36:20model to rewrite your query the classic
- 00:36:23example here is you might have some kind
- 00:36:24of customer support situation where I
- 00:36:27say something like my username is Ellie
- 00:36:29and I just want to let you know I am
- 00:36:30pissed this service is terrible I am
- 00:36:32very unhappy I've been waiting for 3
- 00:36:33hours to talk to a human and all I get
- 00:36:35is this chatbot I am feeling miserable
- 00:36:37at the end of the day what we really
- 00:36:39need to do is just look up Ellie so can
- 00:36:41we instead of taking all that data and
- 00:36:43all that you know frustration that Ellie
- 00:36:44has maybe we'll handle that in a
- 00:36:46different way instead of taking that and
- 00:36:47embedding it and trying to find similar
- 00:36:49searches to a frustrated customer can we
- 00:36:51instead rewrite the query just to focus
- 00:36:53on the information that we need this is
- 00:36:56very commonly done by kind of throwing
- 00:36:58a smaller model that can do a little bit
- 00:36:59more of that classification or rewriting
- 00:37:02to essentially get you better results so
- 00:37:04with rag with these ideas of rag
- 00:37:06pipelines and so on there's so much
- 00:37:07complexity that you can start throwing
- 00:37:09on top but my goal here for this session
- 00:37:11is just to make sure you have at a high
- 00:37:12level an understanding of what it is how
- 00:37:14it works why you might want to use it
- 00:37:15for your particular use
- 00:37:17case when you think about that chunking
- 00:37:19process we talked a bit about breaking
- 00:37:21your data into smaller sections I want
- 00:37:23to point you towards this format that we
- 00:37:25recommend and again you can take a look
- 00:37:26at our documentation for more examples
- 00:37:27of this but what's really important when
- 00:37:29you break up your data is that you give
- 00:37:31it some kind of meta data so that we can
- 00:37:34search for it not only by the
- 00:37:35information but also retrieve some
- 00:37:37metadata so you can see up here we have
- 00:37:39this documents tag and inside we have a
- 00:37:41document subtag with an index of one
- 00:37:44those kind of attributes that we're
- 00:37:46adding here you can treat as metadata or
- 00:37:48higher level information about the
- 00:37:50particular document that could be very
- 00:37:51helpful when doing retrieval to get
- 00:37:53things like information aside from the
- 00:37:55source that I might want to know some
- 00:37:56unique identifier maybe the author of that
- 00:37:58particular piece or such in your prompt
- 00:38:01you can then refer to those by their
- 00:38:02indices or their metadata for that kind of
- 00:38:05filtering so you'll see ideas around
- 00:38:06metadata filtering as well to improve
- 00:38:08General retrieval
- 00:38:11performance what's really interesting
- 00:38:12about RAG I would say at this point is it
- 00:38:15is not going anywhere but it is
- 00:38:16constantly constantly constantly
- 00:38:18evolving the first thing that I want to
- 00:38:19talk about that is really kind of
- 00:38:20leading us to a slight change in the way
- 00:38:23that we think about rag is a tool that
- 00:38:24we have had in our first party API but
- 00:38:26that we actually just released with
- 00:38:28Amazon Bedrock that is the idea of
- 00:38:29prompt caching prompt caching is the
- 00:38:32idea of instead of taking some document
- 00:38:35and putting it in your prompt and then
- 00:38:37on each conversation we got to generate
- 00:38:39all the tokens again for that particular
- 00:38:41document that's going to get a little
- 00:38:43expensive that's going to be a bit time
- 00:38:44consuming can we instead explicitly say
- 00:38:47I'm going to give you these documents I
- 00:38:49want you to cache these documents they
- 00:38:51are not going to change these are static
- 00:38:52pieces of information and then on all
- 00:38:54subsequent prompts don't go ahead and
- 00:38:56regenerate all the tokens for that just
- 00:38:58find them in the cache if you're
- 00:39:00familiar with caching with any kind of
- 00:39:02architecture it's a very similar kind of
- 00:39:03idea find and store this information in
- 00:39:06a quick retrieval process because it is
- 00:39:08not going to change anytime
- 00:39:10soon why is this meaningful with rag
- 00:39:13because instead of taking documents and
- 00:39:15chunking them and embedding them and
- 00:39:17storing them somewhere you can actually
- 00:39:18take your documents and put them in the
- 00:39:20prompt itself and you can cache those
- 00:39:22documents to then retrieve information
- 00:39:25very very quickly without doing all the
- 00:39:26embedding and so on obviously context
- 00:39:29window size is going to come into play here
- 00:39:31so if you have a lot a lot a lot of
- 00:39:32documents you still can't do that but at
- 00:39:34the same time context windows are vastly
- 00:39:37improving in size compared to where we
- 00:39:39were 6 months ago a year ago two years
- 00:39:41ago we are nearing worlds where we will
- 00:39:42have millions tens of millions of tokens
- 00:39:44in context windows so when you think
- 00:39:46about combining prompt caching taking a
- 00:39:48tremendous amount of your documents
- 00:39:50caching them and then leveraging a very
- 00:39:52very large context window you can
- 00:39:54potentially avoid the need for having to
- 00:39:55do that chunking and so on so right now
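The caching idea above can be sketched as a request body; the field names follow Anthropic's prompt-caching documentation, but the model ID and document text are placeholders, and the exact Bedrock invocation shape may differ:

```python
# Sketch of a prompt-caching request body. Field names follow Anthropic's
# Messages API prompt-caching docs; model ID and document are placeholders.
LARGE_STATIC_DOC = "full text of the 10-K goes here"  # static content worth caching

request_body = {
    "model": "claude-3-5-sonnet",   # placeholder model ID
    "max_tokens": 1024,
    "system": [
        {"type": "text", "text": "Answer questions about the document below."},
        {
            "type": "text",
            "text": LARGE_STATIC_DOC,
            # Mark the static document as cacheable: subsequent calls sharing
            # this prefix read it from the cache instead of re-processing
            # all of its tokens.
            "cache_control": {"type": "ephemeral"},
        },
    ],
    "messages": [
        {"role": "user", "content": "How did operating income change?"}
    ],
}
```

Only the static prefix is cached; the user turn at the end changes on every request.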
- 00:39:58if your data is of a medium size you
- 00:39:59really got to jump to rag if your data
- 00:40:01is of a large size you got to jump to
- 00:40:02rag even a smaller size depending on
- 00:40:04what you're working with can we instead
- 00:40:06as we start to see prompt caching get more
- 00:40:08and more widespread as we start to get
- 00:40:09context windows that are even larger can
- 00:40:11we start to shift that a little bit
- 00:40:13towards not jumping for rag not needing
- 00:40:15to worry about these massive pipelines
- 00:40:17and infrastructure and instead just
- 00:40:18putting that in the context of the
- 00:40:20prompt what's interesting as well we
- 00:40:22talked about Vector databases talked
- 00:40:24about this idea of embeddings and
- 00:40:25embedding models but there's always a
- 00:40:27lot of research there's always a lot of
- 00:40:29questions for is this the best approach
- 00:40:30is this really what we want to do
- 00:40:32there's a lot of really interesting
- 00:40:33emerging research for using different
- 00:40:34data structures for storing the kind of
- 00:40:37retrieval that you want so instead of
- 00:40:39using a vector database a list of large
- 00:40:41floating-point numbers can we instead
- 00:40:43use a different data structure like a
- 00:40:44graph can we instead think of our data
- 00:40:46as just a series of nodes interconnected
- 00:40:49by edges and when we try to retrieve our
- 00:40:51data find those similar nodes with more
- 00:40:53meaning as opposed to a general semantic
- 00:40:56search while this is not yet something
- 00:40:58that's very very large in production
- 00:40:59there's a lot of really interesting
- 00:41:00research around this idea of graph RAG
- 00:41:02and knowledge graphs that may lead us to
- 00:41:04potentially not have to use Vector
- 00:41:05stores and get better performance with
- 00:41:07retrieval we're going to talk a little bit
- 00:41:09soon about the idea of contextual
- 00:41:11retrieval this is really interesting
- 00:41:12research that we put out and again for
- 00:41:14those of you that have been to the booth
- 00:41:15you know that we have a section on
- 00:41:16Research as well so I welcome you to
- 00:41:17come take a look at that chat about the
- 00:41:18research if you're interested especially
- 00:41:20in things like interpretability how our
- 00:41:22models are behaving from the inside out
- 00:41:24trying to make sense of that contextual
- 00:41:26retrieval that I'll talk about we got a
- 00:41:27lot of really good folks to talk
- 00:41:28to about that kind of
- 00:41:30stuff as we mentioned embedding models
- 00:41:32are constantly changing there are a wide
- 00:41:34variety of embedding models for all
- 00:41:35different kinds of providers from open
- 00:41:37source to commercial providers for all
- 00:41:38different kinds of dimensions and
- 00:41:39pricing and so on these are always
- 00:41:41getting better it's also a very very
- 00:41:43large world in the reranking space I'll
- 00:41:45talk a little bit about reranking your
- 00:41:46results when you get them back we're
- 00:41:48also seeing a lot of improvement for
- 00:41:50measuring the effectiveness of rag
- 00:41:52through evaluations this can be done
- 00:41:54using platforms that are Enterprise
- 00:41:55grade this could also be done using
- 00:41:57open-source products like Promptfoo
- 00:41:59there's also an entire evaluation
- 00:42:00framework called RAGAS or RAG
- 00:42:02assessments that basically will analyze
- 00:42:04the relevance the accuracy really
- 00:42:06important metrics for whether you're
- 00:42:07doing a good job or not with your rag
- 00:42:10pipeline again I'll add a really
- 00:42:11important bullet point here because in a
- 00:42:13second we'll talk about fine-tuning
- 00:42:15other techniques like model distillation
- 00:42:16to try to teach the model new things and
- 00:42:18extract more knowledge and introduce
- 00:42:20Behavior change but before you jump to
- 00:42:21any of that even before you try to go
- 00:42:24crazy with your rag pipeline think a lot
- 00:42:26about prompting you can get a lot
- 00:42:28of wins with very minimal
- 00:42:29engineering and effort through prompting
- 00:42:31so even with Rag and other options there
- 00:42:33are always
- 00:42:34optimizations I mentioned a little bit
- 00:42:36about this idea of contextual retrieval
- 00:42:38so I want to point you towards this idea
- 00:42:40and research that we have here instead
- 00:42:42of Performing the traditional approach
- 00:42:44like we mentioned of taking your Corpus
- 00:42:45of data and breaking it up into chunks
- 00:42:48before you break it up into chunks what
- 00:42:50we're going to do is we're actually
- 00:42:52going to bring Claude back in the mix
- 00:42:54we're going to run a little bit of
- 00:42:56prompting on each of those chunks to
- 00:42:59provide some context we're going to give
- 00:43:00it 50 or 100 or so extra tokens which
- 00:43:03reference the context of that chunk
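That preprocessing step can be sketched as a prompt builder; the wording below paraphrases the published contextual-retrieval prompt rather than reproducing it exactly:

```python
# Minimal sketch of the contextual-retrieval preprocessing step. The prompt
# wording paraphrases Anthropic's published research, not copied verbatim.
CONTEXT_PROMPT = """<document>
{full_document}
</document>

Here is the chunk we want to situate within the whole document:
<chunk>
{chunk}
</chunk>

Please give a short (50-100 token) context that situates this chunk within
the overall document, to improve search retrieval of the chunk. Answer with
only the context."""

def contextualize(full_document: str, chunk: str) -> str:
    """Build the per-chunk prompt sent to Claude; the model's reply is
    prepended to the chunk before embedding and indexing it."""
    return CONTEXT_PROMPT.format(full_document=full_document, chunk=chunk)

prompt = contextualize("ACME 10-K for fiscal 2023 ...", "Revenue increased 37% ...")
```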
- 00:43:05what's a classic example here let's say
- 00:43:07we take a large 10-K of a very very
- 00:43:10large publicly traded company maybe
- 00:43:12let's go with Amazon that seems relevant
- 00:43:14we go ahead and we take our giant 10K
- 00:43:17and we go ahead and we chunk it and one
- 00:43:18of the sections in that chunk is
- 00:43:20something like Revenue increased 37%
- 00:43:22while cost decreased by 12% and
- 00:43:24operating income was up 63%
- 00:43:27seems like a reasonable thing we can
- 00:43:28chunk but if someone goes and searches
- 00:43:31for that particular chunk and they say
- 00:43:32how's the company doing from an operating
- 00:43:35perspective we don't know if that chunk
- 00:43:37is is that the last quarter is that a
- 00:43:38forecast for the next quarter was that
- 00:43:40from last year what does that refer to
- 00:43:42what is the context of that information
- 00:43:44within the scope of the document and the
- 00:43:46goal of contextual retrieval is simply
- 00:43:48just to add a little bit more context to
- 00:43:50each of those chunks so that when we
- 00:43:52retrieve we can get much more accurate
- 00:43:54results we don't just get a block of
- 00:43:55text we get a block of text with a
- 00:43:57little bit of context for what it refers
- 00:43:59to you can take a look in our research
- 00:44:01for what this prompt looks like how it's
- 00:44:03run the different kinds of searches that
- 00:44:05we have you can also see here that
- 00:44:07instead of just using an embedding model
- 00:44:09we're actually doing what's called a
- 00:44:10hybrid search so that we're performing
- 00:44:12the semantic search but we're also
- 00:44:13performing other popular kinds of
- 00:44:15searches a very common one called BM25
- 00:44:17or Best Match 25 which is essentially that
- 00:44:20TF-IDF similarity search
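A toy version of that hybrid search, with a from-scratch BM25 scorer and stand-in semantic scores so the example stays self-contained (a real system would get the semantic scores from an embedding model):

```python
import math
from collections import Counter

K1, B = 1.5, 0.75  # standard BM25 parameters

def bm25_scores(query: str, docs: list[str]) -> list[float]:
    """Score each document against the query with the BM25 formula."""
    tokenized = [d.lower().split() for d in docs]
    avg_len = sum(len(t) for t in tokenized) / len(tokenized)
    n = len(docs)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            df = sum(1 for t in tokenized if term in t)  # document frequency
            if df == 0:
                continue
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            f = tf[term]
            score += idf * f * (K1 + 1) / (f + K1 * (1 - B + B * len(toks) / avg_len))
        scores.append(score)
    return scores

def hybrid_rank(bm25: list[float], semantic: list[float], alpha: float = 0.5):
    """Blend normalized lexical and semantic scores; return doc indices, best first."""
    def norm(xs):
        hi = max(xs) or 1.0
        return [x / hi for x in xs]
    combined = [alpha * a + (1 - alpha) * b for a, b in zip(norm(bm25), norm(semantic))]
    return sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)

docs = ["revenue increased 37 percent", "the office dog is friendly", "operating income was up"]
ranking = hybrid_rank(bm25_scores("revenue increased", docs), [0.9, 0.1, 0.6])
```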
- 00:44:22when performing these kinds of large rag
- 00:44:24pipelines and pre-processing we think a
- 00:44:26lot about not only what we're going
- 00:44:28to store but also how we're going to
- 00:44:30retrieve and for those of you that have
- 00:44:31questions I welcome you to ask those and
- 00:44:33also take a look at our booth if
- 00:44:35you have questions on those so we talked
- 00:44:37a bit about tool use we talked a bit
- 00:44:39about prompting we talked a bit about
- 00:44:42the idea of taking llms and extending
- 00:44:44their functionality we saw some really
- 00:44:45awesome demos of how you can essentially
- 00:44:47use Claude to analyze screenshots
- 00:44:49perform actions and so on and what that
- 00:44:51really leads us to is shifting from just
- 00:44:54the tool to the eventual teammate that
- 00:44:56you can work
- 00:44:58with as the complexity of your use case
- 00:45:01grows the technical investment you need
- 00:45:03to make increases as well you can
- 00:45:05imagine you might have something like a
- 00:45:07classification task tell me if this is
- 00:45:08Spam tell me if this is not spam tell me
- 00:45:10if this is hot dog tell me if this is
- 00:45:11not hot dog that's a relatively easier
- 00:45:14task in the scheme of things we provide
- 00:45:16enough examples we have enough
- 00:45:17intelligence to do so we can lean on
- 00:45:19models that maybe are a little bit more
- 00:45:20cost effective potentially easier on the
- 00:45:22latency side of things because we can
- 00:45:24solve those we move on to summarization
- 00:45:27our question and answer but the second
- 00:45:29that we really start to stretch our
- 00:45:30imagination to what we can do with these
- 00:45:32tools like models taking independent
- 00:45:34actions on their own models being given
- 00:45:37a task and then models required to plan
- 00:45:40out an action remember previous actions
- 00:45:43with some kind of memory course correct
- 00:45:45when things go wrong make use of a wide
- 00:45:48variety of tools follow very complex
- 00:45:50flows of instruction that's where we
- 00:45:52really start to shift things more
- 00:45:54towards this idea of Agents
- 00:45:57what is an agent what is an llm agent
- 00:45:59it's a system that combines large
- 00:46:00language models with the ability to take
- 00:46:02actions in the real world or digital
- 00:46:03environments and what do they include
- 00:46:06what we have right here I'll give you
- 00:46:07the very high level definition and we'll
- 00:46:09talk about some of the diagrams with a
- 00:46:10bit more interest at the end of the day
- 00:46:13I want you to think of an agent as just
- 00:46:14three things a model in this case Claude
- 00:46:163.5 Sonnet a bunch of tools or functions
- 00:46:20that the agent can use I'm going to give
- 00:46:22you a tool like the ability to search
- 00:46:23the web I'm going to give you a tool
- 00:46:25like the ability to analyze a
- 00:46:26screenshot I'm going to give you a tool
- 00:46:28like the ability to take a look at the
- 00:46:29terminal and figure out what command to
- 00:46:31pass
- 00:46:32in and then we're going to go ahead and
- 00:46:34give it a goal and then we're just going
- 00:46:37to let it let it go what do we mean by
- 00:46:39Let It Go essentially if you're familiar
- 00:46:42with python if you look at our computer
- 00:46:43use reference documentation it's while True
- 00:46:46it's an infinite Loop go ahead and take
- 00:46:48the next bit of data and execute on it
- 00:46:51take the next bit of data execute on it
- 00:46:52take the next bit of data execute on
- 00:46:54it again and again and again and again
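That loop can be sketched in Python; `call_model` and the tool registry here are stand-ins for a real Claude call and real tools, and the step cap replaces the reference code's bare `while True`:

```python
# Bare-bones sketch of the agent loop described above: call the model, execute
# whatever tool it asks for, feed the result back, repeat until it is done.
def run_agent(goal: str, call_model, tools: dict, max_steps: int = 20):
    messages = [{"role": "user", "content": goal}]
    for _ in range(max_steps):            # the docs' `while True`, with a safety cap
        action = call_model(messages)     # model plans the next step
        if action["type"] == "final":     # goal reached; stop looping
            return action["answer"]
        result = tools[action["tool"]](**action["args"])  # execute the chosen tool
        messages.append({"role": "tool_result", "content": str(result)})
    raise RuntimeError("agent did not finish within the step budget")

# Tiny fake model: first asks for a web search, then answers.
def fake_model(messages):
    if any(m["role"] == "tool_result" for m in messages):
        return {"type": "final", "answer": "done"}
    return {"type": "tool", "tool": "search_web", "args": {"query": "re:Invent"}}

answer = run_agent("find the keynote date", fake_model,
                   {"search_web": lambda query: f"results for {query}"})
```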
- 00:46:56so I'm going to give you the goal of
- 00:46:57trying to find something it's probably
- 00:46:59going to require 5 10 15 20 steps and
- 00:47:01just like a human would the human would
- 00:47:03probably plan what do I need to do I'm
- 00:47:05probably going to have to open up the
- 00:47:05browser and search for something and
- 00:47:07then going to go ahead and cross
- 00:47:08reference that with something that I
- 00:47:09have in a document and then I'm going to
- 00:47:10take that information and I'll email it
- 00:47:11to someone and then maybe I'll go ahead
- 00:47:13and go back to my text editor and make
- 00:47:15that change and push that up to GitHub
- 00:47:16and submit a pull request and then
- 00:47:17communicate with my team lead about the
- 00:47:19change I made these are all things that
- 00:47:21when I say it very quickly you're like
- 00:47:22wow that's a lot but think about what we
- 00:47:24do every day many many different kinds
- 00:47:26of tasks very very quickly all in
- 00:47:28sequence we have the ability to plan out
- 00:47:30what we want to do we have the tools at
- 00:47:32our disposal or at least Claude and the
- 00:47:34internet to figure out the tools that we
- 00:47:36need and we have the memory to remember
- 00:47:38what needs to be
- 00:47:39done when you think about agents and
- 00:47:41agentic workflows it's really just an
- 00:47:43extension of all the things you've seen
- 00:47:45before it's an extension of tool use
- 00:47:48it's an extension of prompting it's
- 00:47:50combining those together to perform
- 00:47:52tasks just like we do in the real world
- 00:47:55instead of a single task instead of a
- 00:47:56single turn we're talking about
- 00:47:58longer multi-turn
- 00:47:59examples I really like this slide
- 00:48:01because it drives a strong analogy for
- 00:48:03those of you familiar with building web
- 00:48:04applications essentially what it looks
- 00:48:06like with building agents the same way
- 00:48:08that you might start with building a
- 00:48:09static page you might have some HTML
- 00:48:11some CSS some JavaScript in the context
- 00:48:14of an agent that's a pretty
- 00:48:15straightforward
- 00:48:16conversation as you start thinking about
- 00:48:18interactivity as you start adding
- 00:48:20JavaScript and you're handling clicks
- 00:48:21and you're handling interactions this is
- 00:48:23where you know the prompts have to get a
- 00:48:25little bit more complex so the same way
- 00:48:27that you think about expanding into the
- 00:48:29world of agents and leveraging more in
- 00:48:30the gen AI space really try to draw that
- 00:48:32analogy to software other things that
- 00:48:34you might be familiar with what's the
- 00:48:36business use case what do
- 00:48:37my users need what am I trying to build
- 00:48:39for at the end of the day my application
- 00:48:41grows larger I got to start separating
- 00:48:43JavaScript files or whatever language
- 00:48:45that I'm working with at this point if
- 00:48:48you're doing this in a generative
- 00:48:49AI world you know we're
- 00:48:51improving the prompts we're not getting
- 00:48:52too crazy as we start thinking more
- 00:48:55about building a website as we start
- 00:48:57moving to Frameworks as we start
- 00:48:58thinking about breaking things into
- 00:48:59microservices and distributed systems
- 00:49:01and so on as we start reaching larger
- 00:49:03scale that's where I want you to kind of
- 00:49:05draw the parallel to this is where
- 00:49:07agents can really come into play this is
- 00:49:09where the idea of agentic workflows come
- 00:49:10in so what's really nice about this just
- 00:49:12from a visual perspective is if you're
- 00:49:14familiar with things like software
- 00:49:15you're familiar with the analogy of
- 00:49:17building a website and so on you can
- 00:49:19draw that parallel to where things like
- 00:49:21Agents come into the
- 00:49:23mix we're still relatively early in the
- 00:49:25world of agents in terms of how we
- 00:49:27think about memory how we think about
- 00:49:28planning but what's really really
- 00:49:30powerful about these models and
- 00:49:31something that even Maggie spoke a
- 00:49:32little bit about some of the benchmarks
- 00:49:34that we're seeing with model performance
- 00:49:36especially with 3.5 Sonnet and even 3.5 Haiku
- 00:49:38on the coding front make us more and
- 00:49:40more confident that it can perform the
- 00:49:42tasks necessary that agents need to do
- 00:49:45so what makes the model the right choice
- 00:49:46for agentic workflows it's got to be
- 00:49:48really really good at following
- 00:49:50instructions and just like humans some
- 00:49:52models are not great at following lots
- 00:49:54and lots and lots of instructions if I
- 00:49:55were to give the model hundreds and
- 00:49:56hundreds of tasks to do could it
- 00:49:58remember those tasks could it handle it
- 00:49:59in the appropriate order could it figure
- 00:50:01out when something went wrong and go
- 00:50:02back and correct itself so what's really
- 00:50:05exciting about this kind of agentic
- 00:50:06workflow and situation that we're in is
- 00:50:08that we're also starting to develop
- 00:50:09benchmarks that can really determine the
- 00:50:12effectiveness of these models for those
- 00:50:14of you that are interested in digging in
- 00:50:15that a little bit more I really
- 00:50:16recommend taking a look at SWE-bench
- 00:50:19this is a benchmark that basically
- 00:50:22takes a model takes a open source
- 00:50:24repository so it could be a very large
- 00:50:26codebase something like Django or a very
- 00:50:28large python code base and it basically
- 00:50:31puts the model in a situation where it
- 00:50:33says here's the code base here is the
- 00:50:35issue go and write the pull request go
- 00:50:37and write the code necessary to get
- 00:50:39tests to pass and also make sure that
- 00:50:41things don't break these kinds of
- 00:50:43benchmarks are really the foundation for
- 00:50:45how we can determine how we can feel
- 00:50:46confident that the models don't just
- 00:50:48understand ideas or answer high-level
- 00:50:50questions about code but actually
- 00:50:51perform things similar to what humans do
- 00:50:53so definitely recommend on a lot of our
- 00:50:55documentation a lot of the kind of model
- 00:50:57cards that we have taking a look at
- 00:50:58SWE-bench and then another one TAU-bench for
- 00:51:00following particular actions really
- 00:51:02really interesting data source for how
- 00:51:04we think about the effectiveness of
- 00:51:05these
- 00:51:07models the last piece I want to talk
- 00:51:08about is an idea called fine-tuning I'll
- 00:51:10also talk a little bit about the idea of
- 00:51:11model
- 00:51:12distillation we're in a situation where
- 00:51:15prompting is not getting us where we
- 00:51:17need to go rag is not getting us where
- 00:51:19we need to go so there are other options
- 00:51:22for trying to improve the performance of
- 00:51:25your model it always sounds very
- 00:51:27exciting when you say you can improve the
- 00:51:28performance of your model so see a lot
- 00:51:29of heads go up of like cool how do I do
- 00:51:31that and it's very tempting to look at
- 00:51:32this and say ah fine tuning is the
- 00:51:34answer but fine-tuning is just one way
- 00:51:36to try to solve a certain problem so I
- 00:51:37want to do my best to kind of give you a
- 00:51:39little bit of an overview of what
- 00:51:41fine-tuning is how it works in a
- 00:51:43nutshell fine tuning something you can
- 00:51:45do through the Bedrock interface with
- 00:51:47Haiku 3 it's the ability to take a curated
- 00:51:51data set this is also called supervised
- 00:51:53fine-tuning so if you're familiar with
- 00:51:54that idea in machine learning we are
- 00:51:56essentially building a curated set of
- 00:51:58data of inputs and outputs that we would
- 00:52:02like the model to produce we're
- 00:52:03basically giving it the question and the
- 00:52:04answer so this is not unstructured we
- 00:52:06are not giving it a question and
- 00:52:08letting it figure out the answer we are
- 00:52:09basically going to curate a high-quality
- 00:52:11data set which again that's going to
- 00:52:13take time that's going to take effort
- 00:52:14that's going to take a lot of thought
- 00:52:15into what data you can use to move
- 00:52:17things forward and what we're going to
- 00:52:19do is we're going to take that long long
- 00:52:21long list of inputs and outputs and we
- 00:52:24essentially are going to take our base
- 00:52:25model this could be something like
- 00:52:27Haiku 3 and we're going to go ahead and
- 00:52:31run that data through a training process
- 00:52:35using Hardware this could be some of the
- 00:52:36hardware that Amazon provides and we're
- 00:52:38going to go ahead and we're going to
- 00:52:39come out with a custom model so what
- 00:52:41we're actually going to do is update
- 00:52:42the underlying weights of the model and
- 00:52:44in the context of fine tuning there are
- 00:52:45many different kinds of aspects for
- 00:52:47updating certain kinds of weights or
- 00:52:49parameter efficient weights and so on
- 00:52:50but what we're actually doing is
- 00:52:52updating the underlying model weights to
- 00:52:54then produce a custom model
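A curated supervised fine-tuning set is typically serialized as JSONL, one input/output pair per line; the field names below are illustrative assumptions, so check the Bedrock custom-model documentation for the exact schema your model expects:

```python
import json

# Illustrative shape of a supervised fine-tuning dataset: curated input/output
# pairs, one JSON object per line. Field names here are assumptions; the exact
# schema Bedrock expects depends on the model being customized.
examples = [
    {"messages": [
        {"role": "user", "content": "Classify: 'WIN A FREE CRUISE!!!'"},
        {"role": "assistant", "content": "spam"},
    ]},
    {"messages": [
        {"role": "user", "content": "Classify: 'Lunch at noon?'"},
        {"role": "assistant", "content": "not_spam"},
    ]},
]

jsonl = "\n".join(json.dumps(ex) for ex in examples)   # one example per line
parsed = [json.loads(line) for line in jsonl.splitlines()]
```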
- 00:52:57we then evaluate that custom model and
- 00:52:59hope that it does better at a particular
- 00:53:01task the thing you want to be really
- 00:53:03careful of with fine-tuning just looking
- 00:53:05at all of this is that high-quality data is
- 00:53:07really hard to come by you can
- 00:53:09introduce data to your model and your
- 00:53:11model can become worse so fine tuning is
- 00:53:13not a guarantee everything gets better
- 00:53:14right away that's why we have an applied
- 00:53:16AI team that focuses strictly on fine
- 00:53:18tuning at our company you want to be
- 00:53:20mindful of the model that you're working
- 00:53:21with you use Haiku 3 and then another
- 00:53:23model comes out that is far more
- 00:53:25intelligent and then all of a sudden you
- 00:53:26have a custom model and you can't just
- 00:53:27undo this so the math right here doesn't
- 00:53:30really allow for subtraction or undoing
- 00:53:32once you go ahead and you run that
- 00:53:34training and you spend the money in the
- 00:53:35compute you got that custom model you
- 00:53:37can go ahead and do this over and over
- 00:53:38again but this is not a reversible
- 00:53:41decision when should you consider
- 00:53:43something like
- 00:53:44fine-tuning the most common use case I
- 00:53:47want you to think about with fine-tuning
- 00:53:48is introducing Behavior change you want
- 00:53:51to follow a specific kind of schema you
- 00:53:53want to follow a particular kind of
- 00:53:56format people always like to give the
- 00:53:56analogy of Talk Like a Pirate but I want
- 00:53:59you to think of that for a particular
- 00:54:00use case you want the model to Output
- 00:54:02things in a certain format you want the
- 00:54:03model to be a little bit more
- 00:54:04constrained with how it calls apis or
- 00:54:07references particular documents this is
- 00:54:09where you're going to find more
- 00:54:11likelihood with fine-tuning again you're
- 00:54:13going to see that rag is also a very
- 00:54:14viable option for a couple of these
- 00:54:16other tasks especially the latter ones
- 00:54:17that I'll talk about in a second but
- 00:54:19just remember the trade-offs here there
- 00:54:21are many situations where you have Rag
- 00:54:22and fine tuning but when you think about
- 00:54:24rag that is something that you can
- 00:54:26iterate on that's something that you can
- 00:54:27change with your pipeline when you're
- 00:54:29dealing with prompting that's also
- 00:54:30something you can constantly iterate on
- 00:54:32over and over again it's also why we
- 00:54:33push a lot of our customers many times
- 00:54:35to First focus on the prompt what can we
- 00:54:38improve in the prompt what can we change
- 00:54:39in the prompt what best practices
- 00:54:41are missing from the prompt does the
- 00:54:42prompt even make sense that's where we
- 00:54:44want to really start thinking about
- 00:54:46things if you're trying to teach the
- 00:54:49model new knowledge you're trying to
- 00:54:51teach the model something brand new and
- 00:54:54then you hope that from that knowledge
- 00:54:56it can learn other things and expand its
- 00:54:58knowledge that is not very very likely
- 00:55:01with fine
- 00:55:02tuning fine tuning is not a great way to
- 00:55:05teach the model brand new things and
- 00:55:07expect it to generalize for other tasks
- 00:55:09that is where we have not seen a
- 00:55:10tremendous amount of success with fine
- 00:55:12tuning as those algorithms change as we
- 00:55:14think more about this field and so on
- 00:55:16that may change but at the same point I
- 00:55:18really try to anchor on this slide quite
- 00:55:20a bit for thinking about that decision
- 00:55:21we see a lot of customers that jump to
- 00:55:23fine tuning as a this is the way to
- 00:55:24solve all of my problems there actually
- 00:55:27other ways to think about solving that
- 00:55:28at a high level again rag very very
- 00:55:30common way to go about handling those
- 00:55:33pieces so in general avoiding the
- 00:55:35pitfalls of fine tuning it's exciting
- 00:55:37but it's not for every single use
- 00:55:39case if you have any questions or
- 00:55:41curious about your particular use case
- 00:55:43again come talk to us we're happy to
- 00:55:44talk about those particular situations
- 00:55:45we have a lot of folks on the team that
- 00:55:46have done a lot of Direct Customer work
- 00:55:48with fine tuning who I'm sure would be happy to
- 00:55:49answer
- 00:55:50questions at the end of the day before
- 00:55:52you jump to rag there's a reason why we
- 00:55:54have the order of things in this
- 00:55:55particular presentation you want to
- 00:55:57think a lot about what you can get out
- 00:55:58of prompt engineering be mindful of the
- 00:56:00versions you have with prompt
- 00:56:01engineering iterate on your prompts as
- 00:56:03you go on and then no matter what you do
- 00:56:07the most important thing I want you to
- 00:56:08have if you walk away from anything from
- 00:56:09this presentation when you think about
- 00:56:11building any kind of generative AI
- 00:56:12application when you think about
- 00:56:14actually going from proof of concept to
- 00:56:15production you want to make sure you
- 00:56:17have some kind of evaluation criteria we
- 00:56:19like to call these evals as well if
- 00:56:21you're coming from software this is
- 00:56:22essentially like a unit test or maybe an
- 00:56:24integration test you want to make
- 00:56:26sure that you have some kind of way in
- 00:56:28your application of benchmarking the
- 00:56:30performance of the model of the prompt
- 00:56:33of the fine-tuning of the rag pipeline
- 00:56:35if you have no way of doing that then
- 00:56:38prompt Engineering in this entire
- 00:56:39ecosystem just kind of becomes an art
- 00:56:41and not a science so no matter what
- 00:56:43you're doing it is Mission critical to
- 00:56:45make sure that when you're building
- 00:56:46these applications you are using some
- 00:56:48kind of benchmarking or evaluation Suite
- 00:56:50Amazon Bedrock provides that many open
- 00:56:53source libraries and companies provide
- 00:56:54that as well but just like you wouldn't
- 00:56:56develop software that is Mission
- 00:56:57critical or software in production
- 00:56:59without any kind of testing you want to
- 00:57:00make sure you have that as well so do
- 00:57:02you have an evaluation with a success
- 00:57:04criteria whether it's just for fine
- 00:57:05tuning or rag or so on it's really one
- 00:57:07of the most important pieces that you
- 00:57:08can do have you tried prompt engineering
- 00:57:11determine if you have a baseline with
- 00:57:13that prompt engineering and again make
- 00:57:15sure you have evaluation so that you're
- 00:57:17really getting as much as you possibly
- 00:57:18can out of the prompt we see with a lot
- 00:57:20of customers not having a robust enough
- 00:57:22evaluation Suite basically just means
- 00:57:23we're kind of starting from scratch
- 00:57:24we're building on a house of cards
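A minimal eval harness in the spirit of that unit-test analogy; the stub below stands in for a real prompt-plus-model pipeline:

```python
# Minimal eval harness: run each case through the system under test and score
# exact-match accuracy. `model` is any callable — in practice a prompt plus an
# LLM call; here a stub so the example runs on its own.
def run_evals(model, cases: list[tuple[str, str]]) -> float:
    """Return the fraction of eval cases where the output matches the expectation."""
    passed = sum(1 for prompt, expected in cases if model(prompt) == expected)
    return passed / len(cases)

eval_cases = [
    ("Classify: 'WIN A FREE CRUISE!!!'", "spam"),
    ("Classify: 'Lunch at noon?'", "not_spam"),
]

# Stub standing in for a real prompt+model pipeline.
stub = lambda prompt: "spam" if "FREE" in prompt else "not_spam"
baseline_accuracy = run_evals(stub, eval_cases)
```

Tracking a number like this before and after each prompt, RAG, or fine-tuning change is what turns the iteration into a science rather than an art.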
- 00:57:26the last part with fine tuning again
- 00:57:28like we mentioned it's irreversible and
- 00:57:30when you think about the data you need
- 00:57:31to curate the amount of data that you
- 00:57:33need that's where things are going to
- 00:57:35get a little bit trickier so how do you
- 00:57:36plan to build that fine-tuning data
- 00:57:39set got a couple minutes left I just
- 00:57:41want to wrap up with all the things that
- 00:57:43we've seen here because it's a lot of
- 00:57:44information but at the end of the day my
- 00:57:45goal is to give you a bit of a
- 00:57:46foundation here that's what I hope we
- 00:57:48have talked a bit about tool use talked
- 00:57:51a bit about computer use this idea
- 00:57:53of extending Claude's capabilities
- 00:57:55Claude's functionality
- 00:57:57just by providing some tools what are
- 00:57:58tools just these objects these key
- 00:58:01value pairs we give it a name we give it
- 00:58:04some kind of description and we provide
- 00:58:06the necessary arguments or parameters
- 00:58:08that that particular tool needs we then
- 00:58:10let Claude do the rest of the work if a
- 00:58:12prompt comes in Claude says looks like
- 00:58:13someone's trying to use that tool again
- 00:58:16you're going to hear a lot of things
- 00:58:17like Claude controls the computer and so
- 00:58:19on Claude itself is not moving the mouse
- 00:58:23and clicking and opening and closing and
- 00:58:24executing commands all that Claude is
- 00:58:26doing is taking in some screenshot
- 00:58:29interpreting the necessary command and
- 00:58:31then there is underlying code that a
- 00:58:32developer writes to go ahead and execute
- 00:58:35that code necessary so tool use and
- 00:58:37computer use it's a really really
- 00:58:39interesting and Powerful way to achieve
- 00:58:40all these new and interesting use cases
- 00:58:43but at the end of the day from a
- 00:58:44conceptual standpoint it's not something
- 00:58:46that should appear terribly intimidating
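As a concrete illustration of that key-value shape, here is a Claude-style tool definition; the field names follow Anthropic's tool-use documentation, and the tool itself is hypothetical:

```python
# The "name / description / parameters" shape described above, as a
# Claude-style tool definition. The get_weather tool is hypothetical.
get_weather_tool = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "input_schema": {                   # JSON Schema for the tool's arguments
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'Las Vegas'"},
        },
        "required": ["city"],
    },
}
```

Claude only decides *when* to call the tool and with what arguments; the developer's own code actually executes it and returns the result.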
- 00:58:48we talked a little bit about rag
- 00:58:50retrieving data externally talked about
- 00:58:52that pre-processing side of things
- 00:58:53breaking up our data into chunks
- 00:58:55embedding we also talked about some of
- 00:58:56the other interesting ideas in this
- 00:58:58ecosystem from prompt caching to
- 00:59:00contextual retrieval and so on you're
- 00:59:01welcome to dig into that research
- 00:59:03finally wrapped up quite a bit with fine
- 00:59:06tuning so we got a lot of information
- 00:59:08coming out here again I just want to say
- 00:59:09thank you all so so much for giving me
- 00:59:10some time happy to answer questions
- 00:59:12stick around for a little bit and have a
- 00:59:13wonderful re:Invent everyone thank you
- 00:59:14all so much
- Anthropic
- Claude models
- AI development
- prompt engineering
- tool use
- RAG
- fine-tuning
- computer use
- agentic workflows
- model training