AWS re:Invent 2024 - Leverage Anthropic's Claude models for AI's evolving landscape (AIM123)
Summary
TL;DR: The presentation covered a range of topics, from an introduction to Anthropic and its mission to a detailed walkthrough of its AI model, Claude. Founded in 2021, Anthropic aims to advance safe AI development, with research focused on alignment and interpretability. The upgraded Claude 3.5 Sonnet was launched in October 2024 with features such as computer use and improved agentic capabilities that make it easier to navigate complex workflows. Companies such as Jane Street and DoorDash use Claude to make their workflows more efficient. Claude's computer use capability was demonstrated in a video showing how it can be used for website development and more complex coding tasks. The capability can help automate manual QA and can analyze screenshots to determine the actions needed. The session also covered prompt engineering, including the prompt generator, RAG (Retrieval Augmented Generation), and fine-tuning. The prompt generator helps construct prompts quickly using best practices. RAG uses external data to improve language model responses, while tool use extends Claude's functionality through external tools and functions. Fine-tuning can change a model's behavior for specific needs, but requires carefully chosen data to avoid degrading the model.
Takeaways
- 🤖 Anthropic was founded in 2021 to advance safe AI development.
- 🆕 Claude 3.5 Sonnet includes new computer use functionality and agentic capabilities.
- 🏢 Companies such as Jane Street and DoorDash use Claude to increase efficiency.
- 👨💻 Claude's computer use skills were shown through coding demonstrations.
- 🔧 The prompt generator helps create correct, effective prompts.
- 🔍 RAG integrates external knowledge into AI models.
- 🧠 Fine-tuning adjusts a model's behavior for specific applications.
- 🛠 Tool use extends Claude's capabilities through external tools.
- 📈 Trained models can improve at instruction following and taking actions.
- 🖥️ Claude can analyze screenshots and work out the necessary next step.
Timeline
- 00:00:00 - 00:05:00
The speaker thanks attendees for coming and presents the session agenda, which includes topics such as prompt engineering, fine-tuning, and retrieval architectures. Maggie Vo introduces herself and her colleague Ellie Shopic as the presenters and states Anthropic's mission of helping ensure a safe transition through transformative AI.
- 00:05:00 - 00:10:00
Maggie presents the upgraded Claude 3.5 Sonnet model, launched in October 2024, and highlights its improvements, especially in code generation and computer use. Claude can navigate computer interfaces and handle multi-step reasoning in agentic workflows, illustrated by customer examples such as Jane Street.
- 00:10:00 - 00:15:00
The Claude 3.5 Haiku model is highlighted along with the success of companies such as DoorDash, which uses the model to route support tickets. Maggie hands the presentation over to Ellie, who discusses computer use in more detail and shows practical applications.
- 00:15:00 - 00:20:00
Ellie, who covers computer use and prompt engineering, introduces the computer use capability, shown through a demo by Alex. The capability lets Claude control a computer environment by interpreting screenshots and taking actions to complete complex tasks.
- 00:20:00 - 00:25:00
A demo by Alex shows Claude interacting with a computer screen to write and debug code, illustrating the potential of agentic workflows. Ellie encourages attendees to try the demos and see Claude's capabilities live at the booth.
- 00:25:00 - 00:30:00
Ellie discusses how Anthropic's models are delivered through Amazon Bedrock, which provides security features and services such as embeddings and fine-tuning. The session then goes deeper into tool use, RAG architecture, and prompt engineering so that attendees can better understand and apply these techniques.
- 00:30:00 - 00:35:00
Tools for generating prompts and improving prompt quality are introduced, with examples and best practices highlighted. The segment also covers step-by-step reasoning and the use of XML tags to structure prompts.
- 00:35:00 - 00:40:00
The importance of prompt engineering is underlined by encouraging iteration and version control of prompts. Examples and XML tags are emphasized as powerful tools for keeping prompts clear.
- 00:40:00 - 00:45:00
Ellie explains tool use as an extension of Claude's functionality and compares it to everyday lookup tasks such as fetching a weather update. Claude's computer use is presented as a more complex extension of tool use that lets the model interact with screenshots and on-screen text. The RAG architecture is covered in depth, with a focus on data stored outside the model.
- 00:45:00 - 00:50:00
The segment explains using embeddings to improve search, storing data in a vector database, and retrieving relevant content into the prompt. These techniques help organize information for more efficient and precise retrieval.
- 00:50:00 - 00:59:21
Ellie closes with a discussion of fine-tuning and its use for changing behavior rather than teaching new information. The importance of well-structured evaluations and of varying workflows to reach optimal model performance was also discussed.
Video Q&A
When was Anthropic founded, and what is its mission?
Anthropic was founded in 2021 with the mission of ensuring a safe transition through transformative AI, using research focused on alignment and interpretability.
When was Claude 3.5 Sonnet launched, and what features does it have?
The upgraded Claude 3.5 Sonnet was launched in October 2024 and brings improvements in areas including computer use and agentic capabilities.
How do companies like Jane Street and DoorDash use Claude?
Jane Street uses Claude to improve coding efficiency and developer productivity, and DoorDash uses it to route support tickets.
What kinds of tasks were demonstrated with Claude in the presentation?
The demo video showed Claude using its computer use capability for tasks such as web development.
Who are Maggie Vo and Ellie Shopic?
Maggie Vo leads the technical education team at Anthropic, and Ellie Shopic is head of technical training.
What computer use capabilities were presented for Claude?
Claude can analyze screenshots and determine the necessary commands, which can be used for tasks such as testing apps or manipulating Excel spreadsheets.
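The screenshot, action, repeat cycle described in the talk can be sketched as a small loop. This is only an illustration under stated assumptions: `Action`, `run_agent`, `FakeDesktop`, and `scripted_model` are hypothetical names, the "screenshot" is a plain text label, and the model call is replaced by a scripted stub rather than a real API request.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    kind: str        # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


def run_agent(goal: str,
              propose_action: Callable[[str, str, List[Action]], Action],
              take_screenshot: Callable[[], str],
              execute: Callable[[Action], None],
              max_steps: int = 10) -> List[Action]:
    """Screenshot -> model proposes an action -> developer code executes it -> repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()
        action = propose_action(goal, screenshot, history)
        if action.kind == "done":
            break
        execute(action)          # the developer, not the model, performs the action
        history.append(action)
    return history


class FakeDesktop:
    """Hypothetical stand-in for a real screen; a screenshot is just a label."""

    def __init__(self) -> None:
        self.screen = "desktop"
        self.typed: List[str] = []

    def take_screenshot(self) -> str:
        return self.screen

    def execute(self, action: Action) -> None:
        if action.kind == "click":
            self.screen = "browser"
        elif action.kind == "type":
            self.typed.append(action.text)
            self.screen = "results"


def scripted_model(goal: str, screenshot: str, history: List[Action]) -> Action:
    """Stub for the model call: picks the next action from what the screen shows."""
    if screenshot == "desktop":
        return Action("click", x=40, y=700)   # pretend the browser icon is here
    if screenshot == "browser":
        return Action("type", text=goal)
    return Action("done")


desktop = FakeDesktop()
steps = run_agent("good hikes near Vegas", scripted_model,
                  desktop.take_screenshot, desktop.execute)
print([step.kind for step in steps])
```

The `max_steps` cap is a design choice for the sketch: a loop that keeps asking a model for the next action needs some bound so it cannot run indefinitely.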
What is the purpose of Anthropic's prompt generator?
Anthropic's prompt generator is designed to help create prompts quickly while following best practices.
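As a rough illustration of the best practices mentioned in the session (a short role and task description first, dynamic data near the top, detailed instructions, examples, and explicit thinking steps, with XML tags for structure), a prompt could be assembled as follows. The function name `build_prompt` and the tag names are assumptions made for this sketch; this is not the prompt generator's actual output format.

```python
from typing import Iterable, Sequence, Tuple


def build_prompt(role: str,
                 task: str,
                 documents: Sequence[str],
                 instructions: str,
                 examples: Iterable[Tuple[str, str]] = (),
                 thinking_steps: Sequence[str] = ()) -> str:
    """Assemble a prompt: role and task, then data, instructions, examples, steps."""
    parts = [f"{role} {task}"]

    # Dynamic data (for example, retrieved documents) goes near the top.
    docs = "\n".join(f"<document>{d}</document>" for d in documents)
    parts.append(f"<documents>\n{docs}\n</documents>")

    parts.append(f"<instructions>\n{instructions}\n</instructions>")

    example_blocks = [
        f"<example>\n<input>{inp}</input>\n<output>{out}</output>\n</example>"
        for inp, out in examples
    ]
    if example_blocks:
        parts.append("<examples>\n" + "\n".join(example_blocks) + "\n</examples>")

    # Spell out how to think, rather than only saying "think step by step".
    if thinking_steps:
        steps = "\n".join(f"{i}. {s}" for i, s in enumerate(thinking_steps, 1))
        parts.append("Before answering, work through these steps "
                     "inside <thinking> tags:\n" + steps)

    return "\n\n".join(parts)


prompt = build_prompt(
    role="You are a support ticket classifier.",
    task="Assign each ticket to exactly one team.",
    documents=["Ticket: my order never arrived."],
    instructions="Reply with the team name only.",
    examples=[("Ticket: I was charged twice.", "billing")],
    thinking_steps=["Identify the customer's problem.", "Match it to a team."],
)
print(prompt)
```

Keeping the template in a function like this also makes it easy to put prompts under version control, as the session recommends.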
What is RAG, and how is it used?
RAG stands for "retrieval augmented generation": it integrates external knowledge with language models by retrieving relevant data and including it in the prompt.
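The retrieve-then-prompt flow can be sketched in a few lines. This is a toy version under loud assumptions: a word-count vector stands in for a real embedding model, a Python list stands in for a vector database, and the names `embed`, `cosine`, `retrieve`, and `build_rag_prompt` are invented for the example.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_rag_prompt(query: str, chunks: List[str], k: int = 2) -> str:
    """Place the retrieved chunks in the prompt ahead of the question."""
    context = "\n".join(f"<chunk>{c}</chunk>" for c in retrieve(query, chunks, k))
    return (f"<context>\n{context}\n</context>\n\n"
            "Answer using only the context above.\n\n"
            f"Question: {query}")


knowledge_base = [
    "Refunds are processed within five business days.",
    "Our office is closed on public holidays.",
    "Shipping takes two weeks for international orders.",
]
print(build_rag_prompt("How long do refunds take?", knowledge_base, k=1))
```

In practice the chunks would be embedded once and stored, so that only the query needs to be embedded at request time.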
When is it appropriate to use fine-tuning?
Fine-tuning can be used to change a model's behavior, for example to follow specific formats or schemas, but it can make the model worse if the data is not chosen carefully.
What is tool use, and how is it used with Claude?
Tool use is a way to extend Claude's capabilities by giving it tools it can call to carry out specific tasks in response to requests.
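The developer side of tool use can be sketched like this: the application describes its tools, the model replies with a structured request naming a tool and its arguments, and the application runs the matching function and returns the result. The JSON request shape, the `TOOLS` registry, and `get_weather` with its canned data are all assumptions for illustration; they are not the actual API message format.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical tool: a real version would call a weather service."""
    fake_data = {"Las Vegas": "sunny, 18 C"}
    return fake_data.get(city, "unknown")


# Registry of tools the application is willing to run on the model's behalf.
TOOLS = {
    "get_weather": {
        "description": "Get the current weather for a city.",
        "parameters": {"city": "string"},
        "function": get_weather,
    }
}


def handle_tool_request(request_json: str) -> str:
    """Run the tool named in a model's (assumed) JSON request and return JSON."""
    request = json.loads(request_json)
    tool = TOOLS.get(request["name"])
    if tool is None:
        return json.dumps({"error": f"unknown tool: {request['name']}"})
    result = tool["function"](**request["arguments"])
    return json.dumps({"name": request["name"], "result": result})


# A request such as a model might produce for "What's the weather in Las Vegas?"
model_request = '{"name": "get_weather", "arguments": {"city": "Las Vegas"}}'
print(handle_tool_request(model_request))
```

The result string would then be sent back to the model so it can write the final answer; the model itself never executes the function.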
- 00:00:00hi everybody thank you so much for
- 00:00:02spending some of your day today coming
- 00:00:03to this session I hope it's a session
- 00:00:05that's super useful for you jam-packed
- 00:00:07full of um tips and tricks everything
- 00:00:09from prompt engineering to fine-tuning
- 00:00:11to retrieval architectures and so on um
- 00:00:14my name is Maggie vo I lead the tech the
- 00:00:16technical Education team at anthropic
- 00:00:18and my colleague is Ellie shopic who
- 00:00:20will be doing the majority of this
- 00:00:22presentation he is much more
- 00:00:23knowledgeable than I am um and he's the
- 00:00:25head of Technical
- 00:00:27Training again pictures of our faces
- 00:00:31guess so our agenda today um is lightly
- 00:00:34I'll go into an introduction about
- 00:00:35anthropic an overview of our latest
- 00:00:37models and what's exciting about them
- 00:00:39including some of our latest features
- 00:00:40like computer use then we'll go into
- 00:00:43techniques and best practices all sorts
- 00:00:45of things top to bottom prompt engineering
- 00:00:46to agents and so
- 00:00:49on so to begin with I'm curious how many
- 00:00:52of you have heard of
- 00:00:54Claude great love that how many of you
- 00:00:57have used
- 00:00:58Claude and how many of you have Claude in
- 00:01:00production somewhere great okay so
- 00:01:03pretty good familiarity with Claude and
- 00:01:06uh hopefully with anthropic as well um
- 00:01:08for those of you who don't know we're a
- 00:01:09pretty young company we were founded in
- 00:01:122021 and anthropic is our mission is to
- 00:01:16help ensure that the world safely makes
- 00:01:18the transition through transformative AI
- 00:01:20we do this in a wide variety of ways
- 00:01:22with alignment research interpretability
- 00:01:24research um as well as of course making
- 00:01:27some of the world's most intelligent
- 00:01:28models while ensuring that they remain
- 00:01:30trustworthy we also push other companies
- 00:01:32to also be uh produce trustworthy
- 00:01:34systems and just in general we try to
- 00:01:35support businesses with generative AI
- 00:01:38with um you know foundational
- 00:01:40transformative uh features while also
- 00:01:43making sure that they are safe for us
- 00:01:46though safety is not something
- 00:01:47theoretical it's not about safety guard
- 00:01:49rails that you're not sure of during
- 00:01:50training and so on we are the frontier
- 00:01:52of research that makes Claude easier to
- 00:01:55steer harder to jailbreak and also the
- 00:01:57least likely model to hallucinate on the
- 00:01:59market
- 00:02:00according to um Galileo's llm
- 00:02:02hallucination index and we're very proud
- 00:02:05of these as we believe that these are
- 00:02:06critical pieces to making sure you can
- 00:02:08trust the models you have in deployment
- 00:02:10um or even just in your personal life
- 00:02:12and personal
- 00:02:15use in October 2024 we launched uh
- 00:02:18Claude 3.5 Sonnet um the latest version
- 00:02:21of it and we upgraded Claude 3.5 Sonnet
- 00:02:25basically to a state-of-the-art model in
- 00:02:27quite a variety of areas especially in
- 00:02:29coding where we um you know put on a
- 00:02:32wide variety of actual uh you know task
- 00:02:35oriented benchmarks not just theoretical
- 00:02:37remove from everyday life kind of
- 00:02:39benchmarks and it showed some vast
- 00:02:40improvements over whatever was uh
- 00:02:42leading uh before we also released a
- 00:02:45Claude 3.5 Haiku model including uh 3.5
- 00:02:48Haiku uh fast model on Bedrock
- 00:02:51specifically um that is inference
- 00:02:53optimized in order to serve you the best
- 00:02:55speeds for Haiku level
- 00:02:57intelligence and then uh something that
- 00:02:59we'll talk about briefly it's also
- 00:03:01computer use which is an experimental
- 00:03:03beta um featuring capability that we
- 00:03:06released with the latest Claude 3.5 Sonnet
- 00:03:08model where Claude is able to use
- 00:03:11computer interfaces with a variety of um
- 00:03:13combination of screenshots and so on
- 00:03:14Claude can navigate around your whole
- 00:03:16computer you don't need apis and so on
- 00:03:18because Claude can understand the
- 00:03:20whole computer interface and start
- 00:03:21interacting with it
- 00:03:24directly a little bit about the Claude
- 00:03:263.5 Sonnet model specifically you can see
- 00:03:28some of our benchmarks here on the
- 00:03:30standard benchmarks um but I think more
- 00:03:31importantly what's interesting is that a
- 00:03:33its computer vision has been vastly
- 00:03:35improved so it can do something like
- 00:03:37computer use and use it with greater
- 00:03:38accuracy we trained it for strong
- 00:03:41agentic capabilities too which is how
- 00:03:43computer use actually works it can do um
- 00:03:45much more nuanced thinking
- 00:03:48decision-making um and also it's just
- 00:03:50got much better at code generation in
- 00:03:52terms of accuracy in terms of um
- 00:03:54readability in terms of converting you
- 00:03:56know Legacy code into um you know modern
- 00:03:59uh code
- 00:04:01all of these um combine into a wide
- 00:04:05variety of use cases that I encourage
- 00:04:08you to explore and look at um things
- 00:04:10like as I mentioned code generation
- 00:04:12that's a big one but also great visual
- 00:04:14analysis in combination with computer
- 00:04:16use Claude's able to do things like read
- 00:04:18Excel spreadsheets and do some analysis
- 00:04:20for you even manipulate the spreadsheets
- 00:04:22and then do some um you know additional
- 00:04:25uh write some functions and so on to
- 00:04:27help you with financial analysis for
- 00:04:28example and then we've also trained
- 00:04:31Claude very well to just handle really
- 00:04:33complex queries a lot of instructions
- 00:04:35multi-step situations if this then that
- 00:04:38sorts of situations um so Claude's really
- 00:04:40able to reason in multi-step ways quite
- 00:04:43well at this
- 00:04:44point here's an example I want to
- 00:04:46highlight from one of our customers Jane
- 00:04:48Street um and they're basically using
- 00:04:50Claude to scale their um coding to be uh
- 00:04:55much more efficient their quality is
- 00:04:57increased developer productivity is vastly
- 00:04:59increased and just the general time spent
- 00:05:02on having you know writing PRS and
- 00:05:03improving their codebase fixing code has
- 00:05:06vastly decreased we're very proud of
- 00:05:10this then there's Claude 3 Haiku 3.5
- 00:05:13Haiku which I won't spend too much time
- 00:05:14on um it's you know for a similar speed
- 00:05:17to Claude 3 Haiku um but with much
- 00:05:20greater uh intelligence and it's really
- 00:05:24really good at coding as well um
- 00:05:26especially fast coding where it is
- 00:05:28currently best in class and it's able
- 00:05:29to um you know beat the previous Claude
- 00:05:323.5 Sonnet model also um on SWE-bench and um
- 00:05:35kind of other coding
- 00:05:38benchmarks this is an example of a use
- 00:05:40case that we found great success with
- 00:05:43where DoorDash is using Claude
- 00:05:443.5 Haiku in order to um route tickets in
- 00:05:48their support system which has been much
- 00:05:51more increasing the accuracy time of
- 00:05:53response um in their uh ticket routing
- 00:05:57and their customer service and
- 00:05:59decreasing the you know average time
- 00:06:01resolution and the uh rerouting rates
- 00:06:03and accuracy
- 00:06:05rates the last thing I will talk about
- 00:06:08is computer use and here I'll actually
- 00:06:09transfer over to Ellie but just to
- 00:06:11introduce it computer use is this
- 00:06:13ability that Claude has to perform tasks
- 00:06:16by interpreting screenshots and then
- 00:06:18taking actions based on those
- 00:06:19screenshots so Claude does all sorts of
- 00:06:21things now like it can test your apps
- 00:06:22for you as it said it can manipulate
- 00:06:24Excel spreadsheets um you can plan
- 00:06:26vacations with Claude or ask Claude all
- 00:06:28sorts of questions that it can then go
- 00:06:29on the internet and browse and decide
- 00:06:31things um I'll pass Ellie if you don't mind
- 00:06:33coming up I'll pass it here for Ellie to
- 00:06:35further explain the use cases as well to
- 00:06:37show you some things that computer use
- 00:06:38can do and so on awesome
- 00:06:42thank hopefully you all can hear me in a
- 00:06:44quick second when this there we go
- 00:06:45awesome hello everyone my name is Ellie
- 00:06:46shopic I'm the head of Technical
- 00:06:47Training here at anthropic and I'm super
- 00:06:49excited to get a chance to talk to you
- 00:06:51about computer use and prompt
- 00:06:52engineering and so on just a quick show
- 00:06:54of hands I know kind of piggyback what
- 00:06:55Maggie was saying how many of you had a
- 00:06:57chance to check out our booth and
- 00:06:58actually see any of these demos in
- 00:06:59action just quick show of hands all
- 00:07:02right awesome so I'm going to do a lot
- 00:07:03to give a theoretical overview of some
- 00:07:05of these ideas we're going to talk about
- 00:07:06computer use we're going to talk about
- 00:07:07prompt engineering we're going to talk
- 00:07:08about Rag and Tool use and fine tuning
- 00:07:10but if you want to actually see these in
- 00:07:11action I definitely recommend you come
- 00:07:13check us out because we've got plenty of
- 00:07:15demos going on in our booth we've got a
- 00:07:16really wonderful team also to guide you
- 00:07:18through any questions you may have or
- 00:07:19any kind of technical or non-technical
- 00:07:21questions that you have we're happy
- 00:07:22happy to answer so Maggie talked a little
- 00:07:24bit about computer use and computer use
- 00:07:26is this capability that our upgraded 3.5
- 00:07:28Sonnet has to basically analyze a
- 00:07:30screenshot and figure out the necessary
- 00:07:33computer command to take so what that
- 00:07:35basically means is if we give Claude a
- 00:07:36screenshot of a desktop and we give
- 00:07:38Claude a prompt something like I need
- 00:07:40you to go and figure out some really
- 00:07:41awesome hiking trails near Vegas because
- 00:07:43I need to see the sun I've been at this
- 00:07:45conference for too long and I need to
- 00:07:46get outside Claude is going to go and
- 00:07:48take a look at that screenshot and it
- 00:07:49what it's going to do is it's actually
- 00:07:50going to say something like okay in
- 00:07:52order to figure out the best hikes near
- 00:07:54the convention I'm going to go ahead and
- 00:07:57go to the browser and do that research
- 00:07:59so I see based on this screenshot that
- 00:08:01the Firefox or Chrome or Safari or
- 00:08:03whatever browser you have icon is at
- 00:08:05this coordinate and you should go and
- 00:08:07execute a click command it's then up to
- 00:08:09you as the developer to write the code
- 00:08:11necessary to execute that click and you
- 00:08:13can do that in your language of choice
- 00:08:14the reference implementation that we
- 00:08:16have is in Python once that click
- 00:08:18happens we then take a screenshot and
- 00:08:19feed it again to Claude and Claude can
- 00:08:21then take a look at this and say great
- 00:08:23this looks like Firefox I see that in
- 00:08:25the browser bar we should probably type
- 00:08:27something like good hikes near Vegas and
- 00:08:29press enter you as the developer execute
- 00:08:32that command repeat repeat repeat that
- 00:08:34idea of giving a model some kind of
- 00:08:37tools some kind of ability to interact
- 00:08:39with some kind of system and then just
- 00:08:41running a loop over and over and over
- 00:08:42again is a very very very high-level way
- 00:08:45of explaining what an agent is you're
- 00:08:46going to hear this term agents and
- 00:08:47agentic workflows and so on we're going
- 00:08:49to talk about that in a little bit but
- 00:08:50when you hear that term agents an llm
- 00:08:52something like Claude a set of tools some
- 00:08:55ability to perform actions like
- 00:08:57interpreting screenshots and then just a
- 00:08:59loop do it again do it again do it again
- 00:09:01do it again what's really interesting
- 00:09:03about computer use is while we are so
- 00:09:05excited about this technology it's still
- 00:09:06relatively early but in a short period
- 00:09:09of time we've been able to basically
- 00:09:10double the performance and evaluation
- 00:09:12benchmarks of previous state-of-the-art
- 00:09:13models and some of the use cases just
- 00:09:15like Maggie mentioned that we're
- 00:09:16exploring are in that kind of manual QA
- 00:09:18anytime that there's a human in the mix
- 00:09:20doing very tedious time-intensive and
- 00:09:22very very laborious tasks things like
- 00:09:25manually QA a particular piece of
- 00:09:27software and thinking about every single
- 00:09:29inter action that might need to be taken
- 00:09:30and instead of having to worry about all
- 00:09:32those interactions and taking them
- 00:09:34programmatically we can say here Claude
- 00:09:36is the flow that I'd like you to go
- 00:09:37through here are all the permutations
- 00:09:38that I want you to analyze and I might
- 00:09:40even miss some of them so if there is a
- 00:09:42flow in this application or architecture
- 00:09:44that I've missed why don't you go ahead
- 00:09:45and do that and test it for me take
- 00:09:47those results write the output to some
- 00:09:49file I'll give you an expected output
- 00:09:50you let me know if those match so you
- 00:09:52can think about kind of QA workflows in
- 00:09:55that particular capacity I also want to
- 00:09:56show you a little demo this is from a
- 00:09:58colleague of mine Alex a wonderful
- 00:10:00wonderful demo here just talk a little
- 00:10:01bit about computer use with
- 00:10:03coding so go ahead and give that a video
- 00:10:05a second to play I'm Alex I lead
- 00:10:08developer relations at anthropic and
- 00:10:10today I'm going to be showing you a
- 00:10:11coding task with computer
- 00:10:14[Music]
- 00:10:19use so we're going to be showing Claude
- 00:10:21doing a website coding task by actually
- 00:10:24controlling my laptop but before we
- 00:10:26start coding we need an actual website
- 00:10:29ite for Claude to make changes to so
- 00:10:31let's ask Claude to navigate to claude.ai
- 00:10:34within my Chrome browser and ask Claude
- 00:10:36within claude.ai to create a fun '90s
- 00:10:39themed personal homepage for
- 00:10:42[Music]
- 00:10:43itself Claude opens
- 00:10:45[Music]
- 00:10:47Chrome searches for
- 00:10:52claude.ai and then types in a prompt asking
- 00:10:55the other Claude to create a personal
- 00:10:57homepage for itself
- 00:11:00[Music]
- 00:11:05claude.ai returns some
- 00:11:10code and that gets nicely rendered in an
- 00:11:12artifact on the right hand side that
- 00:11:14looks great but I want to make a few
- 00:11:16changes to the website locally on my own
- 00:11:18computer let's ask Claude to download
- 00:11:21the file and then open it up in vs
- 00:11:24code Claude clicks the save to file
- 00:11:27button
- 00:11:29opens up VS
- 00:11:32code and then finds the file within my
- 00:11:35downloads folder and opens it
- 00:11:37[Music]
- 00:11:39up perfect now that the file is up and
- 00:11:42running let's ask Claude to start up a
- 00:11:44server so that we can actually view the
- 00:11:46file within our
- 00:11:48[Music]
- 00:11:51browser Claude opens up the VS Code
- 00:11:54terminal and tries to start a server
- 00:12:00but it hits an error we don't actually
- 00:12:01have python installed on our machine but
- 00:12:03that's all right because Claude realizes
- 00:12:05this by looking at the terminal output
- 00:12:07and then tries again with Python 3 which
- 00:12:09we do have installed on our machine that
- 00:12:11works so now the server is up and
- 00:12:13running now that we have the local
- 00:12:14server started we can go manually take a
- 00:12:16look at the website within the browser
- 00:12:19and it looks pretty good but I noticed
- 00:12:21that there's actually an error in the
- 00:12:22terminal output and we also have this
- 00:12:23missing file icon at the top here let's
- 00:12:26ask Claude to identify this error and
- 00:12:28then fix it within the file Claude
- 00:12:31visually reads the terminal output and
- 00:12:33then opens up the find and replace tool
- 00:12:35in VS Code to find the line that's
- 00:12:38throwing the actual error in this case
- 00:12:40we just ask Claude to get rid of the
- 00:12:41error entirely so it will just delete
- 00:12:43the whole line then Claude will save the
- 00:12:45file and automatically rerun the website
- 00:12:48so now that the error is gone let's go
- 00:12:50take a final look at our website and we
- 00:12:52can see that the file icon has
- 00:12:54disappeared and the error is gone as well
- 00:12:57perfect so that's coding with computer
- 00:12:59use and Claude this took a few prompts
- 00:13:01now but we can imagine in the future
- 00:13:03that Claude will be able to do tasks
- 00:13:04like this end to
- 00:13:05[Music]
- 00:13:08end awesome thank you so much Alex and I
- 00:13:10could watch that over and over again but
- 00:13:12I'll have to talk to you for the next 40
- 00:13:13or so minutes so bear with me but what's
- 00:13:15really interesting about that is we saw
- 00:13:17screenshots we saw screenshots of a
- 00:13:18terminal we saw screenshots of a browser
- 00:13:20we saw screenshots of a text editor we
- 00:13:23fed those screenshots to Claude and
- 00:13:24Claude had the ability to analyze that
- 00:13:26screenshot and figure out the necessary
- 00:13:28command whether that is inputting text
- 00:13:30whether that's looking at a terminal and
- 00:13:32realizing that there's an error what you
- 00:13:34just looked at that idea of a prompt and
- 00:13:36then feeding a screenshot and an action
- 00:13:37and a screenshot and an action and a
- 00:13:39screenshot an action is that agentic
- 00:13:41workflow and again if you'd like to see
- 00:13:42that actually in action come check it
- 00:13:43out at the booth we've got that demo
- 00:13:44running you can kind of play around for
- 00:13:46yourself put in your own prompts and
- 00:13:47explore all the fun things you can do
- 00:13:48with computer use as we shift gears a
- 00:13:51little bit the models that we have in
- 00:13:53our ecosystem Claude 3.5 Sonnet 3.5 Haiku and so
- 00:13:56on and the other ones as well all
- 00:13:57supported in the Bedrock environment and
- 00:13:59what we're really really proud of is the
- 00:14:00ability to leverage some of the
- 00:14:02functionality that Bedrock has out
- 00:14:03of the box from security as well as some
- 00:14:05of the embeddings and fine-tuning
- 00:14:07services that Amazon Bedrock offers
- 00:14:09combined with our models to produce
- 00:14:11really high power generative AI
- 00:14:14applications as we start thinking about
- 00:14:15some of the high-level things in the
- 00:14:17generative AI space and some of the kind
- 00:14:18of most essential building blocks for
- 00:14:20building successful applications first
- 00:14:22thing I want to start with is the idea
- 00:14:23of prompt engineering talking a little
- 00:14:24bit about prompting I'm sure you're all
- 00:14:26very familiar with a prompt and if not I
- 00:14:28will gladly redefine it for you
- 00:14:29prompt is the information you pass into
- 00:14:32a large language model to get a response
- 00:14:34when you're working with a large
- 00:14:35language model in a more conversational
- 00:14:37way this could be something like Claude.ai
- 00:14:38or just a chat bot you're working
- 00:14:40with you have a little bit more luxury
- 00:14:42with going back and forth oh you didn't
- 00:14:44really understand what I meant so here's
- 00:14:45what I want or no actually here's what
- 00:14:47I'm looking for give me this kind of
- 00:14:48thing but when you're dealing with
- 00:14:50prompts in more of an Enterprise or API
- 00:14:52context you really just have one
- 00:14:54opportunity to create a very high
- 00:14:56quality prompt that consists of your
- 00:14:57context your data your conversation
- 00:14:59history examples and so on and many
- 00:15:01times that leads to creating prompts
- 00:15:03that look like this and it's relatively
- 00:15:04scary fortunately it's very colorful so
- 00:15:06at least you have that part but if you
- 00:15:07look at this it's pretty intimidating to
- 00:15:09figure out all the parts of this and
- 00:15:11figure out what goes where and you might
- 00:15:12be looking at this and taking a picture
- 00:15:13furiously and trying to jot it down and
- 00:15:15so on but what's really challenging
- 00:15:17about getting these prompts ready for
- 00:15:18production is actually going from zero
- 00:15:20to one starting with this idea you have
- 00:15:22for an application I'm going to build a
- 00:15:23classifier I'm going to build a
- 00:15:24summarizer and then taking that and
- 00:15:26turning it into this leveraging the best
- 00:15:27practices we have around where the task
- 00:15:30content goes where the dynamic data goes
- 00:15:31where the pre-filled response goes and
- 00:15:33so on so in order to solve that problem
- 00:15:35we have a really lovely tool that we
- 00:15:36call the prompt generator and we're firm
- 00:15:38Believers that as prompt engineering
- 00:15:39grows and evolves there's going to be a
- 00:15:41really strong combination of manual work
- 00:15:44and programmatic work initially this was
- 00:15:46really all done manually with lots and
- 00:15:47lots of iteration we don't believe this
- 00:15:49is something that is going to be 100%
- 00:15:51programmatic because there's really a
- 00:15:52bit of an art and science to it but if
- 00:15:54there's a situation where you need to go
- 00:15:56from zero to one you need to generate a
- 00:15:57prompt come take a look at this
- 00:15:59particular tool that we have at console.anthropic.com
- 00:16:01I also recommend you
- 00:16:03come check out the demos that we have
- 00:16:05for the workbench to get up to speed on
- 00:16:07the prompt generator as well again this
- 00:16:09is not going to solve all of your
- 00:16:10problems the prompt generator is a way
- 00:16:12for you to put in the task at hand and
- 00:16:13have automatically generated prompt with
- 00:16:15best practices in mind but again just
- 00:16:18like with any software that is generated
- 00:16:19for you there's still things you're
- 00:16:20going to have to go in and tweak and
- 00:16:21edit to fit your particular use case
- 00:16:24something that I do recommend you really
- 00:16:25think about when doing prompt
- 00:16:27engineering is really try to leverage
- 00:16:28some of the best best practices you may
- 00:16:29have from software so for those of you
- 00:16:31that are familiar with software
- 00:16:32engineering been in the software
- 00:16:33engineering space you're probably
- 00:16:34familiar with the idea of Version
- 00:16:35Control keeping track of changes to
- 00:16:37files that you've made edits deletions
- 00:16:39modifications new file Creations do the
- 00:16:41same with your prompts as you have new
- 00:16:43prompts as you iterate on prompts you
- 00:16:45want to make sure that you're keeping
- 00:16:46track of the previous ones that you have
- 00:16:48so that you can iterate on them
- 00:16:49appropriately instead of just redoing
- 00:16:52and redoing and redoing again we've got
- 00:16:54a lot of really good stuff on our docs
- 00:16:55around this and I recommend you all take
- 00:16:56a look at the prompt generators to
- 00:16:57really get up and running with high
- 00:16:59quality prompts in general some best
- 00:17:01practices that we have around prompt
- 00:17:02engineering I'm going to show you some
- 00:17:03high-level ones but the big three that I
- 00:17:05really want to focus on again the first
- 00:17:08one's going to seem a bit simplistic but
- 00:17:09you'd be shocked how many times this
- 00:17:10goes awry be clear and direct at the end
- 00:17:13of the day all that these models are are
- 00:17:16tools for predicting the next token the
- 00:17:18next word the next series of text so the
- 00:17:21more that you can give to the model to
- 00:17:23give it some context to have it pay
- 00:17:25attention to what it's seen before to
- 00:17:26figure out what's next the better of a
- 00:17:28response you're going to get I really
- 00:17:30like to draw the parallel to talking to
- 00:17:31a human as well I want you to think of
- 00:17:33the LLM that you're working with as
- 00:17:35someone who has a very very very very
- 00:17:37large broad range of knowledge but
- 00:17:39absolutely no idea what the task is that
- 00:17:40you want to do it's up to you to tell
- 00:17:43the large language model what the task
- 00:17:44is at hand and explain it succinctly
- 00:17:47what does that mean start with a couple
- 00:17:48high-level sentences of the role and the
- 00:17:50high-level task description we recommend if
- 00:17:53there's Dynamic data coming in whether
- 00:17:54that's from something like retrieval augmented
- 00:17:56generation or any variables coming in
- 00:17:58we'll talk about RAG in a little
- 00:17:59bit put that content at the top make
- 00:18:01sure you've got some detailed task
- 00:18:03instructions and then we'll talk a
- 00:18:04little bit about if your particular use
- 00:18:07case requires a little bit more in-depth
- 00:18:09explanation examples are incredibly
- 00:18:11powerful so be clear and direct provide
- 00:18:13examples and then you want to be really
- 00:18:15intentional about the process that you
- 00:18:17want the model to think through I'm sure
- 00:18:19many of you have heard this idea of
- 00:18:20Chain of Thought or thinking step by
- 00:18:21step thinking step by step is a great start
- 00:18:24but just like if I told you I want you
- 00:18:25to think step by step about what's going
- 00:18:27on here you might say all right I'll
- 00:18:28take a few extra seconds and think about it
- 00:18:30but what's even more powerful than the
- 00:18:31think step by step is actually telling
- 00:18:33Claude how to do that thinking if you
- 00:18:35were to explain this to someone new at
- 00:18:37your company even someone senior or an
- 00:18:39intern how would you go ahead and think
- 00:18:40about performing that task really try to
- 00:18:42draw that parallel when thinking about
- 00:18:44prompt
- 00:18:45engineering something you're going to
- 00:18:46see quite a bit is this idea of XML tags
- 00:18:49if you are familiar with HTML it's a
- 00:18:50very similar kind of idea you have an
- 00:18:51open tag and a closed tag with XML you
- 00:18:53can pick the name of that tag the name
- 00:18:56of the tag does not really have any
- 00:18:58significance but you want to be
- 00:18:59intentional about the semantic meaning
- 00:19:01for what that is the purpose of using
- 00:19:03XML tags is to help create organization
- 00:19:06when you have a very very large prompt
- 00:19:07if you have a relatively short prompt
- 00:19:09you don't need to move the needle too
- 00:19:10much with XML tags but as your prompts
- 00:19:12get longer and longer and longer the
- 00:19:14same way that we as humans like
- 00:19:15indentation and whitespace and so on
- 00:19:18Claude likes XML tags you can use other
- 00:19:20delimiters other formats and so on but
- 00:19:22we prefer XML because it's clear and
- 00:19:24token
- 00:19:25efficient we talked a little bit about
- 00:19:27examples when you think about
- 00:19:29providing your examples these are
- 00:19:31essentially one of the most powerful
- 00:19:32tools you can provide because Claude is
- 00:19:35very very good at pattern matching
- 00:19:36especially if there's a particular
- 00:19:37format that you want Claude to adhere to
- 00:19:39you really want to give Claude
- 00:19:40essentially as much information as
- 00:19:42possible that it can figure out what to
- 00:19:44do with its output in general just like
- 00:19:46with humans it's much easier to just
- 00:19:48tell me how to do it and show me what it
- 00:19:50looks like as opposed to long-winded
- 00:19:52explanations of all the things at hand
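The prompt layout described above (a short role and task statement, dynamic content near the top, detailed instructions, examples, and an explicit thinking procedure, all separated by XML tags) can be sketched roughly as follows. This is a minimal illustration, not a recommended template: the function name `build_prompt`, the tag names, and the sample strings are all assumptions made for this example.

```python
def build_prompt(role, task, documents, instructions, examples):
    """Assemble one prompt string, using XML tags to separate the sections.

    The tag names are arbitrary; what matters is that each name describes
    the content it wraps.
    """
    doc_block = "\n".join(f"<document>{doc}</document>" for doc in documents)
    example_block = "\n".join(
        f"<example>\n<input>{question}</input>\n<output>{answer}</output>\n</example>"
        for question, answer in examples
    )
    sections = [
        f"{role} {task}",                                  # high-level role and task
        f"<documents>\n{doc_block}\n</documents>",         # dynamic content near the top
        f"<instructions>\n{instructions}\n</instructions>",
        f"<examples>\n{example_block}\n</examples>",
        # Spell out how to think, not just "think step by step".
        "First list the relevant facts from the documents inside <thinking> tags, "
        "then give the final reply inside <answer> tags.",
    ]
    return "\n\n".join(sections)


# Hypothetical usage with made-up content.
prompt = build_prompt(
    role="You are a support assistant for an online store.",
    task="Answer customer questions using only the documents provided.",
    documents=["Returns are accepted within 30 days of delivery."],
    instructions="Reply in two sentences or fewer and cite the document you used.",
    examples=[
        ("Can I return shoes?", "Yes, within 30 days of delivery."),
        ("Do you price match?", "I could not find that in the documents."),
    ],
)
```

Keeping the builder in one function also makes it easy to put the prompt under version control and diff changes between iterations.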
- 00:19:54so providing examples but being
- 00:19:55intentional with your examples the
- 00:19:57relevance the diversity the quantity you
- 00:19:59really want to make sure that's
- 00:19:59something you have in all of your
- 00:20:01prompts something that's a bit unique
- 00:20:03about Claude something that other large
- 00:20:04language model providers don't
- 00:20:05necessarily offer is the idea of
- 00:20:07pre-filling Claude's response if you're
- 00:20:10familiar with communicating with APIs
- 00:20:12with users and assistants the basic flow
- 00:20:14is that you have a user message that
- 00:20:16could be something that the user types
- 00:20:17that could be a prompt that you have
- 00:20:19programmatically and the assistant is
- 00:20:21what you are getting back from Claude that
- 00:20:22is the response you are getting back
- 00:20:24from the large language model what you
- 00:20:26can do by pre-filling the response is
- 00:20:28essentially put some words into Claude's
- 00:20:30mouth and what that basically means is
- 00:20:32every single response that Claude gives
- 00:20:34can start with something that you as the
- 00:20:37human have
- 00:20:38dictated why might that be useful if
- 00:20:40there's a situation where you want to
- 00:20:42steer Claude's behavior a bit there might
- 00:20:43be a situation where Claude again like
- 00:20:45Maggie mentioned we are a safety first
- 00:20:47company that is a priority to what we do
- 00:20:49and there may be situations where your
- 00:20:50use case is actually getting blocked by
- 00:20:53some of the safety considerations that
- 00:20:54we have Again by using pre-filling the
- 00:20:57response you are not going to jailbreak
- 00:20:58everything by any means but you can put
- 00:21:00some words into Claude's mouth to try to
- 00:21:01give it some context for what it is that
- 00:21:03you're trying to do you can be
- 00:21:05intentional with that to make sure that
- 00:21:06you have a little bit more control over
- 00:21:08the behavior and the formatting so
- 00:21:09pre-filling the response is a very
- 00:21:11common way that you can put some words
- 00:21:12into Claude's mouth and that way Claude can
- 00:21:14essentially just pick up where you as
- 00:21:16the human left off it's really an
- 00:21:18underrated aspect to some of the
- 00:21:19prompting that we can do so we've got a
- 00:21:22good sense of prompt engineering we got
- 00:21:24a good sense of the data that we passed
- 00:21:25to our model to elicit a response we
- 00:21:27talked about being clear and direct we
- 00:21:29talked about some of the most important
- 00:21:30ideas with prompt engineering I want to
- 00:21:32shift gears a little bit to Tool use
- 00:21:33just raise your hand if you're familiar
- 00:21:34with the idea of tool
- 00:21:36use right excellent you can go and come
- 00:21:38back in five minutes so tool use the
- 00:21:40idea here is simply to extend Claude's
- 00:21:42functionality the classic example for
- 00:21:44Tool use is a situation where I might
- 00:21:46ask you something like hey Claude what
- 00:21:47is the weather right now in Las Vegas
- 00:21:50the response that Claude is going to
- 00:21:51give me is most certainly going to be
- 00:21:52something like I'm sorry I don't have
- 00:21:54that information right now thank you or
- 00:21:58maybe I can tell the weather you know in
- 00:22:00August or April of last year or so on
- 00:22:02probably not going to get that but you
- 00:22:03can imagine the information that I want
- 00:22:05at this exact moment is not something
- 00:22:06that Claude has out of the box so tool
- 00:22:09use allows us to instead give our
- 00:22:12application a little bit more awareness
- 00:22:15of other things that we might want to do
- 00:22:17we're not going to give Claude the
- 00:22:18ability to go and find the weather and
- 00:22:20return it to us that is not the purpose
- 00:22:21of tool use the purpose of tool use is
- 00:22:23to simply extend Claude's capability so
- 00:22:27that instead of saying something like I
- 00:22:28don't know what the weather is Claude is
- 00:22:30actually going to say something like hey
- 00:22:32looks like you're trying to find the
- 00:22:33weather looks like your location is Las
- 00:22:36Vegas and since you're based in the US
- 00:22:38I'm going to assume you want that in
- 00:22:40Fahrenheit it's then up to the developer
- 00:22:42of the application to take that response
- 00:22:45and do what they want with it so I'll
- 00:22:47give you an example here you might have
- 00:22:48a prompt what was the final score of a
- 00:22:49SF Giants game on October 28th 2024 this
- 00:22:52is something out of the box that Claude
- 00:22:53does not know this is past our training
- 00:22:54cutoff date so what is Claude normally
- 00:22:56going to say I don't know but instead
- 00:22:59if we give Claude a list of tools and
- 00:23:00I'm going to show you what a tool looks
- 00:23:01like again it's actually not that
- 00:23:03difficult to do and there are many tools
- 00:23:04that can even help you generate and
- 00:23:05validate the tools that you make so with
- 00:23:07this particular list of tools every tool
- 00:23:09has a name every tool has a description
- 00:23:12you might wonder why do you need a name
- 00:23:13why do you give why do you need a
- 00:23:15description because when a prompt comes
- 00:23:16in asking about something like a final
- 00:23:18score of a game if we have a tool that
- 00:23:21is called get score that's related to
- 00:23:23some kind of baseball game Claude is
- 00:23:24going to be able to infer oh that's the
- 00:23:26one that you probably want to use so
- 00:23:28when you start making use of tool use
- 00:23:29you want to have a really good name and
- 00:23:30a really good description because that's
- 00:23:32what Claude is going to use to infer
- 00:23:33what action to take again Claude is not
- 00:23:36going to go to some sports application
- 00:23:39or to a database and get you the score
- 00:23:41all that Claude is going to do is return
- 00:23:42to the developer of the application the
- 00:23:45particular tool and the inputs and then
- 00:23:47Claude just kind of walks away like I've
- 00:23:48done my work another example here you
- 00:23:51might have a situation where you're
- 00:23:52building a chatbot and in this
- 00:23:54particular chatbot you want to have the
- 00:23:55ability for a user to look up something
- 00:23:57in the inventory
- 00:23:59well if you ask Claude for example you
- 00:24:00know how many units of this particular
- 00:24:02SKU or item or so do we have Claude's
- 00:24:04going to say I have no idea but if you
- 00:24:05give it a tool like inventory lookup
- 00:24:08Claude will respond and say something
- 00:24:09like oh it looks like this person is
- 00:24:10trying to find this item and they want
- 00:24:12to know the particular quantity again
- 00:24:14it's up to you as the developer to go to
- 00:24:16a database go to an API go to a service
- 00:24:18do whatever it is that you want why do I
- 00:24:20really want to hammer home this idea of
- 00:24:22tool use because computer use is
- 00:24:24actually just an extension of tool use
- 00:24:27computer use is simply just a bunch of
- 00:24:29tools that we at Anthropic have defined
- 00:24:32for you to use those tools include
- 00:24:35things like taking a look at a
- 00:24:37screenshot and figuring out the
- 00:24:38necessary command taking a look at a
- 00:24:40terminal and figuring out where things
- 00:24:42are and what commands might need to be
- 00:24:44input and executed taking a look at some
- 00:24:47text and figuring out where text needs
- 00:24:49to be written or copied or modified or
- 00:24:50so on so if you have an understanding of
- 00:24:53tool use understanding computer use is
- 00:24:55actually just a very very small jump
- 00:24:57conceptually
- 00:24:58one more visualization for this idea
- 00:25:00again I'll walk you through it you might
- 00:25:02have a situation where you ask how many
- 00:25:03shares of GM can I buy with $500 well
- 00:25:05Claude is going to say I mean I can tell
- 00:25:06you GM maybe a year ago but I don't have
- 00:25:08it right now instead if we give our
- 00:25:10application a tool Claude can then say
- 00:25:12something like oh looks like you're
- 00:25:14trying to use that get stock price tool
- 00:25:16and you've passed in General Motors as
- 00:25:18the argument great our application can
- 00:25:20now go to an API go to some external
- 00:25:22data source fetch that price figure out
- 00:25:25what's necessary and then Claude can
- 00:25:26take it the rest of the way
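The division of labor in the stock example above can be sketched as follows: the model only names a tool and supplies inputs, and the application runs the actual lookup and hands the result back. This is a rough sketch, not an SDK call. The `tool_use` and `tool_result` dictionaries are modelled on the content blocks in Anthropic's Messages API, so check the field names against the current documentation. `get_stock_price`, the price table, and the `toolu_example` id are made up for illustration.

```python
def get_stock_price(ticker: str) -> float:
    """Hypothetical stand-in for a real market-data API call."""
    fake_prices = {"GM": 50.0}  # made-up price, for illustration only
    return fake_prices[ticker]


# The application owns the mapping from tool names to real functions.
TOOL_HANDLERS = {"get_stock_price": get_stock_price}


def handle_tool_request(tool_request: dict) -> dict:
    """Run the tool the model asked for and package the result to send back."""
    handler = TOOL_HANDLERS[tool_request["name"]]
    result = handler(**tool_request["input"])
    return {
        "type": "tool_result",
        "tool_use_id": tool_request["id"],  # ties the result to the request
        "content": str(result),
    }


# What the model hands back: a tool name plus inputs. Nothing has run yet.
model_request = {
    "type": "tool_use",
    "id": "toolu_example",
    "name": "get_stock_price",
    "input": {"ticker": "GM"},
}

tool_result = handle_tool_request(model_request)
```

The `tool_result` is then sent back in the next message so the model can finish the answer, for example by working out how many shares $500 buys at that price.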
- 00:25:29tool use is something that you can do
- 00:25:31with quite high accuracy with a
- 00:25:33tremendous amount of tools in fact this
- 00:25:34is actually much closer to 95 plus and
- 00:25:36hundreds of tools is something that is
- 00:25:38no problem here you just want to be
- 00:25:40mindful that you have accurate and
- 00:25:41reasonable tool definitions just to show
- 00:25:43you what a tool looks like again it
- 00:25:45shouldn't be too scary looking because
- 00:25:47this is really all it is you give it a
- 00:25:49name you give it a description for those
- 00:25:51of you familiar with JSON Schema whether
- 00:25:52it's through typing or API validation or
- 00:25:54so on you can see here in this input
- 00:25:56schema we basically just put in any
- 00:25:59Properties or parameters or arguments
- 00:26:01that that tool should have so the name
- 00:26:03of the tool get weather the description
- 00:26:05get the current weather and then we
- 00:26:07simply just give it some properties of
- 00:26:08things you should look for when using
- 00:26:10that tool so if a prompt comes in and
- 00:26:12someone says what is the weather right
- 00:26:14now Claude's going to basically say
- 00:26:16something like oh it looks like you're
- 00:26:17trying to use the get weather tool but
- 00:26:19the location is required so could you
- 00:26:21tell me what location you are in then
- 00:26:23the user might say something like oh
- 00:26:24sorry I'm in Vegas great Claude has that
- 00:26:27information and then says excellent I
- 00:26:29know the location I know the name of
- 00:26:30this tool okay application here is the
- 00:26:34tool you're trying to use here is the
- 00:26:35input go and do what you want with it so
- 00:26:38if you can kind of build that flow that
- 00:26:39kind of mental model for how tool use
- 00:26:42Works how tools are defined how we work
- 00:26:45with those getting up to the
- 00:26:47understanding of computer use and agents
- 00:26:48and so on it's not too much of a leap
- 00:26:50but you really want to make sure at the
- 00:26:51end of the day you understand what a
- 00:26:53large language model is which hopefully
- 00:26:54we've got that Foundation you want to
- 00:26:56make sure you understand tool use this
- 00:26:58extension passing in these object like
- 00:27:02ideas to Claude to basically interpret
- 00:27:04analyze and then execute a particular
- 00:27:07command we talked a little bit about
- 00:27:09some just good practice for Tool use
- 00:27:11again a simple and accurate tool name
- 00:27:12being clear and direct same exact idea
- 00:27:14from prompt engineering coming right
- 00:27:16back to Tool use being mindful of the
- 00:27:18description what the tool returns how
- 00:27:20the tool is used examples are actually
- 00:27:23less important than having a clear and
- 00:27:25comprehensive explanation so you might
- 00:27:26be tempted to show all these examples
- 00:27:28again with many ideas in prompt
- 00:27:30engineering start with something
- 00:27:31relatively simple just see what
- 00:27:33you get see how it works and then
- 00:27:35iterate from there again if you want to
- 00:27:37take a look at all of these ideas we're
- 00:27:38talking about RAG we're talking about tool
- 00:27:40use computer use we talked a little bit
- 00:27:41about the prompt generator and so on
- 00:27:43we've got tools for improving prompts
- 00:27:44come take a look at those in the booth
- 00:27:46we've got lots of really fun uh quick
- 00:27:47starts and ways for you to
- 00:27:48interact with
- 00:27:50that in general you might think of a
- 00:27:53large application with hundreds and
- 00:27:54hundreds of tools to make sure this is
- 00:27:57working properly to make sure you're
- 00:27:58being really intentional about how this
- 00:28:00is all done you want to think a lot
- 00:28:02about each tool having one particular
- 00:28:04meaning and one particular concern so if
- 00:28:06you're familiar in software engineering
- 00:28:07of the single responsibility principle
- 00:28:09or such you want to kind of follow the
- 00:28:11same idea you really don't want to have
- 00:28:13a tool that tries to do 10 things all at
- 00:28:14once that's why even with computer use
- 00:28:16we don't have one tool called do all the
- 00:28:18computer stuff we have things like a
- 00:28:20computer tool and a bash tool and a
- 00:28:22string edit tool and so on so again you
- 00:28:24want to have one tool that just does one
- 00:28:26thing well
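The weather tool described earlier (a name, a description, and a JSON Schema listing the inputs) might look roughly like this. The dictionary follows the `name` / `description` / `input_schema` layout used for tool definitions in Anthropic's Messages API. The exact property names and the helper `missing_required_inputs` are assumptions for illustration. The helper mirrors the moment in the walkthrough where the location is required but has not been given yet.

```python
# One tool, one concern: it only describes how to ask for current weather.
get_weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a given location.",
    "input_schema": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "City name, for example Las Vegas",
            },
            "unit": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"],
                "description": "Temperature unit to report",
            },
        },
        "required": ["location"],
    },
}


def missing_required_inputs(tool: dict, tool_input: dict) -> list[str]:
    """Return the required input names that have not been supplied yet."""
    required = tool["input_schema"].get("required", [])
    return [name for name in required if name not in tool_input]


# "What is the weather right now" gives no location, so one input is missing.
print(missing_required_inputs(get_weather_tool, {}))
# Once the user says they are in Vegas, nothing is missing.
print(missing_required_inputs(get_weather_tool, {"location": "Las Vegas"}))
```

In practice the model itself notices the missing input and asks the user for it. A check like this is only a safety net on the application side before the real lookup runs.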
- 00:28:30as we shift gears we talked a little bit
- 00:28:31about tool use it's important that we
- 00:28:33understand tool use actually before we
- 00:28:34talk about rag because it is possible
- 00:28:36that you might want to use a tool to
- 00:28:38figure out whether you should do rag or
- 00:28:40not quick show of hands how many of you
- 00:28:42are familiar with the idea of rag or the
- 00:28:43acronym or so on or if I were to call on
- 00:28:45you you could give me a definition all
- 00:28:47those hands went down very quickly but
- 00:28:49that's okay um that's that's all right
- 00:28:50I'll do that for you so let's talk about RAG
- 00:28:53architecture and tips here RAG or
- 00:28:54retrieval augmented generation the
- 00:28:56really cool thing about RAG is it's a
- 00:28:58short and fun sounding acronym the even
- 00:29:00more exciting thing is 99% of the work
- 00:29:02is just that letter r the A and the g
- 00:29:04are just a nice way to make it sound
- 00:29:05kind of cool so retrieval is really
- 00:29:07where all the hard stuff is going to
- 00:29:08happen so we'll talk about what this
- 00:29:10idea is we'll walk through it step by
- 00:29:12step rag is the idea of searching for
- 00:29:14retrieving and adding context what does
- 00:29:16that mean you might have data that is
- 00:29:18proprietary you might have data internal
- 00:29:20to your company you might have data that
- 00:29:22is past the training cutoff date
- 00:29:24information that Claude is not aware
- 00:29:25about you also might have lots and lots
- 00:29:27and lots and lots and lots of documents
- 00:29:29that you can't just stuff into one
- 00:29:30prompt in the context window because the
- 00:29:33context window is not big enough so how
- 00:29:35do we augment language models with
- 00:29:37external knowledge well we take that
- 00:29:39external knowledge and we put it
- 00:29:41somewhere else we then go ahead and
- 00:29:44retrieve that external
- 00:29:46knowledge in order to set up more of a
- 00:29:48professional grade RAG pipeline and so
- 00:29:51on there's a little bit of work that
- 00:29:52needs to happen this idea of
- 00:29:55pre-processing and working and chunking
- 00:29:56with your data is how this all kicks off
- 00:29:59so I want you to imagine a situation of
- 00:30:00you have all of your company's internal
- 00:30:02documents for onboarding new hires and
- 00:30:05so on you can imagine this might be you
- 00:30:07know 50 100 200 500 pages all kinds of
- 00:30:10things of benefits and insurance plans
- 00:30:11and all kinds of internal information
- 00:30:13that Claude is not aware of again we
- 00:30:16can't take all that stuff and stuff it
- 00:30:17into a prompt and ask it questions the
- 00:30:19context window is not big enough so what
- 00:30:21do we do we take that data and we break
- 00:30:24up all of that data into smaller chunks
- 00:30:27this is a process very commonly called
- 00:30:28chunking you take your data whether it's
- 00:30:30text whether it's image whether it's
- 00:30:32video whether it's audio however it may
- 00:30:33be and we take that data and we turn it
- 00:30:35into what's called an embedding an
- 00:30:37embedding is just a really really fancy
- 00:30:39way of saying a long list of
- 00:30:42numbers or floating point numbers the
- 00:30:44purpose of this is at the end of the day
- 00:30:46we have to take this text and get down
- 00:30:48to numbers that's how models work as
- 00:30:49well we take this text get it down to
- 00:30:51numbers make our way up to a bunch of
- 00:30:53probabilities and then figure out the
- 00:30:55next
- 00:30:56token the reason why we use these
- 00:30:58embeddings is because embeddings allow
- 00:31:01us to perform semantic or similarity
- 00:31:04searches why do we want to do something
- 00:31:06like that well let's imagine you go
- 00:31:08ahead and you take some internal
- 00:31:09anthropic documents and they're about me
- 00:31:12you might ask a question like who is
- 00:31:13Ellie shopic you might ask a question
- 00:31:15like who is that Ellie person whose last
- 00:31:17name I can't pronounce or who is that
- 00:31:19Ellie person who does not stop talking
- 00:31:20about tool use and rag they're all
- 00:31:22actually the same question but they're
- 00:31:23all worded differently which means that
- 00:31:25when we go ahead and we try to retrieve
- 00:31:27documents related to whatever that may
- 00:31:29be we can't just use an exact search so
- 00:31:32what we do is we take our text we break
- 00:31:34it down into embeddings in order to do
- 00:31:37that we make use of an embedding model
- 00:31:39which is actually a different kind of
- 00:31:40model that instead of trying to Output
- 00:31:42the next token or so on it actually
- 00:31:43outputs a bunch of embeddings those
- 00:31:45embeddings those numerical
- 00:31:47representations refer to a particular
- 00:31:49piece of meaning in a certain
- 00:31:51dimensional space so you might have 500
- 00:31:54300 2,000 5,000 Dimensions with each
- 00:31:57particular vector representing a
- 00:31:59particular part of a semantic meaning of
- 00:32:01some text we take all that text and we
- 00:32:04store it externally very commonly that
- 00:32:06is done in a vector store so you might
- 00:32:08have heard of vector databases or so on
- 00:32:10these are essentially data stores for
- 00:32:11all of our
- 00:32:13embeddings before we can do any of the
- 00:32:15retrieval and so on we got to do that
- 00:32:16work very commonly this is part of your
- 00:32:18rag pipeline or your pre-processing part
- 00:32:21of data this is where there's actually a
- 00:32:22decent amount of engineering and
- 00:32:24trickery that happens how do I make sure
- 00:32:26I'm using the right embedding model for my
- 00:32:27use case how do I make sure I'm chunking
- 00:32:29appropriately and so on that's a lot of
- 00:32:30work fortunately there are tools
- 00:32:32especially in the Amazon Bedrock
- 00:32:34ecosystem like Amazon Bedrock Knowledge
- 00:32:36Bases which are wonderful tools for
- 00:32:38helping you with a lot of that work so
- 00:32:40you could do that engineering Yourself
- 00:32:41by all means but there are also many
- 00:32:43kinds of infrastructure as a service
- 00:32:44platforms to help you with
- 00:32:46that so before we do any of that
- 00:32:48retrieval we got to do that work that
- 00:32:50chunking that pre-processing getting
- 00:32:51that stuff in a vector database once
- 00:32:53we've got it all in a vector database
- 00:32:54and we feel great about our embeddings
- 00:32:57now we can start going ahead and
- 00:32:58retrieving
- 00:33:00information and here is how that
- 00:33:02traditionally works we might have a
- 00:33:03situation where a user asks a question I
- 00:33:05want to get my daughter more interested
- 00:33:06in science what kind of gifts should I
- 00:33:08get her we then take that query we then
- 00:33:11take that information and we embed it we
- 00:33:13turn it into an embedding we turn it
- 00:33:14into a list of a bunch of floating point
- 00:33:16numbers and then we go ahead and we take
- 00:33:18that embedding and we go back to our
- 00:33:20Vector database and we say what similar
- 00:33:22results do I have what similar search
- 00:33:24results do I have for that particular
- 00:33:26thing that is the retrieval part once we
- 00:33:29have those results we then augment our
- 00:33:32prompt a little bit with some Dynamic
- 00:33:34information and generate a response so
- 00:33:37that's the A and the g in rag that's the
- 00:33:39that's the quick and easy part but that
- 00:33:40r that retrieval that process of making
- 00:33:42sure that the data somewhere is living
- 00:33:45in some store where I can search by
- 00:33:46similar meanings or hybrid searches or
- 00:33:48so on and I can get a result that's
- 00:33:50where it gets a little bit more
- 00:33:52interesting to give you a little bit of
- 00:33:54a visualization of this rag architecture
- 00:33:55and again if you want to see this in
- 00:33:56action come take a look at our demo booth
- 00:33:58we got some really cool quick starts of
- 00:33:59actual RAG in an application using tools
- 00:34:01like knowledge bases we're going to take
- 00:34:03that question we're going to embed it
- 00:34:05we're going to turn it into that long
- 00:34:07list of floating point numbers we're
- 00:34:09then going to go ahead and execute some
- 00:34:10kind of similarity search that
- 00:34:12similarity search can be a variation of
- 00:34:14a couple algorithms you might be
- 00:34:15familiar with Manhattan distance or
- 00:34:17Euclidean distance or cosine similarity or
- 00:34:19dot product it's really just trying to
- 00:34:21find two similar points in some kind of
- 00:34:23dimensional space once we find those
- 00:34:26similar results we can go ahead and take
- 00:34:29those results that we got back and
- 00:34:30augment our prompt and generate a
- 00:34:32completion so again that R in retrieval
- 00:34:35is really where the tricky part
- 00:34:37happens the reason why I brought up tool
- 00:34:39use earlier is because you could imagine
- 00:34:42a situation where you ask Claude
- 00:34:44something like who was the president of
- 00:34:46the United States in
- 00:34:481776 well Claude doesn't need to do
- 00:34:50something like ah let me go ahead and
- 00:34:51embed that and go to the vector database
- 00:34:53and so on that's something that Claude
- 00:34:54should hopefully know and I feel pretty
- 00:34:55good about that one so Claude's just going
- 00:34:57to give you a
- 00:34:58response so instead of jumping to
- 00:35:00something like RAG maybe you just want
- 00:35:02to use Claude's knowledge out of the box
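The retrieval flow walked through above (embed the query, run a similarity search over the stored chunk embeddings, then augment the prompt with the closest matches) can be sketched in a few lines. One large caveat: `toy_embed` is a word-count stand-in chosen so the example runs with no external services. A real embedding model produces dense vectors that capture meaning, which word counts do not. The function names, the sample chunks, and the prompt wording are all assumptions for illustration.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in for an embedding model: word counts, not learned vectors."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse vectors (1.0 = same direction)."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """The R: rank stored chunks by similarity to the query, keep the best."""
    query_vector = toy_embed(query)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vector, toy_embed(chunk)),
        reverse=True,
    )
    return ranked[:top_k]


def augment_prompt(query: str, retrieved: list[str]) -> str:
    """The A: place the retrieved chunks in the prompt ahead of the question."""
    context = "\n".join(f"<document>{chunk}</document>" for chunk in retrieved)
    return f"<documents>\n{context}\n</documents>\n\nQuestion: {query}"


# Hypothetical store contents; a real pipeline would embed these once, up front.
chunks = [
    "science kits make great gifts for curious kids",
    "our return policy lasts thirty days",
    "shipping takes five business days",
]
query = "what science gifts should I get my daughter"
best = retrieve(query, chunks, top_k=1)
final_prompt = augment_prompt(query, best)
```

A production pipeline would swap `toy_embed` for an embedding model and the in-memory list for a vector store. The generation step is then an ordinary model call with `final_prompt`.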
- 00:35:05this is where tool use can be very
- 00:35:06helpful you might have a situation where
- 00:35:08a query comes in and Claude basically
- 00:35:11tries to figure out if it knows the
- 00:35:12answer or not and you can even through
- 00:35:15prompt engineering be very intentional
- 00:35:16about what it means to know or not know
- 00:35:18and if Claude says something like I
- 00:35:20don't know well then go ahead let's go
- 00:35:22use that tool and we probably got to go
- 00:35:23find something or hey if a question
- 00:35:25comes in and it's past your knowledge
- 00:35:26cutoff date or if a question comes in
- 00:35:28about some document that you do not
- 00:35:30understand go ahead and use that tool
- 00:35:32then we'll go do all the retrieval part
- 00:35:34of things cuz you can imagine taking
- 00:35:36your data embedding it turning into a
- 00:35:38list and so on it's going to be a little
- 00:35:39bit time consuming it's going to
- 00:35:40require some engineering and so on so if
- 00:35:42you don't have to do that that would be
- 00:35:44ideal what we also see a lot with
- 00:35:46production grade rag pipelines is it's
- 00:35:48not as simple as just having one vector
- 00:35:50data store for one particular set of
- 00:35:52data you can imagine a very very large
- 00:35:54scale system is doing quite a bit of
- 00:35:56embeddings over quite a few different
- 00:35:57kinds of databases and data stores so
- 00:36:00depending on your particular use case
- 00:36:02you might not just stuff everything in
- 00:36:04one vector data store you might actually
- 00:36:05have multiple ones with multiple tools
- 00:36:07trying to figure out how to interact
- 00:36:09with your application and generate the
- 00:36:11correct
- 00:36:13completion another really powerful tool
- 00:36:15that I want to point you towards and
- 00:36:16we'll talk a little bit more about this
- 00:36:16with the idea of contextual retrieval is
- 00:36:18the idea of actually being able to use a
- 00:36:20model to rewrite your query the classic
- 00:36:23example here is you might have some kind
- 00:36:24of customer support situation where I
- 00:36:27say something like my username is Ellie
- 00:36:29and I just want to let you know I am
- 00:36:30pissed this service is terrible I am
- 00:36:32very unhappy I've been waiting for 3
- 00:36:33hours to talk to a human and all I get
- 00:36:35is this chatbot I am feeling miserable
- 00:36:37at the end of the day what we really
- 00:36:39need to do is just look up Ellie so can
- 00:36:41we instead of taking all that data and
- 00:36:43all that you know frustration that Ellie
- 00:36:44has maybe we'll handle that in a
- 00:36:46different way instead of taking that and
- 00:36:47embedding it and trying to find similar
- 00:36:49searches to a frustrated customer can we
- 00:36:51instead rewrite the query just to focus
- 00:36:53on the information that we need this is
- 00:36:56very commonly done by kind of throwing in
- 00:36:58a smaller model that can do a little bit
- 00:36:59more of that classification or rewriting
- 00:37:02to essentially get you better results so
- 00:37:04with rag with these ideas of rag
- 00:37:06pipelines and so on there's so much
- 00:37:07complexity that you can start throwing
- 00:37:09on top but my goal here for this session
- 00:37:11is just to make sure you have at a high
- 00:37:12level an understanding of what it is how
- 00:37:14it works why you might want to use it
- 00:37:15for your particular use
- 00:37:17case when you think about that chunking
- 00:37:19process we talked a bit about breaking
- 00:37:21your data into smaller sections I want
- 00:37:23to point you towards this format that we
- 00:37:25recommend and again you can take a look
- 00:37:26at our documentation for more examples
- 00:37:27of this but what's really important when
- 00:37:29you break up your data is that you give
- 00:37:31it some kind of metadata so that we can
- 00:37:34search for it not only by the
- 00:37:35information but also retrieve some
- 00:37:37metadata so you can see up here we have
- 00:37:39this documents tag and inside we have a
- 00:37:41document subtag with an index of one
- 00:37:44those kind of attributes that we're
- 00:37:46adding here you can treat as metadata or
- 00:37:48higher level information about the
- 00:37:50particular document that could be very
- 00:37:51helpful when doing retrieval to get
- 00:37:53things like information aside from the
- 00:37:55source that I might want to know some
- 00:37:56unique identifier maybe the author of that
- 00:37:58particular piece or such in your prompt
- 00:38:01you can then refer to those by their
- 00:38:02indices or their metadata for that kind of
- 00:38:05filtering so you'll see ideas around
- 00:38:06metadata filtering as well to improve
- 00:38:08General retrieval
- 00:38:11performance what's really interesting
- 00:38:12about RAG I would say at this point is it
- 00:38:15is not going anywhere but it is
- 00:38:16constantly constantly constantly
- 00:38:18evolving the first thing that I want to
- 00:38:19talk about that is really kind of
- 00:38:20leading us to a slight change in the way
- 00:38:23that we think about rag is a tool that
- 00:38:24we have had in our first party API but
- 00:38:26that we actually just released with
- 00:38:28Amazon Bedrock that is the idea of
- 00:38:29prompt caching prompt caching is the
- 00:38:32idea of instead of taking some document
- 00:38:35and putting it in your prompt and then
- 00:38:37on each conversation we got to generate
- 00:38:39all the tokens again for that particular
- 00:38:41document that's going to get a little
- 00:38:43expensive that's going to be a bit time
- 00:38:44consuming can we instead explicitly say
- 00:38:47I'm going to give you these documents I
- 00:38:49want you to cash these documents they
- 00:38:51are not going to change these are static
- 00:38:52pieces of information and then on all
- 00:38:54subsequent prompts don't go ahead and
- 00:38:56regenerate all the tokens for that just
- 00:38:58find them in the cache if you're
- 00:39:00familiar with caching with any kind of
- 00:39:02architecture it's a very similar kind of
- 00:39:03idea find and store this information in
- 00:39:06a quick retrieval process because it is
- 00:39:08not going to change anytime
- 00:39:10soon why is this meaningful with rag
- 00:39:13because instead of taking documents and
- 00:39:15chunking them and embedding them and
- 00:39:17storing them somewhere you can actually
- 00:39:18take your documents and put them in the
- 00:39:20prompt itself and you can cash those
- 00:39:22documents to then retrieve information
- 00:39:25very very quickly without doing all the
- 00:39:26embedding and so on obviously context
- 00:39:29window size is going to take play here
- 00:39:31so if you have a lot a lot a lot of
- 00:39:32documents you still can't do that but at
- 00:39:34the same time context windows are vastly
- 00:39:37improving in size compared to where we
- 00:39:39were 6 months ago a year ago two years
- 00:39:41ago we are nearing worlds where we will
- 00:39:42have millions tens of millions of tokens
- 00:39:44in context windows so when you think
- 00:39:46about combining prompt caching taking a
- 00:39:48tremendous amount of your documents
- 00:39:50caching them and then leveraging a very
- 00:39:52very large context window you can
- 00:39:54potentially avoid the need for having to
- 00:39:55do that chunking and so on so right now
- 00:39:58if your data is of a medium size you
- 00:39:59really got to jump to rag if your data
- 00:40:01is of a large size you got to jump to
- 00:40:02rag even a smaller size depending on
- 00:40:04what you're working with can we instead
- 00:40:06as we start to see prom caching get more
- 00:40:08and more widespread as we start to get
- 00:40:09context windows that are even larger can
- 00:40:11we start to shift that a little bit
- 00:40:13towards not jumping for rag not needing
- 00:40:15to worry about these massive pipelines
- 00:40:17and infrastructure and instead just
- 00:40:18putting that in the context of the
- 00:40:20prompt what's interesting as well we
- 00:40:22talked about Vector databases talked
- 00:40:24about this idea of embeddings and
- 00:40:25embedding models but there's always a
- 00:40:27lot of research there's always a lot of
- 00:40:29questions for is this the best approach
- 00:40:30is this really what we want to do
- 00:40:32there's a lot of really interesting
- 00:40:33emerging research for using different
- 00:40:34data structures for storing the kind of
- 00:40:37retrieval that you want so instead of
- 00:40:39using a vector database a list of large
- 00:40:41floating Point numbers can we instead
- 00:40:43use a different data structure like a
- 00:40:44graph can we instead think of our data
- 00:40:46as just a series of nodes interconnected
- 00:40:49by edges and when we try to retrieve our
- 00:40:51data find those similar nodes with more
- 00:40:53meaning as opposed to a general semantic
- 00:40:56search while this is not yet something
- 00:40:58that's very very large in production
- 00:40:59there's a lot of really interesting
- 00:41:00research around this idea of graph Rag
- 00:41:02and knowledge graphs that may lead us to
- 00:41:04potentially not have to use Vector
- 00:41:05stores and get better performance with
- 00:41:07retrieval we going talk a little bit
- 00:41:09soon about the idea of contextual
- 00:41:11retrieval this is really interesting
- 00:41:12research that we put out and again for
- 00:41:14those of you that have been to the booth
- 00:41:15you know that we have a section on
- 00:41:16Research as well so I welcome you to
- 00:41:17come take a look at that chat about the
- 00:41:18research if you're interested especially
- 00:41:20in things like interpretability how our
- 00:41:22models are behaving from the inside out
- 00:41:24trying to make sense of that contextual
- 00:41:26retrieval that I'll talk about we got a
- 00:41:27lot of really good uh good folks to talk
- 00:41:28to about that kind of
- 00:41:30stuff as we mentioned embedding models
- 00:41:32are constantly changing there are a wide
- 00:41:34variety of embedding models for all
- 00:41:35different kinds of providers from open
- 00:41:37source to commercial providers for all
- 00:41:38different kinds of dimensions and
- 00:41:39pricing and so on these are always
- 00:41:41getting better it's also a very very
- 00:41:43large world in the reranking space I'll
- 00:41:45talk a little bit about reranking your
- 00:41:46results when you get them back we're
- 00:41:48also seeing a lot of improvement for
- 00:41:50measuring the effectiveness of rag
- 00:41:52through evaluations this can be done
- 00:41:54using platforms that are Enterprise
- 00:41:55grade this could also be done using
- 00:41:57using Open Source Products like promp Fu
- 00:41:59there's also an entire evaluation
- 00:42:00framework called rag ass or rag
- 00:42:02assessments that basically will analyze
- 00:42:04the relevance the accuracy really
- 00:42:06important metrics for whether you're
- 00:42:07doing a good job or not with your rag
- 00:42:10pipeline again I'll add a really
- 00:42:11important bullet point here because in a
- 00:42:13second we'll talk about fine-tuning
- 00:42:15other techniques like model distillation
- 00:42:16to try to teach the model new things and
- 00:42:18extract more knowledge and introduce
- 00:42:20Behavior change but before you jump to
- 00:42:21any of that even before you try to go
- 00:42:24crazy with your rag pipeline think a lot
- 00:42:26about prompting can get a lot a lot a
- 00:42:28lot of wins with very minimal
- 00:42:29engineering and effort through prompting
- 00:42:31so even with Rag and other options there
- 00:42:33are always
- 00:42:34optimizations I mentioned a little bit
- 00:42:36about this idea of contextual retrieval
- 00:42:38so I want to point you towards this idea
- 00:42:40and research that we have here instead
- 00:42:42of Performing the traditional approach
- 00:42:44like we mentioned of taking your Corpus
- 00:42:45of data and breaking it up into chunks
- 00:42:48before you break it up into chunks what
- 00:42:50we're going to do is we're actually
- 00:42:52going to bring cloud back in the mix
- 00:42:54we're going to run a little bit of
- 00:42:56prompting on each of those chunks to
- 00:42:59provide some context we're going to give
- 00:43:00it 50 or 100 or so extra tokens which
- 00:43:03reference the context of that chunk
- 00:43:05what's a classic example here let's say
- 00:43:07we take a a large 10k of a very very
- 00:43:10large publicly traded company maybe
- 00:43:12let's go with Amazon that seems relevant
- 00:43:14we go ahead and we take our giant 10K
- 00:43:17and we go ahead and we chunk it and one
- 00:43:18of the sections in that chunk is
- 00:43:20something like Revenue increased 37%
- 00:43:22while cost decreased by 12% and
- 00:43:24operating income was up 63%
- 00:43:27seems like a reasonable thing we can
- 00:43:28chunk but if someone goes and searches
- 00:43:31for that particular chunk and they say
- 00:43:32how's the company doing from a operating
- 00:43:35perspective we don't know if that chunk
- 00:43:37is is that the last quarter is that a
- 00:43:38forecast for the next quarter was that
- 00:43:40from last year what does that refer to
- 00:43:42what is the context of that information
- 00:43:44within the scope of the document and the
- 00:43:46goal of contextual retrieval is simply
- 00:43:48just to add a little bit more context to
- 00:43:50each of those chunks so that when we
- 00:43:52retrieve we can get much more accurate
- 00:43:54results we don't just get a block of
- 00:43:55text we get a block of text with a
- 00:43:57little bit of context for what it refers
- 00:43:59to you can take a look in our research
- 00:44:01for what this prompt looks like how it's
- 00:44:03run the different kinds of searches that
- 00:44:05we have you can also see here that
- 00:44:07instead of just using an embedding model
- 00:44:09we're actually doing what's called a
- 00:44:10hybrid search so that we're performing
- 00:44:12the semantic search but we're also
- 00:44:13performing other popular kinds of
- 00:44:15searches a very common one called bm25
- 00:44:17or best match 25 it's kind of that TF
- 00:44:20IDF similarity search right there so
- 00:44:22when performing these kinds of large rag
- 00:44:24pipelines and pre-processing we think a
- 00:44:26lot of about not only what we're going
- 00:44:28to store but also how we're going to
- 00:44:30retrieve and for those of you that have
- 00:44:31questions I welcome you to ask those and
- 00:44:33also take a look at our uh our booth if
- 00:44:35you have questions on those so we talked
- 00:44:37a bit about tool use we talked a bit
- 00:44:39about prompting we talked a bit about
- 00:44:42the idea of taking llms and extending
- 00:44:44their functionality we saw some really
- 00:44:45awesome demos of how you can essentially
- 00:44:47use cloud to analyze screenshots and
- 00:44:49perform actions and so on and what that
- 00:44:51really leads us to is shifting from just
- 00:44:54the tool to the eventual teammate that
- 00:44:56you can work
- 00:44:58with as the complexity of your use case
- 00:45:01grows the technical investment you need
- 00:45:03to make increases as well you can
- 00:45:05imagine you might have something like a
- 00:45:07classification task tell me if this is
- 00:45:08Spam tell me if this is not spam tell me
- 00:45:10if this is hot dog tell me if this is
- 00:45:11not hot dog that's a relatively easier
- 00:45:14task in the scheme of things we provide
- 00:45:16enough examples we have enough
- 00:45:17intelligence to do so we can lean on
- 00:45:19models that maybe are a little bit more
- 00:45:20cost effective potentially easier on the
- 00:45:22latency side of things because we can
- 00:45:24solve those we move on to summarization
- 00:45:27our question and answer but the second
- 00:45:29that we really start to stretch our
- 00:45:30imagination to what we can do with these
- 00:45:32tools like models taking independent
- 00:45:34actions on their own models being given
- 00:45:37a task and then models required to plan
- 00:45:40out an action remember previous actions
- 00:45:43with some kind of memory course correct
- 00:45:45when things go wrong make use of a wide
- 00:45:48variety of tools follow very complex
- 00:45:50flows of instruction that's where we
- 00:45:52really start to shift things more
- 00:45:54towards this idea of Agents
- 00:45:57what is an agent what is an llm agent
- 00:45:59it's a system that combines large
- 00:46:00language models with the ability to take
- 00:46:02actions in the real world or digital
- 00:46:03environments and what do they include
- 00:46:06what we have right here I'll give you
- 00:46:07the very high level definition and we'll
- 00:46:09talk about some of the diagrams with a
- 00:46:10bit more interest at the end of the day
- 00:46:13I want you to think of an agent as just
- 00:46:14three things a model in this case Claude
- 00:46:1635 Sonet a bunch of tools or functions
- 00:46:20that the agent can use I'm going to give
- 00:46:22you a tool like the ability to search
- 00:46:23the web I'm going to give you a tool
- 00:46:25like the ability to analyze a screen
- 00:46:26screenshot I'm going to give you a tool
- 00:46:28like the ability to take a look at the
- 00:46:29terminal and figure out what command to
- 00:46:31pass
- 00:46:32in and then we're going to go ahead and
- 00:46:34give it a goal and then we're just going
- 00:46:37to let it let it go what do we mean by
- 00:46:39Let It Go essentially if you're familiar
- 00:46:42with python if you look at our computer
- 00:46:43reference documentation it's while true
- 00:46:46it's an infinite Loop go ahead and take
- 00:46:48the next bit of data and execute on it
- 00:46:51take the next bit of data execute on it
- 00:46:52take the ne next bit of data execute on
- 00:46:54it again and again and again and again
- 00:46:56so I'm going to give you the goal of
- 00:46:57trying to find something it's probably
- 00:46:59going to require 5 10 15 20 steps and
- 00:47:01just like a human would the human would
- 00:47:03probably plan what do I need to do I'm
- 00:47:05probably going to have to open up the
- 00:47:05browser and search for something and
- 00:47:07then going to go ahead and cross
- 00:47:08reference that with something that I
- 00:47:09have in a document and then I'm going to
- 00:47:10take that information and I'll email it
- 00:47:11to someone and then maybe I'll go ahead
- 00:47:13and go back to my text editor and make
- 00:47:15that change and push that up to GitHub
- 00:47:16and submit a poll request and then
- 00:47:17communicate with my team lead about the
- 00:47:19change I made these are all things that
- 00:47:21when I say it very quickly you're like
- 00:47:22wow that's a lot but think about what we
- 00:47:24do every day many many different kinds
- 00:47:26of tasks very very quickly all in
- 00:47:28sequence we have the ability to plan out
- 00:47:30what we want to do we have the tools at
- 00:47:32our disposal or at least Claude and the
- 00:47:34internet to figure out the tools that we
- 00:47:36need and we have the memory to remember
- 00:47:38what needs to be
- 00:47:39done when you think about agents and
- 00:47:41agentic workflows it's really just an
- 00:47:43extension of all the things you've seen
- 00:47:45before it's an extension of tool use
- 00:47:48it's an extension of prompting it's
- 00:47:50combining those together to perform
- 00:47:52tasks just like we do in the real world
- 00:47:55instead of a single task instead of a
- 00:47:56sing single turn we're talking about
- 00:47:58longer multi- turn
- 00:47:59examples I really like this slide
- 00:48:01because it drives a strong analogy for
- 00:48:03those of you familiar with building web
- 00:48:04applications essentially what it looks
- 00:48:06like with building agents the same way
- 00:48:08that you might start with building a
- 00:48:09static page you might have some HTML
- 00:48:11some CSS some JavaScript in the context
- 00:48:14of an agent that's a pretty
- 00:48:15straightforward
- 00:48:16conversation as you start thinking about
- 00:48:18interactivity as you start adding
- 00:48:20JavaScript and you're handling clicks
- 00:48:21and you're handling interactions this is
- 00:48:23where you know the prompts have to get a
- 00:48:25little bit more complex so the same way
- 00:48:27that you think about expanding into the
- 00:48:29world of agents and leveraging more in
- 00:48:30the geni space really try to draw that
- 00:48:32analogy to software other things that
- 00:48:34you might be familiar with what's the
- 00:48:36business use case what am I use what do
- 00:48:37my users need what am I trying to build
- 00:48:39for at the end of the day my application
- 00:48:41grows larger I got to start separating
- 00:48:43JavaScript files or whatever language
- 00:48:45that I'm working with at this point if
- 00:48:48you're doing this in a generative
- 00:48:49generative AI world you know we're we're
- 00:48:51improving the prompts we're not getting
- 00:48:52too crazy as we start thinking more
- 00:48:55about building a website as we start
- 00:48:57moving to Frameworks as we start
- 00:48:58thinking about breaking things into
- 00:48:59microservices and distributed systems
- 00:49:01and so on as we start reaching larger
- 00:49:03scale that's where I want you to kind of
- 00:49:05draw the parallel to this is where
- 00:49:07agents can really come into play this is
- 00:49:09where the idea of agentic workflows come
- 00:49:10in so what's really nice about this just
- 00:49:12from a visual perspective is if you're
- 00:49:14familiar with things like software
- 00:49:15you're familiar with the analogy of
- 00:49:17building a website and so on you can
- 00:49:19draw that parallel to where things like
- 00:49:21Agents come into the
- 00:49:23mix we're still relatively early in the
- 00:49:25world of agents in terms terms of how we
- 00:49:27think about memory how we think about
- 00:49:28planning but what's really really
- 00:49:30powerful about these models and
- 00:49:31something that even Maggie spoke a
- 00:49:32little bit about some of the benchmarks
- 00:49:34that we're seeing with model performance
- 00:49:36especially with 35 Sonet and even 35 Hau
- 00:49:38on the coding front make us more and
- 00:49:40more confident that it can perform the
- 00:49:42tasks necessary that agents need to do
- 00:49:45so what makes the model the right choice
- 00:49:46for agentic workflows it's got to be
- 00:49:48really really good at following
- 00:49:50instructions and just like humans some
- 00:49:52models are not great at following lots
- 00:49:54and lots and lots of instructions if I
- 00:49:55were to give the model hundreds and
- 00:49:56hundreds of tasks to do could it
- 00:49:58remember those tasks could it handle it
- 00:49:59in the appropriate order could it figure
- 00:50:01out when something went wrong and go
- 00:50:02back and correct itself so what's really
- 00:50:05exciting about this kind of agentic
- 00:50:06workflow and situation that we're in is
- 00:50:08that we're also starting to develop
- 00:50:09benchmarks that can really determine the
- 00:50:12effectiveness of these models for those
- 00:50:14of you that are interested in digging in
- 00:50:15that a little bit more I really
- 00:50:16recommend taking a look at sbench or swe
- 00:50:19bench this is a benchmark that basically
- 00:50:22takes a model takes a open source
- 00:50:24repository so it could be a very large
- 00:50:26codebase something like jeno or very
- 00:50:28large python code base and it basically
- 00:50:31puts the model in a situation where it
- 00:50:33says here's the code base here is the
- 00:50:35issue go and write the poll request go
- 00:50:37and write the code necessary to get
- 00:50:39tests to pass and also make sure that
- 00:50:41things don't break these kinds of
- 00:50:43benchmarks are really the foundation for
- 00:50:45how we can determine how we can feel
- 00:50:46confident that the models don't just
- 00:50:48understand ideas or answer highle
- 00:50:50questions about code but actually
- 00:50:51perform things similar to what humans do
- 00:50:53so definitely recommend on a lot of our
- 00:50:55documentation a lot of the kind of model
- 00:50:57cards that we have taking a look at s
- 00:50:58bench and then another one tow bench for
- 00:51:00following particular actions really
- 00:51:02really interesting data source for how
- 00:51:04we think about the effectiveness of
- 00:51:05these
- 00:51:07models the last piece I want to talk
- 00:51:08about is an idea called fine-tuning I'll
- 00:51:10also talk a little bit about the idea of
- 00:51:11model
- 00:51:12distillation we're in a situation where
- 00:51:15prompting is not getting us where we
- 00:51:17need to go rag is not getting us where
- 00:51:19we need to go so there are other options
- 00:51:22for trying to improve the performance of
- 00:51:25your model it always sounds very
- 00:51:27exciting when you say some improve the
- 00:51:28performance of your model so see a lot
- 00:51:29of heads go up of like cool how do I do
- 00:51:31that and it's very tempting to look at
- 00:51:32this and say ah fine tuning is the
- 00:51:34answer but fine-tuning is just one way
- 00:51:36to try to solve a certain problem so I
- 00:51:37want to do my best to kind of give you a
- 00:51:39little bit of an overview of what
- 00:51:41fine-tuning is how it works in a
- 00:51:43nutshell fine tuning something you can
- 00:51:45do through the Bedrock interface with
- 00:51:47Hau 3 it's the ability to take a curated
- 00:51:51data set this is also called supervised
- 00:51:53fine-tuning so if you're familiar with
- 00:51:54that idea in machine learning we are
- 00:51:56essentially building a curated set of
- 00:51:58data of inputs and outputs that we would
- 00:52:02like the model to respond we're
- 00:52:03basically giving it the question and the
- 00:52:04answer so this is not unstructured we
- 00:52:06are not giving it a question and let
- 00:52:08letting it figure out the answer we are
- 00:52:09basically going to curate a highquality
- 00:52:11data set which again that's going to
- 00:52:13take time that's going to take effort
- 00:52:14that's going to take a lot of thought
- 00:52:15into what data you can use to move
- 00:52:17things forward and what we're going to
- 00:52:19do is we're going to take that long long
- 00:52:21long list of inputs and outputs and we
- 00:52:24essentially are going to take our base
- 00:52:25model this could to be something like
- 00:52:27hiq 3 and we're going to go ahead and
- 00:52:31run that data through a training set
- 00:52:35using Hardware this could be some of the
- 00:52:36hardware that Amazon provides and we're
- 00:52:38going to go ahead and we're going to
- 00:52:39come out with a custom model so what
- 00:52:41we're actually going to do we're update
- 00:52:42the underlying weights of the model and
- 00:52:44in the context of fine tuning there are
- 00:52:45many different kinds of aspects for
- 00:52:47updating certain kinds of weights or
- 00:52:49parameter efficient weights and so on
- 00:52:50but what we're actually doing is
- 00:52:52updating the underlying model weights to
- 00:52:54then produce a custom model
- 00:52:57we then evaluate that custom model and
- 00:52:59hope that it does better at a particular
- 00:53:01task the thing you want to be really
- 00:53:03careful of with fine-tuning just looking
- 00:53:05at all of these data is really hard to
- 00:53:07come by that's high quality you can
- 00:53:09introduce data to your model and your
- 00:53:11model can become worse so fine tuning is
- 00:53:13not a guarantee everything gets better
- 00:53:14right away that's why we have a applied
- 00:53:16AI team that focuses strictly on fine
- 00:53:18tuning at our company you want to be
- 00:53:20mindful of the model that you're working
- 00:53:21with you use hiou 3 and then another
- 00:53:23model comes out that is far more
- 00:53:25intelligent and then all of a sudden you
- 00:53:26have a custom model and you can't just
- 00:53:27undo this so the math right here doesn't
- 00:53:30really allow for subtraction or undoing
- 00:53:32once you go ahead and you run that
- 00:53:34training and you spend the money in the
- 00:53:35compute you got that custom model you
- 00:53:37can go ahead and do this over and over
- 00:53:38again but this is not a reversible
- 00:53:41decision when should you consider
- 00:53:43something like
- 00:53:44fine-tuning the most common use case I
- 00:53:47want you to think about with fine-tuning
- 00:53:48is introducing Behavior change you want
- 00:53:51to follow a specific kind of schema you
- 00:53:53want to follow a particular kind of
- 00:53:55format people always like to get the
- 00:53:56analogy of Talk Like a Pirate but I want
- 00:53:59you to think of that for a particular
- 00:54:00use case you want the model to Output
- 00:54:02things in a certain format you want the
- 00:54:03model to be a little bit more
- 00:54:04constrained with how it calls apis or
- 00:54:07references particular documents this is
- 00:54:09where you're going to find more
- 00:54:11likelihood with fine-tuning again you're
- 00:54:13going to see that rag is also a very
- 00:54:14viable option for a couple of these
- 00:54:16other tasks especially the latter ones
- 00:54:17that I'll talk about in a second but
- 00:54:19just remember the trade-offs here there
- 00:54:21are many situations where you have Rag
- 00:54:22and fine tuning but when you think about
- 00:54:24rag that is something that you can
- 00:54:26iterate on that's something that you can
- 00:54:27change with your pipeline when you're
- 00:54:29dealing with prompting that's also
- 00:54:30something you can constantly iterate on
- 00:54:32over and over again it's also why we
- 00:54:33push a lot of our customers many times
- 00:54:35to First focus on the prompt what can we
- 00:54:38improve in the prompt what can we change
- 00:54:39in the prompt what's what best practices
- 00:54:41are missing from the prompt does the
- 00:54:42prompt even make sense that's where we
- 00:54:44want to really start thinking about
- 00:54:46things if you're trying to teach the
- 00:54:49model new knowledge you're trying to
- 00:54:51teach the model something brand new and
- 00:54:54then you hope that from that knowledge
- 00:54:56it can learn other things and expand its
- 00:54:58knowledge that is not very very likely
- 00:55:01with fine
- 00:55:02tuning fine tuning is not a great way to
- 00:55:05teach the model brand new things and
- 00:55:07expect it to generalize for other tasks
- 00:55:09that is where we have not seen a
- 00:55:10tremendous amount of success with fine
- 00:55:12tuning as those algorithms change as we
- 00:55:14think more about this field and so on
- 00:55:16that may change but at the same point I
- 00:55:18really try to anchor on this slide quite
- 00:55:20a bit for thinking about that decision
- 00:55:21we see a lot of customers that jump to
- 00:55:23fine tuning as a this is the way to
- 00:55:24solve all of my problems there actually
- 00:55:27other ways to think about solving that
- 00:55:28at a high level again rag very very
- 00:55:30common way to go about handling those
- 00:55:33pieces so in general avoiding the
- 00:55:35pitfalls of fine tuning it's exciting
- 00:55:37but it's not for every single use
- 00:55:39case if you have any questions or
- 00:55:41curious about your particular use case
- 00:55:43again come talk to us we're happy to
- 00:55:44talk about those particular situations
- 00:55:45we have a lot of folks on the team that
- 00:55:46have done a lot of Direct Customer work
- 00:55:48with fine tuning happy I'm sure to
- 00:55:49answer
- 00:55:50questions at the end of the day before
- 00:55:52you jump to rag there's a reason why we
- 00:55:54have the order of things in this
- 00:55:55particular presentation you want to
- 00:55:57think a lot about what you can get out
- 00:55:58of prompt engineering be mindful of the
- 00:56:00versions you have with prompt
- 00:56:01engineering iterate on your prompts as
- 00:56:03you go on and then no matter what you do
- 00:56:07the most important thing I want you to
- 00:56:08have if you walk away from anything from
- 00:56:09this presentation when you think about
- 00:56:11building any kind of generative AI
- 00:56:12application when you think about
- 00:56:14actually going from proof of concept to
- 00:56:15production you want to make sure you
- 00:56:17have some kind of evaluation criteria we
- 00:56:19like to call these EV vals as well if
- 00:56:21you're coming from software this is
- 00:56:22essentially like a unit test or maybe an
- 00:56:24integration test you want to make make
- 00:56:26sure that you have some kind of way in
- 00:56:28your application of benchmarking the
- 00:56:30performance of the model of the prompt
- 00:56:33of the fine-tuning of the rag pipeline
- 00:56:35if you have no way of doing that then
- 00:56:38prompt Engineering in this entire
- 00:56:39ecosystem just kind of becomes an art
- 00:56:41and not a science so no matter what
- 00:56:43you're doing it is Mission critical to
- 00:56:45make sure that when you're building
- 00:56:46these applications you are using some
- 00:56:48kind of benchmarking or evaluation Suite
- 00:56:50Amazon Bedrock provides that many open
- 00:56:53source libraries and companies provide
- 00:56:54that as well but just like you wouldn't
- 00:56:56develop software that is Mission
- 00:56:57critical or software in production
- 00:56:59without any kind of testing you want to
- 00:57:00make sure you have that as well so do
- 00:57:02you have an evaluation with a success
- 00:57:04criteria whether it's just for fine
- 00:57:05tuning or rag or so on it's really one
- 00:57:07of the most important pieces that you
- 00:57:08can do have you tried prompt engineering
- 00:57:11determine if you have a baseline with
- 00:57:13that prompt engineering and again make
- 00:57:15sure you have evaluation so that you're
- 00:57:17really getting as much as you possibly
- 00:57:18can out of the prompt we see with a lot
- 00:57:20of customers not having a robust enough
- 00:57:22evaluation Suite basically just means
- 00:57:23we're kind of starting from scratch
- 00:57:24we're building on a house of carts
- 00:57:26the last part with fine tuning again
- 00:57:28like we mentioned it's irreversible and
- 00:57:30when you think about the data you need
- 00:57:31to curate the amount of data that you
- 00:57:33need that's where things are going to
- 00:57:35get a little bit trickier so how do you
- 00:57:36plan to build that fine-tuning data
- 00:57:39set got a couple minutes left I just
- 00:57:41want to wrap up with all the things that
- 00:57:43we've seen here because it's a lot of
- 00:57:44information but at the end of the day my
- 00:57:45goal is to give you a bit of a
- 00:57:46foundation here that's what I hope we
- 00:57:48have talked a bit about tool use talked
- 00:57:51a bit about computer use this idea idea
- 00:57:53of extending Cloud's capabilities
- 00:57:55Cloud's functional
- 00:57:57just by providing some tools what are
- 00:57:58tools just these these objects these key
- 00:58:01value pairs we give it a name we give it
- 00:58:04some kind of description and we provide
- 00:58:06the necessary arguments or parameters
- 00:58:08that that particular tool needs we then
- 00:58:10let Claud do the rest of the work if a
- 00:58:12prompt comes in Claude says looks like
- 00:58:13someone's trying to use that tool again
- 00:58:16you're going to hear a lot of things
- 00:58:17like Claude controls the computer and so
- 00:58:19on Claude itself is not moving the mouse
- 00:58:23and clicking and opening and closing and
- 00:58:24executing commands all that Claud is
- 00:58:26doing is taking in some screenshot
- 00:58:29interpreting the necessary command and
- 00:58:31then there is underlying code that a
- 00:58:32developer writes to go ahead and execute
- 00:58:35that code necessary so tool use and
- 00:58:37computer use it's a really really
- 00:58:39interesting and Powerful way to achieve
- 00:58:40all these new and interesting use cases
- 00:58:43but at the end of the day from a
- 00:58:44conceptual standpoint it's not something
- 00:58:46that should appear terribly intimidating
- 00:58:48we talked a little bit about rag
- 00:58:50retrieving data externally talked about
- 00:58:52that pre-processing side of things
- 00:58:53breaking up our data into chunks
- 00:58:55embedding we also talked about some of
- 00:58:56the other interesting ideas in this
- 00:58:58ecosystem from prompt caching to
- 00:59:00contextual retrieval and so on you're
- 00:59:01welcome to dig into that research
- 00:59:03finally wrapped up quite a bit with fine
- 00:59:06tuning so we got a lot of information
- 00:59:08coming out here again I just want to say
- 00:59:09thank you all so so much for giving me
- 00:59:10some time happy to answer questions
- 00:59:12stick around for a little bit and have a
- 00:59:13wonderful reinvent everyone thank you
- 00:59:14all so much
- Anthropic
- Claude modell
- AI development
- prompt engineering
- tool use
- RAG
- fine-tuning
- computer use
- agentic workflows
- model training