AWS re:Invent 2024 - Leverage Anthropic's Claude models for AI's evolving landscape (AIM123)

00:59:21
https://www.youtube.com/watch?v=21PjWyB_beU

Ringkasan

TLDRPræsentationen dækkede en række emner fra introduktion af virksomheden Anthropic og deres mission, til en detaljeret gennemgang af deres AI-model, Claude. Grundlagt i 2021, søger Anthropic at fremme sikker AI-udvikling med forskningsfokus på alignment og interpretability. Claude 3.5 Sonet blev lanceret i oktober 2024 med funktioner som computerbrug og forbedret agentisk kapabilitet, der gør det lettere at navigere komplekse arbejdsprocesser. Virksomheder som Jane Street og DoorDash bruger Claude til at forbedre effektiviteten i deres arbejdsprocesser. Computerbrugsfunktionerne i Claude blev demonstreret gennem video, der viste, hvordan det kan bruges til at udvikle hjemmesider og mere komplekse kodningsopgaver. Funktionerne understøtter manuel QA og kan analysere skærmbilleder for nødvendige handlinger. Desuden blev emner som prompt engineering, herunder brugen af prompt-generatoren, RAG (Retrieval Augmented Generation) og fine-tuning behandlet. Prompt-generatoren hjælper med hurtigt at konstruere prompts ved hjælp af bedste praksiser. RAG bruger eksterne data til at forbedre sprogmodellernes svar, mens tool use udvider Claudes funktionalitet ved at bruge eksterne værktøjer og funktioner. Fine-tuning kan ændre en models adfærd baseret på specifikke behov, men kræver omhyggeligt valgte data for at undgå forringelse af modellen.

Takeaways

  • 🤖 Anthropic blev grundlagt i 2021 for at fremme sikker AI-udvikling.
  • 🆕 Claude 3.5 Sonet inkluderer ny computerbrugsfunktionalitet og agentiske evner.
  • 🏢 Virksomheder som Jane Street og DoorDash bruger Claude for øget effektivitet.
  • 👨‍💻 Computerbrugsfærdigheder hos Claude blev fremvist gennem kodningsdemonstrationer.
  • 🔧 Prompt-generatoren hjælper med at oprette korrekte og effektive prompts.
  • 🔍 RAG integrerer ekstern viden i AI-modellers funktionalitet.
  • 🧠 Fine-tuning justerer en models adfærd for specifikke anvendelser.
  • 🛠 Tool use udvider Claude's kapabiliteter ved hjælp af eksterne værktøjer.
  • 📈 Trænede modeller kan opnå forbedringer i instruktionsfølge og handling.
  • 🖥️ Claude kan analyse skærmbilleder og forstå det nødvendige næste skridt.

Garis waktu

  • 00:00:00 - 00:05:00

    Talen forekommer taknemmelig for deltagernes tilstedeværelse og præsenterer sessionens agenda, der inkluderer emner som prompt engineering, fin-tuning og retrieverarkitekturer. Maggie Vo introducerer sig selv og sin kollega Ellie Shipet som de primære oplægsholdere og kundgør Anthropics mission om at sikre en sikker overgang til transformative AI.

  • 00:05:00 - 00:10:00

    Maggie præsenterer Claude 3.5 Sonet-modellen, lanceret i oktober 2024, og fremhæver dens forbedringer, især inden for kodegenerering og computerbrug. Claude kan navigere computergrænseflader og bruge agenter til mange-trins ræsonnering, hvilket støttes af eksempler som Jane Street.

  • 00:10:00 - 00:15:00

    Claude 3.5 Haikoo-modellen fremhæves sammen med succesen hos virksomheder som DoorDash, der bruger modellen til at rute billetforespørgsler. Maggie overdrager præsentationen til Ellie, der vil diskutere computerværdien mere detaljeret og vise de praktiske anvendelser.

  • 00:15:00 - 00:20:00

    Ellie, med fokus på prompt engineering og computersystemer, introducerer computerskærmbillederstjenesten, demonstreret gennem en demo af Alex. Denne funktion gør Claude i stand til at styre computermiljøer ved at interagere med skærmbilleder for at udføre komplekse opgaver.

  • 00:20:00 - 00:25:00

    En demo af Alex viser Claude, der interagerer med computerskærme for at udføre kodedeving og fejlretning, hvilket illustrerer agenttjenestens potentialer. Ellie opfordrer til deltagelse i demoer og informerer om muligheden for at observere Claude's evner live ved boderne.

  • 00:25:00 - 00:30:00

    Ellie diskuterer AI's integrationsmodeller, leveret via Amazon Bedrock. Målet er at levere sikkerhedsforanstaltninger og tjenester som embeddings og finjusteringer. Dybdegående information om værktøjsbrug, RAG-arkitektur og prompt engineering gives for bedre at kunne forstå og anvende teknikker.

  • 00:30:00 - 00:35:00

    Der indføres værktøjer til promptgenerering og forbedring af promptkvaliteten, med eksempler og bedste praksisser fremhævet. En ny undervises i punkt for punkt ræsonnering og brugen af XML-tags til bedre strukturering af prompts.

  • 00:35:00 - 00:40:00

    Importance of prompt engineering highlighted by encouraging revision and version control for efficacy. Examples and XML tags are emphasized as powerful tools to maintain clarity.

  • 00:40:00 - 00:45:00

    Ellie forklarer begrebet værktøjsbrug som en udvidelse af Clodes funktionalitet og noterer dets sammenligning med daily værktøjsopgaver som vejropdateringer. Claude’s computerbrug betragtes som en kompleks forlengelse af værktøjsbrug, hvilket gør det muligt for AI at interagere med skærmbilleder og skærmbilledetekst. RAG-arkitekturen gennemgås grundigt med et fokus på eksternt placerede data.

  • 00:45:00 - 00:50:00

    Brugen af embeddings til søgningsforbedring og lagring af data i en vektordatabase samt prompt-hentning for optimeret udførelse blev forklaret. Disse teknikker hjælper med at organisere information til mere effektiv og præcis hentning af data.

  • 00:50:00 - 00:59:21

    Ellie afslutter med diskussioner om fin-tuning og dets brug for adfærdsændring snarere end indlæring af ny information. Der blev også diskuteret betydningen af velstruktureret evaluering og variagning af arbejdsprocesser for at nå frem til optimal modelydelse.

Tampilkan lebih banyak

Peta Pikiran

Video Tanya Jawab

  • Hvornår blev Anthropic grundlagt, og hvad er deres mission?

    Anthropic blev grundlagt i 2021 med missionen om at sikre en sikker overgang til transformative AI ved hjælp af forskningsmetoder, der fokuserer på alignment og interpretability.

  • Hvornår blev Claude 3.5 Sonet lanceret, og hvilke funktioner har den?

    Claude 3.5 Sonet blev lanceret i oktober 2024 og byder på forbedringer inden for bl.a. computerbrug og agentier.

  • Hvordan bruger virksomheder som Jane Street og DoorDash Claude?

    Claude bruges til at forbedre kodnings effektivitet og udviklerproduktivitet ved virksomheder som Jane Street og DoorDash.

  • Hvilken slags opgaver blev demonstreret ved hjælp af Claude i præsentationen?

    Præsenterede demovideoer viste brugen af Claude til opgaver såsom webudvikling ved hjælp af computerbrugsfunktioner.

  • Hvem er Maggie Vo og Ellie Shopic?

    Maggie Vo leder det tekniske uddannelsesteam hos Anthropic, og Ellie Shopic er chef for teknisk træning.

  • Hvilke evner inden for computerbrug blev præsenteret for Claude?

    Claude kan analysere skærmbilleder og bestemme nødvendige kommandoer, hvilket kan bruges til opgaver som at teste apps eller manipulere Excel-ark.

  • Hvad er formålet med Anthropic's prompt-generator?

    Prompt-generatoren hos Anthropic er designet til at hjælpe med hurtig oprettelse af prompts ved at følge bedste praksis.

  • Hvad er RAG, og hvordan bruges det?

    Rag står for "retrieval augmented generation," som integrerer ekstern viden med sprogmodeller ved at hente relevante data og bruge dem i en prompt.

  • Hvornår er det hensigtsmæssigt at bruge fine-tuning?

    Fine-tuning kan bruges til at ændre model-adfærd til specifikke formater eller skemaer, men det kan gøre modellen dårligere, hvis data ikke vælges omhyggeligt.

  • Hvad er tool use, og hvordan bruges det med Claude?

    Tool use er en metode til at udvide Claudes kapaciteter ved at give den værktøjer til at udføre specifikke opgaver baseret på forespørgsler.

Lihat lebih banyak ringkasan video

Dapatkan akses instan ke ringkasan video YouTube gratis yang didukung oleh AI!
Teks
en
Gulir Otomatis:
  • 00:00:00
    hi everybody thank you so much for
  • 00:00:02
    spending some of your day today coming
  • 00:00:03
    to this session I hope it's a session
  • 00:00:05
    that's super useful for you jam-packed
  • 00:00:07
    full of um tips and tricks everything
  • 00:00:09
    from prompt engineering to fine-tuning
  • 00:00:11
    to retrieval architectures and so on um
  • 00:00:14
    my name is Maggie vo I lead the tech the
  • 00:00:16
    technical Education team at anthropic
  • 00:00:18
    and my colleague is Ellie shopic who
  • 00:00:20
    will be doing the majority of this
  • 00:00:22
    presentation he is much more
  • 00:00:23
    knowledgeable than I am um and he's the
  • 00:00:25
    head of Technical
  • 00:00:27
    Training again pictures of our faces
  • 00:00:31
    guess so our agenda today um is lightly
  • 00:00:34
    I'll go into an introduction about
  • 00:00:35
    anthropic an overview of our latest
  • 00:00:37
    models and what's exciting about them
  • 00:00:39
    including some of our latest features
  • 00:00:40
    like computer use then we'll go into
  • 00:00:43
    techniques and best practices all sorts
  • 00:00:45
    of things top to botom PRP engineering
  • 00:00:46
    to agents and so
  • 00:00:49
    on so to begin with I'm curious how many
  • 00:00:52
    of you have heard of
  • 00:00:54
    Claude great love that how many of you
  • 00:00:57
    have used
  • 00:00:58
    Claude and how many of you Claude in
  • 00:01:00
    production somewhere great okay so
  • 00:01:03
    pretty good familiarity with Claude and
  • 00:01:06
    uh hopefully with anthropic as well um
  • 00:01:08
    for those of you who don't know we're a
  • 00:01:09
    pretty young company we were founded in
  • 00:01:12
    2021 and anthropic is our mission is to
  • 00:01:16
    help ensure that the world safely makes
  • 00:01:18
    the transition through transformative AI
  • 00:01:20
    we do this in a wide variety of ways
  • 00:01:22
    with alignment research interpretability
  • 00:01:24
    research um as well as of course making
  • 00:01:27
    some of the world's most intelligent
  • 00:01:28
    models while ensuring that they remain
  • 00:01:30
    trustworthy we also push other companies
  • 00:01:32
    to also be uh produce trustworthy
  • 00:01:34
    systems and just in general we try to
  • 00:01:35
    support businesses with generative AI
  • 00:01:38
    with um you know foundational
  • 00:01:40
    transformative uh features while also
  • 00:01:43
    making sure that they are safe for us
  • 00:01:46
    though safety is not something
  • 00:01:47
    theoretical it's not about safety guard
  • 00:01:49
    rails that you're not sure of during
  • 00:01:50
    training and so on we are the frontier
  • 00:01:52
    of research that makes Claude easier to
  • 00:01:55
    steer harder to jailbreak and also the
  • 00:01:57
    least likely model to hallucinate on the
  • 00:01:59
    market
  • 00:02:00
    according to um Galileo's llm
  • 00:02:02
    hallucination index and we're very proud
  • 00:02:05
    of these as we believe that these are
  • 00:02:06
    critical pieces to making sure you can
  • 00:02:08
    trust the models you have in deployment
  • 00:02:10
    um or even just in your personal life
  • 00:02:12
    and personal
  • 00:02:15
    use in October 2024 we launched uh
  • 00:02:18
    Claude 3.5 Sonet um the latest version
  • 00:02:21
    of it and we upgraded Cloud 3.5 Sonet
  • 00:02:25
    basically to a state-of-the-art model in
  • 00:02:27
    quite a variety of areas especially in
  • 00:02:29
    coding where we um you know put on a
  • 00:02:32
    wide variety of actual uh you know task
  • 00:02:35
    oriented benchmarks not just theoretical
  • 00:02:37
    remove from everyday life kind of
  • 00:02:39
    benchmarks and it showed some vast
  • 00:02:40
    improvements over whatever was uh
  • 00:02:42
    leading uh before we also released a
  • 00:02:45
    cloud 3.5 hi coup model including uh 3.5
  • 00:02:48
    hi cou uh fast model on Bedrock
  • 00:02:51
    specifically um that is inference
  • 00:02:53
    optimize in order to serve you the best
  • 00:02:55
    speeds for High cool level
  • 00:02:57
    intelligence and then uh something that
  • 00:02:59
    we'll talk about briefly it's also
  • 00:03:01
    computer use which is a experimental
  • 00:03:03
    beta um featuring capability that we
  • 00:03:06
    released with the latest Cloud 345 Sonic
  • 00:03:08
    model where cloud is able to use
  • 00:03:11
    computer interfaces with a variety of um
  • 00:03:13
    combination of screenshots and so on
  • 00:03:14
    cloud can navigate around your whole
  • 00:03:16
    computer you don't need apis and so on
  • 00:03:18
    because because Cloud can understand the
  • 00:03:20
    whole computer interface and start
  • 00:03:21
    interacting with it
  • 00:03:24
    directly a little bit about the cloud
  • 00:03:26
    3.5 Sonet model specifically you can see
  • 00:03:28
    some of our benchmarks here on the
  • 00:03:30
    standard benchmarks um but I think more
  • 00:03:31
    importantly what's interesting is that a
  • 00:03:33
    its computer vision has been vastly
  • 00:03:35
    improved so it can do something like
  • 00:03:37
    computer use and use it with greater
  • 00:03:38
    accuracy we trained it for strong
  • 00:03:41
    agentic capabilities too which is how
  • 00:03:43
    computer use actually works it can do um
  • 00:03:45
    much more nuanced thinking
  • 00:03:48
    decision-making um and also it's just
  • 00:03:50
    got much better at code generation in
  • 00:03:52
    terms of accuracy in terms of um
  • 00:03:54
    readability in terms of converting you
  • 00:03:56
    know Legacy code into um you know modern
  • 00:03:59
    uh code
  • 00:04:01
    all of these um combine into a wide
  • 00:04:05
    variety of use cases that I encourage
  • 00:04:08
    you to explore and look at um things
  • 00:04:10
    like as I mentioned code generation
  • 00:04:12
    that's a big one but also great visual
  • 00:04:14
    analysis in combination with computer
  • 00:04:16
    use Cloud's able to do things like read
  • 00:04:18
    Excel spreadsheets and do some analysis
  • 00:04:20
    for you even manipulate the spreadsheets
  • 00:04:22
    and then do some um you know additional
  • 00:04:25
    uh write some functions and so on to
  • 00:04:27
    help you with financial analysis for
  • 00:04:28
    example and then we've also trained
  • 00:04:31
    Cloud very well to just handle really
  • 00:04:33
    complex queries a lot of instructions
  • 00:04:35
    multi-step situations if this then that
  • 00:04:38
    sorts of situations um so Cloud's really
  • 00:04:40
    able to reason in multi-step ways quite
  • 00:04:43
    well at this
  • 00:04:44
    point here's an example I want to
  • 00:04:46
    highlight from one of our customers Jane
  • 00:04:48
    Street um and they're basically using
  • 00:04:50
    Claude to scale their um coding to be uh
  • 00:04:55
    much more efficient their quality is
  • 00:04:57
    increased developer productivity is V
  • 00:04:59
    increase and just the general time spent
  • 00:05:02
    on having you know writing PRS and
  • 00:05:03
    improving their codebase fixing code has
  • 00:05:06
    vastly decreased we're very proud of
  • 00:05:10
    this then there's hot Cloud 3 H cou 3.5
  • 00:05:13
    H cou which I won't spend too much time
  • 00:05:14
    on um it's you know for a similar speed
  • 00:05:17
    to Cloud 3 ha cou um but with much
  • 00:05:20
    greater uh intelligence and it's really
  • 00:05:24
    really good at coding as well um
  • 00:05:26
    especially fast cating where it is
  • 00:05:28
    currently best in class and it's able B
  • 00:05:29
    to um you know beat the previous Cloud
  • 00:05:32
    35 Sonet model also um on sbench and um
  • 00:05:35
    kind of other coding
  • 00:05:38
    benchmarks this is an example of a use
  • 00:05:40
    case that we found great success with
  • 00:05:43
    where door Dash is using Claude at Claud
  • 00:05:44
    3.5 haou in order to um route tickets in
  • 00:05:48
    their support system which has been much
  • 00:05:51
    more increasing the accuracy time of
  • 00:05:53
    response um in their uh ticket routing
  • 00:05:57
    and their customer service and
  • 00:05:59
    decreasing the you know average time
  • 00:06:01
    resolution and the uh rerouting rates
  • 00:06:03
    and accuracy
  • 00:06:05
    rates the last thing I will talk about
  • 00:06:08
    is computer use and here I'll actually
  • 00:06:09
    transfer over to Ellie but just to
  • 00:06:11
    introduce it computer use is this
  • 00:06:13
    ability that Claude has to perform tasks
  • 00:06:16
    by interpreting screenshots and then
  • 00:06:18
    taking actions based on those
  • 00:06:19
    screenshots so Claude does all sorts of
  • 00:06:21
    things now like it can test your apps
  • 00:06:22
    for you as it said it can manipulate
  • 00:06:24
    Excel spreadsheets um you can plan
  • 00:06:26
    vacations with Claude or ask Claud all
  • 00:06:28
    sorts of questions that it can then go
  • 00:06:29
    on the internet and browse and decide
  • 00:06:31
    things um I'll pass La if you don't mind
  • 00:06:33
    coming up I'll pass it here for LA to
  • 00:06:35
    further explain the use cases as well to
  • 00:06:37
    show you some things that computer use
  • 00:06:38
    can do and so on awesome
  • 00:06:42
    thank hopefully you all can hear me in a
  • 00:06:44
    quick second when this there we go
  • 00:06:45
    awesome hello everyone my name is Ellie
  • 00:06:46
    shopic I'm the head of Technical
  • 00:06:47
    Training here at anthropic and I'm super
  • 00:06:49
    excited to get a chance to talk to you
  • 00:06:51
    about computer use and prompt
  • 00:06:52
    engineering and so on just a quick show
  • 00:06:54
    of hands I know kind of piggyback what
  • 00:06:55
    Maggie was saying how many of you had a
  • 00:06:57
    chance to check out our booth and
  • 00:06:58
    actually see any of these demos in
  • 00:06:59
    action just quick show of hands all
  • 00:07:02
    right awesome so I'm going to do a lot
  • 00:07:03
    to give a theoretical overview of some
  • 00:07:05
    of these ideas we're going to talk about
  • 00:07:06
    computer use we're going to talk about
  • 00:07:07
    prompt engineering we're going to talk
  • 00:07:08
    about Rag and Tool use and fine tuning
  • 00:07:10
    but if you want to actually see these in
  • 00:07:11
    action I definitely recommend you come
  • 00:07:13
    check us out because we've got plenty of
  • 00:07:15
    demos going on in our booth we've got a
  • 00:07:16
    really wonderful team also to guide you
  • 00:07:18
    through any questions you may have or
  • 00:07:19
    any kind of technical or non-technical
  • 00:07:21
    questions that you have we're happy
  • 00:07:22
    happy to answer so maie talked a little
  • 00:07:24
    bit about computer use and computer use
  • 00:07:26
    is this capability that our upgraded 35
  • 00:07:28
    sonnet has to basically analyze a
  • 00:07:30
    screenshot and figure out the necessary
  • 00:07:33
    computer command to take so what that
  • 00:07:35
    basically means is if we give Claude a
  • 00:07:36
    screenshot of a desktop and we give
  • 00:07:38
    Claude a prompt something like I need
  • 00:07:40
    you to go and figure out some really
  • 00:07:41
    awesome hiking trails near Vegas because
  • 00:07:43
    I need to see the sun I've been at this
  • 00:07:45
    conference for too long and I need to
  • 00:07:46
    get outside Claude is going to go and
  • 00:07:48
    take a look at that screenshot and it
  • 00:07:49
    what it's going to do is it's actually
  • 00:07:50
    going to say something like okay in
  • 00:07:52
    order to figure out the best hikes near
  • 00:07:54
    the convention I'm going to go ahead and
  • 00:07:57
    go to the browser and do that research
  • 00:07:59
    so I see based on this screenshot that
  • 00:08:01
    the Firefox or Chrome or Safari or
  • 00:08:03
    whatever browser you have icon is at
  • 00:08:05
    this coordinate and you should go and
  • 00:08:07
    execute a click command it's then up to
  • 00:08:09
    you as the developer to write the code
  • 00:08:11
    necessary to execute that click and you
  • 00:08:13
    can do that in your language of choice
  • 00:08:14
    the reference implementation that we
  • 00:08:16
    have is in Python once that click
  • 00:08:18
    happens we then take a screenshot and
  • 00:08:19
    feed it again to Claude and Claude can
  • 00:08:21
    then take a look at this and say great
  • 00:08:23
    this looks like Firefox I see that in
  • 00:08:25
    the browser bar we should probably type
  • 00:08:27
    something like good hikes near Vegas and
  • 00:08:29
    press enter you as the developer execute
  • 00:08:32
    that command repeat repeat repeat that
  • 00:08:34
    idea of giving a model some kind of
  • 00:08:37
    tools some kind of ability to interact
  • 00:08:39
    with some kind of system and then just
  • 00:08:41
    running a loop over and over and over
  • 00:08:42
    again is a very very very high Lev way
  • 00:08:45
    of explaining what an agent is you're
  • 00:08:46
    going to hear this term agents and
  • 00:08:47
    agentic workflows and so on we're going
  • 00:08:49
    to talk about that in a little bit but
  • 00:08:50
    when you hear that term agents an llm
  • 00:08:52
    something like Claud a set of tools some
  • 00:08:55
    ability to perform actions like
  • 00:08:57
    interpreting screenshots and then just a
  • 00:08:59
    loop do it again do it again do it again
  • 00:09:01
    do it again what's really interesting
  • 00:09:03
    about computer use is while we are so
  • 00:09:05
    excited about this technology it's still
  • 00:09:06
    relatively early but in a short period
  • 00:09:09
    of time we've been able to basically
  • 00:09:10
    double the performance and evaluation
  • 00:09:12
    benchmarks of previous state-of-the-art
  • 00:09:13
    models and some of the use cases just
  • 00:09:15
    like Maggie mentioned that we're
  • 00:09:16
    exploring are in that kind of manual QA
  • 00:09:18
    anytime that there's a human in the mix
  • 00:09:20
    doing very tedious time incentive and
  • 00:09:22
    very very laborious tasks things like
  • 00:09:25
    manually QA a particular piece of
  • 00:09:27
    software and thinking about every single
  • 00:09:29
    inter action that might need to be taken
  • 00:09:30
    and instead of having to worry about all
  • 00:09:32
    those interactions and taking them
  • 00:09:34
    programmatically we can say here Claude
  • 00:09:36
    is the flow that I'd like you to go
  • 00:09:37
    through here are all the permutations
  • 00:09:38
    that I want you to analyze and I might
  • 00:09:40
    even miss some of them so if there is a
  • 00:09:42
    flow in this application or architecture
  • 00:09:44
    that I've missed why don't you go ahead
  • 00:09:45
    and do that and test it for me take
  • 00:09:47
    those results write the output to some
  • 00:09:49
    file I'll give you an expected output
  • 00:09:50
    you let me know if those match so you
  • 00:09:52
    can think about kind of QA workflows in
  • 00:09:55
    that particular capacity I also want to
  • 00:09:56
    show you a little demo this is from a
  • 00:09:58
    colleague of mine Alex a wonderful
  • 00:10:00
    wonderful demo here just talk a little
  • 00:10:01
    bit about computer use with
  • 00:10:03
    coding so go ahead and give that a video
  • 00:10:05
    a second to play I'm Alex I lead
  • 00:10:08
    developer relations at anthropic and
  • 00:10:10
    today I'm going to be showing you a
  • 00:10:11
    coding task with computer
  • 00:10:14
    [Music]
  • 00:10:19
    use so we're going to be showing Claude
  • 00:10:21
    doing a website coding task by actually
  • 00:10:24
    controlling my laptop but before we
  • 00:10:26
    start coding we need an actual website
  • 00:10:29
    ite for Claude to make changes to so
  • 00:10:31
    let's ask Claude to navigate to cloud.
  • 00:10:34
    within my Chrome browser and ask Claude
  • 00:10:36
    within cloud. to create a fun '90s
  • 00:10:39
    themed personal homepage for
  • 00:10:42
    [Music]
  • 00:10:43
    itself Claud opens
  • 00:10:45
    [Music]
  • 00:10:47
    Chrome searches for
  • 00:10:52
    cloud. and then types in a prompt asking
  • 00:10:55
    the other Cloud to create a personal
  • 00:10:57
    homepage for itself
  • 00:11:00
    [Music]
  • 00:11:05
    cloud. returns some
  • 00:11:10
    code and that gets nicely rendered in an
  • 00:11:12
    artifact on the right hand side that
  • 00:11:14
    looks great but I want to make a few
  • 00:11:16
    changes to the website locally on my own
  • 00:11:18
    computer let's ask Claude to download
  • 00:11:21
    the file and then open it up in vs
  • 00:11:24
    code Claude clicks the save to file
  • 00:11:27
    button
  • 00:11:29
    opens up VSS
  • 00:11:32
    code and then finds the file within my
  • 00:11:35
    downloads folder and opens it
  • 00:11:37
    [Music]
  • 00:11:39
    up perfect now that the file is up and
  • 00:11:42
    running let's ask Claude to start up a
  • 00:11:44
    server so that we can actually view the
  • 00:11:46
    file within our
  • 00:11:48
    [Music]
  • 00:11:51
    browser Claude opens up the VSS code
  • 00:11:54
    terminal and tries to start a server
  • 00:12:00
    but it hits an error we don't actually
  • 00:12:01
    have python installed on our machine but
  • 00:12:03
    that's all right because Claude realizes
  • 00:12:05
    this by looking at the terminal output
  • 00:12:07
    and then tries again with Python 3 which
  • 00:12:09
    we do have installed on our machine that
  • 00:12:11
    works so now the server is up and
  • 00:12:13
    running now that we have the local
  • 00:12:14
    server started we can go manually take a
  • 00:12:16
    look at the website within the browser
  • 00:12:19
    and it looks pretty good but I noticed
  • 00:12:21
    that there's actually an error in the
  • 00:12:22
    terminal output and we also have this
  • 00:12:23
    missing file icon at the top here let's
  • 00:12:26
    ask Claude to identify this error and
  • 00:12:28
    then fix it within the file Claude
  • 00:12:31
    visually reads the terminal output and
  • 00:12:33
    then opens up the find and replace tool
  • 00:12:35
    in BS code to find the line that's
  • 00:12:38
    throwing the actual error in this case
  • 00:12:40
    we just ask Claude to get rid of the
  • 00:12:41
    error entirely so it will just delete
  • 00:12:43
    the whole line then Claude will save the
  • 00:12:45
    file and automatically rerun the website
  • 00:12:48
    so now that the error is gone let's go
  • 00:12:50
    take a final look at our website and we
  • 00:12:52
    can see that the file icon has
  • 00:12:54
    disappeared and the aor is gone as well
  • 00:12:57
    perfect so that's coding with computer
  • 00:12:59
    use and Claude this took a few prompts
  • 00:13:01
    now but we can imagine in the future
  • 00:13:03
    that Claude will be able to do tasks
  • 00:13:04
    like this end to
  • 00:13:05
    [Music]
  • 00:13:08
    end awesome thank you so much Alex and I
  • 00:13:10
    could watch that over and over again but
  • 00:13:12
    I'll have to talk to you for the next 40
  • 00:13:13
    or so minutes so bear with me but what's
  • 00:13:15
    really interesting about that is we saw
  • 00:13:17
    screenshots we saw screenshots of a
  • 00:13:18
    terminal we saw screenshots of a browser
  • 00:13:20
    we saw screenshots of a text editor we
  • 00:13:23
    fed those screenshots to Claude and
  • 00:13:24
    Claude had the ability to analyze that
  • 00:13:26
    screenshot and figure out the necessary
  • 00:13:28
    command whether that is inputting text
  • 00:13:30
    whether that's looking at a terminal and
  • 00:13:32
    realizing that there's an error what you
  • 00:13:34
    just looked at that idea of a prompt and
  • 00:13:36
    then feeding a screenshot and an action
  • 00:13:37
    and a screenshot and an action and a
  • 00:13:39
    screenshot an action is that agentic
  • 00:13:41
    workflow and again if you'd like to see
  • 00:13:42
    that actually in action come check it
  • 00:13:43
    out at the booth we've got that demo
  • 00:13:44
    running you can kind of play around for
  • 00:13:46
    yourself put in your own prompts and
  • 00:13:47
    explore all the fun things you can do
  • 00:13:48
    with computer use as we shift gears a
  • 00:13:51
    little bit the models that we have in
  • 00:13:53
    our ecosystem CLA 35 Sonet 35 Hau and so
  • 00:13:56
    on and the other ones as well all
  • 00:13:57
    supported in the Bedrock environment and
  • 00:13:59
    what we're really really proud of is the
  • 00:14:00
    ability to leverage some of the
  • 00:14:02
    functionality that Bedrock has out out
  • 00:14:03
    of the box from security as well as some
  • 00:14:05
    of the embeddings and fine-tuning
  • 00:14:07
    services that Amazon Bedrock offers
  • 00:14:09
    combined with our models to produce
  • 00:14:11
    really high power generative AI
  • 00:14:14
    applications as we start thinking about
  • 00:14:15
    some of the highle things in the
  • 00:14:17
    generative AI space and some of the kind
  • 00:14:18
    of most essential building blocks for
  • 00:14:20
    building successful applications first
  • 00:14:22
    thing I want to start with is the idea
  • 00:14:23
    of prompt engineering talking a little
  • 00:14:24
    bit about prompting I'm sure you're all
  • 00:14:26
    very familiar with a prompt and if not I
  • 00:14:28
    will gladly read Define it for you
  • 00:14:29
    prompt is the information you pass into
  • 00:14:32
    a large language model to get a response
  • 00:14:34
    when you're working with a large
  • 00:14:35
    language model in a more conversational
  • 00:14:37
    way this could be something like Claud
  • 00:14:38
    AI or just a chat bot you're working
  • 00:14:40
    with you have a little bit more luxury
  • 00:14:42
    with going back and forth oh you didn't
  • 00:14:44
    really understand what I meant so here's
  • 00:14:45
    what I want or no actually here's what
  • 00:14:47
    I'm looking for give me this kind of
  • 00:14:48
    thing but when you're dealing with
  • 00:14:50
    prompts in more of an Enterprise or API
  • 00:14:52
    context you really just have one
  • 00:14:54
    opportunity to create a very high
  • 00:14:56
    quality prompt that consists of your
  • 00:14:57
    context your data your conversation
  • 00:14:59
    history examples and so on and many
  • 00:15:01
    times that leads to creating prompts
  • 00:15:03
    that look like this and it's relatively
  • 00:15:04
    scary fortunately it's very colorful so
  • 00:15:06
    at least you have that part but if you
  • 00:15:07
    look at this it's pretty intimidating to
  • 00:15:09
    figure out all the parts of this and
  • 00:15:11
    figure out what goes where and you might
  • 00:15:12
    be looking at this and taking a picture
  • 00:15:13
    furiously and trying to jot it down and
  • 00:15:15
    so on but what's really challenging
  • 00:15:17
    about getting these prompts ready for
  • 00:15:18
    production is actually going from zero
  • 00:15:20
    to one starting with this idea you have
  • 00:15:22
    for an application I'm going to build a
  • 00:15:23
    classifier I'm going to build a
  • 00:15:24
    summarizer and then taking that and
  • 00:15:26
    turning it into this leveraging the best
  • 00:15:27
    practices we have around where the task
  • 00:15:30
    content goes where the dynamic data goes
  • 00:15:31
    where the pre-filled response goes and
  • 00:15:33
    so on so in order to solve that problem
  • 00:15:35
    we have a really lovely tool that we
  • 00:15:36
    call the prompt generator and we're firm
  • 00:15:38
    Believers that as prompt engineering
  • 00:15:39
    grows and evolves there's going to be a
  • 00:15:41
    really strong combination of manual work
  • 00:15:44
    and programmatic work initially this was
  • 00:15:46
    really all done manually with lots and
  • 00:15:47
    lots of iteration we don't believe this
  • 00:15:49
    is something that is going to be 100%
  • 00:15:51
    programmatic because there's really a
  • 00:15:52
    bit of an art and science to it but if
  • 00:15:54
    there's a situation where you need to go
  • 00:15:56
    from zero to one you need to generate a
  • 00:15:57
    prompt come take a look at this
  • 00:15:59
    particular tool that we have at console.
  • 00:16:01
    anthropic tocom I also recommend you
  • 00:16:03
    come check out the demos that we have
  • 00:16:05
    for the workbench to get up to speed on
  • 00:16:07
    the promp generator as well again this
  • 00:16:09
    is not going to solve all of your
  • 00:16:10
    problems the promp generator is a way
  • 00:16:12
    for you to put in the task at hand and
  • 00:16:13
    have automatically generated prompt with
  • 00:16:15
    best practices in mind but again just
  • 00:16:18
    like with any software that is generated
  • 00:16:19
    for you there's still things you're
  • 00:16:20
    going to have to go in and tweak and
  • 00:16:21
    edit to fit your particular use case
  • 00:16:24
    something that I do recommend you really
  • 00:16:25
    think about when doing prompt
  • 00:16:27
    engineering is really try to leverage
  • 00:16:28
    some of the best best practices you may
  • 00:16:29
    have from software so for those of you
  • 00:16:31
    that are familiar with software
  • 00:16:32
    engineering been in the software
  • 00:16:33
    engineering space you're probably
  • 00:16:34
    familiar with the idea of Version
  • 00:16:35
    Control keeping track of changes to
  • 00:16:37
    files that you've made edits deletions
  • 00:16:39
    modifications new file Creations do the
  • 00:16:41
    same with your prompts as you have new
  • 00:16:43
    prompts as you iterate on prompts you
  • 00:16:45
    want to make sure that you're keeping
  • 00:16:46
    track of the previous ones that you have
  • 00:16:48
    so that you can iterate on them
  • 00:16:49
    appropriately instead of just redoing
  • 00:16:52
    and redoing and redoing again we've got
  • 00:16:54
    a lot of really good stuff on our docs
  • 00:16:55
    around this and I recommend you all take
  • 00:16:56
    a look at the prompt generators to
  • 00:16:57
    really get up and running with high
  • 00:16:59
    quality prompts in general some best
  • 00:17:01
    practices that we have around prompt
  • 00:17:02
    engineering I'm going to show you some
  • 00:17:03
    highle ones but the big three that I
  • 00:17:05
    really want to focus on again the first
  • 00:17:08
    one's going to seem a bit simplistic but
  • 00:17:09
    you'd be shocked how many times this
  • 00:17:10
    goes ay be clear and direct at the end
  • 00:17:13
    of the day all that these models are are
  • 00:17:16
    tools for predicting the next token the
  • 00:17:18
    next word the next series of text so the
  • 00:17:21
    more that you can give to the model to
  • 00:17:23
    give it some context to have it pay
  • 00:17:25
    attention to what it's seen before to
  • 00:17:26
    figure out what's next the better of a
  • 00:17:28
    response you're going to give I really
  • 00:17:30
    like to draw the parallel to talking to
  • 00:17:31
    a human as well I want you to think of
  • 00:17:33
    the llm that you're working with as
  • 00:17:35
    someone who has a very very very very
  • 00:17:37
    large broad range of knowledge but
  • 00:17:39
    absolutely no idea what the task is that
  • 00:17:40
    you want to do it's up to you to tell
  • 00:17:43
    the large language model what the task
  • 00:17:44
    is at hand and explain it succinctly
  • 00:17:47
    what does that mean start with a couple
  • 00:17:48
    highle sentences of the role and the
  • 00:17:50
    highle task description we recommend if
  • 00:17:53
    there's Dynamic data coming in whether
  • 00:17:54
    that's from something like retrieval AED
  • 00:17:56
    generation or any variables coming in
  • 00:17:58
    we'll talk about that rag in a little
  • 00:17:59
    bit put that content at the top make
  • 00:18:01
    sure you've got some detailed task
  • 00:18:03
    instructions and then we'll talk a
  • 00:18:04
    little bit about if your particular use
  • 00:18:07
    case requires a little bit more in-depth
  • 00:18:09
    explanation examples are incredibly
  • 00:18:11
    powerful so be clear and direct provide
  • 00:18:13
    examples and then you want to be really
  • 00:18:15
    intentional about the process that you
  • 00:18:17
    want the model to think through I'm sure
  • 00:18:19
    many of you have heard this idea of
  • 00:18:20
    Chain of Thought or thinking step by
  • 00:18:21
    step thinking step byep is a great start
  • 00:18:24
    but just like if I told you I want you
  • 00:18:25
    to think step by step about what's going
  • 00:18:27
    on here you might say all right I'll
  • 00:18:28
    take a extra seconds and think about it
  • 00:18:30
    but what's even more powerful than the
  • 00:18:31
    think step byep is actually telling
  • 00:18:33
    Claude how to do that thinking if you
  • 00:18:35
    were to explain this to someone new at
  • 00:18:37
    your company even someone senior or an
  • 00:18:39
    intern how would you go ahead and think
  • 00:18:40
    about performing that task really try to
  • 00:18:42
    draw that parallel when thinking about
  • 00:18:44
    prompt
  • 00:18:45
    engineering something you're going to
  • 00:18:46
    see quite a bit is this idea of XML tags
  • 00:18:49
    if you are familiar with HTML it's a
  • 00:18:50
    very similar kind of idea you have an
  • 00:18:51
    open tag and a closed tag with XML you
  • 00:18:53
    can pick the name of that tag the name
  • 00:18:56
    of the tag does not really have any
  • 00:18:58
    significance but you want to be
  • 00:18:59
    intentional about the semantic meaning
  • 00:19:01
    for what that is the purpose of using
  • 00:19:03
    XML tags is to help create organization
  • 00:19:06
    when you have a very very large prompt
  • 00:19:07
    if you have a relatively short prompt
  • 00:19:09
    you don't need to move the needle too
  • 00:19:10
    much with XML tags but as your prompts
  • 00:19:12
    get longer and longer and longer the
  • 00:19:14
    same way that we as humans like
  • 00:19:15
    indentation and Whit space and so on
  • 00:19:18
    Claude likes XML text you can use other
  • 00:19:20
    delimiters other formats and so on but
  • 00:19:22
    we prefer XML because it's clear and
  • 00:19:24
    token
  • 00:19:25
    efficient we talked a little bit about
  • 00:19:27
    examples when when you think about
  • 00:19:29
    providing your examples these are
  • 00:19:31
    essentially one of the most powerful
  • 00:19:32
    tools you can provide because Claude is
  • 00:19:35
    very very good at pattern matching
  • 00:19:36
    especially if there's a particular
  • 00:19:37
    format that you want Claude to adhere to
  • 00:19:39
    you really want to give Claude
  • 00:19:40
    essentially as much information as
  • 00:19:42
    possible that it can figure out what to
  • 00:19:44
    do with its output in general just like
  • 00:19:46
    with humans it's much easier to just
  • 00:19:48
    tell me how to do it and show me what it
  • 00:19:50
    looks like as opposed to long-winded
  • 00:19:52
    explanations of all the things at hand
  • 00:19:54
    so providing examples but being
  • 00:19:55
    intentional with your examples the
  • 00:19:57
    relevance the diversity the quantity you
  • 00:19:59
    really want to make sure that's
  • 00:19:59
    something you have in all of your
  • 00:20:01
    prompts something that's a bit unique
  • 00:20:03
    about Cloud something that other large
  • 00:20:04
    language model providers don't
  • 00:20:05
    necessarily offer is the idea of
  • 00:20:07
    pre-filling Cloud's response if you're
  • 00:20:10
    familiar with communicating with apis
  • 00:20:12
    with users and assistants the basic flow
  • 00:20:14
    is that you have a user message that
  • 00:20:16
    could be something that the user types
  • 00:20:17
    that could be a prompt that you have
  • 00:20:19
    programmatically and the assistant is
  • 00:20:21
    what you are getting back from claw that
  • 00:20:22
    is the response you are getting back
  • 00:20:24
    from the large language model what you
  • 00:20:26
    can do by pre-filling the response is
  • 00:20:28
    essentially put some words into claude's
  • 00:20:30
    mouth and what that basically means is
  • 00:20:32
    every single response that Claude gives
  • 00:20:34
    can start with something that you as the
  • 00:20:37
    human have
  • 00:20:38
    dictated why might that be useful if
  • 00:20:40
    there's a situation where you want to
  • 00:20:42
    steer claud's Behavior a bit there might
  • 00:20:43
    be a situation where Claud again like
  • 00:20:45
    Maggie mentioned we are a safety first
  • 00:20:47
    company that is a priority to what we do
  • 00:20:49
    and there may be situations where your
  • 00:20:50
    use case is actually getting blocked by
  • 00:20:53
    some of the safety considerations that
  • 00:20:54
    we have Again by using pre-filling the
  • 00:20:57
    response you are not going to jailbreak
  • 00:20:58
    everything by any means but you can put
  • 00:21:00
    some words into cla's mouth to try to
  • 00:21:01
    give it some context for what it is that
  • 00:21:03
    you're trying to do you can be
  • 00:21:05
    intentional with that to make sure that
  • 00:21:06
    you have a little bit more control over
  • 00:21:08
    the behavior and the formatting so
  • 00:21:09
    pre-filling the response is a very
  • 00:21:11
    common way that you can put some words
  • 00:21:12
    into claude's mouth and that way CLA can
  • 00:21:14
    essentially just pick up where you as
  • 00:21:16
    the human left off it's really an
  • 00:21:18
    underrated aspect to some of the
  • 00:21:19
    prompting that we can do so we've got a
  • 00:21:22
    good sense of prompt engineering we got
  • 00:21:24
    a good sense of the data that we passed
  • 00:21:25
    to our model to elicit a response we
  • 00:21:27
    talked about being clear IND direct we
  • 00:21:29
    talked about some of the most important
  • 00:21:30
    ideas with prompt engineering I want to
  • 00:21:32
    shift gears a little bit to Tool use
  • 00:21:33
    just raise your hand if you're familiar
  • 00:21:34
    with the idea of tool
  • 00:21:36
    use right excellent you can go and come
  • 00:21:38
    back in five minutes so tool use the
  • 00:21:40
    idea here is simply to extend claud's
  • 00:21:42
    functionality the classic example for
  • 00:21:44
    Tool use is a situation where I might
  • 00:21:46
    ask you something like hey Claude what
  • 00:21:47
    is the weather right now in Las Vegas
  • 00:21:50
    the response that Claude is going to
  • 00:21:51
    give me is most certainly going to be
  • 00:21:52
    something like I'm sorry I don't have
  • 00:21:54
    that information right now thank you or
  • 00:21:58
    maybe I can tell the weather you know in
  • 00:22:00
    August or April of last year or so on
  • 00:22:02
    probably not going to get that but you
  • 00:22:03
    can imagine the information that I want
  • 00:22:05
    at this exact moment is not something
  • 00:22:06
    that Claude has out of the box so tool
  • 00:22:09
    use allows us to instead give our
  • 00:22:12
    application a little bit more awareness
  • 00:22:15
    of other things that we might want to do
  • 00:22:17
    we're not going to give Claude the
  • 00:22:18
    ability to go and find the weather and
  • 00:22:20
    return it to us that is not the purpose
  • 00:22:21
    of tool use the purpose of tool use is
  • 00:22:23
    to Simply extend claude's capability so
  • 00:22:27
    that instead of saying something like I
  • 00:22:28
    don't know what the weather is Claude is
  • 00:22:30
    actually going to say something like hey
  • 00:22:32
    looks like you trying to find the
  • 00:22:33
    weather looks like your location is Las
  • 00:22:36
    Vegas and since you're based in the US
  • 00:22:38
    I'm going to assume you want that in
  • 00:22:40
    Fahrenheit it's then up to the developer
  • 00:22:42
    of the application to take that response
  • 00:22:45
    and do what they want with it so I'll
  • 00:22:47
    give you an example here you might have
  • 00:22:48
    a prompt what was the final score of a
  • 00:22:49
    SF Giants game on October 28th 2024 this
  • 00:22:52
    is something out of the box that Claude
  • 00:22:53
    does not know this is past our training
  • 00:22:54
    cutoff date so what is Claude normally
  • 00:22:56
    going to say I don't know but instead
  • 00:22:59
    if we give Claude a list of tools and
  • 00:23:00
    I'm going to show you what a tool looks
  • 00:23:01
    like again it's actually not that
  • 00:23:03
    difficult to do and there are many tools
  • 00:23:04
    that can even help you generate and
  • 00:23:05
    validate the tools that you make so with
  • 00:23:07
    this particular list of tools every tool
  • 00:23:09
    has a name every tool has a description
  • 00:23:12
    you might wonder why do you need a name
  • 00:23:13
    why do you give why do you need a
  • 00:23:15
    description because when a prompt comes
  • 00:23:16
    in asking about something like a final
  • 00:23:18
    score of a game if we have a tool that
  • 00:23:21
    is called get score that's related to
  • 00:23:23
    some kind of baseball game Claud is
  • 00:23:24
    going to be able to infer oh that's the
  • 00:23:26
    one that you probably want to use so
  • 00:23:28
    when you start making use of tool use
  • 00:23:29
    you want to have a really good name and
  • 00:23:30
    a really good description because that's
  • 00:23:32
    what Claude is going to use to infer
  • 00:23:33
    what action to take again Claude is not
  • 00:23:36
    going to go to some sports application
  • 00:23:39
    or to a database and get you the score
  • 00:23:41
    all that Claude is going to do is return
  • 00:23:42
    to the developer of the application the
  • 00:23:45
    particular tool and the inputs and then
  • 00:23:47
    claw just kind of walks away like I've
  • 00:23:48
    done my work another example here you
  • 00:23:51
    might have a situation where you're
  • 00:23:52
    building a chatbot and in this
  • 00:23:54
    particular chatbot you want to have the
  • 00:23:55
    ability for a user to look up something
  • 00:23:57
    in the inventory
  • 00:23:59
    well if you ask Claude for example you
  • 00:24:00
    know how many units of this particular
  • 00:24:02
    skew or item or so do we have cla's
  • 00:24:04
    going to say I have no idea but if you
  • 00:24:05
    give it a tool like inventory lookup
  • 00:24:08
    Claude will respond and say something
  • 00:24:09
    like oh it looks like this person is
  • 00:24:10
    trying to find this item and they want
  • 00:24:12
    to know the particular quantity again
  • 00:24:14
    it's up to you as the developer to go to
  • 00:24:16
    a database go to an API go to a service
  • 00:24:18
    do whatever it is that you want why do I
  • 00:24:20
    really want to hammer home this idea of
  • 00:24:22
    tool use because computer use is
  • 00:24:24
    actually just an extension of tool use
  • 00:24:27
    computer use is simply just a bunch of
  • 00:24:29
    tools that we at anthropic have defined
  • 00:24:32
    for you to use those tools include
  • 00:24:35
    things like taking a look at a
  • 00:24:37
    screenshot and figuring out the
  • 00:24:38
    necessary command taking a look at a
  • 00:24:40
    terminal and figuring out where things
  • 00:24:42
    are and what commands might need to be
  • 00:24:44
    input and executed taking a look at some
  • 00:24:47
    text and figuring out where text needs
  • 00:24:49
    to be written or copied or modified or
  • 00:24:50
    so on so if you have an understanding of
  • 00:24:53
    tool use understanding computer use is
  • 00:24:55
    actually just a very very small jump
  • 00:24:57
    conceptually
  • 00:24:58
    one more visualization for this idea
  • 00:25:00
    again I'll walk you through it you might
  • 00:25:02
    have a situation where you ask how many
  • 00:25:03
    shares of GM can I buy with $500 while
  • 00:25:05
    Claude is going to say I mean I can tell
  • 00:25:06
    you GM maybe a year ago but I don't have
  • 00:25:08
    it right now instead if we give our
  • 00:25:10
    application a tool Claude can then say
  • 00:25:12
    something like oh looks like you're
  • 00:25:14
    trying to use that get stock price tool
  • 00:25:16
    and you've passed in General Motors as
  • 00:25:18
    the argument great our application can
  • 00:25:20
    now go to an API go to some external
  • 00:25:22
    data source fetch that price figure out
  • 00:25:25
    what's necessary and then Claude can
  • 00:25:26
    take it the rest of the way
  • 00:25:29
    tool use is something that you can do
  • 00:25:31
    with quite quite high accuracy with a
  • 00:25:33
    tremendous amount of tools in fact this
  • 00:25:34
    is actually much closer to 95 plus and
  • 00:25:36
    hundreds of tools is something that is
  • 00:25:38
    no problem here you just want to be
  • 00:25:40
    mindful that you have accurate and
  • 00:25:41
    reasonable tool definitions just to show
  • 00:25:43
    you what a tool looks like again it it
  • 00:25:45
    shouldn't be too scary looking because
  • 00:25:47
    this is really all it is you give it a
  • 00:25:49
    name you give it a description for those
  • 00:25:51
    of you familiar with Json schema whether
  • 00:25:52
    it's through typing or API validation or
  • 00:25:54
    so on you can see here in this input
  • 00:25:56
    schema we basically just put in any
  • 00:25:59
    Properties or parameters or arguments
  • 00:26:01
    that that tool should have so the name
  • 00:26:03
    of the tool get weather the description
  • 00:26:05
    get the current weather and then we
  • 00:26:07
    simply just give it some properties of
  • 00:26:08
    things you should look for when using
  • 00:26:10
    that tool so if a prompt comes in and
  • 00:26:12
    someone says what is the weather right
  • 00:26:14
    now Cloud's going to basically say
  • 00:26:16
    something like oh it looks like you're
  • 00:26:17
    trying to use the get weather tool but
  • 00:26:19
    the location is required so could you
  • 00:26:21
    tell me what location you are in then
  • 00:26:23
    the user might say something like oh
  • 00:26:24
    sorry I'm in Vegas great Claude has that
  • 00:26:27
    information and then says excellent I
  • 00:26:29
    know the location I know the name of
  • 00:26:30
    this tool okay application here is the
  • 00:26:34
    tool you're trying to use here is the
  • 00:26:35
    input go and do what you want with it so
  • 00:26:38
    if you can kind of build that flow that
  • 00:26:39
    kind of mental model for how tool use
  • 00:26:42
    Works how tools are defined how we work
  • 00:26:45
    with those getting up to the
  • 00:26:47
    understanding of computer use and agents
  • 00:26:48
    and so on it's not too much of a leap
  • 00:26:50
    but you really want to make sure at the
  • 00:26:51
    end of the day you understand what a
  • 00:26:53
    large language model is which hopefully
  • 00:26:54
    we've got that Foundation you want to
  • 00:26:56
    make sure you understand tool use this
  • 00:26:58
    extension passing in these object like
  • 00:27:02
    ideas to Claude to basically interpret
  • 00:27:04
    analyze and then execute a particular
  • 00:27:07
    command we talked a little bit about
  • 00:27:09
    some just good practice for Tool use
  • 00:27:11
    again a simple and accurate tool name
  • 00:27:12
    being clear and direct same exact idea
  • 00:27:14
    from prompt engineering coming right
  • 00:27:16
    back to Tool use being mindful of the
  • 00:27:18
    description what the tool returns how
  • 00:27:20
    the tool is used examples are actually
  • 00:27:23
    less important than having a clear and
  • 00:27:25
    comprehensive explanation so you might
  • 00:27:26
    be tempted to show all these examples
  • 00:27:28
    again with many ideas in prompt
  • 00:27:30
    engineering start with something
  • 00:27:31
    relatively simple simple just see what
  • 00:27:33
    you get see how it works and then
  • 00:27:35
    iterate from there again if you want to
  • 00:27:37
    take a look at all of these ideas we're
  • 00:27:38
    talking about rag we're talk about tool
  • 00:27:40
    use computer use you talked a little bit
  • 00:27:41
    about the prompt generator and so on
  • 00:27:43
    we've got tools for improving prompts
  • 00:27:44
    come take a look at those in the booth
  • 00:27:46
    we've got lots of really fun uh quick
  • 00:27:47
    starts and ways for you to get
  • 00:27:48
    interacted with
  • 00:27:50
    that in general you might think of a
  • 00:27:53
    large application with hundreds and
  • 00:27:54
    hundreds of tools to make sure this is
  • 00:27:57
    working properly to make sure you're
  • 00:27:58
    being really intentional about how this
  • 00:28:00
    is all done you want to think a lot
  • 00:28:02
    about each tool having one particular
  • 00:28:04
    meaning and one particular concern so if
  • 00:28:06
    you're familiar in software engineering
  • 00:28:07
    of the single responsibility principle
  • 00:28:09
    or such you want to kind of follow the
  • 00:28:11
    same idea you really don't want to have
  • 00:28:13
    a tool that tries to do 10 things all at
  • 00:28:14
    once that's why even with computer use
  • 00:28:16
    we don't have one tool called do all the
  • 00:28:18
    computer stuff we have things like a
  • 00:28:20
    computer tool and a bash tool and a
  • 00:28:22
    string edit tool and so on so again you
  • 00:28:24
    want to have one tool that just does one
  • 00:28:26
    thing well
  • 00:28:30
    as we shift gears we talked a little bit
  • 00:28:31
    about tool use it's important that we
  • 00:28:33
    understand tool use actually before we
  • 00:28:34
    talk about rag because it is possible
  • 00:28:36
    that you might want to use a tool to
  • 00:28:38
    figure out whether you should do rag or
  • 00:28:40
    not quick show of hands how many of you
  • 00:28:42
    are familiar with the idea of rag or the
  • 00:28:43
    acronym or so on or if I were to call on
  • 00:28:45
    you you could give me a definition all
  • 00:28:47
    those hands went down very quickly but
  • 00:28:49
    that's okay um that's that's all right
  • 00:28:50
    I'll do that for you so talk about rag
  • 00:28:53
    architecture and tips here rag or
  • 00:28:54
    retrieval augmented generation the
  • 00:28:56
    really cool thing about rag is it's
  • 00:28:58
    short and fun sounding acronym the even
  • 00:29:00
    more exciting thing is 99% of the work
  • 00:29:02
    is just that letter r the A and the g
  • 00:29:04
    are just a nice way to make it sound
  • 00:29:05
    kind of cool so retrieval is really
  • 00:29:07
    where all the hard stuff is going to
  • 00:29:08
    happen so we'll talk about what this
  • 00:29:10
    idea is we'll walk through it step by
  • 00:29:12
    step rag is the idea of searching for
  • 00:29:14
    retrieving and adding context what does
  • 00:29:16
    that mean you might have data that is
  • 00:29:18
    proprietary you might have data internal
  • 00:29:20
    to your company you might have data that
  • 00:29:22
    is past the training cutof date
  • 00:29:24
    information that Claude is not aware
  • 00:29:25
    about you also might have lots and lots
  • 00:29:27
    and lots of lots and lots of documents
  • 00:29:29
    that you can't just stuff into one
  • 00:29:30
    prompt in the context window because the
  • 00:29:33
    context window is not big enough so how
  • 00:29:35
    do we augment language models with
  • 00:29:37
    external knowledge well we take that
  • 00:29:39
    external knowledge and we put it
  • 00:29:41
    somewhere else we then go ahead and
  • 00:29:44
    retrieve that external
  • 00:29:46
    knowledge in order to set up more of a
  • 00:29:48
    professional grade rag Pipeline and so
  • 00:29:51
    on there's a little bit of work that
  • 00:29:52
    needs to happen this idea of
  • 00:29:55
    pre-processing and working and chunking
  • 00:29:56
    with your data is how this all kicks off
  • 00:29:59
    so I want you to imagine a situation of
  • 00:30:00
    you have all of your company's internal
  • 00:30:02
    documents for onboarding new hires and
  • 00:30:05
    so on you can imagine this might be you
  • 00:30:07
    know 50 100 200 500 pages all kinds of
  • 00:30:10
    things of benefits and insurance plans
  • 00:30:11
    and all kinds of internal information
  • 00:30:13
    that Claud is not aware of again we
  • 00:30:16
    can't take all that stuff and stuff it
  • 00:30:17
    into a prompt and ask it questions the
  • 00:30:19
    context window is not big enough so what
  • 00:30:21
    do we do we take that data and we break
  • 00:30:24
    up all of that data into smaller chunks
  • 00:30:27
    this is a process very commonly called
  • 00:30:28
    chunking you take your data whether it's
  • 00:30:30
    text whether it's image whether it's
  • 00:30:32
    video whether it's audio however it may
  • 00:30:33
    be and we take that data and we turn it
  • 00:30:35
    into what's called an embedding an
  • 00:30:37
    embedding is just a really really fancy
  • 00:30:39
    way of saying a list of long long
  • 00:30:42
    numbers or floating Point numbers the
  • 00:30:44
    purpose of this is at the end of the day
  • 00:30:46
    we have to take this text and get down
  • 00:30:48
    to numbers that's how models work as
  • 00:30:49
    well we take these text get them down to
  • 00:30:51
    numbers make our way up to a bunch of
  • 00:30:53
    probabilities and then figure out the
  • 00:30:55
    next
  • 00:30:56
    token the reason why we use these
  • 00:30:58
    embeddings is because embeddings allow
  • 00:31:01
    us to perform semantic or similar
  • 00:31:04
    searches why do we want to do something
  • 00:31:06
    like that well let's imagine you go
  • 00:31:08
    ahead and you take some internal
  • 00:31:09
    anthropic documents and they're about me
  • 00:31:12
    you might ask a question like who is
  • 00:31:13
    Ellie shopic you might ask a question
  • 00:31:15
    like who is that Ellie person whose last
  • 00:31:17
    name I can't pronounce or who is that
  • 00:31:19
    Ellie person who does not stop talking
  • 00:31:20
    about tool use and rag they're all
  • 00:31:22
    actually the same question but they're
  • 00:31:23
    all somewhat similar which means that
  • 00:31:25
    when we go ahead and we try to retrieve
  • 00:31:27
    doc doents related to whatever that may
  • 00:31:29
    be we can't just use an exact search so
  • 00:31:32
    what we do is we take our text we break
  • 00:31:34
    it down into embeddings in order to do
  • 00:31:37
    that we make use of an embedding model
  • 00:31:39
    which is actually a different kind of
  • 00:31:40
    model that instead of trying to Output
  • 00:31:42
    the next token or so on it actually
  • 00:31:43
    outputs a bunch of embeddings those
  • 00:31:45
    embeddings those numerical
  • 00:31:47
    representations refer to a particular
  • 00:31:49
    piece of meaning in a certain
  • 00:31:51
    dimensional space so you might have 500
  • 00:31:54
    300 2,000 5,000 Dimensions with each
  • 00:31:57
    particular vector representing a
  • 00:31:59
    particular part of a semantic meaning of
  • 00:32:01
    some text we take all that text and we
  • 00:32:04
    store it externally very commonly that
  • 00:32:06
    is done in a vector store so you might
  • 00:32:08
    have heard of vector databases or so on
  • 00:32:10
    these are essentially data stores for
  • 00:32:11
    all of our
  • 00:32:13
    embeddings before we can do any of the
  • 00:32:15
    retrieval and so on we got to do that
  • 00:32:16
    work very commonly this is part of your
  • 00:32:18
    rag pipeline or your pre-processing part
  • 00:32:21
    of data this is where there's actually a
  • 00:32:22
    decent amount of engineering and
  • 00:32:24
    trickery that happens how do I make sure
  • 00:32:26
    using the right embedding model for my
  • 00:32:27
    use case how do I make sure I'm chunking
  • 00:32:29
    appropriately and so on that's a lot of
  • 00:32:30
    work fortunately there are tools
  • 00:32:32
    especially in the Amazon Bedrock
  • 00:32:34
    ecosystem like Amazon Bedrock knowledge
  • 00:32:36
    bases which are wonderful tools for
  • 00:32:38
    helping you with a lot of that work so
  • 00:32:40
    you could do that engineering Yourself
  • 00:32:41
    by all means but there are also many
  • 00:32:43
    kind of infrastructures of service
  • 00:32:44
    platforms to help you with
  • 00:32:46
    that so before we do any of that
  • 00:32:48
    retrieval we got to do that work that
  • 00:32:50
    chunking that pre-processing getting
  • 00:32:51
    that stuff in a vector database once
  • 00:32:53
    we've got it all in a vector database
  • 00:32:54
    and we feel great about our embeddings
  • 00:32:57
    now we can start going ahead and
  • 00:32:58
    retrieving
  • 00:33:00
    information and here is how that
  • 00:33:02
    traditionally works we might have a
  • 00:33:03
    situation where a user asks a question I
  • 00:33:05
    want to get my daughter more interested
  • 00:33:06
    in science what kind of gifts should I
  • 00:33:08
    get her we then take that query we then
  • 00:33:11
    take that information and we embed it we
  • 00:33:13
    turn it into an embedding we turn it
  • 00:33:14
    into a list of a bunch of floating Point
  • 00:33:16
    numbers and then we go ahead and we take
  • 00:33:18
    that embedding and we go back to our
  • 00:33:20
    Vector database and we say what similar
  • 00:33:22
    results do I have what similar search
  • 00:33:24
    results do I have for that particular
  • 00:33:26
    thing that is the retrieval part once we
  • 00:33:29
    have those results we then augment our
  • 00:33:32
    prompt a little bit with some Dynamic
  • 00:33:34
    information and generate a response so
  • 00:33:37
    that's the A and the g in rag that's the
  • 00:33:39
    that's the quick and easy part but that
  • 00:33:40
    r that retrieval that process of making
  • 00:33:42
    sure that the data somewhere is living
  • 00:33:45
    in some store where I can search by
  • 00:33:46
    similar meanings or hybrid searches or
  • 00:33:48
    so on and I can get a result that's
  • 00:33:50
    where it gets a little bit more
  • 00:33:52
    interesting to give you a little bit of
  • 00:33:54
    a visualization of this rag architecture
  • 00:33:55
    and again if you want to see this in
  • 00:33:56
    action come take a look at our Dem Booth
  • 00:33:58
    we got some really cool quick starts of
  • 00:33:59
    actually rag in application using tools
  • 00:34:01
    like knowledge bases we're going to take
  • 00:34:03
    that question we're going to embed it
  • 00:34:05
    we're going to turn it into that long
  • 00:34:07
    list of floating Point numbers we're
  • 00:34:09
    then going to go ahead and execute some
  • 00:34:10
    kind of similarity search that
  • 00:34:12
    similarity search can be a variation of
  • 00:34:14
    a couple algorithms you might be
  • 00:34:15
    familiar with Manhattan distance or
  • 00:34:17
    ukian distance or cosine similarity or
  • 00:34:19
    dot product it's really just trying to
  • 00:34:21
    find two similar points in some kind of
  • 00:34:23
    dimensional space once we find those
  • 00:34:26
    similar results we can go ahead and take
  • 00:34:29
    those results that we got back and
  • 00:34:30
    augment our prompt and generate a
  • 00:34:32
    completion so again that R in retrieval
  • 00:34:35
    is really where the tricky part
  • 00:34:37
    happens the reason why I brought up tool
  • 00:34:39
    use earlier is because you could imagine
  • 00:34:42
    a situation where you ask Claude
  • 00:34:44
    something like who was the president of
  • 00:34:46
    the United States in
  • 00:34:48
    1776 well Claude doesn't need to do
  • 00:34:50
    something like ah let me go ahead and
  • 00:34:51
    embed that and go to the vector database
  • 00:34:53
    and so on that's something that Claud
  • 00:34:54
    should hopefully know and I feel pretty
  • 00:34:55
    good about that one so cla's just going
  • 00:34:57
    to give your
  • 00:34:58
    response so instead of jumping to
  • 00:35:00
    something like rag maybe you just want
  • 00:35:02
    to use claud's knowledge out of the box
  • 00:35:05
    this is where tool use can be very
  • 00:35:06
    helpful you might have a situation where
  • 00:35:08
    a query comes in and Claude basically
  • 00:35:11
    tries to figure out if it knows the
  • 00:35:12
    answer or not and you can even through
  • 00:35:15
    prompt engineering be very intentional
  • 00:35:16
    about what it means to know or not know
  • 00:35:18
    and if Claude says something like I
  • 00:35:20
    don't know well then go ahead let's go
  • 00:35:22
    use that tool and we probably got to go
  • 00:35:23
    find something or hey if a question
  • 00:35:25
    comes in and it's past your knowledge
  • 00:35:26
    cutoff date or if a question comes in
  • 00:35:28
    about some document that you do not
  • 00:35:30
    understand go ahead and use that tool
  • 00:35:32
    then we'll go do all the retrieval part
  • 00:35:34
    of things cuz you can imagine taking
  • 00:35:36
    your data embedding it turning into a
  • 00:35:38
    list and so on it's going to be a little
  • 00:35:39
    bit timec consuming it's going to
  • 00:35:40
    require some engineering and so on so if
  • 00:35:42
    you don't have to do that that would be
  • 00:35:44
    ideal what we also see a lot with
  • 00:35:46
    production grade rag pipelines is it's
  • 00:35:48
    not as simple as just having one vector
  • 00:35:50
    data store for one particular set of
  • 00:35:52
    data you can imagine a very very large
  • 00:35:54
    scale system is doing quite a bit of
  • 00:35:56
    embeddings over quite a few different
  • 00:35:57
    kinds of databases and data stores so
  • 00:36:00
    depending on your particular use case
  • 00:36:02
    you might not just stuff everything in
  • 00:36:04
    one vector data store you might actually
  • 00:36:05
    have multiple ones with multiple tools
  • 00:36:07
    trying to figure out how to interact
  • 00:36:09
    with your application and generate the
  • 00:36:11
    correct
  • 00:36:13
    completion another really powerful tool
  • 00:36:15
    that I want to point you towards and
  • 00:36:16
    we'll talk a little bit more about this
  • 00:36:16
    with the idea of contextual retrieval is
  • 00:36:18
    the idea of actually being able to use a
  • 00:36:20
    model to rewrite your query the classic
  • 00:36:23
    example here is you might have some kind
  • 00:36:24
    of customer support situation where I
  • 00:36:27
    say something like my username is Ellie
  • 00:36:29
    and I just want to let you know I am
  • 00:36:30
    pissed this service is terrible I am
  • 00:36:32
    very unhappy I've been waiting for 3
  • 00:36:33
    hours to talk to a human and all I get
  • 00:36:35
    is this chatbot I am feeling miserable
  • 00:36:37
    at the end of the day what we really
  • 00:36:39
    need to do is just look up Ellie so can
  • 00:36:41
    we instead of taking all that data and
  • 00:36:43
    all that you know frustration that Ellie
  • 00:36:44
    has maybe we'll handle that in a
  • 00:36:46
    different way instead of taking that and
  • 00:36:47
    embedding it and trying to find similar
  • 00:36:49
    searches to a frustrated customer can we
  • 00:36:51
    instead rewrite the query just to focus
  • 00:36:53
    on the information that we need this is
  • 00:36:56
    very commonly done by kind of throwing a
  • 00:36:58
    a smaller model that can do a little bit
  • 00:36:59
    more of that classification or rewriting
  • 00:37:02
    to essentially get you better results so
  • 00:37:04
    with rag with these ideas of rag
  • 00:37:06
    pipelines and so on there's so much
  • 00:37:07
    complexity that you can start throwing
  • 00:37:09
    on top but my goal here for this session
  • 00:37:11
    is just to make sure you have at a high
  • 00:37:12
    level an understanding of what it is how
  • 00:37:14
    it works why you might want to use it
  • 00:37:15
    for your particular use
  • 00:37:17
    case when you think about that chunking
  • 00:37:19
    process we talked a bit about breaking
  • 00:37:21
    your data into smaller sections I want
  • 00:37:23
    to point you towards this format that we
  • 00:37:25
    recommend and again you can take a look
  • 00:37:26
    at our documentation for more examples
  • 00:37:27
    of this but what's really important when
  • 00:37:29
    you break up your data is that you give
  • 00:37:31
    it some kind of meta data so that we can
  • 00:37:34
    search for it not only by the
  • 00:37:35
    information but also retrieve some
  • 00:37:37
    metadata so you can see up here we have
  • 00:37:39
    this documents tag and inside we have a
  • 00:37:41
    document subtag with an index of one
  • 00:37:44
    those kind of attributes that we're
  • 00:37:46
    adding here you can treat as metadata or
  • 00:37:48
    higher level information about the
  • 00:37:50
    particular document that could be very
  • 00:37:51
    helpful when doing retrieval to get
  • 00:37:53
    things like information aside from the
  • 00:37:55
    source that I might want to know some
  • 00:37:56
    unique identi maybe the author of that
  • 00:37:58
    particular piece or such in your prompt
  • 00:38:01
    you can then refer to those by their
  • 00:38:02
    indic or their metadata for that kind of
  • 00:38:05
    filtering so you'll see ideas around
  • 00:38:06
    metadata filtering as well to improve
  • 00:38:08
    General retrieval
  • 00:38:11
    performance what's really interesting
  • 00:38:12
    about R I would say at this point is it
  • 00:38:15
    is not going anywhere but it is
  • 00:38:16
    constantly constantly constantly
  • 00:38:18
    evolving the first thing that I want to
  • 00:38:19
    talk about that is really kind of
  • 00:38:20
    leading us to a slight change in the way
  • 00:38:23
    that we think about rag is a tool that
  • 00:38:24
    we have had in our first party API but
  • 00:38:26
    that we actually just released with
  • 00:38:28
    Amazon Bedrock that is the idea of
  • 00:38:29
    prompt caching prompt caching is the
  • 00:38:32
    idea of instead of taking some document
  • 00:38:35
    and putting it in your prompt and then
  • 00:38:37
    on each conversation we got to generate
  • 00:38:39
    all the tokens again for that particular
  • 00:38:41
    document that's going to get a little
  • 00:38:43
    expensive that's going to be a bit time
  • 00:38:44
    consuming can we instead explicitly say
  • 00:38:47
    I'm going to give you these documents I
  • 00:38:49
    want you to cash these documents they
  • 00:38:51
    are not going to change these are static
  • 00:38:52
    pieces of information and then on all
  • 00:38:54
    subsequent prompts don't go ahead and
  • 00:38:56
    regenerate all the tokens for that just
  • 00:38:58
    find them in the cache if you're
  • 00:39:00
    familiar with caching with any kind of
  • 00:39:02
    architecture it's a very similar kind of
  • 00:39:03
    idea find and store this information in
  • 00:39:06
    a quick retrieval process because it is
  • 00:39:08
    not going to change anytime
  • 00:39:10
    soon why is this meaningful with rag
  • 00:39:13
    because instead of taking documents and
  • 00:39:15
    chunking them and embedding them and
  • 00:39:17
    storing them somewhere you can actually
  • 00:39:18
    take your documents and put them in the
  • 00:39:20
    prompt itself and you can cash those
  • 00:39:22
    documents to then retrieve information
  • 00:39:25
    very very quickly without doing all the
  • 00:39:26
    embedding and so on obviously context
  • 00:39:29
    window size is going to take play here
  • 00:39:31
    so if you have a lot a lot a lot of
  • 00:39:32
    documents you still can't do that but at
  • 00:39:34
    the same time context windows are vastly
  • 00:39:37
    improving in size compared to where we
  • 00:39:39
    were 6 months ago a year ago two years
  • 00:39:41
    ago we are nearing worlds where we will
  • 00:39:42
    have millions tens of millions of tokens
  • 00:39:44
    in context windows so when you think
  • 00:39:46
    about combining prompt caching taking a
  • 00:39:48
    tremendous amount of your documents
  • 00:39:50
    caching them and then leveraging a very
  • 00:39:52
    very large context window you can
  • 00:39:54
    potentially avoid the need for having to
  • 00:39:55
    do that chunking and so on so right now
  • 00:39:58
    if your data is of a medium size you
  • 00:39:59
    really got to jump to rag if your data
  • 00:40:01
    is of a large size you got to jump to
  • 00:40:02
    rag even a smaller size depending on
  • 00:40:04
    what you're working with can we instead
  • 00:40:06
    as we start to see prom caching get more
  • 00:40:08
    and more widespread as we start to get
  • 00:40:09
    context windows that are even larger can
  • 00:40:11
    we start to shift that a little bit
  • 00:40:13
    towards not jumping for rag not needing
  • 00:40:15
    to worry about these massive pipelines
  • 00:40:17
    and infrastructure and instead just
  • 00:40:18
    putting that in the context of the
  • 00:40:20
    prompt what's interesting as well we
  • 00:40:22
    talked about Vector databases talked
  • 00:40:24
    about this idea of embeddings and
  • 00:40:25
    embedding models but there's always a
  • 00:40:27
    lot of research there's always a lot of
  • 00:40:29
    questions for is this the best approach
  • 00:40:30
    is this really what we want to do
  • 00:40:32
    there's a lot of really interesting
  • 00:40:33
    emerging research for using different
  • 00:40:34
    data structures for storing the kind of
  • 00:40:37
    retrieval that you want so instead of
  • 00:40:39
    using a vector database a list of large
  • 00:40:41
    floating Point numbers can we instead
  • 00:40:43
    use a different data structure like a
  • 00:40:44
    graph can we instead think of our data
  • 00:40:46
    as just a series of nodes interconnected
  • 00:40:49
    by edges and when we try to retrieve our
  • 00:40:51
    data find those similar nodes with more
  • 00:40:53
    meaning as opposed to a general semantic
  • 00:40:56
    search while this is not yet something
  • 00:40:58
    that's very very large in production
  • 00:40:59
    there's a lot of really interesting
  • 00:41:00
    research around this idea of graph Rag
  • 00:41:02
    and knowledge graphs that may lead us to
  • 00:41:04
    potentially not have to use Vector
  • 00:41:05
    stores and get better performance with
  • 00:41:07
    retrieval we going talk a little bit
  • 00:41:09
    soon about the idea of contextual
  • 00:41:11
    retrieval this is really interesting
  • 00:41:12
    research that we put out and again for
  • 00:41:14
    those of you that have been to the booth
  • 00:41:15
    you know that we have a section on
  • 00:41:16
    Research as well so I welcome you to
  • 00:41:17
    come take a look at that chat about the
  • 00:41:18
    research if you're interested especially
  • 00:41:20
    in things like interpretability how our
  • 00:41:22
    models are behaving from the inside out
  • 00:41:24
    trying to make sense of that contextual
  • 00:41:26
    retrieval that I'll talk about we got a
  • 00:41:27
    lot of really good uh good folks to talk
  • 00:41:28
    to about that kind of
  • 00:41:30
    stuff as we mentioned embedding models
  • 00:41:32
    are constantly changing there are a wide
  • 00:41:34
    variety of embedding models for all
  • 00:41:35
    different kinds of providers from open
  • 00:41:37
    source to commercial providers for all
  • 00:41:38
    different kinds of dimensions and
  • 00:41:39
    pricing and so on these are always
  • 00:41:41
    getting better it's also a very very
  • 00:41:43
    large world in the reranking space I'll
  • 00:41:45
    talk a little bit about reranking your
  • 00:41:46
    results when you get them back we're
  • 00:41:48
    also seeing a lot of improvement for
  • 00:41:50
    measuring the effectiveness of rag
  • 00:41:52
    through evaluations this can be done
  • 00:41:54
    using platforms that are Enterprise
  • 00:41:55
    grade this could also be done using
  • 00:41:57
    using Open Source Products like promp Fu
  • 00:41:59
    there's also an entire evaluation
  • 00:42:00
    framework called rag ass or rag
  • 00:42:02
    assessments that basically will analyze
  • 00:42:04
    the relevance the accuracy really
  • 00:42:06
    important metrics for whether you're
  • 00:42:07
    doing a good job or not with your rag
  • 00:42:10
    pipeline again I'll add a really
  • 00:42:11
    important bullet point here because in a
  • 00:42:13
    second we'll talk about fine-tuning
  • 00:42:15
    other techniques like model distillation
  • 00:42:16
    to try to teach the model new things and
  • 00:42:18
    extract more knowledge and introduce
  • 00:42:20
    Behavior change but before you jump to
  • 00:42:21
    any of that even before you try to go
  • 00:42:24
    crazy with your rag pipeline think a lot
  • 00:42:26
    about prompting can get a lot a lot a
  • 00:42:28
    lot of wins with very minimal
  • 00:42:29
    engineering and effort through prompting
  • 00:42:31
    so even with Rag and other options there
  • 00:42:33
    are always
  • 00:42:34
    optimizations I mentioned a little bit
  • 00:42:36
    about this idea of contextual retrieval
  • 00:42:38
    so I want to point you towards this idea
  • 00:42:40
    and research that we have here instead
  • 00:42:42
    of Performing the traditional approach
  • 00:42:44
    like we mentioned of taking your Corpus
  • 00:42:45
    of data and breaking it up into chunks
  • 00:42:48
    before you break it up into chunks what
  • 00:42:50
    we're going to do is we're actually
  • 00:42:52
    going to bring cloud back in the mix
  • 00:42:54
    we're going to run a little bit of
  • 00:42:56
    prompting on each of those chunks to
  • 00:42:59
    provide some context we're going to give
  • 00:43:00
    it 50 or 100 or so extra tokens which
  • 00:43:03
    reference the context of that chunk
  • 00:43:05
    what's a classic example here let's say
  • 00:43:07
    we take a a large 10k of a very very
  • 00:43:10
    large publicly traded company maybe
  • 00:43:12
    let's go with Amazon that seems relevant
  • 00:43:14
    we go ahead and we take our giant 10K
  • 00:43:17
    and we go ahead and we chunk it and one
  • 00:43:18
    of the sections in that chunk is
  • 00:43:20
    something like Revenue increased 37%
  • 00:43:22
    while cost decreased by 12% and
  • 00:43:24
    operating income was up 63%
  • 00:43:27
    seems like a reasonable thing we can
  • 00:43:28
    chunk but if someone goes and searches
  • 00:43:31
    for that particular chunk and they say
  • 00:43:32
    how's the company doing from a operating
  • 00:43:35
    perspective we don't know if that chunk
  • 00:43:37
    is is that the last quarter is that a
  • 00:43:38
    forecast for the next quarter was that
  • 00:43:40
    from last year what does that refer to
  • 00:43:42
    what is the context of that information
  • 00:43:44
    within the scope of the document and the
  • 00:43:46
    goal of contextual retrieval is simply
  • 00:43:48
    just to add a little bit more context to
  • 00:43:50
    each of those chunks so that when we
  • 00:43:52
    retrieve we can get much more accurate
  • 00:43:54
    results we don't just get a block of
  • 00:43:55
    text we get a block of text with a
  • 00:43:57
    little bit of context for what it refers
  • 00:43:59
    to you can take a look in our research
  • 00:44:01
    for what this prompt looks like how it's
  • 00:44:03
    run the different kinds of searches that
  • 00:44:05
    we have you can also see here that
  • 00:44:07
    instead of just using an embedding model
  • 00:44:09
    we're actually doing what's called a
  • 00:44:10
    hybrid search so that we're performing
  • 00:44:12
    the semantic search but we're also
  • 00:44:13
    performing other popular kinds of
  • 00:44:15
    searches a very common one called bm25
  • 00:44:17
    or best match 25 it's kind of that TF
  • 00:44:20
    IDF similarity search right there so
  • 00:44:22
    when performing these kinds of large rag
  • 00:44:24
    pipelines and pre-processing we think a
  • 00:44:26
    lot of about not only what we're going
  • 00:44:28
    to store but also how we're going to
  • 00:44:30
    retrieve and for those of you that have
  • 00:44:31
    questions I welcome you to ask those and
  • 00:44:33
    also take a look at our uh our booth if
  • 00:44:35
    you have questions on those so we talked
  • 00:44:37
    a bit about tool use we talked a bit
  • 00:44:39
    about prompting we talked a bit about
  • 00:44:42
    the idea of taking llms and extending
  • 00:44:44
    their functionality we saw some really
  • 00:44:45
    awesome demos of how you can essentially
  • 00:44:47
    use cloud to analyze screenshots and
  • 00:44:49
    perform actions and so on and what that
  • 00:44:51
    really leads us to is shifting from just
  • 00:44:54
    the tool to the eventual teammate that
  • 00:44:56
    you can work
  • 00:44:58
    with as the complexity of your use case
  • 00:45:01
    grows the technical investment you need
  • 00:45:03
    to make increases as well you can
  • 00:45:05
    imagine you might have something like a
  • 00:45:07
    classification task tell me if this is
  • 00:45:08
    Spam tell me if this is not spam tell me
  • 00:45:10
    if this is hot dog tell me if this is
  • 00:45:11
    not hot dog that's a relatively easier
  • 00:45:14
    task in the scheme of things we provide
  • 00:45:16
    enough examples we have enough
  • 00:45:17
    intelligence to do so we can lean on
  • 00:45:19
    models that maybe are a little bit more
  • 00:45:20
    cost effective potentially easier on the
  • 00:45:22
    latency side of things because we can
  • 00:45:24
    solve those we move on to summarization
  • 00:45:27
    our question and answer but the second
  • 00:45:29
    that we really start to stretch our
  • 00:45:30
    imagination to what we can do with these
  • 00:45:32
    tools like models taking independent
  • 00:45:34
    actions on their own models being given
  • 00:45:37
    a task and then models required to plan
  • 00:45:40
    out an action remember previous actions
  • 00:45:43
    with some kind of memory course correct
  • 00:45:45
    when things go wrong make use of a wide
  • 00:45:48
    variety of tools follow very complex
  • 00:45:50
    flows of instruction that's where we
  • 00:45:52
    really start to shift things more
  • 00:45:54
    towards this idea of Agents
  • 00:45:57
    what is an agent what is an llm agent
  • 00:45:59
    it's a system that combines large
  • 00:46:00
    language models with the ability to take
  • 00:46:02
    actions in the real world or digital
  • 00:46:03
    environments and what do they include
  • 00:46:06
    what we have right here I'll give you
  • 00:46:07
    the very high level definition and we'll
  • 00:46:09
    talk about some of the diagrams with a
  • 00:46:10
    bit more interest at the end of the day
  • 00:46:13
    I want you to think of an agent as just
  • 00:46:14
    three things a model in this case Claude
  • 00:46:16
    35 Sonet a bunch of tools or functions
  • 00:46:20
    that the agent can use I'm going to give
  • 00:46:22
    you a tool like the ability to search
  • 00:46:23
    the web I'm going to give you a tool
  • 00:46:25
    like the ability to analyze a screen
  • 00:46:26
    screenshot I'm going to give you a tool
  • 00:46:28
    like the ability to take a look at the
  • 00:46:29
    terminal and figure out what command to
  • 00:46:31
    pass
  • 00:46:32
    in and then we're going to go ahead and
  • 00:46:34
    give it a goal and then we're just going
  • 00:46:37
    to let it let it go what do we mean by
  • 00:46:39
    Let It Go essentially if you're familiar
  • 00:46:42
    with python if you look at our computer
  • 00:46:43
    reference documentation it's while true
  • 00:46:46
    it's an infinite Loop go ahead and take
  • 00:46:48
    the next bit of data and execute on it
  • 00:46:51
    take the next bit of data execute on it
  • 00:46:52
    take the ne next bit of data execute on
  • 00:46:54
    it again and again and again and again
  • 00:46:56
    so I'm going to give you the goal of
  • 00:46:57
    trying to find something it's probably
  • 00:46:59
    going to require 5 10 15 20 steps and
  • 00:47:01
    just like a human would the human would
  • 00:47:03
    probably plan what do I need to do I'm
  • 00:47:05
    probably going to have to open up the
  • 00:47:05
    browser and search for something and
  • 00:47:07
    then going to go ahead and cross
  • 00:47:08
    reference that with something that I
  • 00:47:09
    have in a document and then I'm going to
  • 00:47:10
    take that information and I'll email it
  • 00:47:11
    to someone and then maybe I'll go ahead
  • 00:47:13
    and go back to my text editor and make
  • 00:47:15
    that change and push that up to GitHub
  • 00:47:16
    and submit a poll request and then
  • 00:47:17
    communicate with my team lead about the
  • 00:47:19
    change I made these are all things that
  • 00:47:21
    when I say it very quickly you're like
  • 00:47:22
    wow that's a lot but think about what we
  • 00:47:24
    do every day many many different kinds
  • 00:47:26
    of tasks very very quickly all in
  • 00:47:28
    sequence we have the ability to plan out
  • 00:47:30
    what we want to do we have the tools at
  • 00:47:32
    our disposal or at least Claude and the
  • 00:47:34
    internet to figure out the tools that we
  • 00:47:36
    need and we have the memory to remember
  • 00:47:38
    what needs to be
  • 00:47:39
    done when you think about agents and
  • 00:47:41
    agentic workflows it's really just an
  • 00:47:43
    extension of all the things you've seen
  • 00:47:45
    before it's an extension of tool use
  • 00:47:48
    it's an extension of prompting it's
  • 00:47:50
    combining those together to perform
  • 00:47:52
    tasks just like we do in the real world
  • 00:47:55
    instead of a single task instead of a
  • 00:47:56
    sing single turn we're talking about
  • 00:47:58
    longer multi- turn
  • 00:47:59
    examples I really like this slide
  • 00:48:01
    because it drives a strong analogy for
  • 00:48:03
    those of you familiar with building web
  • 00:48:04
    applications essentially what it looks
  • 00:48:06
    like with building agents the same way
  • 00:48:08
    that you might start with building a
  • 00:48:09
    static page you might have some HTML
  • 00:48:11
    some CSS some JavaScript in the context
  • 00:48:14
    of an agent that's a pretty
  • 00:48:15
    straightforward
  • 00:48:16
    conversation as you start thinking about
  • 00:48:18
    interactivity as you start adding
  • 00:48:20
    JavaScript and you're handling clicks
  • 00:48:21
    and you're handling interactions this is
  • 00:48:23
    where you know the prompts have to get a
  • 00:48:25
    little bit more complex so the same way
  • 00:48:27
    that you think about expanding into the
  • 00:48:29
    world of agents and leveraging more in
  • 00:48:30
    the geni space really try to draw that
  • 00:48:32
    analogy to software other things that
  • 00:48:34
    you might be familiar with what's the
  • 00:48:36
    business use case what am I use what do
  • 00:48:37
    my users need what am I trying to build
  • 00:48:39
    for at the end of the day my application
  • 00:48:41
    grows larger I got to start separating
  • 00:48:43
    JavaScript files or whatever language
  • 00:48:45
    that I'm working with at this point if
  • 00:48:48
    you're doing this in a generative
  • 00:48:49
    generative AI world you know we're we're
  • 00:48:51
    improving the prompts we're not getting
  • 00:48:52
    too crazy as we start thinking more
  • 00:48:55
    about building a website as we start
  • 00:48:57
    moving to Frameworks as we start
  • 00:48:58
    thinking about breaking things into
  • 00:48:59
    microservices and distributed systems
  • 00:49:01
    and so on as we start reaching larger
  • 00:49:03
    scale that's where I want you to kind of
  • 00:49:05
    draw the parallel to this is where
  • 00:49:07
    agents can really come into play this is
  • 00:49:09
    where the idea of agentic workflows come
  • 00:49:10
    in so what's really nice about this just
  • 00:49:12
    from a visual perspective is if you're
  • 00:49:14
    familiar with things like software
  • 00:49:15
    you're familiar with the analogy of
  • 00:49:17
    building a website and so on you can
  • 00:49:19
    draw that parallel to where things like
  • 00:49:21
    Agents come into the
  • 00:49:23
    mix we're still relatively early in the
  • 00:49:25
    world of agents in terms terms of how we
  • 00:49:27
    think about memory how we think about
  • 00:49:28
    planning but what's really really
  • 00:49:30
    powerful about these models and
  • 00:49:31
    something that even Maggie spoke a
  • 00:49:32
    little bit about some of the benchmarks
  • 00:49:34
    that we're seeing with model performance
  • 00:49:36
    especially with 35 Sonet and even 35 Hau
  • 00:49:38
    on the coding front make us more and
  • 00:49:40
    more confident that it can perform the
  • 00:49:42
    tasks necessary that agents need to do
  • 00:49:45
    so what makes the model the right choice
  • 00:49:46
    for agentic workflows it's got to be
  • 00:49:48
    really really good at following
  • 00:49:50
    instructions and just like humans some
  • 00:49:52
    models are not great at following lots
  • 00:49:54
    and lots and lots of instructions if I
  • 00:49:55
    were to give the model hundreds and
  • 00:49:56
    hundreds of tasks to do could it
  • 00:49:58
    remember those tasks could it handle it
  • 00:49:59
    in the appropriate order could it figure
  • 00:50:01
    out when something went wrong and go
  • 00:50:02
    back and correct itself so what's really
  • 00:50:05
    exciting about this kind of agentic
  • 00:50:06
    workflow and situation that we're in is
  • 00:50:08
    that we're also starting to develop
  • 00:50:09
    benchmarks that can really determine the
  • 00:50:12
    effectiveness of these models for those
  • 00:50:14
    of you that are interested in digging in
  • 00:50:15
    that a little bit more I really
  • 00:50:16
    recommend taking a look at sbench or swe
  • 00:50:19
    bench this is a benchmark that basically
  • 00:50:22
    takes a model takes a open source
  • 00:50:24
    repository so it could be a very large
  • 00:50:26
    codebase something like jeno or very
  • 00:50:28
    large python code base and it basically
  • 00:50:31
    puts the model in a situation where it
  • 00:50:33
    says here's the code base here is the
  • 00:50:35
    issue go and write the poll request go
  • 00:50:37
    and write the code necessary to get
  • 00:50:39
    tests to pass and also make sure that
  • 00:50:41
    things don't break these kinds of
  • 00:50:43
    benchmarks are really the foundation for
  • 00:50:45
    how we can determine how we can feel
  • 00:50:46
    confident that the models don't just
  • 00:50:48
    understand ideas or answer highle
  • 00:50:50
    questions about code but actually
  • 00:50:51
    perform things similar to what humans do
  • 00:50:53
    so definitely recommend on a lot of our
  • 00:50:55
    documentation a lot of the kind of model
  • 00:50:57
    cards that we have taking a look at s
  • 00:50:58
    bench and then another one tow bench for
  • 00:51:00
    following particular actions really
  • 00:51:02
    really interesting data source for how
  • 00:51:04
    we think about the effectiveness of
  • 00:51:05
    these
  • 00:51:07
    models the last piece I want to talk
  • 00:51:08
    about is an idea called fine-tuning I'll
  • 00:51:10
    also talk a little bit about the idea of
  • 00:51:11
    model
  • 00:51:12
    distillation we're in a situation where
  • 00:51:15
    prompting is not getting us where we
  • 00:51:17
    need to go rag is not getting us where
  • 00:51:19
    we need to go so there are other options
  • 00:51:22
    for trying to improve the performance of
  • 00:51:25
    your model it always sounds very
  • 00:51:27
    exciting when you say some improve the
  • 00:51:28
    performance of your model so see a lot
  • 00:51:29
    of heads go up of like cool how do I do
  • 00:51:31
    that and it's very tempting to look at
  • 00:51:32
    this and say ah fine tuning is the
  • 00:51:34
    answer but fine-tuning is just one way
  • 00:51:36
    to try to solve a certain problem so I
  • 00:51:37
    want to do my best to kind of give you a
  • 00:51:39
    little bit of an overview of what
  • 00:51:41
    fine-tuning is how it works in a
  • 00:51:43
    nutshell fine tuning something you can
  • 00:51:45
    do through the Bedrock interface with
  • 00:51:47
    Hau 3 it's the ability to take a curated
  • 00:51:51
    data set this is also called supervised
  • 00:51:53
    fine-tuning so if you're familiar with
  • 00:51:54
    that idea in machine learning we are
  • 00:51:56
    essentially building a curated set of
  • 00:51:58
    data of inputs and outputs that we would
  • 00:52:02
    like the model to respond we're
  • 00:52:03
    basically giving it the question and the
  • 00:52:04
    answer so this is not unstructured we
  • 00:52:06
    are not giving it a question and let
  • 00:52:08
    letting it figure out the answer we are
  • 00:52:09
    basically going to curate a highquality
  • 00:52:11
    data set which again that's going to
  • 00:52:13
    take time that's going to take effort
  • 00:52:14
    that's going to take a lot of thought
  • 00:52:15
    into what data you can use to move
  • 00:52:17
    things forward and what we're going to
  • 00:52:19
    do is we're going to take that long long
  • 00:52:21
    long list of inputs and outputs and we
  • 00:52:24
    essentially are going to take our base
  • 00:52:25
    model this could to be something like
  • 00:52:27
    hiq 3 and we're going to go ahead and
  • 00:52:31
    run that data through a training set
  • 00:52:35
    using Hardware this could be some of the
  • 00:52:36
    hardware that Amazon provides and we're
  • 00:52:38
    going to go ahead and we're going to
  • 00:52:39
    come out with a custom model so what
  • 00:52:41
    we're actually going to do we're update
  • 00:52:42
    the underlying weights of the model and
  • 00:52:44
    in the context of fine tuning there are
  • 00:52:45
    many different kinds of aspects for
  • 00:52:47
    updating certain kinds of weights or
  • 00:52:49
    parameter efficient weights and so on
  • 00:52:50
    but what we're actually doing is
  • 00:52:52
    updating the underlying model weights to
  • 00:52:54
    then produce a custom model
  • 00:52:57
    we then evaluate that custom model and
  • 00:52:59
    hope that it does better at a particular
  • 00:53:01
    task the thing you want to be really
  • 00:53:03
    careful of with fine-tuning just looking
  • 00:53:05
    at all of these data is really hard to
  • 00:53:07
    come by that's high quality you can
  • 00:53:09
    introduce data to your model and your
  • 00:53:11
    model can become worse so fine tuning is
  • 00:53:13
    not a guarantee everything gets better
  • 00:53:14
    right away that's why we have a applied
  • 00:53:16
    AI team that focuses strictly on fine
  • 00:53:18
    tuning at our company you want to be
  • 00:53:20
    mindful of the model that you're working
  • 00:53:21
    with you use hiou 3 and then another
  • 00:53:23
    model comes out that is far more
  • 00:53:25
    intelligent and then all of a sudden you
  • 00:53:26
    have a custom model and you can't just
  • 00:53:27
    undo this so the math right here doesn't
  • 00:53:30
    really allow for subtraction or undoing
  • 00:53:32
    once you go ahead and you run that
  • 00:53:34
    training and you spend the money in the
  • 00:53:35
    compute you got that custom model you
  • 00:53:37
    can go ahead and do this over and over
  • 00:53:38
    again but this is not a reversible
  • 00:53:41
    decision when should you consider
  • 00:53:43
    something like
  • 00:53:44
    fine-tuning the most common use case I
  • 00:53:47
    want you to think about with fine-tuning
  • 00:53:48
    is introducing Behavior change you want
  • 00:53:51
    to follow a specific kind of schema you
  • 00:53:53
    want to follow a particular kind of
  • 00:53:55
    format people always like to get the
  • 00:53:56
    analogy of Talk Like a Pirate but I want
  • 00:53:59
    you to think of that for a particular
  • 00:54:00
    use case you want the model to Output
  • 00:54:02
    things in a certain format you want the
  • 00:54:03
    model to be a little bit more
  • 00:54:04
    constrained with how it calls apis or
  • 00:54:07
    references particular documents this is
  • 00:54:09
    where you're going to find more
  • 00:54:11
    likelihood with fine-tuning again you're
  • 00:54:13
    going to see that rag is also a very
  • 00:54:14
    viable option for a couple of these
  • 00:54:16
    other tasks especially the latter ones
  • 00:54:17
    that I'll talk about in a second but
  • 00:54:19
    just remember the trade-offs here there
  • 00:54:21
    are many situations where you have Rag
  • 00:54:22
    and fine tuning but when you think about
  • 00:54:24
    rag that is something that you can
  • 00:54:26
    iterate on that's something that you can
  • 00:54:27
    change with your pipeline when you're
  • 00:54:29
    dealing with prompting that's also
  • 00:54:30
    something you can constantly iterate on
  • 00:54:32
    over and over again it's also why we
  • 00:54:33
    push a lot of our customers many times
  • 00:54:35
    to First focus on the prompt what can we
  • 00:54:38
    improve in the prompt what can we change
  • 00:54:39
    in the prompt what's what best practices
  • 00:54:41
    are missing from the prompt does the
  • 00:54:42
    prompt even make sense that's where we
  • 00:54:44
    want to really start thinking about
  • 00:54:46
    things if you're trying to teach the
  • 00:54:49
    model new knowledge you're trying to
  • 00:54:51
    teach the model something brand new and
  • 00:54:54
    then you hope that from that knowledge
  • 00:54:56
    it can learn other things and expand its
  • 00:54:58
    knowledge that is not very very likely
  • 00:55:01
    with fine
  • 00:55:02
    tuning fine tuning is not a great way to
  • 00:55:05
    teach the model brand new things and
  • 00:55:07
    expect it to generalize for other tasks
  • 00:55:09
    that is where we have not seen a
  • 00:55:10
    tremendous amount of success with fine
  • 00:55:12
    tuning as those algorithms change as we
  • 00:55:14
    think more about this field and so on
  • 00:55:16
    that may change but at the same point I
  • 00:55:18
    really try to anchor on this slide quite
  • 00:55:20
    a bit for thinking about that decision
  • 00:55:21
    we see a lot of customers that jump to
  • 00:55:23
    fine tuning as a this is the way to
  • 00:55:24
    solve all of my problems there actually
  • 00:55:27
    other ways to think about solving that
  • 00:55:28
    at a high level again rag very very
  • 00:55:30
    common way to go about handling those
  • 00:55:33
    pieces so in general avoiding the
  • 00:55:35
    pitfalls of fine tuning it's exciting
  • 00:55:37
    but it's not for every single use
  • 00:55:39
    case if you have any questions or
  • 00:55:41
    curious about your particular use case
  • 00:55:43
    again come talk to us we're happy to
  • 00:55:44
    talk about those particular situations
  • 00:55:45
    we have a lot of folks on the team that
  • 00:55:46
    have done a lot of Direct Customer work
  • 00:55:48
    with fine tuning happy I'm sure to
  • 00:55:49
    answer
  • 00:55:50
    questions at the end of the day before
  • 00:55:52
    you jump to rag there's a reason why we
  • 00:55:54
    have the order of things in this
  • 00:55:55
    particular presentation you want to
  • 00:55:57
    think a lot about what you can get out
  • 00:55:58
    of prompt engineering be mindful of the
  • 00:56:00
    versions you have with prompt
  • 00:56:01
    engineering iterate on your prompts as
  • 00:56:03
    you go on and then no matter what you do
  • 00:56:07
    the most important thing I want you to
  • 00:56:08
    have if you walk away from anything from
  • 00:56:09
    this presentation when you think about
  • 00:56:11
    building any kind of generative AI
  • 00:56:12
    application when you think about
  • 00:56:14
    actually going from proof of concept to
  • 00:56:15
    production you want to make sure you
  • 00:56:17
    have some kind of evaluation criteria we
  • 00:56:19
    like to call these EV vals as well if
  • 00:56:21
    you're coming from software this is
  • 00:56:22
    essentially like a unit test or maybe an
  • 00:56:24
    integration test you want to make make
  • 00:56:26
    sure that you have some kind of way in
  • 00:56:28
    your application of benchmarking the
  • 00:56:30
    performance of the model of the prompt
  • 00:56:33
    of the fine-tuning of the rag pipeline
  • 00:56:35
    if you have no way of doing that then
  • 00:56:38
    prompt Engineering in this entire
  • 00:56:39
    ecosystem just kind of becomes an art
  • 00:56:41
    and not a science so no matter what
  • 00:56:43
    you're doing it is Mission critical to
  • 00:56:45
    make sure that when you're building
  • 00:56:46
    these applications you are using some
  • 00:56:48
    kind of benchmarking or evaluation Suite
  • 00:56:50
    Amazon Bedrock provides that many open
  • 00:56:53
    source libraries and companies provide
  • 00:56:54
    that as well but just like you wouldn't
  • 00:56:56
    develop software that is Mission
  • 00:56:57
    critical or software in production
  • 00:56:59
    without any kind of testing you want to
  • 00:57:00
    make sure you have that as well so do
  • 00:57:02
    you have an evaluation with a success
  • 00:57:04
    criteria whether it's just for fine
  • 00:57:05
    tuning or rag or so on it's really one
  • 00:57:07
    of the most important pieces that you
  • 00:57:08
    can do have you tried prompt engineering
  • 00:57:11
    determine if you have a baseline with
  • 00:57:13
    that prompt engineering and again make
  • 00:57:15
    sure you have evaluation so that you're
  • 00:57:17
    really getting as much as you possibly
  • 00:57:18
    can out of the prompt we see with a lot
  • 00:57:20
    of customers not having a robust enough
  • 00:57:22
    evaluation Suite basically just means
  • 00:57:23
    we're kind of starting from scratch
  • 00:57:24
    we're building on a house of carts
  • 00:57:26
    the last part with fine tuning again
  • 00:57:28
    like we mentioned it's irreversible and
  • 00:57:30
    when you think about the data you need
  • 00:57:31
    to curate the amount of data that you
  • 00:57:33
    need that's where things are going to
  • 00:57:35
    get a little bit trickier so how do you
  • 00:57:36
    plan to build that fine-tuning data
  • 00:57:39
    set got a couple minutes left I just
  • 00:57:41
    want to wrap up with all the things that
  • 00:57:43
    we've seen here because it's a lot of
  • 00:57:44
    information but at the end of the day my
  • 00:57:45
    goal is to give you a bit of a
  • 00:57:46
    foundation here that's what I hope we
  • 00:57:48
    have talked a bit about tool use talked
  • 00:57:51
    a bit about computer use this idea idea
  • 00:57:53
    of extending Cloud's capabilities
  • 00:57:55
    Cloud's functional
  • 00:57:57
    just by providing some tools what are
  • 00:57:58
    tools just these these objects these key
  • 00:58:01
    value pairs we give it a name we give it
  • 00:58:04
    some kind of description and we provide
  • 00:58:06
    the necessary arguments or parameters
  • 00:58:08
    that that particular tool needs we then
  • 00:58:10
    let Claud do the rest of the work if a
  • 00:58:12
    prompt comes in Claude says looks like
  • 00:58:13
    someone's trying to use that tool again
  • 00:58:16
    you're going to hear a lot of things
  • 00:58:17
    like Claude controls the computer and so
  • 00:58:19
    on Claude itself is not moving the mouse
  • 00:58:23
    and clicking and opening and closing and
  • 00:58:24
    executing commands all that Claud is
  • 00:58:26
    doing is taking in some screenshot
  • 00:58:29
    interpreting the necessary command and
  • 00:58:31
    then there is underlying code that a
  • 00:58:32
    developer writes to go ahead and execute
  • 00:58:35
    that code necessary so tool use and
  • 00:58:37
    computer use it's a really really
  • 00:58:39
    interesting and Powerful way to achieve
  • 00:58:40
    all these new and interesting use cases
  • 00:58:43
    but at the end of the day from a
  • 00:58:44
    conceptual standpoint it's not something
  • 00:58:46
    that should appear terribly intimidating
  • 00:58:48
    we talked a little bit about rag
  • 00:58:50
    retrieving data externally talked about
  • 00:58:52
    that pre-processing side of things
  • 00:58:53
    breaking up our data into chunks
  • 00:58:55
    embedding we also talked about some of
  • 00:58:56
    the other interesting ideas in this
  • 00:58:58
    ecosystem from prompt caching to
  • 00:59:00
    contextual retrieval and so on you're
  • 00:59:01
    welcome to dig into that research
  • 00:59:03
    finally wrapped up quite a bit with fine
  • 00:59:06
    tuning so we got a lot of information
  • 00:59:08
    coming out here again I just want to say
  • 00:59:09
    thank you all so so much for giving me
  • 00:59:10
    some time happy to answer questions
  • 00:59:12
    stick around for a little bit and have a
  • 00:59:13
    wonderful reinvent everyone thank you
  • 00:59:14
    all so much
Tags
  • Anthropic
  • Claude modell
  • AI development
  • prompt engineering
  • tool use
  • RAG
  • fine-tuning
  • computer use
  • agentic workflows
  • model training