Andrew Ng Explores The Rise Of AI Agents And Agentic Reasoning | BUILD 2024 Keynote

00:26:52
https://www.youtube.com/watch?v=KrRD7r7y7NY

Summary

TL;DR: Andrew discusses the transformative potential of AI, comparing it to the pervasive, general-purpose nature of electricity. He highlights the rapid growth of AI opportunities, stressing that generative AI enables much faster development of machine learning models than before. He identifies agentic AI workflows as the most important trend to watch, improving output quality through iterative steps such as planning, testing, and revision. He walks through the AI stack (semiconductors, cloud infrastructure, foundation models, and applications) and argues that the application layer must generate the most value. Because prompt-based applications need no training data, evaluations become a bottleneck and are increasingly built in parallel with development rather than beforehand. Andrew also emphasizes the potential of multimodal AI agents, particularly for processing visual as well as text data, and outlines four major design patterns within agentic workflows: reflection, tool use, planning, and multi-agent collaboration. He advocates moving fast and being responsible, notes the evolving AI landscape, and invites the audience to explore visual AI demos.

Key Takeaways

  • ⚡ AI is likened to electricity for its broad applicability.
  • 🚀 Generative AI accelerates model building, reducing timeframes significantly.
  • 🤖 Agentic AI workflows are crucial for advanced AI development.
  • 🌀 Iterative prototyping allows rapid testing and refinement of ideas.
  • 📊 The AI stack spans semiconductors, cloud infrastructure, foundation models, and applications.
  • 🔍 Evaluations (evals) are becoming the new bottleneck as model prototyping accelerates.
  • 🎨 Multimodal AI processes complex data like videos and images.
  • 🔧 Four design patterns optimize agentic AI: reflection, tool use, planning, and collaboration.
  • 💡 Fast experimentation leads to efficient, responsible AI innovation.
  • 🌟 AI trends include token generation speed, structured querying, and unstructured data management.

Timeline

  • 00:00:00 - 00:05:00

    Andrew highlights the vast opportunities AI presents, drawing a parallel to electricity in terms of its broad applicability. He outlines the AI stack from semiconductors to application layers, emphasizing the need for effective application development to maximize AI technology's economic value. He points out that generative AI is expediting certain AI processes, allowing tasks that once took months to be achieved in days, thus fostering more rapid experimentation and prototype development.

  • 00:05:00 - 00:10:00

    He discusses the bottleneck caused by evaluation processes in AI development. Although machine learning model prototyping is faster, integrating the findings into reliable applications remains complex. Developers face pressure to speed up processes beyond model development. Andrew advocates for a doctrine of 'move fast and be responsible', allowing rapid prototyping without compromising safety. Agentic AI workflows, which involve iterative, step-by-step processes akin to human task management, are emphasized as the most exciting trend.
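The "build the test set in parallel" idea described here can be sketched in a few lines. This is an illustrative sketch, not code from the talk: `classify_sentiment` stands in for a zero-shot LLM prompt (here a trivial keyword heuristic so it runs offline), and the eval set simply grows whenever a real failure is discovered, instead of being collected up front.

```python
# Sketch of building an eval set in parallel with a prompt-based app.
# classify_sentiment is a stand-in for an LLM call such as
# "Is the sentiment of this text positive or negative?".

def classify_sentiment(text: str) -> str:
    """Placeholder for a zero-shot LLM sentiment prompt."""
    negative_markers = ("bad", "terrible", "awful", "worst")
    return "negative" if any(w in text.lower() for w in negative_markers) else "positive"

# Start with a handful of cases; append more whenever a real failure
# surfaces, rather than pausing to label 1,000 examples before shipping.
eval_set = [
    ("I love this product", "positive"),
    ("This is the worst service ever", "negative"),
]

def run_evals(cases):
    """Return (number passed, list of failing cases)."""
    failures = [(text, expected) for text, expected in cases
                if classify_sentiment(text) != expected]
    return len(cases) - len(failures), failures

passed, failures = run_evals(eval_set)
print(f"{passed}/{len(eval_set)} passed, {len(failures)} failed")
```

In a real project the stub would be replaced by an API call, and each production failure would be appended to `eval_set` as a regression test.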

  • 00:10:00 - 00:15:00

    Andrew describes the major agentic workflow design patterns, including reflection and planning. Reflection has the AI critique its own output to improve it, while planning uses the AI to choose and execute a sequence of actions for complex tasks. He also discusses multi-agent collaboration, where a model is prompted to play different roles, improving task performance. These patterns are proving beneficial in practical applications, offering structured ways to direct models through complex tasks.

  • 00:15:00 - 00:20:00

    He showcases a demo involving agentic workflows for visual AI tasks, illustrating how AI can accurately process images and video data. This involves dynamic sequence planning and code generation to execute tasks, providing utility in handling large datasets. Andrew emphasizes the reduced complexity in developing such applications today and how these capabilities can extract substantial value from stored visual data.

  • 00:20:00 - 00:26:52

    Andrew concludes by touching on AI trends, stressing the rise of agentic AI. He highlights efforts to speed up token generation, the adaptation of language models for tool use, and the growing importance of data engineering for unstructured data. Andrew anticipates a forthcoming revolution in image processing, predicting substantial value extraction from visual data. He encourages builders to experiment with these new AI capabilities, underscoring a transformative phase in AI application development.



Video Q&A

  • What analogy does Andrew use to describe AI's potential?

    Andrew likens AI to electricity, emphasizing its general-purpose technology capabilities.

  • What is causing faster development of machine learning models?

    Generative AI is enabling faster machine learning model development, allowing projects that once took months to complete to be built in days.

  • What is a major trend in AI development according to Andrew?

    Agentic AI workflows are a major trend, enabling more sophisticated and iterative processes in AI applications.

  • How do agentic AI workflows improve productivity?

    These workflows allow for iterative tasks like planning, testing, and revising, ultimately leading to better results through repeated refinement.
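The planning step of such a workflow, turning a complex request into an ordered sequence of actions, can be sketched as follows. This is illustrative only: the plan list is hard-coded here, whereas a real agent would ask an LLM to produce it, and each step function is a stub for a model or tool call (the step names echo the pose-detection-to-audio example from the talk).

```python
# Sketch of the planning pattern: execute an ordered list of steps,
# each a stub standing in for a model or tool invocation.

def detect_pose(data): return data + " -> pose"
def generate_image(data): return data + " -> image"
def describe_image(data): return data + " -> caption"
def text_to_speech(data): return data + " -> audio"

STEPS = {
    "detect_pose": detect_pose,
    "generate_image": generate_image,
    "describe_image": describe_image,
    "text_to_speech": text_to_speech,
}

def execute_plan(plan, data):
    """Run each named step in order, threading the result through."""
    for step in plan:
        data = STEPS[step](data)
    return data

# A plan an LLM might propose for "generate an image of a girl reading
# a book, describe it, and read the description aloud":
plan = ["detect_pose", "generate_image", "describe_image", "text_to_speech"]
print(execute_plan(plan, "girl reading a book"))
```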

  • What key layers make up the AI stack?

    The AI stack consists of semiconductors, cloud infrastructure, foundational model trainers, and the application layer.

  • How is AI affecting the speed of prototyping?

    AI, especially generative AI, is making prototyping faster by allowing teams to create quick models and prototypes that can be tested and iterated swiftly.

  • What implications does fast experimentation in AI have?

    Fast experimentation allows teams to prototype multiple ideas rapidly, focusing only on those that are successful.

  • What challenges come with fast prototyping in AI?

    Evaluations, or "evals," become a bottleneck: because prompt-based applications need no training data, pausing to collect a large test set feels slow, so test sets are built up in parallel with development instead.

  • How does Andrew suggest improving AI development processes?

    Andrew suggests using agentic workflows with design patterns like reflection, tool use, planning, and multi-agent collaboration.
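Multi-agent collaboration, the last of these patterns, can be sketched by giving the same underlying model different role prompts. This is a hypothetical sketch: `chat` is a stub with canned replies standing in for a single LLM call whose system prompt sets the agent's role.

```python
# Sketch of multi-agent collaboration: one "coder" agent and one
# "critic" agent, both backed by the same (stubbed) model.

def chat(system: str, message: str) -> str:
    """Stand-in for one LLM call; the system prompt sets the role."""
    if "critic" in system:
        return "Add input validation for non-numeric arguments."
    if "Revise" in message:
        return ("def add(a, b):\n"
                "    assert isinstance(a, (int, float)) and isinstance(b, (int, float))\n"
                "    return a + b")
    return "def add(a, b):\n    return a + b"

def collaborate(task: str, rounds: int = 1) -> str:
    """Coder drafts, critic reviews, coder revises using the feedback."""
    coder = "You are an expert coder."
    critic = "You are a careful code critic."
    code = chat(coder, task)
    for _ in range(rounds):
        feedback = chat(critic, f"Review this code:\n{code}")
        code = chat(coder, f"Revise the code using this feedback: {feedback}\n{code}")
    return code

print(collaborate("Write a function add(a, b)."))
```

The role split mirrors Andrew's processor analogy: one model, but dividing the work into named roles gives the developer a useful abstraction for decomposing a task.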

  • What are some benefits of incorporating multimodal inputs in AI?

    Multimodal inputs allow AI to process and interpret complex data types, such as visual and textual data, leading to richer applications.
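The video-metadata demo near the end of the talk suggests a simple shape for such code: split a video into fixed-length chunks and record a clip name, start, and end time for each. This sketch is illustrative only; the duration is passed in directly, whereas a real version would read it from the file and have a model write a description per chunk before loading the rows into a pandas DataFrame.

```python
# Sketch of generating per-chunk metadata for a video, in the spirit of
# the "split into 6-second chunks, return a DataFrame" demo.

def chunk_metadata(video_name: str, duration_s: float, chunk_s: float = 6.0):
    """Return one metadata row per fixed-length chunk of the video."""
    rows = []
    start, i = 0.0, 0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        rows.append({"clip_name": f"{video_name}_chunk{i:03d}",
                     "start_time": start,
                     "end_time": end})
        start = end
        i += 1
    return rows

rows = chunk_metadata("match", 20.0)
for row in rows:
    print(row)
# The rows can be turned into a DataFrame with pandas.DataFrame(rows)
# and stored, e.g. in Snowflake, for downstream applications.
```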

Transcript
  • 00:00:00
    please welcome Andrew
  • 00:00:02
    [Applause]
  • 00:00:13
    thank you. It's such a good time to be
  • 00:00:16
    a builder. I'm excited to be back here at
  • 00:00:19
    Snowflake
  • 00:00:20
    Build. What I'd like to do today is share
  • 00:00:23
    with you what I think are some of AI's
  • 00:00:25
    biggest
  • 00:00:26
    opportunities. You may have heard me say
  • 00:00:28
    that I think AI is the new electricity
  • 00:00:30
    that's because AI is a general purpose
  • 00:00:32
    technology like electricity if I ask you
  • 00:00:35
    what is electricity good for it's always
  • 00:00:36
    hard to answer because it's good for so
  • 00:00:38
    many different things and new AI
  • 00:00:41
    technology is creating a huge set of
  • 00:00:43
    opportunities for us to build new
  • 00:00:45
    applications that weren't possible
  • 00:00:47
    before people often ask me hey Andrew
  • 00:00:50
    where are the biggest AI opportunities
  • 00:00:52
    this is what I think of as the AI stack
  • 00:00:54
    at the lowest level is the
  • 00:00:56
    semiconductors and then on top of that
  • 00:00:58
    a lot of the cloud infrastructure including of
  • 00:01:00
    Course Snowflake and then on top of that
  • 00:01:03
    are many of the foundation model
  • 00:01:05
    trainers and models and it turns out
  • 00:01:08
    that a lot of the media hype and
  • 00:01:10
    excitement and social media Buzz has
  • 00:01:11
    been on these layers of the stack kind
  • 00:01:13
    of the new technology layers when if
  • 00:01:16
    there's a new technology like generative
  • 00:01:17
    AI, the buzz is on these technology
  • 00:01:19
    layers and there's nothing wrong with
  • 00:01:21
    that but I think that almost by
  • 00:01:24
    definition there's another layer of the
  • 00:01:26
    stack that has to work out even better
  • 00:01:29
    and that's the application layer
  • 00:01:31
    because we need the applications to
  • 00:01:32
    generate even more value and even more
  • 00:01:34
    Revenue so that you know to really
  • 00:01:36
    afford to pay the technology providers
  • 00:01:39
    below so I spend a lot of my time
  • 00:01:41
    thinking about AI applications and I
  • 00:01:43
    think that's where lot of the best
  • 00:01:45
    opportunities will be to build new
  • 00:01:48
    things one of the trends that has been
  • 00:01:51
    growing for the last couple years in no
  • 00:01:53
    small part because of generative AI is
  • 00:01:55
    faster and faster machine learning model
  • 00:01:58
    development um and in particular
  • 00:02:01
    generative AI is letting us build things
  • 00:02:03
    faster than ever before take the problem
  • 00:02:06
    of, say, building a sentiment classifier:
  • 00:02:09
    taking text and deciding is this a
  • 00:02:10
    positive or negative sentiment for
  • 00:02:12
    reputation monitoring say typical
  • 00:02:14
    workflow using supervised learning might
  • 00:02:16
    be that it will take a month to get some
  • 00:02:19
    labeled data and then you know train an AI
  • 00:02:22
    model that might take a few months and
  • 00:02:24
    then find a cloud service or something
  • 00:02:27
    to deploy on that'll take another few
  • 00:02:28
    months and so for a long time very
  • 00:02:31
    valuable AI systems might take good AI
  • 00:02:33
    teams six to 12 months to build right
  • 00:02:35
    and there's nothing wrong with that I
  • 00:02:37
    think many people create very valuable
  • 00:02:39
    AI systems this way but with generative
  • 00:02:41
    AI there are certain classes of applications
  • 00:02:44
    where you can write a prompt in days and
  • 00:02:48
    then deploy it in you know again maybe
  • 00:02:51
    days and what this means is there are a
  • 00:02:53
    lot of applications that used to take me
  • 00:02:55
    and used to take very good AI teams
  • 00:02:57
    months to build that today you can build
  • 00:02:59
    in maybe 10 days or so and this opens up
  • 00:03:02
    the opportunity to experiment with build
  • 00:03:06
    new prototypes and and ship new AI
  • 00:03:09
    products that's certainly the
  • 00:03:10
    prototyping aspect of it and these are
  • 00:03:13
    some of the consequences of this trend
  • 00:03:15
    which is fast experimentation is
  • 00:03:18
    becoming a more promising path to
  • 00:03:21
    invention previously if it took six
  • 00:03:23
    months to build something then you know
  • 00:03:25
    we'd better study it, make sure there's user
  • 00:03:26
    demand have product managers we look at
  • 00:03:28
    it document it and and then spend all
  • 00:03:30
    that effort to build in it hopefully it
  • 00:03:32
    turns out to be
  • 00:03:33
    worthwhile but now for fast moving AI
  • 00:03:35
    teams I see a design pattern where you
  • 00:03:38
    can say you know what, it takes us a
  • 00:03:40
    weekend to throw together a prototype
  • 00:03:42
    let's build 20 prototypes and see what
  • 00:03:43
    sticks and if 18 of them don't work out
  • 00:03:45
    we'll just ditch them and stick with
  • 00:03:47
    what works so fast iteration and fast
  • 00:03:50
    experimentation is becoming a new path
  • 00:03:53
    to inventing new user
  • 00:03:55
    experiences um one of interesting
  • 00:03:57
    implication is that evaluations or evals
  • 00:04:00
    for short are becoming a bigger
  • 00:04:01
    bottleneck for how we build things so it
  • 00:04:04
    turns out back in supervised learning
  • 00:04:06
    world if you're collecting 10,000 data
  • 00:04:08
    points anyway to train a model then you
  • 00:04:10
    know if you needed to collect an extra
  • 00:04:12
    1,000 data points for testing it was
  • 00:04:14
    fine, just an extra 10% increase in cost
  • 00:04:18
    but for a lot of large language model
  • 00:04:19
    based apps there's no need to have
  • 00:04:21
    any training data if you made me slow
  • 00:04:24
    down to collect a thousand test examples
  • 00:04:26
    boy that seems like a huge bottleneck
  • 00:04:28
    and so the new development workflow
  • 00:04:30
    often feels as if we're building and
  • 00:04:32
    collecting data more in parallel rather
  • 00:04:34
    than sequentially um in which we build a
  • 00:04:37
    prototype and then as it becomes
  • 00:04:39
    more important and as robustness and
  • 00:04:42
    reliability becomes more important then
  • 00:04:43
    we gradually build up that test set here
  • 00:04:46
    in parallel but I see exciting
  • 00:04:48
    Innovations to be had still in how we
  • 00:04:50
    build evals um and then what I'm seeing
  • 00:04:53
    as well is the prototyping of machine
  • 00:04:56
    learning has become much faster but
  • 00:04:58
    building a software application has lots
  • 00:05:00
    of steps does the product work you know
  • 00:05:02
    the design work does the software
  • 00:05:03
    integration work a lot of Plumbing work
  • 00:05:06
    um then after deployment DevOps and
  • 00:05:08
    MLOps so some of those other pieces are
  • 00:05:10
    becoming faster but they haven't become
  • 00:05:13
    faster at the same rate that the machine
  • 00:05:14
    learning modeling part has become faster
  • 00:05:17
    so you take a process and one piece of
  • 00:05:19
    it becomes much faster um what I'm
  • 00:05:21
    seeing is prototyping is now really
  • 00:05:23
    really fast but sometimes you take a
  • 00:05:25
    prototype into robust reliable
  • 00:05:28
    production with guard rails and so on
  • 00:05:30
    those other steps still take some time
  • 00:05:33
    but the interesting Dynamic I'm seeing
  • 00:05:34
    is the fact that the machine learning part
  • 00:05:36
    is so fast is putting a lot of pressure
  • 00:05:38
    on organizations to speed up all of
  • 00:05:41
    those other parts as well so that's been
  • 00:05:43
    exciting progress for our field and in
  • 00:05:46
    terms of how machine learning
  • 00:05:48
    development um is speeding things up I
  • 00:05:51
    think the mantra move fast and break
  • 00:05:53
    things got a bad rep because you know it
  • 00:05:57
    broke things um I think some people
  • 00:06:00
    interpret this to mean we shouldn't move
  • 00:06:01
    fast but I disagree with that I think
  • 00:06:04
    the better mantra is move fast and be
  • 00:06:08
    responsible I'm seeing a lot of teams
  • 00:06:10
    able to prototype quickly evaluate and
  • 00:06:12
    test robustly so without shipping
  • 00:06:14
    anything out to The Wider world that
  • 00:06:16
    could you know cause damage or cause um
  • 00:06:18
    meaningful harm I'm finding smart teams
  • 00:06:21
    able to build really quickly and move
  • 00:06:23
    really fast but also do this in a very
  • 00:06:25
    responsible way and I find this
  • 00:06:26
    exhilarating that you can build things
  • 00:06:28
    and ship things and responsible way much
  • 00:06:30
    faster than ever
  • 00:06:32
    before now there's a lot going on in Ai
  • 00:06:35
    and of all the things going on AI um in
  • 00:06:38
    terms of technical Trend the one Trend
  • 00:06:41
    I'm most excited about is agentic AI
  • 00:06:44
    workflows and so if you were to ask what's
  • 00:06:46
    the one most important AI technology to
  • 00:06:48
    pay attention to I would say is agentic
  • 00:06:50
    AI um I think when I started saying this
  • 00:06:55
    you know near the beginning of this year
  • 00:06:56
    it was a bit of a controversial
  • 00:06:58
    statement but now the term AI agents
  • 00:07:01
    has become so widely used uh by
  • 00:07:04
    technical and non-technical people it's
  • 00:07:06
    become you know a little bit of a hype
  • 00:07:08
    term uh but so let me just share with
  • 00:07:10
    you how I view AI agents and why I think
  • 00:07:13
    they're important approaching just from
  • 00:07:15
    a technical
  • 00:07:16
    perspective the way that most of us use
  • 00:07:19
    large language models today is with what
  • 00:07:21
    something is called zero shot prompting
  • 00:07:23
    and that roughly means we would ask it
  • 00:07:25
    to uh give it a prompt write an essay or
  • 00:07:29
    write an output for us and it's a bit
  • 00:07:31
    like if we're going to a person or in
  • 00:07:33
    this case going to an AI and asking it
  • 00:07:36
    to type out an essay for us by going
  • 00:07:38
    from the first word writing from the
  • 00:07:40
    first word to the last word all in one
  • 00:07:42
    go without ever using backspace just
  • 00:07:44
    right from start to finish like that and
  • 00:07:47
    it turns out people you know we don't do
  • 00:07:49
    our best writing this way uh but despite
  • 00:07:51
    the difficulty of being forced to write
  • 00:07:53
    this way large language models do you know not
  • 00:07:55
    bad pretty
  • 00:07:56
    well here's what an agentic workflow
  • 00:07:59
    looks like uh to generate an essay we ask an
  • 00:08:02
    AI to First write an essay outline and
  • 00:08:04
    ask it do you need to do some web
  • 00:08:06
    research if so let's download some web
  • 00:08:07
    pages and put into the context of the
  • 00:08:09
    large language model then let's write the first
  • 00:08:11
    draft and then let's read the first
  • 00:08:12
    draft and critique it and revise the
  • 00:08:15
    draft and so on and this workflow looks
  • 00:08:17
    more like um doing some thinking or some
  • 00:08:20
    research and then some revision and then
  • 00:08:23
    going back to do more thinking and more
  • 00:08:24
    research and by going round this Loop
  • 00:08:27
    over and over um it takes longer but
  • 00:08:29
    this results in a much better work
  • 00:08:31
    output so in some teams I work with we
  • 00:08:34
    apply this agentic workflow to
  • 00:08:36
    processing complex tricky legal
  • 00:08:38
    documents or to um do Health Care
  • 00:08:41
    diagnosis Assistance or to do very
  • 00:08:43
    complex compliance with government
  • 00:08:45
    paperwork so many times I'm seeing this
  • 00:08:47
    drive much better results than was ever
  • 00:08:50
    possible and one thing I want to focus
  • 00:08:51
    on in this presentation I'll talk about
  • 00:08:53
    later is the rise of visual AI where
  • 00:08:55
    agentic workflows are letting us process
  • 00:08:58
    image and video data
  • 00:09:00
    but to get back to that later um it
  • 00:09:03
    turns out that there are benchmarks that
  • 00:09:05
    seem to show agentic workflows
  • 00:09:07
    deliver much better results um this is
  • 00:09:10
    the HumanEval benchmark which is a
  • 00:09:12
    benchmark from OpenAI that measures
  • 00:09:15
    a large language model's ability to
  • 00:09:17
    solve coding puzzles like this one and
  • 00:09:20
    um my team collected some data turns out
  • 00:09:23
    that um on this Benchmark I think it was
  • 00:09:25
    the pass@k benchmark, pass@k metric, GPT-3.5 got
  • 00:09:29
    48% right on this coding benchmark GPT-4
  • 00:09:33
    huge improvement you know
  • 00:09:36
    67% but the improvement from GPT-3.5 to
  • 00:09:39
    GPT-4 is dwarfed by the improvement from
  • 00:09:42
    GPT-3.5 to GPT-3.5 using an agentic
  • 00:09:46
    workflow um which gets up to about
  • 00:09:49
    95% and GPT-4 with an agentic workflow
  • 00:09:53
    also does much better um and so it turns
  • 00:09:58
    out that in the way Builders built
  • 00:10:00
    agentic reasoning or agentic workflows
  • 00:10:03
    in their applications there are I want
  • 00:10:05
    to say four major design patterns which
  • 00:10:07
    are reflection, tool use, planning and
  • 00:10:09
    multi-agent collaboration and to
  • 00:10:12
    demystify agentic workflows a little bit
  • 00:10:14
    let me quickly step through what these
  • 00:10:16
    workflows mean um and I find that
  • 00:10:19
    agentic workflows sometimes seem a
  • 00:10:21
    little bit mysterious until you actually
  • 00:10:22
    read through the code for one or two of
  • 00:10:24
    these go oh that's it you know that's
  • 00:10:26
    really cool but oh that's all it takes
  • 00:10:28
    but let me just step through
  • 00:10:29
    um to for for concreteness what
  • 00:10:32
    reflection with LLMs looks like so I might
  • 00:10:36
    start off uh prompting an LLM as a
  • 00:10:39
    coder agent so maybe an assistant
  • 00:10:41
    message that your role is to be a coder and
  • 00:10:43
    write code um so you can tell it you know
  • 00:10:45
    please write code for certain tasks and
  • 00:10:47
    the LLM may generate code and then it
  • 00:10:50
    turns out that you can construct a
  • 00:10:52
    prompt that takes the code that was just
  • 00:10:54
    generated and copy paste the code back
  • 00:10:57
    into the prompt and ask it you know here's
  • 00:10:59
    some code intended for a task, examine
  • 00:11:01
    this code and critique it right and it
  • 00:11:04
    turns out if you prompt the same LLM this
  • 00:11:05
    way it may sometimes um find some
  • 00:11:09
    problems with it or make some useful
  • 00:11:12
    suggestions to improve the code then you
  • 00:11:14
    prompt the same LLM with the feedback and
  • 00:11:17
    ask it to improve the code and come back
  • 00:11:19
    with a new version and uh maybe
  • 00:11:21
    foreshadowing tool use you can have the
  • 00:11:23
    LLM run some unit tests and give the
  • 00:11:25
    feedback of the unit tests back to the LLM
  • 00:11:28
    then that can be additional feedback to
  • 00:11:29
    help it iterate further to further
  • 00:11:31
    improve the code and it turns out that
  • 00:11:33
    this type of reflection workflow is not
  • 00:11:35
    magic doesn't solve all problems um but
  • 00:11:37
    it will often take the Baseline level
  • 00:11:39
    performance and lift it uh to to better
  • 00:11:43
    level performance and it turns out also
  • 00:11:46
    with this type of workflow where we're
  • 00:11:47
    thinking of prompting an LLM to critique its
  • 00:11:49
    own output and use its own criticism to
  • 00:11:51
    improve it this maybe also foreshadows
  • 00:11:54
    multi-agent planning or multi-agent
  • 00:11:56
    workflows where you can prompt one
  • 00:11:58
    prompt an LLM to sometimes play the role
  • 00:12:00
    of a coder and sometimes prompt it to play
  • 00:12:03
    the role of a critic um to
  • 00:12:06
    review the code so it's the same
  • 00:12:08
    conversation but we can prompt the LLM
  • 00:12:10
    you know differently to tell it sometimes
  • 00:12:13
    work on the code sometimes try to make
  • 00:12:15
    helpful suggestions and this also
  • 00:12:17
    results in improved performance so this
  • 00:12:19
    is a reflection design pattern um and
  • 00:12:24
    second major design pattern is tool use uh
  • 00:12:27
    in which a large language model can be
  • 00:12:29
    prompted to generate a request for an
  • 00:12:31
    API call to have it decide when it needs
  • 00:12:34
    to uh search the web or execute code or
  • 00:12:37
    take an action like um issue a customer
  • 00:12:39
    refund or send an email or pull up a
  • 00:12:41
    calendar entry so tool use is a major
  • 00:12:43
    design pattern that is letting large
  • 00:12:45
    language models make function calls and
  • 00:12:47
    I think this is expanding what we can do
  • 00:12:49
    with these agentic workflows um real
  • 00:12:52
    quick here's a planning or reasoning
  • 00:12:55
    design pattern in which if you were to
  • 00:12:57
    give a fairly complex request you know
  • 00:12:58
    generate an image of a girl reading a
  • 00:13:01
    book and so on then an LLM, this example
  • 00:13:04
    adapted from the HuggingGPT paper, an LLM
  • 00:13:06
    can look at the picture and decide to
  • 00:13:09
    first use a um open pose model to detect
  • 00:13:12
    the pose and then after that generate a
  • 00:13:14
    picture of a girl um after that
  • 00:13:17
    describe the image and after that use
  • 00:13:19
    text to speech or TTS to generate the audio
  • 00:13:21
    so in planning you have an LLM look at a
  • 00:13:24
    complex request and pick a sequence of
  • 00:13:27
    actions to execute in order to deliver on a
  • 00:13:30
    complex task um and lastly multi-agent
  • 00:13:33
    collaboration is that design pattern
  • 00:13:35
    alluded to where instead of prompting an
  • 00:13:37
    LLM to just do one thing you prompt the
  • 00:13:40
    LLM to play different roles at different
  • 00:13:42
    points in time so the different agents
  • 00:13:44
    the simulated agents, interact with each other
  • 00:13:46
    and come together to solve a task and I
  • 00:13:49
    know that some people may wonder you
  • 00:13:52
    know if you're using one LLM why do you need
  • 00:13:54
    to make this one LLM play the role of
  • 00:13:57
    multiple agents um many teams
  • 00:13:59
    have demonstrated significant improved
  • 00:14:02
    performance for a variety of tasks using
  • 00:14:04
    this design pattern and it turns out
  • 00:14:07
    that if you have an LLM sometimes
  • 00:14:08
    specialize on different tasks maybe one
  • 00:14:10
    at a time have it interact many teams
  • 00:14:13
    seem to really get much better results
  • 00:14:14
    using this I feel like maybe um there's
  • 00:14:18
    an analogy to if you're running jobs on
  • 00:14:20
    a processor on a CPU why do we need
  • 00:14:23
    multiple processes it's all the same
  • 00:14:25
    processor there you know at the end of the
  • 00:14:27
    day but we found that having multiple
  • 00:14:29
    processes is a useful abstraction for
  • 00:14:31
    developers to take a task and break it
  • 00:14:33
    down to subtask and I think multi-agent
  • 00:14:35
    collaboration is a bit like that too if
  • 00:14:37
    you have a big task then if you think of
  • 00:14:39
    hiring a bunch of agents to do different
  • 00:14:41
    pieces of task then interact sometimes
  • 00:14:43
    that helps the developer um build
  • 00:14:46
    complex systems to deliver a good
  • 00:14:48
    result so I think with these four major
  • 00:14:52
    agentic design patterns agentic
  • 00:14:54
    reasoning workflow design patterns um it
  • 00:14:57
    gives us a huge space to play with to
  • 00:14:59
    build Rich agents to do things that
  • 00:15:01
    frankly were just not possible you know
  • 00:15:04
    even a year ago um and I want to one
  • 00:15:08
    aspect of this I'm particularly excited
  • 00:15:10
    about is the rise of not just large
  • 00:15:13
    language model based agents but large
  • 00:15:15
    multimodal model based agents, large multimodal
  • 00:15:17
    model based agents so um given an image
  • 00:15:21
    like this if you wanted to uh use an
  • 00:15:25
    LMM, a large multimodal model, you could
  • 00:15:27
    actually do zero shot prompting and that's a
  • 00:15:29
    bit like telling it you know take a
  • 00:15:31
    glance at the image and just tell me the
  • 00:15:33
    output and for simple image tasks
  • 00:15:36
    that's okay you can actually have it you
  • 00:15:38
    know look at the image and uh right give
  • 00:15:40
    you the numbers of the runners or
  • 00:15:42
    something but it turns out just as with
  • 00:15:44
    large language model based agents,
  • 00:15:46
    large multimodal model based agents can
  • 00:15:48
    do better with an iterative workflow where
  • 00:15:51
    you can approach this problem step by
  • 00:15:53
    step so detect the faces detect the
  • 00:15:55
    numbers put it together and so with this
  • 00:15:58
    more iterative workflow uh you can actually
  • 00:16:00
    get an agent to do some planning testing
  • 00:16:03
    write code, plan, test, write code and come
  • 00:16:06
    up with a more complex plan as
  • 00:16:08
    articulated and expressed in code to deliver
  • 00:16:11
    on more complex tasks so what I'd like
  • 00:16:14
    to do is um show you a demo of some work
  • 00:16:17
    that uh Dan Malone and I and the Landing AI
  • 00:16:20
    team has been working on on building
  • 00:16:22
    agentic workflows for visual AI
  • 00:16:27
    tasks so if we switch to my
  • 00:16:31
    laptop
  • 00:16:32
    um let me have an image here of a uh
  • 00:16:38
    soccer game or football game and um I'm
  • 00:16:41
    going to say let's see count the
  • 00:16:43
    players in the field oh and just for fun if
  • 00:16:47
    you're not sure how to prompt it after
  • 00:16:49
    uploading an image This little light
  • 00:16:50
    bulb here you know gives some suggested
  • 00:16:53
    prompts you may ask for this uh but let
  • 00:16:55
    me run this so count players on the
  • 00:16:57
    field right and what this kicks off is a
  • 00:17:00
    process that actually runs for a couple
  • 00:17:02
    minutes um to Think Through how to write
  • 00:17:04
    code uh in order to come up with a plan to
  • 00:17:07
    give an accurate result for uh counting
  • 00:17:10
    the number of players in the field this is
  • 00:17:11
    actually a little bit complex because
  • 00:17:12
    you don't want the players in the
  • 00:17:13
    background, just the ones on the field. I already
  • 00:17:15
    ran this earlier so we just jumped to
  • 00:17:18
    the result um but it says the code has
  • 00:17:22
    selected seven players on the field and
  • 00:17:26
    I think that's right 1 2 3 4 5 six
  • 00:17:28
    seven
  • 00:17:30
    um and if I were to zoom in to the model
  • 00:17:33
    output Now 1 2 3 4 five six seven I
  • 00:17:37
    think that's actually right and the part
  • 00:17:39
    of the output of this is that um it has
  • 00:17:45
    also generated code uh that you can run
  • 00:17:48
    over and over um actually generated
  • 00:17:51
    python code uh
  • 00:17:54
    that if you want you can run over and
  • 00:17:56
    over on a large collection of
  • 00:17:59
    images and I think this is exciting because
  • 00:18:01
    there are a lot of companies um and
  • 00:18:04
    teams that actually have a lot of visual
  • 00:18:06
    AI data have a lot of images um have a
  • 00:18:09
    lot of videos kind of stored somewhere
  • 00:18:12
    and until now it's been really difficult
  • 00:18:15
    to get value out of this data so for a
  • 00:18:18
    lot of the you know small teams or large
  • 00:18:20
    businesses with a lot of visual data
  • 00:18:23
    visual AI capabilities like the vision
  • 00:18:25
    agent lets you take all this data
  • 00:18:27
    previously shoved somewhere in blob storage
  • 00:18:29
    and and you know get real value out of
  • 00:18:31
    this I think this is a big
  • 00:18:32
    transformation for AI um here's another
  • 00:18:35
    example you know this says um given a
  • 00:18:38
    video split this another soccer game or
  • 00:18:42
    football
  • 00:18:43
    game. So: given a video, split the video into
  • 00:18:46
    clips of 5 seconds, find the clip where
  • 00:18:48
    a goal is being scored, and display a frame of the
  • 00:18:50
    output. I already ran this, because it takes
  • 00:18:52
    a little time to run. This will
  • 00:18:54
    generate code and evaluate code for a while,
  • 00:18:56
    and this is the output, and it says true at
  • 00:19:00
    10 to 15, so it thinks the goal is scored, you know,
  • 00:19:04
    around here, between
  • 00:19:06
    10 and 15 seconds. Right, and there you go, that's the goal.
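The clip-splitting step described above can be sketched in plain Python. The boundary arithmetic below is an assumption about how the generated code might divide the video, not the agent's actual output:

```python
# Sketch: split a video's timeline into fixed-length clips, then find which
# clip contains an event timestamp (e.g. when a goal is scored).

def split_into_clips(duration, clip_len=5.0):
    """Return (start, end) pairs covering the whole duration in seconds."""
    clips = []
    t = 0.0
    while t < duration:
        clips.append((t, min(t + clip_len, duration)))
        t += clip_len
    return clips

def clip_containing(clips, timestamp):
    """Return the (start, end) clip whose interval contains the timestamp."""
    for start, end in clips:
        if start <= timestamp < end:
            return (start, end)
    return None

clips = split_into_clips(32.0)       # e.g. a 32-second video
print(clips[-1])                     # (30.0, 32.0) -- last clip is shorter
print(clip_containing(clips, 12.3))  # (10.0, 15.0)
```

With a real video you would then cut frames per clip with a library such as OpenCV or ffmpeg; this sketch only covers the boundary logic.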
  • 00:19:10
    And also, as instructed, it
  • 00:19:13
    extracted some of the frames associated
  • 00:19:15
    with this. So, really useful for
  • 00:19:17
    processing video data. And maybe
  • 00:19:21
    here's one last example of the
  • 00:19:23
    Vision Agent, which is: you can also
  • 00:19:25
    ask it for a program to split the input
  • 00:19:27
    video into small video chunks every 6
  • 00:19:29
    seconds, describe each chunk, and store the
  • 00:19:32
    information in a pandas DataFrame along
  • 00:19:33
    with clip name, start and end time, and return the
  • 00:19:35
    pandas DataFrame. So this is a way to
  • 00:19:38
    look at video data that you may have and
  • 00:19:41
    generate metadata for it that you
  • 00:19:44
    can then store, you know, in Snowflake or
  • 00:19:46
    somewhere, to then build other
  • 00:19:48
    applications on top of. But just to show
  • 00:19:50
    you the output of this: so, you know,
  • 00:19:54
    clip name, start time, end time, and then
  • 00:19:57
    it has actually written code here,
  • 00:20:00
    right, wrote code that you can then run
  • 00:20:02
    elsewhere if you want, put it in a
  • 00:20:03
    Streamlit app or something, that you can
  • 00:20:06
    then use to write a lot of, you know,
  • 00:20:10
    text descriptions for this. And using
  • 00:20:15
    this capability of the Vision Agent to
  • 00:20:17
    help write code, my team at Landing AI
  • 00:20:21
    actually built this little demo app that
  • 00:20:24
    uses code from the Vision Agent. So
  • 00:20:26
    instead of us writing the code, we had
  • 00:20:28
    the Vision Agent write the code to build
  • 00:20:30
    this metadata, and then it indexes a
  • 00:20:34
    bunch of videos. So let's see, I'll
  • 00:20:36
    search "skier airborne", right? I
  • 00:20:39
    actually ran this earlier; hope it works.
  • 00:20:42
    So what this demo shows is: we already
  • 00:20:45
    ran the code to take the video, split it in
  • 00:20:47
    chunks, and store the metadata, and then when
  • 00:20:50
    I do a search for "skier airborne", you
  • 00:20:52
    know, it shows the clips that have
  • 00:20:55
    high
  • 00:20:57
    similarity, right? Right, oh, marked here
  • 00:20:59
    with the green: it has high similarity. Well,
  • 00:21:02
    this is getting my heart rate up, seeing
  • 00:21:03
    them do that. Oh, here's another one. Whoa, all
  • 00:21:08
    right, all right. And the green parts
  • 00:21:11
    of the timeline show where the skier is
  • 00:21:13
    airborne. Let's see: "gray wolf at night". I
  • 00:21:18
    actually find it pretty fun, yeah, when
  • 00:21:20
    you have a collection of videos, to
  • 00:21:22
    index it and then just browse through it,
  • 00:21:24
    right? Here's a gray wolf at night, and
  • 00:21:26
    this timeline in green shows where a gray
  • 00:21:29
    wolf at night is. And if I actually
  • 00:21:30
    jump to a different part of the video,
  • 00:21:33
    there's a bunch of other stuff as well,
  • 00:21:35
    right there, that's not a gray wolf at night.
  • 00:21:37
    So, I think that's pretty cool.
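The index-then-search pattern from this demo can be sketched as follows. The clip metadata and descriptions are toy data, and a simple bag-of-words cosine similarity stands in for whatever embedding model the demo actually uses:

```python
import math
import pandas as pd

# Sketch: store per-clip metadata (clip name, start/end time, description)
# in a pandas DataFrame, then rank clips against a free-text query.
df = pd.DataFrame(
    {
        "clip_name": ["clip_00", "clip_01", "clip_02"],
        "start_time": [0, 6, 12],
        "end_time": [6, 12, 18],
        "description": [
            "a skier carves down the slope",
            "the skier is airborne over a jump",
            "a gray wolf walks through snow at night",
        ],
    }
)

def bow(text):
    """Bag-of-words vector as a word -> count dict."""
    words = text.lower().split()
    return {w: words.count(w) for w in set(words)}

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(frame, query):
    """Return the frame with a similarity column, best match first."""
    q = bow(query)
    scores = frame["description"].map(lambda d: cosine(q, bow(d)))
    return frame.assign(similarity=scores).sort_values(
        "similarity", ascending=False
    )

top = search(df, "skier airborne").iloc[0]
print(top["clip_name"])  # clip_01
```

In the real demo, green segments on the timeline would correspond to clips whose similarity clears some threshold; swapping the bag-of-words vectors for learned text embeddings is the obvious upgrade.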
  • 00:21:40
    Let's see, just one last example.
  • 00:21:47
    So, yeah, I've actually been on the road a
  • 00:21:50
    lot. If you search for your luggage, this
  • 00:21:53
    black luggage, right,
  • 00:21:56
    there's this. But it turns out, it turns out
  • 00:21:59
    there's actually a lot of black luggage. So
  • 00:22:00
    if you want your luggage, let's say "black
  • 00:22:02
    luggage with
  • 00:22:04
    rainbow strap". There's a lot of black
  • 00:22:08
    luggage out
  • 00:22:09
    there, but
  • 00:22:11
    then, you know, there, right: black luggage
  • 00:22:14
    with rainbow strap. So, a lot of fun
  • 00:22:16
    things to do. And I think the nice
  • 00:22:18
    thing about this is, the work needed
  • 00:22:22
    to build applications like this is lower
  • 00:22:25
    than ever before. So let's go back to the
  • 00:22:27
    slides.
  • 00:22:33
    And in terms of AI opportunities: I spoke
  • 00:22:37
    a bit about agentic workflows, and how
  • 00:22:42
    that is changing the AI stack is as
  • 00:22:44
    follows. It turns out that in addition to
  • 00:22:48
    the stack I showed, there's actually a new
  • 00:22:51
    emerging agentic orchestration layer.
  • 00:22:54
    There are orchestration layers
  • 00:22:56
    like LangChain that have been around for a
  • 00:22:58
    while that are also becoming
  • 00:22:59
    increasingly agentic, through LangGraph for
  • 00:23:02
    example. And this new agentic
  • 00:23:04
    orchestration layer is also making it
  • 00:23:06
    easier for developers to build
  • 00:23:08
    applications on top. And I hope that
  • 00:23:10
    Landing AI's Vision Agent is another
  • 00:23:13
    contribution to this, that makes it easier
  • 00:23:15
    for you to build visual AI applications
  • 00:23:17
    to process all this image and video data
  • 00:23:21
    that possibly you had, but that was
  • 00:23:22
    really hard to get value out of until
  • 00:23:25
    more recently. So before I finish, let me share
  • 00:23:28
    what I think are maybe four of the
  • 00:23:30
    most important AI trends. There's a lot
  • 00:23:32
    going on in AI; it's impossible to
  • 00:23:34
    summarize everything in one slide. If you
  • 00:23:36
    had to make me pick the one most
  • 00:23:38
    important trend, I would say it's agentic
  • 00:23:40
    AI, but here are four things I think
  • 00:23:42
    are worth paying attention to. First:
  • 00:23:45
    it turns out agentic workflows need to read
  • 00:23:47
    a lot of text or images and generate a
  • 00:23:49
    lot of text, so we say they generate a
  • 00:23:51
    lot of tokens. And there are exciting efforts
  • 00:23:54
    to speed up token generation, including
  • 00:23:56
    semiconductor work by SambaNova, Cerebras, Groq,
  • 00:23:59
    and others, and a lot of software and other
  • 00:24:01
    types of hardware work as well. This will
  • 00:24:02
    make agentic workflows work much better.
  • 00:24:05
    Second trend I'm excited about:
  • 00:24:07
    today's large language models
  • 00:24:09
    started off being optimized to answer
  • 00:24:11
    human questions and human-generated
  • 00:24:14
    instructions, things like, you know, "why
  • 00:24:16
    did Shakespeare write Macbeth" or "explain
  • 00:24:18
    why Shakespeare wrote Macbeth". These
  • 00:24:19
    are the types of questions that large
  • 00:24:21
    language models are often asked to answer on
  • 00:24:23
    the internet. But agentic workflows call
  • 00:24:25
    for other operations, like tool use. So the
  • 00:24:28
    fact that large language models are
  • 00:24:30
    often now tuned explicitly to support
  • 00:24:32
    tool use, or, just a couple of weeks ago,
  • 00:24:35
    Anthropic released a model that can
  • 00:24:37
    support computer use: I think these
  • 00:24:39
    exciting developments create a lot
  • 00:24:41
    of lift, create a much higher
  • 00:24:43
    ceiling for what we can now get agentic
  • 00:24:45
    workflows to do, with large language models
  • 00:24:48
    that are tuned not just to answer human
  • 00:24:50
    queries but tuned explicitly to
  • 00:24:53
    fit into these iterative agentic workflows.
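The tool-use pattern described here can be sketched without any particular framework: the model emits either a tool call or a final answer, and a loop executes tools and feeds results back. The `stub_model` below is a hand-written stand-in for a tool-tuned LLM, not a real model API:

```python
# Minimal, framework-free sketch of a tool-use agent loop.

def calculator(expression):
    """Example tool: evaluate a simple arithmetic expression."""
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def stub_model(messages):
    """Stand-in for an LLM: request the calculator once, then answer."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "calculator", "input": "17 * 24"}
    result = [m for m in messages if m["role"] == "tool"][-1]["content"]
    return {"answer": f"17 * 24 = {result}"}

def agent_loop(question, model, tools, max_steps=5):
    """Alternate model calls and tool executions until an answer appears."""
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        action = model(messages)
        if "answer" in action:
            return action["answer"]
        result = tools[action["tool"]](action["input"])
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("agent did not finish within max_steps")

print(agent_loop("What is 17 * 24?", stub_model, TOOLS))  # 17 * 24 = 408
```

A tool-tuned model replaces `stub_model` with a real API call that returns structured tool invocations; the loop shape stays the same, which is essentially what the orchestration layers mentioned earlier provide.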
  • 00:24:57
    Third:
  • 00:24:58
    data engineering's importance is rising,
  • 00:25:01
    particularly with unstructured data. It
  • 00:25:03
    turns out that a lot of the value of
  • 00:25:05
    machine learning was on structured data,
  • 00:25:07
    kind of tables of numbers, but with gen AI
  • 00:25:10
    we're much better than ever before at
  • 00:25:12
    processing text and images and video and
  • 00:25:14
    maybe audio. And so the importance of
  • 00:25:17
    data engineering is increasing, in terms
  • 00:25:19
    of how to manage your unstructured data
  • 00:25:21
    and the metadata for that, and
  • 00:25:22
    deployment, to get the unstructured data
  • 00:25:24
    where it needs to go to create value. So
  • 00:25:26
    that will be a major effort for a
  • 00:25:28
    lot of large businesses. And then lastly,
  • 00:25:31
    I think we've all seen that the text
  • 00:25:32
    processing revolution has already
  • 00:25:34
    arrived. The image processing revolution
  • 00:25:36
    is in a slightly earlier phase, but it is
  • 00:25:38
    coming, and as it comes, many people, many
  • 00:25:40
    businesses, will be able to get a lot
  • 00:25:42
    more value out of their visual data than
  • 00:25:45
    was ever possible before. And I'm excited
  • 00:25:48
    because I think that will significantly
  • 00:25:49
    increase the space of applications we
  • 00:25:51
    can build as well. So, just to wrap up: this
  • 00:25:56
    is a great time to be a builder. Gen AI
  • 00:25:59
    is letting us experiment faster than
  • 00:26:01
    ever. Agentic AI is expanding the set of
  • 00:26:03
    things that are now possible, and there are just
  • 00:26:05
    so many new applications that we can now
  • 00:26:08
    build, in visual AI or not in visual AI,
  • 00:26:11
    that just weren't possible ever before.
  • 00:26:13
    If you're interested in checking out the
  • 00:26:15
    visual AI demos that I ran, please
  • 00:26:19
    go to va.landing.ai. The exact demos
  • 00:26:21
    that I ran, you can try out yourself
  • 00:26:24
    online, and get the code and run the code
  • 00:26:26
    yourself in your own applications. So,
  • 00:26:28
    with that, let me say thank you all very
  • 00:26:31
    much, and please also join me in
  • 00:26:32
    welcoming Elsa back onto the stage. Thank
  • 00:26:34
    you.
标签
  • AI
  • agentic AI
  • generative AI
  • machine learning
  • prototyping
  • multimodal AI
  • AI applications
  • AI trends
  • AI stack
  • AI workflow