Andrew Ng Explores The Rise Of AI Agents And Agentic Reasoning | BUILD 2024 Keynote

00:26:52
https://www.youtube.com/watch?v=KrRD7r7y7NY

Summary

TL;DR: Andrew discusses the transformative potential of AI, comparing it to the pervasive, general-purpose nature of electricity. He highlights the rapid growth of AI opportunities, stressing that generative AI enables much faster development of machine learning models than before. He identifies agentic AI workflows as the most important trend to watch, improving output quality through iterative steps such as planning, testing, and revision. He walks through the AI stack (semiconductors, cloud infrastructure, foundation models, and applications) and argues that the application layer must generate the most value. Because prompt-based applications need no training data, evaluations become a bottleneck and are increasingly built in parallel with development rather than beforehand. Andrew also emphasizes the potential of multimodal AI agents, particularly for processing visual as well as text data, and outlines four major design patterns within agentic workflows: reflection, tool use, planning, and multi-agent collaboration. He advocates moving fast and being responsible, notes the evolving AI landscape, and invites the audience to explore visual AI demos.

Key Takeaways

  • ⚡ AI is likened to electricity for its broad applicability.
  • 🚀 Generative AI accelerates model building, reducing timeframes significantly.
  • 🤖 Agentic AI workflows are crucial for advanced AI development.
  • 🌀 Iterative prototyping allows rapid testing and refinement of ideas.
  • 📊 The AI stack spans semiconductors, cloud infrastructure, foundation models, and applications.
  • 🔍 Evaluations (evals) are becoming the new bottleneck as model prototyping accelerates.
  • 🎨 Multimodal AI processes complex data like videos and images.
  • 🔧 Four design patterns optimize agentic AI: reflection, tool use, planning, and collaboration.
  • 💡 Fast experimentation leads to efficient, responsible AI innovation.
  • 🌟 AI trends include token generation speed, structured querying, and unstructured data management.

Timeline

  • 00:00:00 - 00:05:00

    Andrew highlights the vast opportunities AI presents, drawing a parallel to electricity in terms of its broad applicability. He outlines the AI stack from semiconductors to application layers, emphasizing the need for effective application development to maximize AI technology's economic value. He points out that generative AI is expediting certain AI processes, allowing tasks that once took months to be achieved in days, thus fostering more rapid experimentation and prototype development.

  • 00:05:00 - 00:10:00

    He discusses the bottleneck caused by evaluation processes in AI development. Although machine learning model prototyping is faster, integrating the findings into reliable applications remains complex. Developers face pressure to speed up processes beyond model development. Andrew advocates for a doctrine of 'move fast and be responsible', allowing rapid prototyping without compromising safety. Agentic AI workflows, which involve iterative, step-by-step processes akin to human task management, are emphasized as the most exciting trend.
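The "build the test set in parallel" idea described here can be sketched in a few lines. This is an illustrative sketch, not code from the talk: `classify_sentiment` stands in for a zero-shot LLM prompt (here a trivial keyword heuristic so it runs offline), and the eval set simply grows whenever a real failure is discovered, instead of being collected up front.

```python
# Sketch of building an eval set in parallel with a prompt-based app.
# classify_sentiment is a stand-in for an LLM call such as
# "Is the sentiment of this text positive or negative?".

def classify_sentiment(text: str) -> str:
    """Placeholder for a zero-shot LLM sentiment prompt."""
    negative_markers = ("bad", "terrible", "awful", "worst")
    return "negative" if any(w in text.lower() for w in negative_markers) else "positive"

# Start with a handful of cases; append more whenever a real failure
# surfaces, rather than pausing to label 1,000 examples before shipping.
eval_set = [
    ("I love this product", "positive"),
    ("This is the worst service ever", "negative"),
]

def run_evals(cases):
    """Return (number passed, list of failing cases)."""
    failures = [(text, expected) for text, expected in cases
                if classify_sentiment(text) != expected]
    return len(cases) - len(failures), failures

passed, failures = run_evals(eval_set)
print(f"{passed}/{len(eval_set)} passed, {len(failures)} failed")
```

In a real project the stub would be replaced by an API call, and each production failure would be appended to `eval_set` as a regression test.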

  • 00:10:00 - 00:15:00

    Andrew describes the major agentic workflow design patterns, including reflection and planning. Reflection has the AI critique its own output to improve it, while planning uses the AI to choose and execute a sequence of actions for complex tasks. He also discusses multi-agent collaboration, where a model is prompted to play different roles, improving task performance. These patterns are proving beneficial in practical applications, offering structured ways to direct models through complex tasks.

  • 00:15:00 - 00:20:00

    He showcases a demo involving agentic workflows for visual AI tasks, illustrating how AI can accurately process images and video data. This involves dynamic sequence planning and code generation to execute tasks, providing utility in handling large datasets. Andrew emphasizes the reduced complexity in developing such applications today and how these capabilities can extract substantial value from stored visual data.

  • 00:20:00 - 00:26:52

    Andrew concludes by touching on AI trends, stressing the rise of agentic AI. He highlights efforts to speed up token generation, the adaptation of language models for tool use, and the growing importance of data engineering for unstructured data. Andrew anticipates a forthcoming revolution in image processing, predicting substantial value extraction from visual data. He encourages builders to experiment with these new AI capabilities, underscoring a transformative phase in AI application development.



Video Q&A

  • What analogy does Andrew use to describe AI's potential?

    Andrew likens AI to electricity, emphasizing its general-purpose technology capabilities.

  • What is causing faster development of machine learning models?

    Generative AI is enabling faster machine learning model development, allowing projects that once took months to complete to be built in days.

  • What is a major trend in AI development according to Andrew?

    Agentic AI workflows are a major trend, enabling more sophisticated and iterative processes in AI applications.

  • How do agentic AI workflows improve productivity?

    These workflows allow for iterative tasks like planning, testing, and revising, ultimately leading to better results through repeated refinement.
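The planning step of such a workflow, turning a complex request into an ordered sequence of actions, can be sketched as follows. This is illustrative only: the plan list is hard-coded here, whereas a real agent would ask an LLM to produce it, and each step function is a stub for a model or tool call (the step names echo the pose-detection-to-audio example from the talk).

```python
# Sketch of the planning pattern: execute an ordered list of steps,
# each a stub standing in for a model or tool invocation.

def detect_pose(data): return data + " -> pose"
def generate_image(data): return data + " -> image"
def describe_image(data): return data + " -> caption"
def text_to_speech(data): return data + " -> audio"

STEPS = {
    "detect_pose": detect_pose,
    "generate_image": generate_image,
    "describe_image": describe_image,
    "text_to_speech": text_to_speech,
}

def execute_plan(plan, data):
    """Run each named step in order, threading the result through."""
    for step in plan:
        data = STEPS[step](data)
    return data

# A plan an LLM might propose for "generate an image of a girl reading
# a book, describe it, and read the description aloud":
plan = ["detect_pose", "generate_image", "describe_image", "text_to_speech"]
print(execute_plan(plan, "girl reading a book"))
```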

  • What key layers make up the AI stack?

    The AI stack consists of semiconductors, cloud infrastructure, foundational model trainers, and the application layer.

  • How is AI affecting the speed of prototyping?

    AI, especially generative AI, is making prototyping faster by allowing teams to create quick models and prototypes that can be tested and iterated swiftly.

  • What implications does fast experimentation in AI have?

    Fast experimentation allows teams to prototype multiple ideas rapidly, focusing only on those that are successful.

  • What challenges come with fast prototyping in AI?

    Evaluations, or "evals," become a bottleneck: because prompt-based applications need no training data, pausing to collect a large test set feels slow, so test sets are built up in parallel with development instead.

  • How does Andrew suggest improving AI development processes?

    Andrew suggests using agentic workflows with design patterns like reflection, tool use, planning, and multi-agent collaboration.
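Multi-agent collaboration, the last of these patterns, can be sketched by giving the same underlying model different role prompts. This is a hypothetical sketch: `chat` is a stub with canned replies standing in for a single LLM call whose system prompt sets the agent's role.

```python
# Sketch of multi-agent collaboration: one "coder" agent and one
# "critic" agent, both backed by the same (stubbed) model.

def chat(system: str, message: str) -> str:
    """Stand-in for one LLM call; the system prompt sets the role."""
    if "critic" in system:
        return "Add input validation for non-numeric arguments."
    if "Revise" in message:
        return ("def add(a, b):\n"
                "    assert isinstance(a, (int, float)) and isinstance(b, (int, float))\n"
                "    return a + b")
    return "def add(a, b):\n    return a + b"

def collaborate(task: str, rounds: int = 1) -> str:
    """Coder drafts, critic reviews, coder revises using the feedback."""
    coder = "You are an expert coder."
    critic = "You are a careful code critic."
    code = chat(coder, task)
    for _ in range(rounds):
        feedback = chat(critic, f"Review this code:\n{code}")
        code = chat(coder, f"Revise the code using this feedback: {feedback}\n{code}")
    return code

print(collaborate("Write a function add(a, b)."))
```

The role split mirrors Andrew's processor analogy: one model, but dividing the work into named roles gives the developer a useful abstraction for decomposing a task.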

  • What are some benefits of incorporating multimodal inputs in AI?

    Multimodal inputs allow AI to process and interpret complex data types, such as visual and textual data, leading to richer applications.
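The video-metadata demo near the end of the talk suggests a simple shape for such code: split a video into fixed-length chunks and record a clip name, start, and end time for each. This sketch is illustrative only; the duration is passed in directly, whereas a real version would read it from the file and have a model write a description per chunk before loading the rows into a pandas DataFrame.

```python
# Sketch of generating per-chunk metadata for a video, in the spirit of
# the "split into 6-second chunks, return a DataFrame" demo.

def chunk_metadata(video_name: str, duration_s: float, chunk_s: float = 6.0):
    """Return one metadata row per fixed-length chunk of the video."""
    rows = []
    start, i = 0.0, 0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        rows.append({"clip_name": f"{video_name}_chunk{i:03d}",
                     "start_time": start,
                     "end_time": end})
        start = end
        i += 1
    return rows

rows = chunk_metadata("match", 20.0)
for row in rows:
    print(row)
# The rows can be turned into a DataFrame with pandas.DataFrame(rows)
# and stored, e.g. in Snowflake, for downstream applications.
```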

Transcript
  • 00:00:00
    please welcome Andrew
  • 00:00:02
    [Applause]
  • 00:00:13
    thank you. It's such a good time to be
  • 00:00:16
    a builder. I'm excited to be back here at
  • 00:00:19
    Snowflake
  • 00:00:20
    Build. What I'd like to do today is share
  • 00:00:23
    with you what I think are some of AI's
  • 00:00:25
    biggest
  • 00:00:26
    opportunities. You may have heard me say
  • 00:00:28
    that I think AI is the new electricity
  • 00:00:30
    that's because AI is a general purpose
  • 00:00:32
    technology like electricity if I ask you
  • 00:00:35
    what is electricity good for it's always
  • 00:00:36
    hard to answer because it's good for so
  • 00:00:38
    many different things and new AI
  • 00:00:41
    technology is creating a huge set of
  • 00:00:43
    opportunities for us to build new
  • 00:00:45
    applications that weren't possible
  • 00:00:47
    before people often ask me hey Andrew
  • 00:00:50
    where are the biggest AI opportunities
  • 00:00:52
    this is what I think of as the AI stack
  • 00:00:54
    at the lowest level is the
  • 00:00:56
    semiconductors and then on top of that
  • 00:00:58
    a lot of the cloud infrastructure including of
  • 00:01:00
    Course Snowflake and then on top of that
  • 00:01:03
    are many of the foundation model
  • 00:01:05
    trainers and models and it turns out
  • 00:01:08
    that a lot of the media hype and
  • 00:01:10
    excitement and social media Buzz has
  • 00:01:11
    been on these layers of the stack kind
  • 00:01:13
    of the new technology layers when if
  • 00:01:16
    there's a new technology like generative
  • 00:01:17
    AI, the buzz is on these technology
  • 00:01:19
    layers and there's nothing wrong with
  • 00:01:21
    that but I think that almost by
  • 00:01:24
    definition there's another layer of the
  • 00:01:26
    stack that has to work out even better
  • 00:01:29
    and that's the application layer
  • 00:01:31
    because we need the applications to
  • 00:01:32
    generate even more value and even more
  • 00:01:34
    Revenue so that you know to really
  • 00:01:36
    afford to pay the technology providers
  • 00:01:39
    below so I spend a lot of my time
  • 00:01:41
    thinking about AI applications and I
  • 00:01:43
    think that's where lot of the best
  • 00:01:45
    opportunities will be to build new
  • 00:01:48
    things one of the trends that has been
  • 00:01:51
    growing for the last couple years in no
  • 00:01:53
    small part because of generative AI is
  • 00:01:55
    faster and faster machine learning model
  • 00:01:58
    development um and in particular
  • 00:02:01
    generative AI is letting us build things
  • 00:02:03
    faster than ever before take the problem
  • 00:02:06
    of, say, building a sentiment classifier:
  • 00:02:09
    taking text and deciding is this a
  • 00:02:10
    positive or negative sentiment for
  • 00:02:12
    reputation monitoring say typical
  • 00:02:14
    workflow using supervised learning might
  • 00:02:16
    be that it will take a month to get some
  • 00:02:19
    labeled data and then you know train an AI
  • 00:02:22
    model that might take a few months and
  • 00:02:24
    then find a cloud service or something
  • 00:02:27
    to deploy on that'll take another few
  • 00:02:28
    months and so for a long time very
  • 00:02:31
    valuable AI systems might take good AI
  • 00:02:33
    teams six to 12 months to build right
  • 00:02:35
    and there's nothing wrong with that I
  • 00:02:37
    think many people create very valuable
  • 00:02:39
    AI systems this way but with generative
  • 00:02:41
    AI there are certain classes of applications
  • 00:02:44
    where you can write a prompt in days and
  • 00:02:48
    then deploy it in you know again maybe
  • 00:02:51
    days and what this means is there are a
  • 00:02:53
    lot of applications that used to take me
  • 00:02:55
    and used to take very good AI teams
  • 00:02:57
    months to build that today you can build
  • 00:02:59
    in maybe 10 days or so and this opens up
  • 00:03:02
    the opportunity to experiment with build
  • 00:03:06
    new prototypes and and ship new AI
  • 00:03:09
    products that's certainly the
  • 00:03:10
    prototyping aspect of it and these are
  • 00:03:13
    some of the consequences of this trend
  • 00:03:15
    which is fast experimentation is
  • 00:03:18
    becoming a more promising path to
  • 00:03:21
    invention previously if it took six
  • 00:03:23
    months to build something then you know
  • 00:03:25
    we'd better study it, make sure there's user
  • 00:03:26
    demand have product managers we look at
  • 00:03:28
    it document it and and then spend all
  • 00:03:30
    that effort to build in it hopefully it
  • 00:03:32
    turns out to be
  • 00:03:33
    worthwhile but now for fast moving AI
  • 00:03:35
    teams I see a design pattern where you
  • 00:03:38
    can say you know what, it takes us a
  • 00:03:40
    weekend to throw together a prototype
  • 00:03:42
    let's build 20 prototypes and see what
  • 00:03:43
    sticks and if 18 of them don't work out
  • 00:03:45
    we'll just ditch them and stick with
  • 00:03:47
    what works so fast iteration and fast
  • 00:03:50
    experimentation is becoming a new path
  • 00:03:53
    to inventing new user
  • 00:03:55
    experiences um one of interesting
  • 00:03:57
    implication is that evaluations or evals
  • 00:04:00
    for short are becoming a bigger
  • 00:04:01
    bottleneck for how we build things so it
  • 00:04:04
    turns out back in supervised learning
  • 00:04:06
    world if you're collecting 10,000 data
  • 00:04:08
    points anyway to train a model then you
  • 00:04:10
    know if you needed to collect an extra
  • 00:04:12
    1,000 data points for testing it was
  • 00:04:14
    fine, just an extra 10% increase in cost
  • 00:04:18
    but for a lot of large language model
  • 00:04:19
    based apps there's no need to have
  • 00:04:21
    any training data if you made me slow
  • 00:04:24
    down to collect a thousand test examples
  • 00:04:26
    boy that seems like a huge bottleneck
  • 00:04:28
    and so the new development workflow
  • 00:04:30
    often feels as if we're building and
  • 00:04:32
    collecting data more in parallel rather
  • 00:04:34
    than sequentially um in which we build a
  • 00:04:37
    prototype and then as it becomes
  • 00:04:39
    more important and as robustness and
  • 00:04:42
    reliability becomes more important then
  • 00:04:43
    we gradually build up that test set here
  • 00:04:46
    in parallel but I see exciting
  • 00:04:48
    Innovations to be had still in how we
  • 00:04:50
    build evals um and then what I'm seeing
  • 00:04:53
    as well is the prototyping of machine
  • 00:04:56
    learning has become much faster but
  • 00:04:58
    building a software application has lots
  • 00:05:00
    of steps does the product work you know
  • 00:05:02
    the design work does the software
  • 00:05:03
    integration work a lot of Plumbing work
  • 00:05:06
    um then after deployment DevOps and
  • 00:05:08
    MLOps so some of those other pieces are
  • 00:05:10
    becoming faster but they haven't become
  • 00:05:13
    faster at the same rate that the machine
  • 00:05:14
    learning modeling part has become faster
  • 00:05:17
    so you take a process and one piece of
  • 00:05:19
    it becomes much faster um what I'm
  • 00:05:21
    seeing is prototyping is now really
  • 00:05:23
    really fast but sometimes you take a
  • 00:05:25
    prototype into robust reliable
  • 00:05:28
    production with guard rails and so on
  • 00:05:30
    those other steps still take some time
  • 00:05:33
    but the interesting Dynamic I'm seeing
  • 00:05:34
    is the fact that the machine learning part
  • 00:05:36
    is so fast is putting a lot of pressure
  • 00:05:38
    on organizations to speed up all of
  • 00:05:41
    those other parts as well so that's been
  • 00:05:43
    exciting progress for our field and in
  • 00:05:46
    terms of how machine learning
  • 00:05:48
    development um is speeding things up I
  • 00:05:51
    think the mantra move fast and break
  • 00:05:53
    things got a bad rep because you know it
  • 00:05:57
    broke things um I think some people
  • 00:06:00
    interpret this to mean we shouldn't move
  • 00:06:01
    fast but I disagree with that I think
  • 00:06:04
    the better mantra is move fast and be
  • 00:06:08
    responsible I'm seeing a lot of teams
  • 00:06:10
    able to prototype quickly evaluate and
  • 00:06:12
    test robustly so without shipping
  • 00:06:14
    anything out to The Wider world that
  • 00:06:16
    could you know cause damage or cause um
  • 00:06:18
    meaningful harm I'm finding smart teams
  • 00:06:21
    able to build really quickly and move
  • 00:06:23
    really fast but also do this in a very
  • 00:06:25
    responsible way and I find this
  • 00:06:26
    exhilarating that you can build things
  • 00:06:28
    and ship things and responsible way much
  • 00:06:30
    faster than ever
  • 00:06:32
    before now there's a lot going on in Ai
  • 00:06:35
    and of all the things going on AI um in
  • 00:06:38
    terms of technical Trend the one Trend
  • 00:06:41
    I'm most excited about is agentic AI
  • 00:06:44
    workflows and so if you were to ask what's
  • 00:06:46
    the one most important AI technology to
  • 00:06:48
    pay attention to I would say is agentic
  • 00:06:50
    AI um I think when I started saying this
  • 00:06:55
    you know near the beginning of this year
  • 00:06:56
    it was a bit of a controversial
  • 00:06:58
    statement but now the term AI agents
  • 00:07:01
    has become so widely used uh by
  • 00:07:04
    technical and non-technical people it's
  • 00:07:06
    become you know a little bit of a hype
  • 00:07:08
    term uh but so let me just share with
  • 00:07:10
    you how I view AI agents and why I think
  • 00:07:13
    they're important approaching just from
  • 00:07:15
    a technical
  • 00:07:16
    perspective the way that most of us use
  • 00:07:19
    large language models today is with what
  • 00:07:21
    something is called zero shot prompting
  • 00:07:23
    and that roughly means we would ask it
  • 00:07:25
    to uh give it a prompt write an essay or
  • 00:07:29
    write an output for us and it's a bit
  • 00:07:31
    like if we're going to a person or in
  • 00:07:33
    this case going to an AI and asking it
  • 00:07:36
    to type out an essay for us by going
  • 00:07:38
    from the first word writing from the
  • 00:07:40
    first word to the last word all in one
  • 00:07:42
    go without ever using backspace just
  • 00:07:44
    right from start to finish like that and
  • 00:07:47
    it turns out people you know we don't do
  • 00:07:49
    our best writing this way uh but despite
  • 00:07:51
    the difficulty of being forced to write
  • 00:07:53
    this way large language models do you know not
  • 00:07:55
    bad pretty
  • 00:07:56
    well here's what an agentic workflow
  • 00:07:59
    looks like uh to generate an essay we ask an
  • 00:08:02
    AI to First write an essay outline and
  • 00:08:04
    ask it do you need to do some web
  • 00:08:06
    research if so let's download some web
  • 00:08:07
    pages and put into the context of the
  • 00:08:09
    large language model then let's write the first
  • 00:08:11
    draft and then let's read the first
  • 00:08:12
    draft and critique it and revise the
  • 00:08:15
    draft and so on and this workflow looks
  • 00:08:17
    more like um doing some thinking or some
  • 00:08:20
    research and then some revision and then
  • 00:08:23
    going back to do more thinking and more
  • 00:08:24
    research and by going round this Loop
  • 00:08:27
    over and over um it takes longer but
  • 00:08:29
    this results in a much better work
  • 00:08:31
    output so in some teams I work with we
  • 00:08:34
    apply this agentic workflow to
  • 00:08:36
    processing complex tricky legal
  • 00:08:38
    documents or to um do Health Care
  • 00:08:41
    diagnosis Assistance or to do very
  • 00:08:43
    complex compliance with government
  • 00:08:45
    paperwork so many times I'm seeing this
  • 00:08:47
    drive much better results than was ever
  • 00:08:50
    possible and one thing I want to focus
  • 00:08:51
    on in this presentation I'll talk about
  • 00:08:53
    later is the rise of visual AI where
  • 00:08:55
    agentic workflows are letting us process
  • 00:08:58
    image and video data
  • 00:09:00
    but to get back to that later um it
  • 00:09:03
    turns out that there are benchmarks that
  • 00:09:05
    seem to show agentic workflows
  • 00:09:07
    deliver much better results um this is
  • 00:09:10
    the HumanEval benchmark which is a
  • 00:09:12
    benchmark from OpenAI that measures
  • 00:09:15
    a large language model's ability to
  • 00:09:17
    solve coding puzzles like this one and
  • 00:09:20
    um my team collected some data turns out
  • 00:09:23
    that um on this Benchmark I think it was
  • 00:09:25
    the pass@k benchmark, pass@k metric, GPT-3.5 got
  • 00:09:29
    48% right on this coding benchmark GPT-4
  • 00:09:33
    huge improvement you know
  • 00:09:36
    67% but the improvement from GPT-3.5 to
  • 00:09:39
    GPT-4 is dwarfed by the improvement from
  • 00:09:42
    GPT-3.5 to GPT-3.5 using an agentic
  • 00:09:46
    workflow um which gets up to about
  • 00:09:49
    95% and GPT-4 with an agentic workflow
  • 00:09:53
    also does much better um and so it turns
  • 00:09:58
    out that in the way Builders built
  • 00:10:00
    agentic reasoning or agentic workflows
  • 00:10:03
    in their applications there are I want
  • 00:10:05
    to say four major design patterns which
  • 00:10:07
    are reflection, tool use, planning and
  • 00:10:09
    multi-agent collaboration and to
  • 00:10:12
    demystify agentic workflows a little bit
  • 00:10:14
    let me quickly step through what these
  • 00:10:16
    workflows mean um and I find that
  • 00:10:19
    agentic workflows sometimes seem a
  • 00:10:21
    little bit mysterious until you actually
  • 00:10:22
    read through the code for one or two of
  • 00:10:24
    these go oh that's it you know that's
  • 00:10:26
    really cool but oh that's all it takes
  • 00:10:28
    but let me just step through
  • 00:10:29
    um to for for concreteness what
  • 00:10:32
    reflection with LLMs looks like so I might
  • 00:10:36
    start off uh prompting an LLM as a
  • 00:10:39
    coder agent so maybe an assistant
  • 00:10:41
    message that your role is to be a coder and
  • 00:10:43
    write code um so you can tell it you know
  • 00:10:45
    please write code for certain tasks and
  • 00:10:47
    the LLM may generate code and then it
  • 00:10:50
    turns out that you can construct a
  • 00:10:52
    prompt that takes the code that was just
  • 00:10:54
    generated and copy paste the code back
  • 00:10:57
    into the prompt and ask it you know here's
  • 00:10:59
    some code intended for a task, examine
  • 00:11:01
    this code and critique it right and it
  • 00:11:04
    turns out if you prompt the same LLM this
  • 00:11:05
    way it may sometimes um find some
  • 00:11:09
    problems with it or make some useful
  • 00:11:12
    suggestions to improve the code then you
  • 00:11:14
    prompt the same LLM with the feedback and
  • 00:11:17
    ask it to improve the code and come back
  • 00:11:19
    with a new version and uh maybe
  • 00:11:21
    foreshadowing tool use you can have the
  • 00:11:23
    LLM run some unit tests and give the
  • 00:11:25
    feedback of the unit tests back to the LLM
  • 00:11:28
    then that can be additional feedback to
  • 00:11:29
    help it iterate further to further
  • 00:11:31
    improve the code and it turns out that
  • 00:11:33
    this type of reflection workflow is not
  • 00:11:35
    magic doesn't solve all problems um but
  • 00:11:37
    it will often take the Baseline level
  • 00:11:39
    performance and lift it uh to to better
  • 00:11:43
    level performance and it turns out also
  • 00:11:46
    with this type of workflow where we're
  • 00:11:47
    thinking of prompting an LLM to critique its
  • 00:11:49
    own output and use its own criticism to
  • 00:11:51
    improve it this maybe also foreshadows
  • 00:11:54
    multi-agent planning or multi-agent
  • 00:11:56
    workflows where you can prompt one
  • 00:11:58
    prompt an LLM to sometimes play the role
  • 00:12:00
    of a coder and sometimes prompt it to play
  • 00:12:03
    the role of a critic um to
  • 00:12:06
    review the code so it's the same
  • 00:12:08
    conversation but we can prompt the LLM
  • 00:12:10
    you know differently to tell it sometimes
  • 00:12:13
    work on the code sometimes try to make
  • 00:12:15
    helpful suggestions and this also
  • 00:12:17
    results in improved performance so this
  • 00:12:19
    is a reflection design pattern um and
  • 00:12:24
    second major design pattern is tool use uh
  • 00:12:27
    in which a large language model can be
  • 00:12:29
    prompted to generate a request for an
  • 00:12:31
    API call to have it decide when it needs
  • 00:12:34
    to uh search the web or execute code or
  • 00:12:37
    take an action like um issue a customer
  • 00:12:39
    refund or send an email or pull up a
  • 00:12:41
    calendar entry so tool use is a major
  • 00:12:43
    design pattern that is letting large
  • 00:12:45
    language models make function calls and
  • 00:12:47
    I think this is expanding what we can do
  • 00:12:49
    with these agentic workflows um real
  • 00:12:52
    quick here's a planning or reasoning
  • 00:12:55
    design pattern in which if you were to
  • 00:12:57
    give a fairly complex request you know
  • 00:12:58
    generate an image of a girl reading a
  • 00:13:01
    book and so on then an LLM, this example
  • 00:13:04
    adapted from the HuggingGPT paper, an LLM
  • 00:13:06
    can look at the picture and decide to
  • 00:13:09
    first use a um open pose model to detect
  • 00:13:12
    the pose and then after that generate a
  • 00:13:14
    picture of a girl um after that
  • 00:13:17
    describe the image and after that use
  • 00:13:19
    text to speech or TTS to generate the audio
  • 00:13:21
    so in planning you have an LLM look at a
  • 00:13:24
    complex request and pick a sequence of
  • 00:13:27
    actions to execute in order to deliver on a
  • 00:13:30
    complex task um and lastly multi-agent
  • 00:13:33
    collaboration is that design pattern
  • 00:13:35
    alluded to where instead of prompting an
  • 00:13:37
    LLM to just do one thing you prompt the
  • 00:13:40
    LLM to play different roles at different
  • 00:13:42
    points in time so the different agents
  • 00:13:44
    the simulated agents, interact with each other
  • 00:13:46
    and come together to solve a task and I
  • 00:13:49
    know that some people may wonder you
  • 00:13:52
    know if you're using one LLM why do you need
  • 00:13:54
    to make this one LLM play the role of
  • 00:13:57
    multiple agents um many teams
  • 00:13:59
    have demonstrated significant improved
  • 00:14:02
    performance for a variety of tasks using
  • 00:14:04
    this design pattern and it turns out
  • 00:14:07
    that if you have an LLM sometimes
  • 00:14:08
    specialize on different tasks maybe one
  • 00:14:10
    at a time have it interact many teams
  • 00:14:13
    seem to really get much better results
  • 00:14:14
    using this I feel like maybe um there's
  • 00:14:18
    an analogy to if you're running jobs on
  • 00:14:20
    a processor on a CPU why do we need
  • 00:14:23
    multiple processes it's all the same
  • 00:14:25
    processor there you know at the end of the
  • 00:14:27
    day but we found that having multiple
  • 00:14:29
    processes is a useful abstraction for
  • 00:14:31
    developers to take a task and break it
  • 00:14:33
    down to subtask and I think multi-agent
  • 00:14:35
    collaboration is a bit like that too if
  • 00:14:37
    you have a big task then if you think of
  • 00:14:39
    hiring a bunch of agents to do different
  • 00:14:41
    pieces of task then interact sometimes
  • 00:14:43
    that helps the developer um build
  • 00:14:46
    complex systems to deliver a good
  • 00:14:48
    result so I think with these four major
  • 00:14:52
    agentic design patterns agentic
  • 00:14:54
    reasoning workflow design patterns um it
  • 00:14:57
    gives us a huge space to play with to
  • 00:14:59
    build Rich agents to do things that
  • 00:15:01
    frankly were just not possible you know
  • 00:15:04
    even a year ago um and I want to one
  • 00:15:08
    aspect of this I'm particularly excited
  • 00:15:10
    about is the rise of not just large
  • 00:15:13
    language model based agents but large
  • 00:15:15
    multimodal model based agents, large multimodal
  • 00:15:17
    model based agents so um given an image
  • 00:15:21
    like this if you wanted to uh use an
  • 00:15:25
    LMM, a large multimodal model, you could
  • 00:15:27
    actually do zero shot prompting and that's a
  • 00:15:29
    bit like telling it you know take a
  • 00:15:31
    glance at the image and just tell me the
  • 00:15:33
    output and for simple image tasks
  • 00:15:36
    that's okay you can actually have it you
  • 00:15:38
    know look at the image and uh right give
  • 00:15:40
    you the numbers of the runners or
  • 00:15:42
    something but it turns out just as with
  • 00:15:44
    large language model based agents,
  • 00:15:46
    large multimodal model based agents can
  • 00:15:48
    do better with an iterative workflow where
  • 00:15:51
    you can approach this problem step by
  • 00:15:53
    step so detect the faces detect the
  • 00:15:55
    numbers put it together and so with this
  • 00:15:58
    more iterative workflow uh you can actually
  • 00:16:00
    get an agent to do some planning testing
  • 00:16:03
    write code, plan, test, write code and come
  • 00:16:06
    up with a more complex plan as
  • 00:16:08
    articulated and expressed in code to deliver
  • 00:16:11
    on more complex tasks so what I'd like
  • 00:16:14
    to do is um show you a demo of some work
  • 00:16:17
    that uh Dan Malone and I and the Landing AI
  • 00:16:20
    team has been working on on building
  • 00:16:22
    agentic workflows for visual AI
  • 00:16:27
    tasks so if we switch to my
  • 00:16:31
    laptop
  • 00:16:32
    um let me have an image here of a uh
  • 00:16:38
    soccer game or football game and um I'm
  • 00:16:41
    going to say let's see count the
  • 00:16:43
    players in the field oh and just for fun if
  • 00:16:47
    you're not sure how to prompt it after
  • 00:16:49
    uploading an image This little light
  • 00:16:50
    bulb here you know gives some suggested
  • 00:16:53
    prompts you may ask for this uh but let
  • 00:16:55
    me run this so count players on the
  • 00:16:57
    field right and what this kicks off is a
  • 00:17:00
    process that actually runs for a couple
  • 00:17:02
    minutes um to Think Through how to write
  • 00:17:04
    code uh in order to come up with a plan to
  • 00:17:07
    give an accurate result for uh counting
  • 00:17:10
    the number of players in the field this is
  • 00:17:11
    actually a little bit complex because
  • 00:17:12
    you don't want the players in the
  • 00:17:13
    background, just the ones on the field. I already
  • 00:17:15
    ran this earlier so we just jumped to
  • 00:17:18
    the result um but it says the code has
  • 00:17:22
    selected seven players on the field and
  • 00:17:26
    I think that's right 1 2 3 4 5 six
  • 00:17:28
    seven
  • 00:17:30
    um and if I were to zoom in to the model
  • 00:17:33
    output Now 1 2 3 4 five six seven I
  • 00:17:37
    think that's actually right and the part
  • 00:17:39
    of the output of this is that um it has
  • 00:17:45
    also generated code uh that you can run
  • 00:17:48
    over and over um actually generated
  • 00:17:51
    python code uh
  • 00:17:54
    that if you want you can run over and
  • 00:17:56
    over on a large collection of
  • 00:17:59
    images and I think this is exciting because
  • 00:18:01
    there are a lot of companies um and
  • 00:18:04
    teams that actually have a lot of visual
  • 00:18:06
    AI data have a lot of images um have a
  • 00:18:09
    lot of videos kind of stored somewhere
  • 00:18:12
    and until now it's been really difficult
  • 00:18:15
    to get value out of this data so for a
  • 00:18:18
    lot of the you know small teams or large
  • 00:18:20
    businesses with a lot of visual data
  • 00:18:23
    visual AI capabilities like the vision
  • 00:18:25
    agent lets you take all this data
  • 00:18:27
    previously shoved somewhere in blob storage
  • 00:18:29
    and and you know get real value out of
  • 00:18:31
    this I think this is a big
  • 00:18:32
    transformation for AI um here's another
  • 00:18:35
    example you know this says um given a
  • 00:18:38
    video split this another soccer game or
  • 00:18:42
    football
  • 00:18:43
    game. So: given a video, split the video into
  • 00:18:46
    clips of 5 seconds, find the clip where
  • 00:18:48
    a goal is being scored, and display a frame of the
  • 00:18:50
    output. I already ran this, because it takes
  • 00:18:52
    a little time to run. This will
  • 00:18:54
    generate code and evaluate code for a while,
  • 00:18:56
    and this is the output, and it says true at
  • 00:19:00
    10 to 15, so it thinks the goal is scored, you know,
  • 00:19:04
    around here, between
  • 00:19:06
    10 and 15 seconds. Right, and there you go, that's the goal.
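The clip-splitting step described above can be sketched in plain Python. The boundary arithmetic below is an assumption about how the generated code might divide the video, not the agent's actual output:

```python
# Sketch: split a video's timeline into fixed-length clips, then find which
# clip contains an event timestamp (e.g. when a goal is scored).

def split_into_clips(duration, clip_len=5.0):
    """Return (start, end) pairs covering the whole duration in seconds."""
    clips = []
    t = 0.0
    while t < duration:
        clips.append((t, min(t + clip_len, duration)))
        t += clip_len
    return clips

def clip_containing(clips, timestamp):
    """Return the (start, end) clip whose interval contains the timestamp."""
    for start, end in clips:
        if start <= timestamp < end:
            return (start, end)
    return None

clips = split_into_clips(32.0)       # e.g. a 32-second video
print(clips[-1])                     # (30.0, 32.0) -- last clip is shorter
print(clip_containing(clips, 12.3))  # (10.0, 15.0)
```

With a real video you would then cut frames per clip with a library such as OpenCV or ffmpeg; this sketch only covers the boundary logic.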
  • 00:19:10
    And also, as instructed, it
  • 00:19:13
    extracted some of the frames associated
  • 00:19:15
    with this. So, really useful for
  • 00:19:17
    processing video data. And maybe
  • 00:19:21
    here's one last example of the
  • 00:19:23
    Vision Agent, which is: you can also
  • 00:19:25
    ask it for a program to split the input
  • 00:19:27
    video into small video chunks every 6
  • 00:19:29
    seconds, describe each chunk, and store the
  • 00:19:32
    information in a pandas DataFrame along
  • 00:19:33
    with clip name, start and end time, and return the
  • 00:19:35
    pandas DataFrame. So this is a way to
  • 00:19:38
    look at video data that you may have and
  • 00:19:41
    generate metadata for it that you
  • 00:19:44
    can then store, you know, in Snowflake or
  • 00:19:46
    somewhere, to then build other
  • 00:19:48
    applications on top of. But just to show
  • 00:19:50
    you the output of this: so, you know,
  • 00:19:54
    clip name, start time, end time, and then
  • 00:19:57
    it has actually written code here,
  • 00:20:00
    right, wrote code that you can then run
  • 00:20:02
    elsewhere if you want, put it in a
  • 00:20:03
    Streamlit app or something, that you can
  • 00:20:06
    then use to write a lot of, you know,
  • 00:20:10
    text descriptions for this. And using
  • 00:20:15
    this capability of the Vision Agent to
  • 00:20:17
    help write code, my team at Landing AI
  • 00:20:21
    actually built this little demo app that
  • 00:20:24
    uses code from the Vision Agent. So
  • 00:20:26
    instead of us writing the code, we had
  • 00:20:28
    the Vision Agent write the code to build
  • 00:20:30
    this metadata, and then it indexes a
  • 00:20:34
    bunch of videos. So let's see, I'll
  • 00:20:36
    search "skier airborne", right? I
  • 00:20:39
    actually ran this earlier; hope it works.
  • 00:20:42
    So what this demo shows is: we already
  • 00:20:45
    ran the code to take the video, split it in
  • 00:20:47
    chunks, and store the metadata, and then when
  • 00:20:50
    I do a search for "skier airborne", you
  • 00:20:52
    know, it shows the clips that have
  • 00:20:55
    high
  • 00:20:57
    similarity, right? Right, oh, marked here
  • 00:20:59
    with the green: it has high similarity. Well,
  • 00:21:02
    this is getting my heart rate up, seeing
  • 00:21:03
    them do that. Oh, here's another one. Whoa, all
  • 00:21:08
    right, all right. And the green parts
  • 00:21:11
    of the timeline show where the skier is
  • 00:21:13
    airborne. Let's see: "gray wolf at night". I
  • 00:21:18
    actually find it pretty fun, yeah, when
  • 00:21:20
    you have a collection of videos, to
  • 00:21:22
    index it and then just browse through it,
  • 00:21:24
    right? Here's a gray wolf at night, and
  • 00:21:26
    this timeline in green shows where a gray
  • 00:21:29
    wolf at night is. And if I actually
  • 00:21:30
    jump to a different part of the video,
  • 00:21:33
    there's a bunch of other stuff as well,
  • 00:21:35
    right there, that's not a gray wolf at night.
  • 00:21:37
    So, I think that's pretty cool.
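The index-then-search pattern from this demo can be sketched as follows. The clip metadata and descriptions are toy data, and a simple bag-of-words cosine similarity stands in for whatever embedding model the demo actually uses:

```python
import math
import pandas as pd

# Sketch: store per-clip metadata (clip name, start/end time, description)
# in a pandas DataFrame, then rank clips against a free-text query.
df = pd.DataFrame(
    {
        "clip_name": ["clip_00", "clip_01", "clip_02"],
        "start_time": [0, 6, 12],
        "end_time": [6, 12, 18],
        "description": [
            "a skier carves down the slope",
            "the skier is airborne over a jump",
            "a gray wolf walks through snow at night",
        ],
    }
)

def bow(text):
    """Bag-of-words vector as a word -> count dict."""
    words = text.lower().split()
    return {w: words.count(w) for w in set(words)}

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(frame, query):
    """Return the frame with a similarity column, best match first."""
    q = bow(query)
    scores = frame["description"].map(lambda d: cosine(q, bow(d)))
    return frame.assign(similarity=scores).sort_values(
        "similarity", ascending=False
    )

top = search(df, "skier airborne").iloc[0]
print(top["clip_name"])  # clip_01
```

In the real demo, green segments on the timeline would correspond to clips whose similarity clears some threshold; swapping the bag-of-words vectors for learned text embeddings is the obvious upgrade.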
  • 00:21:40
    Let's see, just one last example.
  • 00:21:47
    So, yeah, I've actually been on the road a
  • 00:21:50
    lot. If you search for your luggage, this
  • 00:21:53
    black luggage, right,
  • 00:21:56
    there's this. But it turns out, it turns out
  • 00:21:59
    there's actually a lot of black luggage. So
  • 00:22:00
    if you want your luggage, let's say "black
  • 00:22:02
    luggage with
  • 00:22:04
    rainbow strap". There's a lot of black
  • 00:22:08
    luggage out
  • 00:22:09
    there, but
  • 00:22:11
    then, you know, there, right: black luggage
  • 00:22:14
    with rainbow strap. So, a lot of fun
  • 00:22:16
    things to do. And I think the nice
  • 00:22:18
    thing about this is, the work needed
  • 00:22:22
    to build applications like this is lower
  • 00:22:25
    than ever before. So let's go back to the
  • 00:22:27
    slides.
  • 00:22:33
    And in terms of AI opportunities: I spoke
  • 00:22:37
    a bit about agentic workflows, and how
  • 00:22:42
    that is changing the AI stack is as
  • 00:22:44
    follows. It turns out that in addition to
  • 00:22:48
    the stack I showed, there's actually a new
  • 00:22:51
    emerging agentic orchestration layer.
  • 00:22:54
    There are orchestration layers
  • 00:22:56
    like LangChain that have been around for a
  • 00:22:58
    while that are also becoming
  • 00:22:59
    increasingly agentic, through LangGraph for
  • 00:23:02
    example. And this new agentic
  • 00:23:04
    orchestration layer is also making it
  • 00:23:06
    easier for developers to build
  • 00:23:08
    applications on top. And I hope that
  • 00:23:10
    Landing AI's Vision Agent is another
  • 00:23:13
    contribution to this, that makes it easier
  • 00:23:15
    for you to build visual AI applications
  • 00:23:17
    to process all this image and video data
  • 00:23:21
    that possibly you had, but that was
  • 00:23:22
    really hard to get value out of until
  • 00:23:25
    more recently. So before I finish, let me share
  • 00:23:28
    what I think are maybe four of the
  • 00:23:30
    most important AI trends. There's a lot
  • 00:23:32
    going on in AI; it's impossible to
  • 00:23:34
    summarize everything in one slide. If you
  • 00:23:36
    had to make me pick the one most
  • 00:23:38
    important trend, I would say it's agentic
  • 00:23:40
    AI, but here are four things I think
  • 00:23:42
    are worth paying attention to. First:
  • 00:23:45
    it turns out agentic workflows need to read
  • 00:23:47
    a lot of text or images and generate a
  • 00:23:49
    lot of text, so we say they generate a
  • 00:23:51
    lot of tokens. And there are exciting efforts
  • 00:23:54
    to speed up token generation, including
  • 00:23:56
    semiconductor work by SambaNova, Cerebras, Groq,
  • 00:23:59
    and others, and a lot of software and other
  • 00:24:01
    types of hardware work as well. This will
  • 00:24:02
    make agentic workflows work much better.
  • 00:24:05
    Second trend I'm excited about:
  • 00:24:07
    today's large language models
  • 00:24:09
    started off being optimized to answer
  • 00:24:11
    human questions and human-generated
  • 00:24:14
    instructions, things like, you know, "why
  • 00:24:16
    did Shakespeare write Macbeth" or "explain
  • 00:24:18
    why Shakespeare wrote Macbeth". These
  • 00:24:19
    are the types of questions that large
  • 00:24:21
    language models are often asked to answer on
  • 00:24:23
    the internet. But agentic workflows call
  • 00:24:25
    for other operations, like tool use. So the
  • 00:24:28
    fact that large language models are
  • 00:24:30
    often now tuned explicitly to support
  • 00:24:32
    tool use, or, just a couple of weeks ago,
  • 00:24:35
    Anthropic released a model that can
  • 00:24:37
    support computer use: I think these
  • 00:24:39
    exciting developments create a lot
  • 00:24:41
    of lift, create a much higher
  • 00:24:43
    ceiling for what we can now get agentic
  • 00:24:45
    workflows to do, with large language models
  • 00:24:48
    that are tuned not just to answer human
  • 00:24:50
    queries but tuned explicitly to
  • 00:24:53
    fit into these iterative agentic workflows.
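The tool-use pattern described here can be sketched without any particular framework: the model emits either a tool call or a final answer, and a loop executes tools and feeds results back. The `stub_model` below is a hand-written stand-in for a tool-tuned LLM, not a real model API:

```python
# Minimal, framework-free sketch of a tool-use agent loop.

def calculator(expression):
    """Example tool: evaluate a simple arithmetic expression."""
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def stub_model(messages):
    """Stand-in for an LLM: request the calculator once, then answer."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "calculator", "input": "17 * 24"}
    result = [m for m in messages if m["role"] == "tool"][-1]["content"]
    return {"answer": f"17 * 24 = {result}"}

def agent_loop(question, model, tools, max_steps=5):
    """Alternate model calls and tool executions until an answer appears."""
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        action = model(messages)
        if "answer" in action:
            return action["answer"]
        result = tools[action["tool"]](action["input"])
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("agent did not finish within max_steps")

print(agent_loop("What is 17 * 24?", stub_model, TOOLS))  # 17 * 24 = 408
```

A tool-tuned model replaces `stub_model` with a real API call that returns structured tool invocations; the loop shape stays the same, which is essentially what the orchestration layers mentioned earlier provide.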
  • 00:24:57
    Third:
  • 00:24:58
    data engineering's importance is rising,
  • 00:25:01
    particularly with unstructured data. It
  • 00:25:03
    turns out that a lot of the value of
  • 00:25:05
    machine learning was on structured data,
  • 00:25:07
    kind of tables of numbers, but with gen AI
  • 00:25:10
    we're much better than ever before at
  • 00:25:12
    processing text and images and video and
  • 00:25:14
    maybe audio. And so the importance of
  • 00:25:17
    data engineering is increasing, in terms
  • 00:25:19
    of how to manage your unstructured data
  • 00:25:21
    and the metadata for that, and
  • 00:25:22
    deployment, to get the unstructured data
  • 00:25:24
    where it needs to go to create value. So
  • 00:25:26
    that will be a major effort for a
  • 00:25:28
    lot of large businesses. And then lastly,
  • 00:25:31
    I think we've all seen that the text
  • 00:25:32
    processing revolution has already
  • 00:25:34
    arrived. The image processing revolution
  • 00:25:36
    is in a slightly earlier phase, but it is
  • 00:25:38
    coming, and as it comes, many people, many
  • 00:25:40
    businesses, will be able to get a lot
  • 00:25:42
    more value out of their visual data than
  • 00:25:45
    was ever possible before. And I'm excited
  • 00:25:48
    because I think that will significantly
  • 00:25:49
    increase the space of applications we
  • 00:25:51
    can build as well. So, just to wrap up: this
  • 00:25:56
    is a great time to be a builder. Gen AI
  • 00:25:59
    is letting us experiment faster than
  • 00:26:01
    ever. Agentic AI is expanding the set of
  • 00:26:03
    things that are now possible, and there are just
  • 00:26:05
    so many new applications that we can now
  • 00:26:08
    build, in visual AI or not in visual AI,
  • 00:26:11
    that just weren't possible ever before.
  • 00:26:13
    If you're interested in checking out the
  • 00:26:15
    visual AI demos that I ran, please
  • 00:26:19
    go to va.landing.ai. The exact demos
  • 00:26:21
    that I ran, you can try out yourself
  • 00:26:24
    online, and get the code and run the code
  • 00:26:26
    yourself in your own applications. So,
  • 00:26:28
    with that, let me say thank you all very
  • 00:26:31
    much, and please also join me in
  • 00:26:32
    welcoming Elsa back onto the stage. Thank
  • 00:26:34
    you.
标签
  • AI
  • agentic AI
  • generative AI
  • machine learning
  • prototyping
  • multimodal AI
  • AI applications
  • AI trends
  • AI stack
  • AI workflow