00:00:07
hey everyone I'm Kevin and I lead
00:00:09
product at OpenAI and today we're here to
00:00:11
talk about developers and agents and in
00:00:14
particular we're excited to launch a
00:00:16
bunch of new tools that make it easy for
00:00:18
developers to build reliable and useful
00:00:20
agents now when we say agent we mean a
00:00:23
system that can act independently to do
00:00:26
tasks on your behalf and we've launched
00:00:28
two agents this year in ChatGPT the
00:00:30
first is Operator which can browse
00:00:33
the web and do things for you on the web
00:00:36
the second is Deep Research which can
00:00:39
create detailed reports for you on any
00:00:41
topic you want so you give it a topic
00:00:44
and it can go off and do what might be a
00:00:46
week's worth of research for you and
00:00:48
come back with an answer in 15 minutes
00:00:50
now the feedback for those has been
00:00:51
fantastic but now we want to launch
00:00:55
those tools and more in the API to
00:00:58
developers so we've spent the last
00:01:00
couple months going around talking to
00:01:01
developers all over the world about how
00:01:03
we can make it easy for them to build
00:01:04
agents and what we've heard is that the
00:01:06
models are ready so with Advanced
00:01:09
reasoning with multimodal understanding
00:01:12
our models can now do the kind of
00:01:14
complex multi-step workflows that agents
00:01:16
need but on the other hand developers
00:01:19
feel like they're having to cobble
00:01:21
together different low-level apis from
00:01:23
different sources it's difficult it's
00:01:26
slow it often feels brittle So today
00:01:28
we're really excited to bring that
00:01:30
together into a series of tools and
00:01:33
a new API and an open source SDK to
00:01:36
make this a lot easier so with that let
00:01:39
me introduce the team yeah hi I'm Ilan
00:01:42
I'm an engineer on the developer
00:01:43
experience team I'm Steve I'm an
00:01:45
engineer on the API team and I'm Nik I
00:01:48
work on the API product team so let's
00:01:50
dive into all the stuff that we are
00:01:51
launching today like Kevin mentioned we
00:01:53
have three new built-in tools we have a
00:01:56
new API and an open source SDK
00:01:59
starting off with the built-in tools the
00:02:01
first tool that we're announcing today
00:02:03
is called the web search tool the web
00:02:05
search tool allows our models to access
00:02:07
information from the internet so that
00:02:09
your responses and the output that you
00:02:11
get is up-to-date and factual the web
00:02:15
search tool is the same tool that powers
00:02:17
ChatGPT search and it's powered by a
00:02:19
fine-tuned model under the hood so this
00:02:21
is a fine-tuned GPT-4o or 4o mini that
00:02:25
is really good at looking at large
00:02:26
amounts of data retrieved from the web
00:02:28
finding the relevant pieces of
00:02:30
information and then clearly citing it
00:02:32
in its response in a benchmark that
00:02:35
measures these types of things
00:02:37
which is called SimpleQA you can see
00:02:40
that GPT-4o hits a state-of-the-art
00:02:43
score of
00:02:45
90% so that's the first tool
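As a rough illustration, here is a minimal sketch of calling the web search tool through the Responses API with the Python SDK; the tool type string web_search_preview and the exact fields are assumptions based on the launch-time shape and may change.

```python
from openai import OpenAI

client = OpenAI()

# Ask for something time-sensitive; the model decides when to invoke web search.
response = client.responses.create(
    model="gpt-4o",
    tools=[{"type": "web_search_preview"}],  # assumed tool type string
    input="What are some of the latest fashion trends this spring?",
)

# The answer comes back with inline citations to the pages the model used.
print(response.output_text)
```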
00:02:47
Steve do you want to tell us about the second one
00:02:48
yeah the second tool is actually my
00:02:50
favorite tool and this is the file
00:02:51
Search tool now we launched the file
00:02:53
Search tool last year uh in the
00:02:55
Assistants API as a way for developers
00:02:57
to upload, chunk, and embed their documents
00:03:00
and then do RAG
00:03:02
really easily over those documents now
00:03:04
we're really excited to be launching two
00:03:06
new features in the file Search tool
00:03:07
today the first is metadata filtering so
00:03:10
with metadata filtering you can add
00:03:12
attributes to your files to be able to
00:03:13
easily filter them down to just the ones
00:03:15
that are the most relevant for your
00:03:17
query the second is a direct search
00:03:19
endpoint so now you can directly search
00:03:21
your vector stores without your queries
00:03:22
being filtered through the model first
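A small sketch of what those two new features might look like in code, assuming attributes can be set when attaching a file to a vector store and that the direct search endpoint accepts a filters argument; the IDs, attribute keys, and filter shape here are placeholders and assumptions, and older SDK versions may expose these under a beta namespace.

```python
from openai import OpenAI

client = OpenAI()

# Attach attributes to a file when adding it to a vector store so it can be
# filtered later (IDs and attribute keys are placeholders for this sketch).
client.vector_stores.files.create(
    vector_store_id="vs_123",
    file_id="file_abc",
    attributes={"username": "ilan", "category": "outfit_log"},
)

# The direct search endpoint queries the vector store without routing the
# query through a model first, optionally narrowed by a metadata filter.
results = client.vector_stores.search(
    vector_store_id="vs_123",
    query="favorite jackets",
    filters={"type": "eq", "key": "username", "value": "ilan"},
)
for hit in results.data:
    print(hit.filename, hit.score)
```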
00:03:25
nice so you have web search for the
00:03:26
public data file search for the
00:03:28
private data that you have and then the
00:03:30
third tool that we are launching is the
00:03:32
computer use tool the computer use tool
00:03:34
is Operator in the API and it allows you
00:03:37
to control the computers that you are
00:03:39
operating so this could be a virtual
00:03:40
machine it could be a legacy application
00:03:43
that just has a graphical user interface
00:03:45
and you have no API access to it if you
00:03:47
want to automate those kind of tasks and
00:03:49
build applications on that you can use
00:03:51
the computer use tool which comes with
00:03:53
the computer use model um so this is the
00:03:56
same model that is used by operator in
00:03:58
ChatGPT it has state-of-the-art benchmarks on OS
00:04:03
World WebArena and WebVoyager early user
00:04:06
feedback on the CUA model and the tool
00:04:08
has been super super positive so I'm
00:04:10
really excited to see what all of you
00:04:11
build with it all right so those are the
00:04:14
three tools and while we were
00:04:16
building these tools and thinking of
00:04:17
getting them out we also wanted to take
00:04:19
a first-principles approach to designing
00:04:21
the best API for these tools we
00:04:24
released chat completions I think in
00:04:27
March 2023 alongside GPT-3.5 Turbo and
00:04:31
every single API interaction at that
00:04:32
time was just text in and text out since
00:04:35
then we've introduced
00:04:37
multimodality so you have images you
00:04:39
have audio we're introducing tools today
00:04:42
and you also have products like o1 pro
00:04:44
deep research operator that make these
00:04:46
multiple model turns and multiple tool
00:04:48
calls behind the scenes so we wanted to
00:04:51
build an API primitive that is flexible
00:04:53
enough that it supports multiple turns it
00:04:55
supports tools and we're calling this
00:04:58
new API the Responses API and to show you
00:05:01
the responses API I'm going to hand it
00:05:03
over to Steve cool let's go ahead and
00:05:05
take a look at the responses API so if
00:05:07
you've used chat completions before this
00:05:09
will look really familiar to you you
00:05:11
select some context you pick a model and
00:05:13
you get a response that's pretty simple
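For reference, a minimal Responses API call in Python might look like the following; the model name and prompt are just examples.

```python
from openai import OpenAI

client = OpenAI()

# Select a model, pass in some context, and get a response back.
response = client.responses.create(
    model="gpt-4o",
    input="Tell me a joke.",
)
print(response.output_text)
```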
00:05:16
it's pretty
00:05:17
simple and it's always hilarious so
00:05:21
maybe not I don't know so to
00:05:24
demonstrate the power of the responses
00:05:26
API we're going to be building sort of a
00:05:27
personal stylist assistant so let's
00:05:29
start by giving it some instructions you
00:05:32
are a
00:05:35
personal stylist you're only typing in
00:05:37
front of like 50,000 people right now
00:05:39
don't worry about
00:05:41
it cool and we'll get rid
00:05:44
of this and we'll
00:05:46
say what are some of the latest
00:05:53
trends the joke's in the context the joke
00:05:55
is in the context let's see what it
00:05:58
says okay okay cool great but no
00:06:01
personal stylist assistant is complete
00:06:03
unless it understands what its users
00:06:05
like so in order to demonstrate this
00:06:07
we've created a vector store that has
00:06:10
some entries almost
00:06:13
some diary entries of what people on the
00:06:14
team have been wearing we've kind
00:06:16
that's not weird at all it's not weird
00:06:18
at all I would just let it happen
00:06:19
we've kind of been following people
00:06:20
around the office and kind of like
00:06:22
understanding what they what they've
00:06:23
been up to so yeah
00:06:25
there's a whole there's a team there's a
00:06:26
team on it
00:06:28
yeah so go ahead and add the file Search
00:06:31
tool and I'll copy in my vector store
00:06:35
ID and here I can actually filter down
00:06:38
the files in this vector store to
00:06:40
just the ones that are relevant to the
00:06:42
person that we want to style so uh in
00:06:44
this case let's start with Ilan we'll go
00:06:46
ahead and filter down to his
00:06:49
username and we'll come back here and
00:06:52
we'll refresh and we'll say can you
00:06:58
briefly
00:07:00
summarize what Ilan likes to
00:07:04
wear I often ask chat GPT this question
00:07:07
yeah but it never knows and now it can
00:07:09
actually tell you what Ilan likes
00:07:11
to wear cool so Ilan has a distinct and
00:07:13
consistent style characterized by Miami
00:07:15
Chic that's really
00:07:17
awesome
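Putting that together, a hedged sketch of the stylist request with a metadata-filtered file search might look like this; the vector store ID, the username attribute key, and the filter shape are assumptions for illustration.

```python
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-4o",
    instructions="You are a personal stylist.",
    tools=[{
        "type": "file_search",
        "vector_store_ids": ["vs_123"],  # placeholder vector store ID
        # Metadata filter so retrieval only sees this user's entries;
        # the "username" attribute key is an assumption for this sketch.
        "filters": {"type": "eq", "key": "username", "value": "ilan"},
    }],
    input="Can you briefly summarize what Ilan likes to wear?",
)
print(response.output_text)
```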
00:07:20
so the file search tool is a great way to bring information about
00:07:22
your users into your application but in
00:07:24
order to be able to create a really good
00:07:26
application for this personal stylist we
00:07:28
want to be able to bring in fresh data
00:07:30
from around the web so that we have
00:07:32
both the newest information and also
00:07:34
stuff that's really relevant to your
00:07:35
users so in order to demonstrate that
00:07:37
I'll add the web search
00:07:40
tool cool the web search tool is really
00:07:43
great because you can also add data
00:07:44
about where your
00:07:46
user is so let's try with somebody else
00:07:49
Kevin are you going to be taking
00:07:51
any trips anytime soon let's say Tokyo
00:07:53
okay cool Tokyo so I'll put in Tokyo
00:07:57
here and we'll swap in Kevin and the
00:08:01
responses API is really cool because it
00:08:03
can do multiple things at once it can
00:08:05
call the file search tool it can call the
00:08:06
web search tool and it can give you a
00:08:08
final answer just in one API response
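A sketch of that combined call, with both tools attached and an approximate user location on web search, might look like this; the tool type strings, the user_location shape, and the filter fields are assumptions for illustration.

```python
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-4o",
    instructions=(
        "When asked to recommend products, use file search to learn what the "
        "user likes, then use web search to find a nearby store that sells it."
    ),
    tools=[
        {
            "type": "file_search",
            "vector_store_ids": ["vs_123"],  # placeholder vector store ID
            "filters": {"type": "eq", "key": "username", "value": "kevin"},
        },
        {
            "type": "web_search_preview",
            # Approximate user location biases results toward local stores;
            # the field shape here is an assumption.
            "user_location": {"type": "approximate", "city": "Tokyo"},
        },
    ],
    input="Find me a jacket that I would like nearby.",
)
print(response.output_text)
```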
00:08:11
so in order to tell it exactly what we want
00:08:13
let's give it some
00:08:15
instructions and it'd be good if I knew
00:08:18
how to code well great you say you're an
00:08:22
engineer here yeah well I'm in
00:08:24
training so what we want the
00:08:27
model to do is when it's asked to recommend
00:08:29
products we want it to use the file
00:08:30
Search tool to understand what Kevin
00:08:32
likes and then use the web search tool
00:08:34
to find a store near him where he can
00:08:35
buy something that he might be
00:08:37
interested in so let's go back and say
00:08:40
find me a
00:08:42
jacket that I would
00:08:45
like
00:08:47
nearby and what the model will do is it
00:08:49
will issue a file search tool call to
00:08:52
understand what kinds of things Kevin
00:08:54
likes to wear and then it will issue a
00:08:56
web search tool call to then go and find
00:08:58
stuff that Kevin would like based on
00:09:00
where he is so the model was able to
00:09:03
just in the scope of one API call find a
00:09:04
bunch of Patagonia stores in Tokyo for
00:09:07
you Kevin which actually
00:09:09
corresponds to Kevin's preferences he's
00:09:11
been wearing a lot of Patagonia around
00:09:13
the office but no personal stylist
00:09:17
assistant would be complete unless they
00:09:19
could actually go and make purchases on
00:09:20
your behalf so in order to do that let's
00:09:22
demonstrate the computer use
00:09:24
tool so we'll go ahead and add this
00:09:28
we're using the computer-use-preview
00:09:29
model and the computer use preview tool
00:09:31
and we will ask help me find my
00:09:36
friend Kevin a new
00:09:40
Patagonia jacket what's your favorite
00:09:42
color Kev let's go with black and
00:09:45
black can't have too many black Patagonia
00:09:48
jackets and what the model will do is it
00:09:50
will ask us for a screenshot and we have
00:09:51
a Docker container running locally on
00:09:53
this computer and we will go ahead and
00:09:55
send that screenshot to the model it
00:09:56
will look at the state of the computer
00:09:58
and issue another action click drag move
00:10:00
type and then we will execute that
00:10:02
action take another screenshot send it
00:10:04
back to the model and then it will
00:10:06
continue in this fashion until it feels
00:10:07
that it's completed the task and then
00:10:09
return a final answer
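A rough sketch of that screenshot-and-action loop against the Responses API is below. The take_screenshot and execute_action helpers are hypothetical stand-ins for the local Docker environment, and the item and field names (computer_call, computer_call_output, input_image) are assumptions that may differ from the final API.

```python
from openai import OpenAI

client = OpenAI()

def take_screenshot() -> str:
    """Hypothetical helper: return a base64-encoded PNG of the environment."""
    raise NotImplementedError

def execute_action(action) -> None:
    """Hypothetical helper: perform the click/type/scroll action locally."""
    raise NotImplementedError

tools = [{
    "type": "computer_use_preview",
    "display_width": 1024,
    "display_height": 768,
    "environment": "browser",
}]

response = client.responses.create(
    model="computer-use-preview",
    tools=tools,
    input="Help me find my friend Kevin a new Patagonia jacket in black.",
    truncation="auto",
)

# Keep executing suggested actions and returning screenshots until the model
# stops issuing computer calls and gives a final answer.
while True:
    calls = [item for item in response.output if item.type == "computer_call"]
    if not calls:
        break
    call = calls[0]
    execute_action(call.action)
    response = client.responses.create(
        model="computer-use-preview",
        tools=tools,
        previous_response_id=response.id,
        truncation="auto",
        input=[{
            # Item and field names here are assumptions; check the API reference.
            "type": "computer_call_output",
            "call_id": call.call_id,
            "output": {
                "type": "input_image",
                "image_url": f"data:image/png;base64,{take_screenshot()}",
            },
        }],
    )

print(response.output_text)
```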
00:10:12
so while this is kind of going and doing its thing we'll
00:10:13
hand it back to Nik yeah awesome so
00:10:16
these are some really cool tools and a
00:10:18
really flexible API for you to build
00:10:21
agents and you have amazing
00:10:23
building blocks to do that now but
00:10:25
for those of you who have built more
00:10:26
complex applications like say you're
00:10:28
building a customer support agent it's
00:10:30
not always about just having one agent
00:10:32
that's sort of the personal
00:10:34
stylist you also have some agentic
00:10:37
application that's doing your refunds
00:10:39
you have another thing that's answering
00:10:40
customer support FAQ queries you have
00:10:43
something else that's dealing with
00:10:44
orders and billing Etc and to make these
00:10:47
applications easy to build we released
00:10:49
an SDK last year called Swarm and Swarm
00:10:52
made it easy to do agent
00:10:54
orchestration this was supposed to be
00:10:56
an experimental and educational thing
00:10:58
but so many of you took it to production
00:11:00
anyway so you're like forcing our
00:11:02
hand over here and so we've decided
00:11:05
to take Swarm and make it production
00:11:07
ready add a bunch of new features and
00:11:09
we're going to be rebranding it to be
00:11:11
called the Agents SDK Ilan built
00:11:15
Swarm and helped build it so I'm going
00:11:17
to hand it over to him to tell you
00:11:19
more about how it works yeah thanks Nik
00:11:22
yeah so in my time at OpenAI I've
00:11:24
spent a lot of time working with
00:11:26
Enterprises and Builders to help them
00:11:27
build out agentic experiences
00:11:29
and I've seen firsthand how pretty
00:11:31
simple ideas can actually grow in
00:11:33
complexity like when you actually go to
00:11:35
implement them and so the idea with the
00:11:37
Agents SDK is to keep simple ideas
00:11:39
simple to implement while allowing you
00:11:42
to build more complex and robust ideas
00:11:44
still in a pretty like straightforward
00:11:46
and simple way so let's take a look
00:11:49
at what Steve had before in the demo but
00:11:51
implemented using the Agents SDK it's
00:11:54
going to look very similar at first we
00:11:55
have our agent defined here we have some
00:11:58
instructions
00:11:59
and we also have both of the tools
00:12:02
the file search tool and web search tool that we
00:12:04
had before is this using like responses
00:12:06
under the hood yeah so by default this
00:12:08
is using the responses API but we
00:12:10
actually support multiple vendors
00:12:12
anything that really fits the chat
00:12:14
completions shape can work with the Agents SDK
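A minimal sketch of that stylist agent using the openai-agents package (the pip install is mentioned later in the talk); the vector store ID is a placeholder, and the exact class names are based on the SDK as released and may evolve.

```python
from agents import Agent, Runner, WebSearchTool, FileSearchTool

# The stylist from the earlier demo, rebuilt as an Agents SDK agent.
stylist_agent = Agent(
    name="Personal Stylist",
    instructions="You are a personal stylist.",
    tools=[
        WebSearchTool(),
        FileSearchTool(vector_store_ids=["vs_123"]),  # placeholder vector store ID
    ],
)

result = Runner.run_sync(stylist_agent, "What are some of the latest trends?")
print(result.final_output)
```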
00:12:16
nice so during the
00:12:19
practice runs we actually
00:12:21
accidentally ordered like many many
00:12:23
Patagonias so I'm sorry I
00:12:25
understand what's the problem we're
00:12:27
helping you here we want to return some
00:12:29
of them and so to do that I could
00:12:32
usually just add in like a returns tool
00:12:34
and like add more to this prompt and get
00:12:36
it to work but the problem with that is
00:12:37
you start to mix all of this business
00:12:39
logic which makes your agents a little
00:12:41
bit harder to test and so this is the
00:12:43
power of multiple agents is you can
00:12:45
actually separate your concerns and
00:12:47
develop and test them separately so to
00:12:49
do so let's actually introduce a like an
00:12:51
agent specifically to deal with the
00:12:53
sorts of returns so I'm going to
00:12:56
load mine in and great so we still have
00:12:59
our agent from before but you can see
00:13:01
there's also this new agent the customer
00:13:03
support agent here and I've defined a
00:13:05
couple tools for it to use the get
00:13:08
past orders and then submit refund
00:13:11
request and you might notice these
00:13:14
are just regular Python functions
00:13:15
this is actually a feature that
00:13:17
people really loved in Swarm that we
00:13:19
brought over to the Agents SDK which is
00:13:21
we'll take your python functions and
00:13:24
look at the type inference or look at
00:13:25
the type signatures and then
00:13:26
automatically generate the Json schema
00:13:28
that the models need to use to perform
00:13:31
those function calls and then once they
00:13:32
do we actually run the code and then
00:13:34
return the results so you can just
00:13:36
define these functions as they are
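A sketch of how those function tools might be declared with the SDK's function_tool decorator; the function names mirror the demo, but the bodies here are stand-ins.

```python
from agents import Agent, function_tool

# Ordinary Python functions; the SDK reads the type signatures and docstrings
# to generate the JSON schema the model needs, then runs the function when
# the model calls it. The return values below are placeholders.
@function_tool
def get_past_orders() -> list[str]:
    """Return the user's past orders."""
    return ["Black Patagonia jacket", "Blue Patagonia fleece", "Black Patagonia vest"]

@function_tool
def submit_refund_request(order_name: str) -> str:
    """Submit a refund request for the given order."""
    return f"Refund submitted for {order_name}"

customer_support_agent = Agent(
    name="Customer Support Agent",
    instructions="Help the user with returns and refunds on past orders.",
    tools=[get_past_orders, submit_refund_request],
)
```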
00:13:40
now we have our
00:13:42
two agents right we have the stylist
00:13:43
agent and we have the customer support
00:13:46
refunds agent so how do we interact with
00:13:48
both of them as a user this is where the
00:13:50
notion of handoffs comes in and a handoff
00:13:54
is actually a pretty simple idea but it's
00:13:55
pretty powerful and it's when you have
00:13:58
one conversation where one agent is
00:14:00
handling it and then it hands it off to
00:14:02
another where you keep the entire
00:14:04
conversation the same but behind the
00:14:06
scenes you just swap out the
00:14:07
instructions and the tools and this
00:14:09
gives you a way to triage conversations
00:14:11
and like load in the correct context for
00:14:14
each part of the conversation so what
00:14:15
we've done here is created this triage
00:14:17
agent that can hand off to the stylist
00:14:20
agent or the customer support agent
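A hedged sketch of that handoff setup, with simplified agents standing in for the ones defined earlier.

```python
from agents import Agent, Runner

# Simplified versions of the two specialist agents from the demo.
stylist_agent = Agent(name="Stylist Agent", instructions="You are a personal stylist.")
customer_support_agent = Agent(
    name="Customer Support Agent",
    instructions="Help the user with returns and refunds.",
)

# The triage agent keeps the conversation in one place and hands off to a
# specialist; behind the scenes the handoff swaps in that agent's
# instructions and tools while the conversation history stays the same.
triage_agent = Agent(
    name="Triage Agent",
    instructions="Route styling questions to the stylist and returns or refunds to customer support.",
    handoffs=[stylist_agent, customer_support_agent],
)

result = Runner.run_sync(
    triage_agent,
    "I think we ordered one too many Patagonias, can you help me return one?",
)
print(result.final_output)
```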
00:14:22
so enough talking let's actually see this
00:14:24
in action so I'm going to
00:14:26
save and do you know I think we may
00:14:30
have ordered one too many
00:14:34
Patagonias can you help me return them I don't
00:14:37
understand I I know I'm so sorry I can
00:14:40
get you one
00:14:41
later so what just happened here is it
00:14:43
started off by transferring remember
00:14:45
we're starting with the triage agent
00:14:48
to the customer support agent and this
00:14:50
is just a function call that I'll show
00:14:51
you in a second and then the
00:14:53
customer support agent proactively
00:14:55
called the get past orders function
00:14:57
where we can see all of Kevin's Patagonias
00:14:59
I think you'll be
00:15:00
okay cool so to actually see what
00:15:04
happened behind the scenes usually you
00:15:05
might need to add some debugging
00:15:07
statements by hand but one of the things
00:15:08
that the Agents SDK brings right out of
00:15:11
the box is monitoring and tracing so I'm
00:15:13
going to go over to the tracing UI that
00:15:16
we have on our platform to actually
00:15:18
take a look at what just happened so these
00:15:20
are some of the previous runs that we've
00:15:22
had I'm just refreshing the page and
00:15:24
we can see the last one and in this last
00:15:26
one you can actually see exactly what
00:15:27
happened we started with the triage agent
00:15:30
which we sent a request to it made a
00:15:32
handoff and then switched over to the
00:15:34
customer support agent which called the
00:15:36
function
00:15:37
now uh we can see what the original
00:15:39
input was and handoffs are first class
00:15:42
objects in this dashboard so you can see
00:15:44
not only which agent we actually handed
00:15:47
it off to but also any that it had as
00:15:49
options that it did not choose which is
00:15:51
actually a really useful feature for
00:15:53
debugging afterward once we're in the
00:15:55
customer support agent you can see the
00:15:57
get past orders function call with
00:15:59
any input params here there were none
00:16:01
and then the output is again just
00:16:03
all of Kevin's very monotonous history
00:16:06
and then finally we can get to the
00:16:08
end where you get a response and so
00:16:11
these are some of the features that you
00:16:12
get right out of the box with the agents
00:16:13
SDK there's a few more we also
00:16:16
have built-in guard rails that you can
00:16:18
enable we have lifecycle events and
00:16:21
importantly this is an open source
00:16:23
framework so we're going to keep
00:16:24
building it out um and you can install
00:16:27
it like very soon or right now so you
00:16:29
can just do pip install openai-agents
00:16:31
and we'll have one for
00:16:33
JavaScript coming soon but to
00:16:36
close this off let's
00:16:39
actually perform the refund so
00:16:42
you know what I'm sorry
00:16:44
Kevin get rid of all of them
00:16:47
oh what am I going to
00:16:50
wear Kevin's going to be cold yeah let's
00:16:55
see it's a lot of them there we go takes
00:16:59
a while to return so many Patagonias and so
00:17:01
what happens under the hood how do
00:17:02
you how do you debug this how do you
00:17:04
understand more about what's going on
00:17:06
yeah so that we can all see back in
00:17:08
the tracing UI so this
00:17:11
is a pretty nice straightforward way to
00:17:13
build out these experiences yeah
00:17:16
awesome back to you I'm so excited for
00:17:18
all of you to have access to all of
00:17:20
these tools uh and before we wrap up I
00:17:22
wanted to make two additional points
00:17:24
first we've introduced the responses API
00:17:27
but the chat completions API is not
00:17:29
going away we're going to continue
00:17:30
supporting it with new models and
00:17:32
capabilities there will be certain
00:17:34
capabilities that require built-in tool
00:17:36
use and there'll be certain models and
00:17:38
agentic products that we release in the
00:17:39
future that will require
00:17:42
them and those will be available in the
00:17:44
Responses API only the Responses API
00:17:47
features are a superset of what chat
00:17:50
completions supports so whenever you
00:17:52
decide to migrate over it should be a
00:17:54
pretty straightforward migration for you
00:17:56
and we hope you love the developer
00:17:57
experience of responses because we put a lot
00:17:59
of thought into that the second point I
00:18:01
wanted to make was around the Assistants
00:18:03
API we built the Assistants API based on
00:18:07
all the great feedback that we got from
00:18:09
all of our beta users and you know we
00:18:12
wouldn't be here without
00:18:13
all the learnings that we had during the
00:18:15
Assistants API phase we are going to be
00:18:18
adding more features to the responses
00:18:20
API so that it can support everything
00:18:23
that the Assistants API can do and once
00:18:25
that happens we'll be sharing a
00:18:26
migration guide that makes it really
00:18:29
easy for all of you to migrate your
00:18:31
applications from assistants to
00:18:33
responses without any loss of
00:18:35
functionality or data we'll give you
00:18:38
ample time to move things over and once
00:18:40
we once we're done with that we plan to
00:18:42
sunset the Assistants API sometime in
00:18:45
2026 we'll be sharing a lot more details
00:18:48
about this offline as well but yeah
00:18:51
that's it for me I'll hand it over to
00:18:52
Kevin to wrap us up awesome well we're
00:18:54
super excited to announce the
00:18:56
responses API and the idea that we can
00:18:58
take a single powerful API and
00:19:01
bring together a whole bunch of
00:19:03
different tools from RAG and file search
00:19:05
to web search to CUA and our Operator
00:19:09
computer use APIs now you can
00:19:14
count on us to continue building
00:19:16
powerful new models and bring more
00:19:18
intelligence and more powerful
00:19:20
tools to help you build better agents
00:19:22
2025 is going to be the year of the
00:19:24
agent it's the year that ChatGPT and
00:19:27
our developer tools go from just
00:19:29
answering questions to actually doing
00:19:31
things for you out in the real world
00:19:33
we're super excited about that we're
00:19:35
just getting started we know you are too
00:19:37
and we can't wait to see what you build