00:00:02
[Music]
00:00:17
[Music]
00:00:21
hello welcome to the 12 Days of OpenAI
00:00:24
we're going to try something that as far
00:00:25
as we know no tech company has done
00:00:27
before which is every day for the next
00:00:28
12 every weekday we are going to launch
00:00:31
or demo some new thing that we built and
00:00:33
we think we've got some great stuff for
00:00:35
you starting today we hope you'll really
00:00:36
love it and you know we'll try to make
00:00:39
this fun and fast and not take too long
00:00:41
but it'll be a way to show you what
00:00:42
we've been working on and a little
00:00:44
holiday present from us so we'll jump
00:00:46
right into this first day uh today we
00:00:47
actually have two things to launch the
00:00:49
first one is the full version of o1 we
00:00:51
have been very hard at work we've
00:00:53
listened to your feedback you want uh
00:00:54
you like o1 preview but you want it
00:00:56
to be smarter and faster and be
00:00:58
multimodal and be better at instruction
00:01:00
following and a bunch of other things so
00:01:01
we've put a lot of work into this and
00:01:03
for scientists engineers coders we think
00:01:06
they will really love this new model uh
00:01:08
I'd like to quickly show you how
00:01:10
it performs so you can see uh the jump
00:01:13
from GPT-4o to o1 preview across math
00:01:16
competition coding GPQA Diamond um and
00:01:20
you can see that o1 is a pretty big step
00:01:22
forward um it's also much better in a
00:01:24
lot of other ways but raw intelligence
00:01:26
is something that we care about coding
00:01:27
performance in particular is an area
00:01:29
where people are using the model
00:01:30
a lot so in just a minute uh these guys
00:01:33
will demo some things about o1
00:01:35
they'll show you how it does at speed
00:01:37
how it does at really hard problems how
00:01:39
it does with multimodality but first I
00:01:41
want to talk just for a minute about the
00:01:42
second thing we're launching today a lot
00:01:45
of people uh power users of ChatGPT at
00:01:47
this point they really use it a lot and
00:01:49
they want more compute than $20 a month
00:01:51
can buy so we're launching a new tier
00:01:53
ChatGPT Pro and Pro has unlimited
00:01:56
access to our models uh and also things
00:01:58
like Advanced Voice Mode it also has a
00:02:01
uh a new thing called o1 pro mode so o1
00:02:04
is the smartest model in the world now
00:02:06
except for o1 being used in pro mode and
00:02:09
for the hardest problems that people
00:02:10
have uh o1 pro mode lets you do even a
00:02:13
little bit better um so you can see
00:02:15
competition math you can see GPQA
00:02:17
Diamond um and these boosts may look
00:02:19
small but in complex workflows where
00:02:21
you're really pushing the limits of
00:02:22
these models it's pretty significant uh
00:02:25
I'll show you one more thing about Pro
00:02:27
about the pro mode so one thing that people
00:02:30
really have said they want is
00:02:31
reliability and here you can see how the
00:02:34
reliability of an answer from pro mode
00:02:36
compares to o1 and this is an even
00:02:37
stronger delta and again for our Pro
00:02:40
users we've heard a lot about how much
00:02:41
people want this ChatGPT Pro is $200 a
00:02:44
month uh launches today over the course
00:02:46
of these 12 days we have some other
00:02:48
things to add to it that we think you'll
00:02:50
also really love um but unlimited model
00:02:52
use and uh this new o1 pro mode so I
00:02:55
want to jump right in and we'll show
00:02:56
some of those demos that we talked about
00:02:59
uh and these are some of the guys that
00:03:00
helped build o1 uh with many other
00:03:03
people behind them on the team thanks
00:03:05
Sam hi um I'm Hyung Won I'm Jason and I'm
00:03:09
Max we're all research scientists who
00:03:10
worked on building o1 o1 is really
00:03:13
distinctive because it's the first model
00:03:14
we've trained that thinks before it
00:03:16
responds meaning it gives much better
00:03:18
and often more detailed and more correct
00:03:20
responses than other models you might
00:03:21
have tried o1 is being rolled out today
00:03:24
to all uh Plus and soon to be Pro
00:03:27
subscribers on ChatGPT replacing o1 preview the
00:03:31
o1 model is uh faster and smarter than
00:03:34
the o1 preview model which we launched
00:03:36
in September after the launch many
00:03:38
people asked about the multimodal input
00:03:40
so we added that uh so now the o1 model
00:03:43
live today is able to reason through
00:03:46
both images and text
00:03:48
jointly as Sam mentioned today we're
00:03:50
also going to launch a new tier of
00:03:52
ChatGPT called ChatGPT Pro ChatGPT Pro offers
00:03:56
unlimited access to our best models like
00:03:59
o1 4o and Advanced Voice ChatGPT Pro also
00:04:03
has a special way of using o1 called o1
00:04:06
Pro mode with o1 Pro mode you can ask
00:04:09
the model to use even more compute to
00:04:11
think even harder on some of the most
00:04:13
difficult
00:04:14
problems we think the audience for
00:04:17
ChatGPT Pro will be the power users of
00:04:19
ChatGPT those who are already pushing the
00:04:21
models to the limits of their
00:04:22
capabilities on tasks like math
00:04:25
programming and writing it's been
00:04:26
amazing to see how much people are
00:04:28
pushing o1 preview how much people
00:04:30
who do technical work all day get out of
00:04:32
this and uh we're really excited to let
00:04:33
them push it further yeah sure we also
00:04:36
really think that o1 will be much better
00:04:37
for everyday use cases not necessarily
00:04:40
just really hard math and programming
00:04:42
problems in particular one piece of
00:04:43
feedback we received about o1 preview
00:04:45
constantly was that it was way too slow
00:04:47
it would think for 10 seconds if you
00:04:48
said hi to it and we fixed that that was
00:04:50
really annoying it was kind of funny
00:04:52
honestly it really thought it cared
00:04:55
really thought hard about saying hi back
00:04:56
yeah um and so we fixed that o1 will now
00:04:59
think much more intelligently if you ask
00:05:01
it a simple question it'll respond
00:05:03
really quickly and if you ask it a
00:05:04
really hard question it'll think for a
00:05:05
really long time uh we ran a pretty
00:05:08
detailed suite of human evaluations for
00:05:09
this model and what we found was that it
00:05:11
made major mistakes about 34% less often
00:05:14
than o1 preview while thinking fully
00:05:17
about 50% faster and we think this will
00:05:19
be a really really noticeable difference
00:05:21
for all of you so I really enjoy just
00:05:23
talking to these models I'm a big
00:05:25
history buff and I'll show you a really
00:05:26
quick demo of for example a sort of
00:05:28
question that I might ask one of these
00:05:30
models so uh right here on the left I
00:05:33
have o1 on the right I have o1 preview
00:05:36
and I'm just asking it a really simple
00:05:37
history question list the Roman emperors of
00:05:39
the second century tell me about their
00:05:41
dates what they did um not hard but you
00:05:44
know GPT-4o actually gets this wrong a
00:05:46
reasonable fraction of the time um and
00:05:49
so I've asked o1 this I've asked o1
00:05:51
preview this I tested this offline a few
00:05:53
times and I found that o1 on average
00:05:55
responded about 60% faster than o1 preview
00:05:58
um this could be a little bit variable
00:05:59
because right now we're in the process
00:06:01
of swapping all our GPUs from o1
00:06:04
preview to o1 so actually o1 thought for
00:06:07
about 14 seconds o1 preview still
00:06:11
going there's a lot of Roman emperors
00:06:13
there's a lot of Roman emperors yeah 4o
00:06:15
actually gets this wrong a lot of the
00:06:16
time there are a lot of folks who ruled
00:06:17
for like uh 6 days 12 days a month and
00:06:20
it sometimes forgets those can you do
00:06:22
them all from memory including the six
00:06:23
day people
00:06:25
no yep so here we go o1 thought for
00:06:28
about 14 seconds o1 preview thought for
00:06:30
about 33 seconds these should both be
00:06:32
faster once we finish deploying but we
00:06:33
wanted this to go live right now exactly
00:06:35
um so yeah we think you'll really
00:06:37
enjoy talking to this model we found
00:06:39
that it gave great responses it thought
00:06:40
much faster it should just be a much
00:06:42
better user experience for everyone so
00:06:44
one other feature we know that people
00:06:45
really wanted for everyday use cases
00:06:47
that's been requested a lot is
00:06:49
multimodal inputs and image
00:06:50
understanding and Hyung Won is going to
00:06:52
talk about that now yep to illustrate
00:06:54
the multimodal input and reasoning uh I
00:06:57
created this toy problem uh with some
00:07:00
hand-drawn diagrams and so on so here it
00:07:03
is it's hard to see so I already took a
00:07:05
photo of this and so let's look at this
00:07:08
photo on a laptop so once you upload the
00:07:11
image into ChatGPT you can click on
00:07:14
it um to see the zoomed-in version
00:07:17
so this is a system of a data center in
00:07:20
space so maybe um in the future we might
00:07:24
want to train AI models in space uh
00:07:28
I think we should do that but the Power
00:07:30
number looks a little low one gigawatt okay but
00:07:33
the general idea rookie numbers in this
00:07:35
rookie numbers rookie okay yeah so uh we
00:07:38
have a sun right here uh taking in power
00:07:41
on this solar panel and then uh there's
00:07:44
a small data center here it's exactly
00:07:46
what they look like yeah GPU racks and
00:07:49
then pump nice pump here and one
00:07:52
interesting thing about um operation in
00:07:55
space is that on Earth we can do air
00:07:58
cooling water cooling to cool the GPUs
00:08:00
but in space there's nothing there so we
00:08:03
have to radiate this um heat into the
00:08:06
deep space and that's why we need this
00:08:09
uh giant radiator cooling panel and this
00:08:12
problem is about finding the lower bound
00:08:14
estimate of the cooling panel area
00:08:18
required to operate um this 1 gigawatt uh
00:08:22
data center probably going to be very
00:08:24
big yeah let's see how big it is let's see
00:08:28
so that's the problem and going to this
00:08:30
prompt and uh yeah this is essentially
00:08:33
asking for that so let me uh hit go and
00:08:36
the model will think for
00:08:39
seconds by the way most people don't
00:08:41
know I've been working with Hyung Won for a
00:08:42
long time Hyung Won actually has a PhD in
00:08:46
thermodynamics which is totally
00:08:48
unrelated to AI and you always joke that
00:08:50
you haven't been able to use your PhD
00:08:52
work in your job until today so you can
00:08:55
you can trust Hyung Won on this analysis
00:08:57
finally finally uh thanks for hyping it up
00:09:00
now I really have to get this right uh
00:09:03
okay so the model finished thinking only
00:09:06
10 seconds it's a simple problem so
00:09:08
let's see how the model did so
00:09:11
power input um so first of all this one
00:09:14
gigawatt that was only drawn on the paper
00:09:17
so the model was able to pick that up
00:09:19
nicely and then um radiative heat
00:09:21
transfer only that's the thing I
00:09:23
mentioned so in space nothing else and
00:09:25
then some simplifying um uh choices and
00:09:29
one critical thing is that I
00:09:30
intentionally made this problem under-
00:09:32
specified meaning that um the critical
00:09:36
parameter is the temperature of the
00:09:37
cooling panel uh I left it out so that
00:09:41
uh we can test out the model's ability
00:09:43
to handle um ambiguity and so on so the
00:09:47
model was able to recognize that this is
00:09:50
actually an unspecified but important
00:09:53
parameter and it actually picked the
00:09:55
right um range of uh temperature
00:09:58
which is about room temperature and
00:10:00
with that it continues to the analysis
00:10:03
and does a whole bunch of things and
00:10:05
then found out the area which is 2.42
00:10:09
million square meters just to get a
00:10:10
sense of how big this is this is about
00:10:13
2% of the uh land area of San Francisco
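For anyone who wants to sanity-check that number: the lower bound Hyung Won describes comes from radiative heat transfer alone, so a back-of-the-envelope Stefan-Boltzmann estimate A = P / (σT⁴) is enough. Here is a minimal sketch, assuming an ideal one-sided black-body panel at roughly room temperature and ignoring absorbed sunlight; the exact assumptions o1 made on screen aren't all shown, so treat these inputs as illustrative.

```python
# Back-of-the-envelope check (illustrative assumptions, not o1's exact numbers):
# a panel radiating to deep space sheds sigma * T^4 watts per square meter,
# so the minimum area to reject 1 GW of heat from an ideal one-sided
# black-body panel is A = P / (sigma * T^4).
SIGMA = 5.67e-8    # Stefan-Boltzmann constant, W / (m^2 * K^4)
P_WASTE = 1.0e9    # heat to reject, W (the 1 GW written on the diagram)
T_PANEL = 290.0    # assumed panel temperature, K (~room temperature)

area_m2 = P_WASTE / (SIGMA * T_PANEL ** 4)
print(f"required radiator area ~ {area_m2 / 1e6:.2f} million m^2")
# With these inputs this lands around 2.5 million m^2, the same ballpark as the
# 2.42 million m^2 o1 reports; a slightly warmer panel gives a slightly smaller area.
```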
00:10:16
this is huge not that bad not that bad
00:10:19
yeah oh
00:10:20
okay um yeah so I guess this is uh
00:10:24
reasonable I'll skip through the rest of
00:10:26
the details but I think the model did a
00:10:28
great job um making nice consistent
00:10:33
assumptions that um you know make the
00:10:35
required area as little as possible and
00:10:38
so um yeah so this is the demonstration
00:10:42
of the multimodal reasoning and this is
00:10:45
a simple problem but o1 is actually very
00:10:48
strong and on standard benchmarks like
00:10:50
MMMU and MathVista o1 actually has
00:10:54
the state-of-the-art
00:10:55
performance now Jason will showcase the
00:10:58
o1 pro mode
00:10:59
great so I want to give a short demo of
00:11:02
uh o1 pro mode um people will find uh
00:11:07
o1 pro mode the most useful for say
00:11:09
hard math science or programming
00:11:11
problems so here I have a pretty
00:11:13
challenging chemistry problem that o1
00:11:16
preview usually gets incorrect and so I
00:11:19
will uh let the model start
00:11:22
thinking um one thing we've learned with
00:11:24
these models is that uh for these very
00:11:27
challenging problems the model can think
00:11:29
up to a few minutes I think for this
00:11:31
problem the model usually thinks
00:11:32
anywhere from 1 minute to up to 3
00:11:35
minutes um and so we have to provide
00:11:37
some entertainment for people while
00:11:39
the model is thinking so I'll describe
00:11:41
the problem a little bit and then if the
00:11:43
model's still thinking when I'm done
00:11:45
I've prepared a dad joke for us uh
00:11:48
to fill the rest of the time um so I
00:11:51
hope it thinks for a long
00:11:52
time you can see uh the problem asks for
00:11:56
a protein that fits a very specific
00:11:59
set of criteria so uh there are
00:12:01
six criteria and the challenge is each
00:12:04
of them asks for pretty chemistry domain
00:12:06
specific knowledge that the model would
00:12:08
have to
00:12:09
recall and the other thing to know about
00:12:11
this problem uh is that none of these
00:12:14
criteria actually give away what the
00:12:16
correct answer is so for any given
00:12:18
criteria there could be dozens of
00:12:20
proteins that might fit that criteria
00:12:23
and so the model has to think through
00:12:24
all the candidates and then check if
00:12:26
they fit all the criteria
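The structure Jason is describing — many plausible candidates, several criteria that all have to hold, none of which identifies the answer on its own — is essentially a generate-and-check search. A minimal illustrative sketch with entirely made-up proteins and criteria (not the actual chemistry from the demo):

```python
# Purely illustrative: hypothetical proteins and criteria standing in for the
# demo's chemistry question. The point is the shape of the search: every
# candidate must pass *all* criteria, and no single criterion is decisive.
candidates = ["protein_A", "protein_B", "protein_C"]          # made-up names

criteria = {
    "criterion_1": {"protein_A", "protein_C"},   # set of proteins satisfying it (hypothetical)
    "criterion_2": {"protein_B", "protein_C"},
    "criterion_3": {"protein_A", "protein_B", "protein_C"},
}

matches = [p for p in candidates
           if all(p in satisfying for satisfying in criteria.values())]
print(matches)   # -> ['protein_C'] in this toy setup
```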
00:12:27
okay so you could see the model
00:12:30
actually was faster this time uh so it
00:12:33
finished in 53 seconds you can click and
00:12:36
see some of the thought process that the
00:12:38
model went through to get the answer uh
00:12:40
you could see it's uh thinking about
00:12:42
different candidates like neuroligin
00:12:44
initially um and then it arrives at the
00:12:46
correct answer which is uh retinoschisin
00:12:49
uh which is
00:12:51
great um okay so to summarize um we saw
00:12:54
from Max that o1 is smarter and faster
00:12:59
than uh o1 preview we saw from Hyung Won
00:13:02
that o1 can now reason over both text
00:13:05
and images and then finally we saw with
00:13:08
ChatGPT Pro mode uh you can use o1 to
00:13:11
think about uh to
00:13:15
reason about the hardest uh science and
00:13:17
math problems yep there's more to come
00:13:20
um for the ChatGPT Pro tier uh we're
00:13:23
working on even more compute-intensive
00:13:26
tasks to uh Power longer and bigger
00:13:28
tasks for those who want to push the
00:13:31
model even further and we're still
00:13:34
working on adding tools to the o1 um
00:13:37
model such as web browsing file uploads
00:13:41
and things like that we're also hard at
00:13:43
work to bring o1 to the API we're
00:13:45
going to be adding some new features for
00:13:47
developers structured outputs function
00:13:49
calling developer messages and API image
00:13:52
understanding which we think you'll really enjoy
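None of these API features had shipped for o1 at the time of this presentation, so here is a purely speculative sketch of how structured outputs, a developer message, and an image input fit together in the OpenAI Python SDK, based on how they already work for other models; the model name, the "developer" role, and the schema below are assumptions, not a published o1 interface, and function calling is omitted for brevity.

```python
# Speculative sketch only: the o1 API had not shipped at the time of this talk.
# The call shape mirrors how structured outputs and image inputs already work
# for other models in the OpenAI Python SDK; "o1" as a model name and the
# "developer" role are assumptions here, not confirmed details from the video.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o1",  # assumed identifier
    messages=[
        # "developer messages": instructions that sit above the user turn
        {"role": "developer", "content": "Return only the requested JSON."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Estimate the radiator area for this design."},
                # "API image understanding": pass an image alongside the text
                {"type": "image_url", "image_url": {"url": "https://example.com/diagram.png"}},
            ],
        },
    ],
    # "structured outputs": constrain the reply to a JSON schema
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "radiator_estimate",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {"area_m2": {"type": "number"}},
                "required": ["area_m2"],
                "additionalProperties": False,
            },
        },
    },
)

print(response.choices[0].message.content)
```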
00:13:53
we expect this to be a
00:13:55
great model for developers and really
00:13:57
unlock a whole new frontier of agentic
00:13:59
things you guys can build we hope you
00:14:00
love it as much as we
00:14:02
do that was great thank you guys so much
00:14:05
congratulations uh to you and the team
00:14:07
on getting this done uh we really
00:14:10
hope that you'll enjoy o1 and pro mode
00:14:13
uh or Pro tier uh we have a lot more
00:14:15
stuff to come tomorrow we'll be back
00:14:16
with something great for developers uh
00:14:19
and we'll keep going from there before
00:14:21
we wrap up can we hear your joke yes
00:14:24
uh so um I made this joke this
00:14:27
morning the joke is this so Santa
00:14:31
was trying to get his large language
00:14:33
model to do a math problem and he was
00:14:36
prompting it really hard but it wasn't
00:14:37
working how did he eventually fix
00:14:40
it no idea he used reindeer-forcement
00:14:48
learning thank you very much thank you