Reinforcement Fine-Tuning—12 Days of OpenAI: Day 2

00:20:35
https://www.youtube.com/watch?v=yCIYS9fx56U

Overview

TL;DR: OpenAI is advancing its o1 series with a model customization program built on reinforcement fine-tuning, set to launch publicly next year. The technique lets users fine-tune models on their own datasets, producing tailored reasoning capabilities for specialized fields including legal, finance, and healthcare. A collaboration with Thomson Reuters showcased legal AI development, and a demo with Berkeley Lab showed how fine-tuned models can improve rare-disease gene analysis. Researchers are invited to apply for the reinforcement fine-tuning research program, which underscores OpenAI's commitment to applying AI in real-world scenarios.

Key Takeaways

  • 🚀 OpenAI launches new model customization capabilities with the o1 series.
  • 🧠 Reinforcement fine-tuning allows for advanced domain-specific reasoning.
  • 📅 Public launch of customization features planned for next year.
  • 🤝 Collaborations include Thomson Reuters for legal AI.
  • 🔬 Models can aid in genetic research for rare diseases.
  • 📊 Opportunity for researchers to apply for fine-tuning program.
  • 💼 Fields benefiting include legal, finance, and engineering.
  • 💡 Emphasis on practical applications enhancing real-world impacts.
  • 📉 Fine-tuning seen as a significant advancement over standard methods.
  • 🎄 The stream closed with a Christmas-themed AI joke.

Timeline

  • 00:00:00 - 00:05:00

    Mark from OpenAI opened with updates on the new o1 model series, which is designed to think before responding. He introduced reinforcement fine-tuning, a new customization method that lets users from academia and enterprise train the model on their own datasets using reinforcement learning, yielding expert-level capabilities.

  • 00:05:00 - 00:10:00

    John, Julie, and Justin explain the advantages and functionality of reinforcement fine-tuning (RFT) for the o1 series. It enables domain-specific learning from as few as a few dozen examples, teaching models to reason and improving their performance in fields like legal, finance, and engineering. The goal is to give AI deep expertise in domains such as law, in collaboration with organizations like Thomson Reuters.

  • 00:10:00 - 00:15:00

    Justin Reese from Berkeley Lab discusses using the o1 model with reinforcement fine-tuning to understand rare genetic diseases, combining medical expertise with the model's reasoning capabilities. A curated dataset is used to improve the model's ability to predict causative genes from lists of symptoms, showing how enhanced reasoning aids biomedical research.

  • 00:15:00 - 00:20:35

    Customizing models with reinforcement fine-tuning involves creating datasets and graders to evaluate performance; a sketch of one such data point follows below. OpenAI simplifies the process by leveraging its own infrastructure to train models efficiently, and the demo showed improved model performance in medical research. Fine-tuning enhances the model's ability to generalize and reason, potentially transforming workflows in healthcare and other sectors.
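
To make the dataset format concrete: the transcript describes training files as JSONL, one example per line, each combining a case report, the prompt instructions, and the known causative gene, which is hidden from the model and used only for grading. A minimal sketch of assembling such a file (field names are illustrative, not OpenAI's exact schema):

```python
import json

# One training example per JSONL line: a case description, the task
# instructions, and the correct gene (hidden from the model during
# training; used only by the grader). Field names are illustrative.
example = {
    "case_report": "51-year-old woman; onset unspecified; symptoms: "
                   "hyperthyroidism, ...; absent symptoms: ...",
    "instructions": "List all genes that may be responsible for the "
                    "patient's genetic disease, ranked from most to "
                    "least likely, with an explanation.",
    "correct_answer": "FOXE3",
}

with open("train.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")  # the demo used ~1,100 such lines
```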

Video Q&A

  • What new capabilities does the o1 model series provide?

    The o1 models can now be customized and fine-tuned using reinforcement learning, allowing users to create expert models for specific tasks (a hypothetical API sketch follows this Q&A list).

  • Who can access the preview for the new model customization program?

    The preview is aimed at universities, researchers, and enterprises with further access details to be provided.

  • How does reinforcement fine-tuning differ from standard fine-tuning?

    Reinforcement fine-tuning uses reinforcement learning algorithms to teach models to reason in custom domains, whereas standard fine-tuning trains the model to mimic its input examples.

  • What fields can benefit from the new reinforcement fine-tuning process?

    Fields requiring deep expertise, like legal, finance, engineering, and insurance, can benefit from this new process.

  • What collaboration was mentioned during the video?

    OpenAI partnered with Thomson Reuters to use reinforcement fine-tuning to build a legal assistant for their CoCounsel AI.

  • What example of scientific application was highlighted?

    A project involving rare genetic diseases where OpenAI models help identify gene mutations responsible for specific conditions.

  • What is the reinforcement fine-tuning research program?

    It is a program allowing organizations working on complex tasks to gain early access to reinforcement fine-tuning capabilities.

  • When is the public launch of the reinforcement fine-tuning features expected?

    The public launch is planned for early next year.

  • What are the anticipated benefits of the reinforcement fine-tuning in bioinformatics?

    It can help improve understanding and treatment of rare diseases by reasoning over biomedical data.

  • What was a humorous moment mentioned in the video?

    A Christmas-themed joke about Santa's self-driving sleigh hitting trees because he hadn't 'pine-tuned' his models.
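
At the time of the stream, reinforcement fine-tuning was configured through the OpenAI dashboard, so any programmatic flow is speculative. As a rough sketch, assuming a hypothetical `method` payload on the existing fine-tuning jobs endpoint (the file-upload and job-creation calls exist in the OpenAI Python SDK; the reinforcement-specific fields are an assumed shape, not a documented API):

```python
from openai import OpenAI

client = OpenAI()

# Upload training and validation JSONL files (this files API exists today).
train = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
valid = client.files.create(file=open("valid.jsonl", "rb"), purpose="fine-tune")

# Hypothetical RFT job: the `method` shape below is assumed, not documented.
job = client.fine_tuning.jobs.create(
    model="o1-mini",                    # base model used in the demo
    training_file=train.id,
    validation_file=valid.id,
    method={
        "type": "reinforcement",        # assumed flag selecting RFT
        "reinforcement": {
            "grader": {"type": "ranked_list"},  # assumed grader config
        },
    },
)
print(job.id)  # jobs reportedly run for a few hours to a few days
```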

Subtitles (en)
  • 00:00:00
    Hi everyone, my name is Mark and I lead research at OpenAI. Yesterday we took o1 out of preview and launched it in ChatGPT, and we're soon going to launch it in the API. If you haven't been following o1, it's our latest series of model improvements that allow the models to think for a while before they come back with a response. Today we're really excited to preview the latest advancement in our model customization program: it'll let users fine-tune o1 on their own datasets. And again, this isn't standard fine-tuning; this is reinforcement fine-tuning, which really leverages the reinforcement learning algorithms that took us from advanced high-school level to expert PhD level, for your own use cases. I want to stress again that this is a preview of something we're going to launch publicly next year, but if you are a university, a researcher, or an enterprise, we'll give you some information later on how you can access our alpha program. So why would you want this? It allows you to take your golden datasets and turn them into unique offerings that give your own users and customers the same magic that we have. I'll let John, Julie, and Justin say a little bit more.
  • 00:01:19
    Hello everyone, my name is John Allard and I'm an engineer here at OpenAI. Hi everyone, I'm Julie Wang, a researcher here at OpenAI. I'm Justin Reese, a computational biologist at Berkeley Lab. Today we're so excited to introduce this new way of model customization for our o1 series of models: reinforcement fine-tuning, or RFT for short. For the first time, developers, researchers, and machine learning engineers will be able to use reinforcement learning to create expert models that excel at specific tasks within their domain. We believe that any field requiring deep expertise in its AI models stands to benefit, so if you work in, say, legal, finance, engineering, or insurance, this one's for you. For example, we recently partnered with Thomson Reuters to use reinforcement fine-tuning to fine-tune o1 mini as a legal assistant in their CoCounsel AI, a tool that assists their legal professionals in accomplishing some of their most analytical workflows.
  • 00:02:21
    Some of you will be familiar with the supervised fine-tuning API that we launched early last year. Supervised fine-tuning is really powerful: you're trying to get the model to replicate features it finds in input text or images, which is great if you want to change the tone, style, or response format of the model. Reinforcement fine-tuning, or reindeer-enforcement fine-tuning I should say, is different. You're not just teaching the model to mimic its inputs; you're teaching it to learn to reason in entirely new ways over custom domains. The way this works is that when the model sees a problem, we give it space to think through the problem, and then we grade the model's final answer. Using the power of reinforcement learning, we reinforce lines of thinking that led to correct answers and disincentivize lines of thinking that led to incorrect answers. What you'll see is that with as little as a few dozen examples, the model will learn to reason in new and effective ways over custom domains. That's crazy, that you can do that with just a dozen examples; that's not something you can do with regular fine-tuning. Exactly: in the space of large language models and machine learning, a few dozen examples is basically nothing. So for the first time, our model customization platform will support reinforcement learning, and notably this is the same technique that we use internally at OpenAI to train our frontier models like GPT-4o and the o1 series.
  • 00:03:46
    One area with many exciting applications is scientific research, but don't just take our word for it; that's why we're joined today by Justin Reese. Justin is a researcher at Berkeley Lab, and one of his areas of study is using computational methods to understand the genetic causes underlying rare diseases. Justin, thank you so much for being here; do you mind telling us a little more about your research and how reinforcement fine-tuning might help? Sure, thanks, it's great to be here. One of the areas of my research is rare genetic disease. Contrary to the name, rare genetic disease is actually not rare: any one rare disease is rare, but if you put them all together they're actually quite common, and we're talking about 300 million people globally who suffer from a rare disease. What's more, these people often have a long diagnostic odyssey of months to years before they find out about their condition. Wow, that's like the whole population of the US. Yes, it's not a small number of people. What we're working on is better computational tools and methods to really research what's important, and to help us understand and treat these diseases. We do our work in an academic setting, learning more about rare diseases and their causes, and the hope is we'll be able to advance healthcare for these folks down the line.
  • 00:04:57
    Now, assessing rare disease is hard because you have to have two things: expert domain knowledge about the medical side of things, and systematic reasoning over the biomedical data. This is an area where we think the o1 model can really help us out with its reasoning capabilities. That makes a lot of sense: our large language models have domain knowledge, and our o1 models are really systematic reasoners, so it seems like there's now a pretty good computational method for addressing some of these problems. That's right. Can you tell us a little more about the datasets you're using? Sure. This was a collaborative effort between our group, Charité Hospital in Germany, Peter Robinson's lab, and the Monarch Initiative. What we did was extract disease information from hundreds of scientific publications that were case reports about rare disease. We curated that information: lists of signs and symptoms that were present in the patient and that were excluded in the patient, the disease they had, and, importantly for this conversation, the causative gene that was mutated and causing the problems in these folks. I see, so you and maybe some doctors are trying to figure out, given a patient's symptoms, what gene might have mutated to cause those symptoms? Yeah, that's right. Something we've been working on together with the OpenAI team is training the o1 models to reason more effectively about the causes of disease.
  • 00:06:24
    Incredible, thank you Justin. We're now going to give you a preview of reinforcement fine-tuning at work, and not to steal any thunder, but we're going to take o1 mini and make it exceed the performance of o1 on this task, the o1 that we just launched yesterday. This matters so much because o1 mini is a smaller, faster, and cheaper model than o1. So, using Justin's dataset, we're going to show you can drastically improve the performance of o1 mini on this task, where, given a list of symptoms, you're trying to predict which gene might be responsible for the genetic disease. To give an overview of the process: we'll start by looking at the datasets used to train the model and the graders used to evaluate it, then launch a training job on OpenAI's training infrastructure, and finally evaluate the resulting fine-tuned model to see how it improved over the base model we started with. To start us off, we'll jump over to the OpenAI development platform and create a new model. We've had supervised fine-tuning for a bit over a year now; what we're going to do is select reinforcement fine-tuning. We're going to be training o1 mini, so we'll select that as the base model, and now we need to upload a training dataset. Training datasets are just JSONL files, where each line in the file is an example that you want the model to be trained on. For this case, Justin and his colleagues assembled a dataset of about 1,100 examples, so I'll go ahead and upload that one.
  • 00:07:51
    Just so we get a really good feel for how this dataset works and what this task is, we'll zoom in on an individual data point quickly. This is what an individual data point looks like, and there are really three important things here. The first is the case report, a description of the patient and the patient's symptoms: we see that the patient was a 51-year-old woman, the disease onset was not specified, and we have a list of symptoms like hyperthyroidism and others. As Justin said earlier, we also have the absent symptoms, the symptoms that are not present; this is important because it helps the model rule out genes that it might otherwise think were responsible for the symptoms that are present. Next we have the instructions, and if you're watching this livestream you're familiar with prompting; all we're doing here is prompting the model for what we want it to do for this task. What we're saying is: given the list of symptoms and the case report, list all the genes that you think might be responsible for the genetic disease you think is present, and provide an explanation for why you think those genes might be responsible. Finally, we have the correct answer, the gene that we happen to know is responsible. Importantly, we're not showing this to the model during the training process, that would be cheating, but we use it internally during training to grade the model's outputs, to check if the model is correct. This is a pretty hard task; I definitely have no hope of answering this question. I mean, you can tell we've come a long way from just trying to count the number of R's in the word "strawberry." Now, when we give the model this prompt, this case report and these instructions, the model will output something like this: a list of genes it thinks might be responsible. Importantly, the genes are in sorted order, where the first gene in the list is the one it thinks is most likely to be responsible, the second is the one it thinks is second most likely, and so on and so forth.
  • 00:09:50
    Cool, so we'll hop back over. Next we need to upload some validation data. Validation data is in the exact same format as the training data, but importantly, there is no overlap in the correct genes between the validation dataset and the training dataset. What that means is that the model can't cheat: it can't learn to just memorize a list of symptoms and associate those with a gene; it has to actually generalize from the training dataset to the validation dataset. Gotcha. So where does the reinforcement part come in? We talked about grading; is that part of the process here?
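
The train/validation split described here is easy to check mechanically. A small sketch, reusing the illustrative `correct_answer` field from the earlier dataset example:

```python
import json

def causative_genes(path: str) -> set[str]:
    """Collect the correct gene from every JSONL example in a file."""
    with open(path) as f:
        return {json.loads(line)["correct_answer"] for line in f}

# Generalization guard: no causative gene may appear in both splits,
# so the model cannot simply memorize symptom-to-gene mappings.
assert not (causative_genes("train.jsonl") & causative_genes("valid.jsonl"))
```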
  • 00:10:22
    Yeah, that's a really good question. Grading is done by this concept of graders that we're introducing here, and graders are really simple: a grader takes the output from the model and the correct answer, compares them, and returns a score between zero and one. Zero means the model did not get the answer correct at all, one means the model got the answer correct, and you can also give partial credit, so the score can be anywhere in that range. For this specific task we have a grader that looks like this: it takes the correct answer that we happen to know, and the output from the model, which is the list of genes, and it produces a score. In this case FOXE3 is the correct answer; it was second in the list of genes, so it gets a score of about 0.7. I see, so if it had instead put FOXE3 first in the list, it would have gotten a grade of one? Yeah, exactly, and as the correct answer falls further down the list, the score gradually decays to zero. Nice, makes sense. But what if I have a task that isn't grading a ranked list; do we have other graders that are more general? Yeah, we're supplying a collection of graders that we think pretty effectively covers the space of possible intents you might have while doing reinforcement fine-tuning, and we're always adding more. Eventually we're hopefully going to let you define your own graders, maybe upload a Python file or something and do some custom grading.
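
The rank-based grader described here returns 1.0 when the correct gene tops the list, about 0.7 when it is second, and decays toward zero further down. A toy implementation of that idea (the exact decay OpenAI uses isn't specified; a 0.7-per-rank falloff merely matches the one data point given in the demo):

```python
def rank_grader(ranked_genes: list[str], correct_gene: str) -> float:
    """Score a ranked gene list against the known answer: 1.0 for first
    place, ~0.7 for second, decaying toward 0.0; 0.0 if absent."""
    if correct_gene not in ranked_genes:
        return 0.0
    return 0.7 ** ranked_genes.index(correct_gene)  # index 0 = first place

print(rank_grader(["FOXE3", "PAX6"], "FOXE3"))  # 1.0
print(rank_grader(["PAX6", "FOXE3"], "FOXE3"))  # 0.7, as in the demo
```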
  • 00:11:38
    Cool. So we've defined our training dataset and our validation dataset; let me go ahead and copy in the grader really quickly. OpenAI also lets you customize these fine-tuning runs by setting hyperparameters, but we set some pretty good defaults, so I'm just going to go ahead and click create. What this is doing is kicking off a training job. The really cool thing is that you bring the dataset and you bring the grader, the places where you really have domain expertise and can really contribute to the problem, and then you get to leverage the full power of OpenAI's reinforcement learning algorithms and our full distributed model training stack to customize a frontier model for your use case. So as a user, I just bring my dataset and grader, and OpenAI takes care of everything else? Yeah, exactly. Reinforcement fine-tuning jobs can take anywhere from a few hours to a few days to run, so we're going to jump over to a job that I ran earlier this week on the same dataset, just so we can see the results.
  • 00:12:41
    I'll jump over here. I have this job that I ran earlier this week; it completed successfully and produced a fine-tuned model for us. There's one thing I want to look at, which is the validation reward score. This is the average score from the grader on the validation dataset and how it changed over the course of the fine-tuning run, and what we can see is that the score is going up. As we said earlier, since there's no overlap in genes between the training dataset and the validation dataset, this means the model really learned to generalize on our task; it wasn't simply memorizing a list of symptoms and mapping those to genes. While this is cool, the chart goes up and to the right, which is what we like to see, it'd be nice to get a better feel for how the model has actually changed during the fine-tuning process, so we'll take a closer look at that now.
  • 00:13:29
    All right, we're going to pop over to the evaluations dashboard, a product in our developer platform that we launched earlier this year. There are a lot of numbers, but don't worry, we're going to go through all of them. I've set up three different runs here: the first was a run against our o1 model, which we released yesterday; the second was against o1 mini, which was the starting point of our fine-tuning job; and finally, the reinforcement fine-tuned o1 mini. Now, we watched the reward go up and to the right, but what does that actually mean for this task? I've set up three different evaluations to assess that. The first is top@1: how often is the correct answer the very first item in the list? Then top@5: how often is the correct answer in the top five elements of the list? And finally top@max: did we put the right answer anywhere in our list? Looking at top@1, we can see that our starting point, o1 mini, got 17% on our dataset of about 200 examples; o1 got 25%, so it's doing better; but then our fine-tuned o1 mini got 31%.
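
The three metrics are simple to state in code; a small sketch (helper name and data layout are hypothetical):

```python
def top_k(predictions: list[list[str]], answers: list[str],
          k: int | None = None) -> float:
    """Fraction of cases whose correct gene appears in the top k of the
    model's ranked list; k=None scores the whole list (top@max)."""
    hits = sum(
        ans in (ranked if k is None else ranked[:k])
        for ranked, ans in zip(predictions, answers)
    )
    return hits / len(answers)

# top_k(preds, golds, 1) -> 0.31 for the fine-tuned o1 mini in the demo;
# top_k(preds, golds, 5) and top_k(preds, golds) give top@5 and top@max.
```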
  • 00:14:39
    Awesome. I took a screenshot of this, put it into ChatGPT, and asked it to make me a Christmas-themed plot, and here's a nice visualization of those nine numbers we just saw. You can see our starting point, o1 mini, across top@1, top@5, and top@max; our o1 model; and finally our best-performing model, the o1 mini fine-tune, here in the dotted red line. Looking at these results, what do you think, Justin? Well, I think this is pretty impressive performance, especially the increase on the validation data, because it implies that the model is learning something general about how to reason over these kinds of data, which is pretty exciting. An obvious question you might ask is how this compares to existing bioinformatics tools. I don't really have an apples-to-apples comparison, because typically in this kind of experiment you would provide genomic sequencing data, and we haven't included that here, but this sort of open-ended querying of models over incomplete symptom lists is new and exciting, I think.
  • 00:15:41
    Great. So these are aggregate statistics, but let's look at the actual model responses. I'm going to pop over to this data tab and filter by the passes. Here is the input that we're giving to the model: the problem, as John described earlier, is to identify genes that may be responsible for a set of observed symptoms. We asked the model to output a dictionary containing both a string that explains why it picked these genes and, of course, the genes themselves in ranked order, and then finally we have the symptom list as well. This patient presented with subependymal nodules, seizure, and a couple of other things. We then ran our models; this was our o1 model, and this one our fine-tuned o1 mini model. We gave it that input, and the output is the dictionary we described earlier. The reasoning: the combination of subependymal nodules, seizure, and cortical tubers is indicative of tuberous sclerosis complex, which is commonly caused by mutations in these genes. It lists a couple of other potential ones, and then it says TSC2 is the most likely candidate. If we scroll back over to our answer, we'll see that TSC2 is in fact the correct answer, so that allowed us to get a pass on top@1, top@5, and top@max.
  • 00:17:11
    Looking at this output, Justin, is this a useful output for the model to be giving back? Yeah, absolutely. It's particularly useful to see the model's reasoning, that's a big contribution here, and also, obviously, the ranked list of answers: even if the correct answer is not first, you can look at all the possibilities. It's also great to see that fine-tuning improves the position of the right answer in the ranked list of possible answers, so that the right answer is getting closer to number one. That's gratifying.
  • 00:17:39
    Justin, zooming out a little bit: how does reinforcement learning shape your field? Can you talk about some trends you see? Sure. I think there's a lot of interest in the research community in using these models for these kinds of tasks, and the feeling for this particular use case is that the best solution in the near term is probably a hybrid between existing bioinformatics tools and models like o1. I think this is excellent progress in characterizing the strengths of these models and in how we can use tools like fine-tuning to improve performance. Like I said, there's not really a comparable benchmark to compare the two, but it's definitely progress in how we can use these models to understand disease and, in a larger sense, how we can incorporate them into a workflow that will eventually improve healthcare for these folks.
  • 00:18:33
    Right, amazing. Thank you, Justin. While we've just shown you an exciting application of reinforcement fine-tuning in scientific research, this is a general-purpose technique: we've seen promising results on datasets from biochemistry, AI safety, legal, and healthcare as well. We can think of hundreds more examples or tasks we could use this on, but we know that you can probably think of many more, and that's why we're so excited to be expanding our alpha program today, to enable more people to push the boundaries of the capabilities of our o1 models on the tasks that matter most to them. We've been working with a small group of trusted partners to really test out reinforcement fine-tuning, and today we're expanding alpha access via what we're calling the reinforcement fine-tuning research program. This program is ideal for organizations that are currently working on very complex tasks with teams of experts and that think they might benefit from AI assistance on those tasks. If you're interested in applying for one of these limited spots, you can find a link to the application in the description of this livestream, and as Mark said earlier, we plan on launching reinforcement fine-tuning publicly early next year. We're all truly excited to see what you do with reinforcement fine-tuning, and speaking as a researcher, there's nothing that makes us happier than seeing our models being adapted and used to advance science and knowledge in the real world.
  • 00:19:55
    Do you have a joke for us today? Well, as it so happens, I do. As it's become a tradition, I have a Christmas-themed joke. You know, we live in San Francisco, where self-driving vehicles are all the rage, and actually Santa's been trying to get in on this too: he's trying to make a self-driving sleigh, but for some reason his models just keep failing to identify trees, and the sleigh is hitting trees left and right. Do you guys have any guesses why? No? He didn't pine-tune his models. Oh jeez. Okay, all right. Please join us next week; we'll have a lot more to share. Thank you.
Tags
  • OpenAI
  • o1 Model
  • Reinforcement Learning
  • Fine-Tuning
  • Customization
  • AI Research
  • Machine Learning
  • Healthcare
  • Legal AI
  • Genetic Research