This GPT-5 NEWS Could Change EVERYTHING...

00:26:31
https://www.youtube.com/watch?v=I3Wo-23WlQw

Summary

TL;DR: The speaker walks through the speculation surrounding OpenAI's GPT-5, suggesting that the model has already been developed but is being withheld from public release because it is more valuable to OpenAI internally. They discuss the recent evolution of AI models and the practice of model distillation, in which powerful models are used to improve the performance of smaller, cheaper ones. The video examines competition in the AI industry, the advances in smaller models, and the strategic decisions behind delayed releases, ultimately raising questions about how accessible cutting-edge AI systems will remain and about the trajectory OpenAI and other companies might take.

Takeaways

  • 🔍 Rumors suggest GPT-5 is ready but unreleased.
  • 🧪 Model distillation enhances performance efficiently.
  • 🤔 OpenAI may prioritize internal AI capabilities over public releases.
  • 💡 Smaller models outperform larger predecessors through distillation.
  • 📉 High operational costs impact AI model availability.
  • 🔮 Future AI may increasingly rely on internal improvements.
  • 🛡️ Public access to advanced AI may diminish over time.

Timeline

  • 00:00:00 - 00:05:00

    The video introduces the rumors surrounding GPT-5 and their implications for the AI industry. The central claim is that OpenAI may have developed GPT-5 but is keeping it internal because the return on investment is greater that way, hinting at undisclosed advancements that could reshape AI applications.

  • 00:05:00 - 00:10:00

    The segment covers the mysterious absence of Anthropic's Claude Opus 3.5, which had been anticipated as a competitor to GPT-4o. It suggests that frontier models may be kept for internal use rather than released publicly, optimizing performance and cost while preserving a competitive edge.

  • 00:10:00 - 00:15:00

    Details on the performance of Claude Opus 3.5 indicate that its results were not strong enough to justify a public release. Instead, Anthropic may be using data generated by that model to enhance its existing models, such as Claude Sonnet 3.6, implying a cycle in which models improve incrementally through distillation.

  • 00:15:00 - 00:20:00

    Model distillation emerges as the crucial strategy: a powerful model is used to improve the capabilities of a smaller one, letting companies cut operational costs while improving performance. This is likely a common practice among leading AI labs, including OpenAI (a brief, hedged code sketch of the idea follows this timeline).

  • 00:20:00 - 00:26:31

    The discussion concludes with speculation that OpenAI may be refining GPT-5 internally without releasing it, potentially because inference is costly and the smaller distilled models already capture most of the gains. There is also a suggestion that model development may shift away from ever-larger public models toward smaller, more efficient ones, underscoring the business strategy behind AI releases.

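To make the teacher-student idea in the timeline concrete, here is a minimal, illustrative sketch of one classical form of distillation (soft-label / logit matching in the style of Hinton et al.). It is an assumption-laden example, not OpenAI's or Anthropic's pipeline; the video itself describes a different flavor based on teacher-generated synthetic data, which is sketched later in the transcript.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
        # Blend a soft-target KL term (scaled by T^2, per the classic recipe)
        # with the ordinary hard-label cross-entropy.
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1 - alpha) * hard

    # Toy usage with random tensors standing in for real model outputs.
    student = torch.randn(8, 100)          # student logits: batch of 8, 100 classes
    teacher = torch.randn(8, 100)          # teacher logits for the same inputs
    labels = torch.randint(0, 100, (8,))   # ground-truth labels
    print(distillation_loss(student, teacher, labels).item())
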

Video Q&A

  • What is the GPT-5 rumor about?

    The rumor suggests that OpenAI has developed GPT-5 but is keeping it internal for strategic reasons.

  • How does model distillation work?

    Model distillation is a process where a powerful model generates data to enhance the performance of a smaller, cheaper model.

  • Why might OpenAI choose not to release GPT-5?

    OpenAI may keep GPT-5 internal to leverage its capabilities for distilling knowledge into more user-friendly models and for cost control.

  • What did Anthropic do with Claude Opus 3.5?

    Anthropic allegedly utilized Claude Opus 3.5 internally to improve the performance of Claude Sonnet 3.6.

  • How does AI industry competition affect model releases?

    When labs no longer need to release their most capable models to stay competitive, they can focus on improving internal models rather than external offerings.

  • What is the significance of reduced model sizes?

    The trend shows that newer models can outperform larger predecessors while being smaller and cheaper, thanks to distillation.

  • What are the implications of AI companies prioritizing internal models?

    Companies may focus on using advanced models internally to drive innovation without releasing them publicly.

  • How does distillation help in AI development?

    Distillation allows companies to improve performance without incurring high costs from running larger models.

  • What future advancements are anticipated in AI technology?

    Further enhancements in AI may stem from internal models training smaller public-facing models, continuing a cycle of improvement.

  • How does this trend of AI model development impact user access?

    As companies become more self-sufficient, public access to the latest advancements could diminish.

Subtitles (English)

  • 00:00:00
    So there has been a rumor floating around about GPT-5, and in this video I'll explain that rumor, articulated by Alberto Romero in a long article that dives into the details of where exactly GPT-5 is. This isn't just a video about GPT-5; it's more about where the AI industry is headed as a whole, because there could be something these AI labs are quietly doing that you ought to know about. The article opens by saying that this rumor really does change everything, and that isn't an over-exaggeration: as you'll see when I get into the details, it means the way these models are going to be released is going to be completely different. So let's take a look at what the article says.
  • 00:00:44
    It talks about GPT-5 internally and asks: what if I told you that GPT-5 is real — not just real, but already shaping the world from where you can't see it? That is the hypothesis for this entire video, and credit goes to this user: OpenAI has already built GPT-5 but is keeping it internal, because the return on investment is far greater than if they released it to millions of ChatGPT users — and the ROI they're getting is not money but something else. The idea is simple enough; the challenge is connecting the breadcrumbs that lead up to it, and the article is a deep dive into why the author believes it all adds up. In short, the claim is that OpenAI has created GPT-5 — a truly stunning model — but has decided not to release it, for a reason that is particularly interesting.
  • 00:01:42
    One of the things the article talks about is the mysterious disappearance of Opus 3.5. It says that before going into GPT-5 we have to pay a visit to its distant cousin, also missing in action: Anthropic's Claude Opus 3.5. As you know, the top three AI labs — OpenAI, Google DeepMind, and Anthropic — offer a range of models designed to span the price and latency versus performance spectrum. OpenAI provides options like GPT-4o and GPT-4o mini, as well as o1 and o1-mini; Google DeepMind offers Gemini Ultra, Pro, and Flash; and Anthropic has Claude Opus, Sonnet, and Haiku. With all of these different models, you're trying to cater to as many customers as possible: some prioritize top-tier performance no matter the cost, while others want affordable solutions that are good enough.
  • 00:02:33
    Now, about the mysterious disappearance of the Opus 3.5 model that was supposed to be coming in that series: something strange happened in October 2024. Everyone was expecting Anthropic to announce Claude 3.5 Opus as a response to GPT-4o. Instead, they released an updated version of Claude 3.5 Sonnet that people started calling 3.6, and Opus 3.5 was nowhere to be found. Remember, this was supposed to be their GPT-5-type model, yet it never appeared, seemingly leaving Anthropic without a direct competitor to GPT-4o.
  • 00:02:59
    But when we look at what actually happened with Opus 3.5, it's rather fascinating. The author had written previously, back in October, that there were rumors Sonnet 3.6 — which is a really good model — was just a checkpoint of a failed training run of the much-anticipated Opus 3.5. At the time, a lot of people did think Claude 3.5 Opus had been scrapped, because Anthropic had previously been saying on their web pages that it was coming, and then quietly removed those mentions from a number of pages. More recently there was an interview with the CEO of Anthropic — and all of this about Opus 3.5 is going to relate to GPT-5 in a moment, because Opus 3.5 is the model that was supposed to be Anthropic's GPT-5-level release. So, prefacing what we're about to get into: we're looking at what Anthropic did with their would-be GPT-5 competitor, because what Anthropic did reveals a lot about what OpenAI has likely done.
  • 00:04:01
    In that interview, Dario Amodei says of Opus 3.5 that he's not giving an exact date, but as far as they know the plan is still to have a Claude 3.5 Opus — cautious and ambiguous, yet valid. Asked the timeline question directly — when is Claude Opus 3.5 coming out? — he repeats: not giving you an exact date, but as far as we know the plan is still to have a Claude 3.5 Opus.
  • 00:04:27
    This is where the article brings in Bloomberg. There were several articles at the time reporting that OpenAI, Google, and Anthropic were struggling to build more advanced AI — this was about the jump from GPT-4 to a GPT-5-type model. I remember reading them at the time: Gemini 2 was said to be disappointing, GPT-5 was said to be disappointing, and Opus 3.5 too — they all performed better than their predecessors, just not better enough.
  • 00:04:54
    Bloomberg wrote that after training it, Anthropic found that 3.5 Opus — the GPT-5 competitor — performed better on evaluations than the older version, but not by as much as it should have given the size of the model and how costly it was to build and run. It seems Dario refrained from giving a date because, although the Opus 3.5 training run hadn't failed, its results were underwhelming. What a lot of people missed is that these next-iteration models — GPT-5, Gemini 2, Opus 3.5 — were good; the only bad thing about them was that they were quite expensive for what they were. Trust me, you want to remember that fact, because it ties into part of why these models weren't released.
  • 00:05:33
    Now, here's where we get some insider information, or leaks. On December 11th, semiconductor expert Dylan Patel and his SemiAnalysis team delivered the final plot twist, presenting an explanation that weaves all the data points into a coherent story: Anthropic finished training Claude 3.5 Opus, and it did perform well, scaling appropriately — yet for some reason they didn't release it. So Anthropic got their GPT-5-class model performing well, but apparently held it back. According to this account, that's because instead of releasing it publicly — and this is where the wilder theory comes in — Anthropic used Claude 3.5 Opus to generate synthetic data and for reward modeling, to improve Claude 3.5 Sonnet significantly, alongside user data.
  • 00:06:18
    That's a big claim, but it comes from a reputable source within the industry who has previously spoken about things before they were released, with the information later turning out to be true. This is basically one of the key theories behind why many people think GPT-5 is alive and well internally but just isn't being released to the public yet. It also makes sense when you look at the surrounding evidence: we all know just how good Claude 3.6 Sonnet — or Claude 3.5 Sonnet, whatever you want to call it — is, but many people couldn't figure out why it's so good while Claude Opus still hasn't been released.
  • 00:06:57
    As the article puts it: in short, Anthropic did train Claude 3.5 Opus; they dropped the name because it wasn't good enough; Dario, confident that a different training run could improve the results, avoided giving a date; and Bloomberg confirmed the results were better than existing models but not enough to justify the inference cost — which is, of course, how much these models cost to run when people are actually using them. Dylan and his team uncovered the link between the mysterious Sonnet 3.6 and the missing Opus 3.5: the latter was being used internally to generate synthetic data to boost the former's performance. So this GPT-5-class model from Anthropic — one of the biggest rivals among the leading labs — is being distilled down into Sonnet 3.6. Essentially, they internally built a really, really smart AI system but probably haven't released it to the public yet; publicly, Sonnet 3.6 is what we get. And now it gets even crazier.
  • 00:08:00
    This is where the article explains model distillation, under the heading "better, but also smaller and cheaper." The process of using a powerful, expensive model to generate data that enhances the performance of a slightly less capable model is known as distillation, and it's a common practice; the technique allows labs to push their smaller models beyond what could be achieved through additional pre-training alone. So one way these companies are getting their smaller models to be better and better is through model distillation: you have a teacher model, you distill that knowledge, and you transfer it to a student model — and this performs a lot better than the standard pre-training paradigm we used to rely on. It has been working well for AI companies and it's something they're going to keep doing. The article notes that there are various approaches to distillation, but the thing to remember is that a strong model acting as a teacher turns student models from small, cheap, fast, and weak into small, cheap, fast, and powerful.
  • 00:08:56
    Distillation turns a strong model into a gold mine, and Dylan explains why it made sense for Anthropic to do this with Opus 3.5 and Sonnet 3.6: the inference costs of the new Sonnet versus the old Sonnet didn't change drastically, but the model's performance did. Why release Opus 3.5 when, on a cost basis, it doesn't make economic sense relative to releasing a 3.5 Sonnet with further post-training from that 3.5 Opus? In other words: there's no point releasing Opus 3.5 when it's so expensive to run — why not just distill a lot of the capabilities of that amazing Opus model into a 3.6 Sonnet that people would love anyway? Super fascinating.
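To illustrate the recipe described here, below is a minimal sketch of data-generation distillation: an expensive teacher model produces synthetic completions and a smaller student is fine-tuned on them. The names (teacher_generate, finetune_student) and prompts are hypothetical stand-ins, not any lab's actual pipeline.

    from dataclasses import dataclass

    @dataclass
    class Example:
        prompt: str
        completion: str

    def teacher_generate(prompt: str) -> str:
        # Stand-in for querying the large internal "teacher" model
        # (e.g. an Opus-3.5-class system in the story above).
        return f"high-quality answer to: {prompt}"

    def build_synthetic_dataset(prompts: list[str]) -> list[Example]:
        # The expensive model is run once, offline, to create training data.
        return [Example(p, teacher_generate(p)) for p in prompts]

    def finetune_student(dataset: list[Example]) -> None:
        # Stand-in for ordinary supervised fine-tuning of the smaller,
        # cheaper "student" model on the teacher's outputs.
        for example in dataset:
            pass  # compute next-token loss on example.completion given example.prompt

    prompts = [
        "Explain model distillation in one sentence.",
        "Refactor this Python function for readability.",
    ]
    finetune_student(build_synthetic_dataset(prompts))
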
  • 00:09:38
    And of course, as I said already, Anthropic chose not to release it — not because of poor results, but because it's simply more valuable to them internally. They're basically using this AI to train the next set of AIs, and that's why this story arguably changes everything: a lot of people have speculated for quite some time that this is the moment where these systems just keep getting better and better, because you use one model to train the next version, then use that version to train the one after that, and the intelligence keeps expanding.
  • 00:10:09
    What's also striking is that Dylan — the analyst we're referencing here — says this is why the open-source community caught up to GPT-4 so quickly: they were basically taking the gold straight from OpenAI's mine, distilling GPT-4's capabilities into smaller models. That's interesting because GPT-4o is itself a smaller model — much faster, much better, and really useful — and that's essentially what these other projects were doing as well: using those capabilities to get open-source models up to GPT-4's level.
  • 00:10:40
    One thing we all know is that Sonnet 3.6 wasn't just good — it was state-of-the-art good, better than GPT-4o. Anthropic's mid-tier model outperformed OpenAI's flagship, thanks to distillation from Opus 3.5 (and other reasons as well), which is pretty remarkable.
  • 00:11:00
    This is where the article argues that bigger-and-better is no longer the paradigm: the top labs no longer talk about higher parameter counts as the measure of progress. The last time we had real knowledge of parameter counts, GPT-3.5 was 175 billion parameters, and GPT-4 was rumored to be 1.8 trillion parameters in a mixture-of-experts architecture. The crazy part is the speculation that the newer distilled models we're actually getting — GPT-4o and Sonnet 3.6 — are significantly smaller than GPT-4, despite both beating GPT-4 across benchmarks. GPT-4, at 1.8 trillion parameters, was considered state-of-the-art at release, roughly a year and a half to two years ago; now we have models like GPT-4o and Sonnet 3.6 that are much smaller than that but better, because of distillation — the knowledge in them is packed far more efficiently, and the models are smarter for it. The article estimates that current models such as the original GPT-4o are probably an order of magnitude smaller than GPT-4, with GPT-4o at around 200 billion parameters and 3.5 Sonnet at around 400 billion, though the author notes the estimate could be off by a factor of two given the rough way it was arrived at.
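For a rough sense of why the smaller distilled models are so much cheaper to serve, here is some back-of-the-envelope arithmetic using the parameter figures quoted above. These are the video's rumors and estimates, not confirmed numbers, and the 16-bit weights assumption is purely illustrative.

    # Illustrative only: parameter counts are the rumors/estimates quoted above.
    def weight_memory_gb(params_billions, bytes_per_param=2):  # 2 bytes ~ fp16/bf16
        return params_billions * 1e9 * bytes_per_param / 1e9

    estimates = [
        ("GPT-3.5 (reported)", 175),
        ("GPT-4 (rumored MoE total)", 1800),
        ("GPT-4o (article's estimate)", 200),
        ("Claude 3.5 Sonnet (article's estimate)", 400),
    ]
    for name, size_b in estimates:
        print(f"{name:40s} ~{weight_memory_gb(size_b):6,.0f} GB of weights at 16-bit")
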
  • 00:12:14
    The point to pay attention to is that both of these companies are following a similar trajectory: their latest models are not only better but also smaller and cheaper than the previous generation. We know how Anthropic pulled it off — by distilling Opus 3.5 into Sonnet 3.6 — but this is where it gets interesting.
  • 00:12:31
    What did OpenAI do? Thanks to Anthropic we know they trained Opus 3.5 and distilled its knowledge down into Sonnet 3.6; so what on earth did OpenAI do? This is where the GPT-5 rumor comes in. The article's diagram shows teacher models on one side and distilled public models on the other, and it suggests that OpenAI may have a very capable secret internal model — it might be GPT-5, it might be something even smarter given what we've heard recently — and that these internal models are used only for distillation into public models that are better, smaller, and cheaper to actually run. It's striking when you think about it: internally, Anthropic quite likely has Opus 3.5, which would be incredible to use, but they distilled it into Sonnet 3.6, and Sonnet 3.6, as you already know, is pretty incredible. The claim is that OpenAI may likewise have a GPT-5-type model whose capabilities are being distilled into the much smaller models we're currently using. I do know there was some kind of distillation going on with the strawberry model, so OpenAI is definitely familiar with the process — much like how we had the o1 model and the o1-preview models, and how effective those were. Now, on to chapter three.
  • 00:13:45
    The article says one might assume Anthropic's distillation approach was driven by unique circumstances — an underwhelming training run for Opus 3.5 — but that is something all of these companies have experienced. The key thing is that the news got flipped on its head, because everyone took the wrong lesson from it. The reports said the labs had subpar training runs, but subpar doesn't mean worse than before; it just means the run didn't live up to expectations. Think of it like this: imagine you're expecting to win $10,000 in a competition and you only win $1,000 — more than you usually win. That's subpar relative to expectations, but it doesn't mean it wasn't an improvement. And remember that with intelligence, every inch you gain unlocks a whole new host of capabilities. The article says the causes don't really matter to us — diminishing returns, lack of data, whatever the case — because all of these companies are going through it at the same time.
  • 00:14:38
    What matters, and what most people don't appreciate, is that if 300 million people are using your product weekly, the operational expenditure can suddenly kill your company. Costs matter — they really do — because there are so many people using these systems, so many servers involved, and demand keeps expanding week on week, which makes it hard to keep up.
  • 00:14:59
    With that demand in mind, this is why distillation is so attractive: whatever drove Anthropic to distill Opus 3.5 into Sonnet 3.6 is affecting OpenAI several times over. Distillation works because it turns two universal challenges into an advantage: you solve the inference cost problem by serving people a smaller model, and you avoid the public backlash for underwhelming performance by not releasing the larger one. And of course, as most of you know, one thing the labs can't do anymore is simply keep training on more data — they have effectively exhausted the high-quality data sources for pre-training, something Elon Musk and Ilya Sutskever have both acknowledged in recent weeks. As the article puts it: we're back at distillation — the author thinks both GPT-4o and Claude 3.5 Sonnet have been distilled down from larger models. The way to the next level of model is no longer just pouring in more data; new innovations are needed, and distillation is probably the main way these models are going to keep getting better.
  • 00:15:58
    So every piece of the puzzle so far suggests that OpenAI is doing what Anthropic did with Opus 3.5: train the model and hide the model, distilling it in the same way and for the same reasons — results that are poor relative to the cost. That's quite the discovery, because Opus 3.5 is still hidden, and we have to ask: where is OpenAI's equivalent model? Is it hiding in the company's basement? Care to venture a name? That model is presumably what OpenAI is currently distilling from.
  • 00:16:25
    What's really interesting as well is the section titled "he who blazes the trail must clear the path." The author says: I started by analyzing Anthropic's Opus 3.5 story because it's the one we have more information about, then traced a bridge to OpenAI through the concept of distillation and explained why the underlying forces pushing Anthropic are also pushing OpenAI. But there's a new obstacle: because OpenAI is the pioneer, they might be facing problems Anthropic simply hasn't hit yet — if you're innovating, you run into problems nobody else has seen, and you have to clear that path yourself.
  • 00:16:54
    One such obstacle is the hardware required to train GPT-5. Sonnet 3.6 is comparable to GPT-4o but was released with a five-month lag; we should assume GPT-5 is on another level — more powerful, bigger, and also more expensive, not only to inference but also to train. We could be talking about a half-billion-dollar training run. Would that even be possible with current hardware? Yes — but the catch is that you couldn't realistically afford inference on that model at scale. Basically, the claim is that these companies are probably still scaling models internally and doing enormous training runs, but the only way they can actually serve the results to the public is to distill those capabilities down into a smaller model. In principle, current hardware is good enough to serve models much bigger than GPT-4: for example, a 50-times scaled-up version of GPT-4, with around 100 trillion parameters, could probably be served at about $3,000 per million output tokens at 10 to 20 tokens per second. However, for that to be viable, those big models would have to unlock a lot of economic value for the customers using them.
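To see why that pricing makes direct public serving unattractive, here is a small illustrative calculation that plugs in the article's hypothetical $3,000-per-million-token figure together with ChatGPT-scale usage; the per-user numbers are assumptions made only for the sake of the example.

    # Illustrative arithmetic only; all inputs are hypothetical or quoted above.
    price_per_million_tokens = 3_000        # USD, the article's figure for a ~100T-parameter model
    tokens_per_response = 500               # assumed average reply length
    weekly_users = 300_000_000              # ChatGPT-scale usage cited in the video
    responses_per_user_per_week = 10        # assumption

    cost_per_response = price_per_million_tokens * tokens_per_response / 1_000_000
    weekly_cost = cost_per_response * weekly_users * responses_per_user_per_week
    print(f"~${cost_per_response:.2f} per response, ~${weekly_cost / 1e9:.1f}B per week")
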
  • 00:18:00
    That, of course, is the kind of reason they don't release it, and it's especially interesting because we know these companies are constantly constrained on inference capacity — they're trying to run a lot of research on top of serving users — so it will be interesting to see what's actually going on behind the scenes, because if they are training a model that big, they could do a lot with it. The one thing I'd say this article doesn't cover is that, from what I remember reading, GPT-4o is an omni model built from the ground up to be multimodal in and multimodal out — trained on audio in and audio out, not just an LLM. I remember going through the full write-up in detail, and the model is capable of a lot of different things. So if they do have a GPT-5-type model, I don't think they've simply distilled it down into GPT-4o yet; I think that kind of model is still coming in the future.
  • 00:18:56
    The article goes on: spending that kind of inference money isn't viable even for Microsoft, Google, or Amazon, because they'd need to unlock enormous economic value to serve a several-trillion-parameter model to the public. So they don't. They train it, they realize it performs better than their current offerings, but they have to accept that it hasn't advanced enough to justify the enormous cost of keeping it running. That's essentially what the Wall Street Journal said about GPT-5 a month ago, and what Bloomberg said about Opus 3.5.
  • 00:19:22
    There's another point the article makes: if OpenAI were hypothetically withholding GPT-5 under the pretext that it's not ready, they would achieve one more thing besides cost control and preventing public backlash — they would sidestep the need to declare whether or not it meets the threshold for being categorized as AGI. As you may know, they have a contract with Microsoft that defines AGI as a system that can generate at least $100 billion in profits. Maybe, if people were able to build wrappers on top of it that got them to that $100 billion, they wouldn't mind triggering the AGI clause and parting ways with Microsoft — so potentially, if they were looking at $100 billion in annual recurring revenue from GPT-5, maybe they wouldn't care.
  • 00:20:09
    Now this is where the theory evolves into something bigger and starts to come together: they might not need us. Even if all of the above were true, no skeptic has stopped to think that OpenAI may have a better internal use case for GPT-5 than whatever they'd get from releasing it externally. There's a vast difference between creating an excellent model and creating an excellent model that can be served cheaply to 300 million people. If you can't, you don't — but also, if you don't need to, you don't.
  • 00:20:33
    And here's the striking part: they were giving us access to their best models because they needed our data, but not so much anymore. They're not chasing our money either — that's Microsoft's concern, not theirs. They want AGI, then they want ASI, and they want a legacy. The argument is that before, they needed to train on user data and figure out how to scale the models; but now that they have GPT-5 — or whatever internal models they're using to distill knowledge down into simple products we use and get decent day-to-day value from — they don't really need to hand us the cutting edge anymore. They don't need to serve those frontier models; they just need to use them themselves to develop better products and better technology. It's a bit like saying that once OpenAI develops its own internal AGI systems, it will simply use them itself to make money. So the picture is: internally, Anthropic's private model is Opus 3.5 and OpenAI privately has GPT-5, and they distill those capabilities into the models we see. I'm not sure about distillation specifically into GPT-4o, but these internal models may well be helping with reinforcement learning for future models and with synthetic data generation — and I have heard that that is the case with o1 and o3.
  • 00:21:42
    The article wraps up: we're nearing the end, and I believe I've laid out enough arguments to make a solid case — OpenAI likely has GPT-5 working internally, just as Anthropic does with Opus 3.5, but it's quite plausible that OpenAI never releases GPT-5 at all. The public now measures performance against o1 and o3, not just GPT-4 or Claude Sonnet 3.5, and with the new test-time scaling laws, the bar GPT-5 has to clear keeps rising. How could they ever release a GPT-5 that truly outshines o1, o3, and the coming o-series models at the pace they're producing them? Besides, they no longer need our money or our data. In other words, when the o-series models are this strong at raw reasoning, why release a GPT-5-level base model at all, when it would be far more expensive for incremental gains over the smaller systems we're already getting?
  • 00:22:34
    As the article puts it, training new base models like GPT-5, GPT-6, and beyond will always make sense for OpenAI internally, but not necessarily as products — that part might be over. The only goal that matters to them from now on is to keep generating better data for the next generation of models. From here on, the base models may operate in the background, empowering other models to achieve feats they couldn't on their own — like an old hermit passing down wisdom from a secret mountain cave, except that cave is a massive data center, and whether or not we ever meet him remains to be seen. So it's quite likely we'll have these internal models — GPT-5, GPT-6, GPT-7 — producing the synthetic data that future models are trained on, especially given that we might never get direct access to them.
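For readers unfamiliar with the "test-time scaling" idea invoked above, below is a minimal sketch of one simple strategy in that family: best-of-N sampling with a scorer. The sample and score functions are hypothetical stand-ins, and this is not a description of how o1 or o3 actually work — just an illustration of spending more compute at inference time to get better answers.

    import random

    def sample(prompt: str) -> str:
        # Stand-in for drawing one candidate answer from a model.
        return f"candidate answer #{random.randint(0, 9)}"

    def score(prompt: str, answer: str) -> float:
        # Stand-in for a verifier or reward model judging the candidate.
        return random.random()

    def best_of_n(prompt: str, n: int = 16) -> str:
        # Spend more inference-time compute (n samples) to pick a better answer.
        candidates = [sample(prompt) for _ in range(n)]
        return max(candidates, key=lambda answer: score(prompt, answer))

    print(best_of_n("What is 17 * 24?"))
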
  • 00:23:21
    So what if GPT-5 does suddenly get released? The article says that even if GPT-5 is eventually released, OpenAI and Anthropic have already initiated a process of recursive self-improvement with humans in the loop. It doesn't really matter what they give us publicly; they're going to keep pulling further and further ahead — like the universe expanding so fast that distant galaxies can no longer reach us. That's probably how they jumped from o1 to o3 in barely three months, and that's how they're going to jump to o4 and o5. It's probably also why they've been so excited on social media: they've implemented a new way to scale, and it works incredibly well.
  • 00:23:54
    The article then touches on something I spoke about ages ago, when I first launched my AI community: even if AGI does arrive, we probably won't get access to it, because the economics simply don't make sense for the average person. The article asks: did you really think approaching AGI would mean gaining access to increasingly powerful AI at your fingertips, that they'd release every advancement for us to use? Surely you don't believe that. They meant it when they said their models would push them too far ahead for anyone else to catch up; each new generation of models is an engine of escape velocity, and from the stratosphere they're already waving goodbye. In other words, every time they build a new model, it becomes harder and harder to catch up to OpenAI, because they have something that can generate synthetic data and keep the whole cycle of increasing intelligence turning.
  • 00:24:43
    Now, recently there was also a very cryptic tweet going viral. It reads, roughly: just got read in on some info — what's happening globally is happening internally at OpenAI, and holy mother of God, I don't even know how to express my feelings without sounding like hype, but I will share this: the innovators are coming; the problem is we don't know how they got there. I will say this isn't a Jimmy Apples kind of post — I haven't really seen other people corroborating this specific statement — but it's not just this one person; I've been seeing this sort of thing time and time again from people at OpenAI saying remarkable things about superintelligence. OpenAI have said on their blog that they're now chasing artificial superintelligence instead of AGI and that they know how to get to AGI, so all of these statements arriving around the same time as this new paradigm isn't honestly surprising.
  • 00:25:25
    As I was making this video, there was actually a tweet exchange about GPT-5. Someone asked, "Can you comment something about GPT-5? We know you won't be able to say anything — just anything at all," and the reply was, "What would you like to know?" Chubby asked about the estimated time of arrival, how much better it will be than GPT-4o, and whether the GPT series will merge with the o-series. The answer: they're still figuring out the estimated time of arrival and the performance, and in 2025 they do plan to merge the o-series and the GPT series. Overall, still a very vague response — not much to go on — but it does make sense, considering they were going to build these models anyway, and we know they haven't released them, so it's quite likely they're using them internally to generate data and do many other things. Just imagine a version of Claude 3.6 that's even better than it is now, and what they could be using that for — we've seen how much Claude 3.6 Sonnet has changed entire industries, with Cursor and coding and what people are able to do with it. So it's super interesting to see where things go from here. With that being said, hopefully you enjoyed this video, and I'll see you in the next one.
Tags
  • GPT-5
  • OpenAI
  • AI development
  • model distillation
  • Anthropic
  • Claude Opus 3.5
  • AI industry
  • internal models
  • technology trends
  • cutting-edge AI