This GPT-5 NEWS Could Change EVERYTHING...
Summary
TLDR: In the video, the speaker examines the speculation surrounding OpenAI's GPT-5, suggesting the model may already exist but is being withheld from public release because it is more valuable internally. They discuss the recent evolution of AI technologies and model distillation, in which powerful models are used to improve the performance of smaller, cheaper models. The video also examines AI industry competition, advances in smaller models, and the strategic reasoning behind delayed releases, ultimately raising questions about the future accessibility of cutting-edge AI systems and the trajectory OpenAI and other companies might take.
Key Takeaways
- 🔍 Rumors suggest GPT-5 is ready but unreleased.
- 🧪 Model distillation enhances performance efficiently.
- 🤔 OpenAI may prioritize internal AI capabilities over public releases.
- 💡 Smaller models outperform larger predecessors through distillation.
- 📉 High operational costs impact AI model availability.
- 🔮 Future AI may increasingly rely on internal improvements.
- 🛡️ Public access to advanced AI may diminish over time.
Timeline
- 00:00:00 - 00:05:00
The video discusses rumors surrounding GPT-5 and its implications for the AI industry. The central claim is that OpenAI may have developed GPT-5 but is keeping it internal due to better return on investment, hinting at undisclosed advancements that could reshape AI applications.
- 00:05:00 - 00:10:00
The video turns to the mysterious absence of Anthropic's Claude Opus 3.5, which was anticipated as a response to GPT-4o, suggesting that frontier models may be used internally rather than released publicly in order to improve other models, reduce costs, and maintain a competitive edge.
- 00:10:00 - 00:15:00
Details on the performance of Claude Opus 3.5 suggest its results were not strong enough to justify a public release. Instead, Anthropic may be using data generated by this model to enhance its existing models, such as Claude Sonnet 3.6, implying a cycle in which models improve incrementally through distillation.
- 00:15:00 - 00:20:00
The concept of model distillation emerges as a crucial strategy, wherein powerful models are utilized to enhance smaller models' capabilities. This allows companies to reduce operational costs while simultaneously improving performance, which might be a common practice among leading AI labs, including OpenAI.
- 00:20:00 - 00:26:31
The discussion concludes with speculation that OpenAI may be refining GPT-5 internally without releasing it, potentially because of high inference costs and the gains available from improving smaller models. There is also a suggestion that AI model development may shift away from ever-larger public models toward smaller, more efficient ones, emphasizing the business strategy behind AI releases.
Mind Map
Video Q&A
What is the GPT-5 rumor about?
The rumor suggests that OpenAI has developed GPT-5 but is keeping it internal for strategic reasons.
How does model distillation work?
Model distillation is a process where a powerful model generates data to enhance the performance of a smaller, cheaper model.
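A minimal sketch of what this could look like in code, assuming the data-generation flavor of distillation described in the video: a stronger teacher model writes text, and a smaller student model is fine-tuned on it. The model names (gpt2, distilgpt2), prompt, and hyperparameters here are illustrative stand-ins, not anything disclosed by the labs.

```python
# Hypothetical sketch of distillation via teacher-generated synthetic data.
# gpt2 / distilgpt2 are illustrative stand-ins, not the labs' actual models.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
teacher = AutoModelForCausalLM.from_pretrained("gpt2").eval()   # large "teacher"
student = AutoModelForCausalLM.from_pretrained("distilgpt2")    # small "student"
optimizer = torch.optim.AdamW(student.parameters(), lr=5e-5)

prompts = ["Explain model distillation in one sentence:"]

for prompt in prompts:
    # 1) The teacher generates the synthetic training text.
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        generated = teacher.generate(**inputs, max_new_tokens=40, do_sample=True)
    synthetic_text = tok.decode(generated[0], skip_special_tokens=True)

    # 2) The student is fine-tuned on the teacher's output with ordinary
    #    next-token cross-entropy (labels = input_ids).
    batch = tok(synthetic_text, return_tensors="pt")
    loss = student(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```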
Why might OpenAI choose not to release GPT-5?
OpenAI may keep GPT-5 internal to leverage its capabilities for distilling knowledge into more user-friendly models and for cost control.
What did Anthropic do with Claude Opus 3.5?
Anthropic allegedly utilized Claude Opus 3.5 internally to improve the performance of Claude Sonnet 3.6.
How does AI industry competition affect model releases?
The lack of necessity to publicize cutting-edge models can lead to a strategic focus on improving internal models rather than external offerings.
What is the significance of reduced model sizes?
The trend shows that newer models can outperform larger predecessors while being smaller and cheaper, thanks to distillation.
What are the implications of AI companies prioritizing internal models?
Companies may focus on using advanced models internally to drive innovation without releasing them publicly.
How does distillation help in AI development?
Distillation allows companies to improve performance without incurring high costs from running larger models.
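As a rough illustration of why this matters, here is a back-of-the-envelope sketch using the rumored parameter counts quoted later in the video (GPT-4 at ~1.8 trillion, GPT-4o at ~200 billion, Claude 3.5 Sonnet at ~400 billion). It assumes serving cost scales roughly linearly with parameter count, which ignores mixture-of-experts sparsity, batching, and hardware details; the figures are illustrative estimates, not confirmed numbers or prices.

```python
# Back-of-the-envelope comparison of relative serving costs, assuming cost per
# generated token scales roughly with parameter count. Figures are the rumored
# estimates quoted in the video, not confirmed numbers.
PARAMS_BILLIONS = {
    "GPT-4 (rumored)": 1800,
    "Claude 3.5 Sonnet (estimate)": 400,
    "GPT-4o (estimate)": 200,
}

BASELINE = "GPT-4 (rumored)"

for name, params in PARAMS_BILLIONS.items():
    relative_cost = params / PARAMS_BILLIONS[BASELINE]
    print(f"{name}: ~{params}B params, ~{relative_cost:.2f}x the serving cost of {BASELINE}")
```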
What future advancements are anticipated in AI technology?
Further enhancements in AI may stem from internal models training smaller public-facing models, continuing a cycle of improvement.
How does this trend of AI model development impact user access?
As companies become more self-sufficient, public access to the latest advancements could diminish.
Watch More of the Video
- 00:00:00so there has been a rumor floating
- 00:00:02around about gbt 5 and in this video
- 00:00:04I'll explain the rumor articulated by
- 00:00:06Alberto Romero in a long winded article
- 00:00:09that dives into the details of where
- 00:00:11exactly GPT 5 is and this isn't just a
- 00:00:14video on GPT 5 it's more about where
- 00:00:16exactly the AI industry is headed as a
- 00:00:18whole because there could be something
- 00:00:20secret that these AI labs are doing that
- 00:00:22you ought to know so you can see right
- 00:00:24here the article starts that this rumor
- 00:00:26is about you know GPT-5 and it does change
- 00:00:28everything and that isn't an over-
- 00:00:30exaggeration it actually does change
- 00:00:32everything when I get into the details
- 00:00:34because it means that the way that these
- 00:00:36models are going to be released is going to
- 00:00:37be completely different so let's take a
- 00:00:40look at what this article says so it
- 00:00:42talks about GPT 5 internally and it says
- 00:00:44that what if I told you that gbt 5 is
- 00:00:47real not just real but already shaping
- 00:00:49the world from where you can't see it so
- 00:00:51this is the hypothesis okay this is the
- 00:00:53entire hypothesis for this entire video
- 00:00:56and credit goes to this user but it says
- 00:00:58that OpenAI has already built GPT-5 but
- 00:01:01is keeping it internal because the
- 00:01:03return on investment is far greater than
- 00:01:05if they released it to millions of chat
- 00:01:07GPT users and also the ROI they're
- 00:01:10getting is not money but something else
- 00:01:13as you see the idea is simple enough but
- 00:01:16the challenge is connecting the
- 00:01:17breadcrumbs that lead up to it and this
- 00:01:19article is a deep dive into why I
- 00:01:20believe it all adds up so overall this
- 00:01:23article is basically say that look
- 00:01:25opening I have created GPT 5 and it is a
- 00:01:28stunning model like the model is truly
- 00:01:30stunning but they've decided not to
- 00:01:32release this model because of a reason
- 00:01:34that we're going to get into into this
- 00:01:36article and that reason is particularly
- 00:01:39interesting so let's go ahead and take a
- 00:01:40look at this so one of the things they
- 00:01:42actually talk about here is they talk
- 00:01:44about the mysterious disappearance of
- 00:01:45Opus 3.5 it says before going into GPT-5 we
- 00:01:48have to pay a visit to its distant
- 00:01:49cousin also missing in action Anthropic's
- 00:01:52Claude Opus 3.5 and as you know the top
- 00:01:55three AI labs OpenAI Google Deep
- 00:01:57Mind and Anthropic offer a range of
- 00:01:59models designed to span the price
- 00:02:01latency versus performance spectrum now
- 00:02:03OpenAI provides options like GPT-4o GPT-
- 00:02:064o mini as well as o1 and o1 mini and
- 00:02:09Google DeepMind offers Gemini Ultra Pro
- 00:02:12and Flash while Anthropic has Claude
- 00:02:14Opus Sonnet and Haiku and basically talking
- 00:02:16about how with all of these different
- 00:02:17models you're trying to cater to as many
- 00:02:19customers as possible some prioritize
- 00:02:21top-tier performance no matter the cost
- 00:02:23while others seek affordable solutions
- 00:02:25which are you know good enough now
- 00:02:26they're talking about the mysterious
- 00:02:27disappearance of this model the Opus 3.5
- 00:02:30model that was meant to be coming in the
- 00:02:32series you see right here it says
- 00:02:33something strange happened in October
- 00:02:352024 everyone was expecting Anthropic to
- 00:02:37announce Claude 3.5 Opus as a response to
- 00:02:40GPT-4o instead they released an updated
- 00:02:43version of Claude 3.5 Sonnet that people
- 00:02:45started calling 3.6 now Opus 3.5
- 00:02:49was nowhere to be found remember this
- 00:02:51was supposed to be like the GPT-5 type
- 00:02:53model and it says this model was nowhere to
- 00:02:55be found seemingly leaving Anthropic
- 00:02:56without a direct competitor to GPT-4o
- 00:02:59but when we take a look at actually what
- 00:03:00happened with Opus 3.5 it's rather
- 00:03:02fascinating so they talk about how you
- 00:03:03know um previously in October he wrote
- 00:03:06in a post that there are
- 00:03:08rumors that Sonnet 3.6 which is you know
- 00:03:10a really good model is just you know a
- 00:03:12checkpoint of a failed training run on
- 00:03:14the much anticipated Opus 3.5 and that
- 00:03:16was because at the time Claude Opus 3.5
- 00:03:19according to the web pages a lot of
- 00:03:21people did think at the time that it was
- 00:03:23scrapped because previously they were
- 00:03:24stating that look claw 3.5 Opus is
- 00:03:26coming it's coming and then they kind of
- 00:03:28just removed the fact that the model was
- 00:03:30going to be coming from a lot of
- 00:03:31different web pages and recently there
- 00:03:33was actually an interview with the CEO
- 00:03:35of anthropic and this stuff about Opus
- 00:03:373.5 is actually all going to relate to
- 00:03:38GPT 5 in a moment but the reason we're
- 00:03:41talking about Opus 3.5 is because that
- 00:03:43is the model that is supposed to be a
- 00:03:45gp5 type model so just prefacing what
- 00:03:48we're about to go into here is that
- 00:03:49we're looking at how Opus 3.5 which is
- 00:03:52supposed to be the GPT-5 type competitor
- 00:03:54how Anthropic have done that and how
- 00:03:57revealing what Anthropic have done
- 00:03:58reveals kind of what OpenAI have done so they talk
- 00:04:01about how you know Dario Amodei basically
- 00:04:02says that you know on Opus 3.5 not
- 00:04:05giving you an exact date but as far
- 00:04:06as we know the plan is still to have a
- 00:04:08Claude 3.5 Opus cautious yet ambiguous yet
- 00:04:12valid well ridiculous timeline question
- 00:04:14uh when is Claude Opus uh 3.5 coming out
- 00:04:18uh not giving you an exact date uh but
- 00:04:20you know they're they uh you know as far
- 00:04:23as we know the plan is still to have a
- 00:04:24Claude 3.5 Opus so right here you can
- 00:04:27see that this is where we talk about the
- 00:04:29fact that you know in Bloomberg there
- 00:04:30were many articles at the time that were
- 00:04:32interestingly enough talking about you
- 00:04:34know open AI Google and anthropic are
- 00:04:36struggling to build more advanced Ai and
- 00:04:38this was going from the jump from GPT 4
- 00:04:39to a GPT 5 type model and this article
- 00:04:41is really fascinating cuz I remember at
- 00:04:43the time I was reading it and it was
- 00:04:44like Gemini 2 was you know disappointing
- 00:04:46they were saying the GPT 5 was
- 00:04:47disappointing and the Opus 3.5 they they
- 00:04:50all performed essentially better but not
- 00:04:52better enough so he talks about here you
- 00:04:54know Bloomberg wrote an article that you
- 00:04:56know after training it Anthropic
- 00:04:57found that 3.5 Opus which is the GPT-5
- 00:05:00competitor performs better on
- 00:05:02evaluations than the older version but
- 00:05:03not by as much as it should given the
- 00:05:05size of the model and how costly it was
- 00:05:07to build and run and it seems that Dario
- 00:05:08refrained from giving a date because
- 00:05:10although the Opus 3.5 training run
- 00:05:11hadn't failed its results were
- 00:05:13underwhelming and basically what
- 00:05:15they were actually stating about this which
- 00:05:16a lot of people missed was the fact
- 00:05:18that these next iteration models GPT-5
- 00:05:20Gemini 2 those models were you know good
- 00:05:23but the only bad thing about them was
- 00:05:25that they were just quite expensive for
- 00:05:27what they were and TR trust me you want
- 00:05:28to remember this fact because it you
- 00:05:30know it ties into you know a part of why
- 00:05:32they didn't release these models now
- 00:05:33here's where we get some Insider
- 00:05:35information SL some leaks and it says on
- 00:05:37December 11th semiconductor expert Dylan
- 00:05:39Patel and his semi analysis team
- 00:05:41delivered the final plot twist
- 00:05:43presenting an explanation that weaves
- 00:05:45all the data points into a coherent
- 00:05:46story says anthropic finished training
- 00:05:49claw 3.5 Opus and it did perform well
- 00:05:52with it scaling appropriately yet for
- 00:05:53some reason anthropic didn't release it
- 00:05:55so anthropic managed to get their GPT 5
- 00:05:58model performing well doing very well
- 00:06:00but apparently they didn't release it
- 00:06:01and he says this is because instead of
- 00:06:03releasing it publicly and this is where
- 00:06:04we get to the crazy theories that
- 00:06:06Anthropic used Claude 3.5 Opus to generate
- 00:06:09synthetic data and for reward modeling
- 00:06:12to improve Claude 3.5 Sonnet
- 00:06:15significantly alongside user data now
- 00:06:18that's a crazy crazy statement but this
- 00:06:20is from a reputable Source within the
- 00:06:22industry there's previously spoken about
- 00:06:24things before they've been released and
- 00:06:25later it's come out that that
- 00:06:27information has been true and this is
- 00:06:31basically one of the key theories
- 00:06:33driving behind why many people think
- 00:06:35that GPT 5 is Alive and Well internally
- 00:06:38but isn't currently yet being released
- 00:06:40to the public and I mean it kind of
- 00:06:42makes sense when we actually take a look
- 00:06:43at more things because we all know just
- 00:06:45how good Claude 3.6 Sonnet is or Claude 3.5
- 00:06:48Sonnet is whatever you want to call it
- 00:06:49but many people can't figure out why is
- 00:06:51Claude 3.5 Sonnet so good or Claude 3.6 Sonnet
- 00:06:54so good but they haven't released Claude
- 00:06:56Opus just yet and so you see here it
- 00:06:57says in short Anthropic did train Claude 3.5
- 00:07:00Opus they dropped the name because it
- 00:07:02wasn't good enough and Dario confident a
- 00:07:04different training run could improve the
- 00:07:05results avoided you know giving a date
- 00:07:08and Bloomberg confirmed that the results
- 00:07:10were better than existing models but not
- 00:07:12enough to justify the inference cost
- 00:07:13which is of course how much these models
- 00:07:15cost to run when people are using the
- 00:07:17models and it says here that Dylan and
- 00:07:19his team uncovered the link between the
- 00:07:21mysterious Sonet 3.6 and the missing
- 00:07:23Opus 3.5 the latter was being used
- 00:07:25internally to generate synthetic data to
- 00:07:27boost the former's performance and so
- 00:07:29this is where we have this kind of thing
- 00:07:31right here and trust me it's going to
- 00:07:32link to GPT 5 in a moment but we can see
- 00:07:34here that this GPT-5 level model from
- 00:07:36Anthropic one of the biggest rivals
- 00:07:37to you know one of the leading Labs we
- 00:07:39can see that that model right there is
- 00:07:41you know going down into Sonnet 3.6 and
- 00:07:44this is basically where um you know
- 00:07:45they're distilling the model so
- 00:07:47essentially you could say that they
- 00:07:48internally built a really really smart
- 00:07:50AI system but they probably haven't
- 00:07:52released it to the public just yet and
- 00:07:53we can see right here that you know
- 00:07:54publicly Sonic 3.6 that's what's going
- 00:07:56on but um yeah it's pretty crazy now it
- 00:07:59gets crazier here so this is where they
- 00:08:00start to explain model distillation this
- 00:08:02is what they say better but also smaller
- 00:08:03and cheaper the process of using a
- 00:08:05powerful expensive model to generate
- 00:08:07data that enhances the performance of a
- 00:08:09slightly less capable model is known as
- 00:08:11distillation and it's a common practice
- 00:08:13where this technique allows AI labs to
- 00:08:15improve their smaller models beyond what
- 00:08:17could be achieved through additional
- 00:08:18pre-training alone so one way that a lot
- 00:08:20of these companies are managing to get
- 00:08:22these smaller models to be better and
- 00:08:23better is through model distillation so
- 00:08:25you have a teacher model then you
- 00:08:27distill that knowledge and you transfer
- 00:08:29to a student model and this is something
- 00:08:31that just performs a lot better than the
- 00:08:33standard you know pre-training and that
- 00:08:35entire Paradigm that we used to do so
- 00:08:37this is something that has been you know
- 00:08:38going pretty well for AI companies and
- 00:08:40it's going to be something that we
- 00:08:41continue to do now you can see right
- 00:08:43here that it says that there are various
- 00:08:44approaches to distillation but we're not
- 00:08:46getting into them what you need to
- 00:08:47remember is that a strong model acting
- 00:08:49as a teacher turns student models from
- 00:08:51small cheap and fast and weak into small
- 00:08:54cheap fast and Powerful so distillation
- 00:08:56turns a strong model into a gold mine
- 00:08:58and Dylan explains why it made sense for
- 00:09:00anthropic to do this with Opus 3.5 and
- 00:09:02Sonet 3.6 so it talks about how the
- 00:09:04inference costs of the new Sonnet versus
- 00:09:06the old Sonnet they didn't change
- 00:09:08drastically but the model's performance
- 00:09:10did why release Opus 3.5 on a cost basis
- 00:09:13when it does not make economic sense to
- 00:09:14do so relative to releasing a 3.5 Sonnet
- 00:09:17with further post-training from said 3.5
- 00:09:20Opus basically saying that look there's
- 00:09:22no point us going ahead and releasing
- 00:09:23Opus 3.5 when it's so expensive and
- 00:09:25costly to run why don't we just you know
- 00:09:27train slash distill a lot of the
- 00:09:29capabilities from this Opus model which
- 00:09:31is so amazing into a 3.6 Sonnet model
- 00:09:34that they would love anyways so this is
- 00:09:36super super fascinating here and of
- 00:09:38course this is like what I said already
- 00:09:40anthropic chose not to release this you
- 00:09:41know not because of the poor results but
- 00:09:43because it's just more valuable
- 00:09:44internally for them to use so they
- 00:09:46basically use this AI to train the next
- 00:09:48set of AIS and this is why this is kind
- 00:09:50of changing everything because a lot of
- 00:09:51people have speculated for quite some
- 00:09:53time that this is going to be the moment
- 00:09:55where these AI systems are going to just
- 00:09:58keep getting better and better better
- 00:09:59because you use one to train the next
- 00:10:01version and then use that version to
- 00:10:02train next version and then you know so
- 00:10:04far you're just going to you know
- 00:10:06continue to expand in terms of the
- 00:10:07intelligence so what's crazy about this
- 00:10:09is that you know he also says that you
- 00:10:11know Dylan Guy the one that we're
- 00:10:12referencing here he says that that is
- 00:10:14why the open source Community caught up
- 00:10:16to GPT 4 so quickly they were basically
- 00:10:18taking the gold straight from OpenAI's
- 00:10:19mine so people were essentially
- 00:10:21distilling GPT 4's capabilities into
- 00:10:24smaller models and that's you know
- 00:10:25pretty interesting because GPT 40 is a
- 00:10:28smaller model it's much faster and it's
- 00:10:29much better but it's also really really
- 00:10:31useful and that's essentially what these
- 00:10:33other companies are doing as well
- 00:10:34they're using those capabilities and
- 00:10:36that's how they're getting these open
- 00:10:37source projects up to GPT 4's level so
- 00:10:40you can see here that you know one of
- 00:10:41the things that we all know is that
- 00:10:42Sonnet 3.6 wasn't just good it was state-of-
- 00:10:45the-art good and better than GPT-4o and
- 00:10:47of course Anthropic's mid-tier model
- 00:10:49outperformed OpenAI's flagship
- 00:10:50models thanks to distillation from Opus
- 00:10:533.5 and of course other reasons as well
- 00:10:55so that was something that was like
- 00:10:56pretty crazy and then this is where we
- 00:10:58talk about you know bigger and better is
- 00:11:00no longer the Paradigm so once these top
- 00:11:02labs are you know no longer talking
- 00:11:03about higher parameter counts being you
- 00:11:05know better and the last time we got
- 00:11:07knowledge about the parameter counts we
- 00:11:08actually knew that GPT 3.5 was 175
- 00:11:11billion parameters and GPT 4 there were
- 00:11:13rumors saying that GPT 4 was 1.8
- 00:11:15trillion parameters in a mixture of
- 00:11:17experts but the craziest thing about
- 00:11:18this is that they actually now speculate
- 00:11:21that the future models like GPT-5 and
- 00:11:22Sonnet 3.6 or like whatever distilled
- 00:11:24models that we're getting like GPT-4o and
- 00:11:26Sonnet 3.6 are significantly smaller than
- 00:11:28GPT-4 despite them both being better
- 00:11:30than GPT-4 across benchmarks so GPT-
- 00:11:334 is 1.8 trillion parameters that's what
- 00:11:35I'm trying to explain to you guys GPT-4
- 00:11:361.8 trillion parameters great model at
- 00:11:38the time of release I think around two
- 00:11:40years ago or one year and a half
- 00:11:41ago and at the time that model was
- 00:11:43considered crazy straight up but now we
- 00:11:45got smaller models like GPT-4o and Sonnet
- 00:11:483.6 that are significantly smaller than
- 00:11:50this large model but they are better
- 00:11:52because of distillation and that just
- 00:11:54means that the knowledge in them is a
- 00:11:55lot more efficient and the models are a
- 00:11:58lot smarter so you see right here
- 00:11:59that it says you know current models
- 00:12:00such as the original GPT-4o are probably
- 00:12:03an order of magnitude smaller than GPT-4
- 00:12:06with 4o having around 200 billion and
- 00:12:083.5 Sonnet having around 400 billion
- 00:12:09parameters though this estimate could be
- 00:12:11off by a factor of two given the rough
- 00:12:13way I've arrived at it and the point here
- 00:12:14is that like the thing that you want to
- 00:12:16pay attention to is that both these
- 00:12:18companies are following a similar
- 00:12:19trajectory their latest models are not
- 00:12:21only better but also smaller and cheaper
- 00:12:23than the previous generation we know how
- 00:12:25anthropic pulled it off by distilling
- 00:12:27Opus 3.5 into Sonet 3.6 but this is
- 00:12:30where they get into something
- 00:12:31interesting so what did OpenAI do
- 00:12:33because for Anthropic we know that they
- 00:12:35trained Opus 3.5 and put the knowledge of
- 00:12:37that model distilled it down into Sonnet
- 00:12:393.6 what on Earth did OpenAI do and
- 00:12:42this is where the GPT 5 Rumor comes into
- 00:12:44account so we can take a look at this
- 00:12:45diagram and basically shows that you
- 00:12:46know we have the teacher models and we
- 00:12:49have the distillation models and it's
- 00:12:50basically saying that maybe OpenAI
- 00:12:52has a very secret internal model it
- 00:12:54might be GPT-5 it might be a much
- 00:12:56smarter model considering you know all
- 00:12:58the things we've heard recently but we
- 00:12:59can see here that these models are going
- 00:13:01to be models that are very smart but
- 00:13:02they're only used for distillation to
- 00:13:05public models that are better smaller
- 00:13:06and cheaper to actually run and this is
- 00:13:09pretty crazy when you think about it so
- 00:13:11internally at anthropic it's quite
- 00:13:12likely they have Opus 3.5 which would be
- 00:13:14incredible to use but they managed to
- 00:13:16distill it down into Sonnet 3.6 and Sonnet
- 00:13:183.6 as you already know is pretty
- 00:13:20incredible and this is where they're
- 00:13:21saying look with OpenAI it's
- 00:13:23potentially true that we have a GPT-5
- 00:13:26type model that's distilling all of this
- 00:13:28information into these you know
- 00:13:29models that we're getting now these much
- 00:13:31smaller ones that we're currently using
- 00:13:32and I do know that there was some kind
- 00:13:33of distillation going on with the
- 00:13:35strawberry model so OpenAI are
- 00:13:36definitely familiar with that
- 00:13:37distillation process it's quite like how
- 00:13:39we had the o1 model and we had the o1
- 00:13:41preview models and how effective those
- 00:13:43ones were now when we get onto chapter 3
- 00:13:45it says that one might assume the
- 00:13:46anthropic distillation approach was
- 00:13:47driven by unique circumstances an
- 00:13:49underwhelming training run for Opus 3.5
- 00:13:52but that is something that all of these
- 00:13:54companies have experienced and the key
- 00:13:55thing about this is that like the news
- 00:13:57kind of flipped on everyone's head
- 00:13:58because everyone took the wrong thing
- 00:14:00away from the news they said they had
- 00:14:02subpar training runs but subpar it
- 00:14:04doesn't mean that it was worse because
- 00:14:06think about it like this okay like you can
- 00:14:08have something that's subpar but it
- 00:14:09doesn't mean that it was worse than you
- 00:14:10did before it just means that it didn't
- 00:14:11live up to your expectations like
- 00:14:13imagine you know you're expecting to win
- 00:14:14$10,000 in a competition and you only
- 00:14:17won $1,000 more than you usually win
- 00:14:19like that's probably subpar but it
- 00:14:21doesn't mean that it wasn't an
- 00:14:22improvement and you have to remember
- 00:14:23that like with intelligence every inch
- 00:14:25that you gain unlocks a whole new host
- 00:14:27of capability so they say say that the
- 00:14:29causes for this don't matter to us
- 00:14:31diminishing returns for lack of data
- 00:14:32whatever the case is you know it doesn't
- 00:14:34really matter because all of these
- 00:14:35companies are going through at the same
- 00:14:36time and basically this is why they talk
- 00:14:38about you know one of the key things for
- 00:14:39these companies that most people don't
- 00:14:40understand is that you know if 300
- 00:14:42million people are using your product
- 00:14:43weekly the operational expenditures can
- 00:14:45you know suddenly kill your company
- 00:14:47which is really important like costs
- 00:14:49matter costs really do matter for these
- 00:14:51companies because there are so many
- 00:14:52people using them there's so many
- 00:14:53servers and you know this is something
- 00:14:55that continually is expanding week on
- 00:14:56week so it's pretty hard to keep up with
- 00:14:59said demand and so this is why they talk
- 00:15:00about how distillation was really good
- 00:15:02because whatever drove anthropic to
- 00:15:03distill Sonnet 3.6 from Opus 3.5 is
- 00:15:06affecting OpenAI several times over
- 00:15:09and talks about you know it basically
- 00:15:10just talks about here how distillation
- 00:15:11works because it Bridges these two
- 00:15:13Universal challenges into an advantage
- 00:15:15you solve the inference cost Problem by
- 00:15:16serving people a smaller model and avoid
- 00:15:18the public backlash for underwhelming
- 00:15:19performance by not releasing a larger
- 00:15:21one now of course as most of you guys
- 00:15:23might know one of the things that you
- 00:15:25can't do anymore is of course
- 00:15:26overtraining so these AI Labs have
- 00:15:28actually exhausted all the high quality
- 00:15:29data sources for pre-training and this
- 00:15:31is actually something that Elon Musk and
- 00:15:32Ilya Sutskever have admitted in recent weeks
- 00:15:35he says we're back at distillation I
- 00:15:37think that both GPT-4o and Claude 3.5
- 00:15:39Sonnet have been distilled down from
- 00:15:41larger models so this is something that
- 00:15:42you know they both actually talk about
- 00:15:44as a realistic thing to the point where
- 00:15:46like they're saying that the way we
- 00:15:48now get to a next level of model is that
- 00:15:50we can't just put more and more data in
- 00:15:51new Innovations are needed and the basic
- 00:15:53thing is that look distillation is probably
- 00:15:55the only way that these models are going
- 00:15:57to be getting better so you can see here
- 00:15:58it says every piece of the puzzle so far
- 00:16:00suggests that OpenAI is doing what
- 00:16:01Anthropic did with Opus 3.5 train the
- 00:16:04model and hide the model in the same way
- 00:16:06through distillation and for the same
- 00:16:08reasons because of course there are poor
- 00:16:09results in terms of the cost and that's
- 00:16:12quite the discovery because Opus 3.5 is
- 00:16:14still hidden but we have to think about
- 00:16:16it where is OpenAI's analogous
- 00:16:18model is it hiding in the company's
- 00:16:19basement care to venture a name and this
- 00:16:22is why OpenAI are currently distilling
- 00:16:23that model now what's really interesting
- 00:16:25as well is that they talk about how he
- 00:16:26who blazes the trail must clear the path
- 00:16:28so he said I started this by
- 00:16:30analyzing Anthropic's Opus 3.5 story
- 00:16:32because it's the one where we have more
- 00:16:33information then I traced a bridge to
- 00:16:35OpenAI with the concept of distillation
- 00:16:37and explained why the underlying forces
- 00:16:39pushing Anthropic are also pushing Open
- 00:16:41AI but there's a new obstacle in theory
- 00:16:44because OpenAI is the pioneer they might
- 00:16:46be facing obstacles that Anthropic
- 00:16:48simply haven't found yet because if
- 00:16:49you're innovating you're going to you
- 00:16:51know face problems that nobody else has
- 00:16:52seen yet so you need to clear that path
- 00:16:54and it says here that one obstacle is
- 00:16:55the hardware requirements to train GPT 5
- 00:16:58so Sonnet 3.6 is comparable to GPT-4o but
- 00:17:01it was released with a five-month lag we
- 00:17:03should assume GPT 5 is on another level
- 00:17:05more powerful and bigger also more
- 00:17:07expensive not only to inference but also
- 00:17:09to train we could be talking about a
- 00:17:11half billion dollar training run would
- 00:17:13it even be possible to do such a thing
- 00:17:15with current hardware and yes that would
- 00:17:17be possible but the crazy thing about
- 00:17:19this is that you know you wouldn't be
- 00:17:20able to serve inference for that model so
- 00:17:22basically what they stating here is that
- 00:17:23these companies are probably still
- 00:17:25scaling the models internally and doing
- 00:17:27insane training runs but the only way
- 00:17:29that they can actually you know provide
- 00:17:31the inference you know to get these
- 00:17:32models out into the public is to distill
- 00:17:34those capabilities down into a smaller
- 00:17:36model and it says you know in principle
- 00:17:38our current hardware is good enough to
- 00:17:39serve models much bigger than GPT 4 for
- 00:17:42example a 50 times scaled up version of
- 00:17:44GPT 4 having around 100 trillion
- 00:17:46parameters could probably be served at
- 00:17:49$3,000 per million token and 10 to 20
- 00:17:51tokens per second of output speed
- 00:17:53however for this to be viable those big
- 00:17:55models would have to unlock a lot of
- 00:17:56economic value for the customers using
- 00:17:58them so so of course this is the kind of
- 00:18:00reason why they don't release it and
- 00:18:01this is super interesting because we
- 00:18:02know that these companies are always
- 00:18:04currently struggling for inference CU
- 00:18:05they're trying to do a lot of research
- 00:18:06and all these kind of things but it will
- 00:18:08be interesting to see what is actually
- 00:18:09going on behind the scenes here because
- 00:18:11if they are training up a model that is
- 00:18:12that big then they could do a lot of
- 00:18:15things with that but the only thing I
- 00:18:16would say that this article doesn't
- 00:18:17cover at the moment is the fact that GPT-
- 00:18:194o I remember reading a few posts about
- 00:18:21this and I do remember reading that GPT-
- 00:18:234o is a model that is an omni model that's
- 00:18:26built from the ground up to be
- 00:18:27multimodal in and multimodal out so I'm
- 00:18:30not sure like it was just an llm but
- 00:18:32maybe there are more details on that but
- 00:18:34I do remember reading about GPT-4o being
- 00:18:36like this Omni model so it was trained
- 00:18:38on audio in audio out and I do remember
- 00:18:39like reading the you know research paper
- 00:18:41the entire full thing where I did you
- 00:18:43know 30 40 minute you know details going
- 00:18:45into it where you can actually see that
- 00:18:46the model is capable of a lot of
- 00:18:48different things so I do think that if
- 00:18:50they do have a GPT-5 type model I don't
- 00:18:52think they've just distilled it down into GPT-4o
- 00:18:54yet I think that model is definitely
- 00:18:55going to be coming in the future and
- 00:18:56they talk about you know spending that
- 00:18:58kind of inference money is not even
- 00:18:59justifiable for Microsoft Google or Amazon
- 00:19:01because of course they need to unlock a
- 00:19:03lot of economic value if they plan to
- 00:19:04serve this several trillion parameter
- 00:19:06model to the public so they don't so
- 00:19:08they train it they realize it performs
- 00:19:09better than their current offerings but
- 00:19:11they have to accept it as it hasn't
- 00:19:12Advanced enough to justify the enormous
- 00:19:15cost of keeping it running and that's
- 00:19:16essentially what the Wall Street you
- 00:19:17know Journal said on gbt 5 a month ago
- 00:19:20and what Bloomberg said about Opus 3.5
- 00:19:22now the other thing that they said if
- 00:19:23OpenAI were hypothetically
- 00:19:24withholding GPT-5 under the pretext that it's
- 00:19:26not ready they would achieve one more
- 00:19:28thing besides cost control and
- 00:19:29preventing public backlash so if OpenAI
- 00:19:31were hypothetically withholding GPT-5
- 00:19:33under the pretext that it's not ready
- 00:19:35they would achieve one more thing
- 00:19:36besides the cost control and preventing
- 00:19:38the public backlash they actually
- 00:19:40sidestep the need to declare whether or
- 00:19:42not it meets the threshold for being
- 00:19:43categorized as AGI as you know they have
- 00:19:45a contract with Microsoft that says you
- 00:19:47know AGI is a system that can generate
- 00:19:49at least $100 billion in profits maybe Microsoft
- 00:19:52would say that if people are able to
- 00:19:53build wrappers out of that and that's
- 00:19:55able to get them to 100 billion maybe
- 00:19:57they wouldn't mind triggering the AGI
- 00:19:58clause and parting ways with Microsoft
- 00:20:01so you know potentially if they were
- 00:20:02looking at 100 billion in annual
- 00:20:04recurring revenue from gbt 5 maybe they
- 00:20:07wouldn't care now this is where the
- 00:20:09theory starts to evolve into something
- 00:20:10really crazy and it starts to finalize
- 00:20:12so this is where the theory starts to
- 00:20:14basically say that look they might not
- 00:20:15need us so even if that were true no
- 00:20:17skeptic has stopped to think that OpenAI
- 00:20:19may have a better internal use case than
- 00:20:21whatever they'd get from it externally
- 00:20:23there are vast differences between
- 00:20:24creating an excellent model and creating
- 00:20:26an excellent model that can be served
- 00:20:27cheaply to 300 million people if you
- 00:20:29can't you don't but also if you don't
- 00:20:32need to you don't now it says here and
- 00:20:33this is crazy okay that they were giving
- 00:20:35us access to their best model because
- 00:20:37they needed our data but not so much
- 00:20:39anymore they're not chasing our money
- 00:20:40either that's Microsoft but not them
- 00:20:43they want AGI and then they want ASI
- 00:20:45and they want a legacy they're basically
- 00:20:47stating that look before they needed to
- 00:20:48train on user data and they needed to
- 00:20:50figure out where to scale the model but
- 00:20:52now that they have gbt 5 and whatever
- 00:20:53internal models they're using to distill
- 00:20:55down the knowledge into simple products
- 00:20:57that we can use and get a decent amount
- 00:20:59of value from on a day-to-day basis they
- 00:21:01don't really need to provide us with the
- 00:21:02Cutting Edge pieces of Technology
- 00:21:04anymore simply because they don't need
- 00:21:06to serve them anymore they just need to
- 00:21:08use them themselves in order to develop
- 00:21:09better products and better technology
- 00:21:11and that's kind of like saying that you
- 00:21:13know when open AI develops their own
- 00:21:15internal AGI systems then they're just
- 00:21:16going to use that themselves in order to
- 00:21:18actually make money so this is where you
- 00:21:20can see here that this is what they're
- 00:21:21talking about internally their private
- 00:21:23models are Opus 3.5 and privately they
- 00:21:25have GPT-5 and they distill those
- 00:21:27capabilities into these models I'm not
- 00:21:29sure about the distilling into GPT-4o
- 00:21:32but maybe it's definitely being used to
- 00:21:34help with some of the reinforcement
- 00:21:35learning for future models and some of
- 00:21:37the synthetic data generation and I have
- 00:21:39heard that that is the case with o1/o3
- 00:21:42and you can see right here it says we're
- 00:21:43nearing the end I believe I've laid out
- 00:21:45enough arguments to make a solid case
- 00:21:47OpenAI likely has GPT-5 working
- 00:21:49internally just as Anthropic does with
- 00:21:51Opus 3.5 but it's quite plausible that
- 00:21:53OpenAI never releases GPT-5 at all the
- 00:21:56public now measures performance against o1
- 00:21:58/o3 not just GPT-4o or Claude Sonnet 3.5
- 00:22:02now with the new test-time scaling laws
- 00:22:04the bar for GPT-5 to clear keeps rising
- 00:22:07how could they ever release a GPT-5 that
- 00:22:09truly outshines o1 and o3 and the coming
- 00:22:12o series models at the pace they're
- 00:22:14producing them besides that they no
- 00:22:16longer need our money or our data
- 00:22:18anymore basically saying that look when
- 00:22:19we take a look at how the fact that
- 00:22:21these 01 series models are just so
- 00:22:22incredible when it comes to the raw
- 00:22:24capabilities of reasoning why on Earth
- 00:22:26would they release a GPT 5 level model
- 00:22:27at all when those smaller models that
- 00:22:29we're currently getting with gbt 4 type
- 00:22:31systems are just a lot more expensive
- 00:22:33for incremental gains so you can see
- 00:22:34right here it says training new base
- 00:22:36models like GPT-5 GPT-6 and beyond will
- 00:22:38always make sense for OpenAI internally
- 00:22:40but not necessarily as products that
- 00:22:42part might be over the only goal that
- 00:22:45matters to them from now on is to keep
- 00:22:47generating better data for the next
- 00:22:49generation of models from here on the
- 00:22:52base models May operate in the
- 00:22:53background empowering other models to
- 00:22:55achieve Feats that they couldn't on
- 00:22:56their own like an old hermit passing down
- 00:22:58wisdom from a secret Mountain cave
- 00:23:00except that cave is a massive Data
- 00:23:02Center and whether or not we meet him
- 00:23:04is you know something we're going to
- 00:23:05have to see so it's quite likely that
- 00:23:07maybe we're going to have these internal
- 00:23:09models GPT-6 GPT-7 GPT-5 producing the
- 00:23:12synthetic data for these future models
- 00:23:14to be trained on it could definitely be
- 00:23:15the case considering the fact that it's
- 00:23:17quite likely that we might not get these
- 00:23:19models and so what about if you know GPT-
- 00:23:215 suddenly gets released they basically
- 00:23:22say that even if GPT-5 is eventually
- 00:23:24released OpenAI and Anthropic have
- 00:23:26already initiated the process of recursive
- 00:23:28self-improvement with humans in the
- 00:23:30loop and it doesn't really matter what
- 00:23:32they give us publicly they're going to
- 00:23:33be pulling further and further ahead
- 00:23:35like the universe expanding so fast that
- 00:23:37distant galaxies can no longer reach us
- 00:23:39and that's probably how they jumped from
- 00:23:41o1 to o3 in barely 3 months and that's
- 00:23:44how they're going to jump to o4 to o5 so
- 00:23:47it's probably why they've been so
- 00:23:48excited on social media because they've
- 00:23:50implemented a new way to scale
- 00:23:53incredibly so you can see right here
- 00:23:54they actually talk about something that
- 00:23:55I actually spoke about quite a long time
- 00:23:57before like ages ago in I first launched
- 00:23:58my ever AI Community I spoke up about
- 00:24:00the fact that you know even if AGI does
- 00:24:02arrive we probably won't get access to
- 00:24:04it because the economic value for the
- 00:24:05average person just simply doesn't make
- 00:24:07sense and you can see right here that
- 00:24:08they state did you really think
- 00:24:09approaching AGI would mean gaining
- 00:24:12access to increasingly powerful AI at
- 00:24:13your fingertips that they'd release
- 00:24:15every advancement for us to use surely
- 00:24:17you don't believe that they meant it
- 00:24:18when they said their models would push
- 00:24:20them too far ahead for anyone else to
- 00:24:22catch up and each new generation model
- 00:24:24is an engine of escape velocity from the
- 00:24:26stratosphere they're already w goodbye
- 00:24:28and they're basically saying that look
- 00:24:30with every time that they make a new
- 00:24:31model it's going to become harder and
- 00:24:33harder to catch up to open a ey because
- 00:24:34they have something else that can
- 00:24:36generate synthetic data and that can
- 00:24:38also help them further the entire cycle
- 00:24:40of increasing intelligence within those
- 00:24:42models now recently there was also this
- 00:24:43very cryptic tweet that has been going
- 00:24:45viral on Twitter and it says just got
- 00:24:47read in to some info what's happening
- 00:24:48globally is happening internally at OpenAI
- 00:24:51and holy mother of God I don't even know
- 00:24:53how to express my feelings without
- 00:24:54sounding like hype but I don't know what
- 00:24:57to say but I will share this the
- 00:24:59innovators are coming the problem is we
- 00:25:01don't know how they got there now I will
- 00:25:02say this isn't like a Jimmy apples kind
- 00:25:04of post I haven't really seen any people
- 00:25:05quoting this statement but it's not just this
- 00:25:08person that has been stating this thing I've
- 00:25:09been seeing time and time again from
- 00:25:10people at open AI stating crazy things
- 00:25:12about super intelligence open AI on
- 00:25:14their blog have said that you know
- 00:25:15they're now chasing artificial super
- 00:25:17intelligence instead of AGI and they
- 00:25:18know exactly how to get to AGI so all of
- 00:25:20these statements coming around the same
- 00:25:22time at this new paradigm isn't honestly
- 00:25:24surprising and as I was making this
- 00:25:25video there was actually a tweet about
- 00:25:27GPT 5 you can see someone said can you
- 00:25:29comment something about GPT 5 we know
- 00:25:31you won't be able to say anything just
- 00:25:33anything at all and he respond saying
- 00:25:35what would you like to know and chubby
- 00:25:37comments back saying when any time in
- 00:25:39terms of the estimate of time arrival
- 00:25:41and of course the performance how much
- 00:25:42better is this going to be than gbt 40
- 00:25:44and will this GPT series merge with the
- 00:25:46O Series and he says he's still figuring
- 00:25:48out when the estimated type of arrival
- 00:25:49is and of course the performance and in
- 00:25:512025 they actually talk about merging
- 00:25:53the 01 series and the GPT Series so
- 00:25:55overall still a very vague response not
- 00:25:57much to for there but it definitely does
- 00:26:00make sense considering the fact that
- 00:26:02they did create these models they were
- 00:26:03going to make them anyways and of course
- 00:26:05we know that they haven't released them
- 00:26:07so it's quite likely that they are using
- 00:26:08them internally to generate data and do
- 00:26:10many other things just imagine a version
- 00:26:12of Claude 3.6 that's even better than it
- 00:26:14is now imagine what they could be using
- 00:26:16that for and we've seen literally how
- 00:26:18much Claude 3.6 Sonnet has changed entire
- 00:26:21industries in terms of like Cursor and
- 00:26:23coding and what people are able to do
- 00:26:24with that so it's super super
- 00:26:25interesting to see where things go from
- 00:26:27here with that being said hopefully you
- 00:26:29guys enjoyed this video and I'll see you
- 00:26:30in the next one
- GPT-5
- OpenAI
- AI development
- model distillation
- Anthropic
- Claude Opus 3.5
- AI industry
- internal models
- technology trends
- cutting-edge AI