AI Has a Fatal Flaw—And Nobody Can Fix It

00:18:01
https://www.youtube.com/watch?v=_IOh0S_L3C4

Summary

TL;DR: The video examines the potential and limitations of artificial intelligence, focusing on the mathematical boundaries that keep AI models from getting much smarter. It emphasizes that while AI can outperform humans at many tasks, it still lacks true understanding and reasoning. This is illustrated through discussions of how models are trained, particularly the enormous number of parameters they employ, and how these models are hitting a wall of diminishing returns as they scale. The video also highlights challenges in real-world applications, such as common-sense reasoning and advanced math, and argues that true artificial general intelligence faces significant hurdles that have yet to be overcome.

Key Takeaways

  • 💰 The video opens with a "$100 million line": the scaling curve behind models that cost that much to train.
  • 🤖 Current AI lacks true intelligence despite outperforming humans in specific tasks.
  • 📉 AI faces diminishing returns when scaled beyond a certain point.
  • 🔐 Training data limitations restrict the abilities of AI models.
  • 🤔 Common sense reasoning remains a challenge for AI.
  • 🔗 Chain of Thought illustrates AI's step-by-step thinking process.
  • 🎓 GPT models excel at word prediction but struggle with mathematical reasoning.
  • 🚫 AI lacks the capability for creativity and real-world decision-making.
  • 📈 More efficient AI models may emerge based on industry innovations.
  • 🤷‍♂️ The question of whether AI can think depends on how we define thinking.

Timeline

  • 00:00:00 - 00:05:00

    This segment introduces the concept of a significant limitation in artificial intelligence, represented by a crucial mathematical equation that suggests inherent boundaries in AI's intelligence. Although current computers excel in specific tasks like math and data recall, they're not genuinely 'intelligent.' The segment alludes to the public's fascination with rapid advancements in AI technology, particularly emphasizing that the assumption of AI models becoming exponentially smarter is misguided; the reality is, current models face challenges that hinder this growth.

  • 00:05:00 - 00:10:00

    Here, the video explains how AI models function, particularly focusing on the mechanism of predicting words and how they leverage parameters during training. The segment also highlights GPT-3 and its reliance on a massive number of parameters to predict language efficiently, while touching on the embedding process, where words are mapped into a vastly complex multi-dimensional space. This lays a foundation for understanding why these AI models, despite their capabilities, struggle with more advanced tasks like mathematics due to their inherent design limits.

  • 00:10:00 - 00:18:01

    The final part discusses the implications of AI advancements on future technologies and everyday tasks, touching on the diminishing returns of scaling AI models further. It explores how current models struggle with common sense and real-world reasoning, acknowledging that while AI shows potential, we are still far from achieving true human-like cognitive abilities. The segment concludes by posing philosophical questions about AI's future and our perceived limitations, suggesting a continuous evolution in the relationship between humans and computers.

Video Q&A

  • What is the primary limitation of current AI models?

    Current AI models face a limit on how intelligent they can become, due to mathematical and data constraints.

  • What distinguishes current AI from artificial general intelligence (AGI)?

    Current AI excels in specific tasks but lacks true understanding and reasoning capabilities that characterize AGI.

  • Why do current AI models struggle with mathematics?

    AI models like GPT predict outcomes based on patterns in data rather than understanding mathematical principles.

  • What is the estimated cost to train advanced AI models?

    Training advanced AI models can cost around $100 million or more.

  • What is the impact of parameter counts in AI models?

    Higher parameter counts can improve performance to a point, but diminishing returns are observed after a certain scale.

  • What does the term 'Chain of Thought' refer to in AI?

    'Chain of Thought' is a method where AI breaks down problems into smaller tasks to improve reasoning.

  • Can AI models understand context?

    AI models can recognize context by analyzing surrounding words, but they do not truly comprehend meanings.

  • What challenges do AI systems face in real-world scenarios?

    AI struggles with common-sense reasoning, creativity, and quick, advanced decision-making in real-world settings.

  • Is there potential for more efficient AI models?

    Yes, there are emerging models that aim to be more efficient with fewer parameters and lower costs.

  • Why do people fear the rise of AI?

    Concerns stem from potential job displacement and the ethics of AI functioning in real-world applications.

Subtitles (en)

  • 00:00:00
    Take a look at this line. This is a $100 million line, probably the most expensive math equation in history. It's a limit, an imaginary wall of physics and mathematics on how intelligent artificial intelligence can ever be. So take a good look, because I promise I'm not going to show it again in this video. I'm not pretending this is a science channel, so we're not going to go that deep, but I will explain in the simplest of words why there's a limit to how smart these models can get, a limit that even the top scientists have not been able to overcome. The clues about this have already started to show up, and this is the equation that might put an end to all this bubbly behavior around it.
  • 00:00:43
    So for years we've been imagining an artificial intelligence that outsmarts us. But let's get something out of the way: computers are already way better than us at plenty of things. Computers absolutely kick our ass at solving math, and they're definitely smarter than us at storing data and reciting it back. That's not really intelligence. Current AI companies have even had to differentiate our current AI from AGI, artificial general intelligence, because they know deep down that what we have now is not really intelligence. But assuming we gave hands to a GPT, could it actually cook me an egg for breakfast? Hold that thought.

  • 00:01:15
    The fact that AI is the buzzword of the year, and that Nvidia is worth some trillion-dollar number, is based on three premises: one, that the smarter models are going to need all of those GPUs; two, that more people are going to adopt AI into their daily lives; and three, that our AI models are going to get exponentially smarter. Everyone is in panic mode because some Chinese startup allegedly built and trained a ChatGPT-level model using a fraction of the compute and a fraction of the cost. "One Chinese startup just launched a new AI model to rival OpenAI." "DeepSeek has become the most downloaded free app, passing ChatGPT, which is pretty shocking." We're going to get to this.

  • 00:01:54
    It remains to be seen how much the average Joe or the average company adopts AI into their lives, but none of those betting on big-tech AI transformation are even questioning the possibility of that third bullet point. And therein lies the problem: making our current AI models smarter is almost impossible. In order to understand why that math equation is so dangerous to these companies, we need to understand at least the basics of how our current models are trained and how they think. So let's just go to explainer time. Explainer time!
  • 00:02:26
    What GPT excels at, to the point where it acts like an intelligent, sentient bot, enough to fool many of us, to write entire essays, is predicting the next word in a sentence. But being really good at predicting the next word already allows a model like GPT to beat us at most standardized tests, which is kind of the way we measure our own human intelligence, isn't it? And yet GPT doesn't do that well at math; it only beat about 50% of the students in these tests. Why is that?

  • 00:02:56
    Well, one number that you're going to hear about all the time when people talk about these models is the number of parameters that a model was trained on. How many parameters is a model using? GPT-3, for example, which is almost useless, dumb compared to the models that we use today, used 175 billion parameters. What the hell does that mean? Let me show you. A model like GPT uses pre-trained transformers to generate text, hence the name. Now, this sounds like nonsense to you right now, but I promise it'll make sense in exactly four minutes. Imagine that we feed the model a sentence so that it can try and predict what the next word is. The first thing the model needs to do is try to understand what this group of words means. We're seeing words here, but the computer is really just seeing bits. So the first thing the model will do is try to break this into tokens, maybe you've heard the term before, and then it will try to classify those tokens based on their meaning.
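To make the tokenization step concrete (an editor's sketch, not from the video; the "gpt2" encoding and the sample sentence are assumptions for illustration):

```python
# Tokenizing a sentence with the open-source tiktoken library.
# The "gpt2" encoding is assumed here for illustration; the video
# does not name a specific tokenizer.
import tiktoken

enc = tiktoken.get_encoding("gpt2")
token_ids = enc.encode("The ring was forged in secret.")
print(token_ids)                             # a list of integer token ids
print([enc.decode([t]) for t in token_ids])  # tokens are often fractions of words
```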
  • 00:03:51
    LLMs classify words by grouping them together with words that have similar meanings. Technically, this is done not by grouping entire words but tokens, which are fractions of a word, but I'm going to stick with the concept of words just for simplicity's sake. For example, "ring" may be classified with other words like "ear," like "jewel," maybe "around," "the world," "circle." So here's a simple two-dimensional grid: we have a horizontal axis X and a vertical axis Y. This is great for numbers, right? Because we can just go up or down depending on the number. But we're dealing with words here, and there are thousands of words out there with thousands of different meanings. It would be kind of impossible to group words by meaning in a 2D space; even in a 3D space, we'd run out of directions to go to very quickly.

  • 00:04:34
    So GPT-3 classifies words, or tokens really, into 12,288 different dimensions. That means a grid with 12,288 axes. We can't see it; of course, we can't even imagine it. It's like that Interstellar tesseract, but with 12,284 more dimensions to go. But don't worry: what you need to understand is that in this unimaginable cloud, this black hole of directions, words with similar meanings are going to be grouped close to each other. From the GPT-3 paper, we know that OpenAI used about 50,000 tokens, basically a dictionary of tokens, and mapped them into this 12,000-dimension space, which already puts the count of parameters that the model is going to need at around 600 million. Still, that's far from the 175 billion parameters that GPT-3 had, so let's keep digging.
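As a sanity check on the ~600 million figure, the embedding table alone is one learned vector per token in the dictionary (a sketch assuming GPT-3's published vocabulary size of 50,257):

```python
# Size of the embedding table: one 12,288-dimensional vector per token.
vocab_size = 50_257      # GPT-3's token dictionary ("about 50,000 tokens")
d_model    = 12_288      # dimensions per token vector
print(f"{vocab_size * d_model:,} parameters")  # 617,558,016, roughly the 600 million cited
```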
  • 00:05:28
    But before I do that, I want to take a moment to thank NordPass for helping us fund today's explainer. NordPass is a secure password manager created by the experts behind NordVPN to help you and your team store and share passwords and credit card details securely. One in four people can still log into accounts from their previous jobs, granting them access to stuff that they shouldn't have, but passwords shared through NordPass can be revoked in seconds. You have full visibility into who has access to which shared company accounts, and the vulnerability of these, making life a lot simpler for your IT department. We migrated our old password manager into NordPass with a simple export-import function, and everybody hit the ground running in minutes. It's easy to use, you can sync it across devices, and it has a user-friendly interface, so I really can't recommend it enough.

  • 00:06:13
    NordPass also has this really cool feature called Data Breach Scanner, which gives you live alerts if any of your corporate data appears on the dark net, so it gives you a warning in advance to change your passwords before any of your accounts are breached, which can of course cause financial and reputational damage. We partnered with NordPass to bring you a 3-month free trial of NordPass for Business and 20% off their business plans. No credit card is required: just go to nordpass.com/slidebean, use the code slidebean at signup, or scan this QR code. You'll level up your business security, you'll save a lot of money, and you'll help our channel in the process.
  • 00:06:48
    Okay, so now let's dig into what happens after the embedding. The mapping of words into this incomprehensible tesseract black hole is called embedding. This is the embedding step, and it's how a model turns words into something that computers can understand and process. But just understanding that the word "ring" lives in a neighborhood of other words, we still don't know what it means in this context. "Ring" might be a sound, might be an earring, might be the One Ring. So how does the model know? That's where transforming comes in.

  • 00:07:14
    What the model is going to do is, well, transform the word: essentially, move this word in this 12,000-dimensional space, for this specific sentence, closer to the meaning that's based on the context around this specific word in this sentence. That context could be the word before, the word after, or the words mentioned earlier in the conversation. For example, the model might notice that this R is capitalized even though it's not at the beginning of the sentence, which must mean something. It might also look at adjectives and how they affect nouns. So this transformation layer makes tiny adjustments in the region of space where this particular word lives.
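The "move the word toward its in-context meaning" step can be caricatured as mixing in the vectors of surrounding words. A toy numpy sketch (the vectors and weights below are invented for illustration; real attention weights are computed from the input, not hand-set):

```python
import numpy as np

# Toy 4-dimensional "embeddings" (GPT-3 actually uses 12,288 dimensions).
ring_sound   = np.array([1.0, 0.0, 0.0, 0.0])   # "ring" as in a phone ringing
ring_jewelry = np.array([0.0, 1.0, 0.0, 0.0])   # "ring" as in jewelry
ring = 0.5 * ring_sound + 0.5 * ring_jewelry    # ambiguous before any context

# Context words pull the vector toward one meaning (hand-set weights).
context = {
    "gold":   (0.4, np.array([0.0, 0.9, 0.1, 0.0])),
    "finger": (0.4, np.array([0.0, 0.8, 0.0, 0.2])),
}
for weight, vec in context.values():
    ring = ring + weight * vec

print(ring)  # now far closer to the jewelry cluster than to the sound cluster
```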
  • 00:07:55
    Now, all of these transformers are going to run at the same time, and that's in part why GPUs are so good at doing this, because they were built to calculate all the pixels on your screen at the same time. Each transformer in GPT-3 has about 1.8 billion parameters: around 600 million of those parameters are in the first attention layer, which helps focus the word in space, and about 1.2 billion are in the feed-forward network layer, which is kind of like a zoom-in on the meaning of the word. But that's as far as we're going to zoom in today. GPT-3 uses 96 of these transformers, for a total of almost 174 billion parameters.

  • 00:08:32
    We're almost done. The last few parameters are on the output layer, which essentially does the unembedding, the inverse of the input layer: it brings this word from these 12,000 dimensions back into our old 2D bit world, and it gives us the result of this massive operation of the model as words, not as numbers. The result of this massive mathematical journey is a list of words, along with the probability of which word comes next.
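Adding up the per-layer counts the video quotes (figures as stated in the video; the split is approximate), plus the softmax that turns the output layer's scores into the "list of words with probabilities":

```python
import numpy as np

# Rough GPT-3 parameter accounting, using the figures quoted above.
attention_per_block   = 0.6e9    # ~600 million in each attention layer
feedforward_per_block = 1.2e9    # ~1.2 billion in each feed-forward layer
blocks                = 96
embedding             = 0.617e9  # the input embedding table computed earlier
total = blocks * (attention_per_block + feedforward_per_block) + embedding
print(f"~{total / 1e9:.0f} billion parameters")  # ~173B, near the 175B headline

# The output layer ends in a softmax: scores become next-word probabilities.
logits = np.array([2.0, 1.0, 0.1])   # toy scores for three candidate words
probs = np.exp(logits) / np.exp(logits).sum()
print(probs)                         # e.g. [0.66, 0.24, 0.10]
```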
  • 00:09:00
    Now, the whole idea of machine learning is that we don't have to go and train each one of those 175 billion parameters to tell it what it needs to do; it learns itself, AKA machine learning. The first time this runs, this thing is going to spit out just gibberish, but during training the model adjusts these parameters using algorithms to reduce those errors. Think of them like small knobs that slightly move to generate slightly different mathematical outcomes. In the end, it's like trial and error on a trillion scale: if a particular set of values helps the model make a correct prediction, those values are reinforced; if not, they're adjusted. Each of these connections between one value and the other is a neuron, which makes a neural network, and it works not so differently from human neurons.

  • 00:09:43
    It may sound impossible, but after billions of operations and training data, this thing can actually, and pretty accurately, predict the next word in a sentence. Again, this thing has consumed billions and billions of texts written by humans and has become so good at predicting words that it can pass our tests. And predicting words is the LLM example, but you can apply this logic of predicting the next thing to how a pixel should look to generate an image, or to understanding if this dress is blue or gold. Same basic principle.
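The "knobs" analogy maps onto gradient descent. A one-knob toy (entirely invented for illustration; real training adjusts billions of knobs at once):

```python
# One "knob" (parameter) w, nudged step by step to reduce prediction error.
w, x, target = 0.0, 1.0, 2.0   # start wrong: the model's output is gibberish
learning_rate = 0.1

for step in range(50):
    prediction = w * x
    gradient = 2 * (prediction - target) * x   # direction that reduces the error
    w -= learning_rate * gradient              # slightly turn the knob

print(round(w, 3))  # ~2.0: the setting that yields correct predictions won out
```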
  • 00:10:13
    Now you know the reason why it failed the high school math exam: a pure GPT model doesn't do math, at least not directly. In the simplest of terms, if you ask it what's 1+1, it knows the answer is 2 because it read a million times that the answer is 2, and it's incredibly efficient at identifying patterns, but not because it pulled up a calculator and added 1+1. That's not bad per se, but it's going to be a problem later. The thing is, once you have a computer that can understand these relationships within words, you can give it instructions in plain English and it'll base its responses on that. This transformer model with an instruction on top of it is the same concept that Grok and Llama are using, and they're all limited by the same equation.
  • 00:10:56
    Now, that 175-billion-parameter GPT-3 model had problems. You could tell it was AI because it didn't write quite like a human. It couldn't count the Rs in "strawberry." It also had a rather small context limit: how many tokens before the current word are processed and considered for the prediction of the next word. So let's just train it with more, right? OpenAI theorized that by scaling the amount of data and the amount of parameters, the model would get a lot smarter. And it did.

  • 00:11:21
    A way to measure the effectiveness of the model is with the error rate, so the word predictions that are incorrect, in very simple terms; it's more complicated than that. Anyway, they projected the error rate decreasing the bigger the model was and the more data was used for its training. And so they went and did it: they spent over $100 million training this thing. Leaked data from OpenAI says that GPT-4 uses 1.8 trillion parameters; it has more transformer steps, potentially with more dimensions for the tokens, and it took about 25,000 GPUs running for over 3 months to train GPT-4. But it worked: the results were way better than GPT-3. So let's just keep doing that, right? More GPUs, more data, more parameters. Well, that's when they hit a wall.
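The training-compute claim is easy to turn into rough arithmetic (figures as quoted in the video; this is an editor's back-of-the-envelope, not a sourced cost model):

```python
# Back-of-the-envelope GPU time for the GPT-4 training run described above.
gpus, days = 25_000, 90            # "about 25,000 GPUs running for over 3 months"
gpu_hours = gpus * days * 24
print(f"{gpu_hours:,} GPU-hours")  # 54,000,000 GPU-hours
# At even a few dollars per GPU-hour, that alone lands in the
# $100-million-plus range the video quotes.
```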
  • 00:12:08
    Now, that wall is this formula I said I wouldn't show you again. You would think that a bigger model means better performance at some fantastic, astronomical level, but OpenAI has kind of reached this wall of diminishing returns. It's kind of like here, right? There's not a lot that we can do: regardless of the size, we just can't get that performance up a lot. Even if we throw in a lot more data and create neural networks with quadrillions of parameters, the improvements are going to be marginal. All the way through 2024, we had lived on this part of the chart, but GPT-5's failures seem to reveal that we've kind of arrived at this plateau right here.

  • 00:12:47
    And that's not even the worst of it. A recent paper concluded that there is simply not enough data to train them: there is a point in this curve where the amount of data needed for training is bigger than the amount of data that exists. We just haven't produced enough data that can be used for training (text, knowledge, images, speech) to satisfy the needs that the models would have to reach perfection, or a very, very small error rate. In other words, we have found the limit of the current machine learning algorithms: the models are flawed, and humanity doesn't have the resources to train them.
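The video never writes the equation out, but published neural scaling laws (Kaplan et al., 2020) have exactly this diminishing-returns shape: test loss falls only as a power law in parameter count N and dataset size D, with fitted constants N_c, D_c and small exponents:

```latex
L(N) = \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) = \left(\frac{D_c}{D}\right)^{\alpha_D},
\quad \text{with } \alpha_N \approx 0.076,\ \alpha_D \approx 0.095
```

With exponents that small, cutting the loss in half means multiplying the model (or the data) by a factor of $2^{1/\alpha}$, thousands of times over, which is why the curve flattens into the plateau described here.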
  • 00:13:25
    Let's be real for a second. This series of tubes, this transformer model, is arguably one of the most important scientific breakthroughs of the century. And I'm focusing on language models here, but we have now built models to predict the shape of proteins, which seemed an impossible task for a human. If you wanted to produce an image of something that didn't exist, you needed creative people: illustrators, Photoshop artists, 3D rendering. And now an AI can just deduce how something looks from previous training. I don't think enough people talk about what this means for 3D artists, when we spent so long trying to build a new world from scratch and now a computer can reverse-engineer that from training data and just give us the same result in a fraction of the time.

  • 00:14:07
    But it's not over, though. For years we thought there was no other way to reach this level of performance unless we had like two trillion parameters, billions of dollars, and servers, and piles of training data. But it looks like there is a better way, a way around it, based on DeepSeek's efficiency with apparently a fraction of the parameters and the cost. We're yet to see if that's true. Still, I think it's only a matter of time before we find a more efficient way to do all this, easier and cheaper. But even more importantly, the answer may not be GPT-5 or 6 or 7. LLMs have proven that we can make a computer understand natural language, and so companies figured the next step was connecting other systems to that brain. This is why GPT can now see images or recognize speech, giving eyes and ears to the system. "Hello there, cutie." That eventually will turn into hands. But how far is it from cooking an egg or doing my dishes? There's some reasoning needed behind that.
  • 00:15:01
    So this is the whole idea of what models like o1 and, more recently, o3 try to do. The model that we built originally just tries to spit out the next word as quickly as possible, but scientists came up with this concept of reasoning. Using the same transformer, the same model at its core, it tries to interpret the question and break it down into smaller subtasks, or prompts, and then it tries to solve each one of those prompts in order, kind of giving it part-answers to your original question. Once that possible response is done, it analyzes it again to see if it makes sense against the original question and the original context of the conversation. So it works a bit like your own brain's thought process, you know: writing that email response, starting over, readjusting, rereading before you hit send. Nailed it.

  • 00:15:52
    This step-by-step process is called Chain of Thought, again, not too different from your train of thought. But let's go back to that table I showed you earlier. This iterative thinking has actually allowed current models to beat us at general human intelligence tests: IQ, structured logic, decision-making scenarios. It's good enough to write basic and even some intermediate code; it's got a fair share of engineers struggling to find jobs, and who would have thought that they'd be among the first to be replaced by this? Anyway, in startups at least, there's an unspoken truth about the number of jobs that AI has already replaced, but it's very bad press and nobody really wants to talk about it.
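The loop described above can be sketched in pseudocode; `llm()` below is a hypothetical stand-in for any language-model call, not a real API:

```python
def llm(prompt: str) -> str:
    """Hypothetical stand-in for a language-model call (not a real API)."""
    raise NotImplementedError

def chain_of_thought(question: str, context: str) -> str:
    # 1. Interpret the question and break it into smaller subtasks (prompts).
    subtasks = llm(f"Break this question into ordered subtasks:\n{question}").splitlines()

    # 2. Solve each subtask in order, carrying earlier part-answers forward.
    part_answers = []
    for task in subtasks:
        part_answers.append(llm(f"Context: {context}\nSo far: {part_answers}\nSolve: {task}"))

    # 3. Draft a full response, then re-check it against the original question
    #    and conversation context: the "reread before you hit send" step.
    draft = llm(f"Combine these into one answer to '{question}': {part_answers}")
    ok = llm(f"Given the context '{context}', does this answer '{question}'? yes/no: {draft}")
    return draft if ok.strip().lower().startswith("yes") else chain_of_thought(question, context)
```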
  • 00:16:27
    But when you get out of controlled environments and into the real world, that's where AI struggles: common-sense reasoning, creativity, decision-making in real-world scenarios, and even advanced mathematics, where problem solving and some creativity are required. These models take minutes to process through all of this, and it still takes them seconds to make these advanced decisions. That's a processing and capacity problem, not an architecture problem. Also, what happens when these models are allowed to escape containment, when they can start doing things in the real world? "Operator is a research preview of an agent that uses a browser to help users do things." "I'm not doing anything right now; Operator is doing everything by itself." "It's okay, Mom, it should help."

  • 00:17:10
    I think we just keep pushing the bar of what intelligence is. Feelings, right? Creativity, true invention. We still have some of those, and computers don't. But the set of human-only skills is shrinking; it's running out. And I think we have to deal with the reality that it's no longer an "if" but a "when" question. Do computers really think? Let's just say it all depends on what you mean by thinking. Now, if you enjoyed today's explainer, you should watch our video from last week on how money gets created and why 93% of today's money doesn't really exist. Catch you on the next one.
Tags
  • AI
  • Artificial Intelligence
  • AGI
  • Machine Learning
  • GPT
  • Parameters
  • Mathematics
  • Diminishing Returns
  • Common Sense
  • Chain of Thought