Live Chat with Matt Shumer about Reflection 70b!

00:35:30
https://www.youtube.com/watch?v=5_m-kN64Exc

Summary

TLDRThe video stream discusses the launch of a new open-source AI model named Reflection 70b, featuring guests Matt Schumer, co-founder and CEO of hyperr AI, and Sahil Chow, founder of glaive. Reflection 70b introduces a new concept called reflection tuning, which aims to enhance AI performance by enabling the model to self-correct its responses by recognizing and reflecting on mistakes during the inference process. This innovative model has shown competitive results, outperforming larger models like the llama 3.1 405b. The development of Reflection 70b was completed in just three weeks, showcasing a collaborative effort using synthetic data to refine training strategies. Sahil Chow's glaive played a crucial role in delivering the synthetic data that facilitated reflection tuning. The technique allows the model to improve accuracy by essentially mirroring human problem-solving strategies, including identifying and fixing errors. While reflection tuning can be partially mimicked through advanced prompt engineering, its full potential is realized when baked into the model, offering significant performance improvements. The model is particularly exciting for the open-source community as it showcases that high-quality AI can be achieved without the extensive resources typically used by large AI labs. Future releases, including a 405b model, and possibly more enhancements are anticipated, keeping the AI community engaged.

Takeaways

  • 🎤 Guests are Matt Schumer and Sahil Chow.
  • 🆕 Introduction of Reflection 70b model.
  • 🧠 Utilizes reflection tuning technique.
  • 🛠 Developed in three weeks.
  • 📊 Outperforms larger models like llama 405b.
  • 🤖 Leverages synthetic data for training.
  • 💡 Focuses on model self-correction.
  • ✅ Significant improvements through reflection tuning.
  • 🔓 It's an open-source project.
  • 🚀 Anticipation for more future releases.

Timeline

  • 00:00:00 - 00:05:00

    The host introduces the live stream with guests Matt Schumer, co-founder and CEO of Hyperr AI, and Sahil Chow from Glaive, discussing their new model, Reflection 70b. They express excitement about discussing the recently released open-source model using a novel technique called Reflection Tuning.

  • 00:05:00 - 00:10:00

    Matt Schumer talks about his background and company Hyperr AI, which he founded to create AI solutions, such as an AI that writes emails. He shares that Reflection 70b was developed fairly quickly, in three weeks, through collaboration with Sahil Chow.

  • 00:10:00 - 00:15:00

    Matt describes the inspiration and process of developing Reflection 70b, highlighting the use of a specially curated dataset. Sahil introduces Glaive and explains their role in creating synthetic datasets for custom model training, emphasizing the efficiency and speed of their synthetic data generation process.

  • 00:15:00 - 00:20:00

    Matt explains how Reflection Tuning leverages large language models (LLMs) to improve thought processes and reasoning by enabling models to reflect on their errors. Sahil elaborates on the use of synthetic datasets to train models to recognize mistakes and improve accuracy, showing significant performance gains in certain benchmarks.

  • 00:20:00 - 00:25:00

    The discussion focuses on the training process of the model and the development of Reflections – a technique where models are trained to reflect like humans when errors are detected. Sahil explains the dataset structuring and selective use of Reflections to avoid overthinking and ensure higher precision in decision-making.

  • 00:25:00 - 00:30:00

    Glaive's role in providing the synthetic data is discussed, detailing how they designed the reflection examples to aid fine-tuning. Matt discusses the benefits of integrating reflection into models over simply prompt engineering, showing how their approach outperformed traditional methods significantly.

  • 00:30:00 - 00:35:30

    The closing of the stream included audience questions about the application of this technique to other models, and the potential for future open access to the dataset for broader application. The hosts emphasize their openness to innovation, hinting at additional unexplored strategies beyond Reflection Tuning.

Show more

Mind Map

Video Q&A

  • Who are the guests on the stream?

    Matt Schumer, co-founder and CEO of hyperr AI, and Sahil Chow from glaive, the founder of the company.

  • What is Reflection 70b?

    It is an open-source AI model using a new technique called reflection tuning to improve its performance.

  • What is reflection tuning?

    Reflection tuning is a technique that allows the model to recognize and correct its own mistakes by reflecting on its outputs during inference.

  • What differentiates Reflection 70b from other models?

    Reflection 70b uses reflection tuning to improve performance by learning to correct mistakes, making it competitive with larger models like llama 3.1 405b.

  • How was Reflection 70b developed so quickly?

    The model was developed in three weeks by leveraging synthetic data and a novel training process focusing on reflection tuning.

  • What role did Sahil Chow play in developing the model?

    Sahil Chow's company, glaive, helped in creating the synthetic data and training strategies for Reflection 70b.

  • Is Reflection 70b available for public usage?

    Yes, the model is open-source, and there are plans to release the dataset and fine-tuning details.

  • Why is reflection important in AI models?

    Reflection allows AI models to self-correct and improve accuracy by learning from their own mistakes.

  • Will they release any other versions of this model?

    Yes, a 405b version is planned for release soon.

  • How do synthetic data and reflection tuning work together?

    Synthetic data helps create training samples that teach the model to recognize and reflect on mistakes, improving learning efficiency.

View more video summaries

Get instant access to free YouTube video summaries powered by AI!
Subtitles
en
Auto Scroll:
  • 00:00:05
    [Music]
  • 00:00:33
    [Music]
  • 00:00:55
    [Music]
  • 00:01:07
    [Music]
  • 00:01:19
    [Music]
  • 00:01:35
    [Music]
  • 00:01:45
    [Music]
  • 00:02:14
    [Music]
  • 00:02:20
    [Music]
  • 00:02:32
    [Music]
  • 00:02:45
    [Music]
  • 00:02:56
    [Music]
  • 00:02:59
    e
  • 00:03:34
    ah there we go yes we have sound Massa
  • 00:03:37
    Hill can you hear me
  • 00:03:40
    okay all right I think you might still
  • 00:03:42
    be muted I'm gonna just do a quick intro
  • 00:03:45
    I'm muted yeah good now all right here
  • 00:03:48
    we go no sound the sound should be
  • 00:03:51
    coming in any moment
  • 00:03:54
    now no rip audio no audio yep it'll be
  • 00:03:57
    fixed I still don't know how to operate
  • 00:03:59
    a mute
  • 00:04:02
    button all right we got uh we got audio
  • 00:04:05
    good okay so today super excited to
  • 00:04:09
    share we have uh two special guests
  • 00:04:11
    joining us today uh Matt Schumer
  • 00:04:14
    co-founder and CEO of hyperr AI and the
  • 00:04:20
    author creator of the most uh recently
  • 00:04:23
    dropped amazing open source open weights
  • 00:04:25
    model reflection 70b we also have sahil
  • 00:04:30
    Chow from uh glaive he is the founder of
  • 00:04:34
    glaive who was uh pivotal in helping get
  • 00:04:37
    this model to everybody so very excited
  • 00:04:40
    to talk to you both thank you so much
  • 00:04:42
    for joining me today uh cannot wait to
  • 00:04:45
    hear more about what went into this
  • 00:04:47
    model but uh welcome to the stream
  • 00:04:51
    guys thanks for having
  • 00:04:54
    us all right um so first let's just talk
  • 00:04:59
    about what happened so yesterday
  • 00:05:02
    actually I'm going to share your your
  • 00:05:03
    Twitter
  • 00:05:04
    post let's see you probably posted a ton
  • 00:05:06
    of stuff since
  • 00:05:08
    yesterday too
  • 00:05:11
    much cool here's the announcement so
  • 00:05:14
    match humer yesterday uh 11:51 a.m. I'm
  • 00:05:17
    excited to announce reflection 70b the
  • 00:05:19
    world's top open source model trained
  • 00:05:23
    using a new technique reflection tuning
  • 00:05:25
    we'll talk about what that is today and
  • 00:05:28
    it is doing incredibly well uh if you
  • 00:05:32
    look at the benchmarks just across the
  • 00:05:34
    board competitive with the other
  • 00:05:37
    Frontier both closed source and open
  • 00:05:39
    source models beating llama 3.1 405b the
  • 00:05:42
    larger one and according to Matt we have
  • 00:05:45
    uh the 405b model coming out next week
  • 00:05:47
    so first Matt maybe just tell us a
  • 00:05:51
    little bit about yourself uh what do you
  • 00:05:53
    what do you do and what made you want to
  • 00:05:56
    create this
  • 00:05:58
    model yeah yeah so um high level on
  • 00:06:02
    myself and the company right so uh I've
  • 00:06:04
    been personally starting company since I
  • 00:06:05
    was 12 uh we can skip a lot of time a
  • 00:06:07
    few years ago uh when I was in college I
  • 00:06:09
    started uh other side AI which is the
  • 00:06:11
    company that sort of became hyperight
  • 00:06:14
    right and the idea about hyperight
  • 00:06:15
    initially was can we create an AI that
  • 00:06:18
    can write your emails for you we were
  • 00:06:19
    actually the first um that were aware of
  • 00:06:21
    uh VC backed company in this sort of
  • 00:06:23
    like quote unquote generative AI space
  • 00:06:24
    that used the open AI models at the
  • 00:06:26
    beginning we've grown quite a bit since
  • 00:06:28
    then now we have well a few million
  • 00:06:31
    users uh millions in Revenue were
  • 00:06:33
    profitable um for a much sort of like
  • 00:06:36
    expanded version of that product it's
  • 00:06:38
    sort of writing in general we're the
  • 00:06:39
    best AI for writing continues to get
  • 00:06:42
    better we have a lot of sort of specific
  • 00:06:43
    things we do but that's that's the high
  • 00:06:45
    level um in terms of this model this is
  • 00:06:48
    sort of just a fun thing uh very long
  • 00:06:50
    story short I was actually on vacation
  • 00:06:52
    and my mind was just everywhere and I
  • 00:06:54
    was like okay I'm I'm kind of bored of
  • 00:06:56
    this I need to actually do something
  • 00:06:57
    productive and I've been noling on this
  • 00:07:00
    idea for a very long time and it was
  • 00:07:01
    sort of like time to just do it so
  • 00:07:03
    towards the end of the trip I actually
  • 00:07:05
    reached out to sahill and I was like can
  • 00:07:06
    we collaborate and that's how this came
  • 00:07:08
    to be how long ago was
  • 00:07:12
    that three weeks I think so to be clear
  • 00:07:18
    uh you had the idea reached out to
  • 00:07:21
    sahill put together the data set
  • 00:07:25
    fine-tuned the model and published it
  • 00:07:27
    all within three weeks
  • 00:07:31
    yes and I know that sounds crazy I get
  • 00:07:35
    that but I think a lot of people
  • 00:07:37
    underestimate what can be done today
  • 00:07:38
    with a very small amount of resources
  • 00:07:40
    right too many people look to the big
  • 00:07:42
    the big AI labs and they're like hey
  • 00:07:43
    look you know they spent billions of
  • 00:07:44
    dollars doing this there's no hope for
  • 00:07:46
    anybody to compete with a small budget
  • 00:07:47
    and small team small amount of time
  • 00:07:50
    right but thill and I did this on the
  • 00:07:52
    side right this wasn't even sort of our
  • 00:07:53
    major Focus not even close to it and
  • 00:07:55
    it's just thinking about the problem in
  • 00:07:57
    a different way right the data set was
  • 00:08:00
    everything for this and glaive made that
  • 00:08:03
    possible we just had to kind of know
  • 00:08:04
    what we were going after and once we
  • 00:08:06
    sort of had the idea of what this would
  • 00:08:08
    look like it was very easy to actually
  • 00:08:09
    put into into practice awesome and and
  • 00:08:13
    on that note sahill please tell us about
  • 00:08:16
    yourself tell us about what glaive
  • 00:08:20
    does yeah yeah uh kind of similar to M
  • 00:08:23
    I've been
  • 00:08:24
    doing startups specifically AI startups
  • 00:08:27
    for a few years now uh just before
  • 00:08:29
    starting live I worked at a company
  • 00:08:32
    called banana where we were building
  • 00:08:34
    sess gpus for machine
  • 00:08:36
    learning uh realized that people cannot
  • 00:08:38
    host custom models because to have
  • 00:08:40
    really high performing custom models
  • 00:08:42
    that use case specific you need high
  • 00:08:43
    quality data and companies like that
  • 00:08:46
    most of the times and that is when I
  • 00:08:48
    decided to just uh be a founder and
  • 00:08:52
    start claive and glaive is essentially a
  • 00:08:54
    platform where companies can build use
  • 00:08:57
    case specific data sets uh using using
  • 00:08:59
    our synthetic data generation Pipeline
  • 00:09:02
    and train mods on that and basically
  • 00:09:04
    keep iterating on them uh because it's
  • 00:09:07
    uh as I think match said it's it's
  • 00:09:09
    pretty quick to iterate on data sets
  • 00:09:11
    with the help of synthetic data uh the
  • 00:09:13
    generation process is pretty quick yeah
  • 00:09:17
    uh that's uh that's summary cool uh so I
  • 00:09:22
    mean Matt you said it the data set is
  • 00:09:24
    everything but I'm I want to touch on
  • 00:09:26
    that a little bit because we can talk
  • 00:09:28
    about you know publicly available data
  • 00:09:30
    and and how that is essentially all used
  • 00:09:34
    in current Frontier models and there's
  • 00:09:37
    really like two solutions to that you
  • 00:09:38
    either create synthetic data or you do
  • 00:09:41
    more with the data that you have and so
  • 00:09:44
    which approach were did you take with
  • 00:09:47
    reflection and what what makes
  • 00:09:49
    reflection special what is the the
  • 00:09:51
    secret sauce
  • 00:09:53
    there yeah so I think s could probably
  • 00:09:55
    talk more to the specifics of the data
  • 00:09:57
    side and how the data set was
  • 00:09:58
    constructed I could talk more the sort
  • 00:09:59
    of idea behind reflection and why it
  • 00:10:02
    works um so at a high level right the
  • 00:10:05
    general idea is llms are sort of getting
  • 00:10:08
    to the point where they can they can
  • 00:10:09
    think quote unquote think um and it's
  • 00:10:12
    sort of mirrored after how a human
  • 00:10:13
    thinks right you kind of talk through it
  • 00:10:15
    in your head you know some people do it
  • 00:10:16
    differently but you know some people
  • 00:10:17
    kind of talk through the problem in
  • 00:10:18
    their head and arrive at an answer and
  • 00:10:20
    llms do that today with Chain of Thought
  • 00:10:22
    that's how they have you know it's one
  • 00:10:24
    of the reasons they've improved in
  • 00:10:25
    performance so much over the last few
  • 00:10:27
    years uh it's been a corner so know
  • 00:10:29
    pretty much every major llm whether it's
  • 00:10:32
    coding whether it's just general
  • 00:10:33
    reasoning or writing even um so we do a
  • 00:10:36
    lot of work with that and one of the
  • 00:10:37
    things that I realized is like look as a
  • 00:10:40
    person I can think through a problem but
  • 00:10:41
    I'm not always going to get every little
  • 00:10:43
    bit right and what happens if I get
  • 00:10:44
    something wrong well personally I
  • 00:10:46
    reflect I'm like wait I made a mistake I
  • 00:10:49
    backtrack and I fix myself but an llm
  • 00:10:51
    doesn't do that one of the things we
  • 00:10:53
    noticed is that llms once they make that
  • 00:10:55
    first mistake they kind of accept it as
  • 00:10:57
    fact right if I'm asking it 2 plus two
  • 00:11:01
    and then multiply that by seven right
  • 00:11:03
    and it says okay I'll do that let's
  • 00:11:05
    start with two plus two that equals five
  • 00:11:07
    and then five time 7even right it's
  • 00:11:08
    already made that mistake and it's just
  • 00:11:10
    assuming the thing that it said is right
  • 00:11:12
    and if we could
  • 00:11:14
    essentially teach the model to think
  • 00:11:17
    more like we do and actually reflect on
  • 00:11:20
    behavior that it makes a mistake in the
  • 00:11:23
    model gets smarter it gets more reliable
  • 00:11:26
    gets more accurate that's generally the
  • 00:11:28
    idea here we did a lot of even beyond
  • 00:11:31
    the data it's how we trained the model
  • 00:11:32
    right we created what we call like our
  • 00:11:34
    own splitting strategy that allows the
  • 00:11:37
    model to not learn to make mistakes
  • 00:11:38
    because if you do this wrong it's very
  • 00:11:39
    easy to teach the model to make more
  • 00:11:41
    mistakes and then fix them the idea is
  • 00:11:43
    to keep the model's performance as is or
  • 00:11:44
    make it even better and then on the
  • 00:11:47
    things where it actually ends up making
  • 00:11:48
    a mistake and it would no matter what we
  • 00:11:50
    can fix that and it's still not perfect
  • 00:11:52
    it's still very early it's still a first
  • 00:11:54
    version of this but I think we've come
  • 00:11:56
    quite a bit um away in a few weeks so
  • 00:11:58
    that's that's sort of like how it works
  • 00:12:00
    sah maybe you want to talk a little bit
  • 00:12:01
    more to the the data generation process
  • 00:12:03
    itself because that was all
  • 00:12:05
    you yeah yeah I can do that uh
  • 00:12:09
    surprisingly the data set we used for
  • 00:12:11
    this isn't that big uh what people would
  • 00:12:14
    expect it's roughly we generated uh and
  • 00:12:17
    we generated it in steps uh we started
  • 00:12:19
    out with I think just 10,000 samples uh
  • 00:12:22
    to see if this is actually possible and
  • 00:12:24
    we scaled up to 100,000
  • 00:12:26
    samples uh we decided that we'll do uh
  • 00:12:29
    some code data some math data General
  • 00:12:31
    reasoning function calling and multiturn
  • 00:12:34
    so uh the goal really wasn't to get a
  • 00:12:37
    lot of data and teach the model how to
  • 00:12:39
    reason but it was essentially to teach
  • 00:12:41
    the model to recognize its own mistake
  • 00:12:44
    and I think uh to that point see a lot
  • 00:12:47
    of people uh mentioned that you can kind
  • 00:12:49
    of get similar gains out of just
  • 00:12:51
    prompting uh regular instruct models to
  • 00:12:54
    use reflection and I think that's true
  • 00:12:57
    to some extent you can get uh percentage
  • 00:12:59
    of the gains by just prompting but if
  • 00:13:02
    you try to prompt even son 3.5 for
  • 00:13:05
    example you'll see that it's there's a
  • 00:13:07
    lot of bias in the model outputs where
  • 00:13:10
    the model almost always believes that
  • 00:13:12
    what it's saying is correct and we face
  • 00:13:15
    that problem in data generation as well
  • 00:13:17
    right like we use language models we use
  • 00:13:19
    f tune language models to generate
  • 00:13:21
    synthetic data and when we first try to
  • 00:13:24
    actually generate reflection data we
  • 00:13:25
    realize that if we ask the model to
  • 00:13:28
    actually make the mistake and then
  • 00:13:30
    reflect on it Ms are really bad at doing
  • 00:13:32
    that uh maybe it's just R if you do that
  • 00:13:35
    like the MS will make mistakes easily
  • 00:13:37
    but if you ask a model to deliberately
  • 00:13:39
    make a mistake it just won't be able to
  • 00:13:43
    and that was the main goal of just find
  • 00:13:45
    you to teach the model that it can
  • 00:13:48
    actually make mistakes and and correct
  • 00:13:51
    that yeah uh I I have so many questions
  • 00:13:55
    about this so uh for those of you
  • 00:13:58
    watching uh maybe this is a good way to
  • 00:14:00
    break it down you you effectively took
  • 00:14:02
    really kind of sophisticated prompt
  • 00:14:04
    engineering strategies you know s even
  • 00:14:08
    something as simple as explain your
  • 00:14:09
    reasoning step by step which is
  • 00:14:11
    something that I put into all of my
  • 00:14:13
    prompts when I'm giving it more complex
  • 00:14:15
    reasoning and logic questions um but you
  • 00:14:18
    can get obviously even more
  • 00:14:20
    sophisticated than that Chain of Thought
  • 00:14:22
    um and and so on and so you've taken
  • 00:14:25
    that and you've built it into the model
  • 00:14:27
    itself and you've actually given it
  • 00:14:29
    examples of making mistakes along the
  • 00:14:32
    way and that's actually what I'm showing
  • 00:14:33
    I believe on the screen right here
  • 00:14:35
    actually there's a lot going on
  • 00:14:36
    hopefully yall can see it uh but but
  • 00:14:39
    effectively you're teaching the model in
  • 00:14:43
    the fine-tuning data itself to think
  • 00:14:45
    through everything you know more step by
  • 00:14:48
    step but also to and hence the name
  • 00:14:52
    reflect on the output as it's doing the
  • 00:14:56
    inference and then potentially correct
  • 00:14:58
    itself during the inference process so
  • 00:15:01
    what first of all H like what was the
  • 00:15:06
    trigger for thinking that this would be
  • 00:15:08
    really a a a good strategy and then why
  • 00:15:11
    is it better than simply uh put like
  • 00:15:15
    having a system message with some of
  • 00:15:17
    this prompting techniques or or putting
  • 00:15:19
    it in the prompt
  • 00:15:22
    itself yeah I mean why don't I start
  • 00:15:24
    with what you you just asked the last
  • 00:15:25
    question s covered a little bit of this
  • 00:15:27
    um some of my person findings with this
  • 00:15:29
    because I've been prompt engineering for
  • 00:15:32
    too many years now um like basically
  • 00:15:34
    since 2019 with gpd2 um what I found was
  • 00:15:39
    essentially if you asked it to do this
  • 00:15:41
    and didn't train on it a couple things
  • 00:15:43
    would happen sometimes it was sort of
  • 00:15:44
    like what side was talking about where
  • 00:15:46
    you know the model was overconfident but
  • 00:15:48
    there was another issue I found which is
  • 00:15:50
    sometimes this is kind of getting a
  • 00:15:51
    little weird the model would actually
  • 00:15:53
    deliberately make mistakes that it
  • 00:15:54
    wouldn't have made otherwise because
  • 00:15:56
    when you're prompting it really wants to
  • 00:15:58
    follow those instructions
  • 00:15:59
    really wants to follow that system
  • 00:16:00
    prompt and if you say when you make a
  • 00:16:03
    mistake fix it with a reflection tag
  • 00:16:06
    you're going to notice that it's going
  • 00:16:07
    to make that mistake what you notice
  • 00:16:08
    with our model is that it doesn't always
  • 00:16:10
    use reflection it only uses it when it
  • 00:16:12
    really thinks it needs to sometimes
  • 00:16:14
    little a little bit too much but we're
  • 00:16:15
    you know we're we're much more towards
  • 00:16:17
    the end of like use it when you need to
  • 00:16:18
    don't when you don't because if you're
  • 00:16:20
    teaching it to do it all the time right
  • 00:16:21
    it's just going to make mistakes
  • 00:16:23
    deliberately um so yeah it doesn't
  • 00:16:26
    really work for prompting and I do have
  • 00:16:28
    to apologize like I said said I not
  • 00:16:29
    slept in two days so what was the first
  • 00:16:30
    part of your question no I
  • 00:16:33
    I I'm trying to understand like what the
  • 00:16:38
    reasoning is like what if it's better to
  • 00:16:41
    put the prompt in the actual fine-tune
  • 00:16:45
    data or if it's better to do it
  • 00:16:47
    afterwards and obviously like your model
  • 00:16:49
    is is crushing The Benchmark so you've
  • 00:16:52
    done something really well but like what
  • 00:16:55
    what was that idea that made you think
  • 00:16:58
    that it it's actually like let's let's
  • 00:17:00
    push this to the fine tune
  • 00:17:03
    itself yeah I mean I've had this idea
  • 00:17:05
    like I mentioned for for many months now
  • 00:17:07
    and different forms of it um so it was
  • 00:17:10
    sort of like how do you enable that for
  • 00:17:12
    everyone without them needing to figure
  • 00:17:13
    it out themselves but even if they
  • 00:17:15
    wanted to figure out themselves without
  • 00:17:16
    a fine tune it doesn't work super well
  • 00:17:17
    you know prompting like like sah said
  • 00:17:19
    can get you a marginal gain like you
  • 00:17:21
    definitely will see better performance
  • 00:17:22
    but it's not going to be the performance
  • 00:17:23
    jump we saw I mean we saw 10 plus
  • 00:17:25
    percent uh jumps in many benchmarks I
  • 00:17:27
    mean it was it was taking L 70b and
  • 00:17:30
    bring it past 405b and much further
  • 00:17:32
    beyond that I mean we're expecting to
  • 00:17:34
    see with 45b is going to be hopefully
  • 00:17:36
    insane um so yeah it's really that this
  • 00:17:39
    can't be done with prompting alone if
  • 00:17:41
    you really want to get the full force of
  • 00:17:42
    it um this is more close this is closer
  • 00:17:45
    to how people think I think there are
  • 00:17:47
    many more Gams we can make there and it
  • 00:17:49
    was just one of those stupid obvious
  • 00:17:51
    ideas that no one really thought to do
  • 00:17:54
    right I think a lot of the major labs
  • 00:17:55
    they they're way too in the weeds in
  • 00:17:56
    many ways they're thinking to
  • 00:17:58
    technically and too intelligently about
  • 00:18:00
    this where sometimes the simple thing is
  • 00:18:02
    what actually is going to work right
  • 00:18:04
    just like so many people back in you
  • 00:18:06
    know 2018 whatever they were they were
  • 00:18:08
    working on all these new architectures
  • 00:18:09
    but scaling them when we had was the
  • 00:18:11
    right answer yeah and sahill I think you
  • 00:18:14
    mentioned like
  • 00:18:16
    putting almost like telling the model to
  • 00:18:19
    make the mistake and then showing it how
  • 00:18:20
    to correct itself I is is one of the
  • 00:18:23
    secrets to reflection but you also said
  • 00:18:26
    if you do that too much it will just
  • 00:18:28
    start making mistakes what is that
  • 00:18:30
    balance how did you discover where that
  • 00:18:32
    kind of Cliff is before it started just
  • 00:18:35
    taking those mistakes and and and
  • 00:18:37
    accepting them as
  • 00:18:39
    truth yeah I think there's a bunch of
  • 00:18:41
    small uh details there and I think next
  • 00:18:44
    week when we put out like a technical
  • 00:18:46
    report it's going to help people
  • 00:18:48
    understand that as well but
  • 00:18:49
    essentially uh if you if you look at uh
  • 00:18:54
    the responses the model generate you'll
  • 00:18:56
    see in somewhere or other it's not
  • 00:18:58
    always structure but it's going to
  • 00:18:59
    classify whether the problem is a hard
  • 00:19:01
    problem moderate or an easy problem and
  • 00:19:04
    you're going to see that it only uses
  • 00:19:05
    reflection andine of thought for
  • 00:19:07
    problems it thinks uh is a hard problem
  • 00:19:10
    for itself and that was part of the
  • 00:19:12
    training so we uh only added Reflections
  • 00:19:16
    we classified problems as easy moderate
  • 00:19:18
    and hard and we only added Reflections
  • 00:19:20
    to the hard problems because you don't
  • 00:19:23
    always need Reflections you don't want
  • 00:19:24
    to teach them all to
  • 00:19:26
    overthink uh that was one part of it and
  • 00:19:28
    I think I think uh the second balance is
  • 00:19:30
    we also added uh data samples which are
  • 00:19:34
    essentially just Reflections but the all
  • 00:19:37
    reflects on something it did which was
  • 00:19:39
    correct and just realizes that it does
  • 00:19:41
    not need to actually course correct and
  • 00:19:43
    it was already correct and move on with
  • 00:19:45
    the existing uh Chain of Thought So we
  • 00:19:48
    we try to balance it out with uh these
  • 00:19:51
    type of samples trying to cover as many
  • 00:19:52
    H cases as possible I think there's
  • 00:19:54
    still room for
  • 00:19:56
    improvement but uh it was essentially I
  • 00:19:58
    think someone on twit found out that
  • 00:20:00
    this was V5 of the model we trained we
  • 00:20:02
    had like V1 2 3 and four and that's
  • 00:20:05
    essentially what it was uh training
  • 00:20:07
    model and figuring out uh how to balance
  • 00:20:10
    the data set even more so if we're if
  • 00:20:13
    we're looking at this example I I guess
  • 00:20:15
    I'm trying to figure out uh you you give
  • 00:20:17
    it a prompt and it's running inference
  • 00:20:20
    and outputting is it truly able to
  • 00:20:25
    reflect on the output as it's giving the
  • 00:20:28
    output like what is what does that flow
  • 00:20:30
    look
  • 00:20:32
    like yeah so that was that was actually
  • 00:20:34
    an intentional a very intentional design
  • 00:20:36
    decision
  • 00:20:37
    where I think a lot of people get
  • 00:20:39
    frustrated when they look at this is
  • 00:20:41
    actually something we learned from hyper
  • 00:20:42
    right people get frustrated when they
  • 00:20:43
    see Chain of Thought you know some of
  • 00:20:44
    the more power Usery people they like it
  • 00:20:47
    but your average user doesn't they just
  • 00:20:48
    want to know the answer to the question
  • 00:20:50
    and if they want to dive deeper great
  • 00:20:51
    but they don't always want to dive
  • 00:20:53
    deeper and they don't want to dig
  • 00:20:54
    through amount of text Chain of Thought
  • 00:20:55
    text to find that answer so this gets
  • 00:20:59
    even worse with reflection right because
  • 00:21:00
    when it's reflecting in its output it's
  • 00:21:02
    like oh my God there's so much going on
  • 00:21:04
    here how do I parse through this it's
  • 00:21:05
    just the cognitive load is crazy so how
  • 00:21:08
    can we fix that and one of the ideas we
  • 00:21:10
    had um was just separate the reasoning
  • 00:21:13
    out right especially as inference is
  • 00:21:15
    getting faster look at what's going on
  • 00:21:16
    with for example Gro and cerebrus right
  • 00:21:18
    inference is going to get way way faster
  • 00:21:21
    and if we can do some of the thinking up
  • 00:21:22
    front and just sort of hide that from
  • 00:21:24
    the user and show it optionally if they
  • 00:21:26
    want to see it and then spit out a
  • 00:21:28
    perfect answer at the end that's what is
  • 00:21:30
    ideal um for most users in most used
  • 00:21:32
    cases the idea here is right the model
  • 00:21:35
    will start reasoning inside sort of what
  • 00:21:37
    we call output tags or sorry thinking
  • 00:21:39
    tags so it'll that's what we're seeing
  • 00:21:42
    right
  • 00:21:43
    here yeah it's going to start thinking
  • 00:21:46
    it does its Reflections in there it does
  • 00:21:47
    it Chain of Thought in there it does all
  • 00:21:48
    its planning all that and then it's like
  • 00:21:51
    okay once it's got the right answer or
  • 00:21:52
    it thinks it's got the right answer it's
  • 00:21:54
    like I'm done then it says okay time for
  • 00:21:56
    an output here's what I'm to show the
  • 00:21:57
    user and it writes the final message to
  • 00:22:00
    the user and I think different companies
  • 00:22:02
    and different use cases and interfaces
  • 00:22:04
    will use this differently um some people
  • 00:22:07
    want to see it all some people want to
  • 00:22:08
    make it optional some people want to
  • 00:22:10
    hide it entirely um but I think that
  • 00:22:12
    especially as models get faster and the
  • 00:22:15
    actual sort of thinking process takes
  • 00:22:17
    like less than a second for example I
  • 00:22:19
    think this is the right
  • 00:22:20
    approach and so that that was the
  • 00:22:23
    purpose of the tags just to clarify so
  • 00:22:25
    you have the thinking tags if you don't
  • 00:22:26
    want to see the thinking you just want
  • 00:22:28
    the output put it's kind of like
  • 00:22:29
    standard mode and then you have advanced
  • 00:22:31
    mode uh and then you can actually see
  • 00:22:32
    what's going on personally I love seeing
  • 00:22:35
    it like I want to know what it's
  • 00:22:36
    thinking why it's thinking that and that
  • 00:22:39
    can actually help you kind of recraft
  • 00:22:42
    the original prompt if you need to uh
  • 00:22:44
    because you could see if there's a
  • 00:22:45
    misunderstanding uh or or not so like
  • 00:22:49
    what what are your thoughts on on this
  • 00:22:51
    this method of self-correction during
  • 00:22:54
    inference time versus let's say having
  • 00:22:57
    multiple large language models uh
  • 00:23:00
    working in collaboration correcting each
  • 00:23:02
    other how does this
  • 00:23:04
    differ why not use them both right
  • 00:23:07
    everything can build on everything hell
  • 00:23:10
    yeah and if you can have multiple of
  • 00:23:12
    these models right thinking about
  • 00:23:13
    different angles on each problem and you
  • 00:23:15
    have reflection with within each model
  • 00:23:17
    it's just like having a better model
  • 00:23:19
    going after um a problem but having them
  • 00:23:21
    do it together okay amazing um and then
  • 00:23:25
    one last question for me and then I I
  • 00:23:27
    know we have a bunch of questions in the
  • 00:23:29
    chat so hopefully I know you have
  • 00:23:31
    probably about 12 more minutes so uh
  • 00:23:33
    hopefully we'll get to them both um what
  • 00:23:35
    have you seen in terms of token usage on
  • 00:23:38
    output as compared to a
  • 00:23:40
    non-reflection
  • 00:23:43
    model definitely more sah you might have
  • 00:23:45
    a much better response than definitely
  • 00:23:49
    more yeah uh haven't test it out like I
  • 00:23:53
    don't have exact numbers to share right
  • 00:23:55
    now but uh from just the test I've run
  • 00:23:58
    it seems to be uh 1.5 two times the
  • 00:24:02
    number of tokens uh for harder problems
  • 00:24:05
    so it's it's definitely generating a lot
  • 00:24:08
    more tokens and yeah I think if you're
  • 00:24:12
    using a hosted API that causes the the
  • 00:24:14
    cost of the latency problem but again I
  • 00:24:16
    think uh as Matt said INF is going to
  • 00:24:19
    get much faster and cheaper so uh we
  • 00:24:22
    kind of betting on that as
  • 00:24:24
    well all right um thank you so we got a
  • 00:24:29
    bunch of questions already uh we're
  • 00:24:31
    using a new streaming software I'm not
  • 00:24:33
    sure how to put it on the screen so I'm
  • 00:24:34
    just going to read them uh Tresor asks
  • 00:24:37
    will it be possible to fine-tune other
  • 00:24:39
    models like GPT 40 mini to implement the
  • 00:24:42
    reflection approach so basically does
  • 00:24:44
    this data set does this approach work
  • 00:24:46
    across the
  • 00:24:48
    board it should there will be
  • 00:24:51
    constraints with closed provider fine
  • 00:24:54
    tunings because uh they often have their
  • 00:24:57
    own limits on how you can do the fine
  • 00:24:58
    tuning so we'll go into this in more
  • 00:25:00
    detail when we release a report and we
  • 00:25:02
    we I think are going to be releasing the
  • 00:25:03
    data set um we're still making the final
  • 00:25:05
    decision there but it's likely G to be a
  • 00:25:07
    yes um it has to be trained in a very
  • 00:25:10
    specific way it's a very simple training
  • 00:25:12
    process it's straight up just like
  • 00:25:14
    standard fine-tuning right um there's no
  • 00:25:16
    rhf going on none of that but the way
  • 00:25:19
    the data is sort of split up it's to
  • 00:25:21
    sort of like enable it to not learn to
  • 00:25:24
    make extra mistakes but learn to correct
  • 00:25:26
    them um so for any open source model it
  • 00:25:29
    should be doable and relatively easy
  • 00:25:31
    compared to some of the other approaches
  • 00:25:32
    people have uh put forward for things
  • 00:25:35
    like fine tuning open AI maybe it's
  • 00:25:38
    possible especially if you're creative
  • 00:25:39
    about it I don't I haven't fine- tune on
  • 00:25:41
    the open API in a while but from what I
  • 00:25:43
    remember it would be a little bit
  • 00:25:44
    limiting for something like this yeah
  • 00:25:46
    and I I like what you said earlier um
  • 00:25:49
    this was just an overlooked strategy
  • 00:25:51
    like it's it's maybe it was it was too
  • 00:25:54
    simple uh not that's not to be
  • 00:25:56
    pejorative at all it's like but it is
  • 00:25:58
    you know sometimes it's just like hey
  • 00:26:00
    it's right in front of you here here's
  • 00:26:01
    here's a great strategy you can use um
  • 00:26:03
    so Yash asks what about an 8B version
  • 00:26:07
    and we know you mentioned the 405b is
  • 00:26:10
    coming out uh what about an 8B version
  • 00:26:12
    that more people can run maybe
  • 00:26:15
    locally yeah I mean we found and to be
  • 00:26:18
    fair we trained the AP first um but we
  • 00:26:21
    found that it made some big improvements
  • 00:26:23
    on certain benchmarks but others it
  • 00:26:25
    really didn't and eight
  • 00:26:29
    we weren't really sure where some of the
  • 00:26:30
    gains were coming from it it felt like
  • 00:26:33
    it was a little too dumb to pick up on
  • 00:26:35
    this perfectly um like the 70b
  • 00:26:38
    definitely sort of like crossed a
  • 00:26:39
    threshold where it was like able to
  • 00:26:40
    grock uh grock this to some extent um
  • 00:26:43
    but given the sort of crazy interest
  • 00:26:45
    we've seen like we we expected this to
  • 00:26:46
    do well we didn't expect it to do this
  • 00:26:49
    well um like it's been the craziest
  • 00:26:52
    launcher of My Life
  • 00:26:54
    um given that I think there probably is
  • 00:26:57
    enough interest to figure out something
  • 00:26:59
    there but you know you said before right
  • 00:27:01
    there there are many things that are
  • 00:27:02
    sort of like these low hanging fruit
  • 00:27:03
    things that just don't pay attention to
  • 00:27:05
    and we have a few other ideas in mind
  • 00:27:06
    that I think could do far better than
  • 00:27:08
    reflection so I think it's a question of
  • 00:27:09
    do we want to focus more on reflection
  • 00:27:11
    and really optimizing an 8B that you
  • 00:27:13
    know may be obsolete soon or really
  • 00:27:15
    going crazy and focusing on something
  • 00:27:17
    that could actually Top This by a good
  • 00:27:19
    bit and I think I'm more excited about
  • 00:27:21
    the lad but I was just about to ask you
  • 00:27:25
    that Matt sorry sorry to cut you off I
  • 00:27:27
    was just about to ask if you have any
  • 00:27:29
    kind of new ideas going on do you uh
  • 00:27:31
    want to hint at anything or not quite
  • 00:27:35
    yet I think if you look at my past
  • 00:27:38
    work and you think about this from the
  • 00:27:40
    perspective of like look some of the
  • 00:27:43
    learnings we've gotten from prompting
  • 00:27:44
    have not been properly thought about in
  • 00:27:46
    terms of training them into the model I
  • 00:27:48
    think you can kind of see where I'm
  • 00:27:49
    going with
  • 00:27:50
    this
  • 00:27:52
    um there are a few different ideas that
  • 00:27:55
    I'm kind of toying with and we're
  • 00:27:56
    talking about right now and I think the
  • 00:27:58
    interesting thing is with glaive like
  • 00:27:59
    they're not going to be so hard to
  • 00:28:01
    implement um they actually might be even
  • 00:28:04
    easier than reflection um they can be
  • 00:28:06
    comined with reflection reflection but
  • 00:28:09
    yeah I think it's the idea of like we're
  • 00:28:11
    going to start exploring what it looks
  • 00:28:12
    like to take these things that make it
  • 00:28:15
    better from a prompting perspective and
  • 00:28:16
    can you make it much better by training
  • 00:28:19
    on
  • 00:28:20
    that how do you give everyone the power
  • 00:28:22
    of like the world- class prompt
  • 00:28:23
    engineering um sort of stuff maked into
  • 00:28:25
    the model and even go beyond that that's
  • 00:28:27
    kind of how about it that's that's great
  • 00:28:30
    and uh you know Victor asked kind of as
  • 00:28:32
    an extension of that uh are you going to
  • 00:28:35
    provide the data sets publicly H so
  • 00:28:38
    other people can maybe you know uh
  • 00:28:42
    manicure them as they see fit and and
  • 00:28:44
    roll their
  • 00:28:47
    own I think it's highly likely s if you
  • 00:28:50
    have different feelings that we can
  • 00:28:52
    figure that
  • 00:28:53
    out no I think in the past uh like most
  • 00:28:57
    of the open Source work we have done uh
  • 00:29:00
    as always included data sets we are
  • 00:29:02
    essentially a data set company so it
  • 00:29:03
    makes sense plus it just uh makes it
  • 00:29:06
    really easy for people to reproduce uh
  • 00:29:08
    benchmarks uh the technique so overall I
  • 00:29:12
    think it would be it would make sense to
  • 00:29:13
    open source once we have uh the four
  • 00:29:16
    five be trained as
  • 00:29:18
    well awesome uh Mark Cox asks uh how
  • 00:29:22
    exactly does it recognize a quote
  • 00:29:25
    unquote mistake that requires a
  • 00:29:28
    reflection and actually I have even kind
  • 00:29:30
    of a followup to that
  • 00:29:33
    do does it always output a reflection is
  • 00:29:36
    it always
  • 00:29:40
    necessary uh no so uh in the data set uh
  • 00:29:45
    we only have Reflections as part of uh
  • 00:29:48
    samples where the model is likely to
  • 00:29:51
    make mistakes and we decided that by
  • 00:29:54
    just classifying the problem as hard so
  • 00:29:56
    uh if if you just ask simple question it
  • 00:29:59
    won't add a reflection tag uh if the
  • 00:30:01
    question is really simple it won't even
  • 00:30:03
    do Chain of Thought and just uh answer
  • 00:30:05
    pretty straightforwardly so model can
  • 00:30:08
    almost infer uh the difficulty of a
  • 00:30:12
    problem and decide when to do
  • 00:30:14
    Reflections uh but we've tried to train
  • 00:30:16
    them mod to do Reflections whenever it
  • 00:30:18
    makes uh it it solves a hard problem and
  • 00:30:21
    it encounters the step where it's making
  • 00:30:23
    a hard Chain of Thought step so even if
  • 00:30:27
    you have a really complicated problem
  • 00:30:29
    there would be like few steps within
  • 00:30:31
    that problem that the model is more
  • 00:30:32
    likely to make mistakes and we have
  • 00:30:34
    added Reflections in the r set right at
  • 00:30:36
    that step so whenever it makes uh for
  • 00:30:39
    example it does arithmetic it's more
  • 00:30:41
    likely to just use a reflection tag
  • 00:30:43
    right after that and see if what it did
  • 00:30:45
    was correct or
  • 00:30:48
    not okay great um Alex fov asked uh any
  • 00:30:53
    gut feeling about using Moe mixture of
  • 00:30:56
    experts and this technique
  • 00:30:58
    uh maybe if some experts are trained
  • 00:31:00
    like this and others
  • 00:31:05
    aren't first of all hi Alex um I don't
  • 00:31:09
    know I don't see why it would be much
  • 00:31:12
    different I've had poor experiences
  • 00:31:16
    training on mixure of experts models
  • 00:31:17
    just
  • 00:31:18
    generally so I don't know if you found
  • 00:31:20
    the
  • 00:31:23
    same yeah uh I mean one benefit could be
  • 00:31:26
    that uh
  • 00:31:28
    if there's uh an inference speed trade
  • 00:31:31
    off because uh Moes are usually faster
  • 00:31:34
    to inference that could be a
  • 00:31:36
    benefit uh intuitively I don't think we
  • 00:31:39
    would see any performance game uh
  • 00:31:42
    relative to just these tense models but
  • 00:31:44
    again I have never tried this so un
  • 00:31:47
    sure okay people will be able to try it
  • 00:31:50
    next week especially if we have the data
  • 00:31:51
    out so hopefully everyone does yeah I
  • 00:31:54
    mean once you do that I'm sure everybody
  • 00:31:56
    wants to get their get their hands on it
  • 00:31:58
    try experiment with with their own
  • 00:32:01
    techniques built on top of what you've
  • 00:32:03
    done with reflection so um and then
  • 00:32:06
    actually another one from Alex uh
  • 00:32:08
    effects on quantization of this and and
  • 00:32:11
    um Matt we were talking about this
  • 00:32:12
    earlier I'm planning on running it
  • 00:32:14
    through my full L llm test suite and I
  • 00:32:17
    want to download it I have two rtxa
  • 00:32:20
    6000s maybe that was enough for I think
  • 00:32:22
    you said fp8 uh but uh how how how do
  • 00:32:26
    you see quantization affect the quality
  • 00:32:28
    of the output using this
  • 00:32:34
    technique I don't know for sure I do
  • 00:32:37
    suspect especially from my limited
  • 00:32:39
    testing of like there's already an API
  • 00:32:40
    that's up that is
  • 00:32:42
    fp8 um I do suspect there's definitely a
  • 00:32:45
    performance loss I just don't know how
  • 00:32:47
    much it is right maybe it's 1% it really
  • 00:32:50
    doesn't matter um maybe it is more than
  • 00:32:53
    that I do think from what I saw at a
  • 00:32:54
    minimum it is much less consistent um I
  • 00:32:58
    did notice that I just haven't spent
  • 00:32:59
    enough time with it
  • 00:33:03
    personally okay uh Daniel Ranger is
  • 00:33:06
    asking about context length is there a
  • 00:33:08
    reason why the context length is 4K
  • 00:33:10
    instead of 128k and and I guess how have
  • 00:33:12
    you seen maybe that's not the case based
  • 00:33:14
    on your reaction uh how have you seen
  • 00:33:17
    context length affect the performance of
  • 00:33:21
    the model and maybe you could just
  • 00:33:22
    clarify what the context length
  • 00:33:26
    is it should be the full Lama I believe
  • 00:33:29
    one go ahead go ahead yeah so uh it's
  • 00:33:33
    the full Lama context line I think
  • 00:33:36
    128k uh around that uh I just that that
  • 00:33:40
    we haven't included data that's very
  • 00:33:41
    long context in in the training so we
  • 00:33:45
    don't know how the performance will be
  • 00:33:47
    on very long contexts and like that's
  • 00:33:50
    something to improve on in the
  • 00:33:52
    future but you you can essentially run
  • 00:33:54
    it through the entire context length of
  • 00:33:57
    L fre one
  • 00:33:58
    awesome uh okay I know you guys got to
  • 00:34:01
    go in like two minutes so I'll just wrap
  • 00:34:03
    it up quickly uh your post got 2.2
  • 00:34:06
    million views uh Matt we have 1,700 plus
  • 00:34:10
    viewers in here and growing it it keeps
  • 00:34:12
    going up so I think there's a lot of
  • 00:34:15
    appreciation for what you've done and
  • 00:34:17
    you as well sahill I just want to say
  • 00:34:19
    thank you especially like from the open
  • 00:34:21
    source Community I love seeing open
  • 00:34:24
    source compete with closed Source
  • 00:34:25
    Frontier models it just it it makes me
  • 00:34:27
    me so happy that I can download this
  • 00:34:29
    stuff and play with it myself um so
  • 00:34:31
    thank you so much if you want to check
  • 00:34:33
    out uh Matt here's his Twitter right
  • 00:34:36
    here Matt Schumer
  • 00:34:38
    uncore uh and he is the CEO and founder
  • 00:34:43
    of hyperr so check out hyperight Ai and
  • 00:34:46
    all of this was built with sah Hill's
  • 00:34:48
    company uh founder of glaive g a i v so
  • 00:34:54
    glaive doai I believe it is is that
  • 00:34:56
    right uh Sill yeah that's correct yeah
  • 00:34:59
    so if you have a novel approach just
  • 00:35:02
    like Matt did contact sahill at glaive
  • 00:35:05
    and you can make it happen and I just
  • 00:35:07
    want to say thank you both S I mean this
  • 00:35:09
    was like a last minute thing you just
  • 00:35:10
    pinged me Matt and and I was so happy
  • 00:35:13
    you agreed to join you too sahill so
  • 00:35:15
    thank you so much thanks to everybody
  • 00:35:17
    who joined and uh I'm glad we got some
  • 00:35:19
    uh got some
  • 00:35:22
    details yeah thank you for having us and
  • 00:35:24
    thank you for the
  • 00:35:25
    support yeah all right you know where to
  • 00:35:27
    find them we'll drop it all in the
  • 00:35:29
    description
Tags
  • Reflection 70b
  • AI model
  • open-source
  • reflection tuning
  • hyperr AI
  • glaive
  • synthetic data
  • machine learning
  • model training
  • innovation