Who are the guests on the stream?

Matt Schumer, co-founder and CEO of hyperr AI, and Sahil Chow from glaive, the founder of the company.

What is Reflection 70b?

It is an open-source AI model using a new technique called reflection tuning to improve its performance.

What is reflection tuning?

Reflection tuning is a technique that allows the model to recognize and correct its own mistakes by reflecting on its outputs during inference.

What differentiates Reflection 70b from other models?

Reflection 70b uses reflection tuning to improve performance by learning to correct mistakes, making it competitive with larger models like llama 3.1 405b.

How was Reflection 70b developed so quickly?

The model was developed in three weeks by leveraging synthetic data and a novel training process focusing on reflection tuning.

What role did Sahil Chow play in developing the model?

Sahil Chow's company, glaive, helped in creating the synthetic data and training strategies for Reflection 70b.

Is Reflection 70b available for public usage?

Yes, the model is open-source, and there are plans to release the dataset and fine-tuning details.

Why is reflection important in AI models?

Reflection allows AI models to self-correct and improve accuracy by learning from their own mistakes.

Will they release any other versions of this model?

Yes, a 405b version is planned for release soon.

How do synthetic data and reflection tuning work together?

Synthetic data helps create training samples that teach the model to recognize and reflect on mistakes, improving learning efficiency.

Live Chat with Matt Shumer about Reflection 70b!

00:35:30

https://www.youtube.com/watch?v=5_m-kN64Exc

الملخص

TLDRThe video stream discusses the launch of a new open-source AI model named Reflection 70b, featuring guests Matt Schumer, co-founder and CEO of hyperr AI, and Sahil Chow, founder of glaive. Reflection 70b introduces a new concept called reflection tuning, which aims to enhance AI performance by enabling the model to self-correct its responses by recognizing and reflecting on mistakes during the inference process. This innovative model has shown competitive results, outperforming larger models like the llama 3.1 405b. The development of Reflection 70b was completed in just three weeks, showcasing a collaborative effort using synthetic data to refine training strategies. Sahil Chow's glaive played a crucial role in delivering the synthetic data that facilitated reflection tuning. The technique allows the model to improve accuracy by essentially mirroring human problem-solving strategies, including identifying and fixing errors. While reflection tuning can be partially mimicked through advanced prompt engineering, its full potential is realized when baked into the model, offering significant performance improvements. The model is particularly exciting for the open-source community as it showcases that high-quality AI can be achieved without the extensive resources typically used by large AI labs. Future releases, including a 405b model, and possibly more enhancements are anticipated, keeping the AI community engaged.

الوجبات الجاهزة

🎤 Guests are Matt Schumer and Sahil Chow.
🆕 Introduction of Reflection 70b model.
🧠 Utilizes reflection tuning technique.
🛠 Developed in three weeks.
📊 Outperforms larger models like llama 405b.
🤖 Leverages synthetic data for training.
💡 Focuses on model self-correction.
✅ Significant improvements through reflection tuning.
🔓 It's an open-source project.
🚀 Anticipation for more future releases.

الجدول الزمني

00:00:00 - 00:05:00
The host introduces the live stream with guests Matt Schumer, co-founder and CEO of Hyperr AI, and Sahil Chow from Glaive, discussing their new model, Reflection 70b. They express excitement about discussing the recently released open-source model using a novel technique called Reflection Tuning.
00:05:00 - 00:10:00
Matt Schumer talks about his background and company Hyperr AI, which he founded to create AI solutions, such as an AI that writes emails. He shares that Reflection 70b was developed fairly quickly, in three weeks, through collaboration with Sahil Chow.
00:10:00 - 00:15:00
Matt describes the inspiration and process of developing Reflection 70b, highlighting the use of a specially curated dataset. Sahil introduces Glaive and explains their role in creating synthetic datasets for custom model training, emphasizing the efficiency and speed of their synthetic data generation process.
00:15:00 - 00:20:00
Matt explains how Reflection Tuning leverages large language models (LLMs) to improve thought processes and reasoning by enabling models to reflect on their errors. Sahil elaborates on the use of synthetic datasets to train models to recognize mistakes and improve accuracy, showing significant performance gains in certain benchmarks.
00:20:00 - 00:25:00
The discussion focuses on the training process of the model and the development of Reflections – a technique where models are trained to reflect like humans when errors are detected. Sahil explains the dataset structuring and selective use of Reflections to avoid overthinking and ensure higher precision in decision-making.
00:25:00 - 00:30:00
Glaive's role in providing the synthetic data is discussed, detailing how they designed the reflection examples to aid fine-tuning. Matt discusses the benefits of integrating reflection into models over simply prompt engineering, showing how their approach outperformed traditional methods significantly.
00:30:00 - 00:35:30
The closing of the stream included audience questions about the application of this technique to other models, and the potential for future open access to the dataset for broader application. The hosts emphasize their openness to innovation, hinting at additional unexplored strategies beyond Reflection Tuning.

اعرض المزيد

الخريطة الذهنية

فيديو أسئلة وأجوبة

Who are the guests on the stream?
Matt Schumer, co-founder and CEO of hyperr AI, and Sahil Chow from glaive, the founder of the company.
What is Reflection 70b?
It is an open-source AI model using a new technique called reflection tuning to improve its performance.
What is reflection tuning?
Reflection tuning is a technique that allows the model to recognize and correct its own mistakes by reflecting on its outputs during inference.
What differentiates Reflection 70b from other models?
Reflection 70b uses reflection tuning to improve performance by learning to correct mistakes, making it competitive with larger models like llama 3.1 405b.
How was Reflection 70b developed so quickly?
The model was developed in three weeks by leveraging synthetic data and a novel training process focusing on reflection tuning.
What role did Sahil Chow play in developing the model?
Sahil Chow's company, glaive, helped in creating the synthetic data and training strategies for Reflection 70b.
Is Reflection 70b available for public usage?
Yes, the model is open-source, and there are plans to release the dataset and fine-tuning details.
Why is reflection important in AI models?
Reflection allows AI models to self-correct and improve accuracy by learning from their own mistakes.
Will they release any other versions of this model?
Yes, a 405b version is planned for release soon.
How do synthetic data and reflection tuning work together?
Synthetic data helps create training samples that teach the model to recognize and reflect on mistakes, improving learning efficiency.

عرض المزيد من ملخصات الفيديو

احصل على وصول فوري إلى ملخصات فيديو YouTube المجانية المدعومة بالذكاء الاصطناعي!

الترجمات

التمرير التلقائي:

00:00:05
[Music]
00:00:33
[Music]
00:00:55
[Music]
00:01:07
[Music]
00:01:19
[Music]
00:01:35
[Music]
00:01:45
[Music]
00:02:14
[Music]
00:02:20
[Music]
00:02:32
[Music]
00:02:45
[Music]
00:02:56
[Music]
00:02:59
e
00:03:34
ah there we go yes we have sound Massa
00:03:37
Hill can you hear me
00:03:40
okay all right I think you might still
00:03:42
be muted I'm gonna just do a quick intro
00:03:45
I'm muted yeah good now all right here
00:03:48
we go no sound the sound should be
00:03:51
coming in any moment
00:03:54
now no rip audio no audio yep it'll be
00:03:57
fixed I still don't know how to operate
00:03:59
a mute
00:04:02
button all right we got uh we got audio
00:04:05
good okay so today super excited to
00:04:09
share we have uh two special guests
00:04:11
joining us today uh Matt Schumer
00:04:14
co-founder and CEO of hyperr AI and the
00:04:20
author creator of the most uh recently
00:04:23
dropped amazing open source open weights
00:04:25
model reflection 70b we also have sahil
00:04:30
Chow from uh glaive he is the founder of
00:04:34
glaive who was uh pivotal in helping get
00:04:37
this model to everybody so very excited
00:04:40
to talk to you both thank you so much
00:04:42
for joining me today uh cannot wait to
00:04:45
hear more about what went into this
00:04:47
model but uh welcome to the stream
00:04:51
guys thanks for having
00:04:54
us all right um so first let's just talk
00:04:59
about what happened so yesterday
00:05:02
actually I'm going to share your your
00:05:03
Twitter
00:05:04
post let's see you probably posted a ton
00:05:06
of stuff since
00:05:08
yesterday too
00:05:11
much cool here's the announcement so
00:05:14
match humer yesterday uh 11:51 a.m. I'm
00:05:17
excited to announce reflection 70b the
00:05:19
world's top open source model trained
00:05:23
using a new technique reflection tuning
00:05:25
we'll talk about what that is today and
00:05:28
it is doing incredibly well uh if you
00:05:32
look at the benchmarks just across the
00:05:34
board competitive with the other
00:05:37
Frontier both closed source and open
00:05:39
source models beating llama 3.1 405b the
00:05:42
larger one and according to Matt we have
00:05:45
uh the 405b model coming out next week
00:05:47
so first Matt maybe just tell us a
00:05:51
little bit about yourself uh what do you
00:05:53
what do you do and what made you want to
00:05:56
create this
00:05:58
model yeah yeah so um high level on
00:06:02
myself and the company right so uh I've
00:06:04
been personally starting company since I
00:06:05
was 12 uh we can skip a lot of time a
00:06:07
few years ago uh when I was in college I
00:06:09
started uh other side AI which is the
00:06:11
company that sort of became hyperight
00:06:14
right and the idea about hyperight
00:06:15
initially was can we create an AI that
00:06:18
can write your emails for you we were
00:06:19
actually the first um that were aware of
00:06:21
uh VC backed company in this sort of
00:06:23
like quote unquote generative AI space
00:06:24
that used the open AI models at the
00:06:26
beginning we've grown quite a bit since
00:06:28
then now we have well a few million
00:06:31
users uh millions in Revenue were
00:06:33
profitable um for a much sort of like
00:06:36
expanded version of that product it's
00:06:38
sort of writing in general we're the
00:06:39
best AI for writing continues to get
00:06:42
better we have a lot of sort of specific
00:06:43
things we do but that's that's the high
00:06:45
level um in terms of this model this is
00:06:48
sort of just a fun thing uh very long
00:06:50
story short I was actually on vacation
00:06:52
and my mind was just everywhere and I
00:06:54
was like okay I'm I'm kind of bored of
00:06:56
this I need to actually do something
00:06:57
productive and I've been noling on this
00:07:00
idea for a very long time and it was
00:07:01
sort of like time to just do it so
00:07:03
towards the end of the trip I actually
00:07:05
reached out to sahill and I was like can
00:07:06
we collaborate and that's how this came
00:07:08
to be how long ago was
00:07:12
that three weeks I think so to be clear
00:07:18
uh you had the idea reached out to
00:07:21
sahill put together the data set
00:07:25
fine-tuned the model and published it
00:07:27
all within three weeks
00:07:31
yes and I know that sounds crazy I get
00:07:35
that but I think a lot of people
00:07:37
underestimate what can be done today
00:07:38
with a very small amount of resources
00:07:40
right too many people look to the big
00:07:42
the big AI labs and they're like hey
00:07:43
look you know they spent billions of
00:07:44
dollars doing this there's no hope for
00:07:46
anybody to compete with a small budget
00:07:47
and small team small amount of time
00:07:50
right but thill and I did this on the
00:07:52
side right this wasn't even sort of our
00:07:53
major Focus not even close to it and
00:07:55
it's just thinking about the problem in
00:07:57
a different way right the data set was
00:08:00
everything for this and glaive made that
00:08:03
possible we just had to kind of know
00:08:04
what we were going after and once we
00:08:06
sort of had the idea of what this would
00:08:08
look like it was very easy to actually
00:08:09
put into into practice awesome and and
00:08:13
on that note sahill please tell us about
00:08:16
yourself tell us about what glaive
00:08:20
does yeah yeah uh kind of similar to M
00:08:23
I've been
00:08:24
doing startups specifically AI startups
00:08:27
for a few years now uh just before
00:08:29
starting live I worked at a company
00:08:32
called banana where we were building
00:08:34
sess gpus for machine
00:08:36
learning uh realized that people cannot
00:08:38
host custom models because to have
00:08:40
really high performing custom models
00:08:42
that use case specific you need high
00:08:43
quality data and companies like that
00:08:46
most of the times and that is when I
00:08:48
decided to just uh be a founder and
00:08:52
start claive and glaive is essentially a
00:08:54
platform where companies can build use
00:08:57
case specific data sets uh using using
00:08:59
our synthetic data generation Pipeline
00:09:02
and train mods on that and basically
00:09:04
keep iterating on them uh because it's
00:09:07
uh as I think match said it's it's
00:09:09
pretty quick to iterate on data sets
00:09:11
with the help of synthetic data uh the
00:09:13
generation process is pretty quick yeah
00:09:17
uh that's uh that's summary cool uh so I
00:09:22
mean Matt you said it the data set is
00:09:24
everything but I'm I want to touch on
00:09:26
that a little bit because we can talk
00:09:28
about you know publicly available data
00:09:30
and and how that is essentially all used
00:09:34
in current Frontier models and there's
00:09:37
really like two solutions to that you
00:09:38
either create synthetic data or you do
00:09:41
more with the data that you have and so
00:09:44
which approach were did you take with
00:09:47
reflection and what what makes
00:09:49
reflection special what is the the
00:09:51
secret sauce
00:09:53
there yeah so I think s could probably
00:09:55
talk more to the specifics of the data
00:09:57
side and how the data set was
00:09:58
constructed I could talk more the sort
00:09:59
of idea behind reflection and why it
00:10:02
works um so at a high level right the
00:10:05
general idea is llms are sort of getting
00:10:08
to the point where they can they can
00:10:09
think quote unquote think um and it's
00:10:12
sort of mirrored after how a human
00:10:13
thinks right you kind of talk through it
00:10:15
in your head you know some people do it
00:10:16
differently but you know some people
00:10:17
kind of talk through the problem in
00:10:18
their head and arrive at an answer and
00:10:20
llms do that today with Chain of Thought
00:10:22
that's how they have you know it's one
00:10:24
of the reasons they've improved in
00:10:25
performance so much over the last few
00:10:27
years uh it's been a corner so know
00:10:29
pretty much every major llm whether it's
00:10:32
coding whether it's just general
00:10:33
reasoning or writing even um so we do a
00:10:36
lot of work with that and one of the
00:10:37
things that I realized is like look as a
00:10:40
person I can think through a problem but
00:10:41
I'm not always going to get every little
00:10:43
bit right and what happens if I get
00:10:44
something wrong well personally I
00:10:46
reflect I'm like wait I made a mistake I
00:10:49
backtrack and I fix myself but an llm
00:10:51
doesn't do that one of the things we
00:10:53
noticed is that llms once they make that
00:10:55
first mistake they kind of accept it as
00:10:57
fact right if I'm asking it 2 plus two
00:11:01
and then multiply that by seven right
00:11:03
and it says okay I'll do that let's
00:11:05
start with two plus two that equals five
00:11:07
and then five time 7even right it's
00:11:08
already made that mistake and it's just
00:11:10
assuming the thing that it said is right
00:11:12
and if we could
00:11:14
essentially teach the model to think
00:11:17
more like we do and actually reflect on
00:11:20
behavior that it makes a mistake in the
00:11:23
model gets smarter it gets more reliable
00:11:26
gets more accurate that's generally the
00:11:28
idea here we did a lot of even beyond
00:11:31
the data it's how we trained the model
00:11:32
right we created what we call like our
00:11:34
own splitting strategy that allows the
00:11:37
model to not learn to make mistakes
00:11:38
because if you do this wrong it's very
00:11:39
easy to teach the model to make more
00:11:41
mistakes and then fix them the idea is
00:11:43
to keep the model's performance as is or
00:11:44
make it even better and then on the
00:11:47
things where it actually ends up making
00:11:48
a mistake and it would no matter what we
00:11:50
can fix that and it's still not perfect
00:11:52
it's still very early it's still a first
00:11:54
version of this but I think we've come
00:11:56
quite a bit um away in a few weeks so
00:11:58
that's that's sort of like how it works
00:12:00
sah maybe you want to talk a little bit
00:12:01
more to the the data generation process
00:12:03
itself because that was all
00:12:05
you yeah yeah I can do that uh
00:12:09
surprisingly the data set we used for
00:12:11
this isn't that big uh what people would
00:12:14
expect it's roughly we generated uh and
00:12:17
we generated it in steps uh we started
00:12:19
out with I think just 10,000 samples uh
00:12:22
to see if this is actually possible and
00:12:24
we scaled up to 100,000
00:12:26
samples uh we decided that we'll do uh
00:12:29
some code data some math data General
00:12:31
reasoning function calling and multiturn
00:12:34
so uh the goal really wasn't to get a
00:12:37
lot of data and teach the model how to
00:12:39
reason but it was essentially to teach
00:12:41
the model to recognize its own mistake
00:12:44
and I think uh to that point see a lot
00:12:47
of people uh mentioned that you can kind
00:12:49
of get similar gains out of just
00:12:51
prompting uh regular instruct models to
00:12:54
use reflection and I think that's true
00:12:57
to some extent you can get uh percentage
00:12:59
of the gains by just prompting but if
00:13:02
you try to prompt even son 3.5 for
00:13:05
example you'll see that it's there's a
00:13:07
lot of bias in the model outputs where
00:13:10
the model almost always believes that
00:13:12
what it's saying is correct and we face
00:13:15
that problem in data generation as well
00:13:17
right like we use language models we use
00:13:19
f tune language models to generate
00:13:21
synthetic data and when we first try to
00:13:24
actually generate reflection data we
00:13:25
realize that if we ask the model to
00:13:28
actually make the mistake and then
00:13:30
reflect on it Ms are really bad at doing
00:13:32
that uh maybe it's just R if you do that
00:13:35
like the MS will make mistakes easily
00:13:37
but if you ask a model to deliberately
00:13:39
make a mistake it just won't be able to
00:13:43
and that was the main goal of just find
00:13:45
you to teach the model that it can
00:13:48
actually make mistakes and and correct
00:13:51
that yeah uh I I have so many questions
00:13:55
about this so uh for those of you
00:13:58
watching uh maybe this is a good way to
00:14:00
break it down you you effectively took
00:14:02
really kind of sophisticated prompt
00:14:04
engineering strategies you know s even
00:14:08
something as simple as explain your
00:14:09
reasoning step by step which is
00:14:11
something that I put into all of my
00:14:13
prompts when I'm giving it more complex
00:14:15
reasoning and logic questions um but you
00:14:18
can get obviously even more
00:14:20
sophisticated than that Chain of Thought
00:14:22
um and and so on and so you've taken
00:14:25
that and you've built it into the model
00:14:27
itself and you've actually given it
00:14:29
examples of making mistakes along the
00:14:32
way and that's actually what I'm showing
00:14:33
I believe on the screen right here
00:14:35
actually there's a lot going on
00:14:36
hopefully yall can see it uh but but
00:14:39
effectively you're teaching the model in
00:14:43
the fine-tuning data itself to think
00:14:45
through everything you know more step by
00:14:48
step but also to and hence the name
00:14:52
reflect on the output as it's doing the
00:14:56
inference and then potentially correct
00:14:58
itself during the inference process so
00:15:01
what first of all H like what was the
00:15:06
trigger for thinking that this would be
00:15:08
really a a a good strategy and then why
00:15:11
is it better than simply uh put like
00:15:15
having a system message with some of
00:15:17
this prompting techniques or or putting
00:15:19
it in the prompt
00:15:22
itself yeah I mean why don't I start
00:15:24
with what you you just asked the last
00:15:25
question s covered a little bit of this
00:15:27
um some of my person findings with this
00:15:29
because I've been prompt engineering for
00:15:32
too many years now um like basically
00:15:34
since 2019 with gpd2 um what I found was
00:15:39
essentially if you asked it to do this
00:15:41
and didn't train on it a couple things
00:15:43
would happen sometimes it was sort of
00:15:44
like what side was talking about where
00:15:46
you know the model was overconfident but
00:15:48
there was another issue I found which is
00:15:50
sometimes this is kind of getting a
00:15:51
little weird the model would actually
00:15:53
deliberately make mistakes that it
00:15:54
wouldn't have made otherwise because
00:15:56
when you're prompting it really wants to
00:15:58
follow those instructions
00:15:59
really wants to follow that system
00:16:00
prompt and if you say when you make a
00:16:03
mistake fix it with a reflection tag
00:16:06
you're going to notice that it's going
00:16:07
to make that mistake what you notice
00:16:08
with our model is that it doesn't always
00:16:10
use reflection it only uses it when it
00:16:12
really thinks it needs to sometimes
00:16:14
little a little bit too much but we're
00:16:15
you know we're we're much more towards
00:16:17
the end of like use it when you need to
00:16:18
don't when you don't because if you're
00:16:20
teaching it to do it all the time right
00:16:21
it's just going to make mistakes
00:16:23
deliberately um so yeah it doesn't
00:16:26
really work for prompting and I do have
00:16:28
to apologize like I said said I not
00:16:29
slept in two days so what was the first
00:16:30
part of your question no I
00:16:33
I I'm trying to understand like what the
00:16:38
reasoning is like what if it's better to
00:16:41
put the prompt in the actual fine-tune
00:16:45
data or if it's better to do it
00:16:47
afterwards and obviously like your model
00:16:49
is is crushing The Benchmark so you've
00:16:52
done something really well but like what
00:16:55
what was that idea that made you think
00:16:58
that it it's actually like let's let's
00:17:00
push this to the fine tune
00:17:03
itself yeah I mean I've had this idea
00:17:05
like I mentioned for for many months now
00:17:07
and different forms of it um so it was
00:17:10
sort of like how do you enable that for
00:17:12
everyone without them needing to figure
00:17:13
it out themselves but even if they
00:17:15
wanted to figure out themselves without
00:17:16
a fine tune it doesn't work super well
00:17:17
you know prompting like like sah said
00:17:19
can get you a marginal gain like you
00:17:21
definitely will see better performance
00:17:22
but it's not going to be the performance
00:17:23
jump we saw I mean we saw 10 plus
00:17:25
percent uh jumps in many benchmarks I
00:17:27
mean it was it was taking L 70b and
00:17:30
bring it past 405b and much further
00:17:32
beyond that I mean we're expecting to
00:17:34
see with 45b is going to be hopefully
00:17:36
insane um so yeah it's really that this
00:17:39
can't be done with prompting alone if
00:17:41
you really want to get the full force of
00:17:42
it um this is more close this is closer
00:17:45
to how people think I think there are
00:17:47
many more Gams we can make there and it
00:17:49
was just one of those stupid obvious
00:17:51
ideas that no one really thought to do
00:17:54
right I think a lot of the major labs
00:17:55
they they're way too in the weeds in
00:17:56
many ways they're thinking to
00:17:58
technically and too intelligently about
00:18:00
this where sometimes the simple thing is
00:18:02
what actually is going to work right
00:18:04
just like so many people back in you
00:18:06
know 2018 whatever they were they were
00:18:08
working on all these new architectures
00:18:09
but scaling them when we had was the
00:18:11
right answer yeah and sahill I think you
00:18:14
mentioned like
00:18:16
putting almost like telling the model to
00:18:19
make the mistake and then showing it how
00:18:20
to correct itself I is is one of the
00:18:23
secrets to reflection but you also said
00:18:26
if you do that too much it will just
00:18:28
start making mistakes what is that
00:18:30
balance how did you discover where that
00:18:32
kind of Cliff is before it started just
00:18:35
taking those mistakes and and and
00:18:37
accepting them as
00:18:39
truth yeah I think there's a bunch of
00:18:41
small uh details there and I think next
00:18:44
week when we put out like a technical
00:18:46
report it's going to help people
00:18:48
understand that as well but
00:18:49
essentially uh if you if you look at uh
00:18:54
the responses the model generate you'll
00:18:56
see in somewhere or other it's not
00:18:58
always structure but it's going to
00:18:59
classify whether the problem is a hard
00:19:01
problem moderate or an easy problem and
00:19:04
you're going to see that it only uses
00:19:05
reflection andine of thought for
00:19:07
problems it thinks uh is a hard problem
00:19:10
for itself and that was part of the
00:19:12
training so we uh only added Reflections
00:19:16
we classified problems as easy moderate
00:19:18
and hard and we only added Reflections
00:19:20
to the hard problems because you don't
00:19:23
always need Reflections you don't want
00:19:24
to teach them all to
00:19:26
overthink uh that was one part of it and
00:19:28
I think I think uh the second balance is
00:19:30
we also added uh data samples which are
00:19:34
essentially just Reflections but the all
00:19:37
reflects on something it did which was
00:19:39
correct and just realizes that it does
00:19:41
not need to actually course correct and
00:19:43
it was already correct and move on with
00:19:45
the existing uh Chain of Thought So we
00:19:48
we try to balance it out with uh these
00:19:51
type of samples trying to cover as many
00:19:52
H cases as possible I think there's
00:19:54
still room for
00:19:56
improvement but uh it was essentially I
00:19:58
think someone on twit found out that
00:20:00
this was V5 of the model we trained we
00:20:02
had like V1 2 3 and four and that's
00:20:05
essentially what it was uh training
00:20:07
model and figuring out uh how to balance
00:20:10
the data set even more so if we're if
00:20:13
we're looking at this example I I guess
00:20:15
I'm trying to figure out uh you you give
00:20:17
it a prompt and it's running inference
00:20:20
and outputting is it truly able to
00:20:25
reflect on the output as it's giving the
00:20:28
output like what is what does that flow
00:20:30
look
00:20:32
like yeah so that was that was actually
00:20:34
an intentional a very intentional design
00:20:36
decision
00:20:37
where I think a lot of people get
00:20:39
frustrated when they look at this is
00:20:41
actually something we learned from hyper
00:20:42
right people get frustrated when they
00:20:43
see Chain of Thought you know some of
00:20:44
the more power Usery people they like it
00:20:47
but your average user doesn't they just
00:20:48
want to know the answer to the question
00:20:50
and if they want to dive deeper great
00:20:51
but they don't always want to dive
00:20:53
deeper and they don't want to dig
00:20:54
through amount of text Chain of Thought
00:20:55
text to find that answer so this gets
00:20:59
even worse with reflection right because
00:21:00
when it's reflecting in its output it's
00:21:02
like oh my God there's so much going on
00:21:04
here how do I parse through this it's
00:21:05
just the cognitive load is crazy so how
00:21:08
can we fix that and one of the ideas we
00:21:10
had um was just separate the reasoning
00:21:13
out right especially as inference is
00:21:15
getting faster look at what's going on
00:21:16
with for example Gro and cerebrus right
00:21:18
inference is going to get way way faster
00:21:21
and if we can do some of the thinking up
00:21:22
front and just sort of hide that from
00:21:24
the user and show it optionally if they
00:21:26
want to see it and then spit out a
00:21:28
perfect answer at the end that's what is
00:21:30
ideal um for most users in most used
00:21:32
cases the idea here is right the model
00:21:35
will start reasoning inside sort of what
00:21:37
we call output tags or sorry thinking
00:21:39
tags so it'll that's what we're seeing
00:21:42
right
00:21:43
here yeah it's going to start thinking
00:21:46
it does its Reflections in there it does
00:21:47
it Chain of Thought in there it does all
00:21:48
its planning all that and then it's like
00:21:51
okay once it's got the right answer or
00:21:52
it thinks it's got the right answer it's
00:21:54
like I'm done then it says okay time for
00:21:56
an output here's what I'm to show the
00:21:57
user and it writes the final message to
00:22:00
the user and I think different companies
00:22:02
and different use cases and interfaces
00:22:04
will use this differently um some people
00:22:07
want to see it all some people want to
00:22:08
make it optional some people want to
00:22:10
hide it entirely um but I think that
00:22:12
especially as models get faster and the
00:22:15
actual sort of thinking process takes
00:22:17
like less than a second for example I
00:22:19
think this is the right
00:22:20
approach and so that that was the
00:22:23
purpose of the tags just to clarify so
00:22:25
you have the thinking tags if you don't
00:22:26
want to see the thinking you just want
00:22:28
the output put it's kind of like
00:22:29
standard mode and then you have advanced
00:22:31
mode uh and then you can actually see
00:22:32
what's going on personally I love seeing
00:22:35
it like I want to know what it's
00:22:36
thinking why it's thinking that and that
00:22:39
can actually help you kind of recraft
00:22:42
the original prompt if you need to uh
00:22:44
because you could see if there's a
00:22:45
misunderstanding uh or or not so like
00:22:49
what what are your thoughts on on this
00:22:51
this method of self-correction during
00:22:54
inference time versus let's say having
00:22:57
multiple large language models uh
00:23:00
working in collaboration correcting each
00:23:02
other how does this
00:23:04
differ why not use them both right
00:23:07
everything can build on everything hell
00:23:10
yeah and if you can have multiple of
00:23:12
these models right thinking about
00:23:13
different angles on each problem and you
00:23:15
have reflection with within each model
00:23:17
it's just like having a better model
00:23:19
going after um a problem but having them
00:23:21
do it together okay amazing um and then
00:23:25
one last question for me and then I I
00:23:27
know we have a bunch of questions in the
00:23:29
chat so hopefully I know you have
00:23:31
probably about 12 more minutes so uh
00:23:33
hopefully we'll get to them both um what
00:23:35
have you seen in terms of token usage on
00:23:38
output as compared to a
00:23:40
non-reflection
00:23:43
model definitely more sah you might have
00:23:45
a much better response than definitely
00:23:49
more yeah uh haven't test it out like I
00:23:53
don't have exact numbers to share right
00:23:55
now but uh from just the test I've run
00:23:58
it seems to be uh 1.5 two times the
00:24:02
number of tokens uh for harder problems
00:24:05
so it's it's definitely generating a lot
00:24:08
more tokens and yeah I think if you're
00:24:12
using a hosted API that causes the the
00:24:14
cost of the latency problem but again I
00:24:16
think uh as Matt said INF is going to
00:24:19
get much faster and cheaper so uh we
00:24:22
kind of betting on that as
00:24:24
well all right um thank you so we got a
00:24:29
bunch of questions already uh we're
00:24:31
using a new streaming software I'm not
00:24:33
sure how to put it on the screen so I'm
00:24:34
just going to read them uh Tresor asks
00:24:37
will it be possible to fine-tune other
00:24:39
models like GPT 40 mini to implement the
00:24:42
reflection approach so basically does
00:24:44
this data set does this approach work
00:24:46
across the
00:24:48
board it should there will be
00:24:51
constraints with closed provider fine
00:24:54
tunings because uh they often have their
00:24:57
own limits on how you can do the fine
00:24:58
tuning so we'll go into this in more
00:25:00
detail when we release a report and we
00:25:02
we I think are going to be releasing the
00:25:03
data set um we're still making the final
00:25:05
decision there but it's likely G to be a
00:25:07
yes um it has to be trained in a very
00:25:10
specific way it's a very simple training
00:25:12
process it's straight up just like
00:25:14
standard fine-tuning right um there's no
00:25:16
rhf going on none of that but the way
00:25:19
the data is sort of split up it's to
00:25:21
sort of like enable it to not learn to
00:25:24
make extra mistakes but learn to correct
00:25:26
them um so for any open source model it
00:25:29
should be doable and relatively easy
00:25:31
compared to some of the other approaches
00:25:32
people have uh put forward for things
00:25:35
like fine tuning open AI maybe it's
00:25:38
possible especially if you're creative
00:25:39
about it I don't I haven't fine- tune on
00:25:41
the open API in a while but from what I
00:25:43
remember it would be a little bit
00:25:44
limiting for something like this yeah
00:25:46
and I I like what you said earlier um
00:25:49
this was just an overlooked strategy
00:25:51
like it's it's maybe it was it was too
00:25:54
simple uh not that's not to be
00:25:56
pejorative at all it's like but it is
00:25:58
you know sometimes it's just like hey
00:26:00
it's right in front of you here here's
00:26:01
here's a great strategy you can use um
00:26:03
so Yash asks what about an 8B version
00:26:07
and we know you mentioned the 405b is
00:26:10
coming out uh what about an 8B version
00:26:12
that more people can run maybe
00:26:15
locally yeah I mean we found and to be
00:26:18
fair we trained the AP first um but we
00:26:21
found that it made some big improvements
00:26:23
on certain benchmarks but others it
00:26:25
really didn't and eight
00:26:29
we weren't really sure where some of the
00:26:30
gains were coming from it it felt like
00:26:33
it was a little too dumb to pick up on
00:26:35
this perfectly um like the 70b
00:26:38
definitely sort of like crossed a
00:26:39
threshold where it was like able to
00:26:40
grock uh grock this to some extent um
00:26:43
but given the sort of crazy interest
00:26:45
we've seen like we we expected this to
00:26:46
do well we didn't expect it to do this
00:26:49
well um like it's been the craziest
00:26:52
launcher of My Life
00:26:54
um given that I think there probably is
00:26:57
enough interest to figure out something
00:26:59
there but you know you said before right
00:27:01
there there are many things that are
00:27:02
sort of like these low hanging fruit
00:27:03
things that just don't pay attention to
00:27:05
and we have a few other ideas in mind
00:27:06
that I think could do far better than
00:27:08
reflection so I think it's a question of
00:27:09
do we want to focus more on reflection
00:27:11
and really optimizing an 8B that you
00:27:13
know may be obsolete soon or really
00:27:15
going crazy and focusing on something
00:27:17
that could actually Top This by a good
00:27:19
bit and I think I'm more excited about
00:27:21
the lad but I was just about to ask you
00:27:25
that Matt sorry sorry to cut you off I
00:27:27
was just about to ask if you have any
00:27:29
kind of new ideas going on do you uh
00:27:31
want to hint at anything or not quite
00:27:35
yet I think if you look at my past
00:27:38
work and you think about this from the
00:27:40
perspective of like look some of the
00:27:43
learnings we've gotten from prompting
00:27:44
have not been properly thought about in
00:27:46
terms of training them into the model I
00:27:48
think you can kind of see where I'm
00:27:49
going with
00:27:50
this
00:27:52
um there are a few different ideas that
00:27:55
I'm kind of toying with and we're
00:27:56
talking about right now and I think the
00:27:58
interesting thing is with glaive like
00:27:59
they're not going to be so hard to
00:28:01
implement um they actually might be even
00:28:04
easier than reflection um they can be
00:28:06
comined with reflection reflection but
00:28:09
yeah I think it's the idea of like we're
00:28:11
going to start exploring what it looks
00:28:12
like to take these things that make it
00:28:15
better from a prompting perspective and
00:28:16
can you make it much better by training
00:28:19
on
00:28:20
that how do you give everyone the power
00:28:22
of like the world- class prompt
00:28:23
engineering um sort of stuff maked into
00:28:25
the model and even go beyond that that's
00:28:27
kind of how about it that's that's great
00:28:30
and uh you know Victor asked kind of as
00:28:32
an extension of that uh are you going to
00:28:35
provide the data sets publicly H so
00:28:38
other people can maybe you know uh
00:28:42
manicure them as they see fit and and
00:28:44
roll their
00:28:47
own I think it's highly likely s if you
00:28:50
have different feelings that we can
00:28:52
figure that
00:28:53
out no I think in the past uh like most
00:28:57
of the open Source work we have done uh
00:29:00
as always included data sets we are
00:29:02
essentially a data set company so it
00:29:03
makes sense plus it just uh makes it
00:29:06
really easy for people to reproduce uh
00:29:08
benchmarks uh the technique so overall I
00:29:12
think it would be it would make sense to
00:29:13
open source once we have uh the four
00:29:16
five be trained as
00:29:18
well awesome uh Mark Cox asks uh how
00:29:22
exactly does it recognize a quote
00:29:25
unquote mistake that requires a
00:29:28
reflection and actually I have even kind
00:29:30
of a followup to that
00:29:33
do does it always output a reflection is
00:29:36
it always
00:29:40
necessary uh no so uh in the data set uh
00:29:45
we only have Reflections as part of uh
00:29:48
samples where the model is likely to
00:29:51
make mistakes and we decided that by
00:29:54
just classifying the problem as hard so
00:29:56
uh if if you just ask simple question it
00:29:59
won't add a reflection tag uh if the
00:30:01
question is really simple it won't even
00:30:03
do Chain of Thought and just uh answer
00:30:05
pretty straightforwardly so model can
00:30:08
almost infer uh the difficulty of a
00:30:12
problem and decide when to do
00:30:14
Reflections uh but we've tried to train
00:30:16
them mod to do Reflections whenever it
00:30:18
makes uh it it solves a hard problem and
00:30:21
it encounters the step where it's making
00:30:23
a hard Chain of Thought step so even if
00:30:27
you have a really complicated problem
00:30:29
there would be like few steps within
00:30:31
that problem that the model is more
00:30:32
likely to make mistakes and we have
00:30:34
added Reflections in the r set right at
00:30:36
that step so whenever it makes uh for
00:30:39
example it does arithmetic it's more
00:30:41
likely to just use a reflection tag
00:30:43
right after that and see if what it did
00:30:45
was correct or
00:30:48
not okay great um Alex fov asked uh any
00:30:53
gut feeling about using Moe mixture of
00:30:56
experts and this technique
00:30:58
uh maybe if some experts are trained
00:31:00
like this and others
00:31:05
aren't first of all hi Alex um I don't
00:31:09
know I don't see why it would be much
00:31:12
different I've had poor experiences
00:31:16
training on mixure of experts models
00:31:17
just
00:31:18
generally so I don't know if you found
00:31:20
the
00:31:23
same yeah uh I mean one benefit could be
00:31:26
that uh
00:31:28
if there's uh an inference speed trade
00:31:31
off because uh Moes are usually faster
00:31:34
to inference that could be a
00:31:36
benefit uh intuitively I don't think we
00:31:39
would see any performance game uh
00:31:42
relative to just these tense models but
00:31:44
again I have never tried this so un
00:31:47
sure okay people will be able to try it
00:31:50
next week especially if we have the data
00:31:51
out so hopefully everyone does yeah I
00:31:54
mean once you do that I'm sure everybody
00:31:56
wants to get their get their hands on it
00:31:58
try experiment with with their own
00:32:01
techniques built on top of what you've
00:32:03
done with reflection so um and then
00:32:06
actually another one from Alex uh
00:32:08
effects on quantization of this and and
00:32:11
um Matt we were talking about this
00:32:12
earlier I'm planning on running it
00:32:14
through my full L llm test suite and I
00:32:17
want to download it I have two rtxa
00:32:20
6000s maybe that was enough for I think
00:32:22
you said fp8 uh but uh how how how do
00:32:26
you see quantization affect the quality
00:32:28
of the output using this
00:32:34
technique I don't know for sure I do
00:32:37
suspect especially from my limited
00:32:39
testing of like there's already an API
00:32:40
that's up that is
00:32:42
fp8 um I do suspect there's definitely a
00:32:45
performance loss I just don't know how
00:32:47
much it is right maybe it's 1% it really
00:32:50
doesn't matter um maybe it is more than
00:32:53
that I do think from what I saw at a
00:32:54
minimum it is much less consistent um I
00:32:58
did notice that I just haven't spent
00:32:59
enough time with it
00:33:03
personally okay uh Daniel Ranger is
00:33:06
asking about context length is there a
00:33:08
reason why the context length is 4K
00:33:10
instead of 128k and and I guess how have
00:33:12
you seen maybe that's not the case based
00:33:14
on your reaction uh how have you seen
00:33:17
context length affect the performance of
00:33:21
the model and maybe you could just
00:33:22
clarify what the context length
00:33:26
is it should be the full Lama I believe
00:33:29
one go ahead go ahead yeah so uh it's
00:33:33
the full Lama context line I think
00:33:36
128k uh around that uh I just that that
00:33:40
we haven't included data that's very
00:33:41
long context in in the training so we
00:33:45
don't know how the performance will be
00:33:47
on very long contexts and like that's
00:33:50
something to improve on in the
00:33:52
future but you you can essentially run
00:33:54
it through the entire context length of
00:33:57
L fre one
00:33:58
awesome uh okay I know you guys got to
00:34:01
go in like two minutes so I'll just wrap
00:34:03
it up quickly uh your post got 2.2
00:34:06
million views uh Matt we have 1,700 plus
00:34:10
viewers in here and growing it it keeps
00:34:12
going up so I think there's a lot of
00:34:15
appreciation for what you've done and
00:34:17
you as well sahill I just want to say
00:34:19
thank you especially like from the open
00:34:21
source Community I love seeing open
00:34:24
source compete with closed Source
00:34:25
Frontier models it just it it makes me
00:34:27
me so happy that I can download this
00:34:29
stuff and play with it myself um so
00:34:31
thank you so much if you want to check
00:34:33
out uh Matt here's his Twitter right
00:34:36
here Matt Schumer
00:34:38
uncore uh and he is the CEO and founder
00:34:43
of hyperr so check out hyperight Ai and
00:34:46
all of this was built with sah Hill's
00:34:48
company uh founder of glaive g a i v so
00:34:54
glaive doai I believe it is is that
00:34:56
right uh Sill yeah that's correct yeah
00:34:59
so if you have a novel approach just
00:35:02
like Matt did contact sahill at glaive
00:35:05
and you can make it happen and I just
00:35:07
want to say thank you both S I mean this
00:35:09
was like a last minute thing you just
00:35:10
pinged me Matt and and I was so happy
00:35:13
you agreed to join you too sahill so
00:35:15
thank you so much thanks to everybody
00:35:17
who joined and uh I'm glad we got some
00:35:19
uh got some
00:35:22
details yeah thank you for having us and
00:35:24
thank you for the
00:35:25
support yeah all right you know where to
00:35:27
find them we'll drop it all in the
00:35:29
description

الوسوم

Reflection 70b
AI model
open-source
reflection tuning
hyperr AI
glaive
synthetic data
machine learning
model training
innovation