00:00:00
So today we're diving into a fascinating study that could change how we think about AI in medicine. Researchers tested one of the newest AI systems, called o1-preview, against both human doctors and previous models like GPT-4, to see how good AI has become at medical diagnosis and decision-making. Now, this wasn't a simple test. The researchers put the AI through four or five different intense challenges, ranging from diagnosing complex medical cases that have stumped doctors, to suggesting treatment plans, to identifying critical conditions that absolutely can't be missed. They used real medical cases from the prestigious New England Journal of Medicine, and these are the kind of complex cases that even experienced doctors find challenging. What makes this study particularly interesting is that they didn't just use multiple-choice questions. Instead, they tested the AI's ability to think and reason like a doctor would in real-world scenarios, because they wanted to see whether the AI could handle the complex, multi-step thinking that doctors use every day when treating patients. In this video I'll break down exactly what they tested, what they found, and what this could mean for the future of healthcare.
00:01:10
Now, one of the things they found in this research was that this AI model, the OpenAI o1 model, was really impressive in comparison to GPT-4. They showcase around three different cases where GPT-4 can't solve a complex case: it can't diagnose it and gets it completely wrong, whereas o1 gets the diagnosis completely right. In case one there was a really complex disease; GPT-4 got it completely wrong, with a Bond score of zero, while o1-preview got it completely right and identified the exact condition. In case two there was another complex case, which GPT-4 completely missed and listed common conditions instead, while o1-preview nailed it and got the rare condition completely right. Then in case three GPT-4 was close, managing a Bond score of three and listing some correct information but incorrect conditions, whereas o1-preview got it exactly right again. What's particularly interesting here is that the Bond score shows how close each AI got: zero is completely wrong, five is exactly right. And these were really tough cases, like medical mysteries. GPT-4 tended to guess more common conditions, but o1-preview was able to identify rare and complex conditions pretty accurately. This basically shows us that with each improvement of AI, and of course with this new series of models, while you might use this AI on a day-to-day basis, it's when we're tackling complex scenarios like this that these thinking models really do shine.
also this image right here and this
00:02:55
image shows a comparison of how well
00:02:57
different diagnostic systems both Ai and
00:03:00
human perform at correctly diagnosing
00:03:02
medical conditions using cases from the
00:03:04
New England Journal of Medicine and this
00:03:06
is from 2012 to 20 so now the types of
00:03:09
systems showns in the blue colors are of
00:03:12
course the modern AI systems and the
00:03:14
light blue is where you have the older
00:03:16
diagnostic systems that required doctors
00:03:18
to manually input symptoms and of course
00:03:20
in the brown bar at the bottom that is
00:03:23
where you can see the human clinicians
00:03:25
performance now overall what we can see
00:03:27
here is that there is of course a Stark
00:03:29
impr Improvement when we look at the 01
00:03:32
preview compared to GPT 4 then when we
00:03:34
look at these older AI systems we can
00:03:36
see that they're not as good and of
00:03:38
course we can see compared to the
00:03:39
clinician there is a large increase in
00:03:42
terms of the percentage correct
00:03:43
diagnosis from here you can see it's
00:03:46
around 30% whereas with these llms it's
00:03:49
around 60 to above 75% which is rather
00:03:52
surprising and this really goes to show
00:03:54
us just how powerful these AI systems
00:03:57
are I know a lot of people give these
00:03:58
generative AI system system Flack
00:04:00
because oh they're just regurgitating
00:04:02
stuff but when you apply them to medical
00:04:05
use cases you can see that these tools
00:04:07
are remarkably powerful for diagnosing
00:04:09
different diseases or diagnosing
00:04:11
different things in a variety of
00:04:12
different scenarios processing complex
00:04:14
bits of medical information and arriving
00:04:16
at correct diagnosis is the kind of
00:04:18
thing that AI is exactly designed for or
00:04:21
should I say uniquely designed for now
00:04:23
Now we can see here Figure 5, a comparison of GPT-4, o1-preview, and physicians for management and diagnostic reasoning. This image shows how well the different groups performed when managing medical cases, called "Grey Matters" management cases, comparing scores between o1-preview by itself, which scores a remarkable 85 to 90%; GPT-4 by itself, scoring around 40 to 50%; human physicians using GPT-4 as a tool, scoring around 40 to 50%; and then of course human physicians using standard, traditional medical resources, scoring a whopping 30 to 40%. So this is rather fascinating. Once again, the scores, ranging from 0 to 100, show us that o1-preview clearly outperformed all the other options by a large margin, and this is fascinating because it performed significantly better than both GPT-4 and the human physicians. Interestingly, there wasn't much difference between GPT-4 alone and the physicians using GPT-4, but this visualization powerfully demonstrates how much more capable o1-preview is at medical management reasoning compared to both earlier AI systems and human physicians, even when those physicians have access to AI or traditional resources.
00:05:38
Now, in addition to this, I do want to caveat this by saying this is o1-preview: this isn't even the full o1, nor is it o3, which was recently demoed by OpenAI, and we know that that model is even smarter. So imagine what kinds of results that would get, if this preview model is getting around 80 to 90%.
00:05:58
We can also see this in terms of the landmark diagnostic cases. These cases are basically the greatest medical mysteries that have been solved; they're famous cases that have become teaching classics in medicine, kind of like the greatest hits of medical diagnosis. These are real patient cases from the past that were particularly challenging or groundbreaking; they helped doctors learn something new about a disease or condition, and they often changed how doctors approach diagnosing similar problems. What makes these landmark cases is that they're usually complex cases that weren't obvious to solve, they often involved unusual combinations of symptoms, the final diagnosis was often surprising or taught doctors something new, and they have become standard teaching tools in medical schools.
00:06:40
Now, when they tested these AI systems on these cases, we can see once again that o1-preview manages to get an extremely high score on the left-hand side, and interestingly GPT-4 alone also manages to outperform physicians using GPT-4, while physicians using GPT-4 do perform better than physicians with standard resources. Interestingly, here we can see that the AI didn't supersede humans by that much, because there were several cases where the humans got these right, but we can see that the AI is definitely really effective when it comes to these landmark diagnostic cases. Whether or not you could say this is a training-data thing, I still think it's remarkably impressive, considering the physicians seem better off with these AI tools than without them.
00:07:23
Now, this graph right here shows how often the different groups caught the most critical diagnoses, what they call "cannot-miss" diagnoses: conditions that, if missed, could be life-threatening for patients. We have four different categories: the residents in pink, who are junior doctors in training; the attending physicians in green, who are experienced, fully qualified doctors; GPT-4 in blue, the previous AI model; and o1-preview in purple, the newest AI model. What the graph shows is a scale that goes from 0 to 1, or 0% to 100%. The boxes show where the majority of the scores were, the black lines show the full range of scores, and the dots show the individual results. All groups performed similarly, at around a 50% to 100% rate, but we can see once again that o1-preview was slightly more consistent, residents showed more variation in performance, and experienced doctors performed about as well as these AI systems. This was rather fascinating, because once again we see that AI manages to perform really well in these scenarios.
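As a rough illustration of how that box-and-whisker summary works, here's a minimal Python sketch that computes the median, the box (interquartile range), and the whiskers (full range) for each group. The per-case detection rates below are invented placeholders, not numbers from the study.

```python
# Illustrative only: hypothetical "cannot-miss" detection rates per group,
# summarised the way a boxplot does (median, box edges, whisker range).
import statistics

detection_rates = {
    "residents":            [0.50, 0.60, 0.75, 0.90, 1.00],
    "attending physicians": [0.70, 0.80, 0.85, 0.90, 1.00],
    "GPT-4":                [0.65, 0.75, 0.85, 0.90, 0.95],
    "o1-preview":           [0.80, 0.85, 0.90, 0.95, 1.00],
}

for group, rates in detection_rates.items():
    q1, median, q3 = statistics.quantiles(rates, n=4)  # box edges and median line
    low, high = min(rates), max(rates)                  # whiskers: full range
    print(f"{group:22s} median={median:.2f}  box=[{q1:.2f}, {q3:.2f}]  range=[{low:.2f}, {high:.2f}]")
```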
00:08:30
Now let me break down this table, which shows how o1-preview planned medical tests compared to what actually happened in the case. If we take a look at this first case, you can see there was a certain plan which the doctors actually followed, and then, interestingly, o1-preview suggested a plan that was very similar to exactly what those doctors suggested. So you can see that in this case it got a score of two, which is a completely correct score when it comes to planning the range of tests you would conduct while trying to figure out what kind of diagnosis you're dealing with. Now, there were some things here that were rather interesting. It was impressive that the AI didn't just suggest random tests: it laid out a comprehensive, step-by-step plan that included backup plans and alternatives, it explained why each test was needed, and it matched what expert doctors actually did in real life. This was rather fascinating, because there are complex steps that go into doing this, and it's important to understand that all of those reasoning steps have to be completed successfully for the AI to get the right answer.
00:09:35
Now, there were certain areas where the AI was wrong: there were two other scenarios, in one of which the AI got half the answer right, while the other it got completely incorrect. But I think the most fascinating thing about this is that this is an AI system which isn't purely medically based; it isn't fine-tuned on medical issues, yet remarkably, when we're looking at these diagnoses and these suggested plans, we're seeing that it's able to sometimes get the right suggested plan and the right steps to take, which is rather impressive. And we can only imagine what's going to happen in the next 5 years, the kinds of models we're going to get and just how accurate they'll be in terms of diagnosing conditions and of course suggesting plans.
00:10:15
Of course, I would say though that I hope humans don't become too reliant on this, because with hallucinations you wouldn't want, you know, a tired, overworked dentist, or a tired, overworked doctor, or a tired clinician or physician, to just use whatever the AI says, and then the next thing you know a hallucination manages to mess up a person. So of course I do think that humans will always have a role to play when it comes to diagnosing individuals.
00:10:37
We could also see here that this individual said: "I had o1 analyze a very specific immune disease for my friend, who happens to be one of the top scientists in the field, and after I shared the results his response was: oh my god, I just read it, this is breathtaking, this is insanely good." So we can also see that the qualitative results from individuals at the top of their field using this do seem to prove that these models are rather fascinating.
00:11:01
So with that being said, what do you guys think is the future of AI and humans when it comes to the medical industry? I think it's really fascinating that we're now starting to explore this in further detail. I do think that with rules and regulations it's going to be pretty hard to actually get these models out into real practice, but I do think we're going to start to see more and more cases where doctors may have missed certain things, and users take it into their own hands to consult a model like o1 or even o3 and get remarkable results that doctors simply would have missed. This is something that I've discussed before: a staggering number of Americans die each year because doctors make mistakes. We will make mistakes, we're humans, but the problem is that in the medical industry sometimes there are situations that are simply life or death, and those mistakes do cost lives. So maybe, by having an AI system review every single decision made, we could catch those rare conditions or diseases that we otherwise would have missed, and then of course have humans check over and run the necessary tests to ensure that what the AI suggested is actually factual.
00:12:00
With that being said, would you be open to having an AI doctor? I personally think that within the next 15 to 20 years we're certainly going to have maybe some pods or something where you prick your finger, you get an instant blood test, you get an AI doctor that tells you everything wrong in your body, you get an instant diagnosis, you get an AI that reasons over all of your personal data. Maybe it knows everything you've done, everything you've seen, everything you've eaten, and it's able to put together probably the most effective plan for you, because it understands your emotional state, your physical state, your water levels, how much you've been drinking, and it can probably suggest the most accurate thing. Context is of course key, and I find that the more context you give these models, and of course your doctors, the better they become. And if we look at how AI is going to be integrated into our lives, I wouldn't be surprised if we're going to be sharing that AI data with our doctors very soon. A very interesting world for those of you who are trying to live forever. With that being said, if you enjoyed this video, I would like to see you in the next one.