00:00:02
hello everybody I hope you're doing well
00:00:06
uh this is Dr vahid
00:00:08
Arad I would like to demonstrate uh
00:00:12
doing regression analysis in this video
00:00:15
using ChatGPT and I would also like to
00:00:18
compare the results of the ChatGPT analysis
00:00:22
with conventional statistical
00:00:24
software the data that I'm using is
00:00:29
a small part of a large data set which
00:00:33
is right in this window as you can
00:00:36
see I have got two independent variables
00:00:39
which I have called iv1 and iv2 and I
00:00:42
have a dependent variable DV for
00:00:45
short and I'd like to regress this DV
00:00:48
variable on these iv1 and iv2 variables
00:00:51
to figure out whether they can predict
00:00:53
the amount of DV or
00:00:57
more precisely the amount of variance that
00:00:59
you observe in DV let's do that the first
00:01:02
thing that I have already done and I'd
00:01:05
like to share with you is to write a
00:01:07
good prompt I've already written
00:01:09
that prompt and I've just copied it and
00:01:11
I'm going to paste it right here in the
00:01:13
window of
00:01:16
ChatGPT 4o all right so let's just paste it here
00:01:20
before I run this prompt I wanted to
00:01:24
remind you of the data again this is
00:01:27
iv1 this is iv2 and the third column
00:01:31
represents the dependent variable all
00:01:33
the way down so what I did was to just
00:01:36
copy iv1 iv2 and
00:01:40
DV and paste them into the window
00:01:43
following that I wrote this prompt
00:01:46
I'd like to elaborate on the different
00:01:47
components of the prompt so if you want
00:01:49
to write a
00:01:50
prompt the components here might be
00:01:53
useful as a kind of template or
00:01:57
structure that you could apply
00:02:00
I have started by saying that there
00:02:02
are three columns of data labeled iv1
00:02:07
iv2 and DV so this is just an
00:02:09
introduction to the data then my request
00:02:13
is perform a linear regression
00:02:15
analysis using DV as the dependent
00:02:19
variable and iv1 and iv2 as the
00:02:22
independent variables so this is very
00:02:23
clear I think this is just a standard
00:02:25
language that we use in statistical
00:02:28
analysis then I have also included
00:02:31
estimate the beta coefficients the t
00:02:34
values and p values for both
00:02:37
independent variables and this is
00:02:40
important
00:02:42
because it's through examining
00:02:46
the t values and p values that we
00:02:49
learn whether the independent variables
00:02:52
are significant predictors of variance
00:02:55
in our dependent
00:02:56
variable so this is important to be
00:02:58
included and additionally calculate the
00:03:01
R² value at the end and then I
00:03:05
have requested to use the enter method
00:03:07
there are several different methods I
00:03:08
have discussed them in a previous video
00:03:11
in fact in several previous videos
00:03:14
please watch those videos on my
00:03:15
YouTube channel if you haven't watched
00:03:17
them so the enter method for variable
00:03:20
entry and round all estimates to three
00:03:23
decimal places because previously I ran
00:03:28
this same analysis with
00:03:30
ChatGPT I just wanted to make sure that it
00:03:33
understands my prompt and I realized that
00:03:36
it can give you lots and lots of decimal
00:03:38
places if you do not include this
00:03:42
component in the prompt and finally
00:03:44
present the results in a table format I
00:03:46
mean if you like to include this you
00:03:50
can ask for table format otherwise you
00:03:53
can just remove it if you do not
00:03:55
prefer to see the result in a table
00:03:57
format now I can run the prompt but before that
00:04:00
I wanted to show you that under the
00:04:03
ChatGPT button on this drop-down menu
00:04:07
you can see
00:04:10
GPT-4o and then o1-preview o1-mini and
00:04:16
there are quite a few others right here
00:04:18
o1-mini and 4 what I would like
00:04:21
to do is to compare ChatGPT 4o with
00:04:25
o1-preview to see which one of them
00:04:28
performs better and at the end I will
00:04:30
look at the results of the same analysis
00:04:33
in the conventional software in this
00:04:35
case I'm using JASP for the analysis
00:04:37
okay so let's run the analysis first of
00:04:39
all it's going to take a few minutes
00:04:42
maybe not a few minutes maybe a few
00:04:44
seconds for ChatGPT to figure out the
00:04:48
parameters all right so analyzing starts
00:04:51
if you click on this drop-down menu it
00:04:54
gives you the Python code that is
00:04:56
running in the
00:04:58
background
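The Python code that ChatGPT wrote is not reproduced in this transcript. As a rough illustration of what such a regression involves, here is a minimal sketch that uses only the Python standard library. The data values are made up for the example and are not the data from the video, and the helper names (`solve`, `t_two_sided_p`, `ols`) are hypothetical. It estimates the coefficients, t values, two-sided p values, and R², then prints them rounded to three decimal places, as the prompt requests.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def t_two_sided_p(t, df, steps=2000):
    """Two-sided p value: 1 - 2 * integral of the t density from 0 to |t| (Simpson's rule)."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    pdf = lambda x: c * (1 + x * x / df) ** (-(df + 1) / 2)
    h = abs(t) / steps
    s = pdf(0.0) + pdf(abs(t))
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 1 - 2 * s * h / 3)

def ols(X, y):
    """Ordinary least squares with an intercept; X is a list of predictor rows."""
    rows = [[1.0] + list(r) for r in X]
    n, k = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(rows, y)]
    sse = sum(e * e for e in resid)
    mean_y = sum(y) / n
    sst = sum((yi - mean_y) ** 2 for yi in y)
    df = n - k                      # residual degrees of freedom
    sigma2 = sse / df               # residual variance estimate
    se = []
    for i in range(k):              # diagonal of (X'X)^-1 via unit vectors
        unit = [1.0 if j == i else 0.0 for j in range(k)]
        se.append(math.sqrt(sigma2 * solve(xtx, unit)[i]))
    t = [b / s for b, s in zip(beta, se)]
    p = [t_two_sided_p(ti, df) for ti in t]
    return {"coef": beta, "t": t, "p": p, "r2": 1 - sse / sst}

# Made-up illustrative data (NOT the data set used in the video).
iv1 = [1, 2, 3, 4, 5, 6]
iv2 = [2, 1, 4, 3, 6, 5]
dv = [2.0, 8.0, 7.5, 10.5, 11.5, 14.5]

fit = ols(list(zip(iv1, iv2)), dv)
print(f"{'term':<10}{'coef':>10}{'t':>10}{'p':>10}")
for name, b, t, p in zip(["intercept", "iv1", "iv2"], fit["coef"], fit["t"], fit["p"]):
    print(f"{name:<10}{b:>10.3f}{t:>10.3f}{p:>10.3f}")
print(f"R squared: {fit['r2']:.3f}")
```

All predictors are entered at once here, which corresponds to the enter method named in the prompt. In practice a statistics library would replace the hand-written solver and the numerical integration.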
00:05:01
so the Python code is being written
00:05:04
automatically and if everything goes
00:05:07
well you should be able to see the
00:05:09
results in a second or so yeah there we
00:05:12
go so linear regression
00:05:16
results are shown both in this
00:05:19
table at the bottom and also in this
00:05:21
table just under the Python code if you
00:05:24
are familiar with Python and are
00:05:27
interested in coding using Python you
00:05:29
can just copy the code from this window
00:05:32
right here from this option in the
00:05:34
window and paste it into Python and run
00:05:37
the analysis you should be able to get
00:05:38
the same
00:05:40
results all right so let's go through
00:05:42
the results the first thing that we
00:05:44
observe here is the beta coefficient
00:05:47
for the intercept right here and also
00:05:49
right here they're the same so let me
00:05:54
just read it from here because I think
00:05:55
it's more visible the intercept
00:05:59
has a coefficient of
00:06:06
24.701 with a large t value which is most
00:06:09
likely statistically significant and how
00:06:12
do we know that this is the p value
00:06:16
the p value is
00:06:18
0.004 and that's for the intercept
00:06:22
that's not too bad if you
00:06:26
compare it particularly if you compare
00:06:28
it with the result of your
00:06:31
conventional software in this case JASP
00:06:34
let me move this around a little bit
00:06:35
here
00:06:37
okay please just ignore that clock
00:06:40
if you compare it you see that the
00:06:45
unstandardized intercept at the bottom
00:06:48
of this output in the linear
00:06:50
regression tab is exactly the same as
00:06:54
what ChatGPT has identified for us so
00:06:57
that's really good I mean I can move
00:07:00
this to the left side so you
00:07:01
can see it better the intercept is
00:07:06
24.701 and ChatGPT gave us exactly
00:07:09
the same thing which is wonderful the t
00:07:12
value should be the same as well the
00:07:15
t value is
00:07:17
yes
00:07:21
3.451 and the p value
00:07:26
is significant now as to the other two
00:07:29
variables in the analysis or two
00:07:31
parameters in the analysis which are iv1
00:07:33
and
00:07:34
iv2 the beta coefficients are these
00:07:39
two both of them are negative the first
00:07:41
one has a significant p value associated
00:07:44
with its t value whereas the second
00:07:46
one doesn't have a significant p value
00:07:49
associated with it so let's look at the
00:07:51
results in
00:07:53
JASP as you can see the
00:07:56
first t value is almost exactly the same
00:08:01
let me check again it is -2.624
00:08:05
and -2.624 the p value is exactly the
00:08:10
same and comparing the second two t
00:08:12
values minus
00:08:15
0.49 you will see that they're also
00:08:18
the same and the p value is also the
00:08:20
same excellent it did a wonderful job of
00:08:24
analyzing the data and I'm very happy in
00:08:27
addition the R² value which has
00:08:30
been estimated under M1 at the top
00:08:34
of
00:08:35
this output is oops it's just jumping
00:08:40
around can you see that the R² value is
00:08:44
0.412 which means that around 40% of the
00:08:47
variance is explained by our two
00:08:49
independent variables although one of
00:08:51
them is not statistically significant
00:08:53
and we can confirm that
00:08:56
0.412 is the R² value that's estimated
00:08:59
by
00:09:00
ChatGPT 4o so great job ChatGPT 4o I'm
00:09:07
impressed the other thing is that we
00:09:10
can go ahead and run the same analysis
00:09:14
under ChatGPT o1-preview because I
00:09:19
have heard a lot about its capabilities
00:09:22
so ChatGPT o1-preview is chosen I'm
00:09:26
going to paste the same prompt exactly
00:09:29
the same prompt into this window to see
00:09:31
how it does in this scenario so just
00:09:35
send the prompt and wait for a little
00:09:38
bit maybe slightly longer
00:09:41
than the wait time for the ChatGPT 4o
00:09:45
analysis for some reason it takes
00:09:48
more time and this is how the process
00:09:52
of thinking is shown in
00:09:56
ChatGPT o1 so it's going to take some time
00:09:59
let's just wait and be patient to see
00:10:02
what kind of analysis we will get so
00:10:05
let me go back to my JASP window just to
00:10:07
remind you that as I have discussed in
00:10:09
previous videos under JASP you can
00:10:14
basically run a regression analysis let
00:10:17
me move this downward a little bit so
00:10:20
you can see the window you can run a
00:10:22
regression analysis under the regression
00:10:25
tab uh under linear regression if you
00:10:27
click on linear regression tab tab you
00:10:29
will see uh the window let me move this
00:10:33
back up again uh of the linear
00:10:36
regression so you you got to move the
00:10:38
dependent variable to the dependent box
00:10:41
and the two IVs which in this case are
00:10:44
continuous variables to the covariates
00:10:47
the reason why we move it to the
00:10:49
covariates is that they're not
00:10:51
categorical if they were categorical you
00:10:53
would have moved it to factors and I
00:10:55
think this just gives us a decent uh
00:10:58
first look at the results of the
00:11:01
analysis because we get the r squ value
00:11:04
the adjusted r squ value rmsc and so on
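For reference, the adjusted R² and RMSE mentioned here can be computed from the usual textbook formulas. The sketch below is an illustration under stated assumptions: the sample size is not given in the video, so `n_assumed` is a hypothetical value, and the RMSE uses the residual degrees of freedom (n minus the number of predictors minus one), which is one common convention and may not match every package.

```python
import math

def adjusted_r2(r2, n, p):
    """Adjusted R squared for n observations and p predictors (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def rmse(residuals, p):
    """Root mean squared error using residual degrees of freedom n - p - 1 (assumed convention)."""
    n = len(residuals)
    return math.sqrt(sum(e * e for e in residuals) / (n - p - 1))

# R squared = 0.412 and two predictors come from the video;
# the sample size is NOT stated there, so this n is purely hypothetical.
n_assumed = 30
print(round(adjusted_r2(0.412, n_assumed, 2), 3))
```

Because the adjustment penalizes extra predictors, the adjusted value is never larger than the unadjusted R² when there is at least one predictor.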
00:11:08
in fact you can also ask ChatGPT to
00:11:10
generate these statistics for you so
00:11:13
let's go back to the results of our
00:11:16
analysis all right so as you can see the
00:11:19
results are out and
00:11:24
well they're not exactly the same as
00:11:27
what I got before they're quite
00:11:31
different actually let me close this
00:11:33
little window there to see if we've
00:11:35
gotten everything well so first things
00:11:39
first it says the R² value is way
00:11:42
above the R² value that both
00:11:45
ChatGPT 4o and my JASP software estimated so
00:11:50
here I don't think it's
00:11:53
passing the test I'm afraid for the
00:11:56
intercept it has done a relatively good
00:11:58
job actually a good job I should say
00:12:00
because the estimation is similar to the
00:12:03
estimation
00:12:04
of oh it's not actually oops okay I
00:12:08
have to correct myself here the
00:12:10
unstandardized estimate is
00:12:12
24.7 whereas here it's
00:12:15
34.1 so it's not acceptable even though
00:12:19
the p value indicates that the
00:12:22
intercept is statistically significantly
00:12:25
different from
00:12:26
zero in both scenarios the value of
00:12:29
the intercept coefficient is not
00:12:32
acceptable it's actually estimated
00:12:35
incorrectly in the same way for iv1 and iv2
00:12:39
the t values and the coefficients
00:12:42
have been estimated
00:12:45
incorrectly and as a result the p values are
00:12:47
not reliable even though the
00:12:51
first p value indicates that it's
00:12:54
statistically
00:12:56
significant interestingly the p value
00:12:59
for the second IV is now
00:13:03
much smaller than what we saw even
00:13:05
though it's not statistically
00:13:06
significant yet I'm not sure if you
00:13:09
run the same analysis again whether it would
00:13:12
produce the same output or not
00:13:14
I'm just curious to see if the
00:13:18
same results will be replicated or it
00:13:20
will just randomly output some
00:13:23
statistics so let's run this
00:13:25
again and get back to the results to
00:13:29
figure out whether the results are the
00:13:31
same or
00:13:33
different okay the results are out
00:13:35
they're exactly the same as the previous
00:13:38
result but as you can see they're wrong
00:13:41
because the R² value is greatly
00:13:44
overestimated and the coefficients are
00:13:46
also very different from the
00:13:47
coefficients that we got in the
00:13:49
conventional software JASP as well as
00:13:52
ChatGPT 4o
00:13:53
this is just a very brief
00:13:56
demonstration really I'm not at this
00:13:59
point confident that you should only
00:14:03
rely on ChatGPT 4o to run your
00:14:06
statistical analysis but it clearly
00:14:09
demonstrates that ChatGPT 4o at this time
00:14:12
has an advantage over o1 maybe over
00:14:17
time o1 will also be tweaked and
00:14:19
fine-tuned and it can do a similarly good
00:14:23
job in conclusion ChatGPT 4o
00:14:28
seems to be more capable of doing
00:14:32
regression analysis specifically linear regression
00:14:33
analysis with two independent variables
00:14:36
whereas ChatGPT o1 totally failed to give
00:14:39
us any good results in the future I
00:14:43
will see if ChatGPT 4o in particular
00:14:48
can do more sophisticated statistical
00:14:51
analyses and I'll be happy to share
00:14:53
my findings with you on the
00:14:54
same video channel thank you very much
00:14:56
for your attention and have a great day
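The cross-check performed in this video, comparing each model's estimates against JASP, can also be sketched in code. The numbers below are only the ones quoted in the video (intercept 24.701 and R² 0.412 from JASP and ChatGPT 4o, intercept 34.1 from o1-preview). The o1-preview R² is not stated, so it is left out. The function name `mismatches` and the tolerance are hypothetical choices for this illustration.

```python
import math

# Reference values read from the JASP output in the video.
reference = {"intercept": 24.701, "r2": 0.412}

# Estimates reported by each model, as quoted in the video.
candidates = {
    "ChatGPT 4o": {"intercept": 24.701, "r2": 0.412},
    "ChatGPT o1-preview": {"intercept": 34.1},  # its R squared value is not stated
}

def mismatches(estimates, reference, tol=0.001):
    """Return {statistic: (estimate, reference)} for values that differ by more than tol."""
    return {
        name: (value, reference[name])
        for name, value in estimates.items()
        if name in reference and not math.isclose(value, reference[name], abs_tol=tol)
    }

for model, estimates in candidates.items():
    diff = mismatches(estimates, reference)
    print(model, "matches JASP" if not diff else f"differs from JASP: {diff}")
```

A tolerance of 0.001 fits estimates rounded to three decimal places. A stricter or looser value may suit other rounding choices.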