00:00:00
hello and welcome to lesson
00:00:02
23 the goodness of fit test the idea of
00:00:06
the goodness of fit test is somebody
00:00:08
makes a claim like for example I claim
00:00:10
that the grades in the class are evenly
00:00:13
distributed so basically that's saying
00:00:15
you'll have the same number of A's the
00:00:17
same number of B's the same number of
00:00:19
C's as you can see my claim is not even
00:00:24
close they're not evenly
00:00:26
distributed but I'm still going to make
00:00:28
the claim that they're evenly
00:00:31
distributed so one thing we need to do
00:00:33
is count the number of categories so
00:00:35
there's A B C D F so that's one two three four
00:00:39
five and for some reason they use the
00:00:43
letter k for categories so categories
00:00:46
there's five of
00:00:47
them the next thing I need to do is find
00:00:50
out what the sample size is so add up
00:00:52
all of these students and see how many
00:00:54
that
00:00:57
is the calculator is looking a little bit
00:00:59
bright right let me turn that
00:01:02
down okay that's better so a 12 + a 16
00:01:09
and a 22 and a three and a two means
00:01:13
there's 55
00:01:16
students so
00:01:19
n equals
00:01:22
55 and when the claim is that they're
00:01:25
evenly
00:01:26
distributed basically you just take the
00:01:28
n and divide by the K so basically 55
00:01:32
students total divide them evenly into
00:01:35
five categories and
00:01:39
55 / 5 that should be an 11 55 divided by 5
00:01:47
that's an 11 and if this turned out to
00:01:49
be a decimal just go ahead and use the
00:01:52
decimal so this turned out to be a nice
00:01:55
even 11 so that's what's expected so if
00:01:59
the claim is true and they are evenly
00:02:02
distributed then there should have been
00:02:04
11 in each
00:02:06
category so this is 11 11 and in theory
00:02:11
that's what it would be if they're
00:02:13
evenly
00:02:15
distributed you can also say that the
00:02:18
claim is that the proportion of people
00:02:21
that get A's equals the proportion of
00:02:24
people that get B's etc for
00:02:27
C's and D's and
00:02:31
Fs so there's the claim and with this
00:02:34
there is no H0 and H1 we just go
00:02:37
with the claim just by
00:02:41
itself okay now this is going to be a
00:02:44
chi-square test and the chi-square
00:02:46
distribution looks like
00:02:49
this so it starts at zero and then it's
00:02:54
skewed looking like
00:02:57
this and in order to find the critical
00:03:00
value well for one thing this is a right
00:03:02
tail test only so there's only a right
00:03:11
tail that makes it nice because then you
00:03:13
don't have to decide left tail right
00:03:16
tail this is always right tail and then
00:03:20
if we use 95% level of confidence then
00:03:23
this little tail over here would be 5%
00:03:27
or 1 minus .95 is the
00:03:30
.05 and then we need to look up the
00:03:32
critical
00:03:34
value so for many of these videos I've
00:03:36
been using the calculator to find the
00:03:38
critical values using the distribution
00:03:41
right here but they don't happen to have
00:03:43
this one in the calculator so we
00:03:45
actually have to rely on a
00:03:48
table
00:03:50
and wherever you clicked on the
00:03:52
video I did put a link to this table
00:03:55
right
00:03:56
here so this is the chi-square
00:04:00
distribution and the ones over here are
00:04:02
if you have a left tail and the ones
00:04:04
over here are for a right tail so we're
00:04:06
actually only going to be using the
00:04:09
right
00:04:12
side and what I'm looking for is on the
00:04:15
right side there's
00:04:16
.05 so I use
00:04:20
.05 and also we need degrees of freedom
00:04:22
so I know how far down to go in this
00:04:25
list well the degrees of freedom just
00:04:28
comes from the k
00:04:31
minus 1 for the t test it's n minus 1 but for
00:04:34
this one it's k
00:04:36
minus 1 so the degrees of
00:04:39
freedom so in general degrees of freedom
00:04:43
is K minus one so in this case it's
00:04:47
going to be 5 - 1 it's
00:04:50
four okay now back to my piece of
00:04:55
paper so over here they have the degrees
00:04:58
of freedom and degrees of freedom four
00:05:00
is right
00:05:02
here and then I like to make a line
00:05:06
right
00:05:08
there so this is degrees of freedom of
00:05:10
four this is the column that said
00:05:13
.05 so then I go right here and that says
00:05:17
a
00:05:18
9.488 so that's the critical value
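The table lookup can be cross-checked numerically. What follows is a minimal Python sketch that is not part of the lesson: the helper names `chi2_cdf` and `chi2_critical` are hypothetical, and the approach (a series for the chi-square CDF plus bisection, standard library only) is just one way to do it under those assumptions.

```python
import math

def chi2_cdf(x, df):
    """Chi-square CDF, using the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, t = df / 2.0, x / 2.0
    term = 1.0
    total = 1.0
    n = 0
    # Each new term is the previous one times t / (a + n); stop once terms are negligible.
    while term > 1e-15 * total:
        n += 1
        term *= t / (a + n)
        total += term
    return math.exp(-t + a * math.log(t) - math.lgamma(a + 1.0)) * total

def chi2_critical(alpha, df):
    """Right-tail critical value: the x that leaves area alpha to its right (bisection)."""
    lo, hi = 0.0, 1.0
    while 1.0 - chi2_cdf(hi, df) > alpha:   # widen the bracket until the tail is small enough
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if 1.0 - chi2_cdf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

k = 5                 # categories A, B, C, D, F
df = k - 1            # degrees of freedom
critical_value = chi2_critical(0.05, df)
print(round(critical_value, 3))  # 9.488
```

The printed value should agree with the entry in the table at degrees of freedom four and the .05 right-tail column.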
00:05:25
9.488 so the critical value is
00:05:28
9.488
00:05:30
all right now we're almost done now
00:05:34
in order to get the test statistic we're
00:05:36
going to use the calculator and I'm
00:05:38
going to put these observed values in
00:05:40
list one and these expected values in
00:05:43
list
00:05:47
two so go to stat and then edit the list
00:05:53
and I happen to have some stuff left
00:05:54
over so I need to go to the top of the
00:05:57
list and then use clear to clear out the
00:06:01
list and L2 also clear it
00:06:05
out and now L1 is going to be the
00:06:08
observed values so that's a
00:06:11
12 16
00:06:14
22 a three and a
00:06:18
two and then the expected values all of
00:06:22
those 11s go in list two so 11 just keep
00:06:25
typing 11 five
00:06:28
times there we
00:06:31
go and then for the last part you go
00:06:34
to
00:06:37
stat move over to
00:06:39
test and this is called the goodness of
00:06:42
fit test so you go down to where it says
00:06:46
gof for goodness of fit so that's letter
00:06:53
D so D the chi-squared goodness
00:07:00
of fit
00:07:02
test is what I'm using on the
00:07:10
calculator and when you hit enter it's
00:07:13
then going to ask you where are the
00:07:15
lists so the observed should be in list
00:07:18
one so if it doesn't say it put second
00:07:21
one the expected values should be in
00:07:23
list two and put second two for
00:07:27
L2 and then they want to know
00:07:30
degrees of freedom which is
00:07:33
four and then just go down to
00:07:37
calculate and from this all we need is
00:07:39
the chi-squared which is
00:07:46
26.545 so the test
00:07:52
statistic chi-squared equals
00:07:58
26.545
00:08:02
so this is the cutoff for things
00:08:04
being unusual anything that goes past
00:08:06
9.4 is unusual this goes way past
00:08:10
that so that means that we can reject
00:08:13
the
00:08:15
claim so the
00:08:17
claim is
00:08:20
false in other words grades are not
00:08:23
evenly distributed if they were they
00:08:25
would have all been 11s they are not
00:08:28
close to 11
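The first example can be reproduced without the calculator. This is a minimal Python sketch, assuming the GOF test reports the standard chi-square sum of (observed minus expected) squared over expected; that formula is not stated in the lesson, but it does reproduce the 26.545 shown on screen. The variable names are illustrative.

```python
observed = [12, 16, 22, 3, 2]      # A, B, C, D, F counts typed into list one
k = len(observed)                  # number of categories
n = sum(observed)                  # sample size
expected = [n / k] * k             # evenly distributed claim: n / k in every category

# Standard goodness-of-fit statistic (assumed to match the calculator's output).
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

critical_value = 9.488             # table value for df = k - 1 = 4, right tail .05
reject = chi_sq > critical_value   # right-tail test only

print(n, expected[0], round(chi_sq, 3), reject)  # 55 11.0 26.545 True
```

Because 26.545 is far past 9.488, the sketch reaches the same decision as the lesson: reject the evenly distributed claim.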
00:08:30
so my claim was
00:08:33
false okay then we need to do one more
00:08:36
example so basically with the goodness
00:08:38
of fit test there's two types one is it
00:08:40
says evenly distributed so you just take
00:08:44
the total number of people divide by the
00:08:46
categories and then that is what you use
00:08:48
for all of the expected
00:08:50
values then with the second type of
00:08:54
example it could give you
00:08:57
specific percentages to use for each
00:09:01
category so we need to start off the
00:09:03
same the number of categories count how
00:09:05
many that is and that's still five
00:09:09
categories and then add up the number of
00:09:12
people and these are the same
00:09:14
numbers as the last example all I'm
00:09:17
doing is changing the claim to show you
00:09:19
the idea of the two possibilities for
00:09:22
the claim so when you add these up that
00:09:24
still equals n equals 55
00:09:28
people
00:09:30
and then this one is saying that for the
00:09:32
A's there's going to be
00:09:35
20% so change the 20% to a decimal which
00:09:40
is .20 and then multiply with the n
00:09:43
multiply with
00:09:47
55 and then go on to the B's and I said
00:09:50
for the B's it would be 25% so as a
00:09:54
decimal this is going to be .25 and
00:09:57
then it's always going to be the decimal
00:09:59
times the total number of people so next
00:10:01
is .25 * 55
00:10:06
people and for the C's I said it was
00:10:08
going to be 40% that's going to
00:10:12
be .40 *
00:10:16
55 for the D's I said it was going to be
00:10:23
10% and then finally for the F's I said
00:10:26
it would be 5% and 5% is
00:10:34
.05 it looks like I should have made the
00:10:36
box a little bit bigger because now I
00:10:37
need to see what these numbers
00:10:39
equal I'll just write it right below the
00:10:42
box that's
00:10:44
okay so let me clear out that old
00:10:46
problem and then we've got .2 *
00:10:50
55 so that's an
00:10:54
11 and then next is .25 * 55
00:11:00
that's a
00:11:06
13.75 and then next is going to
00:11:10
be .40 *
00:11:13
55 so that's
00:11:16
22 and right here you might say it's not
00:11:19
possible to have .75 of a person that's
00:11:22
true but right now we're not talking
00:11:24
about actual people we're just saying in
00:11:27
general in theory how many people would
00:11:29
that be so you go ahead and use the
00:11:32
13.75 and then onto the next one 10% of
00:11:38
55 that is
00:11:42
5.5 and then
00:11:46
last .05 * 55 people and that's
00:11:53
2.75 so this last one is
00:11:58
2.75 okay now for the
00:12:01
claim so basically I said that 20% of
00:12:04
people will get an A so that's saying
00:12:06
the proportion of A's will be in decimal
00:12:10
form that's
00:12:12
.20 the proportion for
00:12:15
B's is 25% so that's in decimal form
00:12:21
.25 the proportion of people that will
00:12:23
get C's is 40% so that's .40
00:12:30
and then for the D's I said that it was
00:12:32
going to be 10% so that's .10 and then
00:12:35
last for the F's I said it would be 5% so
00:12:38
that's
00:12:41
.05 all right now we just need the
00:12:45
picture so the distribution looks like
00:12:52
this so it starts at zero and then it's
00:12:56
it's
00:12:57
skewed and degrees of freedom is going
00:12:59
to be the same as last time so that's
00:13:01
going to be k - 1 is
00:13:06
4 and we're going to use 95% level of
00:13:10
confidence which means that this right
00:13:12
tail because remember we only use the
00:13:14
right tail for this test is
00:13:18
.05 and that's going to be the same
00:13:21
critical value as last
00:13:23
time so it's still
00:13:25
.05 and degrees of freedom is four so if
00:13:29
you go down that is the
00:13:33
9.488 so the critical value equals
00:13:40
9.488 and then let's see if we can reject
00:13:43
my claim so these are going to go in
00:13:46
list one and these are
00:13:48
going to go in list
00:13:54
two so just go to stat and edit and
00:13:59
these numbers are actually the same so I
00:14:02
can just leave those in there and then
00:14:04
these are supposed to be an 11 and then
00:14:06
a
00:14:09
13.75 and then a
00:14:11
22 a
00:14:14
5.5 and a
00:14:17
2.75 and just go to stat tests and then
00:14:22
scroll down to the goodness of fit test
00:14:26
which is letter
00:14:28
d
00:14:31
and because I'm still using list one and
00:14:33
list two I can just leave that there
00:14:36
degrees of freedom is actually the same
00:14:38
so I can just leave that
00:14:40
there color doesn't matter because I'm
00:14:43
not graphing
00:14:45
it and then it says that chi-squared equals
00:14:51
181 that seems very very large did I
00:14:55
mistype
00:14:57
something oh look it right
00:15:01
there when I went to type the 22 I
00:15:04
accidentally typed
00:15:06
222 so the reason I thought it was
00:15:09
strange is because on the first
00:15:12
example these numbers okay that's close
00:15:15
but these aren't close these aren't
00:15:16
close that's not close that's not close
00:15:19
and so you should get a big test
00:15:21
statistic in order to say that it's
00:15:23
false well when I just did it right now
00:15:26
I got like 181 which I thought was weird
00:15:29
and that's because I mistyped this one
00:15:31
which of course I did on
00:15:33
purpose just to show you that nobody's
00:15:36
perfect sometimes you have to go back
00:15:38
and check your work so let me double
00:15:41
check 11
00:15:44
13.75 a regular 22 a 5.5 and a
00:15:48
2.75 okay good now go back to stat tests
00:15:54
and the goodness of fit
00:15:57
test
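One quick way to catch a typing mistake like the 222 before running the test is to check that the expected values add up to the sample size. This check is not part of the lesson; it is a small Python sketch with a hypothetical helper name `totals_match`, relying on the fact that the claimed percentages add up to 100%, so the expected counts must add up to n.

```python
import math

n = 55                                          # sample size from the example
expected_typed = [11, 13.75, 222, 5.5, 2.75]    # list two with the mistyped 222
expected_fixed = [11, 13.75, 22, 5.5, 2.75]     # list two after the correction

def totals_match(expected, n):
    """True when the expected counts sum to the sample size."""
    return math.isclose(sum(expected), n)

print(totals_match(expected_typed, n))  # False
print(totals_match(expected_fixed, n))  # True
```

The mistyped list sums to 255 instead of 55, so the check fails right away and points back to the data entry rather than the test.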
00:15:59
and this is all the same so I can just
00:16:01
skip down to
00:16:02
calculate that's more reasonable chi-squared
00:16:05
equals
00:16:08
1.8 so for the last part the
00:16:11
test
00:16:14
statistic chi-squared equals
00:16:19
1.8 and that was actually a 1.80 so you
00:16:23
could leave it at 1.8 or to emphasize
00:16:25
you could say
00:16:26
1.80 so if it were to go past 9.4
00:16:30
that would be considered unusual this is
00:16:33
not even close it's actually landing
00:16:36
more like right here so that means we do
00:16:39
not have enough evidence to reject the
00:16:41
claim cuz if you look at these numbers a
00:16:44
2.75 compared to two that's not off by
00:16:47
very much this one is perfect this one's
00:16:50
only off by one this one's off by two
00:16:53
and a fourth this one's off by two and a half
00:16:56
so they're not off by that much so we
00:17:02
cannot reject the
00:17:10
claim
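The second example can be sketched the same way in Python. As before, this assumes the calculator's GOF test uses the standard sum of (observed minus expected) squared over expected, which is not spelled out in the lesson; the variable names are illustrative.

```python
observed = [12, 16, 22, 3, 2]                # A, B, C, D, F counts (same data as before)
claimed = [0.20, 0.25, 0.40, 0.10, 0.05]     # claimed proportions for A, B, C, D, F
n = sum(observed)                            # 55 people

# Expected count for each category: claimed proportion times the sample size.
# Fractional people are fine here, so 13.75 and 2.75 are used as they are.
expected = [p * n for p in claimed]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

critical_value = 9.488                       # df = 5 - 1 = 4, right tail .05
reject = chi_sq > critical_value

print([round(e, 2) for e in expected])       # expected counts for list two
print(round(chi_sq, 2), reject)              # about 1.8, well short of 9.488
```

The statistic comes out at about 1.80, nowhere near the critical value, so there is not enough evidence to reject the claimed percentages.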