00:00:02
let's take a look at the normal
00:00:03
approximation to the binomial
00:00:05
distribution the continuous normal
00:00:07
distribution can sometimes be used to
00:00:09
approximate the discrete binomial
00:00:13
distribution why would we want to do
00:00:15
this in the olden days it was very
00:00:16
useful for probability calculations the
00:00:18
binomial formula can be a bit of a pain
00:00:20
if it has to be used over and over and
00:00:22
over again and so this normal
00:00:24
approximation came in to make lives much
00:00:26
easier these days the computers can do
00:00:28
the calculations for us so it's not as
00:00:29
much of an issue there but we still use
00:00:33
this normal approximation in statistical
00:00:35
inference when we do things a little bit
00:00:36
later on like statistical inference for
00:00:38
proportions we often use this normal
00:00:40
approximation so it's good to know a
00:00:42
little bit about
00:00:43
it now you may recall that the binomial
00:00:46
distribution is perfectly symmetric if p
00:00:50
is exactly equal to 0.5 and will have
00:00:52
some skewness when p is not equal to 0.5
00:00:56
now the normal distribution is a
00:00:58
symmetric distribution and so the normal
00:01:00
approximation is going to work best when
00:01:02
p is close to 0.5 and it's going to work
00:01:04
better and better as we get a larger and
00:01:06
larger sample size as
00:01:08
well for illustrative purposes here is
00:01:11
the binomial distribution with n is 40
00:01:13
and p is 0.5 and as you may be able to
00:01:16
tell this is a perfectly symmetric
00:01:18
distribution in this case and let's say
00:01:20
we were to superimpose a normal curve
00:01:23
over this
00:01:24
distribution it looks awfully normal
00:01:26
that superimposed normal curve fits
00:01:28
pretty well and we'd see that if we
00:01:30
jacked up the sample size even higher
00:01:32
and higher and higher that normal curve
00:01:35
would fit better and
00:01:37
better now what if we're a little bit
00:01:39
closer to the boundary here here p is 03
00:01:43
which is close to the boundary of zero
00:01:45
and we've got the same value of n but we
00:01:47
see a little bit of skewness and if we
00:01:49
were to superimpose a normal curve it
00:01:51
does not fit very well if we amped up
00:01:55
the sample size larger and larger we'd
00:01:57
see that the normal approximation is
00:01:59
going to be better and better as that
00:02:01
sample size gets larger and larger but
00:02:03
the general idea here when we get near
00:02:05
the boundaries of zero or one we are
00:02:08
going to need a larger and larger value
00:02:10
of n for that normal approximation to be
00:02:14
reasonable and this idea is summarized in
00:02:16
this rough guideline that the normal
00:02:18
approximation is reasonable if both n *
00:02:21
p is bigger than or equal to 10 and n *
00:02:24
1 minus p is bigger than or equal to 10
00:02:26
and if you play around with that a
00:02:27
little bit you'll see simply that if p
00:02:29
is close to 0 or 1 we need a larger
00:02:32
value of n in order for the normal
00:02:33
approximation to be reasonable now this
00:02:36
is just a rough guideline sometimes
00:02:38
people replace 10 with five here and
00:02:41
sometimes use different rules
00:02:43
altogether so you should consult with your
00:02:44
professor or your textbook to see what
00:02:46
rough guideline they are
00:02:48
using recall that if x is a binomial
00:02:51
random variable then X has a mean of n *
00:02:54
p and a variance of n * p * (1 - p) and
00:02:59
what we were just discussing above
00:03:00
is that X can be considered
00:03:02
approximately normal in certain settings
00:03:04
and so then we can standardize this in
00:03:07
the usual way we can say
00:03:09
X minus mu over sigma using mu from up here
00:03:15
and sigma being the square root of sigma
00:03:17
squared from here then this quantity we're
00:03:21
going to call a zed and we're going to
00:03:23
call that Zed because that quantity has
00:03:26
approximately the standard normal
00:03:28
distribution so my Zed is going to be
00:03:32
approximately standard normal normal
00:03:34
with a mean of zero and a variance of
00:03:36
one and now we're going to use this in
00:03:38
probability
00:03:39
calculations so let X be a binomial
00:03:42
random variable with n of 75 and p equal
00:03:45
to 0.6 what is mu well we know mu is equal
00:03:49
to n * p and that's going to be 75 * 0.6
00:03:53
and that works out to 45 what is sigma
00:03:56
squared n * p * (1 - p) and that is going to
00:04:02
be equal to 75 * 0.6 * (1 -
00:04:08
0.6) and that works out to 18 and so our
00:04:11
standard deviation is simply going to be
00:04:13
the square root of
00:04:15
18 now suppose we wanted to use the
00:04:17
normal approximation to estimate this
00:04:20
probability we could calculate the exact
00:04:22
probability from the binomial
00:04:23
distribution using a computer or even by
00:04:25
hand if we had to but we're going to use
00:04:27
the normal approximation here and we're
00:04:30
going to say that the probability that X
00:04:31
is bigger than or equal to 52 this is
00:04:33
going to be approximately the
00:04:35
probability that Zed is bigger than or
00:04:38
equal to we could say 52 minus the mean
00:04:42
45 we're simply standardizing like
00:04:44
normal divided by the standard deviation
00:04:47
square root of 18 and this is equal to
00:04:49
the probability that Zed takes on a
00:04:53
value that's at least as big as 1.650
00:04:56
rounded to three decimal places then we
00:04:59
would go to a computer or our standard
00:05:01
normal table here's
00:05:03
1.650 and we're interested in this area
00:05:06
so I'm going to leave that up to you to
00:05:08
verify for yourselves that that's going
00:05:10
to be approximately
00:05:13
0.0495 if you use a standard normal table
00:05:16
instead of a computer you might get a
00:05:17
rounded version of that but you should
00:05:19
get pretty close to that value now
00:05:21
compare that with the exact value that
00:05:24
we get if we use the binomial formula
00:05:26
which I'm not showing here but I used a
00:05:28
computer to get the exact value based on
00:05:30
the binomial distribution and we get
00:05:33
0.0611 and our value based on the normal
00:05:36
approximation well it's in the ball park
00:05:38
but it's still a little bit
00:05:40
off but we can improve that
00:05:43
approximation with something we call a
00:05:46
continuity correction we are moving from
00:05:48
this discrete binomial distribution to
00:05:51
this continuous normal distribution and
00:05:54
when we do that we can improve the
00:05:56
approximation with this continuity
00:05:59
correction
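As a rough sketch of the calculation just described, assuming Python and only its standard library, the uncorrected normal approximation can be checked against the exact binomial probability. The helper names `normal_upper_tail` and `binomial_upper_tail` are made up for this illustration and are not from the video. It should land near the 0.0495 and 0.0611 quoted above.

```python
import math

def normal_upper_tail(z):
    """P(Z >= z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def binomial_upper_tail(n, p, k):
    """Exact P(X >= k) for X ~ Binomial(n, p), summing the binomial formula."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n, p, k = 75, 0.6, 52

# Rough guideline from the video: both n*p and n*(1-p) should be at least 10.
guideline_ok = n * p >= 10 and n * (1 - p) >= 10

mu = n * p               # mean of the binomial
var = n * p * (1 - p)    # variance of the binomial
sigma = math.sqrt(var)   # standard deviation

# Standardize 52 in the usual way (no continuity correction yet).
z = (k - mu) / sigma
approx = normal_upper_tail(z)
exact = binomial_upper_tail(n, p, k)

print(f"guideline satisfied: {guideline_ok}")
print(f"mu = {mu:.1f}, sigma^2 = {var:.1f}, z = {z:.3f}")
print(f"normal approximation: {approx:.4f}")
print(f"exact binomial:       {exact:.4f}")
```

The standard normal tail is computed from `math.erfc` rather than a table, so the result carries no table roundoff.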
00:06:01
to illustrate let's look what's going on
00:06:02
here this is a plot of the binomial
00:06:04
distribution with n is 75 and p is 0.6 and
00:06:08
the shaded green part is the probability
00:06:11
that we're interested in the probability
00:06:13
that we get a value of 52 or
00:06:17
greater what if I superimpose the normal
00:06:20
curve here's the superimposed normal
00:06:23
curve and this red shaded area is the
00:06:27
probability calculation that we carried
00:06:29
out on the last page and here is the
00:06:31
value 52 now in the binomial setting 52
00:06:36
means something 52 successes out of 75
00:06:40
but on the continuous front for a
00:06:43
continuous random variable 52 means
00:06:46
52.00
00:06:48
0000000000 infinite zeros there and it is
00:06:52
distinctly different from
00:06:55
52.1
00:06:56
say now I've blown this part up on the
00:06:59
next slide just to make it a little
00:07:00
easier to
00:07:01
see so to come closer to regaining our
00:07:04
original meaning of 52 we're going to
00:07:06
say okay 52 in the discret sense had
00:07:08
some meaning 52 in the continuous sense
00:07:11
means exactly 52
00:07:13
52.00 so to regain that original meaning
00:07:16
we should let 52 take on all values
00:07:19
between 52 and 1/2 and 51 and 1/2 that
00:07:23
way it comes closer to representing what
00:07:25
is intended here in the binomial
00:07:27
distribution and so as we can see here
00:07:29
right at 52 when we started at
00:07:32
52.00 and we did our probability
00:07:35
calculation we were really missing out
00:07:37
an important part we were missing out
00:07:40
this half of 52 in a sense and so what
00:07:42
we really should do is start here at
00:07:47
51.5 that's where we should start so
00:07:50
that's what it looks like on the next
00:07:51
page if I start at 51.5 my approximation
00:07:56
my red shaded area here is going to be a
00:07:58
lot closer to the total those green
00:08:00
probabilities and a lot closer to
00:08:02
reality so if I want my probability that
00:08:05
X is bigger than or equal to 52 before
00:08:08
doing the normal approximation I am
00:08:11
going to use the continuity correction
00:08:13
and say this is the probability that Zed
00:08:16
is bigger than or equal to
00:08:19
51.5 minus the mean over the standard
00:08:25
deviation this works out to the
00:08:27
probability that Zed is bigger than
00:08:30
or equal to 1.532
00:08:33
and we go to our standard normal table
00:08:36
or a computer or what have you and if we
00:08:39
did this without any roundoff error and
00:08:42
just rounded our final answer to four
00:08:44
decimal places we'd see that this is
00:08:47
0.0628 and as I'll illustrate in a little
00:08:50
bit that's closer to the true
00:08:52
probability based on the binomial
00:08:54
distribution than when we didn't use the
00:08:56
continuity
00:08:58
correction in this new question we're
00:09:00
interested in the probability that X is
00:09:02
strictly greater than 52 that is what
00:09:05
the shaded green bits represent and to
00:09:08
be strictly greater than 52 I want to
00:09:11
make sure I don't include any of 52 so I
00:09:14
really should start right here I should
00:09:16
start at
00:09:18
52.5 let's see what that shaded area
00:09:21
under the normal curve looks
00:09:23
like that looks a little bit better and
00:09:25
so I want my probability that X is
00:09:27
strictly greater than 52 this is going
00:09:30
to be approximately the probability that
00:09:32
Zed is greater than
00:09:37
52.5 minus my mu which is 45 divided by
00:09:41
the standard deviation which is the
00:09:43
square root of 18 this works out to the
00:09:45
probability that Zed is greater than
00:09:49
1.768 and if we found that value under
00:09:52
our standard normal curve 1.768 and we
00:09:55
did it without any roundoff error and
00:09:57
rounded our final answer to four decimal places
00:10:00
we would see that's the answer now you
00:10:02
should be able to come close to that but
00:10:04
not exactly that from a standard normal
00:10:08
table now what about here let's say I
00:10:10
wanted the probability that X is less
00:10:13
than or equal to 52 that's what that
00:10:15
shaded green bit
00:10:16
represents and I can't start exactly at
00:10:19
52 that wouldn't be quite right to
00:10:20
include all of 52 and go left I need to
00:10:23
start here I need to start at 52.5 so
00:10:27
let's see what that looks like when
00:10:28
shaded under the normal
00:10:30
curve that looks like it might provide
00:10:32
us a pretty reasonable approximation so
00:10:33
when I want the probability that X is
00:10:35
less than or equal to
00:10:37
52 that's going to be approximately with
00:10:40
my continuity correction I want to
00:10:42
include all of 52 and so I should start
00:10:45
at
00:10:47
52.5 subtract the mean which is 45 and
00:10:50
divide by the standard
00:10:52
deviation this is going to be equal to
00:10:54
the probability that Zed is less than or
00:10:57
equal to 1.768
00:11:00
and if we plot that
00:11:03
out
00:11:06
1.768 that's this entire area
00:11:10
here and that works out to
00:11:13
0.9615
00:11:15
what about if we're interested in the
00:11:18
probability that X is strictly less than
00:11:21
52 which is represented by the green
00:11:23
shaded bit if I'm going strictly less
00:11:27
and I don't want to include 52 then I
00:11:29
should start here at
00:11:31
51.5 let's see what that looks like when
00:11:33
shaded in that looks like that might
00:11:36
give us a pretty darn reasonable
00:11:37
approximation so if I want my
00:11:39
probability that X is less than 52 then
00:11:42
I'm going to say that's approximately
00:11:43
the probability that Zed is less than
00:11:47
51.5 minus mu over
00:11:52
Sigma and this is equal to the
00:11:54
probability that Zed is less than 1.532
00:11:59
and if we put that into our
00:12:02
computer and we did this without any
00:12:04
roundoff error and then just rounded to
00:12:07
four decimal places at the bitter end we
00:12:09
would say that this is
00:12:16
0.9372 so let's look at a summary of what
00:12:19
we just did if I want the probability
00:12:21
that X is bigger than or equal to 52
00:12:24
then I want to include all of 52 and go
00:12:27
right which means I should start at
00:12:31
51.5 if I want the probability that X is
00:12:34
strictly greater than 52 well I'm going
00:12:36
right but I don't want to include any of
00:12:39
52 so I should start at
00:12:42
52.5 if I am going left less than or
00:12:46
equal to 52 I want to include all of 52
00:12:50
while going left which means that I
00:12:52
should start at
00:12:54
52.5 and if I'm going strictly less than
00:12:57
52 I want to make sure I don't include
00:12:59
any of 52 while going left which
00:13:02
means I should start at
00:13:04
51.5 now I strongly recommend that you
00:13:06
don't simply memorize this but you try
00:13:08
to follow the underlying logic if you
00:13:10
truly understand why we're doing what
00:13:12
we're doing here then it'll be easier to
00:13:14
do properly when you have to now let's
00:13:17
take a quick look here at how that
00:13:19
continuity correction improved the
00:13:21
approximation for a couple of cases that
00:13:23
we just looked at if we're interested in
00:13:25
the probability that X is bigger than or
00:13:27
equal to 52 without the continuity
00:13:29
correction we got
00:13:31
0.0495 but with the continuity correction
00:13:34
we got
00:13:35
0.0628 which is much closer to the exact
00:13:39
value based on the binomial distribution
00:13:41
that's what this exact value is coming
00:13:43
from similarly down here when we wanted
00:13:46
greater than 52 we had
00:13:48
0.0495 we use the same calculation in both
00:13:51
cases we didn't use our continuity
00:13:52
correction and with the continuity
00:13:55
correction we get
00:13:57
0.0385 which is much closer to the exact
00:14:00
value based on the binomial distribution
00:14:02
so the continuity correction has greatly
00:14:05
improved our approximation
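To tie the four cases together, here is a minimal sketch, again assuming Python with only the standard library, that applies the continuity correction for each inequality and prints it beside the exact binomial value. The names `normal_cdf`, `binomial_prob`, and `z_score` are illustrative choices, not anything from the video. The key point is which half-unit boundary each case starts from: 51.5 when 52 is included going right or excluded going left, 52.5 otherwise.

```python
import math

def normal_cdf(z):
    """P(Z <= z) for a standard normal Z."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def binomial_prob(n, p, lo, hi):
    """Exact P(lo <= X <= hi) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(lo, hi + 1))

n, p, k = 75, 0.6, 52
mu = n * p
sigma = math.sqrt(n * p * (1 - p))

def z_score(x):
    """Standardize x using the binomial mean and standard deviation."""
    return (x - mu) / sigma

# Continuity-corrected normal approximations for the four cases.
corrected = {
    "P(X >= 52)": 1 - normal_cdf(z_score(k - 0.5)),  # include all of 52, go right
    "P(X > 52)":  1 - normal_cdf(z_score(k + 0.5)),  # exclude 52, go right
    "P(X <= 52)": normal_cdf(z_score(k + 0.5)),      # include all of 52, go left
    "P(X < 52)":  normal_cdf(z_score(k - 0.5)),      # exclude 52, go left
}

# Exact values from the binomial distribution, for comparison.
exact = {
    "P(X >= 52)": binomial_prob(n, p, k, n),
    "P(X > 52)":  binomial_prob(n, p, k + 1, n),
    "P(X <= 52)": binomial_prob(n, p, 0, k),
    "P(X < 52)":  binomial_prob(n, p, 0, k - 1),
}

# Without the correction, >= 52 and > 52 both start at exactly 52.
uncorrected_upper = 1 - normal_cdf(z_score(k))

for label in corrected:
    print(f"{label}: corrected {corrected[label]:.4f}, exact {exact[label]:.4f}")
print(f"uncorrected upper tail: {uncorrected_upper:.4f}")
```

Writing all four cases against the same `k - 0.5` and `k + 0.5` boundaries makes the underlying logic visible rather than something to memorize.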