The Normal Approximation to the Binomial Distribution

00:14:10
https://www.youtube.com/watch?v=CCqWkJ_pqNU

Summary

TLDR: The video discusses the normal approximation to the binomial distribution. It explains how the approximation historically simplified probability calculations and why it remains relevant today in statistical inference, for example inference for proportions. The approximation is most accurate when the success probability p is close to 0.5 and the sample size n is large. The video then demonstrates the continuity correction, which shifts the cutoff by 0.5 before standardizing and noticeably improves the accuracy of the estimates. Worked examples compare results with and without the correction against exact binomial probabilities, underscoring its importance in practice.
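The claim that the approximation is most accurate for p near 0.5 and large n can be checked numerically. The sketch below (Python 3.9+, standard library only) compares the exact binomial CDF with the continuity-corrected normal CDF and reports the largest gap over all k. The helper names and the (n, p) pairs are illustrative choices, not taken from the video.

```python
from itertools import accumulate
from math import comb, sqrt
from statistics import NormalDist


def binomial_cdf_table(n: int, p: float) -> list[float]:
    """Exact P(X <= k) for k = 0..n, with X ~ Binomial(n, p), from the binomial formula."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    return list(accumulate(pmf))


def max_cdf_error(n: int, p: float) -> float:
    """Largest |exact CDF - continuity-corrected normal CDF| over k = 0..n.

    P(X <= k) is approximated by the normal CDF evaluated at k + 0.5,
    using mean n*p and standard deviation sqrt(n*p*(1 - p)).
    """
    normal = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    exact = binomial_cdf_table(n, p)
    return max(abs(exact[k] - normal.cdf(k + 0.5)) for k in range(n + 1))


# Illustrative (n, p) pairs: symmetric case, skewed case, skewed case with larger n.
for n, p in [(40, 0.5), (40, 0.1), (400, 0.1)]:
    print(f"n = {n:3d}, p = {p}: max CDF error = {max_cdf_error(n, p):.4f}")
```

The gap should be smallest for p = 0.5 and, for p = 0.1, should shrink as n grows from 40 to 400.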

Takeaways

  • 📈 The normal approximation simplifies binomial calculations.
  • 🔍 Approximation works best with p close to 0.5 and larger n.
  • 🔧 Continuity correction improves accuracy when transitioning from discrete to continuous distributions.
  • 📊 The binomial distribution is symmetric when p = 0.5.
  • 📝 Standardization converts the binomial variable X to a Z-score: Z = (X - n*p) / √(n*p*(1-p)).
  • 🔑 Large sample sizes enhance the normal approximation's fit.
  • 📊 n*p and n*(1-p) should both be >= 10 for a reasonable approximation (a rough guideline; some sources use 5).
  • 🔄 Exact binomial probabilities can still be computed for accuracy.
  • ⚠️ Understanding the logic behind calculations aids in applying them correctly.
  • 🏷️ The shape of the binomial distribution varies with p values.
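The rough guideline can be written as a small predicate. This is a minimal Python sketch: the function names are hypothetical, and the threshold is a parameter so the 5-instead-of-10 variant mentioned in the video can be tried as well.

```python
def normal_approx_reasonable(n: int, p: float, threshold: float = 10) -> bool:
    """Rough guideline: both n*p and n*(1 - p) must be at least `threshold`."""
    return n * p >= threshold and n * (1 - p) >= threshold


def smallest_n(p: float, threshold: float = 10) -> int:
    """Smallest sample size n that satisfies the guideline for this p (linear search)."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    n = 1
    while not normal_approx_reasonable(n, p, threshold):
        n += 1
    return n


print(normal_approx_reasonable(75, 0.6))  # True: n*p = 45, n*(1 - p) = 30
print(normal_approx_reasonable(40, 0.5))  # True: both equal 20

# p near 0 (or 1) demands a much larger n.
for p in (0.5, 0.2, 0.05, 0.01):
    print(f"p = {p}: n >= {smallest_n(p)} (threshold 10), n >= {smallest_n(p, 5)} (threshold 5)")
```

`smallest_n` uses a plain linear search so that its answer agrees exactly with the predicate, floating-point rounding included; it shows how quickly the required n grows as p approaches 0.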

Timeline

  • 00:00:00 - 00:05:00

    The video introduces the normal approximation to the binomial distribution and explains how it simplifies probability calculations. It notes that the binomial distribution is symmetric at p = 0.5 and becomes skewed as p moves away from 0.5. Using plots with a superimposed normal curve (n = 40), the speaker shows that the fit is good for p = 0.5, poorer for p near 0, and improves as the sample size increases. This is summarized in the rough guideline that the approximation is reasonable when n*p and n*(1-p) are both at least 10; some sources use 5 instead, so students should check which rule their course uses.

  • 00:05:00 - 00:14:10

    The latter part of the video focuses on using the continuity correction to improve the normal approximation. Because each integer outcome of the discrete binomial corresponds to an interval of width 1 on the continuous scale, the calculation should start at 51.5 or 52.5 rather than at 52, depending on whether the outcome 52 should be included or excluded. For X ~ Binomial(75, 0.6), the corrected approximation of P(X ≥ 52) is 0.0628, compared with 0.0495 without the correction and an exact value of 0.0611, so the correction brings the estimates much closer to the exact binomial probabilities.
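The four continuity-correction cases from the video (include or exclude the boundary value, then go right or left) fit in a small lookup table. A minimal Python 3.9+ sketch; `continuity_cutoff` is a hypothetical helper, not a library function:

```python
def continuity_cutoff(relation: str, k: int) -> tuple[float, str]:
    """Where to start the normal-curve area for P(X <relation> k), and which tail to take.

    Each integer k is treated as the interval [k - 0.5, k + 0.5] on the continuous scale.
    """
    rules = {
        ">=": (k - 0.5, "right"),  # include all of k, then go right
        ">": (k + 0.5, "right"),   # exclude all of k, then go right
        "<=": (k + 0.5, "left"),   # include all of k, then go left
        "<": (k - 0.5, "left"),    # exclude all of k, then go left
    }
    if relation not in rules:
        raise ValueError(f"unsupported relation: {relation!r}")
    return rules[relation]


for rel in (">=", ">", "<=", "<"):
    cutoff, side = continuity_cutoff(rel, 52)
    print(f"P(X {rel} 52): start at {cutoff}, take the {side} tail")
```

Encoding the rule as "k is the interval from k - 0.5 to k + 0.5" mirrors the reasoning in the video rather than a memorized table.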

Video Q&A

  • What is the normal approximation to the binomial distribution?

    The normal approximation is a technique to estimate probabilities in a binomial distribution using the continuous normal distribution, particularly useful when dealing with larger sample sizes.

  • When is the normal approximation considered reasonable?

    The normal approximation is considered reasonable if both n*p and n*(1-p) are greater than or equal to 10. This is only a rough guideline; some sources use 5 instead of 10.

  • What is continuity correction and why is it used?

    Continuity correction is used to adjust for the differences between discrete and continuous distributions, improving the accuracy of normal approximations to binomial probabilities.

  • How does one perform standardization in the context of this approximation?

    Standardization is done using the formula Z = (X - μ) / σ, where X is the binomial random variable, μ = n*p is its mean, and σ = √(n*p*(1-p)) is its standard deviation.

  • What does a continuity correction imply when calculating probabilities?

    A continuity correction shifts the cutoff by 0.5 so that each discrete value is represented by an interval on the continuous scale. For example, P(X ≥ 52) and P(X < 52) start at 51.5, while P(X > 52) and P(X ≤ 52) start at 52.5.

  • How does sample size affect the normal approximation's accuracy?

    As the sample size increases, the normal distribution fits the binomial distribution better. When p is close to 0 or 1, a larger n is needed for the approximation to be reasonable.

  • What are the conditions under which the binomial distribution is perfectly symmetric?

    The binomial distribution is perfectly symmetric when the probability of success (p) is exactly 0.5.

  • How can one find the exact probability for binomial distributions?

    Exact probabilities for binomial distributions can be calculated using the binomial formula or computational tools.

  • Can the success probability p affect the shape of the binomial distribution?

    Yes, the binomial distribution becomes skewed when p is not equal to 0.5, especially as it approaches 0 or 1.

  • What is an example of applying the normal approximation?

    For X ~ Binomial(75, 0.6), with μ = 45 and σ = √18, P(X ≥ 52) is approximated with the continuity correction as P(Z ≥ (51.5 - 45)/√18) = P(Z ≥ 1.532) ≈ 0.0628, close to the exact value of 0.0611.
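The standardization formula from the Q&A, applied to the video's example X ~ Binomial(75, 0.6), as a minimal Python 3.9+ sketch (helper names are hypothetical):

```python
from math import sqrt


def binomial_moments(n: int, p: float) -> tuple[float, float]:
    """Mean n*p and variance n*p*(1 - p) of X ~ Binomial(n, p)."""
    return n * p, n * p * (1 - p)


def z_score(x: float, mu: float, sigma: float) -> float:
    """Standardize x: Z = (x - mu) / sigma."""
    return (x - mu) / sigma


mu, var = binomial_moments(75, 0.6)
sigma = sqrt(var)
print(mu, var)  # 45.0 18.0

print(round(z_score(52, mu, sigma), 3))    # 1.65  (no continuity correction)
print(round(z_score(51.5, mu, sigma), 3))  # 1.532 (cutoff for P(X >= 52) or P(X < 52))
print(round(z_score(52.5, mu, sigma), 3))  # 1.768 (cutoff for P(X > 52) or P(X <= 52))
```

These are the z-values the video looks up in the standard normal table.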

Transcript
  • 00:00:02
    Let's take a look at the normal approximation to the binomial distribution. The continuous normal distribution can sometimes be used to approximate the discrete binomial distribution. Why would we want to do this? In the olden days it was very useful for probability calculations: the binomial formula can be a bit of a pain if it has to be used over and over and over again, and so this normal approximation came in to make lives much easier. These days computers can do the calculations for us, so it's not as much of an issue there, but we still use this normal approximation in statistical inference. When we do things a little bit later on, like statistical inference for proportions, we often use this normal approximation, so it's good to know a little bit about it.
  • 00:00:43
    Now, you may recall that the binomial distribution is perfectly symmetric if p is exactly equal to 0.5 and will have some skewness when p is not equal to 0.5. The normal distribution is a symmetric distribution, and so the normal approximation is going to work best when p is close to 0.5, and it's going to work better and better as we get a larger and larger sample size as well.
  • 00:01:08
    For illustrative purposes, here is the binomial distribution with n = 40 and p = 0.5, and as you may be able to tell, this is a perfectly symmetric distribution in this case. Let's say we were to superimpose a normal curve over this distribution. It looks awfully normal; that superimposed normal curve fits pretty well, and we'd see that if we jacked up the sample size higher and higher, that normal curve would fit better and better.
  • 00:01:37
    Now, what if we're a little bit closer to the boundary? Here p is 0.03, which is close to the boundary of zero, and we've got the same value of n, but we see a little bit of skewness, and if we were to superimpose a normal curve, it does not fit very well. If we amped up the sample size, we'd see that the normal approximation gets better and better as the sample size gets larger and larger. But the general idea is that when we get near the boundaries of zero or one, we are going to need a larger and larger value of n for the normal approximation to be reasonable.
  • 00:02:14
    This idea is summarized in a rough guideline: the normal approximation is reasonable if both n*p ≥ 10 and n*(1 - p) ≥ 10. If you play around with that a little bit, you'll see that if p is close to 0 or 1, we need a larger value of n for the normal approximation to be reasonable. Now, this is just a rough guideline. Sometimes people replace 10 with 5 here, and sometimes they use different rules altogether, so you should consult your professor or your textbook to see what rough guideline they are using.
  • 00:02:48
    Recall that if X is a binomial random variable, then X has a mean of μ = n*p and a variance of σ² = n*p*(1 - p). What we were just discussing is that X can be considered approximately normal in certain settings, so we can standardize it in the usual way: (X - μ)/σ, using μ from above and σ as the square root of σ². We're going to call this quantity Z, because it has approximately the standard normal distribution, with a mean of 0 and a variance of 1. Now we're going to use this in probability calculations.
  • 00:03:39
    Let X be a binomial random variable with n = 75 and p = 0.6. What is μ? We know μ = n*p = 75 × 0.6 = 45. What is σ²? It's n*p*(1 - p) = 75 × 0.6 × (1 - 0.6) = 18, so our standard deviation is simply √18.
  • 00:04:15
    Now suppose we wanted to use the normal approximation to estimate P(X ≥ 52). We could calculate the exact probability from the binomial distribution using a computer, or even by hand if we had to, but we're going to use the normal approximation here. P(X ≥ 52) is approximately P(Z ≥ (52 - 45)/√18): we're simply standardizing as usual, subtracting the mean of 45 and dividing by the standard deviation √18. This is the probability that Z takes on a value at least as big as 1.650 (rounded to three decimal places). Then we would go to a computer or our standard normal table, find 1.650, and take the area to its right. I'll leave it up to you to verify for yourselves that this is approximately 0.0495. If you use a standard normal table instead of a computer, you might get a rounded version of that, but you should get pretty close to that value.
  • 00:05:19
    Now compare that with the exact value we get from the binomial formula. I'm not showing that calculation here, but I used a computer to get the exact value based on the binomial distribution, and we get 0.0611. Our value based on the normal approximation is in the ballpark, but it's still a little bit off. We can improve that approximation with something we call a continuity correction: we are moving from the discrete binomial distribution to the continuous normal distribution, and when we do that, we can improve the approximation with this continuity correction.
  • 00:06:01
    To illustrate, let's look at what's going on here. This is a plot of the binomial distribution with n = 75 and p = 0.6, and the shaded green part is the probability we're interested in: the probability that we get a value of 52 or greater. What if I superimpose the normal curve? Here's the superimposed normal curve, and this red shaded area is the probability calculation that we carried out on the last page. Here is the value 52.
  • 00:06:31
    Now, in the binomial setting, 52 means something: 52 successes out of 75. But for a continuous random variable, 52 means 52.000..., with infinitely many zeros, and it is distinctly different from, say, 52.1. I've blown this part up on the next slide to make it a little easier to see.
  • 00:07:01
    So, to come closer to regaining our original meaning of 52: 52 in the discrete sense had some meaning, but 52 in the continuous sense means exactly 52.000. To regain that original meaning, we should let 52 take on all values between 51.5 and 52.5; that way it comes closer to representing what is intended in the binomial distribution. As we can see here, when we started right at 52.00 and did our probability calculation, we were really missing an important part: we were missing this half of 52, in a sense. What we really should do is start at 51.5.
  • 00:07:50
    That's what it looks like on the next page. If I start at 51.5, my approximation, the red shaded area, is going to be a lot closer to the total of those green probabilities and a lot closer to reality. So if I want P(X ≥ 52), before doing the normal approximation I am going to use the continuity correction and say this is approximately P(Z ≥ (51.5 - 45)/√18), that is, 51.5 minus the mean over the standard deviation. This works out to P(Z ≥ 1.532), and we go to our standard normal table or a computer. If we did this without any roundoff error and just rounded our final answer to four decimal places, we'd see that this is 0.0628. As I'll illustrate in a little bit, that's closer to the true probability based on the binomial distribution than the value we got without the continuity correction.
  • 00:08:58
    In this new question, we're interested in the probability that X is strictly greater than 52; that is what the shaded green bits represent. To be strictly greater than 52, I want to make sure I don't include any of 52, so I should start at 52.5. Let's see what that shaded area under the normal curve looks like. That looks a little bit better. So P(X > 52) is approximately P(Z > (52.5 - 45)/√18), where 45 is μ and √18 is the standard deviation. This works out to P(Z > 1.768). If we found that area under the standard normal curve without any roundoff error and rounded our final answer to four places, we would see that it is 0.0385. You should be able to come close to that, but not exactly that, from a standard normal table.
  • 00:10:08
    Now, what about this one? Let's say I wanted the probability that X is less than or equal to 52; that's what the shaded green bit represents. I can't start exactly at 52, because that wouldn't include all of 52 as we go left. I need to start at 52.5. Let's see what that looks like when shaded under the normal curve. That looks like it might provide a pretty reasonable approximation. So, with my continuity correction, P(X ≤ 52) is approximately P(Z ≤ (52.5 - 45)/√18): I want to include all of 52, so I start at 52.5, subtract the mean of 45, and divide by the standard deviation. This is equal to P(Z ≤ 1.768), which is the entire area to the left of 1.768, and that works out to 0.9615.
  • 00:11:15
    What if we're interested in the probability that X is strictly less than 52, which is represented by the green shaded bit? If I'm going strictly less and I don't want to include 52, then I should start at 51.5. Let's see what that looks like when shaded in. That looks like it might give us a pretty darn reasonable approximation. So P(X < 52) is approximately P(Z < (51.5 - μ)/σ) = P(Z < 1.532). If we put that into our computer without any roundoff error and just rounded to four decimal places at the bitter end, we would say that this is 0.9372.
  • 00:12:16
    So let's look at a summary of what we just did. If I want P(X ≥ 52), then I want to include all of 52 and go right, which means I should start at 51.5. If I want P(X > 52), I'm going right but I don't want to include any of 52, so I should start at 52.5. If I am going left, P(X ≤ 52), I want to include all of 52 while going left, which means I should start at 52.5. And if I'm going strictly less than 52, I want to make sure I don't include any of 52 while going left, which means I should start at 51.5.
  • 00:13:04
    Now, I strongly recommend that you don't simply memorize this but try to follow the underlying logic. If you truly understand why we're doing what we're doing here, it'll be easier to do properly when you have to.
  • 00:13:14
    Now let's take a quick look at how the continuity correction improved the approximation for a couple of the cases we just looked at. For P(X ≥ 52), without the continuity correction we got 0.0495, but with the continuity correction we got 0.0628, which is much closer to the exact value of 0.0611 based on the binomial distribution. Similarly, for P(X > 52) we had 0.0495 without the continuity correction (we used the same calculation in both cases), and with the continuity correction we get 0.0385, which is much closer to the exact value based on the binomial distribution. So the continuity correction has greatly improved our approximation.
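The comparison at the end of the transcript can be reproduced with the standard library. This minimal Python 3.9+ sketch computes each of the four probabilities exactly from the binomial formula, with the plain normal approximation, and with the continuity correction; `compare` is a hypothetical helper, and only n = 75, p = 0.6 and the boundary value 52 come from the video.

```python
from math import comb, sqrt
from statistics import NormalDist


def compare(n: int, p: float, k: int) -> dict[str, tuple[float, float, float]]:
    """Return {label: (exact, normal without correction, normal with correction)}."""
    pmf = [comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(n + 1)]
    normal = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

    def right(x: float) -> float:  # area to the right of x under the normal curve
        return 1 - normal.cdf(x)

    def left(x: float) -> float:  # area to the left of x under the normal curve
        return normal.cdf(x)

    return {
        f"P(X >= {k})": (sum(pmf[k:]), right(k), right(k - 0.5)),
        f"P(X > {k})": (sum(pmf[k + 1:]), right(k), right(k + 0.5)),
        f"P(X <= {k})": (sum(pmf[: k + 1]), left(k), left(k + 0.5)),
        f"P(X < {k})": (sum(pmf[:k]), left(k), left(k - 0.5)),
    }


results = compare(75, 0.6, 52)
print(f"{'':12}{'exact':>8}{'plain':>8}{'corrected':>11}")
for label, (exact, plain, corrected) in results.items():
    print(f"{label:12}{exact:8.4f}{plain:8.4f}{corrected:11.4f}")
```

The corrected column should agree with the video's 0.0628, 0.0385, 0.9615 and 0.9372, the uncorrected P(X ≥ 52) with 0.0495, and the exact P(X ≥ 52) with 0.0611. Using NormalDist(mu, sigma) directly is equivalent to standardizing first and reading the standard normal CDF.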
Tags
  • normal approximation
  • binomial distribution
  • statistical inference
  • continuity correction
  • probability calculations
  • sample size
  • mean and variance
  • standard normal distribution
  • Z-scores
  • discrete vs continuous