Basic probability: Joint, marginal and conditional probability | Independence

00:14:27
https://www.youtube.com/watch?v=SrEmzdOT65s

Summary

TLDR: The video covers basic probability concepts for categorical variables, using an example HBO survey of subscribers' favorite shows. The first part introduces the joint probability distribution and shows how to calculate probabilities from gender and show preference. Joint and marginal probabilities are defined and their calculation explained. The video then moves to conditional probabilities, showing how to compute the probability of a show preference given a condition such as gender. Lastly, it discusses testing for independence between variables, using the survey data to show that show preference differs by gender.
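
As a minimal sketch of the first step described above, the Python below builds the joint probability distribution by dividing each cell count by the 500 respondents, then sums over the other variable to get the two marginal distributions. The counts 80 and 120 and all row and column totals are quoted in the video; the other four cells (100, 25, 50, 125) are an assumption, reconstructed from those totals and the joint probabilities 0.2 and 0.05 that the speaker reads out. The names `COUNTS`, `joint_distribution` and `marginal` are illustrative only.

```python
# Survey counts for the HBO example (500 subscribers).
# 80 and 120 plus every row/column total are quoted in the video; the other
# four cells are ASSUMED, reconstructed from those totals and quoted joint probabilities.
COUNTS = {
    ("male", "Game of Thrones"): 80,
    ("female", "Game of Thrones"): 120,
    ("male", "Westworld"): 100,
    ("female", "Westworld"): 25,
    ("male", "Other"): 50,
    ("female", "Other"): 125,
}


def joint_distribution(counts):
    """Divide every cell by the grand total so the six cells sum to one."""
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}


def marginal(joint, axis):
    """Sum joint probabilities over the other variable (axis 0 = gender, 1 = show)."""
    result = {}
    for cell, p in joint.items():
        key = cell[axis]
        result[key] = result.get(key, 0.0) + p
    return result


joint = joint_distribution(COUNTS)
gender = marginal(joint, 0)
show = marginal(joint, 1)

print(round(joint[("female", "Game of Thrones")], 2))  # 0.24
print({k: round(v, 2) for k, v in gender.items()})     # {'male': 0.46, 'female': 0.54}
print({k: round(v, 2) for k, v in show.items()})       # {'Game of Thrones': 0.4, 'Westworld': 0.25, 'Other': 0.35}
```

Keeping the cells in a dictionary keyed by (gender, show) makes each marginal a simple "group by one key and sum" operation, which mirrors adding up a row or column of the table.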

Key takeaways

  • 📊 Understanding **joint probability** through gender and show preferences.
  • ✏️ Marginal probabilities sum to one across all categories of a variable.
  • 🔍 **Conditional probability** restricts attention to the row or column that meets the condition.
  • ✖️ Testing for **independence** among variables.
  • 📈 The concept of **marginal probability distribution** highlighted.
  • 🔗 **Union** and **intersection** symbols used in probability calculations.
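
To make the union and intersection symbols concrete, here is a small Python sketch that computes P(male ∩ Westworld) and P(male ∪ Westworld) from the survey's joint probabilities, once by adding every qualifying cell and once with P(A ∪ B) = P(A) + P(B) − P(A ∩ B). The six joint probabilities are assumed: they come from the video's counts and totals, with four cells inferred from those totals. The helper names `prob`, `is_male` and `likes_westworld` are hypothetical.

```python
# Joint probabilities for the HBO survey example (each cell = count / 500).
# Four of these cells are ASSUMED, inferred from the totals quoted in the video.
JOINT = {
    ("male", "Game of Thrones"): 0.16,
    ("female", "Game of Thrones"): 0.24,
    ("male", "Westworld"): 0.20,
    ("female", "Westworld"): 0.05,
    ("male", "Other"): 0.10,
    ("female", "Other"): 0.25,
}


def prob(event):
    """Probability of an event, given as a predicate on (gender, show) cells."""
    return sum(p for cell, p in JOINT.items() if event(cell))


def is_male(cell):
    return cell[0] == "male"


def likes_westworld(cell):
    return cell[1] == "Westworld"


# Intersection: both conditions must hold (a single cell here).
p_and = prob(lambda c: is_male(c) and likes_westworld(c))
# Union, method 1: add every cell where at least one condition holds (four cells).
p_or_cells = prob(lambda c: is_male(c) or likes_westworld(c))
# Union, method 2: P(A) + P(B) - P(A and B), removing the double-counted cell.
p_or_formula = prob(is_male) + prob(likes_westworld) - p_and

print(round(p_and, 2), round(p_or_cells, 2), round(p_or_formula, 2))  # 0.2 0.51 0.51
```

Both union methods agree because the formula simply undoes the double counting of the male/Westworld cell that happens when the whole male column and the whole Westworld row are added.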

Timeline

  • 00:00:00 - 00:05:00

    The video introduces basic probability for categorical variables, using a survey of 500 HBO subscribers about their favorite show. It cross-classifies respondents by gender (male, female) and favorite show (Game of Thrones, Westworld, other), treats each cell as a joint event, and calculates joint and marginal probabilities from the survey results. The speaker stresses that each probability distribution must add up to one because it covers all possibilities.

  • 00:05:00 - 00:14:27

    The discussion advances to union and conditional probabilities, with worked calculations such as the probability of a subscriber being male or preferring Westworld. It then turns to independence of variables, showing how the conditional probabilities differ from the marginal probabilities and concluding that gender influences HBO show preference. The speaker wraps up by summarizing the key probability types for categorical data and encouraging viewers to explore more on the topic.
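
The conditional calculations mentioned in the timeline can be sketched in Python as follows: keep only the column (or row) that satisfies the condition, then divide each joint probability in it by the probability of the condition. The joint probabilities are assumed, reconstructed from the video's counts and totals, and the function names are hypothetical. The video rounds P(Westworld | female) to 0.093; the code prints four decimal places.

```python
# Joint probabilities for the HBO survey example (each cell = count / 500).
# Four of these cells are ASSUMED, inferred from the totals quoted in the video.
JOINT = {
    ("male", "Game of Thrones"): 0.16,
    ("female", "Game of Thrones"): 0.24,
    ("male", "Westworld"): 0.20,
    ("female", "Westworld"): 0.05,
    ("male", "Other"): 0.10,
    ("female", "Other"): 0.25,
}


def conditional_on_gender(joint, gender):
    """P(show | gender): keep that gender's column and divide by its marginal."""
    column = {show: p for (g, show), p in joint.items() if g == gender}
    p_condition = sum(column.values())
    return {show: p / p_condition for show, p in column.items()}


def conditional_on_show(joint, show):
    """P(gender | show): keep that show's row and divide by its marginal."""
    row = {g: p for (g, s), p in joint.items() if s == show}
    p_condition = sum(row.values())
    return {g: p / p_condition for g, p in row.items()}


given_female = conditional_on_gender(JOINT, "female")
print({s: round(p, 4) for s, p in given_female.items()})
# {'Game of Thrones': 0.4444, 'Westworld': 0.0926, 'Other': 0.463}

given_westworld = conditional_on_show(JOINT, "Westworld")
print(round(given_westworld["male"], 2))  # 0.8
```

Because each conditional distribution is renormalized by the probability of its condition, its values sum to one, just like the joint and marginal distributions.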

Video Q&A

  • What are joint probabilities?

    Joint probabilities are the probabilities of two events occurring together (for example, a subscriber being female and preferring Game of Thrones), represented by the intersection of those events.

  • What is marginal probability?

    Marginal probability refers to the probability of a single event, regardless of the value of other variables.

  • How do you calculate conditional probability?

    Conditional probability is calculated by dividing the joint probability of two events by the probability of the condition.

  • What does it mean for two variables to be independent?

    Two variables are independent if knowing the value of one does not change the probabilities of the other: P(A | B) = P(A), or equivalently P(A ∩ B) = P(A) × P(B).

  • What is a conditional probability distribution?

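    A conditional probability distribution gives the probability of each category of one variable under a fixed condition on the other, such as the distribution of favorite shows among female subscribers. Like any probability distribution, it sums to one.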
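
The two equivalent independence checks described in these answers can be sketched in Python using the female/Westworld cell, as in the video. The joint probabilities are assumed (reconstructed from the video's counts and totals), and `marginals` and `looks_independent` are hypothetical helpers. The tolerance in `looks_independent` only absorbs floating-point rounding; it makes no allowance for sampling variation, which, as the video notes, keeps sample values from ever matching exactly.

```python
import math

# Joint probabilities for the HBO survey example (each cell = count / 500).
# Four of these cells are ASSUMED, inferred from the totals quoted in the video.
JOINT = {
    ("male", "Game of Thrones"): 0.16,
    ("female", "Game of Thrones"): 0.24,
    ("male", "Westworld"): 0.20,
    ("female", "Westworld"): 0.05,
    ("male", "Other"): 0.10,
    ("female", "Other"): 0.25,
}


def marginals(joint):
    """Return (P(gender), P(show)) by summing the joint table over the other variable."""
    by_gender, by_show = {}, {}
    for (gender_key, show_key), p in joint.items():
        by_gender[gender_key] = by_gender.get(gender_key, 0.0) + p
        by_show[show_key] = by_show.get(show_key, 0.0) + p
    return by_gender, by_show


by_gender, by_show = marginals(JOINT)
g, s = "female", "Westworld"

# Approach 1: under independence, P(show | gender) equals P(show).
p_s_given_g = JOINT[(g, s)] / by_gender[g]
print(round(p_s_given_g, 3), round(by_show[s], 3))  # 0.093 0.25

# Approach 2: under independence, P(gender and show) equals P(gender) * P(show).
print(round(JOINT[(g, s)], 3), round(by_gender[g] * by_show[s], 3))  # 0.05 0.135


def looks_independent(joint, tol=1e-9):
    """Exact check of P(A and B) == P(A) * P(B) for every cell (no sampling noise allowed)."""
    pg, ps = marginals(joint)
    return all(
        math.isclose(p, pg[gender_key] * ps[show_key], abs_tol=tol)
        for (gender_key, show_key), p in joint.items()
    )


print(looks_independent(JOINT))  # False
```

Either approach alone is enough: both comparisons fail for the female/Westworld cell, so gender and show preference are not independent in this sample.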

Transcript (English)
  • 00:00:00
    in this video we're looking at basic
  • 00:00:02
    probability now when I say basic I mean
  • 00:00:05
    probability as it relates to categorical
  • 00:00:07
    variables like yes no type variables
  • 00:00:10
    numerical variables have their own thing
  • 00:00:12
    going on and I'll deal with that in
  • 00:00:13
    later videos but we're going to be
  • 00:00:15
    learning through the use of an example
  • 00:00:17
    here and here it is the HBO cable
  • 00:00:20
    network took a survey of 500 subscribers
  • 00:00:23
    to determine people's favorite show now
  • 00:00:25
    in this case let's say there were two
  • 00:00:28
    categorical variables one which is
  • 00:00:30
    gender male female and the other
  • 00:00:33
    people's favorite show so I've got Game
  • 00:00:34
    of Thrones in there Westworld and I've
  • 00:00:36
    just combined all the others into this
  • 00:00:38
    other
  • 00:00:39
    category I'm thinking that those two are
  • 00:00:42
    probably the biggest shows on the HBO
  • 00:00:44
    roster so let's see how this
  • 00:00:46
    distribution pans out now out of 500
  • 00:00:49
    subscribers 80 of them are male that
  • 00:00:52
    like Game of Thrones 120 of them are
  • 00:00:55
    female that like Game of Thrones etc etc
  • 00:00:58
    and each of these squares we can call
  • 00:01:00
    joint events they're called joint events
  • 00:01:02
    because they depend on classes from two
  • 00:01:05
    different variables now the other thing
  • 00:01:07
    to note is that of course they're all
  • 00:01:08
    going to add up to this total figure in
  • 00:01:09
    the bottom right but we can also
  • 00:01:11
    calculate this total column and total
  • 00:01:14
    row where we just sum up the total
  • 00:01:16
    number of people that thought Game of
  • 00:01:18
    Thrones was their favorite show and
  • 00:01:20
    that's 200 the total number of people
  • 00:01:22
    that thought Westworld was their
  • 00:01:23
    favorite show and that's 125 and we can
  • 00:01:25
    also total up the males and females it's
  • 00:01:27
    230 males and 270 females
  • 00:01:30
    but this is not quite a probability
  • 00:01:32
    distribution just yet so let's see how
  • 00:01:34
    we get there here's that distribution
  • 00:01:36
    again and if we divide everything by 500
  • 00:01:40
    which is the total number of
  • 00:01:41
    observations we get something which is
  • 00:01:43
    called a probability distribution so
  • 00:01:46
    where before we had 120 females who
  • 00:01:49
    preferred Game of Thrones we now have
  • 00:01:52
    0.24 in that cell because that tells us
  • 00:01:54
    that 0.24 or 24% of the distribution is
  • 00:01:58
    defined by that joint event so that's
  • 00:02:01
    why we call that a joint probability so
  • 00:02:03
    that 0.24 is called a joint probability
  • 00:02:06
    and the way we can write that is to say
  • 00:02:07
    the probability of female the event
  • 00:02:10
    where the person is female and the event
  • 00:02:12
    where the person likes Game of Thrones
  • 00:02:14
    is
  • 00:02:15
    0.24 now you might see it written with
  • 00:02:17
    the word and in between or else you
  • 00:02:20
    might see it with this sort of upside
  • 00:02:22
    down U which we're going to call
  • 00:02:23
    intersection so that's just a fancy
  • 00:02:25
    statistical term to say the intersection
  • 00:02:27
    of female and Game of Thrones which
  • 00:02:29
    is of course that joint
  • 00:02:31
    probability now collectively those six
  • 00:02:34
    cells here form what's called the joint
  • 00:02:36
    probability distribution so all of these
  • 00:02:38
    six cells are going to add up to one
  • 00:02:40
    because if you think about it everyone
  • 00:02:42
    in this distribution has to be in one of
  • 00:02:44
    these cells you have to either be male
  • 00:02:46
    or female and you have to have selected
  • 00:02:49
    one of these options so it's no surprise
  • 00:02:51
    that this should sum up to one now we
  • 00:02:54
    can also calculate What's called the
  • 00:02:55
    marginal probability called as such
  • 00:02:58
    because it's in the margin
  • 00:03:00
    another term for this is the simple
  • 00:03:02
    probability so just be aware of that if
  • 00:03:04
    you see the phrase simple probability
  • 00:03:06
    but here this 0.4 refers to the raw
  • 00:03:09
    probability of someone liking Game of
  • 00:03:12
    Thrones and I'm sure you can tell it's
  • 00:03:14
    just the sum of both the male and female
  • 00:03:16
    joint probabilities now of course this
  • 00:03:19
    actual column here is called the
  • 00:03:21
    marginal probability distribution so
  • 00:03:23
    just like we had a joint probability
  • 00:03:25
    distribution this one I've highlighted
  • 00:03:27
    here also sums to one now the thing to
  • 00:03:29
    appreciate is that this marginal
  • 00:03:31
    probability distribution completely
  • 00:03:33
    ignores gender it's as if that variable
  • 00:03:35
    doesn't exist here but be aware also
  • 00:03:37
    there's another marginal probability
  • 00:03:39
    distribution down here and in this case
  • 00:03:42
    it would ignore the show of preference
  • 00:03:44
    but again this is going to sum to one
  • 00:03:46
    because there are 46% males in our
  • 00:03:49
    sample and 54% females in our sample and
  • 00:03:52
    we're going to have to sum them up to
  • 00:03:53
    get 100% so our joint probability
  • 00:03:56
    distribution adds up to one but so does
  • 00:03:58
    both of these marginal probability
  • 00:04:00
    distributions okay so here's a couple of
  • 00:04:03
    questions for you just using what we've
  • 00:04:04
    gleaned from the last minute or so you
  • 00:04:07
    should be able to answer some of these
  • 00:04:08
    questions what's the probability of an
  • 00:04:10
    HBO subscriber being male now I think I
  • 00:04:12
    might have even said this before but the
  • 00:04:14
    probability of a subscriber being male
  • 00:04:16
    is purely that marginal probability of
  • 00:04:19
    the male column so 0.46 now just be aware
  • 00:04:22
    of terminology when we say probability
  • 00:04:24
    we're after a number that's between 0
  • 00:04:26
    and one sometimes they'll ask about
  • 00:04:28
    percentages and stuff so you can turn
  • 00:04:30
    that into a percent but if it just says
  • 00:04:32
    probability you can just leave it
  • 00:04:34
    as 0.46 or the next question what's the
  • 00:04:36
    probability of an HBO subscriber
  • 00:04:38
    preferring Westworld pretty simple it's
  • 00:04:41
    just going to be the 0.25 so we can keep
  • 00:04:44
    going appreciating that those first two
  • 00:04:46
    questions were asking for marginal or
  • 00:04:48
    simple probabilities this question asks
  • 00:04:51
    what's the probability of an HBO
  • 00:04:53
    subscriber being male and preferring
  • 00:04:55
    Westworld so that's 0.2 and again you've
  • 00:04:58
    got that intersection symbol
  • 00:05:01
    here and here's a slightly different one
  • 00:05:03
    it says what's the probability of an HBO
  • 00:05:05
    subscriber being male or preferring
  • 00:05:08
    Westworld so this is not an intersection
  • 00:05:11
    it's not a joint probability it actually
  • 00:05:13
    requires us to think just a little bit
  • 00:05:16
    and appreciate that to find this we're
  • 00:05:18
    going to have to sum up all of the joint
  • 00:05:21
    probabilities where this condition's met
  • 00:05:23
    so if the subscriber is male we're in
  • 00:05:26
    this column so we can highlight these
  • 00:05:28
    three cells green and we can also
  • 00:05:30
    highlight the Westworld row green as
  • 00:05:32
    well so in summing up those four cells
  • 00:05:35
    we're going to be able to get this
  • 00:05:36
    probability and you can see here I've
  • 00:05:38
    just gone all of those numbers added
  • 00:05:39
    together and I've got
  • 00:05:41
    0.51 now you might have noticed I used
  • 00:05:43
    this upwards U here and this is the
  • 00:05:47
    symbol that we call Union so this is the
  • 00:05:49
    union of two events of someone being
  • 00:05:51
    male and the event someone liking
  • 00:05:54
    Westworld now if you've watched a few of
  • 00:05:56
    my other videos you might know that I
  • 00:05:57
    tend not to like formulas too much
  • 00:05:59
    because formulas tend to stop us
  • 00:06:01
    thinking and many of them have an
  • 00:06:03
    intuitive basis to start with and this
  • 00:06:05
    is true of this formula here which
  • 00:06:07
    provides us a way of calculating the
  • 00:06:10
    union between two events so you might
  • 00:06:12
    have seen this somewhere where we say
  • 00:06:14
    the union between events A and B is the
  • 00:06:17
    probability of A plus the probability of
  • 00:06:19
    B minus the intersection and again you
  • 00:06:21
    can calculate this one that way if you'd
  • 00:06:23
    like which is to say that we add up the
  • 00:06:26
    two marginal probabilities the 0.25 and
  • 00:06:29
    the
  • 00:06:30
    0.46 and then we subtract the joint
  • 00:06:33
    probability at 0.2 now why do we
  • 00:06:35
    subtract the 0.2 well if you think about
  • 00:06:38
    each of these joint probabilities this
  • 00:06:40
    one here sums up the entire row for
  • 00:06:42
    Westworld this one here sums up the
  • 00:06:45
    entire column for male thus we've
  • 00:06:47
    actually added this cell twice the male
  • 00:06:50
    Westworld joint probability we've added
  • 00:06:53
    that twice if we've just simply added
  • 00:06:54
    these two blue cells together so we have
  • 00:06:57
    to subtract one of those away again to
  • 00:06:59
    be left with 0.51 which is no surprise
  • 00:07:02
    the same answer we got using the other
  • 00:07:04
    method this brings us to the next and
  • 00:07:07
    more tricky type of probability which is
  • 00:07:09
    called a conditional probability so here
  • 00:07:11
    I've given you a question again as an
  • 00:07:13
    example no just got an HBO subscription
  • 00:07:17
    what's the chance that her favorite show
  • 00:07:18
    will be Game of Thrones now again you
  • 00:07:21
    might have seen a formula that looks a
  • 00:07:23
    little bit like this where this says the
  • 00:07:25
    probability of event A given event B
  • 00:07:29
    equals the intersection of the two
  • 00:07:30
    events divided by the probability of the
  • 00:07:34
    condition so in this case what we do is
  • 00:07:37
    we basically focus on the part of this
  • 00:07:39
    distribution which is of interest and
  • 00:07:42
    because non is female we can ignore the
  • 00:07:44
    rest of the table completely we're only
  • 00:07:47
    really considering this column here we
  • 00:07:50
    take the joint probability which is 0.24
  • 00:07:53
    and we divide by the probability of the
  • 00:07:56
    condition which is 0.54 now the
  • 00:07:58
    condition was that she was female that
  • 00:08:00
    was given in the question so we can
  • 00:08:02
    calculate this as 0.24
  • 00:08:05
    / 0.54 which is 0.4444 which is about 4/9
  • 00:08:09
    but that's okay we can leave it as a
  • 00:08:11
    four decimal place probability here so
  • 00:08:14
    what we can do now is create for
  • 00:08:16
    ourselves a new column which is the
  • 00:08:18
    probability of preferring each of these
  • 00:08:20
    shows given someone is female so we just
  • 00:08:24
    got 0.4444 for Game of Thrones here now we
  • 00:08:27
    can do the same for Westworld and other
  • 00:08:31
    we find here what's called the
  • 00:08:32
    conditional probability distribution so
  • 00:08:35
    this is the probability distribution of
  • 00:08:37
    preferring various shows given someone
  • 00:08:40
    is female and again because it's a
  • 00:08:43
    complete probability distribution it's
  • 00:08:45
    going to add up to one and indeed this
  • 00:08:48
    does so what we can do now is we
  • 00:08:51
    can compare this conditional probability
  • 00:08:53
    distribution to the marginal probability
  • 00:08:57
    distribution that's this one here the
  • 00:08:58
    original Total
  • 00:09:00
    distribution and we can see that if we
  • 00:09:03
    don't take sex into account 40% of
  • 00:09:06
    people like Game of Thrones 25% like
  • 00:09:08
    Westworld 35% like other shows but when
  • 00:09:11
    we take gender into account these change
  • 00:09:14
    a little bit so for females you can see
  • 00:09:16
    that they're more likely to like Game of
  • 00:09:18
    Thrones than the general population
  • 00:09:20
    they're less likely to like Westworld
  • 00:09:22
    than the general population and they're
  • 00:09:24
    a little bit more likely to like other
  • 00:09:26
    shows than the general population as
  • 00:09:28
    well and we can actually use this to
  • 00:09:30
    assess whether the two variables gender
  • 00:09:33
    and show Choice are independent but
  • 00:09:36
    before we do just appreciate that we can
  • 00:09:37
    also go the other way with our
  • 00:09:39
    conditions so for example if I asked you
  • 00:09:42
    given that a subscriber's favorite show
  • 00:09:44
    is Westworld what's the probability that
  • 00:09:46
    they are male in this case the condition
  • 00:09:49
    is that they like Westworld so our
  • 00:09:52
    condition is actually an entire row so
  • 00:09:55
    much like last time we can block out the
  • 00:09:57
    rest of the table that doesn't coincide
  • 00:09:59
    with Westworld and we can just focus on
  • 00:10:01
    this row and the probability of being
  • 00:10:03
    male is that 0.2 which is the joint
  • 00:10:06
    probability divided by the probability
  • 00:10:09
    of the condition
  • 00:10:10
    0.25 so if you do that you get 0.8 and
  • 00:10:13
    that's the probability of being male if
  • 00:10:16
    you prefer Westworld in other words 80%
  • 00:10:18
    of people that watch Westworld are
  • 00:10:20
    males so this leads us to the topic of
  • 00:10:23
    Independence between the variables
  • 00:10:29
    now we've already touched on it just
  • 00:10:31
    briefly to show that females have a
  • 00:10:33
    little bit of a different preference of
  • 00:10:35
    their HBO shows to the general
  • 00:10:38
    population but the strict definition of
  • 00:10:40
    independence actually has two different
  • 00:10:43
    approaches the first of which says that
  • 00:10:46
    if the two variables are independent
  • 00:10:48
    then the probability of A given B is
  • 00:10:53
    just equal to the probability of A
  • 00:10:53
    another way to think of this is that
  • 00:10:55
    imposing the condition B doesn't
  • 00:10:58
    actually affect
  • 00:10:59
    the probability of A at all so in our
  • 00:11:02
    case we found the probability of liking
  • 00:11:05
    Westworld if you're female is 0.093 we
  • 00:11:09
    found that from our conditional
  • 00:11:11
    probability distribution earlier but the
  • 00:11:14
    probability of just preferring Westworld
  • 00:11:16
    straight up is 0.25 it's in that final
  • 00:11:19
    value here it's in that marginal value
  • 00:11:22
    here so if these two variables gender
  • 00:11:25
    and choice of HBO shows were independent
  • 00:11:29
    these two values should be equal
  • 00:11:32
    therefore the variables are not
  • 00:11:33
    independent as 0.093 does not equal 0.25
  • 00:11:38
    so clearly gender does influence the HBO
  • 00:11:41
    show that they prefer now it's probably
  • 00:11:44
    worth mentioning that because this is a
  • 00:11:46
    sample you would never expect for these
  • 00:11:49
    values to be exactly equal because we
  • 00:11:52
    know in a sample there's going to be
  • 00:11:53
    some random variation and even if the
  • 00:11:56
    variables were completely independent
  • 00:11:59
    these two probabilities wouldn't be
  • 00:12:01
    perfectly equal but in this case I think
  • 00:12:05
    it's quite clear that they're very
  • 00:12:06
    different probabilities one's 25% and
  • 00:12:09
    one's 9% so I think we're quite safe to
  • 00:12:12
    say these variables are not
  • 00:12:14
    independent now as I said there were two
  • 00:12:16
    approaches for assessing Independence
  • 00:12:18
    and this one says the probability of A
  • 00:12:24
    intersection B is equal to the two marginal
  • 00:12:24
    probabilities multiplied together now
  • 00:12:26
    I've got a feeling you might understand
  • 00:12:28
    this equation intuitively so if I can take you
  • 00:12:30
    away from this example just for a second
  • 00:12:33
    say I flipped a coin in one hand and I
  • 00:12:35
    rolled a dice in the other hand and I
  • 00:12:38
    said I'm going to give you a hundred
  • 00:12:39
    bucks if I roll a six and I flip a head if
  • 00:12:44
    both of those events occur I'll give you
  • 00:12:46
    a hundred bucks and I asked you what is the
  • 00:12:47
    probability of me giving you a hundred bucks
  • 00:12:51
    you probably say well that's one in 12
  • 00:12:54
    right because you've got half a chance
  • 00:12:55
    of getting a head and a sixth of a chance
  • 00:12:58
    of rolling a six
  • 00:12:59
    so you multiply them together you're
  • 00:13:01
    going to do the probability of A, getting
  • 00:13:05
    a head, times the probability of B and
  • 00:13:05
    you're going to get that joint
  • 00:13:07
    probability but the only reason you can
  • 00:13:09
    do that is because rolling a dice and
  • 00:13:12
    flipping a coin are completely
  • 00:13:13
    independent variables it's not as if
  • 00:13:16
    rolling a six influences the chance of
  • 00:13:18
    getting a head right so as I said you can
  • 00:13:21
    kind of intuitively understand this
  • 00:13:23
    formula but what that implies for our
  • 00:13:25
    distribution here is that if we were to
  • 00:13:27
    look at the intersection between
  • 00:13:30
    Westworld and female we get
  • 00:13:32
    0.05 and if you multiply the two
  • 00:13:34
    marginal values together the
  • 00:13:37
    0.54 and the
  • 00:13:38
    0.25 you get
  • 00:13:41
    0.14 so again these two values are not
  • 00:13:44
    the same so we can say they're not
  • 00:13:46
    independent
  • 00:13:48
    variables which is no surprise because
  • 00:13:50
    we just showed this using the other
  • 00:13:51
    approach so either of these approaches
  • 00:13:53
    would be fine to show that people's
  • 00:13:57
    preference of HBO shows does depend on
  • 00:14:01
    their gender all right so that's a wrap
  • 00:14:04
    we've dealt with joint probabilities
  • 00:14:06
    marginal probabilities conditional
  • 00:14:08
    probabilities and we've also looked at
  • 00:14:10
    how to test for Independence between two
  • 00:14:13
    categorical variables now if you like
  • 00:14:16
    the video I got plenty more you can
  • 00:14:17
    check them out on the YouTube channel or
  • 00:14:20
    heading to my website zstatistics.com
  • 00:14:22
    and if you've got any ideas feel free to
  • 00:14:24
    get in touch adios
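
The coin-and-die argument in the transcript can be checked by brute-force enumeration. This sketch assumes a fair coin and a fair six-sided die and treats all 12 (coin, die) pairs as equally likely, which is exactly the independence assumption the speaker relies on. It uses Python's standard `fractions` and `itertools` modules so the probabilities stay exact; the helper name `p` is illustrative.

```python
from fractions import Fraction
from itertools import product

coin = ["heads", "tails"]
die = [1, 2, 3, 4, 5, 6]
# ASSUMPTION: fair coin, fair die, flipped and rolled independently,
# so every one of the 12 (coin, die) pairs is equally likely.
outcomes = list(product(coin, die))


def p(event):
    """Exact probability of an event, as a fraction of equally likely outcomes."""
    hits = sum(1 for o in outcomes if event(o))
    return Fraction(hits, len(outcomes))


p_head = p(lambda o: o[0] == "heads")
p_six = p(lambda o: o[1] == 6)
p_both = p(lambda o: o[0] == "heads" and o[1] == 6)

print(p_head, p_six, p_both)      # 1/2 1/6 1/12
print(p_both == p_head * p_six)   # True
```

The joint probability equals the product of the marginals here only because the sample space was built as an independent pairing; the survey table in the video fails the same product test.
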
Tags
  • probability
  • joint probability
  • marginal probability
  • conditional probability
  • independence
  • categorical variables
  • survey
  • HBO
  • Game of Thrones
  • Westworld