Statistical Learning: 4.5 Discriminant Analysis

00:07:13
https://www.youtube.com/watch?v=oJc2r246VoQ

Summary

TL;DR: The video explains discriminant analysis, a classification method that models the distribution of features in different classes and applies Bayes' theorem to determine class probabilities. It focuses on linear and quadratic discriminant analysis using Gaussian distributions. The video illustrates how prior probabilities and density functions affect decision boundaries in classification. It also discusses the advantages of discriminant analysis over logistic regression, particularly in scenarios with well-separated classes, small sample sizes, and when the normality assumption is valid.

Takeaways

  • 📊 Discriminant analysis models feature distributions in classes.
  • 🔍 Bayes' theorem helps calculate class probabilities from features.
  • 📈 Linear and quadratic discriminant analysis are common forms.
  • 📉 Prior probabilities influence classification decisions.
  • ⚖️ Discriminant analysis is more stable than logistic regression in certain cases.

Timeline

  • 00:00:00 - 00:07:13

    The discussion shifts from multinomial regression to discriminant analysis, a different classification method. Discriminant analysis models the distribution of features (X) for each class separately and applies Bayes' theorem to determine the probability of a class (Y) given a feature (X). The focus is on linear discriminant analysis using Gaussian distributions, leading to linear or quadratic forms. Bayes' theorem is introduced, explaining how to calculate the probability of a class given a feature by flipping the joint distribution. The presentation emphasizes the use of Gaussian density functions for classification and illustrates decision boundaries based on class probabilities and densities. The importance of prior probabilities in determining decision boundaries is highlighted, showing how they influence classification outcomes. Finally, the advantages of discriminant analysis over logistic regression are discussed, particularly in scenarios with well-separated classes, small sample sizes, and multiple classes, asserting that Bayes' rule provides optimal classification under the right conditions.

Video Q&A

  • What is discriminant analysis?

    Discriminant analysis is a classification method that models the distribution of features in different classes and uses Bayes' theorem to determine class probabilities.

  • How does Bayes' theorem apply to classification?

    Bayes' theorem allows us to calculate the probability of a class given a feature by relating it to the joint distribution of the features and classes.
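    In symbols, writing pi_k = Pr(Y = k) for the prior and f_k(x) for the density of X in class k, the theorem as used in the video reads:

    ```latex
    \Pr(Y = k \mid X = x)
      = \frac{\Pr(X = x \mid Y = k)\,\Pr(Y = k)}{\Pr(X = x)}
      = \frac{\pi_k f_k(x)}{\sum_{l=1}^{K} \pi_l f_l(x)}
    ```

    The denominator is the same for every class, so the class with the largest pi_k f_k(x) has the highest posterior probability.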

  • What are the types of discriminant analysis?

    The two popular forms of discriminant analysis are linear discriminant analysis and quadratic discriminant analysis.

  • Why is discriminant analysis preferred over logistic regression in some cases?

    Discriminant analysis is more stable than logistic regression when classes are well-separated, with small sample sizes, and when the predictors are approximately normally distributed.

  • What role do prior probabilities play in discriminant analysis?

    Prior probabilities influence the decision boundary in classification, affecting how features are classified into different classes.

Subtitles
  • 00:00:00
    We're not going to go into more detail on multinomial regression. What we're going to do now is tell you about a different classification method, called discriminant analysis, which is also very useful, and it approaches the problem from a really quite different point of view.
  • 00:00:17
    In discriminant analysis the idea is to model the distribution of X in each of the classes separately, and then use what's known as Bayes' theorem to flip things around to get the probability of Y given X. In this case, for linear discriminant analysis, we're going to use Gaussian distributions for each class, and that's going to lead to linear or quadratic discriminant analysis; those are the two popular forms. But as you'll see, this approach is quite general, and other distributions can be used as well. We'll focus on normal distributions.
  • 00:00:56
    So what is Bayes' theorem for classification? It sounds pretty scary, but it's not too bad. Thomas Bayes was a famous mathematician, and his name today represents a burgeoning subfield of statistical and probabilistic modeling, but here we're going to focus on a very simple result known as Bayes' theorem. We've got two variables, in this case Y and X, and we're looking at aspects of their joint distribution. What we're after is the probability that Y equals k given X equals x, and Bayes' theorem says you can flip things around: you can write that as the probability that X is x given Y equals k (that's the first piece on the top), multiplied by the marginal, or prior, probability that Y is k, and then divided by the marginal probability that X equals x. This is just a formula from probability theory, but it turns out it's really useful and is the basis for discriminant analysis.
  • 00:02:14
    We write things slightly differently in the case of discriminant analysis. The probability that Y equals k is written as pi_k; if there are three classes, there are going to be three values of pi, just the probability for each of the classes, and here we've got class little k, so that's pi_k. For the probability that X is x given Y equals k, if X is a quantitative variable, what we write is the density, a probability density function for X in class k. The marginal probability of X is then this expression summed over all the classes. And so that's how we use Bayes' theorem to get to the probabilities of interest, which is Y equals k given X. At this point it's still quite general; we can plug in any probability densities. But now what we're going to do is go ahead and plug in the Gaussian density for f_k(x).
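To make the formula concrete, here is a minimal sketch (not from the video) that evaluates the posterior pi_k f_k(x) / sum_l pi_l f_l(x) with Gaussian class densities; the two-class means, shared sigma, and priors are made-up values:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def posterior(x, priors, means, sigma):
    """Bayes' theorem: P(Y = k | X = x) = pi_k f_k(x) / sum_l pi_l f_l(x)."""
    numerators = [pi * gaussian_pdf(x, mu, sigma) for pi, mu in zip(priors, means)]
    total = sum(numerators)
    return [n / total for n in numerators]

# Two hypothetical classes with equal priors, means -1.5 and +1.5, shared sigma = 1:
probs = posterior(0.0, priors=[0.5, 0.5], means=[-1.5, 1.5], sigma=1.0)
print(probs)  # at the midpoint between the means the two posteriors are equal (0.5 each)
```

Moving x toward either mean tilts the posterior toward that class, which is exactly the "flipping around" that Bayes' theorem provides.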
  • 00:03:32
    Before we do that, let me just show you a little picture to make things clear. In the left-hand plot we've got a plot against a single variable x, and on the vertical axis is pi_k multiplied by f_k(x), for both classes k = 1 and k = 2. Remember from the previous slide that the probability is essentially proportional to pi_k times f_k(x), and in this case the pi's are the same for both, so it's really to do with which density is highest. You can see that the decision boundary, the vertical dashed line, is at zero: that's the point at which the green density becomes higher than the purple density. So anything to the left of zero we classify as green, and anything to the right we'd classify as purple, and it sort of makes sense that that's what we do there.
  • 00:04:41
    The right-hand plot has different priors: the probability of class 2 is 0.7 and of class 1 is 0.3. Again we plot pi_k times f_k(x) against x, and that bigger prior has bumped up the purple curve and moved the decision boundary slightly to the left; again, it's where the curves intersect. That makes sense as well, because there are more purples (or are they pinks? they look purple to me), so everything else being equal we'll make fewer mistakes if we classify more points to the purples than to the greens. Okay, so that's how the priors and the densities play a role in classification.
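The boundary shift with unequal priors can also be checked in closed form (a sketch, not from the video): setting pi_1 f_1(x) = pi_2 f_2(x) for two Gaussians with a shared sigma and taking logs cancels the quadratic terms, leaving a linear equation in x. The means and sigma below are made-up values; the 0.3/0.7 priors match the right-hand plot:

```python
import math

def lda_boundary(mu1, mu2, sigma, pi1, pi2):
    """Solve pi1 * f1(x) = pi2 * f2(x) for two Gaussians sharing sigma.
    Taking logs cancels the quadratic terms, leaving a linear equation in x."""
    return (mu1 + mu2) / 2 + sigma ** 2 * math.log(pi1 / pi2) / (mu2 - mu1)

# Hypothetical class means -1.5 and +1.5 with sigma = 1:
print(lda_boundary(-1.5, 1.5, 1.0, 0.5, 0.5))  # 0.0: equal priors put the boundary at the midpoint
print(lda_boundary(-1.5, 1.5, 1.0, 0.3, 0.7))  # negative: the larger prior on class 2 pushes it left
```

The linearity of this equation in x is why the method with a shared covariance is called *linear* discriminant analysis.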
  • 00:05:39
    So why discriminant analysis? It seemed like logistic regression was a pretty good tool. Well, it is, but it turns out there's room for discriminant analysis as well, and there are three points we make here. When the classes are well separated, it turns out that the parameter estimates for logistic regression are surprisingly unstable; in fact, if you've got a feature that separates the classes perfectly, the coefficients go off to infinity, so it really doesn't do well there. Logistic regression was developed largely in the biological and medical fields, where you never found such strong predictors. Now, you can do things to make logistic regression better behaved, but it turns out linear discriminant analysis doesn't suffer from this problem and is better behaved in those situations.
  • 00:06:33
    Also, if the sample size is small and the distribution of the predictors X is approximately normal in each of the classes, it turns out the linear discriminant analysis model is again more stable than logistic regression. And finally, if we've got more than two classes, we'll see that linear discriminant analysis gives us nice low-dimensional views of the data. The other point to remember: in the very first section we showed that if you have the right population model, the Bayes rule is the best you can possibly do. So if our normality assumption is right here, then this Bayes rule from discriminant analysis is the best you can possibly do. Good point, Rob.
Tags
  • discriminant analysis
  • Bayes theorem
  • classification
  • Gaussian distributions
  • linear discriminant analysis
  • quadratic discriminant analysis
  • logistic regression
  • decision boundary
  • prior probabilities
  • probability density function