00:00:01
Welcome back. In this video I'll explain the naive Bayes classifier, how it works, and we'll take a simple example. The naive Bayes classifier, as we mentioned before, is based on the frequency table. It is based on Bayes' theorem with independence assumptions between the predictors. I hope you're familiar with Bayes' theorem; it's simple and quite a nice way of explaining how things work. We assume that our predictors are independent. What that means is that knowing the value of one attribute does not tell us anything about the value of another attribute or predictor.
00:00:43
A naive Bayes model is usually easy to build, with no complicated iterative parameter estimation, and that makes it particularly useful for very large data sets. Naive Bayes is quite well known and well liked in the research community: it's quite simple, but it actually performs really well, and quite often it even outperforms more sophisticated methods. The good thing about the naive Bayes classifier is that it's easy to understand, easy to explain, and easy to debug.
00:01:18
Now, the way it works, as we mentioned before, is based on Bayes' theorem. Bayes' theorem provides a way of calculating the posterior probability P(C|X), where C is our class and X is our data (our predictors or attributes), from the probability of the class P(C) (that's before seeing any data), the probability of the data P(X), and the probability of the data given the class P(X|C). The naive Bayes classifier assumes that the effect of the value of a predictor X on a given class C is independent of the values of the other predictors; in other words, the predictors are independent of each other. This assumption is called class conditional independence.
00:01:59
Now let's have a look at this equation. This is our probability of C given X, the probability of the class given X, where X is our data or our predictors (we can have one or more predictors). The probability of the class given the data equals the probability of the data given the class, times the probability of the class (the probability of the class before seeing any data), divided by the probability of X, i.e. the probability of the data itself. P(C|X) is called the posterior probability, P(X|C) is called the likelihood, P(C) is called the class prior probability (again, the probability of the class before seeing any data), and P(X), the probability of the data, is called the predictor prior probability.
00:02:46
So, P(C|X) is the posterior probability of the class (the target) given the predictor (the attribute), where X is our attributes. P(C) is the prior probability of the class; prior means before seeing any data. P(X|C) is the likelihood, which is the probability of the predictor given the class. And P(X) is the prior probability of the predictor, the probability of the data itself. It's not always possible to know P(X), but there's a way around that.
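In symbols, the equation we've just walked through is:

```latex
\[
P(C \mid X) \;=\; \frac{P(X \mid C)\,P(C)}{P(X)}
\]
% P(C|X): posterior probability   P(X|C): likelihood
% P(C):   class prior probability P(X):   predictor prior probability
```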
00:03:16
So the probability of C given X, where X is our set of attributes, is the probability of the first attribute given the class, times the probability of the second attribute given the class, and so on over all n attributes that we have, times the probability of the class itself. The reason we can multiply probabilities here is because, as we mentioned before, the predictors are independent; this independence assumption lets us solve the problem by multiplying probabilities.
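Written out, the independence assumption turns the likelihood into a product over the individual attributes:

```latex
\[
P(C \mid X) \;\propto\; P(x_1 \mid C)\, P(x_2 \mid C) \cdots P(x_n \mid C)\, P(C)
\]
% x_1, ..., x_n are the n attribute values of an instance.
% The denominator P(X) is the same for every class, so it can be left out
% when we only compare classes; we normalise at the end if needed.
```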
00:03:50
Let's take an example. The posterior probability can be calculated by first constructing a frequency table for each attribute against the target. So we build frequency tables; remember, if our data is numerical we can transform it into categorical, or I'll show you another technique in the next video for dealing with numerical variables when building a naive Bayesian classifier. After we build the frequency tables, we transform them into likelihood tables (probability tables), and finally we use the naive Bayes equation to calculate the posterior probability for each class. The class with the highest posterior probability is the outcome of the prediction.
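A rough sketch of that decision rule in Python, assuming the priors and conditional probabilities have already been read off the likelihood tables (the function and variable names here are illustrative, not from the video):

```python
# priors:      {class: P(class)}
# likelihoods: {(attribute, value, class): P(value | class)}
def predict(instance, priors, likelihoods):
    """Return the class with the highest (unnormalised) posterior."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for attribute, value in instance.items():
            # independence assumption: multiply the per-attribute probabilities
            score *= likelihoods[(attribute, value, c)]
        scores[c] = score               # proportional to P(c | instance)
    return max(scores, key=scores.get)  # highest posterior wins
```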
00:04:32
Let's have a look at an example. If you remember the weather data, we have four categorical attributes and we have our class. What we do first is calculate the probability of the class; this is the prior probability: the probability of yes is 9/14 and the probability of no is 5/14. Next we build a frequency table; you've seen this before with the 1R classifier. From the frequency table, from these counts, we extract the probabilities: the probability of Sunny given it's a yes, the probability of Sunny given it's a no, the probability of Overcast given it's a yes, and so on. These are just the frequency over the sum of the column, so 3/9, 4/9 and 2/9 (this column sums to 9 because we have nine yeses), and 2/5, 0/5 and 3/5 (this column sums to 5 because we have five no's). Likewise, for the two different values of humidity, high and normal, we compute the corresponding probabilities, and likewise for windy and for temperature. So for Outlook, for example, we have the frequency table and we have the probabilities.
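As a small sketch, using the Outlook counts read off the video's frequency table, the likelihood table can be derived like this:

```python
# Outlook frequency table as read from the video (Yes column sums to 9, No to 5).
outlook_counts = {
    "Sunny":    {"Yes": 3, "No": 2},
    "Overcast": {"Yes": 4, "No": 0},
    "Rainy":    {"Yes": 2, "No": 3},
}

class_totals = {"Yes": 9, "No": 5}                       # number of yes / no days
priors = {c: n / 14 for c, n in class_totals.items()}    # P(Yes)=9/14, P(No)=5/14

# Likelihood table: P(value | class) = count / column total
outlook_likelihood = {
    value: {c: counts[c] / class_totals[c] for c in class_totals}
    for value, counts in outlook_counts.items()
}

print(priors)                        # {'Yes': 0.642..., 'No': 0.357...}
print(outlook_likelihood["Sunny"])   # {'Yes': 0.333..., 'No': 0.4}  i.e. 3/9 and 2/5
```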
00:05:55
Now, the probability of X given C, the probability of the variable given the class, is read from the table as follows: for example, the probability of Sunny given the class is yes is 3/9, or 0.33, and the probability of Sunny given it's a no is 2/5. The probability of just Sunny regardless of the class, which is the probability of X, is 5/14, the probability of Overcast is 4/14, and likewise for Rainy. At the bottom we have the probability of yes and the probability of no; as we mentioned, one column should sum to nine and the other to five, the number of yeses and the number of no's. I hope this makes sense.
00:06:39
Now let's say we want to compute the probability of yes given that the day is sunny; this is the probability of C given X. We multiply the probability of Sunny given yes, which is 3/9 or 0.33, by the probability of the class yes, which is 9/14 or 0.64, and we divide by the probability of X, the probability of Sunny, which is 5/14 or 0.36. That gives 0.60. It's just a direct application of the equation, and we mentioned that probabilities multiply because we have independence.
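As a worked line, with the numbers from the table:

```latex
\[
P(\text{Yes} \mid \text{Sunny})
  = \frac{P(\text{Sunny} \mid \text{Yes})\,P(\text{Yes})}{P(\text{Sunny})}
  = \frac{(3/9)\times(9/14)}{5/14}
  = \frac{0.33 \times 0.64}{0.36}
  \approx 0.60
\]
```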
00:07:28
Now let's use a simple example from this table. Say we have a random day, some new input, and we want to decide whether to play or not, yes or no. Let's say we have a day with Outlook rainy, temperature mild, humidity normal, and windy true, and we want to decide whether to play or not using the naive Bayes classifier. What we do is compute the likelihood of yes and the likelihood of no. The likelihood of yes is the probability of Outlook equals rainy given it's a yes, times the probability of temperature mild given it's a yes, times the probability of humidity normal given it's a yes, times the probability of windy true given it's a yes, times the probability of yes. These values we can extract easily from our frequency and probability tables.
00:08:25
For example, the probability of Outlook rainy given it's a yes is 2/9, and likewise the probability of temperature mild given it's a yes is 4/9. We multiply these together, and multiply by the probability of yes, which is 9/14, and we get a number. This number is not a probability; it's just the likelihood of the yes (we haven't divided by the probability of the data). In the same way we can compute the likelihood of the no: the probability of Outlook rainy given it's a no, times the probability of temperature mild given it's a no, likewise humidity normal and windy true given it's a no, times the probability of no. Again we extract these from the tables; for example, the probability of windy true given it's a no is 3/5. Multiplying everything together, times 5/14, which is the probability of no, gives us the likelihood of the no.
00:09:33
Now we can normalise to get the probabilities: the probability of yes is the likelihood of yes over the likelihood of yes plus the likelihood of no, and the probability of no is the likelihood of no over the likelihood of yes plus the likelihood of no. Notice that the probability of yes is larger than the probability of no, so we would decide to play on that day.
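A small numeric sketch of that calculation. The Outlook, temperature and windy values are the ones spoken in the video; the humidity figures are read from the standard weather-data likelihood table shown on the slide, so treat them as assumptions here:

```python
# Day: Outlook=Rainy, Temperature=Mild, Humidity=Normal, Windy=True
p_given_yes = {"Rainy": 2/9, "Mild": 4/9, "Normal": 6/9, "True": 3/9}
p_given_no  = {"Rainy": 3/5, "Mild": 2/5, "Normal": 1/5, "True": 3/5}
p_yes, p_no = 9/14, 5/14

# Likelihoods: product of the per-attribute probabilities times the class prior
likelihood_yes = p_yes
for p in p_given_yes.values():
    likelihood_yes *= p

likelihood_no = p_no
for p in p_given_no.values():
    likelihood_no *= p

# Normalise so the two numbers become proper probabilities
total = likelihood_yes + likelihood_no
print(round(likelihood_yes, 4), round(likelihood_no, 4))                  # ~0.0141 vs ~0.0103
print(round(likelihood_yes / total, 2), round(likelihood_no / total, 2))  # yes wins, ~0.58 vs ~0.42
```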
00:09:59
This is how the naive Bayes classifier works. You may have noticed something: because we multiply probabilities, zero frequencies cause a problem. If we have a zero count, we multiply by zero and end up with zero for that class. There's a way around this. It's called the zero-frequency problem, and it happens when an attribute value, for example Outlook is overcast, doesn't occur with every class value, for example with play golf equals no. Here, the count for Outlook overcast with play equals no is zero, and that causes a problem. The way around it is to add one to all the counts. So these counts, instead of 3, 4, 2 and 2, 0, 3, become 4, 5, 3 and 3, 1, 4, and we do the same thing for every other attribute. The probabilities change slightly, but that's just the way around the zero-frequency problem.
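A minimal sketch of that add-one fix on the Outlook counts, assuming the usual convention that the column totals also grow by the number of distinct attribute values so the probabilities still sum to one (the video only mentions adding one to the counts):

```python
# Outlook counts before smoothing (Overcast with No is the problematic zero)
counts = {
    "Sunny":    {"Yes": 3, "No": 2},
    "Overcast": {"Yes": 4, "No": 0},
    "Rainy":    {"Yes": 2, "No": 3},
}

# Add one to every count (add-one / Laplace smoothing)
smoothed = {v: {c: n + 1 for c, n in by_class.items()} for v, by_class in counts.items()}
# -> Yes column becomes 4, 5, 3 and No column becomes 3, 1, 4, as in the video

# Column totals grow from 9 and 5 to 12 and 8 (three attribute values added),
# so e.g. P(Overcast | No) becomes 1/8 instead of 0/5.
totals = {c: sum(by_class[c] for by_class in smoothed.values()) for c in ("Yes", "No")}
prob = {v: {c: by_class[c] / totals[c] for c in totals} for v, by_class in smoothed.items()}
print(prob["Overcast"]["No"])   # 0.125 rather than 0.0
```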
00:11:12
Thanks for watching. In the next video I'll show you how to deal with numerical data when building a naive Bayesian classifier.