How Naive Bayes Classifier Works 1/2.. Understanding Naive Bayes and Example

00:11:21
https://www.youtube.com/watch?v=XcwH9JGfZOU

Summary

TLDR: The video explains how the Naive Bayes classifier works, based on Bayes' theorem. It discusses the independence assumptions between predictor variables and how to compute posterior probabilities. An example with weather data is used to demonstrate the process of building frequency tables and calculating probabilities. The video also covers the zero-frequency problem and a method for resolving it.

Takeaways

  • 📊 Naïve Bayes is based on Bayes' theorem.
  • 📈 The independence assumption between predictors is important.
  • 📉 The posterior probability is computed by multiplying probabilities (see the formula sketch after this list).
  • 🗃️ Frequency tables are essential for computing the probabilities.
  • 🔄 The zero-frequency problem can be solved by adding one to the counts.
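For reference, the decision rule these takeaways describe can be written compactly (a standard formulation; this exact notation is my addition, not shown in the video):

```latex
\hat{C} \;=\; \arg\max_{C}\; P(C)\prod_{i=1}^{n} P(x_i \mid C)
```

where x_1, ..., x_n are the attribute values of the day being classified and the class with the highest posterior is the prediction.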

Timeline

  • 00:00:00 - 00:05:00

    In this part the Naive Bayes classifier is introduced. It is built on Bayes' theorem, with independence assumptions between the predictors. The model is easy to build and works well with large data sets. The core of the classifier is the calculation of the posterior probability of a class given the predictor data, which combines the probability of the class before seeing the data (the prior probability) and the probability of the data given the class (the likelihood). This makes it a popular choice in the research community because of its simplicity and effectiveness, even compared with more sophisticated methods.

  • 00:05:00 - 00:11:21

    The video then explains how to compute the posterior probability by building a frequency table and converting it into a probability table. A weather-data example is used to demonstrate the calculation of a posterior probability. A method is also discussed for addressing the zero-frequency problem: add 1 to all counts when an attribute value does not occur for a class. These steps illustrate the practical application of the Naive Bayes classifier and the calculation of the probabilities used to decide, for example, whether to play or not.


Video Q&A

  • What is a Naïve Bayes classifier?

    It is a statistical classifier based on Bayes' theorem with independence assumptions between the predictor variables.

  • How do you compute the posterior probability in a Naïve Bayes classifier?

    By multiplying the probability of the data given the class by the prior probability of the class, and dividing by the probability of the data.

  • What is the zero-frequency problem?

    It occurs when a particular attribute value does not occur for a class, which leads to a probability of zero. To prevent this, you can add one to all the counts (a smoothed form of this fix is sketched after this list).

  • How does the independence assumption work?

    It means that knowing the value of one attribute gives no information about the value of another attribute.

  • Why is Naïve Bayes easy to understand and explain?

    The method is simple and involves little complexity, which makes it easy to implement and to debug.
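The add-one fix mentioned above is often written as a smoothed estimate (a standard Laplace-smoothing form; the video only says to add one to the counts, so this exact formula is an assumption):

```latex
P(x = v \mid C) \;=\; \frac{\operatorname{count}(x = v,\, C) + 1}{N_C + k}
```

where N_C is the number of training examples in class C and k is the number of distinct values of attribute x.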

Subtitles (en)
  • 00:00:01
    Welcome back. In this video I'll be explaining the Naive Bayes classifier, how it works, and we'll take a simple example. The Naive Bayes classifier, as we mentioned before, is based on the frequency table. It is based on Bayes' theorem with independence assumptions between predictors. I hope you're familiar with Bayes' theorem; it's very nice and easy and quite a nice way of explaining how things work. We assume that our predictors are independent. What that means is that knowing the value of one attribute does not tell us anything about the value of another attribute, or another predictor. A Naive Bayes model is usually easy to build, with no complicated iterative parameter estimation, and that makes it particularly useful for very large data sets. Naive Bayes is quite well known and well liked in the research community. It's quite simple, but it actually performs really well; quite often it even outperforms sophisticated methods. The good thing about the Naive Bayes classifier is that it's easy to understand, easy to explain, and easy to debug.
  • 00:01:20
    Now, the way it works: as we mentioned before, it's based on Bayes' theorem. Bayes' theorem provides a way of calculating the posterior probability, the probability of C given X, where C is our class and X is our data, our predictors, our attributes. It is calculated from the probability of C (the probability of the class before seeing any data), the probability of the data, and the probability of the data given the class. The Naive Bayes classifier assumes that the effect of the value of a predictor X on a given class C is independent of the values of the other predictors; in other words, the predictors are independent of each other. This assumption is called class conditional independence. Now let's look at the equation: the probability of the class given the data equals the probability of the data given the class, times the probability of the class (before seeing any data), divided by the probability of X, i.e. the probability of the data itself. The probability of C given X is called the posterior probability. The probability of X given C is called the likelihood. The probability of the class is called the class prior probability; again, this is the probability of the class before seeing any data. And the probability of the data is called the predictor prior probability. So the probability of C given X is the posterior probability of the class (or target) given the predictor (or attribute). The probability of C is the prior probability of the class; prior means before seeing any data. The probability of the data given the class is the likelihood, which is the probability of the predictor given the class, and the probability of X is the prior probability of the predictor, the probability of the data itself. It's not always possible to know the probability of X, but there's a way around that. So the probability of C given X, where X is our attributes, is the probability of the first attribute given the class, times the probability of the second attribute given the class, and so on over all n attributes, times the probability of the class itself. The reason we multiply probabilities here is that, as we mentioned before, the predictors are independent, so the independence assumption lets us solve this by multiplying probabilities.
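In symbols, the relationships just described are (a standard way to write them; the video shows this on a slide rather than in the captions):

```latex
P(C \mid X) \;=\; \frac{P(X \mid C)\,P(C)}{P(X)},
\qquad
P(C \mid x_1,\dots,x_n) \;=\; \frac{P(x_1 \mid C)\,P(x_2 \mid C)\cdots P(x_n \mid C)\,P(C)}{P(X)}
```

Here P(C | X) is the posterior, P(X | C) the likelihood, P(C) the class prior, and P(X) the predictor prior.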
  • 00:03:50
    Let's take an example. The posterior probability can be calculated by first constructing a frequency table for each attribute against the target. Remember, if our data is numerical we can transform it into categorical data, or I'll show you another technique in the next video for dealing with numerical variables when building a Naive Bayesian classifier. After we build the frequency tables, we transform them into likelihood tables, or probability tables, and finally we use the Naive Bayes equation to calculate the posterior probability for each class. The class with the highest posterior probability is the outcome of the prediction. Let's have a look at an example. If you remember the weather data, we have four categorical attributes and we have our class. What we do first is calculate the probability of the class; this is the prior probability. The probability of yes is 9 over 14 and the probability of no is 5 over 14. Then we build a frequency table; you've seen this before in the 1R classifier.
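As a rough illustration of the table-building step described above, here is a minimal Python sketch (not code from the video; the function name, argument layout, and dictionary structure are my own assumptions) that computes the class priors and the per-attribute likelihood tables from a list of labelled categorical rows:

```python
from collections import Counter, defaultdict

def fit_naive_bayes(rows, labels):
    """Build class priors and per-attribute likelihood tables from categorical data.

    rows   -- list of dicts, e.g. {"outlook": "sunny", "windy": "true"}
    labels -- list of class values, e.g. "yes" / "no"
    """
    n = len(labels)
    class_counts = Counter(labels)                       # e.g. {"yes": 9, "no": 5}
    priors = {c: class_counts[c] / n for c in class_counts}

    # freq[attribute][(value, class)] = number of rows with that value and class
    freq = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for attr, value in row.items():
            freq[attr][(value, label)] += 1

    # likelihood[attribute][(value, class)] = P(value | class)
    likelihood = {
        attr: {vc: count / class_counts[vc[1]] for vc, count in counts.items()}
        for attr, counts in freq.items()
    }
    return priors, likelihood
```

On the weather data this would reproduce the priors 9/14 and 5/14 and the per-column fractions read out in the next part of the transcript.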
  • 00:04:59
    From the frequency table, from these counts, we extract the probabilities: the probability of sunny given it's a yes, the probability of sunny given it's a no, the probability of overcast given yes, and so on. These are just the frequency over the sum of the column, so 3 over 9, 4 over 9, 2 over 9; this column sums to 9 because we have nine yeses. And here 2 over 5, 0 over 5, and 3 over 5, because this column sums to 5; we have five no's. Likewise, for the two values of humidity, high and normal, we compute their corresponding probabilities, and likewise for windy and for temperature. If you look at Outlook, for example, we have the frequency table and we have the probabilities. The probability of X given C, the probability of the variable given the class, is read from the table as follows: the probability of sunny given the class is yes is 3 over 9, or 0.33, and the probability of sunny given it's a no is 2 over 5. The probability of just sunny, regardless of the class (this is the probability of X), is 5 over 14, the probability of overcast is 4 over 14, and likewise for rainy. At the bottom we have the probability of yes and the probability of no; as mentioned, one column sums to nine and the other to five, the number of yeses and the number of no's. I hope this makes sense.
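For reference, here are the Outlook tables reconstructed from the numbers read out above (the on-screen table itself is not in the transcript; the 5/14 for rainy is inferred from the remaining counts):

    Outlook     P(value | yes)   P(value | no)   P(value)
    sunny       3/9              2/5             5/14
    overcast    4/9              0/5             4/14
    rainy       2/9              3/5             5/14

with P(yes) = 9/14 and P(no) = 5/14.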
  • 00:06:39
    Now let's say we want to compute the probability of yes given that the day is sunny (this is the probability of C given X). We multiply the probability of sunny given yes, which is 3 over 9, by the probability of C, i.e. of yes, which is 9 over 14 or 0.64, and we divide by the probability of X: the probability of sunny is 5 over 14, which is 0.36. That gives 0.60, just a direct application of the equation. And, as mentioned, the probabilities multiply because we have independence.
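Written out, the arithmetic just described is:

```latex
P(\text{yes} \mid \text{sunny})
  = \frac{P(\text{sunny} \mid \text{yes})\,P(\text{yes})}{P(\text{sunny})}
  = \frac{(3/9)\,(9/14)}{5/14}
  \approx \frac{0.33 \times 0.64}{0.36}
  \approx 0.60
```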
  • 00:07:30
    We can now use a simple example from this table. Say we have a random day, some input, and we want to decide whether to play or not, yes or no. Let's say the day has outlook rainy, temperature mild, humidity normal, and windy true, and we want to decide whether to play using the Naive Bayes classifier. What we do is compute the likelihood of yes and the likelihood of no. The likelihood of yes is the probability of outlook equals rainy given yes, times the probability of temperature equals mild given yes, times the probability of humidity equals normal given yes, times the probability of windy equals true given yes, times the probability of yes. We can extract these values easily from our frequency and probability tables. For example, the probability of outlook equals rainy given yes is 2 over 9; likewise, the probability of temperature equals mild given yes is 4 over 9. We multiply these together, multiply by the probability of yes, which is 9 over 14, and we get a number. This number is not a probability; it is just the likelihood of the yes. In the same way we compute the likelihood of the no: the probability of outlook equals rainy given no, times the probability of temperature equals mild given no, likewise for humidity normal and windy true given no, times the probability of the no. We extract these from the tables as well; for example, the probability of windy equals true given no is 3 over 5, and we multiply by 5 over 14, the probability of the no. Now we can normalize to get the probabilities: the probability of yes is the likelihood of yes over (the likelihood of yes plus the likelihood of no), and the probability of no is the likelihood of no over the same sum. Notice that the probability of yes is larger than the probability of no, so we would probably decide to play on that day.
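A minimal sketch of the scoring and normalization steps just described, assuming the `priors` and `likelihood` tables from the earlier fitting sketch (hypothetical helper names, not code from the video):

```python
def predict_naive_bayes(query, priors, likelihood):
    """Score one categorical query row and return normalized class probabilities.

    query -- dict such as {"outlook": "rainy", "temperature": "mild",
                           "humidity": "normal", "windy": "true"}
    """
    scores = {}
    for c, prior in priors.items():
        score = prior
        for attr, value in query.items():
            # Unseen (value, class) pairs have zero count; see the
            # zero-frequency discussion below for the usual fix.
            score *= likelihood.get(attr, {}).get((value, c), 0.0)
        scores[c] = score        # likelihood of the class, not yet a probability

    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()} if total else scores
```

The class with the larger normalized value is the prediction, matching the "play / don't play" decision in the transcript.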
  • 00:09:57
    This is how the Naive Bayes classifier works. You may have noticed something: because we multiply probabilities, zero frequencies are a problem. If we have zero counts, then multiplying by zero gives zero for that value. There's a way around this. It is called the zero-frequency problem, and it happens when an attribute value (for example, outlook equals overcast) doesn't occur with every class value (for example, with play golf equals no). Here the count for outlook equals overcast with play equals no is zero; it doesn't occur, and that causes a problem. The way around it is to add one to all the counts. So instead of 3, 4, 2 and 2, 0, 3, the counts become 4, 5, 3 and 3, 1, 4, and we do the same for everything else. The probabilities will change slightly, but that's just a way around the zero-frequency problem.
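As a small illustration of that fix, here are the Outlook counts read out in the captions with one added to each (the dictionary layout is my own, not from the video):

```python
# Outlook counts against the class, as read out in the video.
outlook_counts = {
    ("sunny", "yes"): 3, ("overcast", "yes"): 4, ("rainy", "yes"): 2,
    ("sunny", "no"):  2, ("overcast", "no"):  0, ("rainy", "no"):  3,
}

# Add-one fix for the zero-frequency problem.
smoothed = {key: count + 1 for key, count in outlook_counts.items()}
# -> the yes column becomes 4, 5, 3 and the no column becomes 3, 1, 4,
#    so P(overcast | no) is no longer zero; the column totals grow and
#    the resulting probabilities shift slightly.
```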
  • 00:11:12
    Thanks for watching. In the next video I'll show you how to deal with numerical data when building a Naive Bayesian classifier.
Tags
  • Naïve Bayes
  • Bayes' theorem
  • classification
  • frequency tables
  • probabilities
  • probability calculation
  • software development
  • data science
  • machine learning