Support Vector Machines (SVMs): A friendly introduction
Overview
TL;DR: This video provides a detailed introduction to Support Vector Machines (SVMs), a critical classification algorithm in machine learning. The instructor, Luis Serrano, builds on his previous videos on linear regression and logistic regression to explain how an SVM separates data points of different classes with a line flanked by two parallel lines. The key focus is on finding two parallel lines that maximize the distance between them while still separating the classes correctly. The algorithm iteratively adjusts the position of the lines based on feedback from the data points, and concepts such as the expanding factor and the C hyperparameter are introduced to control the trade-off between classification error and margin error. Throughout, the practical implementation is discussed alongside the theoretical underpinnings of the SVM error function and gradient descent.
Key Takeaways
- 🔍 SVMs classify by finding the optimal separating line between classes.
- 📏 The goal is to maximize the distance between two parallel lines around the decision boundary.
- 🔄 Iterative adjustments are made based on point classifications.
- 📈 'C' parameter adjusts the trade-off between classification error and margin error.
- 🏗️ The expanding factor slightly separates the lines during training.
- 🧮 SVM error comprises classification error and margin error.
- 🔬 Understanding the trade-offs is crucial for effective model training.
- 🌐 Hyperparameters can be tuned to enhance model performance.
Timeline
- 00:00:00 - 00:05:00
Luis Serrano introduces Support Vector Machines (SVM) while recapping linear regression and logistic regression from previous videos. He credits his students for their contributions during his teaching experience and emphasizes the significance of SVM as a classification algorithm that seeks to separate points of two classes with the best possible line.
- 00:05:00 - 00:10:00
The SVM algorithm, compared to the perceptron algorithm, focuses on not just finding any separating line but the optimal line that maximizes the distance between parallel lines drawn around it. This involves finding two lines that are as far apart as possible from each other while still effectively separating the data points.
- 00:10:00 - 00:15:00
Serrano illustrates the margin concept with two parallel lines, explaining that the best line is the one with maximized margin from the data points. The process involves iteratively adjusting the separating lines based on the positions of the data points and how they are classified by the current line.
- 00:15:00 - 00:20:00
He discusses the mathematical representation of lines and how adjustments to the equations can shift the position of the lines. By introducing an expanding rate, the SVM algorithm allows the lines to be spread apart incrementally throughout the training process, ensuring an optimal separation between classes.
- 00:20:00 - 00:25:00
Luis outlines the SVM training steps: define an initial random line, pick the number of iterations and the learning rate, then run the iterative loop that moves the lines toward misclassified data points while applying the expanding step. This step is crucial for ensuring that the lines spread farther apart over time as the model learns.
- 00:25:00 - 00:30:57
Finally, Luis explains the importance of error functions in SVM, defining classification errors based on misclassifications and margin errors based on the distance between separating lines. He combines these to create a comprehensive error measure for the SVM, and discusses hyperparameters like the C parameter that controls the trade-off between the classification error and the margin error.
Video Q&A
What are Support Vector Machines (SVM)?
SVMs are a classification algorithm in machine learning that find the best line to separate data points from different classes.
How do SVMs differ from the perceptron algorithm?
SVMs aim to find not just one separating line but two parallel lines that are as far apart as possible while separating the classes.
What is the significance of the 'C' parameter in SVM?
The 'C' parameter balances the importance of classification error versus the margin error during SVM training.
What does the margin error in SVM determine?
The margin error indicates how far apart the two parallel lines are; smaller margins indicate a higher margin error.
What role does gradient descent play in SVM training?
Gradient descent is used to minimize the classification and margin error to optimize the separating lines.
Can you explain the expanding factor in SVM?
The expanding factor is a number close to 1 that the line's parameters are multiplied by at each iteration, slightly separating the two parallel lines.
How is the error calculated in an SVM?
The SVM error combines the classification error (distance of misclassified points to the boundary) and the margin error (distance between the two separating lines).
Transcript
- 00:00:00 Hello, my name is Luis Serrano, and this is a friendly introduction to support vector machines, or SVMs for short. This is the third in a series of three videos on linear models; if you haven't seen them, take a look at the first one, called linear regression, and the second one, called logistic regression. This one builds a lot on the second one in particular. I'll start with the credits: this year I taught a machine learning class at Quest University in British Columbia, Canada, and had a wonderful group of students with whom I had an awesome time. Here's a picture of us with my friend Richard Hoshino on the right; he's also a professor, and my students were actually the ones who helped me figure out the key idea for this video.
- 00:00:40 SVMs are a very important classification algorithm. Basically, the algorithm tries to separate points of two classes using a line; however, it tries really hard to find the best line, and the best line is the one that is as far from the points as possible, since that separates them best. Normally SVMs are explained in terms of either some kind of linear optimization or some kind of gradient descent. What I want to show you today is something I actually haven't seen in the literature (it may exist, but I haven't seen it): a small gradient-step-like method, which is sort of an iteration. In this iteration, you first try to find a better line that classifies the points, and then at every step you take two parallel lines and kind of stretch them apart. Let me be more explicit.
- 00:01:36 Let me start with a very quick recap of the previous video on logistic regression and the perceptron algorithm. Basically, we have data split into two classes, red points and blue points, and we want to find the line that separates them; this is the perceptron algorithm. What we want is not just any line, but a line with a red side and a blue side that splits the points in the best possible way. The way we did this was to start with a random line and then ask the points what they can tell us to make the line better. For example, this point over here says, "I'm good, so don't do anything." A blue one says, "Hey, I'm on the wrong side, so you'd better move closer to me in order to classify me better," so we move a little closer; in machine learning we want to take tiny steps, not big drastic ones. Then we ask another point; this one is red in the red area, so it says, "I'm good, don't do anything." Then we ask this one over here, and it says, "Get over here," so we move toward it. Then we ask this blue one in the blue area, and it says it's good (by the way, we're picking points at random here; there's no particular order). Then we ask this one; it's a red point in the red area, so it says it's good. We ask this point, which is a blue point in the red area, so it says, "Get over here," and we move closer. Then we ask this red one in the red area, and it says it's good. Then this red one, which is now misclassified in the blue area, says, "Move over here," and we listen to it. And now it seems like all the points are good. That, in a nutshell, is the perceptron algorithm.
- 00:03:12 I'd like to remind you how we did it. We started with a random line with red and blue sides. Then we picked a large number, the number of repetitions or epochs, which in this case is going to be a thousand; that's the number of times we'll repeat our iterative step. Then step three says to repeat a thousand times: pick a random point and ask whether it's correctly classified. If it is, do nothing; if it's not, move the line a little bit towards the point. We do this repeatedly, and we get a line that separates the data pretty well.
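As a rough sketch of those steps in code, here is a minimal Python version (the function, the labeling convention, and all names are mine, not code from the video):

```python
import random

def perceptron(points, labels, epochs=1000, lr=0.01):
    """Fit a line ax + by + c = 0; labels are +1 (blue) or -1 (red)."""
    a, b, c = random.random(), random.random(), random.random()
    for _ in range(epochs):
        i = random.randrange(len(points))          # pick a random point
        x, y = points[i]
        if labels[i] * (a * x + b * y + c) <= 0:   # misclassified?
            a += lr * labels[i] * x                # move the line a little
            b += lr * labels[i] * y                # bit towards the point
            c += lr * labels[i]
    return a, b, c
```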
- 00:03:52 Anyway, that's a small recap of the perceptron algorithm. This algorithm is going to be very similar, but with a little extra step, so let me show you that extra step. First, let's define what it is that the SVM does best. I'm going to give you an example of some data, and I'm going to copy it twice. This is a line that separates the data, and this is another line that separates the data. Question for you: which line is better? Feel free to think about it for a minute. I think the best one is the one on the left, and the one on the right is not so good, even though they both separate the data. If you notice, the one on the left separates the points really well; it's really far away from the points. The one on the right is really, really close to two of the points, so if you were to wiggle it around, you might miss one of the points and misclassify it, whereas the line on the left you can wiggle freely and still get a good classifier. So now the question is: how do we train the computer to pick the line on the left instead of the line on the right? Because, if you remember, the perceptron algorithm just finds a good line that separates the data; it doesn't necessarily find the best one.
- 00:05:09 So let's rephrase the question. What we want is not just to find one line, but to find two parallel lines that are as far apart from each other as possible. Here, for example, centered on the main line, we have two parallel, equidistant lines, and notice that in the case on the left we can actually push them pretty far away from each other. On the other hand, if we do this with the line on the right, the farthest we can get is two lines that are pretty close. So we're going to compare this green distance over here with this distance over here: the one on the left is pretty wide, whereas the one on the right is pretty narrow, and we're going to go for wide. We'll tell the computer: if you find a wide one, you're good; if you find a narrow one, you're not. And now the question is: how do we train an algorithm to find two parallel lines that are as far apart from each other as possible and still split our data?
- 00:06:08 This is what we're going to do, and it's very similar to what we did before. We start by dropping a random line that doesn't necessarily do a very good job. Then we draw two parallel lines around it at some small random distance. Then we do something very similar to the perceptron algorithm: we start listening to the points and asking them what we need to do. Let's say one point tells us to move in this direction, so we move in this direction; and then, at every step, we separate the lines just a little bit. Then we listen to another point that maybe tells us to move in this direction, and again we separate the lines a little bit. Then another point tells us to move in this direction, and again we separate the lines a little bit. And that's pretty much it; that's what the SVM algorithm does.
- 00:07:02 Of course, we need to go through some technicalities. One technicality is how to separate the lines, so let me show you how to separate lines using equations. Say we have a line with equation 2x + 3y - 6 = 0, in the Cartesian plane where the horizontal axis is the x-axis and the vertical axis is the y-axis. This line is the set of points for which two times the x-coordinate, plus three times the y-coordinate, minus six, equals zero. What happens if I multiply the 2, the 3, and the -6 by some constant, for example by 2? I get 4x + 6y - 12 = 0. What line do you think this is? It's actually the exact same line, because any point that satisfies 2x + 3y - 6 = 0 also satisfies two times that expression equals zero, since 2 times 0 is 0. So we get the same line, and if I multiply the equation by any factor, for example by 10, I get 20x + 30y - 60 = 0: again the exact same line. So this line actually represents a whole family of equations. I can also multiply by numbers smaller than 1, for example 0.2x + 0.3y - 0.6 = 0, which is the original equation divided by 10; that also gives the same line. I can even multiply by negative numbers, and it still works.
- 00:08:36 But now let's see what changes. Here again we have 2x + 3y - 6 = 0, and the exact same line written as 4x + 6y - 12 = 0. Now let's actually draw the lines 2x + 3y - 6 = 1 and 2x + 3y - 6 = -1, because our two parallels to the original line are the ones with the same equation, except that instead of 0 they equal 1 and -1. Now, what do you think happens if I do the same thing with the equation on the right? This is important, so feel free to pause the video and think about it for a minute. I'll tell you what happens: we get two lines that are parallel but much closer. The lines 4x + 6y - 12 = 1 and -1 are actually a lot closer to the original line than the ones with equation 2x + 3y - 6 = 1 and -1. And if I multiply the equation by a smaller factor, for example dividing by 10 to get 0.2x + 0.3y - 0.6 = 1 and -1, then I get lines that are much farther away from the original one; if instead I multiply by a huge number, say 10, to get 20x + 30y - 60 = 1 and -1, then the lines get much, much closer. So the original line stays the same when I multiply by a constant, but the two parallel lines move farther away or closer depending on whether I multiply by a number close to zero or by a large number.
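Here is a quick numerical check of that effect (my own sketch; it uses the standard distance formula for parallel lines, which agrees with the 2/sqrt(a^2 + b^2) margin width derived later in the video):

```python
import math

def width(a, b):
    """Distance between ax + by + c = 1 and ax + by + c = -1."""
    return 2 / math.hypot(a, b)

# Scale the coefficients of 2x + 3y - 6 = 0 by various constants k:
for k in (0.1, 1, 2, 10):
    print(f"k = {k:>4}: width = {width(2 * k, 3 * k):.3f}")
# k = 0.1 gives ~5.547 (lines far apart); k = 10 gives ~0.055 (very close).
```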
- 00:10:23 This is not going to appear in this video, but if you multiply by a negative number, the two outer lines actually switch sides; that's not so important for this algorithm. Basically, what we're going to do is separate the lines by multiplying the equation by a small number; that's really what this algorithm does. But first we need some justification: why does this phenomenon happen? Let's look at the line 2x + 3y - 6 = 0 and at just one side of it, 2x + 3y - 6 = 1. Why is it that the line in between them is 4x + 6y - 12 = 1, sitting exactly in the middle? Well, let's take a look at that equation: 4x + 6y - 12 = 1 is the same line I get if I divide the entire thing by 2, including the 1. If I divide, I get 2x + 3y - 6 = 0.5, the exact same line: any x and y that satisfy 4x + 6y - 12 = 1 also satisfy 2x + 3y - 6 = 0.5. So when I bring back the first equation, you can see that the value 0.5 lies right between the value 0 and the value 1; that's why this line is in between, and this works for pretty much any constant that I multiply the line by.
- 00:12:00 So what we're going to do is introduce something called the expanding rate, and it's very simple. We have again our equation 2x + 3y - 6 = 0, which gives us this line, and our two neighboring equations: the one that equals 1, over here, and the one that equals -1, over here. The expanding rate is just going to be some number, and remember that in machine learning we always want to take tiny steps, not big ones, so we want to separate the lines by a very, very small amount. We take a number very close to 1, for example 0.99 (let's say that's my favorite number close to 1), and we call it the expanding rate. Then we simply multiply all the coefficients by 0.99. What do we get? We get the equations 1.98x + 2.97y - 5.94 = 0, = 1, and = -1. These give us three lines: the one in the middle is still the same, but the two on the sides are spread a little further apart. So we're just going to add that step to the perceptron algorithm, and it will spread our lines apart a little bit every time we iterate.
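In code, the expanding step is just one multiplication per coefficient. A minimal sketch (the names are mine):

```python
def width(a, b):
    """Distance between the lines ax + by + c = 1 and ax + by + c = -1."""
    return 2 / (a * a + b * b) ** 0.5

def expand(a, b, c, rate=0.99):
    """Multiply all coefficients by the expanding rate: the middle line
    ax + by + c = 0 is unchanged, but the +1 and -1 lines spread apart."""
    return a * rate, b * rate, c * rate

a, b, c = 2.0, 3.0, -6.0
print(width(a, b))         # ~0.5547
a, b, c = expand(a, b, c)  # one expanding step
print(width(a, b))         # ~0.5603: slightly wider, as intended
```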
- 00:13:27 So now we're ready to formulate the SVM algorithm, and it's going to be very similar to the perceptron algorithm. Step one: start with a random line and two equidistant parallel lines around it. I'm going to color them red and blue to emphasize which side of the line is red and which is blue, so we can see which points are correctly or incorrectly classified. Step two: pick a large number, the number of repetitions or epochs, which is the number of times we'll iterate the algorithm. Step three: pick a number close to 1, the expanding factor; as we saw, it's going to be 0.99 (I could pick anything close to one, but that's the one I'll use). Step four is the loop, repeated a thousand times: pick a random point; if it's correctly classified (for example, this one says, "I'm good"), do nothing; if it's incorrectly classified (for example, this blue point in the red area says, "Get over here"), move the line towards the point; we learned how to move a line towards a point in the previous video. Then we do the extra step, which is to separate the lines using the expanding factor. We repeat this many, many times, a thousand times, until we get a pretty good result, and then we enjoy the lines that separate the data best. Notice that the two steps we've added are step three, picking the expanding factor close to one, and the step where we separate the lines using it; the rest is pretty much the same as the perceptron algorithm.
- 00:15:01 Now, just for full disclosure, in case you want to code this: this is the perceptron algorithm we saw in the previous video, where step four is the mathematical step in which we check whether a point is in the blue or red area by checking whether the equation evaluated at the point gives a value greater than 0 or less than 0, and we update the values of a, b, and c accordingly by adding the learning rate times the coordinates of the point. The SVM algorithm is very similar. We start with a random line of equation ax + by + c = 0, and we draw the parallel lines with equations ax + by + c = 1 and ax + by + c = -1. Then we pick a large number of epochs, a thousand; a learning rate, 0.01 (we saw it in the logistic regression video); and an expanding rate, 0.99, a number close to one. Then the loop step, repeated a thousand times: pick a random point; if the point is correctly classified, do nothing; if the point is blue in the red area, update the values of a, b, and c accordingly; if the point is red in the blue area, update the values the other way; and as a final step, multiply the values of a, b, and c by 0.99, which is the expanding step. Again, the two new steps are step three and the expanding step. That's it; that's the SVM training algorithm. I encourage you to code it and see how it does; try different values for the number of epochs, the learning rate, the expanding rate, etc., and let me know how it went in the comments.
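Putting those steps together, here is one possible end-to-end sketch of the training loop in Python (the toy data and all names are mine; this follows the recipe described above, not any particular library):

```python
import random

def train_svm(points, labels, epochs=1000, lr=0.01, rate=0.99):
    """Fit ax + by + c = 0 with margin lines at ax + by + c = +/-1.
    labels: +1 for blue points, -1 for red points."""
    a = random.uniform(-1, 1)
    b = random.uniform(-1, 1)
    c = random.uniform(-1, 1)
    for _ in range(epochs):
        i = random.randrange(len(points))
        x, y = points[i]
        # Perceptron-style step: if misclassified, move the line toward the point.
        if labels[i] * (a * x + b * y + c) <= 0:
            a += lr * labels[i] * x
            b += lr * labels[i] * y
            c += lr * labels[i]
        # Expanding step: shrink the coefficients to spread the margin lines.
        a, b, c = a * rate, b * rate, c * rate
    return a, b, c

# Toy usage with made-up data:
pts = [(1.0, 1.0), (2.0, 1.5), (5.0, 6.0), (6.0, 7.5)]
lbl = [-1, -1, +1, +1]
print(train_svm(pts, lbl))
```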
- 00:16:48 So that's the SVM algorithm. As I said, I encourage you to code it, try it on some datasets, and see how it goes. However, this algorithm comes from somewhere: it comes from an error function that is minimized with gradient descent, so now I'm going to show you what that error function is. It's very similar to the perceptron case, where we had a classification error based on how far the misclassified points are from the boundary; now, however, there will be another term added to the error, based on how far apart the two lines are. Let me show you. To start with error functions, let me first ask you a question. Here we have the same dataset twice, and I'm going to show you two support vector machines that classify it: the first one is this one, and the second one is this one. Which one do you think is better? Feel free to pause the video and think about it. Notice that the model on the left has one problem: it misclassifies a point. However, it's good in that its lines are pretty wide apart. The model on the right is great at classification, because it classifies every point correctly; however, its lines are very close together. So which one is better? The answer is that we don't really know: it depends on our data, on our model, on the scenario. But with error functions we have an approach for analyzing exactly what we want.
- 00:18:18 Let's recall what happened with the perceptron error. Here we have some points and a model, a perceptron, that separates them. This model makes some mistakes: it makes these two, because they are blue points in the red area, and these two, because they are red points in the blue area. So how do we measure the error, that is, how bad this model is? The rationale is: if a point is on the correct side, its error is zero; if a point is on the wrong side, the error varies. If the point is close to the boundary, the error is small, and if it's far from the boundary, the error is huge, because if you're, say, a blue point close to the blue area but still in the red area, you generate a small error, but if you're well into the red area, you generate a lot of error, since the model is very wrong about that point. So what we want is the distance, or not exactly the distance, but something proportional to it, and the same over here. We add up numbers proportional to these distances, and that's the perceptron error.
- 00:19:25 For the SVM it's going to be similar: we have our lines, and now there are two classification errors coming from different places. There's a red one: the red area no longer starts from the middle line but from the bottom line, and every blue point above the bottom line is automatically misclassified. So these three are misclassified, and the error is precisely the distance from the bottom line; it's that simple. Notice that this blue point close to the bottom line is misclassified even though it was correctly classified in the perceptron algorithm; that's okay, it's a harsher error function. The blue error comes from the top line: every red point underneath the top line is misclassified, and its error, similar to the perceptron error, is proportional to this distance over here. We add up all those distances, and that's our error. Those two errors form the classification error.
- 00:20:32 Now we have something called the margin error, which is simply a quantity that tells us whether the two lines are close together or far apart. I'll be a little more specific later, but it's basically a number that is big if the lines are close together and small if the lines are far apart, because it's an error: the better the model, the smaller the error, and the better our model, the wider apart its lines. Let's look a little more at the margin error. Here we have our dataset twice, and two models. This one has the lines pretty far apart, therefore a large margin, so it has a small margin error; this one over here has the lines pretty close, so it has a small margin and therefore a large margin error. And just to show the contrast, notice that the model on the right has a small classification error and the model on the left has a large classification error, because the model on the right classifies all the points correctly while the one on the left misclassifies one point.
- 00:21:39 But let's get back to our margin error. We have our three lines, and recall that the equations of the outer ones are ax + by + c = 1 and ax + by + c = -1. Now we're going to calculate the distance between them, and I'm going to leave it as a challenge for you to do some math and show that this distance is actually 2 divided by the square root of a^2 + b^2; to prove it, you get to play with linear equations and the Pythagorean theorem. So now the question is: what can our error be? We need a number that is big if the distance is small, and small if the distance is big. What could it be? Feel free to think about it; the hint is to look at the denominator. The bigger a^2 + b^2 is, the smaller the distance, and vice versa. So what about just taking the margin error to be a^2 + b^2? If this number is large, the denominator makes the distance small, and vice versa. If we let our margin error be that sum of squares, it works: it measures how far apart the lines are, in the opposite direction, so if the lines are close the error is big, and if the lines are far the error is small.
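For reference, here is one way the challenge works out, using the point-to-line distance formula (my derivation, not the one from the video):

```latex
% Take a point (x_0, y_0) on the line ax + by + c = 1, so ax_0 + by_0 + c = 1.
% Its distance to the line ax + by + c = -1, i.e. ax + by + (c + 1) = 0, is
\[
d = \frac{|a x_0 + b y_0 + c + 1|}{\sqrt{a^2 + b^2}}
  = \frac{|1 + 1|}{\sqrt{a^2 + b^2}}
  = \frac{2}{\sqrt{a^2 + b^2}}.
\]
```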
- 00:23:09 If that looks familiar, it shouldn't be a surprise: it's actually the regularization term, if you've seen L2 regularization. So now we can summarize the SVM error. Here it is: we have our dataset and our model, and the error splits into three parts. First is the blue classification error, which measures all the red points that are on the blue side; then the red classification error, which measures all the blue points that are on the red side; and then the margin error, which measures how far apart the lines are. The red and the blue errors together form the total classification error, which tells us how many points are misclassified and how badly, and the margin error tells us whether the lines are far apart or close together. These two come together to form the total SVM error.
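As a sketch, that three-part error could be computed like this (my own Python; it uses hinge-style quantities, which are proportional to the distances the video describes):

```python
def svm_error(points, labels, a, b, c):
    """Classification error (red part + blue part) plus margin error.
    Blue points (+1) should satisfy ax + by + c >= 1;
    red points (-1) should satisfy ax + by + c <= -1."""
    red_error = sum(max(0, 1 - (a * x + b * y + c))    # blue points on the red side
                    for (x, y), l in zip(points, labels) if l == +1)
    blue_error = sum(max(0, 1 + (a * x + b * y + c))   # red points on the blue side
                     for (x, y), l in zip(points, labels) if l == -1)
    margin_error = a * a + b * b    # big when the margin lines are close together
    return red_error + blue_error + margin_error
```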
- 00:24:06 So that is the error in a support vector machine, the one we want to minimize, and the gradient descent step is very similar: it actually does the same thing as the SVM trick. Here we have our data and a model, and this model is pretty bad: notice that the lines are pretty narrow, and it misclassifies a bunch of the points. This is a bad SVM; it has a large error both in the classification sense and in the margin sense. What we do, using calculus, using gradient descent, is minimize this error in order to get to a good place: a good SVM with a good boundary, whose lines are far apart and which classifies most of the points correctly. In the same way as with the perceptron algorithm, this gradient descent process takes us from a large error to a small error, and it is exactly the same thing as the SVM trick I showed you earlier: moving the line closer to the points, plus separating the lines a tiny little bit.
- 00:25:12 So now I have a challenge for you, and the challenge is simply to convince yourself that the expanding step actually comes out of gradient descent. Take a look: we have our lines with equations ax + by + c = 1 and ax + by + c = -1, we have the margin over here, and the margin error, which is a^2 + b^2. If you're familiar with gradient descent, what happens is that we want to take a step in the direction of the negative of the gradient, and the gradient is the derivative of the margin error with respect to the two parameters a and b. This is a very simple gradient: the derivative of a^2 + b^2 with respect to a is simply 2a, and with respect to b it is 2b. Therefore the gradient descent step takes a and sends it to a minus the learning rate eta times 2a (the derivative), and likewise it turns b into b minus eta times 2b. We can factor the first as a times (1 - 2 eta), and the second as b times (1 - 2 eta). But notice this number over here: it is exactly the expanding factor, because what we're doing is multiplying a by a number close to one. Remember that we multiplied a and b by 0.99? This 1 - 2 eta is the 0.99, because if eta is a small number, then 1 - 2 eta is very, very close to one, so we're multiplying a by a number very close to one. That is exactly the expanding step: the expanding step comes from applying gradient descent to the regularization term. Anyway, the challenge is to formalize this and really convince yourself that it is the case.
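Written out in symbols (my notation, with learning rate η):

```latex
\[
E_{\text{margin}} = a^2 + b^2,
\qquad
\frac{\partial E_{\text{margin}}}{\partial a} = 2a,
\qquad
\frac{\partial E_{\text{margin}}}{\partial b} = 2b
\]
\[
a \leftarrow a - \eta \cdot 2a = a(1 - 2\eta),
\qquad
b \leftarrow b - \eta \cdot 2b = b(1 - 2\eta)
\]
% e.g. with eta = 0.005, the factor is 1 - 2(0.005) = 0.99, the expanding rate.
```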
- 00:27:24 So now let's go back a little and remember these two models, because we never really answered the question of which one is better. Remember that the one on the left misclassifies one blue point, and the one on the right has a very, very short distance between the lines, so each is good in one way and bad in another. Let's really study them. The one on the left has a large classification error, because it makes one mistake, and a small margin error, because the lines are pretty far apart. The one on the right has a small classification error, because it classifies every point correctly, and a very large margin error, because the lines are too close together. So again, which one to pick depends on us; it depends on what we want from the algorithm. However, we need to pass this information to the computer: we need to tell it which one we care more about, the classification error or the margin error, and the way to pass this information to the computer is with a parameter, or hyperparameter, which we call the C parameter. Recall that the error here is the classification error plus the margin error; we simply take a number C and attach it to the classification error, so now our error is not a plain sum but a weighted sum, where one of the terms is weighted by C.
- 00:28:49our error is the C times the
- 00:28:52classification error plus the margin
- 00:28:54error so what happens if we have a small
- 00:28:55value of C if we have a small value of C
- 00:28:58then the classification error gets
- 00:29:00multiplied by a very small number so
- 00:29:02it's all of a sudden is less important
- 00:29:04and then the margin error is an
- 00:29:06important one so we are really training
- 00:29:08an algorithm to focus a lot more on the
- 00:29:10margin error so we end up with a good
- 00:29:14margin and maybe a bad classification so
- 00:29:17we end up with the model on the left
- 00:29:20however if we have a large value of C
- 00:29:24then C is attached to the classification
- 00:29:27error so this means that the
- 00:29:28classification error ends up being a lot
- 00:29:30more important and the margin errors are
- 00:29:32being a little if if C is large so
- 00:29:35therefore the model with a large C
- 00:29:39focuses more on classification because
- 00:29:42it tries to minimize the classification
- 00:29:44error more than it tries to minimize the
- 00:29:46margin error so we end up with a model
- 00:29:48like the one in the right which is good
- 00:29:51for classification bad for margin and so
- 00:29:54again we decide this parameter ourselves
- 00:29:57what we really do in real life is try a
- 00:29:59bunch of different ones and see which
- 00:30:01algorithm did better but but it's good
- 00:30:02to know that we have certain control
- 00:30:05over this training and these these are
- 00:30:06called hyper parameters every every
- 00:30:08machine learning algorithm has a bunch
- 00:30:10of hyper parameters that one can tune to
- 00:30:12decide what we want so that's all folks
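Extending the earlier error sketch with the C hyperparameter (again, the naming is mine, and this is just one way to weight the sum):

```python
def svm_error_weighted(points, labels, a, b, c, C=1.0):
    """C * classification error + margin error.
    Small C: the margin term dominates, favoring wide margins.
    Large C: the classification term dominates, favoring correctness."""
    classification = sum(max(0, 1 - l * (a * x + b * y + c))
                         for (x, y), l in zip(points, labels))
    return C * classification + (a * a + b * b)
```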
- 00:30:17 So that's all, folks. Thank you very much for your attention. I remind you that this is the last of a series of three videos on linear models: linear regression, logistic regression, and support vector machines. I hope you enjoyed this as much as I did. Remember to subscribe if you want notifications of more videos coming; if you liked it, please hit like, share it with your friends, or leave a comment. I love reading the comments; I read them all. If you have suggestions for other videos to make, I'd love to hear them, and if you want to tweet at me, my Twitter handle is luis_likes_math. Thank you very much, and see you in the next video.
- SVM
- Support Vector Machines
- Machine Learning
- Classification Algorithms
- Linear Models
- Gradient Descent
- Perceptron
- C Parameter
- Margin Error
- Hyperparameters