00:00:00
Last week we got an overview of Machine Learning,
where we found out that, broadly speaking,
00:00:05
Machine Learning approaches are traditionally
divided into three categories, which are known
00:00:10
as Supervised Learning, Unsupervised Learning
and Reinforcement Learning.
00:00:16
Today, we'll get an introduction to Supervised
Learning.
00:00:21
More specifically, we'll get an introduction
to Regression, which is a subfield of Supervised
00:00:27
Learning.
00:00:28
[record scratch sound effect]
00:00:29
But wait, didn't I cover Regression in Episode
205, when I covered how to plot a linear regression
00:00:36
line?
00:00:37
Was that Machine Learning?
00:00:40
Or was that something else?
00:00:42
What exactly is Regression, and why is it
considered part of Machine Learning?
00:00:48
Well, let's find out!
00:00:51
[golf swing sound effect]
00:00:53
Welcome to Julia for Talented Amateurs, where
I make wholesome Julia tutorials for talented
00:00:58
amateurs everywhere.
00:00:59
I am your host, the Dabbling Doggo.
00:01:02
"I dabble."
00:01:05
As a reminder, Supervised Learning is when
the user provides both the Inputs and the
00:01:10
Outputs and then the computer tries to estimate
the Function that determines the relationship
00:01:15
between the Inputs and the Outputs.
00:01:19
Supervised Learning may be split further between
Regression and Classification.
00:01:26
In today's tutorial, we'll learn about Regression.
00:01:31
Regression is used to understand the relationship
between some independent variable X and a
00:01:37
dependent variable Y in order to predict the
value of Y.
00:01:42
There's a non-Machine Learning approach to
solving these problems and then there's a
00:01:47
Machine Learning approach.
00:01:49
Today, you'll see both approaches, so that
you can compare and contrast the differences.
00:01:57
The outline for this tutorial series is loosely
based on Andrew Ng's Machine Learning course
00:02:01
from Stanford University.
00:02:04
I watched all of the lectures, so that you
wouldn't have to, and I'm happy to report
00:02:10
that I understood, maybe 10% of that course.
00:02:13
"Oof."
00:02:14
I'm not kidding.
00:02:16
It's an incredibly difficult course, but if
you think you're up for the challenge, I've
00:02:21
provided a link to it in the description below.
00:02:26
There's a lot of math involved with Machine
Learning, but for my tutorial series there
00:02:31
are no special math prerequisites, since I'll
be covering any math that you need to know.
00:02:37
But, I am assuming that you know the basic
syntax and semantics of Julia and that you
00:02:43
know how to use VS Code, along with the Julia
extension for VS Code.
00:02:50
While conducting research for my tutorial
series, I realized that different instructors
00:02:55
use different naming conventions when describing
the same Machine Learning concepts.
00:03:00
For the sake of consistency, I'll try my best
to use the naming conventions used by Andrew
00:03:07
Ng, but just be aware that you may encounter
different names used by different instructors
00:03:13
to describe the same things in Machine Learning.
00:03:18
With that said, let's jump right into VS Code
and start coding!
00:03:22
[golf swing sound effect]
00:03:26
In your VS Code Explorer Panel, create a New
Folder for this tutorial.
00:03:32
In the Tutorial folder, create a new file
called "sl_regression.jl".
00:03:40
I'm using "s-l" to stand for Supervised Learning.
00:03:45
Launch the Julia REPL by using Alt+J then
Alt+O.
00:03:53
Maximize the Terminal panel.
00:03:56
Change the present working directory to your
Tutorial directory.
00:04:01
Enter the Package REPL by hitting the closing
square bracket.
00:04:07
Activate your Tutorial directory.
00:04:10
Add the following Packages: CSV, GLM, Plots
and TypedTables.
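If you're following along, the Package REPL session looks something like this (the environment name will match whatever you called your Tutorial folder; "tutorial" here is just a placeholder):

    (@v1.7) pkg> activate .
    (tutorial) pkg> add CSV GLM Plots TypedTables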
00:04:22
I have covered all of these packages in previous,
non-Machine Learning tutorials.
00:04:27
You'll be happy to know that you do not need to
use any special packages in order to get started
00:04:32
with Machine Learning in Julia.
00:04:36
Type in "status" to confirm the version numbers.
00:04:40
Just as an FYI, I'm using Julia version 1.7.1.
00:04:47
Exit the Package REPL by hitting Backspace.
00:04:51
Minimize the Terminal panel.
00:04:55
I'm going to go through this section fairly
quickly, since this is not Machine Learning,
00:05:00
and since I've already covered it in Episode
205.
00:05:04
Watching Episode 205 is not required for this
tutorial.
00:05:10
For this tutorial, you will need a data set
that you can download from my GitHub repository.
00:05:16
There's a link to it in the description below.
00:05:20
You can save it by right-clicking on it.
00:05:23
When you download the file, save it to your
tutorial directory.
00:05:28
It's a CSV file called "housingdata" and it
contains a collection of Input-Output pairs
00:05:35
showing the relationship between the size
of houses in square feet and the price of
00:05:39
the houses.
00:05:41
I'm not sure what year this data is from,
nor can I confirm its accuracy, but it's supposed
00:05:47
to reflect housing prices in the Portland,
Oregon area.
00:05:51
We will be using this data set as our motivating
example of using Regression to predict the
00:05:56
price of a house based on the size of the
house in square feet.
00:06:02
OK, now that we have our data, let's start
coding.
00:06:07
Start by calling the packages.
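That's a single line at the top of "sl_regression.jl":

    using CSV, GLM, Plots, TypedTables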
00:06:14
Use the CSV package to import the data from
the CSV file that you just downloaded.
00:06:25
You can then use the TypedTables package to
convert that data into a Julia Table.
00:06:32
To make the numbers easier to read, let's
divide the housing prices by 1,000.
00:06:40
The GLM package requires data to be in the
form of a Table.
00:06:45
OK, now that we have our data set up, let's
plot it and take a look at it.
00:07:01
[typing sounds]
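The exact styling is up to you; a minimal version of the plotting code looks like this:

    scatter(X, Y,
        xlabel = "Size (square feet)",
        ylabel = "Price (thousands of dollars)",
        legend = false,
        title = "Housing Prices")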
00:07:21
Subjectively speaking, it looks like there's
a linear relationship between the size of
00:07:26
the house and the price of the house.
00:07:30
Let's use the GLM package to find the best-fit
linear regression line using the Ordinary
00:07:35
Least Squares method.
00:07:40
If you haven't used the GLM package before,
GLM stands for Generalized Linear Models.
00:07:49
The Ordinary Least Squares method takes the
square of the vertical distance between each
00:07:54
predicted value and its actual value along
the y-axis and then adds those squares up.
00:08:01
By using this code, the GLM package will return
the values of the y-intercept and the slope
00:08:07
of the line that minimizes the value of the
sum of the squares.
00:08:13
In order to see the output, you might need
to maximize the REPL or scroll up.
00:08:19
There's a lot of information in the output,
but we're only interested in the Coefficients
00:08:24
listed in the 1st column of numbers.
00:08:28
The "intercept" is the value of the y-intercept
and the coefficient value listed in the 2nd
00:08:34
row is the slope of the line.
00:08:38
Make a note of those numbers.
00:08:40
In a few minutes, we're going to set up a
Machine Learning algorithm to see if our computer
00:08:45
can teach itself how to estimate those values.
00:08:50
Now that we have an Ordinary Least Squares
model, we can add it to our plot.
00:08:55
The predict() function comes included with
the GLM package, and it returns the Y values
00:09:01
along the linear regression line.
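Overlaying the fitted line on the existing scatter plot looks like this:

    plot!(X, predict(ols))    # add the best-fit regression line to the plot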
00:09:05
And, that's it.
00:09:07
Again, subjectively speaking, it looks like
that linear regression line does a pretty
00:09:12
good job of estimating the relationship between
the size of the house and the price of the
00:09:17
house.
00:09:19
Let's use this model to predict the price
of a house given a new value for the size
00:09:23
of a house.
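predict() accepts the new value as a Table, so the call looks something like this:

    predict(ols, Table(X = [1250]))    # predicted price, in thousands of dollars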
00:09:30
So, this model is estimating that a house
that is 1,250 square feet will have a price
00:09:36
around $240 thousand.
00:09:40
So, that was pretty easy, right?
00:09:44
If it's this easy to use linear regression
without Machine Learning, then why do we need
00:09:49
Machine Learning?
00:09:51
Well, it turns out that this non-Machine Learning
approach works great for simple data sets.
00:09:58
In this example, we're using the size of the
house as the only independent variable to
00:10:03
predict the price of the house.
00:10:06
But, the real world is, of course, more complicated
than that.
00:10:12
What if we wanted to use more independent
variables, like the number of bedrooms, the
00:10:17
number of floors, the age of the house, and
so on?
00:10:23
In order to make more sophisticated predictions,
we need a more sophisticated approach in order
00:10:29
to use all of these inputs.
00:10:32
That's where Machine Learning comes in.
00:10:34
[golf swing sound effect]
00:10:37
Rather than sitting through a boring lecture
about abstract Machine Learning concepts,
00:10:43
let's dive right into the code and learn-by-doing.
00:10:47
I'll provide the explanations along the way.
00:10:52
Set up a variable called "epochs", pronounced "ee-poks" or "epics".
00:10:56
Either pronunciation is acceptable.
00:10:59
I'll explain what this is in a few minutes.
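For now, just initialize it to zero:

    epochs = 0    # counts how many Gradient Descent passes we've run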
00:11:03
Next, let's re-plot our data.
00:11:07
You don't need to retype all of this in.
00:11:09
You can just re-run the code from above.
00:11:12
But for the sake of completeness, I'm going
to copy and paste the plotting code from above.
00:11:18
The only difference is that I added some text
to the title to include the epochs.
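So the re-plot looks something like this, with the epoch count interpolated into the title:

    scatter(X, Y,
        xlabel = "Size (square feet)",
        ylabel = "Price (thousands of dollars)",
        legend = false,
        title = "Housing Prices (epochs = $epochs)")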
00:11:27
Next, we need to initialize the Parameters.
00:11:39
So, recall from math class that the formula
for any line is y = mx + b, where m is the
00:11:48
rise-over-run slope of the line and b is the
y-intercept of the line.
00:11:55
In Machine Learning, both m and b are sometimes
called Parameters.
00:12:00
Other times, Machine Learning folks will refer
to the slope, m, as a "Weight", and the y-intercept,
00:12:07
b, as a "Bias".
00:12:10
This is what I meant earlier when I mentioned
that different instructors use different terminologies
00:12:16
when they're actually referring to the same
thing.
00:12:20
In this tutorial, I'll be referring to both
the slope, m, and the y-intercept, b, as "Parameters".
00:12:29
By changing the values of these Parameters,
you can change the characteristics of the
00:12:33
line.
00:12:35
As you increase the value of m, then the line
becomes steeper.
00:12:40
If you increase the value of b, then the line
moves higher along the y-axis.
00:12:47
By changing just those 2 values, you can define
every line in a Cartesian coordinate plane.
00:12:55
In Machine Learning, the convention is to
use theta_0 to refer to the y-intercept and
00:13:02
theta_1 to refer to the slope of the line.
00:13:06
The other convention is to initialize both
theta_0 and theta_1 to a value of zero.
00:13:13
By setting both values to zero, that means
that our initial line will be a horizontal
00:13:18
line along the x-axis.
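In code:

    theta_0 = 0.0    # the y-intercept, b, also called the Bias
    theta_1 = 0.0    # the slope, m, also called the Weight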
00:13:22
Next, we use the Parameters to define our
linear regression model, which is just the
00:13:27
formula for a straight line.
00:13:32
The letter "h" refers to "Hypothesis" and
this is the function that we'll be using to
00:13:37
make predictions.
00:13:40
I know that this formula looks a little weird
with theta_0 and theta_1, but it's the same
00:13:45
formula for a line, y = mx + b.
00:13:50
The reason for this naming convention is because,
in theory, you could have more than one feature,
00:13:56
so it's possible to have a theta_2, a theta_3,
and so on.
00:14:02
But for today, we're going to stick with our
basic line.
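Here's the Hypothesis function, written with broadcasting so that it accepts either a single value or a whole vector of X values:

    h(x) = theta_0 .+ theta_1 .* x    # hypothesis: y = mx + b, with theta_1 as m and theta_0 as b

Because h reads the global Parameters at call time, the same function will automatically reflect every update that we make to theta_0 and theta_1 later on.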
00:14:07
Let's add this regression line to our plot.
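One more plot! call does it:

    plot!(X, h(X))    # the initial estimate: a flat line along the x-axis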
00:14:15
I don't know if you can see this on the screen,
but there's a blue line sitting along the
00:14:20
x-axis.
00:14:22
That's the starting point for our linear regression
estimate.
00:14:26
As you can see, it's way off, but we're going
to see if we can train our computer to figure
00:14:33
out a better estimate.
00:14:36
So, how exactly are we going to train our
computer?
00:14:40
Well, we need some way to measure how good,
or how bad, the estimate is.
00:14:48
For that, we're going to need something called
a Cost Function.
00:14:54
There are different Cost Functions that are
used in Machine Learning, but I'm going to
00:14:58
use the Cost Function from Andrew Ng's course.
00:15:02
For every value of X, this Cost Function will
calculate the vertical distance between the
00:15:07
predicted value of Y versus the actual value
of Y, and then it will square that distance.
00:15:15
The function will then calculate the average
of those squared distances across all of the
00:15:20
Input-Output pairs, and then
divide that average by 2.
00:15:26
Those of you with a math background may recognize
this as the formula for the Mean Squared Error.
00:15:32
The only difference is that our Cost Function
is divided by 2, which will make it slightly
00:15:37
easier to deal with partial derivatives later.
00:15:42
Your computer will try to improve its estimate
by changing the values of theta_0 and theta_1.
00:15:49
Your computer will know when it's found a
good solution to the problem when it's able
00:15:53
to minimize the value of this Cost Function.
00:15:57
"What?"
00:15:59
Confused?
00:16:01
This will make more sense once you see it
in action and can see some real numbers.
00:16:06
For now, let's add the code for this Cost
Function.
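Here's a sketch of that code:

    m = length(X)    # the number of samples

    y_hat = h(X)     # predicted values of Y using the current Parameters

    # Cost Function J: half of the Mean Squared Error
    function cost(y_hat, Y)
        (1 / (2 * m)) * sum((y_hat .- Y) .^ 2)
    end

    J = cost(y_hat, Y)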
00:16:12
Here, the variable "m" refers to the number
of samples.
00:16:16
It does not refer to the slope of the line.
00:16:20
Other instructors may use the uppercase letter
N for the number of samples, but I'm using
00:16:26
"m" in order to stay consistent with Andrew
Ng's notation.
00:16:31
The variable y_hat is sometimes used for the
predicted values of Y, which is how it's being
00:16:37
used here.
00:16:42
This is the actual Cost Function that I described
earlier.
00:16:46
The Cost Function is sometimes referred to
as the "loss" function.
00:16:53
The uppercase letter "J" is sometimes used
to refer to the Cost Function.
00:16:58
It's not clear to me why the uppercase "J"
is used, but it is a convention used elsewhere
00:17:04
in math.
00:17:06
This value is the cost when the Parameters
theta_0 and theta_1 are both equal to zero,
00:17:12
for this data set.
00:17:15
The actual value is not important, but the
change in value over time is important, so
00:17:21
let's keep track of the history of this value.
00:17:25
This cost, J, is the value that we want to minimize.
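An array works fine for keeping that history:

    J_vals = [J]    # cost history, one entry per epoch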
00:17:31
So, here's the million dollar question.
00:17:35
How's your computer supposed to figure out
the minimum value of the Cost Function?
00:17:41
Well, we know that we can define any line
simply by adjusting the values of the y intercept
00:17:47
and the slope.
00:17:49
So, we need to set up a process where our
computer can make a slight adjustment to the
00:17:54
values of both the y-intercept and the slope
and then check to see if that made things
00:17:59
better or worse.
00:18:02
Based on the result, the computer can make
another slight adjustment to the values of
00:18:07
both the y-intercept and the slope and then
check again to see if that made things better
00:18:12
or worse.
00:18:15
By following this process, your computer can,
quote-unquote, "learn" how to improve the
00:18:20
estimate through its experience.
00:18:23
Your computer can repeat this process as many
times as necessary until it finds an optimal
00:18:28
solution.
00:18:31
The process that I just described is known
as the Gradient Descent Algorithm.
00:18:38
We'll be using a version called the "Batch"
Gradient Descent Algorithm.
00:18:44
The math involved in deriving the gradient
descent formulae is a bit complicated, so
00:18:49
I'm going to skip over it and simply present
the formulae to use.
00:18:53
These formulae are the result of taking partial
derivatives of the Cost Function.
00:18:59
You do not need to know what a partial derivative
is in order to use these formulae.
00:19:05
This is the function that will determine how
to adjust the value of the y-intercept.
00:19:17
And this is the function that will determine
how to adjust the value of the slope.
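Here are both functions in code, with y_hat passed in so that both gradients are computed from the same predictions:

    # adjustment for the y-intercept: the average prediction error
    pd_theta_0(y_hat, Y) = (1 / m) * sum(y_hat .- Y)

    # adjustment for the slope: the average prediction error, weighted by X
    pd_theta_1(y_hat, Y, X) = (1 / m) * sum((y_hat .- Y) .* X)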
00:19:29
Now, we need to set the value for something
called the Learning Rate, which is sometimes
00:19:33
referred to by the Greek letter alpha.
00:19:37
In Machine Learning, the Learning Rate is
known as a Hyperparameter to distinguish it
00:19:42
from other Parameters, like theta_0 and theta_1.
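I'm using a separate Learning Rate for each Parameter; as a sketch, the intercept's rate below is a hypothetical stand-in for whatever works on your machine, while the slope's rate is the value I describe next:

    alpha_0 = 0.09           # Learning Rate for theta_0 (hypothetical value)
    alpha_1 = 0.00000008     # Learning Rate for theta_1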
00:19:47
That's an 8 with 7 zeros in front of it.
00:19:52
The Learning Rate is used to prevent your
computer from making large adjustments to
00:19:56
the values of either the y-intercept or the
slope.
00:20:00
You can use any values you want for the Learning
Rates.
00:20:05
I have to admit that I cheated here.
00:20:08
By convention, the default Learning Rate is
set to 0.01 for all Parameters, but I found
00:20:15
that it was taking too long to find a solution,
which isn't great since I was trying to make
00:20:19
a tutorial out of this.
00:20:21
Using these values will allow your computer
to find an optimal solution faster, just for
00:20:26
this example.
00:20:28
One of the issues here is that the values
along the X-axis are much larger compared
00:20:33
to the values along the Y-axis.
00:20:36
In future tutorials, we'll discuss methods
to adjust for this, but for today, we'll leave
00:20:41
the values as they are.
00:20:44
For now, let's write the code to make this
Gradient Descent Algorithm come alive.
00:20:55
The use of temp variables here will allow
us to make simultaneous updates to the values
00:20:59
of both the y-intercept as well as the slope.
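Here's a sketch of one full pass:

    # compute both gradients first, so the Parameter updates are simultaneous
    theta_0_temp = pd_theta_0(y_hat, Y)
    theta_1_temp = pd_theta_1(y_hat, Y, X)

    # step each Parameter against its gradient, scaled by its Learning Rate
    theta_0 -= alpha_0 * theta_0_temp
    theta_1 -= alpha_1 * theta_1_temp

    y_hat = h(X)    # refresh the predictions with the updated Parameters
    epochs += 1     # one full pass over the data = one epoch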
00:21:14
So, if you look at the values, you can begin
to see what's going on.
00:21:21
The value of theta_0, or the y-intercept,
has changed from 0 to 31, which means that
00:21:29
the new line has moved up slightly along the
y-axis.
00:21:34
Also, the value of theta_1, or the slope,
has changed from 0 to 0.06, which means that
00:21:42
the line has gone from being horizontal to
sloping up slightly.
00:21:48
In theory, these changes should improve our
estimate.
00:21:52
Now, let's go back up to theta_0_temp and
theta_1_temp.
00:21:59
The actual values are not important here,
but note that both of the values are negative
00:22:04
and note the magnitude of the values.
00:22:08
So, what are these numbers?
00:22:12
Without getting into the math, here's how
you can think about them.
00:22:17
Your computer is asking itself the question,
in order to improve my estimate, in what direction,
00:22:24
and by what magnitude, do I need to change
the values of my Parameters?
00:22:30
Does the change need to be positive, or negative?
00:22:34
Does the change need to be big, or small?
00:22:38
These values provide the answers to those
questions.
00:22:43
Let's recalculate the cost to see if this
theory is true.
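Just call the Cost Function again with the refreshed predictions:

    J = cost(y_hat, Y)    # recompute the cost with the updated Parameters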
00:22:51
You see?
00:22:52
The value of the cost dropped from around
65,000 to around 20,000.
00:22:59
Again, the actual value is not important,
but the direction and the magnitude of the
00:23:04
drop is significant.
00:23:07
Making those slight adjustments to the values
of the y-intercept and the slope made a huge
00:23:12
improvement in the cost.
00:23:15
Let's keep track of this value and then add
it to our plot.
00:23:25
[typing sounds]
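Two lines cover it:

    push!(J_vals, J)    # record the latest cost in the history
    plot!(X, h(X))      # overlay the latest regression estimate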
00:23:38
You should see a new, blue line on your plot
showing the latest estimate using the new
00:23:43
values of theta_0 and theta_1.
00:23:47
It's a big improvement from the initial horizontal
line along the x-axis, but it's still pretty
00:23:52
far away from the actual data points.
00:23:55
This is a good time to point out that this
is an iterative process, so your computer
00:24:00
will need to repeat these steps in order to
continue to improve.
00:24:06
The variable named "epochs" is keeping track
of the number of iterations.
00:24:11
Epoch is another example of a Hyperparameter.
00:24:16
The term "epoch" is very specific in Machine
Learning.
00:24:20
Technically, Epoch means the number of times
that your Machine Learning algorithm has seen
00:24:25
all of the data.
00:24:27
The term "iteration" is used slightly differently
in other Machine Learning scenarios, but in
00:24:32
this case, both Epoch and Iteration mean the
same thing.
00:24:39
You could write a for-loop to automate this
iterative process, but since we're learning
00:24:44
fundamentals, let's go through this manually
to see how the numbers change as we repeat
00:24:48
this process.
00:24:58
Note that the value of the cost dropped again.
00:25:07
You should see another blue line added to
your plot.
00:25:11
This time it's much closer to the data points,
but still appears to be off.
00:25:17
Let's repeat this process a few more times.
00:25:20
I'll fast forward through this part.
00:25:26
[typing sounds]
00:25:40
So, the improvements appear to be slowing
down, and that last estimate looks pretty close
00:25:45
to what we calculated using the GLM package.
00:25:49
Just to be sure, let's add the linear regression
line that we calculated before.
00:25:58
That looks pretty close, but it's not exactly
the same.
00:26:03
In theory, if you kept repeating this process,
the Machine Learning model would eventually
00:26:07
converge with the model that we calculated
using the GLM package.
00:26:13
If you scroll up, you can see that the last
estimate for the y-intercept, or theta_0 is
00:26:18
around 68, and the estimate for the slope,
or theta_1, is around 0.14.
00:26:27
If you maximize the REPL and scroll up, you
should be able to find the coefficients calculated
00:26:32
by the GLM package.
00:26:35
GLM calculated a y-intercept of around 71
and a slope of around 0.13.
00:26:43
So, overall, our Machine Learning algorithm
got pretty close, which is amazing since it
00:26:48
figured out those values on its own.
00:26:51
"Wow!"
00:26:53
Let's take a look at a so-called "Learning
Curve" to see the progress over time.
00:27:04
[typing sounds]
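Here's a sketch of the Learning Curve plot, using the cost history we've been collecting:

    plot(0:epochs, J_vals,
        xlabel = "Epochs",
        ylabel = "Cost",
        legend = false,
        title = "Learning Curve")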
00:27:17
The actual values are not important, but the
overall shape of the curve is important.
00:27:23
The plot shows some dramatic improvements
initially, but it converges near a minimum
00:27:28
value over time.
00:27:31
As a final sanity check, let's use our Machine
Learning-generated model to make a price prediction
00:27:37
based on a new value for the square footage
of a house.
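Assuming we reuse the same 1,250 square feet from before, our Hypothesis function handles it directly:

    h(1250)    # predicted price, in thousands of dollars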
00:27:41
The value is displayed in the REPL.
00:27:44
So, our Machine Learning model is predicting
a price around $237 thousand.
00:27:51
Let's compare that with the value estimated
by the GLM model.
00:27:59
The GLM model is predicting a price around
$239 thousand, which is not the same,
00:28:05
but it's pretty close, which means that we
have successfully implemented our very first
00:28:11
Machine Learning algorithm!
00:28:12
[air horn sound effect]
00:28:15
I know that I covered a lot today, so let's
take a few minutes to recap what we just learned.
00:28:21
[golf swing sound effect]
00:28:25
Today, we got an introduction to Supervised
Learning.
00:28:29
Supervised Learning is when the user provides
both the Inputs and the Outputs, and then
00:28:35
the computer tries to estimate the Function
that determines the relationship between the
00:28:39
Inputs and the Outputs.
00:28:43
Supervised Learning may be further split between
Regression and Classification.
00:28:49
Today, we specifically learned about Supervised
Learning using a Regression algorithm.
00:28:56
By Regression, we mean that we're trying to
understand the relationship between some independent
00:29:01
variable X and a dependent variable Y in order
to predict the value of Y.
00:29:07
As a motivating example, we examined a data
set showing the relationship between the size
00:29:12
of the houses in square feet and the price
of the houses.
00:29:17
After a quick review of how to calculate a
linear regression using a traditional, non-Machine
00:29:22
Learning approach, we dove right in to create
a Supervised Machine Learning algorithm that
00:29:28
allowed our computer to learn on its own how
to derive an estimate for the linear regression
00:29:33
line.
00:29:36
Some of the key elements of the Machine Learning
workflow include:
00:29:40
Defining a Hypothesis Model in order to make
predictions.
00:29:45
Defining a Cost Function in order to measure
performance.
00:29:50
Using a Gradient Descent Algorithm to incrementally
improve performance.
00:29:56
Iterating until the value of the Cost Function
converged on a minimum value.
00:30:03
For this simple example, using a Machine Learning
algorithm may have been excessive, but working
00:30:09
through this motivating example allowed us
to get an introduction to many of the fundamental
00:30:14
concepts in Machine Learning.
00:30:19
I know that I glossed over some of the explanations,
but at this point, I think that it's more
00:30:24
important to get a feel for the terminology,
the code and the overall workflow using a
00:30:30
concrete example, rather than getting lost
in some abstract exposition.
00:30:37
We'll be seeing these Machine Learning concepts
over and over throughout this series, so we'll
00:30:42
have an opportunity to examine them in more
detail as time goes on.
00:30:47
For now, you can be proud that you have officially
moved from theory into practice, so you are
00:30:54
well on your way on this exciting, Machine
Learning adventure.
00:30:58
[golf swing sound effect]
00:31:02
Well, that's all for today.
00:31:05
If you made it this far, congratulations!
00:31:07
[kids cheering sound effect]
00:31:10
If you enjoyed this video, and you feel like
you learned something new, please, give it
00:31:15
a thumbs up!
00:31:17
For more wholesome Julia tutorials, please
be sure, to Subscribe and hit that bell!
00:31:24
If you like what I do, then please consider
Joining and becoming a Channel Member.
00:31:30
New tutorials are posted on Sundays-slash-Mondays.
00:31:34
Thanks for watching, and I'll see you, in
the next video.
00:31:38
"Wow!"