[05x02] Linear Regression | Regression | Supervised Learning | Machine Learning [Julia]

00:31:40
https://www.youtube.com/watch?v=n03pSsA7NtQ

Summary

TLDR: The video provides an introduction to Supervised Learning with a focus on Regression, through a hands-on Julia coding tutorial for predicting house prices based on size. It discusses the differences between traditional and Machine Learning approaches, outlining essential elements like hypothesis models, cost functions, and the gradient descent algorithm. Through coded examples, viewers learn how to implement a simple regression model and understand the fundamentals of Machine Learning.

Takeaways

  • 📊 Understanding Supervised Learning basics.
  • 🏠 Regression predicts dependent variables.
  • 📉 Difference between traditional and ML approaches.
  • 📝 Key elements: Hypothesis, Cost Function, Gradient Descent.
  • 🚀 Hands-on tutorial using Julia.
  • 📈 Data visualization with the Plots package.
  • ⚙️ Learning rates impact convergence speed.
  • 🔄 Iterative processes key to model training.
  • 📊 Cost Function signifies model performance.
  • 🤖 ML algorithms can self-optimize during training.

Timeline

  • 00:00:00 - 00:05:00

    The video begins with an overview of Machine Learning, highlighting the three main categories: Supervised Learning, Unsupervised Learning, and Reinforcement Learning. It introduces Supervised Learning in detail, specifically focusing on Regression, and raises questions about the relationship of Regression to Machine Learning.

  • 00:05:00 - 00:10:00

    An explanation of Supervised Learning is provided, stating that it involves both Inputs and Outputs provided by the user, which the computer uses to estimate the relationship between them. The tutorial will primarily cover Regression, which is used to predict dependent variables based on independent variables, while contrasting traditional and Machine Learning approaches.

  • 00:10:00 - 00:15:00

    Viewers are directed on how to set up their coding environment in VS Code, with instructions to create a new file and add relevant packages. The importance of having a data set, specifically the 'housingdata' CSV file, is emphasized, which will be used to illustrate Regression predictions.

  • 00:15:00 - 00:20:00

Once the data set is prepared, it is pointed out that it shows a linear relationship between the size of houses and their prices. The GLM package is introduced to find the best-fit linear regression line using the Ordinary Least Squares method, demonstrating how to extract coefficients for predictions and set up initial models for comparison later.

  • 00:20:00 - 00:25:00

    The necessity of Machine Learning is discussed, emphasizing its advantages in handling complex datasets with multiple independent variables as opposed to simple traditional methods. Viewers are guided to begin coding a basic regression model with parameters for better estimates, leading to the introduction of the Cost Function and the concept of Gradient Descent.

  • 00:25:00 - 00:31:40

    The video wraps up by summarizing the process of implementing a Supervised Learning regression system, detailing the workflow, hypothesis model, cost function, and the iterative improvement of algorithms in practice. It concludes with encouragement for viewers to continue their learning journey in Machine Learning.

Video Q&A

  • What is Supervised Learning?

    Supervised Learning is a type of Machine Learning where the user provides both inputs and outputs, and the computer tries to estimate the function that defines their relationship.

  • What is Regression in Machine Learning?

    Regression is a technique used to predict the value of a dependent variable based on one or more independent variables.

  • What programming language is used in the tutorial?

    The tutorial uses the Julia programming language.

  • What datasets are used in this video?

    The tutorial uses a dataset containing input-output pairs showing the relationship between house size and prices.

  • What is the Gradient Descent Algorithm?

    The Gradient Descent Algorithm is a method used in Machine Learning to optimize the parameters of a model by iteratively adjusting them to minimize the cost function.

  • What libraries are required in Julia for this tutorial?

    The tutorial requires the CSV, GLM, Plots, and TypedTables packages.

  • How do I visualize the regression line in Julia?

    You can visualize the regression line using the predict() function from the GLM package, which returns the y values along the regression line.

  • What is a Cost Function?

    A Cost Function measures the performance of a model, calculating the difference between predicted values and actual values.

  • What is the significance of epochs in Machine Learning?

    Epochs refer to the number of times the algorithm has processed the entire data set during training.

Transcript

  • 00:00:00
    Last week we got an overview of Machine Learning, where we found out that, broadly speaking,
  • 00:00:05
    Machine Learning approaches are traditionally divided into three categories, which are known
  • 00:00:10
    as Supervised Learning, Unsupervised Learning and Reinforcement Learning.
  • 00:00:16
    Today, we'll get an introduction to Supervised Learning.
  • 00:00:21
    More specifically, we'll get an introduction to Regression, which is a subfield of Supervised
  • 00:00:27
    Learning.
  • 00:00:28
    [record scratch sound effect]
  • 00:00:29
    But wait, didn't I cover Regression in Episode 205, when I covered how to plot a linear regression
  • 00:00:36
    line?
  • 00:00:37
    Was that Machine Learning?
  • 00:00:40
    Or was that something else?
  • 00:00:42
    What exactly is Regression, and why is it considered part of Machine Learning?
  • 00:00:48
    Well, let's find out!
  • 00:00:51
    [golf swing sound effect]
  • 00:00:53
    Welcome to Julia for Talented Amateurs, where I make wholesome Julia tutorials for talented
  • 00:00:58
    amateurs everywhere.
  • 00:00:59
    I am your host, the Dabbling Doggo.
  • 00:01:02
    "I dabble."
  • 00:01:05
    As a reminder, Supervised Learning is when the user provides both the Inputs and the
  • 00:01:10
    Outputs and then the computer tries to estimate the Function that determines the relationship
  • 00:01:15
    between the Inputs and the Outputs.
  • 00:01:19
    Supervised Learning may be split further between Regression and Classification.
  • 00:01:26
    In today's tutorial, we'll learn about Regression.
  • 00:01:31
    Regression is used to understand the relationship between some independent variable X in order
  • 00:01:37
    to predict the value of a dependent variable Y.
  • 00:01:42
    There's a non-Machine Learning approach to solving these problems and then there's a
  • 00:01:47
    Machine Learning approach.
  • 00:01:49
    Today, you'll see both approaches, so that you can compare and contrast the differences.
  • 00:01:57
    The outline for this tutorial series is loosely based on Andrew Ng's Machine Learning course
  • 00:02:01
    from Stanford University.
  • 00:02:04
    I watched all of the lectures, so that you wouldn't have to, and I'm happy to report
  • 00:02:10
    that I understood, maybe 10% of that course.
  • 00:02:13
    "Oof."
  • 00:02:14
    I'm not kidding.
  • 00:02:16
    It's an incredibly difficult course, but if you think you're up for the challenge, I've
  • 00:02:21
    provided a link to it in the description below.
  • 00:02:26
    There's a lot of math involved with Machine Learning, but for my tutorial series there
  • 00:02:31
    are no special math prerequisites, since I'll be covering any math that you need to know.
  • 00:02:37
    But, I am assuming that you know the basic syntax and semantics of Julia and that you
  • 00:02:43
    know how to use VS Code, along with the Julia extension for VS Code.
  • 00:02:50
    While conducting research for my tutorial series, I realized that different instructors
  • 00:02:55
    use different naming conventions when describing the same Machine Learning concepts.
  • 00:03:00
    For the sake of consistency, I'll try my best to use the naming conventions used by Andrew
  • 00:03:07
    Ng, but just be aware that you may encounter different names used by different instructors
  • 00:03:13
    to describe the same things in Machine Learning.
  • 00:03:18
    With that said, let's jump right into VS Code and start coding!
  • 00:03:22
    [golf swing sound effect]
  • 00:03:26
    In your VS Code Explorer Panel, create a New Folder for this tutorial.
  • 00:03:32
    In the Tutorial folder, create a new file called "sl_regression.jl".
  • 00:03:40
    I'm using "s-l" to stand for Supervised Learning.
  • 00:03:45
    Launch the Julia REPL by using Alt+J then Alt+O.
  • 00:03:53
    Maximize the Terminal panel.
  • 00:03:56
    Change the present working directory to your Tutorial directory.
  • 00:04:01
    Enter the Package REPL by hitting the closing square bracket.
  • 00:04:07
    Activate your Tutorial directory.
  • 00:04:10
    Add the following Packages: CSV, GLM, Plots and TypedTables.
  • 00:04:22
    I have covered all of these packages in previous, non-Machine Learning tutorials.
  • 00:04:27
You'll be happy to know that you do not need to use any special packages in order to get started
  • 00:04:32
    with Machine Learning in Julia.
  • 00:04:36
    Type in "status" to confirm the version numbers.
  • 00:04:40
    Just as an FYI, I'm using Julia version 1.7.1.
  • 00:04:47
    Exit the Package REPL by hitting Backspace.
  • 00:04:51
    Minimize the Terminal panel.
  • 00:04:55
    I'm going to go through this section fairly quickly, since this is not Machine Learning,
  • 00:05:00
    and since I've already covered it in Episode 205.
  • 00:05:04
    Watching Episode 205 is not required for this tutorial.
  • 00:05:10
    For this tutorial, you will need a data set that you can download from my GitHub repository.
  • 00:05:16
    There's a link to it in the description below.
  • 00:05:20
    You can save it by right-clicking on it.
  • 00:05:23
    When you download the file, save it to your tutorial directory.
  • 00:05:28
    It's a CSV file called "housingdata" and it contains a collection of Input-Output pairs
  • 00:05:35
    showing the relationship between the size of houses in square feet and the price of
  • 00:05:39
    the houses.
  • 00:05:41
    I'm not sure what year this data is from, nor can I confirm its accuracy, but it's supposed
  • 00:05:47
    to reflect housing prices in the Portland, Oregon area.
  • 00:05:51
    We will be using this data set as our motivating example of using Regression to predict the
  • 00:05:56
    price of a house based on the size of the house in square feet.
  • 00:06:02
OK, now that we have our data, let's start coding.
  • 00:06:07
    Start by calling the packages.
  • 00:06:14
    Use the CSV package to import the data from the CSV file that you just downloaded.
  • 00:06:25
    You can then use the TypedTables package to convert that data into a Julia Table.
  • 00:06:32
    To make the numbers easier to read, let's divide the housing prices by $1000.
  • 00:06:40
    The GLM package requires data to be in the form of a Table.
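    A minimal Julia sketch of the data setup described here. The column names X and Y are assumptions; check the actual headers in the housingdata CSV file.

        using CSV, GLM, Plots, TypedTables

        data = CSV.File("housingdata.csv")
        X = data.X           # house sizes in square feet (assumed column name)
        Y = data.Y ./ 1000   # prices, scaled to thousands of dollars for readability
        t = Table(X = X, Y = Y)   # the GLM package requires data in Table form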
  • 00:06:45
    OK, now that we have our data set up, let's plot it and take a look at it.
  • 00:07:01
    [typing sounds]
  • 00:07:21
    Subjectively speaking, it looks like there's a linear relationship between the size of
  • 00:07:26
    the house and the price of the house.
  • 00:07:30
    Let's use the GLM package to find the best-fit linear regression line using the Ordinary
  • 00:07:35
    Least Squares method.
  • 00:07:40
    If you haven't used the GLM package before, GLM stands for Generalized Linear Models.
  • 00:07:49
    The Ordinary Least Squares method takes the square of the vertical distance between the
  • 00:07:54
    predicted value versus the actual value along the y-axis and then adds them up.
  • 00:08:01
    By using this code, the GLM package will return the values of the y-intercept and the slope
  • 00:08:07
    of the line that minimizes the value of the sum of the squares.
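    The Ordinary Least Squares fit itself is one line with GLM, assuming the Table t from the setup sketch above:

        ols = lm(@formula(Y ~ X), t)   # prints the coefficients: intercept and slope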
  • 00:08:13
    In order to see the output, you might need to maximize the REPL or scroll up.
  • 00:08:19
    There's a lot of information in the output, but we're only interested in the Coefficients
  • 00:08:24
    listed in the 1st column of numbers.
  • 00:08:28
    The "intercept" is the value of the y-intercept and the coefficient value listed in the 2nd
  • 00:08:34
    row is the slope of the line.
  • 00:08:38
    Make a note of those numbers.
  • 00:08:40
    In a few minutes, we're going to set up a Machine Learning algorithm to see if our computer
  • 00:08:45
    can teach itself how to estimate those values.
  • 00:08:50
    Now that we have an Ordinary Least Squares model, we can add it to our plot.
  • 00:08:55
    The predict() function comes included with the GLM package, and it returns the Y values
  • 00:09:01
    along the linear regression line.
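    For example, the fitted line can be overlaid on the scatter plot like this (the styling options are assumptions):

        plot!(X, predict(ols), color = :red, linewidth = 3)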
  • 00:09:05
    And, that's it.
  • 00:09:07
    Again, subjectively speaking, it looks like that linear regression line does a pretty
  • 00:09:12
    good job of estimating the relationship between the size of the house and the price of the
  • 00:09:17
    house.
  • 00:09:19
    Let's use this model to predict the price of a house given a new value for the size
  • 00:09:23
    of a house.
  • 00:09:30
    So, this model is estimating that a house that is 1,250 square feet will have a price
  • 00:09:36
    around $240 thousand dollars.
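    A sketch of that prediction with GLM, assuming the column name X from before:

        predict(ols, Table(X = [1250]))   # ≈ 240, i.e. about $240 thousand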
  • 00:09:40
    So, that was pretty easy, right?
  • 00:09:44
    If it's this easy to use linear regression without Machine Learning, then why do we need
  • 00:09:49
    Machine Learning?
  • 00:09:51
    Well, it turns out that this non-Machine Learning approach works great for simple data sets.
  • 00:09:58
    In this example, we're using the size of the house as the only independent variable to
  • 00:10:03
    predict the price of the house.
  • 00:10:06
    But, the real world is, of course, more complicated than that.
  • 00:10:12
    What if we wanted to use more independent variables, like the number of bedrooms, the
  • 00:10:17
    number of floors, the age of the house, and so on?
  • 00:10:23
    In order to make more sophisticated predictions, we need a more sophisticated approach in order
  • 00:10:29
to use all of these inputs.
  • 00:10:32
    That's where Machine Learning comes in.
  • 00:10:34
    [golf swing sound effect]
  • 00:10:37
Rather than sitting through a boring lecture about abstract Machine Learning concepts,
  • 00:10:43
    let's dive right into the code and learn-by-doing.
  • 00:10:47
    I'll provide the explanations along the way.
  • 00:10:52
Set up a variable called "epochs", pronounced "ee-poks" or "epics".
  • 00:10:56
    Either pronunciation is acceptable.
  • 00:10:59
    I'll explain what this is in a few minutes.
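    A minimal sketch of that variable; the starting value of zero is an assumption consistent with the iteration count described later:

        epochs = 0   # counts how many times the algorithm has seen all of the data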
  • 00:11:03
    Next, let's re-plot our data.
  • 00:11:07
    You don't need to retype all of this in.
  • 00:11:09
    You can just re-run the code from above.
  • 00:11:12
    But for the sake of completeness, I'm going to copy and paste the plotting code from above.
  • 00:11:18
    The only difference is that I added some text to the title to include the epochs.
  • 00:11:27
    Next, we need to initialize the Parameters.
  • 00:11:39
    So, recall from math class that the formula for any line is y = mx + b, where m is the
  • 00:11:48
    rise-over-run slope of the line and b is the y-intercept of the line.
  • 00:11:55
    In Machine Learning, both m and b are sometimes called Parameters.
  • 00:12:00
    Other times, Machine Learning folks will refer to the slope, m, as a "Weight", and the y-intercept,
  • 00:12:07
    b, as a "Bias".
  • 00:12:10
    This is what I meant earlier when I mentioned that different instructors use different terminologies
  • 00:12:16
    when they're actually referring to the same thing.
  • 00:12:20
    In this tutorial, I'll be referring to both the slope, m, and the y-intercept, b, as "Parameters".
  • 00:12:29
    By changing the values of these Parameters, you can change the characteristics of the
  • 00:12:33
    line.
  • 00:12:35
    As you increase the value of m, then the line becomes steeper.
  • 00:12:40
If you increase the value of b, then the line moves higher along the y-axis.
  • 00:12:47
    By changing just those 2 values, you can define every line in a Cartesian coordinate plane.
  • 00:12:55
    In Machine Learning, the convention is to use theta_0 to refer to the y-intercept and
  • 00:13:02
    theta_1 to refer to the slope of the line.
  • 00:13:06
    The other convention is to initialize both theta_0 and theta_1 to a value of zero.
  • 00:13:13
    By setting both values to zero, that means that our initial line will be a horizontal
  • 00:13:18
    line along the x-axis.
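    Following that convention, the initialization might look like:

        theta_0 = 0.0   # Parameter: the y-intercept (the "b" in y = mx + b)
        theta_1 = 0.0   # Parameter: the slope (the "m" in y = mx + b)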
  • 00:13:22
    Next, we use the Parameters to define our linear regression model, which is just the
  • 00:13:27
    formula for a straight line.
  • 00:13:32
    The letter "h" refers to "Hypothesis" and this is the function that we'll be using to
  • 00:13:37
    make predictions.
  • 00:13:40
    I know that this formula looks a little weird with theta_0 and theta_1, but it's the same
  • 00:13:45
    formula for a line, y = mx + b.
  • 00:13:50
    The reason for this naming convention is because, in theory, you could have more than one feature,
  • 00:13:56
    so it's possible to have a theta_2, a theta_3, and so on.
  • 00:14:02
    But for today, we're going to stick with our basic line.
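    A sketch of the Hypothesis function in Julia. The broadcasting dots let h accept a whole vector of X values, so the initial line can then be drawn with, e.g., plot!(X, h(X)):

        h(x) = theta_0 .+ theta_1 .* x   # the same y = mx + b, with ML naming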
  • 00:14:07
    Let's add this regression line to our plot.
  • 00:14:15
    I don't know if you can see this on the screen, but there's a blue line sitting along the
  • 00:14:20
    x-axis.
  • 00:14:22
    That's the starting point for our linear regression estimate.
  • 00:14:26
    As you can see, it's way off, but we're going to see if we can train our computer to figure
  • 00:14:33
    out a better estimate.
  • 00:14:36
    So, how exactly are we going to train our computer?
  • 00:14:40
    Well, we need some way to measure how good, or how bad, the estimate is.
  • 00:14:48
    For that, we're going to need something called a Cost Function.
  • 00:14:54
    There are different Cost Functions that are used in Machine Learning, but I'm going to
  • 00:14:58
    use the Cost Function from Andrew Ng's course.
  • 00:15:02
    For every value of X, this Cost Function will calculate the vertical distance between the
  • 00:15:07
    predicted value of Y versus the actual value of Y, and then it will square that distance.
  • 00:15:15
    The function will then calculate the average distance between the predicted values of Y
  • 00:15:20
    compared to the actual values of Y, and then divide that by 2.
  • 00:15:26
Those of you with a math background may recognize this as the formula for the Mean Squared Error.
  • 00:15:32
    The only difference is that our Cost Function is divided by 2, which will make it slightly
  • 00:15:37
    easier to deal with partial derivatives later.
  • 00:15:42
    Your computer will try to improve its estimate by changing the values of theta_0 and theta_1.
  • 00:15:49
    Your computer will know when it's found a good solution to the problem when it's able
  • 00:15:53
    to minimize the value of this Cost Function.
  • 00:15:57
    "What?"
  • 00:15:59
    Confused?
  • 00:16:01
    This will make more sense once you see it in action and can see some real numbers.
  • 00:16:06
    For now, let's add the code for this Cost Function.
  • 00:16:12
    Here, the variable "m" refers to the number of samples.
  • 00:16:16
    It does not refer to the slope of the line.
  • 00:16:20
    Other instructors may use the uppercase letter N for the number of samples, but I'm using
  • 00:16:26
    "m" in order to stay consistent with Andrew Ng's notation.
  • 00:16:31
    The variable y_hat is sometimes used for the predicted values of Y, which is how it's being
  • 00:16:37
    used here.
  • 00:16:42
    This is the actual Cost Function that I described earlier.
  • 00:16:46
    The Cost Function is sometimes referred to as the "loss" function.
  • 00:16:53
    The uppercase letter "J" is sometime used to refer to the Cost Function.
  • 00:16:58
    It's not clear to me why the uppercase "J" is used, but it is a convention used elsewhere
  • 00:17:04
    in math.
  • 00:17:06
    This value is the cost when the Parameters theta_0 and theta_1 are both equal to zero,
  • 00:17:12
    for this data set.
  • 00:17:15
    The actual value is not important, but the change in value over time is important, so
  • 00:17:21
    let's keep track of the history of this value.
  • 00:17:25
This is the value that we want to minimize.
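    Putting the pieces above together, a sketch of the Cost Function and its history (the variable names are assumptions):

        function cost(X, Y)
            m = length(X)        # m is the number of samples, not the slope
            y_hat = h(X)         # predicted values of Y from the hypothesis
            return (1 / (2m)) * sum((y_hat .- Y) .^ 2)   # MSE divided by 2
        end

        J = cost(X, Y)       # the cost with theta_0 = theta_1 = 0
        J_history = [J]      # track how the cost changes over time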
  • 00:17:31
    So, here's the million dollar question.
  • 00:17:35
How's your computer supposed to figure out the minimum value of the Cost Function?
  • 00:17:41
    Well, we know that we can define any line simply by adjusting the values of the y intercept
  • 00:17:47
    and the slope.
  • 00:17:49
    So, we need to set up a process where our computer can make a slight adjustment to the
  • 00:17:54
    values of both the y-intercept and the slope and then check to see if that made things
  • 00:17:59
    better or worse.
  • 00:18:02
    Based on the result, the computer can make another slight adjustment to the values of
  • 00:18:07
    both the y-intercept and the slope and then check again to see if that made things better
  • 00:18:12
    or worse.
  • 00:18:15
    By following this process, your computer can, quote-unquote, "learn" how to improve the
  • 00:18:20
    estimate through its experience.
  • 00:18:23
    Your computer can repeat this process as many times as necessary until it finds an optimal
  • 00:18:28
    solution.
  • 00:18:31
    The process that I just described is known as the Gradient Descent Algorithm.
  • 00:18:38
    We'll be using a version called the "Batch" Gradient Descent Algorithm.
  • 00:18:44
    The math involved in deriving the gradient descent formulae is a bit complicated, so
  • 00:18:49
    I'm going to skip over it and simply present the formulae to use.
  • 00:18:53
    These formulae are the result of taking partial derivatives of the Cost Function.
  • 00:18:59
    You do not need to know what a partial derivative is in order to use these formulae.
  • 00:19:05
    This is the function that will determine how to adjust the value of the y-intercept.
  • 00:19:17
    And this is the function that will determine how to adjust the value of the slope.
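    Hedged sketches of those two adjustment functions, which come from the partial derivatives of the Cost Function with respect to each Parameter:

        # adjustment for theta_0, the y-intercept
        pd_theta_0(X, Y) = (1 / length(X)) * sum(h(X) .- Y)

        # adjustment for theta_1, the slope
        pd_theta_1(X, Y) = (1 / length(X)) * sum((h(X) .- Y) .* X)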
  • 00:19:29
    Now, we need to set the value for something called the Learning Rate, which is sometimes
  • 00:19:33
    referred to by the Greek letter alpha.
  • 00:19:37
    In Machine Learning, the Learning Rate is known as a Hyperparameter to distinguish it
  • 00:19:42
    from other Parameters, like theta_0 and theta_1.
  • 00:19:47
    That's an 8 with 7 zeros in front of it.
  • 00:19:52
    The Learning Rate is used to prevent your computer from making large adjustments to
  • 00:19:56
    the values of either the y-intercept or the slope.
  • 00:20:00
    You can use any values you want for the Learning Rates.
  • 00:20:05
    I have to admit that I cheated here.
  • 00:20:08
    By convention, the default Learning Rate is set to 0.01 for all Parameters, but I found
  • 00:20:15
    that it was taking too long to find a solution, which isn't great since I was trying to make
  • 00:20:19
    a tutorial out of this.
  • 00:20:21
    Using these values will allow your computer to find an optimal solution faster, just for
  • 00:20:26
    this example.
  • 00:20:28
    One of the issues here is that the values along the X-axis are much larger compared
  • 00:20:33
    to the values along the Y-axis.
  • 00:20:36
    In future tutorials, we'll discuss methods to adjust for this, but for today, we'll leave
  • 00:20:41
    the values as they are.
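    A sketch of the two Learning Rates. The slope's rate is the "8 with 7 zeros" mentioned above; the intercept's rate shown here is a hypothetical stand-in, since the transcript doesn't spell it out:

        alpha_0 = 0.09          # hypothetical Learning Rate for the y-intercept
        alpha_1 = 0.00000008    # Learning Rate for the slope ("an 8 with 7 zeros")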
  • 00:20:44
    For now, let's write the code to make this Gradient Descent Algorithm come alive.
  • 00:20:55
    The use of temp variables here will allow us to make simultaneous updates to the values
  • 00:20:59
    of both the y-intercept as well as the slope.
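    One step of Batch Gradient Descent might then look like this: compute both adjustments into temp variables first, apply them together, and record the new cost:

        theta_0_temp = pd_theta_0(X, Y)
        theta_1_temp = pd_theta_1(X, Y)

        theta_0 -= alpha_0 * theta_0_temp
        theta_1 -= alpha_1 * theta_1_temp

        epochs += 1
        push!(J_history, cost(X, Y))   # recalculate and keep track of the cost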
  • 00:21:14
    So, if you look at the values, you can begin to see what's going on.
  • 00:21:21
    The value of theta_0, or the y-intercept, has changed from 0 to 31, which means that
  • 00:21:29
    the new line has moved up slightly along the y-axis.
  • 00:21:34
    Also, the value of theta_1, or the slope, has changed from 0 to 0.06, which means that
  • 00:21:42
the line has gone from being horizontal to sloping up slightly.
  • 00:21:48
    In theory, these changes should improve our estimate.
  • 00:21:52
    Now, let's go back up to theta_0_temp and theta_1_temp.
  • 00:21:59
    The actual values are not important here, but note that both of the values are negative
  • 00:22:04
    and note the magnitude of the values.
  • 00:22:08
    So, what are these numbers?
  • 00:22:12
    Without getting into the math, here's how you can think about them.
  • 00:22:17
    Your computer is asking itself the question, in order to improve my estimate, in what direction,
  • 00:22:24
    and by what magnitude, do I need to change the values of my Parameters?
  • 00:22:30
    Does the change need to be positive, or negative?
  • 00:22:34
    Does the change need to be big, or small?
  • 00:22:38
    These values provide the answers to those questions.
  • 00:22:43
    Let's recalculate the cost to see if this theory is true.
  • 00:22:51
    You see?
  • 00:22:52
    The value of the cost dropped from around 65,000 to around 20,000.
  • 00:22:59
    Again, the actual value is not important, but the direction and the magnitude of the
  • 00:23:04
    drop is significant.
  • 00:23:07
    Making those slight adjustments to the values of the y-intercept and the slope made a huge
  • 00:23:12
    improvement in the cost.
  • 00:23:15
    Let's keep track of this value and then add it to our plot.
  • 00:23:25
    [typing sounds]
  • 00:23:38
    You should see a new, blue line on your plot showing the latest estimate using the new
  • 00:23:43
    values of theta_0 and theta_1.
  • 00:23:47
    It's a big improvement from the initial horizontal line along the x-axis, but it's still pretty
  • 00:23:52
    far away from the actual data points.
  • 00:23:55
    This is a good time to point out that this is an iterative process, so your computer
  • 00:24:00
    will need to repeat these steps in order to continue to improve.
  • 00:24:06
    The variable named "epochs" is keeping track of the number of iterations.
  • 00:24:11
    Epoch is another example of a Hyperparameter.
  • 00:24:16
    The term "epoch" is very specific in Machine Learning.
  • 00:24:20
    Technically, Epoch means the number of times that your Machine Learning algorithm has seen
  • 00:24:25
    all of the data.
  • 00:24:27
    The term "iteration" is used slightly differently in other Machine Learning scenarios, but in
  • 00:24:32
    this case, both Epoch and Iteration mean the same thing.
  • 00:24:39
    You could write a for-loop to automate this iterative process, but since we're learning
  • 00:24:44
    fundamentals, let's go through this manually to see how the numbers change as we repeat
  • 00:24:48
    this process.
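    A hedged sketch of that for-loop, simply wrapping the manual steps from above (the iteration count is a placeholder):

        for i in 1:500   # hypothetical number of additional iterations
            theta_0_temp = pd_theta_0(X, Y)
            theta_1_temp = pd_theta_1(X, Y)
            global theta_0 -= alpha_0 * theta_0_temp
            global theta_1 -= alpha_1 * theta_1_temp
            global epochs += 1
            push!(J_history, cost(X, Y))
        end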
  • 00:24:58
    Note that the value of the cost dropped again.
  • 00:25:07
    You should see another blue line added to your plot.
  • 00:25:11
    This time it's much closer to the data points, but still appears to be off.
  • 00:25:17
    Let's repeat this process a few more times.
  • 00:25:20
    I'll fast forward through this part.
  • 00:25:26
    [typing sounds]
  • 00:25:40
So, the improvements appear to be slowing down, and that last estimate looks pretty close
  • 00:25:45
    to what we calculated using the GLM package.
  • 00:25:49
    Just to be sure, let's add the linear regression line that we calculated before.
  • 00:25:58
    That looks pretty close, but it's not exactly the same.
  • 00:26:03
    In theory, if you kept repeating this process, the Machine Learning model would eventually
  • 00:26:07
    converge with the model that we calculated using the GLM package.
  • 00:26:13
    If you scroll up, you can see that the last estimate for the y-intercept, or theta_0 is
  • 00:26:18
    around 68, and the estimate for the slope, or theta_1, is around 0.14.
  • 00:26:27
    If you maximize the REPL and scroll up, you should be able to find the coefficients calculated
  • 00:26:32
    by the GLM package.
  • 00:26:35
    GLM calculated a y-intercept of around 71 and a slope of around 0.13.
  • 00:26:43
    So, overall, our Machine Learning algorithm got pretty close, which is amazing since it
  • 00:26:48
    figured out those values on its own.
  • 00:26:51
    "Wow!"
  • 00:26:53
    Let's take a look at a so-called "Learning Curve" to see the progress over time.
  • 00:27:04
    [typing sounds]
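    A sketch of the Learning Curve plot, assuming one recorded cost per epoch plus the initial cost at epoch zero:

        plot(0:epochs, J_history,
            title = "Learning Curve",
            xlabel = "Epochs",
            ylabel = "Cost",
            legend = false)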
  • 00:27:17
    The actual values are not important, but the overall shape of the curve is important.
  • 00:27:23
    The plot shows some dramatic improvements initially, but it converges near a minimum
  • 00:27:28
    value over time.
  • 00:27:31
    As a final sanity check, let's use our Machine Learning-generated model to make a price prediction
  • 00:27:37
    based on a new value for the square footage of a house.
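    That prediction is just a call to the Hypothesis function with the new square footage, here reusing the 1,250 sq ft example from earlier:

        h(1250)   # ≈ 237, i.e. about $237 thousand, per the transcript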
  • 00:27:41
    The value is displayed in the REPL.
  • 00:27:44
    So, our Machine Learning model is predicting a price around $237 thousand dollars.
  • 00:27:51
    Let's compare that with the value estimated by the GLM model.
  • 00:27:59
    The GLM model is predicting a price around $239 thousand dollars, which is not the same,
  • 00:28:05
    but it's pretty close, which means that we have successfully implemented our very first
  • 00:28:11
    Machine Learning algorithm!
  • 00:28:12
    [air horn sound effect]
  • 00:28:15
    I know that I covered a lot today, so let's take a few minutes to recap what we just learned.
  • 00:28:21
    [golf swing sound effect]
  • 00:28:25
    Today, we got an introduction to Supervised Learning.
  • 00:28:29
    Supervised Learning is when the user provides both the Inputs and the Outputs, and then
  • 00:28:35
    the computer tries to estimate the Function that determines the relationship between the
  • 00:28:39
    Inputs and the Outputs.
  • 00:28:43
    Supervised Learning may be further split between Regression and Classification.
  • 00:28:49
    Today, we specifically learned about Supervised Learning using a Regression algorithm.
  • 00:28:56
    By Regression, we mean that we're trying to understand the relationship between some independent
  • 00:29:01
    variable X in order to predict the value of a dependent variable Y.
  • 00:29:07
    As a motivating example, we examined a data set showing the relationship between the size
  • 00:29:12
    of the houses in square feet and the price of the houses.
  • 00:29:17
    After a quick review of how to calculate a linear regression using a traditional, non-Machine
  • 00:29:22
    Learning approach, we dove right in to create a Supervised Machine Learning algorithm that
  • 00:29:28
    allowed our computer to learn on its own how to derive an estimate for the linear regression
  • 00:29:33
    line.
  • 00:29:36
    Some of the key elements of the Machine Learning workflow include:
  • 00:29:40
    Defining a Hypothesis Model in order to make predictions.
  • 00:29:45
    Defining a Cost Function in order to measure performance.
  • 00:29:50
    Using a Gradient Descent Algorithm to incrementally improve performance.
  • 00:29:56
    Iterating until the value of the Cost Function converged on a minimum value.
  • 00:30:03
    For this simple example, using a Machine Learning algorithm may have been excessive, but working
  • 00:30:09
    through this motivating example allowed us to get an introduction to many of the fundamental
  • 00:30:14
    concepts in Machine Learning.
  • 00:30:19
    I know that I glossed over some of the explanations, but at this point, I think that it's more
  • 00:30:24
    important to get a feel for the terminology, the code and the overall workflow using a
  • 00:30:30
    concrete example, rather than getting lost in some abstract exposition.
  • 00:30:37
    We'll be seeing these Machine Learning concepts over and over throughout this series, so we'll
  • 00:30:42
    have an opportunity to examine them in more detail as time goes on.
  • 00:30:47
    For now, you can be proud that you have officially moved from theory into practice, so you are
  • 00:30:54
    well on your way on this exciting, Machine Learning adventure.
  • 00:30:58
    [golf swing sound effect]
  • 00:31:02
    Well, that's all for today.
  • 00:31:05
    If you made it this far, congratulations!
  • 00:31:07
    [kids cheering sound effect]
  • 00:31:10
    If you enjoyed this video, and you feel like you learned something new, please, give it
  • 00:31:15
    a thumbs up!
  • 00:31:17
    For more wholesome Julia tutorials, please be sure, to Subscribe and hit that bell!
  • 00:31:24
    If you like what I do, then please consider Joining and becoming a Channel Member.
  • 00:31:30
    New tutorials are posted on Sundays-slash-Mondays.
  • 00:31:34
    Thanks for watching, and I'll see you, in the next video.
  • 00:31:38
    "Wow!"
Tags
  • Machine Learning
  • Supervised Learning
  • Regression
  • Julia
  • Data Science
  • Coding Tutorial
  • Gradient Descent
  • Cost Function
  • Hypothesis Model
  • House Pricing Prediction