Simple Linear Regression: Basic Concepts Part I

00:45:33
https://www.youtube.com/watch?v=BLRjywb0mes

Summary

TL;DR: This tutorial on simple linear regression explains how to analyze the relationship between two variables using regression analysis. It covers the concepts of dependent and independent variables, the regression model, and how to calculate the slope and intercept using the least squares method. It also discusses the significance of the slope, the coefficient of determination (R²), and the correlation coefficient (R) for assessing the model's fit. Additionally, it introduces hypothesis testing for the slope to determine whether a significant relationship exists between the variables, providing a comprehensive understanding of simple linear regression and its applications.

Key Takeaways

  • 📊 Simple linear regression models the relationship between two variables.
  • 🔍 Dependent variable (Y) is what we predict; independent variable (X) is what we use to predict.
  • 📈 The slope indicates the direction and steepness of the relationship.
  • 📉 R² measures how well the independent variable explains the variability in the dependent variable.
  • 🧮 The least squares method minimizes the sum of squared differences between observed and predicted values.
  • 🔬 Hypothesis testing can determine if the slope is significantly different from zero.
  • 📉 A correlation coefficient (R) close to 1 or -1 indicates a strong relationship.
  • 📊 Multiple regression involves more than one independent variable.
  • 📈 Regression analysis can be applied in various fields for predictions.
  • 📉 Understanding regression helps in making informed decisions based on data.

Timeline

  • 00:00:00 - 00:05:00

    This tutorial introduces simple linear regression, focusing on understanding the relationship between two variables, such as advertising expenditures and sales, or practice time and errors. It explains the concepts of dependent and independent variables, using Y for the dependent variable and X for the independent variable, and emphasizes the goal of regression analysis in predicting outcomes based on these variables.

  • 00:05:00 - 00:10:00

    The tutorial defines simple linear regression, highlighting that it involves one independent variable and one dependent variable, and aims to fit a straight line to the data to approximate their relationship. It contrasts simple regression with multiple regression, which involves more than one independent variable.

  • 00:10:00 - 00:15:00

    The regression model is presented as Y = β₀ + β₁X + ε, where β₀ is the y-intercept, β₁ is the slope, and ε represents the random error in predictions. The tutorial explains the significance of these components in understanding the regression line and its relationship with the data.

  • 00:15:00 - 00:20:00

    Examples of regression lines are provided, illustrating positive, negative, and no relationships between X and Y. The tutorial explains how the slope (β₁) indicates the direction and steepness of the relationship, while the y-intercept (β₀) shows the value of Y when X is zero.

  • 00:20:00 - 00:25:00

    The tutorial discusses estimating the regression line using sample data, introducing the concept of a scatter diagram to visualize the relationship between X and Y. It explains how to plot data points and suggests that a linear model can be created to describe the relationship between the variables.

  • 00:25:00 - 00:30:00

    Using the least squares method, the tutorial explains how to find the best-fitting line by minimizing the distance of data points from the regression line. It introduces the formula for calculating the slope and Y-intercept, emphasizing the importance of these calculations in defining the regression line.

  • 00:30:00 - 00:35:00

    The tutorial provides a step-by-step calculation of the slope and Y-intercept using sample data, demonstrating how to derive these values from the observed data points. It emphasizes the need for accurate calculations to define the regression line effectively.

  • 00:35:00 - 00:40:00

    The tutorial explains how to use the regression line to make predictions, illustrating this with an example of predicting grades based on hours studied. It highlights the importance of assessing the fit of the regression line to the data and introduces the coefficient of determination (R²) as a measure of fit.

  • 00:40:00 - 00:45:33

    Finally, the tutorial discusses hypothesis testing for the slope of the regression line, explaining how to determine if there is a significant relationship between the variables. It covers both the critical value and P-value approaches to hypothesis testing, concluding that there is evidence of a significant linear relationship between the variables studied.

Video Q&A

  • What is simple linear regression?

    Simple linear regression is a statistical method used to model the relationship between two variables by fitting a straight line to the data.

  • What are dependent and independent variables?

    The dependent variable is the outcome we are trying to predict (Y), while the independent variable is the predictor (X).

  • How do you calculate the slope and intercept in regression?

    The slope (B1) and intercept (B0) can be calculated using formulas derived from the least squares method.
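
Those formulas can be written out directly. A minimal sketch, using hypothetical data that lies exactly on y = 1 + 2x (the function name is illustrative, not from the video):

```python
# Least-squares estimates:
#   b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)²
#   b0 = ȳ - b1·x̄
def least_squares(xs, ys):
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1

b0, b1 = least_squares([1, 2, 3, 4], [3, 5, 7, 9])  # exactly y = 1 + 2x
print(b0, b1)  # → 1.0 2.0
```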

  • What does R² represent in regression analysis?

    R², or the coefficient of determination, measures the proportion of variability in the dependent variable that can be explained by the independent variable.
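
A sketch of that definition, using the standard decomposition R² = 1 − SSE/SST (the helper name is illustrative):

```python
# R² = 1 - SSE/SST, where SSE = Σ(yi - ŷi)² is the sum of squared
# residuals and SST = Σ(yi - ȳ)² is the total sum of squares.
def r_squared(ys, y_hats):
    y_bar = sum(ys) / len(ys)
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hats))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst

print(r_squared([2, 4, 6], [2, 4, 6]))  # → 1.0 (perfect predictions)
print(r_squared([2, 4, 6], [4, 4, 4]))  # → 0.0 (predicting the mean explains nothing)
```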

  • What is the significance of the slope in regression?

    The slope indicates the direction and steepness of the relationship between the independent and dependent variables: how much Y changes for each one-unit increase in X.

  • How do you test the significance of the regression slope?

    You can test the significance of the slope using a t-test to determine if it is significantly different from zero.
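
A sketch of that t statistic, using the standard formulas t = b₁ / SE(b₁), with SE(b₁) = s/√Sxx and s² = SSE/(n − 2). The data is hypothetical; in practice the resulting t would be compared against a critical value from the t distribution with n − 2 degrees of freedom:

```python
import math

# t statistic for H0: slope = 0.
def slope_t_stat(xs, ys):
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    # SSE from the fitted line, then the standard error of b1.
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    se_b1 = math.sqrt(sse / (n - 2) / sxx)
    return b1 / se_b1

t = slope_t_stat([1, 2, 3, 4], [2, 1, 4, 3])
print(round(t, 4))  # → 1.0607
```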

  • What is the correlation coefficient?

    The correlation coefficient (R) measures the strength and direction of the linear relationship between two variables.
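
A sketch of the Pearson correlation formula, again with hypothetical data:

```python
import math

# r = Σ(xi - x̄)(yi - ȳ) / sqrt(Σ(xi - x̄)² · Σ(yi - ȳ)²)
def pearson_r(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3], [2, 4, 6]))  # → 1.0  (perfect positive relationship)
print(pearson_r([1, 2, 3], [6, 4, 2]))  # → -1.0 (perfect negative relationship)
```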

  • What is the least squares method?

    The least squares method is a statistical technique used to minimize the sum of the squares of the residuals (the differences between observed and predicted values).

  • What is the difference between simple and multiple regression?

    Simple regression involves one independent variable, while multiple regression involves two or more independent variables.

  • How can regression analysis be applied in real life?

    Regression analysis can be used in various fields, such as economics, biology, and social sciences, to make predictions and understand relationships between variables.

Transcript (en)
  • 00:00:00

    Welcome to this tutorial on simple linear regression. What we will be learning in this tutorial is how to better understand the relationship of two or more variables using regression analysis. For example, let's say we want to study the relationship between advertising expenditures and sales. What might we expect? Usually, the more we spend on advertising, the more sales we should have, so we would want to see if, as advertising expenditures increase, sales increase as well, and by how much. We might also want to study the number of hours we practice some task and the number of errors we make; in this case, we would expect that as we practice more, the number of errors should decrease.

  • 00:00:48

    What we are interested in doing with regression analysis is to develop a model that helps us understand whether two variables are related, and that helps us make certain predictions. For example, if we can develop a model of the relationship between advertising expenditures and sales, we would be able to use that model to predict the sales for a given level of advertising.

  • 00:01:10

    In regression we define what is called a dependent variable, the variable we are trying to predict, and an independent variable, the variable we use to predict the dependent variable. We will use the letter Y to represent the dependent variable and the letter X to represent the independent variable. If we are trying to predict sales based on advertising expenditures, then Y would be sales (that's what we're trying to predict) and advertising expenditures would be our independent variable X, because that's what we're using to predict sales. What about the relationship between practice time and number of errors? In that case we are saying that as practice time increases, we expect the number of errors to decrease, so we would try to predict the number of errors (our dependent variable Y) based on practice time (our independent variable X).
  • 00:02:13

    Let's talk a little more about what simple linear regression is. The term "simple" refers to the number of independent variables we are using: in simple regression we have only one independent variable and one dependent variable, that is, one X and one Y, using that one X to predict Y. The term "linear" refers to the type of model we are trying to create: we are trying to fit a straight line to the data to approximate the relationship between X and Y. So what we are doing in this tutorial is simple linear regression: simple because we have only one independent variable, and linear because we are trying to explain the relationship between X and Y using a straight line.

  • 00:03:00

    We can also use more than one independent variable when we feel it will help with the fit of the model; when we use more than one independent variable, it is called multiple regression. Now that we understand what simple regression is, let's take a look at the model.
  • 00:03:18

    The model looks like this: Y = β₀ + β₁X + ε, where ε is the Greek letter epsilon. This model is for the population; we will use another model with b's instead of betas when we discuss the estimated model from a sample, much like we used x̄ instead of μ to approximate the population mean.

  • 00:03:40

    Let's look at the components of this model a little more closely. β₀ is the y-intercept of the line we are defining for this model; we will refer to this line as the regression line, or the line of regression. So β₀ is the y-intercept, meaning it is the point where the line crosses the y-axis. You can also think of it as the value of Y when X is zero: at the y-axis, the value of X is zero, so whatever Y is at that point is the y-intercept β₀. Next we have β₁, which represents the slope of the regression line. The slope tells us two things: whether the line is increasing or decreasing, and how steep it is. The last component of the model is ε, which stands for the error in our prediction model; as good as our model is, there is always a random error term that cannot be accounted for.
  • 00:04:42

    Let's take a look at some examples of the regression line. Here we have an example that shows a line sloping upward. Because the line slopes upward, it shows a positive, or increasing, relationship between X and Y: as X increases, so does Y. The line slopes upward, and therefore β₁, the slope, will be a positive number. The line itself is called the regression line, or the line of regression, and the y-intercept β₀ is where the regression line hits the y-axis.

  • 00:05:21

    This next example shows a downward-sloping line and depicts a negative linear relationship between X and Y. With a negative relationship, as X increases, Y decreases, and that is why the line slopes downward; so β₁, the slope, would be a negative number.

  • 00:05:47

    Here is another example, where we have a flat line across the graph: as X increases, Y remains the same, so there is no relationship between X and Y. In this scenario β₁, the slope, would equal zero. So when there is no linear relationship between X and Y, the slope is zero; the regression line is the flat line going across, and the y-intercept β₀ is where it hits the y-axis.
  • 00:06:22

    β₀ and β₁ are the population parameters for the y-intercept and the slope. Just like μ refers to the population parameter for the mean and the Greek letter σ to the standard deviation, β₀ and β₁ are the population parameters of the y-intercept and the slope. We don't know the true population parameters unless we take a complete census, so when our data come from a sample, we are getting what are called sample statistics rather than true population parameters. In regression, b₀ and b₁ are used to estimate the true population parameters β₀ and β₁, just like we used x̄ to estimate μ and s to estimate σ.

  • 00:07:13

    So for a sample, the estimated simple linear regression equation looks like this: ŷ = b₀ + b₁x. The y with something that looks like a hat on it is called y-hat, and in the equation ŷ refers to the estimated or predicted value of Y for a given x value; b₀ is the y-intercept of the line and b₁ is the slope. So now we have all the components we need to define a straight line: the slope and the y-intercept.
  • 00:07:48

    Now we are ready to look at some sample data so that we can calculate the estimated slope and y-intercept. Let's start with a graph called a scatter diagram, which shows the relationship between X and Y. We begin by drawing a horizontal axis labeled X and a vertical axis labeled Y. On the x-axis we plot the independent variable, and on the y-axis the dependent variable. For our example, number of hours studied is the independent variable and grade on the exam is the dependent variable.

  • 00:08:26

    Let's take a look at some sample data: two columns of numbers, one with the number of hours students studied and a second with their corresponding grades on an exam. There are 10 pairs of numbers. To begin, we draw our horizontal and vertical axes and label them, putting number of hours studied on the x-axis and grade on the y-axis. Remember, X is always the independent variable, what we use to predict Y, the dependent variable; in this example we use number of hours studied to predict the grade.

  • 00:09:16

    Now let's put some tick marks and numbers on the axes so we can begin plotting the data. The first x is 2 and its corresponding y is a grade of 69, so we put a dot at the coordinates (2, 69). The next pair of coordinates is (9, 98), and the third is (5, 82). Doing this for the remaining x's and y's gives a scatter diagram. (With graph paper it would be a little more accurate, but this is the best we can do by eyeballing it.) We can see that the plot seems to show a positive relationship between X and Y: as X, the number of hours, increases, so does Y, the grade. We can also see roughly where a straight line would fit the data, so we can create a linear model to describe this relationship between X and Y by finding the slope and y-intercept that define the line that fits this data best.
  • 00:10:34

    To find the line that best fits this data, we will use our sample data to define the line of regression. That line will be the ŷ line, where ŷ is the predicted value of Y for a given X; for our example, it is the predicted grade on an exam for a given number of hours studied. b₀ is the y-intercept of the line, the value of Y when X is zero: if x is zero, the number of hours studied is zero, and b₀ would be the grade for zero hours studied. b₁ is the slope of the line; it tells us whether there is an increasing or decreasing relationship between X and Y and how steep the line is. For this example we expect to find a positive slope, because there is a positive relationship between X and Y. Another way to put it: the slope tells us how much Y increases for every one-unit increase in X. The last letter in the equation is x, the number of hours studied; we multiply x by b₁ and then add b₀ to get ŷ.
  • 00:12:02

    Okay, let's get that line using the least squares method. The ŷ regression line fits the data best when the distance of each data point from the line is at its minimum. In other words, we want to minimize the distance of each observed yᵢ from its corresponding predicted ŷᵢ, and that is what the formula in the red box means: min Σ(yᵢ − ŷᵢ)². "Min" means minimize, and each yᵢ is an observed grade (we have 10 of those for this example), so we want to minimize the squared differences between the observed yᵢ values and the line of regression.

  • 00:12:45

    That gives the line that fits the data best. You can see how a straight line was drawn through the plotted coordinates from the sample data, eyeballed so that it roughly minimizes the differences between those points and the line, both above and below it. But we need a formula to find the exact line, by finding two values: a slope and a y-intercept.

  • 00:13:18

    Take a look at the formula in the red box again: yᵢ is the observed value of Y for the i-th observation, and ŷᵢ is the predicted value of Y for that same observation. Say we take an x value of 3, that is, three hours studied. On the graph we find 3 on the x-axis, look up to the line of regression, and over to where that point is on the y-axis, and we read off a predicted y value, ŷ, of about 69. Once we have an exact equation for the regression line we will be able to predict that value more exactly, but it is approximately 69. Now, if you look back at the original sample data from a few slides back, you will see there was an observation with x = 3, that is, 3 hours studied, and a corresponding observed y value of 71. So yᵢ represents the actual observed value, 71, from the sample data, and ŷᵢ represents the predicted value we get from the line of regression. Our task is to define a straight line that minimizes the differences, or deviations, from each of those dots to the line, and that is what the formula in the red box says: take each yᵢ minus each ŷᵢ and minimize the squared difference. Now that we understand what the best-fitting line is, we need to calculate its slope and its y-intercept.
  • 00:15:03

    Let's first start with the slope. Here is the formula: b₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)². We define xᵢ as the value of X for observation i; in our example we have 10 observations for X, the number of hours studied, so we have x₁, x₂, x₃, and so on up to x₁₀. Likewise, yᵢ is the value of Y for the i-th observation, and we have 10 of those, one for each corresponding xᵢ. x̄ is the average of the x's: we add up all of the x's and divide by 10 in this case. ȳ is the average of the y's: we add up all the grades and divide by how many there are. Once we plug in all of these numbers and calculate the slope, we can calculate the y-intercept b₀ using the formula b₀ = ȳ − b₁x̄. Notice that this formula contains the slope b₁, so we must calculate the slope first.

  • 00:16:11

    Here is the sample data with the 10 observations of X and Y, number of hours studied and grade on exam, in two columns: one for the xᵢ and one for the yᵢ. Now let's make more columns with the numbers we need to calculate the slope. The third column holds the xᵢ − x̄ values, the first component in the numerator of the slope formula. Next we need a column for yᵢ − ȳ, the other component in the numerator. The numerator multiplies those two, so the fifth column is the product of the third and fourth columns, which completes the numerator. Finally, the last column is the third column squared, which gives the denominator of the slope formula.

  • 00:17:14

    Before we fill in these columns, we first need x̄ and ȳ. Adding up all of the x's gives 48, so x̄ = 48 / 10 = 4.8. Adding up all the y's gives 778, so ȳ = 778 / 10 = 77.8.
  • 00:17:45

    Now that we have x̄ and ȳ, we are ready to calculate all the numbers in the third column. Make sure you understand where these numbers come from. The first number is −2.8: the column header says xᵢ − x̄, and x₁ is 2 while x̄ is 4.8, so 2 − 4.8 = −2.8. For the next number we get 4.2 by subtracting x̄ from the next xᵢ: 9 − 4.8 = 4.2, and so on down the whole column.

  • 00:18:48

    For the next column, yᵢ − ȳ, we take each yᵢ and subtract ȳ. The first yᵢ is 69 and ȳ is 77.8, so 69 − 77.8 = −8.8, and so on down the whole column. Now that we have columns three and four, we can get column five, which is each number in column three times each number in column four. The first number is 24.64, which we get from (−2.8) × (−8.8) = 24.64, and so on down the whole column. Finally, for the last column we take the third column and square each value: (−2.8)² = 7.84, and so on.

  • 00:19:51

    Now we're almost ready to get the slope. We first need the sum of the products column, which is 320.5; if you look at the formula for the slope, the numerator is the sum of these numbers (the capital sigma sign tells us to sum). The next step is to sum the last column, which gives 67.6, the sum for the denominator of the slope formula. Putting this all together, the slope is b₁ = 320.5 / 67.6 = 4.74.
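
As a quick arithmetic check of the worked example, dividing the two column sums quoted in the video reproduces the slope:

```python
# Slope b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)², using the sums from the video.
numerator = 320.5    # sum of the products column
denominator = 67.6   # sum of the squared-deviations column
b1 = numerator / denominator
print(round(b1, 2))  # → 4.74
```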
  • 00:20:55
    Y intercept B not we will need xar and Y
  • 00:21:00
    Bar to calculate B naught remember we
  • 00:21:02
    already calculated X bar as 4.8 and we
  • 00:21:06
    found Y Bar was 77.8 on a previous slide
  • 00:21:10
    so now we have all the numbers we need
  • 00:21:12
    to calculate B naught the Y intercept and
  • 00:21:15
    we get B naught is equal to
  • 00:21:18
    77.8 minus
  • 00:21:20
    4.74 * 4.8 where 77.8 is Y
  • 00:21:30
    Bar and 4.8 is X bar and where did 4.74
  • 00:21:35
    come from that comes from here the slope
  • 00:21:38
    and we get a y intercept of
  • 00:21:44
    55.048 so now we have our slope and our Y
  • 00:21:48
    intercept and now we can Define our y
  • 00:21:50
    hat line substituting B naught and B1 in
  • 00:21:54
    this line we get our y hat line
  • 00:21:58
    y hat is equal to 55.048 +
  • 00:22:03
    4.74 * X now we can use this y hat line
  • 00:22:08
    also known as our line of regression to
  • 00:22:10
    predict any y for a given x value we
  • 00:22:13
    would just plug in the x value and out
  • 00:22:16
    comes y hat the predicted Y value
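As a quick aside, the slope and intercept arithmetic can be checked with a few lines of Python. This is only a sketch built from the summary numbers worked out above (the column sums 320.50 and 67.6, and the means 4.8 and 77.8); rounding the slope to two decimals mirrors the tutorial.

```python
# Least-squares slope and intercept from the tutorial's column sums.
s_xy = 320.50            # sum of (x_i - x_bar) * (y_i - y_bar)
s_xx = 67.6              # sum of (x_i - x_bar) ** 2
x_bar, y_bar = 4.8, 77.8

b1 = round(s_xy / s_xx, 2)   # slope, rounded to 4.74 as in the video
b0 = y_bar - b1 * x_bar      # intercept, 55.048

def y_hat(x):
    """Predicted y (grade) for a given x (hours studied)."""
    return b0 + b1 * x

print(b1, round(b0, 3), round(y_hat(3), 3))   # 4.74 55.048 69.268
```

Calling `y_hat(3)` reproduces the 69.268 prediction for three hours of study.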
  • 00:22:18
    let's start on a fresh page here
  • 00:22:21
    where we have our estimated regression
  • 00:22:23
    line that we just calculated now what
  • 00:22:26
    we're going to do is use this line of
  • 00:22:28
    regression expression to predict y for a
  • 00:22:30
    given x value so suppose the number of
  • 00:22:33
    hours studied is three that is our given
  • 00:22:36
    x value remember when we eyeballed it we
  • 00:22:39
    said it was around
  • 00:22:40
    69 let's see what it would be using this
  • 00:22:44
    y hat line what would be the predicted
  • 00:22:46
    grade on the exam so what we're asking
  • 00:22:49
    is when X is three what is the predicted
  • 00:22:53
    value of y to answer this question we
  • 00:22:56
    use our line of regression our y hat
  • 00:22:59
    line and substitute in the number three
  • 00:23:02
    for the letter X as you see here so we
  • 00:23:05
    have y hat is equal to
  • 00:23:08
    55.048 + 4.74 * 3 instead of the letter X and
  • 00:23:14
    this would give us our y hat our
  • 00:23:16
    predicted value of y for a given X and
  • 00:23:19
    that is
  • 00:23:20
    69.268 so if a student studies 3 hours
  • 00:23:26
    we would predict the grade to be
  • 00:23:29
    69.268 how good a prediction is this well
  • 00:23:33
    that depends on how good a fit the
  • 00:23:35
    regression line is to the data anyone
  • 00:23:38
    can draw a straight line through any
  • 00:23:39
    data points and Define it mathematically
  • 00:23:42
    with a slope and a y intercept but that
  • 00:23:44
    doesn't mean it's a good fitting model
  • 00:23:47
    even if there is no relationship between
  • 00:23:49
    X and Y we could still mathematically
  • 00:23:51
    Define a straight line that fits the
  • 00:23:53
    data the best but it would not be a good
  • 00:23:55
    fit so we need a measurement that tells
  • 00:23:58
    us how well the regression line
  • 00:24:00
    fits the data one such measurement is
  • 00:24:03
    called the coefficient of determination
  • 00:24:06
    and it tells us how good a fit the
  • 00:24:08
    regression line is to our data the
  • 00:24:10
    formula for the coefficient of
  • 00:24:12
    determination is shown here in the red
  • 00:24:14
    box R 2 is the coefficient of
  • 00:24:17
    determination and to calculate it we
  • 00:24:19
    take something called SSR and divide it by
  • 00:24:23
    SST where SSR is defined as the sum of
  • 00:24:27
    the squares due to regression the way
  • 00:24:30
    we calculate SSR is to take the sum of
  • 00:24:33
    the squared deviations of each predicted
  • 00:24:36
    value of y That's each y hat and
  • 00:24:39
    subtract Y Bar the average y so it
  • 00:24:43
    measures the difference between the
  • 00:24:45
    predicted values and the
  • 00:24:47
    average the denominator is SST and that
  • 00:24:51
    is defined as the sum of the squares for
  • 00:24:53
    the total deviation and we find that
  • 00:24:55
    value by taking the sum of the squared
  • 00:24:58
    differences of each Yi that's each actual
  • 00:25:01
    observation from Y Bar the average and
  • 00:25:05
    finally we have something called SSE
  • 00:25:08
    which is defined as the sum of the
  • 00:25:10
    squares for the error and that is
  • 00:25:12
    calculated by taking the sum of the
  • 00:25:14
    square differences of each Yi from each
  • 00:25:18
    y hat each predicted value it is
  • 00:25:21
    important to know that SST is equal to
  • 00:25:24
    the sum of SSR and SSE so if we add SSR
  • 00:25:29
    plus SSE we would get SST this will
  • 00:25:32
    help us to make the calculation simpler
  • 00:25:35
    since if we know any two of these
  • 00:25:37
    numbers we can get the third number by
  • 00:25:39
    either adding or subtracting as you will
  • 00:25:41
    see in a few
  • 00:25:42
    moments let's go back to our original
  • 00:25:45
    data here we have the data in two
  • 00:25:47
    columns labeled X and Y now let's make a
  • 00:25:51
    third column for the predicted values of
  • 00:25:53
    y y hat for all these given X's so now
  • 00:25:57
    we have to take each value of x plug it
  • 00:25:59
    in the regression line and get this
  • 00:26:02
    column of numbers make sure you
  • 00:26:04
    understand where all these numbers are
  • 00:26:06
    coming from these are all the Y hat
  • 00:26:08
    values for each ith observation take the
  • 00:26:11
    first number 64.528 how did we get that
  • 00:26:16
    well you take the x value 2 and plug it
  • 00:26:19
    in the Y hat line so 55.048 + 4.74 * 2
  • 00:26:27
    right the two comes from the first x
  • 00:26:28
    value and if you plug that in the Y hat
  • 00:26:31
    line you get
  • 00:26:45
    64.528
  • 00:26:47
    and so on until we have the entire
  • 00:26:50
    column of predicted y values so just to
  • 00:26:53
    be clear this column of numbers has the
  • 00:26:55
    predicted values of Y for each of the
  • 00:26:57
    given X values we have 10 x values and
  • 00:27:01
    so we have 10 y hat values the next
  • 00:27:05
    column will be for the error and that is
  • 00:27:07
    each Yi minus each y hat and we get this
  • 00:27:12
    column of numbers take a look at the
  • 00:27:14
    first number 4.472 we get that by taking
  • 00:27:28
    the actual Y 69 minus the predicted Y
  • 00:27:31
    64.528 so
  • 00:27:33
    69 minus 64.528 gives us
  • 00:27:37
    4.472 now the next column is the squared
  • 00:27:40
    error so it's the previous column
  • 00:27:42
    squared and we get all these numbers
  • 00:27:44
    just by squaring the previous column so
  • 00:27:47
    19.9988 is 4.472 squared and so on the next column is the deviation of each Y from
  • 00:27:58
    the average remember Y Bar was
  • 00:28:01
    77.8 so we take each y and subtract
  • 00:28:06
    77.8 to get this column of numbers so
  • 00:28:10
    for example the first y value is 69 and
  • 00:28:13
    Y Bar is
  • 00:28:15
    77.8 so 69 -
  • 00:28:18
    77.8 is
  • 00:28:20
    -8.8 and we do that for each of the 10
  • 00:28:23
    numbers in this column and finally the
  • 00:28:26
    last column is the squared
  • 00:28:28
    deviations so it is the previous column
  • 00:28:31
    squared so -8.8 squared would be
  • 00:28:36
    77.44 and so on now to get SSE in order
  • 00:28:41
    to get SSE we need to sum up the
  • 00:28:44
    squared error column of these numbers
  • 00:28:47
    and we get 79.1215 for
  • 00:28:53
    SSE next we
  • 00:28:57
    want to get SST so we need to sum up the
  • 00:29:00
    squared deviations column and we get
  • 00:29:05
    1599.6 for
  • 00:29:07
    SST so let's review we have SSE equal to
  • 00:29:11
    79.1215
  • 00:29:13
    and we have SST equal to
  • 00:29:17
    1599.6 now to get the coefficient of
  • 00:29:19
    determination we need to divide SSR by
  • 00:29:23
    SST but we didn't calculate SSR we
  • 00:29:27
    calculated SSE and SST remember that
  • 00:29:30
    SST is equal to SSR plus SSE so we get
  • 00:29:35
    SSR by subtracting SST minus SSE so SSR
  • 00:29:41
    would be
  • 00:29:43
    1599.6 - 79.1215
  • 00:29:46
    which is
  • 00:29:51
    1520.4785 now we can go back to the formula
  • 00:29:53
    for R 2 and calculate it by dividing SSR
  • 00:29:57
    by SST and we get
  • 00:30:00
    1520.4785 that's SSR divided by
  • 00:30:05
    1599.6 SST and that gives us an R squared value
  • 00:30:09
    of
  • 00:30:12
    0.9505 so our coefficient of determination
  • 00:30:15
    is
  • 00:30:18
    0.9505 the coefficient of determination R
  • 00:30:21
    squared measures the percent of variability
  • 00:30:24
    in y that can be explained by the X
  • 00:30:27
    variable
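The sums-of-squares bookkeeping above is easy to verify in code. This is a minimal sketch using the SSE and SST values summed from the table, plus the identity SST = SSR + SSE:

```python
# Checking the sums-of-squares arithmetic from the worked example.
sse = 79.1215      # sum of (y_i - y_hat_i)^2, the error sum of squares
sst = 1599.6       # sum of (y_i - y_bar)^2, the total sum of squares

ssr = sst - sse    # SST = SSR + SSE, so SSR = SST - SSE
r_squared = ssr / sst

print(round(ssr, 4), round(r_squared, 4))   # 1520.4785 0.9505
```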
  • 00:30:28
    in this case Y is grades and X is the
  • 00:30:31
    number of hours studied so what we
  • 00:30:33
    measured shows the percent of
  • 00:30:35
    variability in grades that is explained
  • 00:30:37
    by the number of hours studied since R 2
  • 00:30:41
    is
  • 00:30:42
    0.9505 we can say that 95.05% of the
  • 00:30:48
    variability in grades can be explained
  • 00:30:51
    by the number of hours studied one more
  • 00:30:54
    measure of how well the line fits the
  • 00:30:56
    data needs to be discussed and that is
  • 00:30:58
    the correlation coefficient this
  • 00:31:01
    measures the strength of association
  • 00:31:03
    between X and Y the correlation
  • 00:31:05
    coefficient is called R and its values
  • 00:31:08
    are from -1 to positive 1 a value of
  • 00:31:13
    positive 1 means that there is a perfect
  • 00:31:16
    positive linear relationship between X
  • 00:31:18
    and Y so that means that all the data
  • 00:31:21
    points from the sample lie exactly on
  • 00:31:23
    the line of regression with no deviation
  • 00:31:26
    and that the line slopes
  • 00:31:29
    upward an R value of -1 means a perfect
  • 00:31:33
    negative linear relationship between X
  • 00:31:35
    and Y in this case all the data points
  • 00:31:38
    lie exactly on the line of regression
  • 00:31:40
    but the line is sloping downward R can
  • 00:31:43
    take on any value between and including
  • 00:31:47
    negative 1 up to and including positive
  • 00:31:51
    one if R is zero then that means there
  • 00:31:54
    is no relationship between X and Y to
  • 00:31:57
    calculate R we simply take the square
  • 00:32:00
    root of the coefficient of determination
  • 00:32:02
    and use the sign of the slope we
  • 00:32:05
    calculated the r here has a subscript of
  • 00:32:08
    X and Y and it just tells us that R the
  • 00:32:11
    correlation coefficient is for the
  • 00:32:13
    values of X and Y sometimes we just
  • 00:32:15
    leave the X and Y out and say
  • 00:32:18
    R so to calculate R we take the sign of
  • 00:32:22
    the slope B1 and multiply it by the square
  • 00:32:24
    root of R 2 so for our example R 2 was
  • 00:32:31
    0.9505 now if we just take the square root
  • 00:32:34
    of that number we don't know if R should
  • 00:32:36
    be negative or positive since squared
  • 00:32:39
    numbers always lose their sign so in
  • 00:32:41
    order to know whether it is a positive
  • 00:32:43
    or A negative number we have to look at
  • 00:32:45
    the slope is it positive or
  • 00:32:48
    negative and then we use the sign for
  • 00:32:50
    our slope in our example of grades and
  • 00:32:53
    numbers of hours studied the slope was a
  • 00:32:56
    positive 4.74 so we use that positive
  • 00:33:00
    sign and we get R is equal to the
  • 00:33:02
    positive square root of
  • 00:33:05
    0.9505 and that is
  • 00:33:09
    0.9749 now remember we said that a plus
  • 00:33:11
    one would be perfect positive linear
  • 00:33:14
    relationship which is very rare so
  • 00:33:16
    positive 0.9749 would indicate a very
  • 00:33:20
    strong positive linear relationship
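A short sketch of this sign-and-square-root step, assuming the R squared and slope values from the example:

```python
import math

# R is the square root of R^2 with the sign of the slope reattached,
# since squaring loses the sign and the slope restores it.
r_squared = 0.9505
b1 = 4.74                  # the slope here is positive

r = math.copysign(math.sqrt(r_squared), b1)
print(round(r, 4))   # 0.9749
```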
  • 00:33:22
    between X and Y so let's review what
  • 00:33:26
    we've just done and try to understand
  • 00:33:28
    the bigger picture first we calculated
  • 00:33:31
    the regression line using the least
  • 00:33:33
    squares method we calculated the slope
  • 00:33:35
    B1 and the Y intercept B naught and came up
  • 00:33:39
    with this line to fit the grade data
  • 00:33:41
    that we had when we plot this y hat line
  • 00:33:44
    on a scatter diagram we find it falls
  • 00:33:47
    around here close to the data points you
  • 00:33:50
    can see how this line fits the data very
  • 00:33:53
    nicely and we saw that when we
  • 00:33:55
    calculated R and R squared this line is a
  • 00:33:58
    very good fit to the data some of the
  • 00:34:00
    data points are exactly on the line some
  • 00:34:02
    are above and some are below let's take
  • 00:34:05
    a look at the average grade Y Bar for
  • 00:34:08
    this data remember we calculated Y Bar
  • 00:34:11
    by summing up all 10 grades from the
  • 00:34:13
    data set and dividing by 10 and we got
  • 00:34:16
    an average Y Bar of
  • 00:34:19
    77.8 so let's plot this line on the
  • 00:34:22
    scatter diagram and it would be around
  • 00:34:25
    here the average grade for the class
  • 00:34:27
    Y Bar is
  • 00:34:28
    77.8 so now we have a line for Y Bar and
  • 00:34:32
    we have a line for y hat notice that the
  • 00:34:35
    10 data points are closer to the Y hat
  • 00:34:38
    line than the Y Bar line when we
  • 00:34:41
    calculated R squared we measured how well
  • 00:34:44
    the line fit the data by calculating SSR
  • 00:34:48
    and SST R squared is the proportion of SSR
  • 00:34:52
    to SST so what exactly is SSR and SST
  • 00:34:57
    let's start with SST which stands for
  • 00:34:59
    the total sums of squares SST measures
  • 00:35:03
    how well the observations cluster around
  • 00:35:06
    the Y Bar line you can see from the
  • 00:35:08
    formula SST is Yi minus Y bar squared so
  • 00:35:13
    it is taking every y observation and
  • 00:35:16
    measuring its deviation from the mean Y
  • 00:35:19
    Bar let's say for example we have an
  • 00:35:22
    observation Point here a student
  • 00:35:24
    studying seven hours with a grade of 100
  • 00:35:27
    so for this data point x is 7 and Y is
  • 00:35:31
    100 now the difference between this
  • 00:35:34
    grade of 100 and the class average of
  • 00:35:37
    77.8 is Yi minus y bar and this distance
  • 00:35:42
    is here this deviation is called the
  • 00:35:46
    total deviation that is what SST
  • 00:35:49
    measures now the predicted grade y hat
  • 00:35:52
    for 7 hours of study X equals 7 is given by
  • 00:35:56
    the line of regression and that would
  • 00:35:59
    be 88.228
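The three deviations being discussed here (total, explained, unexplained) can be illustrated for this one observation. This is a sketch, not part of the original tutorial, using the fitted line and the example point x = 7 hours, y = 100:

```python
# Splitting one observation's total deviation into explained + error.
b0, b1 = 55.048, 4.74    # intercept and slope from the fitted line
y_bar = 77.8             # class average
x_i, y_i = 7, 100        # the example observation

y_pred = b0 + b1 * x_i        # predicted grade
explained = y_pred - y_bar    # deviation explained by the regression (SSR part)
error = y_i - y_pred          # unexplained deviation (SSE part)
total = y_i - y_bar           # total deviation from the mean (SST part)

print(round(y_pred, 3))   # 88.228
print(round(explained + error, 1) == round(total, 1))   # the pieces add up
```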
  • 00:36:28
    SSR is the explained variation which is
  • 00:36:30
    the deviation between the predicted
  • 00:36:33
    value of y and the average value of y it
  • 00:36:36
    is explained by the line of regression
  • 00:36:39
    in other words the class average is a
  • 00:36:42
    77.8 so that's the expected grade
  • 00:36:44
    without any additional information but
  • 00:36:47
    if we use number of hour study to get a
  • 00:36:49
    better prediction then we get a y hat
  • 00:36:51
    value of 88.228 so the difference
  • 00:36:55
    between the class average and the
  • 00:36:56
    predicted GR
  • 00:36:58
    is what is explained by the line of
  • 00:37:00
    regression and that is why it's called
  • 00:37:02
    explained variation explained by what
  • 00:37:05
    explained by the number of hours studied
  • 00:37:07
    our X variable but take a look at the
  • 00:37:09
    graph we have an observation point at
  • 00:37:12
    100 a student who studied 7 hours got
  • 00:37:15
    100 and not the predicted grade of 88.228
  • 00:37:18
    that deviation between the actual
  • 00:37:21
    grade and the predicted grade is called
  • 00:37:24
    error SSE the sum of the squares for the
  • 00:37:27
    error is shown by this formula the
  • 00:37:30
    difference between each Yi observation
  • 00:37:33
    and the Y hat or predicted value of y
  • 00:37:37
    you see we have Yi minus y hat and on
  • 00:37:39
    the scatter diagram where would that be
  • 00:37:42
    we'll take a look at Yi in the Y hat
  • 00:37:44
    line and that is here the deviation
  • 00:37:47
    between the actual value of y and the
  • 00:37:50
    predicted value of y and that is called
  • 00:37:53
    unexplained variation it is the
  • 00:37:55
    variation of Y that is not explained by
  • 00:37:58
    the line of regression and that's what
  • 00:38:00
    SSE is so going back to our equation for
  • 00:38:03
    R squared R squared is SSR divided by SST and you can
  • 00:38:08
    see now that that means R squared is explained
  • 00:38:12
    variation SSR divided by total variation
  • 00:38:16
    SST so when we calculated our R squared value
  • 00:38:19
    we got
  • 00:38:21
    0.9505 and we said that that meant that 95%
  • 00:38:25
    of the variability of grades can be
  • 00:38:28
    explained by the number of hours studied
  • 00:38:30
    what that means here is that the line of
  • 00:38:32
    regression the Y hat line explains 95%
  • 00:38:36
    of the variation in grades from the mean
  • 00:38:39
    but around 5% of the variation is
  • 00:38:41
    unexplained by the line of regression
  • 00:38:44
    and that would be SSE now we are ready
  • 00:38:46
    to test the significance of this
  • 00:38:48
    relationship in a different way by
  • 00:38:50
    looking at the slope of the line
  • 00:38:53
    remember when we talked about the slope
  • 00:38:54
    earlier we said a slope of zero means
  • 00:38:57
    there is no relationship between X and Y
  • 00:39:00
    the equation for a simple linear
  • 00:39:01
    regression line is y is equal to Beta naught
  • 00:39:05
    plus beta 1 * x + Epsilon the Epsilon
  • 00:39:09
    term we're not going to deal with
  • 00:39:11
    because that is the random error if the
  • 00:39:13
    slope is zero then y will be beta naught no
  • 00:39:17
    matter what value X is which means that
  • 00:39:21
    the value of y does not depend on X so
  • 00:39:24
    there is no linear relationship between
  • 00:39:26
    X and Y when the slope beta 1 is
  • 00:39:30
    zero we use this understanding of the
  • 00:39:32
    slope to conduct a hypothesis test to
  • 00:39:35
    see if there is a linear relationship as
  • 00:39:38
    follows we would State our null
  • 00:39:41
    hypothesis H naught that the slope is
  • 00:39:43
    equal to zero and the alternative Ha to
  • 00:39:46
    see if we find evidence that the slope
  • 00:39:48
    is not equal to zero if we find evidence
  • 00:39:51
    to support the alternative hypothesis
  • 00:39:54
    that the slope is not equal to zero we
  • 00:39:56
    can conclude that there is a linear
  • 00:39:59
    relationship between X and Y since we do
  • 00:40:01
    not know the value of Sigma for this
  • 00:40:03
    distribution we will be using a t test
  • 00:40:06
    and the test statistic would be B1 over
  • 00:40:09
    sb1 where sb1 is the standard error for
  • 00:40:12
    the slope to calculate the standard
  • 00:40:15
    error for the slope sb1 we use this
  • 00:40:18
    formula we need to First find S the
  • 00:40:21
    standard deviation for this distribution
  • 00:40:23
    and then divid it by the square root of
  • 00:40:25
    the sum of the Xi's minus X bar squared and so S
  • 00:40:31
    is the square root of SSE divided by N minus 2 from a
  • 00:40:37
    previous calculation we see that we
  • 00:40:39
    found SSE to be 79.1215 here now to
  • 00:40:45
    get S we take the square root of SSE divided by
  • 00:40:49
    N minus 2 and that is
  • 00:40:53
    79.1215
  • 00:40:55
    divided by
  • 00:40:58
    10 - 2 which is
  • 00:41:02
    3.1449 so now we have S we are ready to
  • 00:41:06
    get sb1 the standard error for the slope
  • 00:41:09
    and that is the s that we just
  • 00:41:10
    calculated over the square root of the
  • 00:41:12
    sum of the Xi's minus X bar squared so SB1
  • 00:41:18
    is
  • 00:41:20
    3.1449 divided by the square root of
  • 00:41:25
    67.6 if you remember back when we
  • 00:41:27
    calculated the slope we created a column
  • 00:41:30
    of each X from the mean squared and
  • 00:41:33
    added it up and we got
  • 00:41:35
    67.6 over here so this is the same
  • 00:41:38
    number we're using again okay back to
  • 00:41:41
    our calculations we get sb1 as
  • 00:41:47
    0.3825 now we can finally calculate our
  • 00:41:50
    test statistic which is B1 over sb1 so
  • 00:41:54
    it is the slope that we calculated
  • 00:41:56
    earlier remember was
  • 00:41:58
    4.74 so we have
  • 00:42:00
    4.74 over
  • 00:42:03
    0.3825 and that gives us a test statistic
  • 00:42:06
    of
  • 00:42:07
    12.3921
  • 00:42:09
    all right let's look at what we have
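Before moving on, the standard-error and t-statistic arithmetic can be collected into one short sketch, using the SSE, sample size, and sum-of-squares values computed above:

```python
import math

# Standard error of the slope and the t statistic for the slope test.
sse = 79.1215    # sum of squared errors from the earlier table
n = 10           # number of observations
s_xx = 67.6      # sum of (x_i - x_bar)^2, reused from the slope formula
b1 = 4.74        # the fitted slope

s = math.sqrt(sse / (n - 2))    # standard deviation of the regression
sb1 = s / math.sqrt(s_xx)       # standard error of the slope
t_stat = b1 / sb1               # test statistic for H0: slope = 0

print(round(s, 4), round(sb1, 4), round(t_stat, 2))   # 3.1449 0.3825 12.39
```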
  • 00:42:11
    so far we are testing to see if we have
  • 00:42:13
    enough evidence to support the
  • 00:42:15
    alternative hypothesis that the slope is
  • 00:42:18
    not equal to zero if we find this
  • 00:42:20
    evidence we will conclude that there is
  • 00:42:22
    a linear relationship between X and Y we
  • 00:42:26
    calculated our test statistic to be
  • 00:42:30
    12.3921 so now we're ready to use either
  • 00:42:32
    the critical value approach or the P
  • 00:42:35
    value approach to solve this problem
  • 00:42:38
    let's begin with the critical value
  • 00:42:39
    approach and let's use an alpha value of
  • 00:42:43
    0.01 now since this is a two-tail test we
  • 00:42:46
    split Alpha in half so we look up Alpha
  • 00:42:49
    divided by 2 which is
  • 00:42:51
    0.005 since this is a T Test we would look
  • 00:42:54
    up our critical value in the T table
  • 00:42:57
    under N minus 2 degrees of
  • 00:42:59
    Freedom looking in the T table under 8 degrees
  • 00:43:03
    of Freedom right n minus 2 is 10 - 2
  • 00:43:07
    which is 8 degrees of freedom and Alpha
  • 00:43:09
    divided by 2 which is
  • 00:43:11
    0.005 we find a critical value of
  • 00:43:16
    3.355 so with a critical value of
  • 00:43:19
    3.355 and a test statistic of 12.3921 we
  • 00:43:24
    can see looking at the T distribution
  • 00:43:27
    the critical value splits the
  • 00:43:28
    distribution into rejection regions and
  • 00:43:31
    non-rejection regions and the test
  • 00:43:33
    statistic Falls around
  • 00:43:35
    here in the rejection region we are now
  • 00:43:38
    ready to come to a statistical
  • 00:43:40
    conclusion and that of course would be
  • 00:43:43
    to reject the null there is evidence
  • 00:43:46
    that the slope is not equal to zero
  • 00:43:48
    which means there is a significant
  • 00:43:50
    relationship between grades and number
  • 00:43:53
    of hours studied we can also solve this
  • 00:43:56
    problem using the P value approach to
  • 00:43:58
    use the P value approach we must first
  • 00:44:00
    calculate the test
  • 00:44:02
    statistic and we got
  • 00:44:06
    12.3921 so we need to look up that number
  • 00:44:08
    in the T table under N minus 2 or 8 degrees
  • 00:44:12
    of freedom looking in the T table we
  • 00:44:14
    find under 8 degrees of freedom
  • 00:44:18
    12.3921 would be off the chart and therefore
  • 00:44:22
    the exact area under the curve for 12.3921
  • 00:44:25
    can't be established from the table but
  • 00:44:28
    we can extrapolate the number to be less
  • 00:44:31
    than
  • 00:44:34
    0.005 remember for a two-tail test we
  • 00:44:37
    double the value we got in the table so
  • 00:44:41
    0.005 * 2 is equal to
  • 00:44:46
    0.01 the rejection rule is to reject the
  • 00:44:49
    null hypothesis if the P value was less
  • 00:44:52
    than or equal to Alpha since our Alpha
  • 00:44:55
    value for this problem was set at
  • 00:44:57
    0.01 our P value is less than 0.01 our
  • 00:45:02
    Alpha value and therefore we reject the
  • 00:45:05
    null hypothesis and find evidence that
  • 00:45:07
    the slope is not equal to zero which
  • 00:45:10
    means that grades and number of hours
  • 00:45:12
    studied have a linear
  • 00:45:14
    relationship that concludes this
  • 00:45:16
    tutorial on simple linear regression I
  • 00:45:19
    hope you enjoyed this tutorial and I
  • 00:45:21
    hope you learned something
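As a closing aside, the whole decision step from the tutorial can be condensed into a few lines. This is only a sketch: the critical value 3.355 is the t-table entry for 8 degrees of freedom and a 0.005 tail, and the p-value bound is the doubled tail area read off the table.

```python
# Two-tailed test of H0: slope = 0 at alpha = 0.01, both approaches.
t_stat = 12.3921       # computed test statistic b1 / sb1
t_crit = 3.355         # t table, 8 df, alpha/2 = 0.005
alpha = 0.01
p_value_bound = 0.01   # table tail area < 0.005, doubled for two tails

reject_by_critical_value = abs(t_stat) > t_crit
reject_by_p_value = p_value_bound <= alpha

print(reject_by_critical_value, reject_by_p_value)   # True True
```

Both routes reject the null hypothesis, matching the conclusion above.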
Tags
  • linear regression
  • dependent variable
  • independent variable
  • slope
  • intercept
  • least squares
  • coefficient of determination
  • correlation coefficient
  • hypothesis testing
  • data analysis