Lecture 18: The Multivariate Model

00:40:59
https://www.youtube.com/watch?v=Y9Awwqqc288

Summary

TL;DR: Sara Ellison develops the multivariate linear model in matrix notation, discussing the identification assumptions necessary for estimating the model and the assumptions on error behavior. Key points include the need for linearly independent regressors, the problems posed by perfect and near-perfect multicollinearity, and real-world implications illustrated through examples. A framework for hypothesis testing is introduced, in which restricted and unrestricted models are estimated and compared through their goodness of fit. The overall emphasis is on how violations of the assumptions affect the reliability of regression coefficients and the resulting statistical inferences.

Takeaways

  • 📊 Multivariate linear models use matrix notation for clarity.
  • 📝 Identification assumptions require more observations than regressors.
  • 🔍 Linear independence of regressors is essential for estimating the model.
  • 🚫 Perfect multicollinearity prevents separate identification of parameter effects.
  • ⚠️ High variance in coefficient estimates indicates multicollinearity issues.
  • 🔗 Use correlation coefficients to detect linear relationships between regressors.
  • 📉 Restricted models help assess the impact of hypotheses on fitting the data.
  • 📊 The F distribution is crucial for determining statistical significance in tests.
  • 🧮 Coefficients in matrix notation allow for elegant derivations in regression.
  • 🔁 Hypothesis tests can capture various forms of relationships between parameters.

Timeline

  • 00:00:00 - 00:05:00

    In the introduction, Sara Ellison outlines a more general linear model using matrix notation, describing the representation of observations, errors, and explanatory variables. The multivariate linear model is summarized by the equation y = xβ + ε, where y is the dependent variable, x consists of explanatory variables and a column of ones, β represents coefficients, and ε denotes error terms.
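
    In stacked form (with x_ij denoting observation i on regressor j; the layout follows the lecture's verbal description):

    \[
    \underbrace{\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}}_{n \times 1}
    =
    \underbrace{\begin{pmatrix}
      1 & x_{11} & \cdots & x_{1k} \\
      \vdots & \vdots & & \vdots \\
      1 & x_{n1} & \cdots & x_{nk}
    \end{pmatrix}}_{n \times (k+1)}
    \underbrace{\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}}_{(k+1) \times 1}
    +
    \underbrace{\begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix}}_{n \times 1}
    \]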

  • 00:05:00 - 00:10:00

    Ellison presents the assumptions necessary for identification in the multivariate linear model. These include having more observations than explanatory variables and ensuring that the matrix of explanatory variables (X) is of full rank, meaning regressors must be linearly independent to avoid perfect multicollinearity. This is crucial for estimating the model accurately.
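
    A minimal numpy sketch of these two checks, on made-up data (the variable names and values below are illustrative, not from the lecture):

      import numpy as np

      # Made-up design matrix: a column of ones plus k = 2 regressors, n = 5 observations.
      X = np.column_stack([
          np.ones(5),
          [12, 16, 12, 18, 14],   # e.g. years of schooling
          [3, 1, 8, 2, 5],        # e.g. years of work experience
      ])

      n, k_plus_1 = X.shape
      print(n > k_plus_1)                           # more observations than regressors plus one
      print(np.linalg.matrix_rank(X) == k_plus_1)   # full column rank <=> X'X is invertible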

  • 00:10:00 - 00:15:00

    The necessity for positive sample variation in regressors is emphasized. If a regressor does not vary, it cannot help estimate the relationship between the dependent variable and the regressors. Perfect linear relationships between regressors prevent the separate identification of their effects, highlighting the importance of avoiding multicollinearity.
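
    (Zero sample variation is itself a special case of the rank condition: a regressor that never varies, x_j = c for every observation, is just c times the column of ones, so the columns of X are linearly dependent and the full-rank assumption fails. This connection is implicit in the lecture; the phrasing is mine.)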

  • 00:15:00 - 00:20:00

    Two examples are given to illustrate the concept of perfect multicollinearity. The first discusses estimating the impact of schooling, experience, and age on salary, where these variables may perfectly correlate in certain datasets, preventing accurate estimation. The second uses dummy variables for mutually exclusive categories (e.g., pet ownership) that cannot all be included in the regression simultaneously, as this leads to collinearity issues.
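
    A small numpy illustration of the first example, with made-up values: when age = schooling + experience + 6 holds exactly in the sample, the design matrix loses a rank and X'X is no longer invertible, so the regression cannot be run.

      import numpy as np

      schooling  = np.array([12, 16, 12, 18, 14])
      experience = np.array([3, 1, 8, 2, 5])
      age        = schooling + experience + 6            # exact linear relationship from the example

      X = np.column_stack([np.ones(5), schooling, experience, age])
      print(np.linalg.matrix_rank(X))                    # 3, not 4: X'X is singular, so the model is not identified

      X_dropped = np.column_stack([np.ones(5), schooling, experience])
      print(np.linalg.matrix_rank(X_dropped))            # 3 = number of columns: dropping a regressor restores identification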

  • 00:20:00 - 00:25:00

    Ellison delineates the assumptions regarding error behavior in the multivariate linear model, paralleling those in bivariate models. She discusses the expectation of the error term (ε) and its covariance structure, underscoring the requirement for the errors to have a mean of zero and be homoscedastic (constant variance across observations).
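
    In symbols, the two assumptions described here are

    \[
    E[\varepsilon] = \mathbf{0}, \qquad
    \operatorname{Cov}(\varepsilon) = E[\varepsilon \varepsilon'] = \sigma^2 I_n =
    \begin{pmatrix}
      \sigma^2 & 0 & \cdots & 0 \\
      0 & \sigma^2 & \cdots & 0 \\
      \vdots & \vdots & \ddots & \vdots \\
      0 & 0 & \cdots & \sigma^2
    \end{pmatrix},
    \]

    i.e. a constant variance on the diagonal (homoscedasticity) and zeros off the diagonal (no serial correlation).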

  • 00:25:00 - 00:30:00

    Next, the presentation formulates the way to derive β̂ (the estimated coefficients) that minimizes the sum of squared errors, leading to the elegant matrix representation of the least squares estimator. She highlights the unbiased nature of the estimator and the covariance structure, thus revealing insights into the properties of the estimators derived from the multivariate linear model.
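
    A minimal numpy sketch of these formulas on simulated data (all names and numbers are illustrative): the least squares estimator beta_hat = (X'X)^{-1} X'y, the unbiased estimate of sigma^2, and the estimated variance-covariance matrix sigma^2_hat (X'X)^{-1}.

      import numpy as np

      rng = np.random.default_rng(0)
      n, k = 200, 3
      X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # column of ones plus k regressors
      beta_true = np.array([1.0, 2.0, -0.5, 0.3])
      y = X @ beta_true + rng.normal(scale=0.5, size=n)

      beta_hat = np.linalg.solve(X.T @ X, X.T @ y)         # (X'X)^{-1} X'y, without forming the inverse explicitly
      residuals = y - X @ beta_hat
      sigma2_hat = residuals @ residuals / (n - k - 1)     # divide by n minus the number of estimated coefficients
      cov_beta_hat = sigma2_hat * np.linalg.inv(X.T @ X)   # estimated variance-covariance matrix of beta hat
      std_errors = np.sqrt(np.diag(cov_beta_hat))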

  • 00:30:00 - 00:35:00

    Ellison transitions into the topic of hypothesis testing for the estimated coefficients (β), stressing the common interest in understanding their significance within the model. She introduces a flexible framework for testing hypotheses involving linear combinations of parameters while also noting the limitations of one-sided tests under this framework.
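
    The null hypotheses in this framework take the form

    \[
    H_0 : R\beta = c \qquad \text{vs.} \qquad H_1 : R\beta \neq c,
    \]

    where R has one row per restriction and one column per coefficient. For example, testing beta_1 = 0 uses R = (0 1 0 ... 0) and c = 0, and testing the single restriction beta_1 = beta_2 uses R = (0 1 -1 0 ... 0) and c = 0; one-sided hypotheses cannot be written this way.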

  • 00:35:00 - 00:40:59

    The discussion culminates in the intuitive method of hypothesis testing through the comparison of unrestricted and restricted models. By estimating both models and assessing their goodness of fit, researchers can discern the validity of their null hypotheses based on the significance of the restrictions imposed.
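
    A sketch of that comparison on simulated data, using the standard form of the F statistic (the degrees-of-freedom scaling is assumed here; the summary above only describes the statistic loosely): for q restrictions, F = [(SSR_restricted - SSR_unrestricted)/q] / [SSR_unrestricted/(n - k - 1)], and the null is rejected for large values.

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(1)
      n = 200
      x1, x2, x3 = rng.normal(size=(3, n))
      y = 1.0 + 2.0 * x1 + rng.normal(size=n)              # data generated with beta2 = beta3 = 0

      def ssr(X, y):
          # Sum of squared residuals from an OLS fit of y on X.
          beta = np.linalg.lstsq(X, y, rcond=None)[0]
          resid = y - X @ beta
          return resid @ resid

      X_unrestricted = np.column_stack([np.ones(n), x1, x2, x3])
      X_restricted   = np.column_stack([np.ones(n), x1])   # impose beta2 = beta3 = 0 by dropping x2 and x3
      q   = 2                                              # number of restrictions in the null
      dof = n - X_unrestricted.shape[1]                    # residual degrees of freedom
      F   = ((ssr(X_restricted, y) - ssr(X_unrestricted, y)) / q) / (ssr(X_unrestricted, y) / dof)
      p_value = stats.f.sf(F, q, dof)                      # reject the null for large F (small p)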



Video Q&A

  • What is the multivariate linear model?

    A model that describes the relationship between multiple explanatory variables and a dependent variable using matrix notation.

  • What are identification assumptions?

    Assumptions that ensure the model can be estimated: more observations than regressors (n > k + 1) and a regressor matrix of full column rank, i.e., linearly independent regressors.

  • What does it mean for regressors to be linearly independent?

    It means no regressor can be expressed as a linear combination of other regressors.

  • What is perfect multicollinearity?

    A situation where one regressor is an exact linear function of other regressors, making it impossible to identify their individual effects separately.

  • How can one identify multicollinearity in data?

    By calculating correlation coefficients of regressors or observing large standard errors in estimated coefficients.
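
    A quick check along those lines (illustrative data only): compute the pairwise correlations of the regressors before running the regression.

      import numpy as np

      schooling  = np.array([12, 16, 12, 18, 14, 13])
      experience = np.array([3, 1, 8, 2, 5, 4])
      age        = schooling + experience + np.array([6, 6, 7, 6, 6, 5])   # close to, but not exactly, collinear

      corr = np.corrcoef(np.column_stack([schooling, experience, age]), rowvar=False)
      print(np.round(corr, 2))   # entries near +/-1 are a warning sign, though pairwise correlations
                                 # will not always reveal a linear relationship among three or more regressors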

  • What is the sum of squares in hypothesis testing?

    The sum of squared residuals from a fitted model; comparing it between the restricted and unrestricted models measures how much the null hypothesis's restrictions worsen the fit.

  • How do you run a restricted model?

    For restrictions that set coefficients to zero, by excluding the corresponding regressors; other linear restrictions are imposed by transforming the regressors (for example, entering their sum) or the dependent variable.

  • What is the F distribution?

    A probability distribution used to determine the significance of the test statistic in hypothesis testing.

  • What must the distribution of beta hat be for inference?

    It is normally distributed when the errors themselves are assumed to be normally distributed.
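
    That is, under the stronger assumption of normal errors,

    \[
    \varepsilon \sim N(\mathbf{0}, \sigma^2 I_n) \;\Longrightarrow\; \hat{\beta} \sim N\!\left(\beta,\; \sigma^2 (X'X)^{-1}\right).
    \]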

  • What happens when variables have near perfect multicollinearity?

    It can cause high variance in coefficient estimates, making them unstable.
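
    A small simulation in the same spirit (hypothetical data): adding a regressor that is nearly a linear combination of an existing one inflates the standard errors read off the diagonal of sigma^2_hat (X'X)^{-1}.

      import numpy as np

      def std_errors(X, y):
          # OLS standard errors: square roots of the diagonal of sigma^2_hat * (X'X)^{-1}.
          beta = np.linalg.solve(X.T @ X, X.T @ y)
          resid = y - X @ beta
          sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
          return np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

      rng = np.random.default_rng(2)
      n = 100
      x1 = rng.normal(size=n)
      x2 = x1 + rng.normal(scale=0.01, size=n)        # nearly collinear with x1
      y = 1.0 + 2.0 * x1 + rng.normal(size=n)

      se_without = std_errors(np.column_stack([np.ones(n), x1]), y)
      se_with    = std_errors(np.column_stack([np.ones(n), x1, x2]), y)
      print(se_without)   # modest standard errors
      print(se_with)      # the coefficients on x1 and x2 now have far larger standard errors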


Transcript
  • 00:00:00
  • 00:00:12
    SARA ELLISON: OK, so last time right
  • 00:00:14
    at the end of the lecture, I had introduced
  • 00:00:19
    a more general linear model, the multivariate linear model.
  • 00:00:25
    And I had just gone through the first couple of these slides,
  • 00:00:30
    saying let's analyze this model using a different notation,
  • 00:00:36
    in particular matrix notation, because the summation
  • 00:00:41
    notation was just too clunky.
  • 00:00:42
    It wasn't up for the job.
  • 00:00:44
    And so let me just go through quickly.
  • 00:00:48
    Let's see.
  • 00:00:49
    This was, I think, the next to last slide I had up.
  • 00:00:53
    So if we let y be the column vector of all
  • 00:00:59
    of the observations on the dependent variable,
  • 00:01:02
    then let epsilon be the column vector of all of the errors,
  • 00:01:06
    and then let x be the matrix, where
  • 00:01:11
    across the rows of the matrix, we
  • 00:01:13
    have first the column of ones, and then a column of each
  • 00:01:20
    of the explanatory variables.
  • 00:01:23
    And then sort of down the matrix,
  • 00:01:29
    we have observations on each of the--
  • 00:01:33
    well, we have each of the observations.
  • 00:01:35
    Each observation corresponds to a row.
  • 00:01:37
    So if we define this matrix and vectors this way,
  • 00:01:43
    then we can write our multivariate linear model
  • 00:01:48
    in the following very parsimonious fashion.
  • 00:01:51
    y equals x beta plus epsilon.
  • 00:01:56
    OK, so now, I'm basically going to go
  • 00:02:03
    through the same assumptions, but in a slightly more general
  • 00:02:06
    way from when I discussed assumptions
  • 00:02:10
    in the bivariate model.
  • 00:02:11
    The assumptions have to be discussed in a more general way
  • 00:02:15
    now because actually, they're a little bit more complicated
  • 00:02:17
    with the multivariate model.
  • 00:02:22
    So I'm going to condense all of the assumptions
  • 00:02:26
    into two basic categories.
  • 00:02:28
    One is the identification assumptions.
  • 00:02:30
    And two is the assumptions on the error behavior.
  • 00:02:33
    OK, so in the multivariate linear model,
  • 00:02:38
    in order to have identification, in order
  • 00:02:40
    to be able to estimate our model,
  • 00:02:42
    we have to have n greater than k plus 1.
  • 00:02:46
    That just means we have to have more observations than we have
  • 00:02:51
    explanatory variables, plus 1.
  • 00:02:55
    And x has to have full column rank of k plus 1.
  • 00:03:02
    And what does that mean?
  • 00:03:04
    In other words, it means that the regressors have
  • 00:03:07
    to be linearly independent.
  • 00:03:09
    Or another way of saying this is that the matrix x prime x, or x
  • 00:03:13
    transpose x, is invertible.
  • 00:03:16
    I'm going to go through this in some more
  • 00:03:17
    detail in just a second.
  • 00:03:20
    And then the second main assumption or main category
  • 00:03:23
    of assumptions are on the error behavior.
  • 00:03:25
    And these actually are exactly like the assumptions
  • 00:03:28
    we saw before.
  • 00:03:29
    I'm just using matrix notation to express them.
  • 00:03:33
    So here we have the expectation of epsilon here.
  • 00:03:37
    Epsilon is a vector.
  • 00:03:38
    And that's the vector of zeros.
  • 00:03:41
    The expectation of epsilon, epsilon transpose
  • 00:03:45
    is equal to sigma squared times the n by n identity matrix.
  • 00:03:53
    And this is in fact--
  • 00:03:56
    this matrix here is, in fact, just the matrix
  • 00:04:03
    that we denote covariance of epsilon, which I'll show you
  • 00:04:08
    a picture of it in a second.
  • 00:04:09
    It's just a matrix that has the variances
  • 00:04:12
    of epsilon along the diagonal and the covariances
  • 00:04:15
    on the off diagonal.
  • 00:04:18
    And so what we're saying here is that the diagonal
  • 00:04:21
    is equal to sigma squared.
  • 00:04:25
    And the off diagonals are zeros.
  • 00:04:27
  • 00:04:30
    And a stronger version of this is that the epsilon vector
  • 00:04:37
    has this multivariate normal distribution
  • 00:04:40
    with this variance/covariance matrix.
  • 00:04:43
    I'll go into more detail in both of these in just a second.
  • 00:04:50
    So let's do the--
  • 00:04:51
    oh, n by n identity matrix.
  • 00:04:55
    OK, so let's take a closer look at these assumptions.
  • 00:04:58
    So assumption one, the identification assumption,
  • 00:05:02
    what exactly does this mean?
  • 00:05:05
    Well, we need to have more observations than regressors.
  • 00:05:07
    That shouldn't come as a surprise,
  • 00:05:09
    especially if we think about the bivariate model.
  • 00:05:12
    We have one regressor.
  • 00:05:13
    You have to have at least two observations or else
  • 00:05:16
    you can't draw a line.
  • 00:05:19
    So this is just sort of generalizes
  • 00:05:21
    that to higher dimensions.
  • 00:05:24
    We can't have any regressors that do not
  • 00:05:26
    have positive sample variation.
  • 00:05:29
    So we saw this assumption in the bivariate case.
  • 00:05:34
    Remember I had a picture that looked like--
  • 00:05:37
  • 00:05:48
    I had a picture that looked like this.
  • 00:05:50
    And I said if all of our observations
  • 00:05:52
    are on the same value of x, we can't
  • 00:05:57
    identify how the conditional mean of y changes with x.
  • 00:06:06
    Well, again, in the multivariate regression,
  • 00:06:10
    or the multivariate case, we can't
  • 00:06:13
    identify a particular parameter if we
  • 00:06:17
    have a regressor that doesn't have positive sample variation.
  • 00:06:19
    So all of our regressors have to have positive sample variation.
  • 00:06:23
    And then the third one-- and this
  • 00:06:24
    is the one that actually trips people up sometimes-- is
  • 00:06:28
    that we can't have any regressors that
  • 00:06:30
    are linear functions of one or more other regressors.
  • 00:06:35
    And in matrix notation, that's another way
  • 00:06:40
    to say that is the regressors are linearly independent.
  • 00:06:43
    And that turns out to be equivalent to x
  • 00:06:46
    prime x being invertible.
  • 00:06:47
    Yep.
  • 00:06:48
    STUDENT: Can you give an example of that?
  • 00:06:50
    SARA ELLISON: I will give two examples in fact.
  • 00:06:54
    OK, so here is one example.
  • 00:06:58
    Let's imagine a case where we want
  • 00:07:00
    to estimate the effect of schooling, work
  • 00:07:03
    experience, and age on salary.
  • 00:07:06
    And we have individual level data.
  • 00:07:10
    So we have sort of a data set.
  • 00:07:11
    And we have a bunch of different salaries.
  • 00:07:14
    And then we also have each person's years of schooling,
  • 00:07:18
    each person's years of work experience, each person's age,
  • 00:07:22
    and maybe some other stuff too.
  • 00:07:24
    Doesn't matter.
  • 00:07:26
    Well, it could be in our particular sample,
  • 00:07:29
    it's quite possible that everyone in our sample
  • 00:07:32
    started school at age six, went to school
  • 00:07:35
    until he or she finished school, and then started working.
  • 00:07:39
    Wouldn't be crazy if that happened.
  • 00:07:41
    Well, if in fact that was the case, then
  • 00:07:45
    the years of schooling plus the years of work experience
  • 00:07:48
    plus 6 is equal to the age.
  • 00:07:51
  • 00:07:54
    So if that's the case, we can't estimate this regression
  • 00:07:59
    equation.
  • 00:08:00
    And it sort of makes logical sense too in the sense
  • 00:08:05
    that there's nothing that helps us separately
  • 00:08:12
    identify what the effect of schooling,
  • 00:08:15
    and work experience, and age are.
  • 00:08:18
    There's no variation that allows us to separately figure out
  • 00:08:22
    the effects of all three of those, if, in fact, they're
  • 00:08:26
    collinear in our sample.
  • 00:08:29
    So we can't estimate such a model.
  • 00:08:32
    Is that clear?
  • 00:08:33
    STUDENT: So you would drop a regressor.
  • 00:08:35
    SARA ELLISON: Exactly, you have to drop one of the regressors.
  • 00:08:37
  • 00:08:39
    Does this make sense?
  • 00:08:42
    Yes.
  • 00:08:43
    STUDENT: If you drop age it still wouldn't work,
  • 00:08:45
    like if you those?
  • 00:08:47
    SARA ELLISON: If you drop age, it still wouldn't work?
  • 00:08:49
    Yes.
  • 00:08:50
    STUDENT: It still wouldn't work or--
  • 00:08:52
    SARA ELLISON: It would.
  • 00:08:54
    It wouldn't work if everyone went to school until age 18.
  • 00:09:00
    So we would still need to have--
  • 00:09:04
    we couldn't have a perfect linear relationship
  • 00:09:07
    between number of years of schooling--
  • 00:09:12
    well, actually, that would just be no sample variation in years
  • 00:09:17
    of schooling.
  • 00:09:20
    But if some people went to school until age 18,
  • 00:09:23
    and some people went to age 20, and some went to age 25,
  • 00:09:26
    then if we dropped age from this regression,
  • 00:09:29
    then we could estimate it.
  • 00:09:32
    STUDENT: You say why this doesn't
  • 00:09:36
    hold is that if we took x1, x2, and x3,
  • 00:09:40
    we could get values of beta 1, beta 2, and all of these, which
  • 00:09:43
    would essentially make y 0.
  • 00:09:45
    So 1, 1, and minus 1, for example.
  • 00:09:48
    So then this equation would go [INAUDIBLE]
  • 00:09:50
    and then y would be 0 in that case.
  • 00:09:53
    SARA ELLISON: So that's not the intuition I have.
  • 00:09:56
    That may be correct in some way.
  • 00:09:59
    But that's certainly not the intuition I have.
  • 00:10:02
    I would just say that my intuition is just that we don't
  • 00:10:08
    have any variation to separately identify
  • 00:10:10
    the effects of these three things
  • 00:10:12
    if they're perfectly linearly associated with one another.
  • 00:10:17
    Yes.
  • 00:10:18
    STUDENT: So what if the two regressor
  • 00:10:20
    are somewhat relating, but they are not perfectly relating?
  • 00:10:25
    SARA ELLISON: So that's an excellent question.
  • 00:10:27
    So the question was, what if they are closely related?
  • 00:10:31
    So maybe this doesn't quite hold in our sample.
  • 00:10:34
    But it comes close to holding.
  • 00:10:36
    Like we had a few people who went to school at age
  • 00:10:38
    five instead of age six.
  • 00:10:40
    And we had like a couple people who took a year off and didn't
  • 00:10:43
    work.
  • 00:10:44
    So this linear relationship is close to holding,
  • 00:10:47
    but not quite.
  • 00:10:48
    That's something that Esther might be
  • 00:10:50
    able to talk about next time.
  • 00:10:52
    I'm not sure.
  • 00:10:53
    So basically, in that case, you can--
  • 00:10:57
    maybe, maybe not.
  • 00:10:59
    But anyhow, I'll tell you the answer.
  • 00:11:01
    In that case, you can estimate this equation.
  • 00:11:04
    But you end up sort of having trouble separately identifying
  • 00:11:11
    the coefficients on these variables,
  • 00:11:14
    on these three variables.
  • 00:11:15
    And in fact, they're going to be coefficients that have very--
  • 00:11:19
    that your estimator is going to be a very high variance
  • 00:11:23
    estimator.
  • 00:11:24
    And so in that case, what you might want to do
  • 00:11:26
    is still drop one of them.
  • 00:11:28
    That's going to give you much lower variance estimators
  • 00:11:32
    for the remaining two.
  • 00:11:33
    It's going to introduce a little bit of bias,
  • 00:11:36
    if that regressor belongs in there.
  • 00:11:38
    It's going to introduce a little bit of bias.
  • 00:11:40
    But you might be willing to accept that bias
  • 00:11:42
    to have much lower variance estimators.
  • 00:11:46
    Yes.
  • 00:11:46
    STUDENT: [INAUDIBLE] that digression of--
  • 00:11:51
    what if you had a large data set with a lot of variables.
  • 00:11:54
    And you don't know that there are regressions in there
  • 00:11:58
    that do have [INAUDIBLE]?
  • 00:11:59
    What are the things that can insinuate from this,
  • 00:12:02
    could be causing problems in my analysis or--
  • 00:12:05
    SARA ELLISON: Yeah, so first of all,
  • 00:12:08
    if I tried to run this regression
  • 00:12:10
    and this linear relationship existed in my data set,
  • 00:12:14
    R would throw up its hands and say, you can't do that.
  • 00:12:19
    So I can't even do it.
  • 00:12:23
    So you would find that out.
  • 00:12:24
    If this relationship didn't quite exist,
  • 00:12:29
    it was close to existing in the data set,
  • 00:12:31
    but not quite, R would go ahead and give you
  • 00:12:34
    the results of this.
  • 00:12:36
    But one thing that you could do is
  • 00:12:39
    you could compute the correlation coefficients
  • 00:12:44
    for all of your regressors before you run the regression
  • 00:12:46
    and see if any are really highly correlated.
  • 00:12:49
    That wouldn't necessarily pick up a linear relationship
  • 00:12:53
    like this.
  • 00:12:54
    But the other thing you could do is
  • 00:12:55
    after you run the regression, if you
  • 00:12:57
    have really large standard errors, that
  • 00:13:00
    could be a signal to you that you
  • 00:13:03
    could have this situation that's close to perfect collinearity.
  • 00:13:08
    Yes.
  • 00:13:09
    STUDENT: So if we have two regressors, x1 and x2,
  • 00:13:12
    could we use the ratio in some ways to [INAUDIBLE] regression,
  • 00:13:15
    as a third regressor?
  • 00:13:16
    Or would that also be [INAUDIBLE]??
  • 00:13:18
    SARA ELLISON: So that wouldn't induce
  • 00:13:23
    this sort of perfect collinearity problem.
  • 00:13:29
    You could.
  • 00:13:30
    You might not want to do it for other reasons.
  • 00:13:33
    But yeah.
  • 00:13:35
    ESTHER DUFLO: There's one more.
  • 00:13:36
    Continuing the question on if you had many [INAUDIBLE]
  • 00:13:39
    and you didn't know which one to pick,
  • 00:13:42
    [INAUDIBLE] fall away from traditional econometrics,
  • 00:13:46
    it becomes then this [INAUDIBLE] we're
  • 00:13:49
    going to talk about when we talk about machine learning.
  • 00:13:51
    If you really don't want--
  • 00:13:53
    traditional econometrics assumes that you
  • 00:13:55
    have a model that you are trying to test so you don't go
  • 00:13:58
    on a giant fishing expedition.
  • 00:14:01
    If you want to go on a giant fishing expedition,
  • 00:14:04
    there are techniques for that.
  • 00:14:05
    And that's [INAUDIBLE].
  • 00:14:06
    That's what we are going to introduce.
  • 00:14:09
  • 00:14:12
    SARA ELLISON: Good.
  • 00:14:13
    OK, so that's one example.
  • 00:14:15
    A second example of this perfect multicollinearity, and one
  • 00:14:23
    that sort of researchers run afoul of all the time
  • 00:14:27
    is when they use dummy variables to indicate, say, observations
  • 00:14:33
    falling into an exhaustive and mutually
  • 00:14:37
    exclusive set of classes.
  • 00:14:39
    So here's an example.
  • 00:14:41
    Let's suppose I have a data set.
  • 00:14:42
    Let's say I go talk to all my friends
  • 00:14:46
    in the dorm to collect my data set for 14.31.
  • 00:14:48
    And I ask them what pets they have at home.
  • 00:14:50
    And let's say all of them have pets.
  • 00:14:54
    We could have a category for no pet as well.
  • 00:14:56
    But anyhow, let's say all of them have pets.
  • 00:14:58
    But they either have a cat, a dog, or a fish.
  • 00:15:01
    And so then I create three different dummy variables.
  • 00:15:04
    One is equal to 1 if they have a cat, and 0 otherwise.
  • 00:15:08
    One is equal to 1 if they have a dog, and 0 otherwise.
  • 00:15:11
    And one is equal to 1 if they have a fish, and 0 otherwise.
  • 00:15:14
    And everyone has exactly one of those pets.
  • 00:15:20
    I cannot include all three of those dummy variables
  • 00:15:24
    in the regression because if we add up those three dummy
  • 00:15:29
    variables, we get a column of ones.
  • 00:15:31
    And that's perfectly collinear with our column
  • 00:15:35
    of ones that allows us to estimate the intercept.
  • 00:15:40
    Now, there are other ways.
  • 00:15:44
    You can, in fact, decide to include all three
  • 00:15:49
    dummy variables and not include an intercept,
  • 00:15:51
    not estimate an intercept in this regression.
  • 00:15:54
    It's entirely equivalent.
  • 00:15:57
    It would be a little troubling if it wasn't equivalent.
  • 00:15:59
    But it is in fact entirely equivalent.
  • 00:16:01
    You just have to interpret the coefficient estimates
  • 00:16:03
    a different way.
  • 00:16:04
    But you can't have both an intercept in your regression
  • 00:16:06
    and a set of dummy variables that
  • 00:16:09
    are a full set of exhaustive and mutually exclusive classes.
  • 00:16:15
    And like I said, R will not let you do this anyhow.
  • 00:16:18
  • 00:16:21
    OK, so now the second assumption or second-- yes.
  • 00:16:26
    STUDENT: [INAUDIBLE] data?
  • 00:16:29
    SARA ELLISON: It only changes the interpretation.
  • 00:16:32
    So basically, I think I'll leave that question for Esther
  • 00:16:38
    because she will be giving examples
  • 00:16:40
    of how to use dummy variables in regressions
  • 00:16:43
    and how to interpret the coefficients.
  • 00:16:44
    So we'll leave that till later.
  • 00:16:48
    OK, so then the second assumption
  • 00:16:51
    was about the error behavior.
  • 00:16:53
    And as I said before, these assumptions
  • 00:16:56
    aren't different for the multivariate model.
  • 00:17:00
    It's just that I've expressed them using matrix notation.
  • 00:17:03
    So let me just go through these exactly the same assumptions
  • 00:17:08
    we saw in the bivariate model.
  • 00:17:10
    Here I'm using matrix notation.
  • 00:17:12
    So I'll just go through and show you what they mean.
  • 00:17:15
    So first of all, expectation of epsilon is equal to 0.
  • 00:17:22
    Epsilon is a vector.
  • 00:17:23
    And so it's just equal to a vector of zeros.
  • 00:17:27
    And then for some reason, we often
  • 00:17:31
    write instead of writing the assumption
  • 00:17:35
    as the covariance matrix of epsilon
  • 00:17:38
    equals sigma squared times the identity, the n
  • 00:17:42
    by n identity matrix, we write it
  • 00:17:44
    as the expectation of epsilon epsilon transpose
  • 00:17:48
    is equal to that.
  • 00:17:49
    Well, it turns out because the expectation of epsilon
  • 00:17:53
    is identically equal to 0, this matrix is equal to this matrix.
  • 00:18:01
    You can do the calculations.
  • 00:18:02
    It's just two lines to convince yourself.
  • 00:18:04
    But I could have expressed this assumption
  • 00:18:08
    by just saying that this matrix is equal to this.
  • 00:18:13
    But for whatever reason, we often
  • 00:18:16
    see it written as this matrix is equal to this, same thing.
  • 00:18:22
    Does everyone understand why this matrix equaling this
  • 00:18:28
    is exactly the same assumptions we saw before?
  • 00:18:32
    So this matrix is just simply a matrix
  • 00:18:36
    containing the variances of the epsilons on the diagonal.
  • 00:18:40
    So remember before we said each epsilon had
  • 00:18:43
    variance sigma squared?
  • 00:18:45
    That was our homoscedasticity assumption.
  • 00:18:47
    So each variance is sigma squared.
  • 00:18:50
    And then all the covariances were 0.
  • 00:18:53
    That was our no serial correlation assumption.
  • 00:18:56
    All of the off diagonals here are 0.
  • 00:18:59
    So it's the same thing, just in matrix form.
  • 00:19:01
  • 00:19:09
    Yeah, I think I said this verbally.
  • 00:19:12
    But this thing is denoted covariance of epsilon.
  • 00:19:17
    And it's called the variance-covariance matrix
  • 00:19:19
    of epsilon.
  • 00:19:19
  • 00:19:23
    OK, fine.
  • 00:19:25
    We've got this linear model.
  • 00:19:26
    We've got these assumptions.
  • 00:19:27
    Just like before, we're going to now ask the question,
  • 00:19:30
    how do we get beta hat?
  • 00:19:32
    And what distribution does beta hat have?
  • 00:19:34
    And the answers are not going to be surprising.
  • 00:19:37
    But they're going to be more beautiful than they
  • 00:19:39
    were last time.
  • 00:19:40
    So what is beta hat?
  • 00:19:42
    Well, it's a vector that minimizes
  • 00:19:44
    the sum of squared errors.
  • 00:19:45
    So we've got a vector of residuals transpose
  • 00:19:51
    times a vector of residuals.
  • 00:19:53
    And that's sort of expanded out what it looks like.
  • 00:20:00
    OK, so we want to choose the beta hat that
  • 00:20:03
    minimizes that thing.
  • 00:20:04
    So what we do is we take the derivative with respect
  • 00:20:07
    to beta, set it equal to 0, and obtain this.
  • 00:20:13
    If you're not used to doing calculus
  • 00:20:15
    with vectors and matrices, in the notes that I posted online,
  • 00:20:20
    I write this out in more detail.
  • 00:20:23
    And you can take a look at that if you want.
  • 00:20:25
    But basically, we get this sort of equation set equal to 0.
  • 00:20:33
    And this is going to tell us what
  • 00:20:35
    the beta hat that minimizes the sum of squared residuals is.
  • 00:20:40
    Then we solve for beta hat.
  • 00:20:42
    The negative 2 we can just divide both sides
  • 00:20:46
    by negative 2.
  • 00:20:47
    And so that goes away.
  • 00:20:48
    Then we write this equation as x prime y,
  • 00:20:54
    or x transpose y equals x transpose x times beta hat.
  • 00:20:59
    And then if this is invertible--
  • 00:21:04
    and remember, that was one of our assumptions.
  • 00:21:06
    That was our identification assumption,
  • 00:21:08
    that that thing was invertible.
  • 00:21:09
    If that's invertible, then we get that beta hat
  • 00:21:13
    is just equal to x prime x inverse x prime y.
  • 00:21:17
  • 00:21:20
    Beautiful.
  • 00:21:21
  • 00:21:31
    I mean, that was literally the derivation
  • 00:21:33
    of the least squares estimators in matrix notation.
  • 00:21:37
    If you look at my notes that I've posted online,
  • 00:21:39
    I mean, it's just pages of algebra using that summation
  • 00:21:43
    notation.
  • 00:21:44
    So this is why we love doing it in matrix notation.
  • 00:21:49
    What do we want to know about beta hat?
  • 00:21:51
    What do we always want to know about an estimator,
  • 00:21:54
    so we can do inference?
  • 00:21:56
    It's distribution.
  • 00:21:58
    Oh, it's right up there.
  • 00:22:01
    OK, fine.
  • 00:22:02
    So the expectation of beta hat is equal to beta.
  • 00:22:07
    Again, in matrix notation, it's very simple.
  • 00:22:10
    I haven't included the four lines or something like that.
  • 00:22:13
    But if you treat the x's as fixed,
  • 00:22:17
    then that makes the sort of proof very simple.
  • 00:22:21
    They come outside the expectation operator
  • 00:22:23
    and basically just falls out.
  • 00:22:27
    So it is unbiased.
  • 00:22:29
    And the covariance of beta hat, remember
  • 00:22:32
    this is the variance-covariance matrix.
  • 00:22:36
    So it's the matrix that has along the diagonal
  • 00:22:39
    the variances of each of the beta hats
  • 00:22:41
    and on the off diagonals, the covariances between them.
  • 00:22:44
    That is just equal to sigma squared times
  • 00:22:48
    x prime x inverse.
  • 00:22:51
    So again, very elegant, very beautiful, and not too hard
  • 00:22:56
    to show if you treat the x's as fixed.
  • 00:22:59
    And you can look on the website if you're interested.
  • 00:23:01
  • 00:23:05
    And finally, we often don't know what sigma squared is.
  • 00:23:09
    For inference, we need to know what sigma squared is
  • 00:23:12
    or we need an estimate for sigma squared.
  • 00:23:15
    So this is our unbiased estimate for sigma squared.
  • 00:23:20
    And as Esther anticipated, here we
  • 00:23:25
    have to subtract off a k instead of a 2
  • 00:23:28
    because instead of it being a bivariate model,
  • 00:23:33
    it's a multivariate model.
  • 00:23:36
  • 00:23:39
    And then finally, if we're willing to impose the more
  • 00:23:44
    strict assumption on the error distribution,
  • 00:23:47
    the errors are normally distributed,
  • 00:23:50
    then the beta hats are also normally distributed.
  • 00:23:53
    So sometimes we want to impose that.
  • 00:23:55
    Sometimes we want to be less proscriptive.
  • 00:24:01
    OK, so now, finally we get to inference.
  • 00:24:04
    So typically, in the linear model,
  • 00:24:07
    we're going to want to test hypotheses involving the betas.
  • 00:24:11
    That's where the real action is.
  • 00:24:12
    I mean, I can dream up hypotheses involving sigma
  • 00:24:15
    squared and things like that.
  • 00:24:17
    And that's fine.
  • 00:24:18
    And occasionally, you might want to test a hypothesis involving
  • 00:24:21
    sigma squared.
  • 00:24:22
    But really we care about the betas
  • 00:24:24
    because the betas are the parameters
  • 00:24:25
    in our conditional mean function of our outcome variable.
  • 00:24:29
    And the questions that we usually
  • 00:24:31
    want to answer using linear regression
  • 00:24:33
    are about the nature of this conditional mean function.
  • 00:24:37
    So sometimes we might only be interested in one of the betas.
  • 00:24:42
    Other times we might want to simultaneously test hypotheses
  • 00:24:45
    about a whole bunch of them.
  • 00:24:48
    And as we saw in the output that I showed you last lecture,
  • 00:24:52
    statistical packages typically perform some standard tests
  • 00:24:57
    on the betas for free and just report them with the output.
  • 00:25:03
    And that's fine.
  • 00:25:05
    And we can use those.
  • 00:25:06
    And they're often quite handy.
  • 00:25:08
    But there may be other ones that we need to do ourselves.
  • 00:25:11
    So they don't perform every conceivable test
  • 00:25:14
    we might be interested in.
  • 00:25:17
    OK, so let's start with a pretty general framework
  • 00:25:21
    for testing hypotheses about beta.
  • 00:25:24
    And it's not only quite general and flexible.
  • 00:25:27
    It's also super intuitive.
  • 00:25:28
    It's one of my favorite tests.
  • 00:25:30
    I really like it.
  • 00:25:31
    OK, so let's consider hypotheses of the following form.
  • 00:25:36
    A matrix r times beta is equal to a vector c.
  • 00:25:40
    That's the null hypothesis.
  • 00:25:42
    The alternative is that it's not equal to the vector c.
  • 00:25:46
    So what is this matrix r?
  • 00:25:49
    It's a matrix of restrictions.
  • 00:25:52
    And its dimensions are r by k plus 1.
  • 00:25:56
    So it has the number of--
  • 00:25:59
    so the number of columns equal to the number
  • 00:26:03
    of parameters, the number of betas
  • 00:26:05
    that we're estimating in the linear model.
  • 00:26:07
    And then the number of rows is the number of restrictions
  • 00:26:10
    that we want to impose in our null hypothesis,
  • 00:26:14
    the number of restrictions we want to test.
  • 00:26:18
    So we could have a matrix, where r is equal to 1.
  • 00:26:22
    And then we're just testing one restriction.
  • 00:26:24
    So that would correspond to something
  • 00:26:26
    like beta 1 is equal to 0.
  • 00:26:28
  • 00:26:33
    Oh, so let me just say this.
  • 00:26:34
    I'll get to some examples in a minute.
  • 00:26:37
    So almost any hypothesis involving
  • 00:26:39
    beta you can dream up in the context of a linear model
  • 00:26:42
    can be captured in this framework, not
  • 00:26:44
    quite any, but most of them.
  • 00:26:46
    You can test whether individual parameters are equal to 0.
  • 00:26:49
    You can test whether individual parameters
  • 00:26:51
    are equal to something other than 0.
  • 00:26:53
    You can test multiple hypotheses simultaneously.
  • 00:26:56
    You can test hypotheses about linear combinations
  • 00:26:59
    of parameters.
  • 00:27:00
    The world is your oyster.
  • 00:27:02
    So let me show you a few examples of these
  • 00:27:05
    and exactly what the r matrix looks like
  • 00:27:08
    and what the c vector looks like in these examples.
  • 00:27:12
    OK, so let's say, for instance, that we set up the matrix
  • 00:27:19
    r to be just a row vector with a 0 in the first spot,
  • 00:27:25
    and then a 1, and then the rest 0s.
  • 00:27:28
    So what that matrix is doing is it's picking out beta 1.
  • 00:27:32
    Remember, this spot corresponds to beta 0.
  • 00:27:38
    So being in the second spot, it's picking out beta 1.
  • 00:27:41
    And c is just what beta 1 is equal to under the null.
  • 00:27:49
    So that r and this c corresponds to the hypothesis
  • 00:27:54
    that beta 1 is equal to 0.
  • 00:27:55
  • 00:27:59
    Let's suppose instead that we want
  • 00:28:03
    to test a whole bunch of hypotheses
  • 00:28:06
    simultaneously, that beta 1 is equal to 0,
  • 00:28:09
    and beta 2 is equal to 0, and beta 3
  • 00:28:11
    is equal to 0, et cetera.
  • 00:28:14
    Well, then this is what our matrix would look like.
  • 00:28:17
    So it would basically be an identity matrix with a column
  • 00:28:22
    of 0's tacked on the front.
  • 00:28:24
    And the reason why the column of 0's is tacked on the front
  • 00:28:26
    is because that corresponds to the intercept.
  • 00:28:29
    And we're not interested at least here
  • 00:28:32
    in testing a hypothesis about the intercept.
  • 00:28:36
    And then the c vector is just a vector of 0's.
  • 00:28:40
  • 00:28:45
    So I do want to emphasize, even though I've sort of written
  • 00:28:49
    this as like one equation, this is actually
  • 00:28:52
    we're testing k hypotheses simultaneously here.
  • 00:28:55
    So we have k equal signs.
  • 00:28:57
  • 00:29:02
    OK, so here's a more complicated example.
  • 00:29:06
    If our r matrix has in the first row a 1 and a negative 1,
  • 00:29:13
    and then the rest 0's, sorry, 0, and then 1, negative 1,
  • 00:29:16
    the rest 0's.
  • 00:29:17
    And then the second row there's a 1
  • 00:29:21
    in the fourth spot, et cetera.
  • 00:29:26
    And then the c vector looks like this.
  • 00:29:29
    What does this correspond to in terms
  • 00:29:31
    of a hypothesis we might want a test or a series of hypotheses?
  • 00:29:36
    Well, here, the first row gives us the hypothesis
  • 00:29:40
    that beta 1 minus beta 2 is equal to 0.
  • 00:29:45
    So I could just write that as beta 1 is equal to beta 2.
  • 00:29:50
    The second row corresponds to beta 3--
  • 00:29:53
    this is beta 3 here-- being equal to 5.
  • 00:29:57
    And the third row corresponds to beta k
  • 00:30:01
    being equal to negative 2.
  • 00:30:04
    Yes.
  • 00:30:04
    STUDENT: Can you explain the beta 1 equals beta 2?
  • 00:30:08
    SARA ELLISON: OK, so if I just multiply the matrix, the r
  • 00:30:18
    matrix, by beta and sort of wrote these out as equations,
  • 00:30:22
    I would get beta 1 minus--
  • 00:30:26
    so this is beta 0 here.
  • 00:30:28
    So here's beta 1 minus beta 2 is equal to 0.
  • 00:30:34
    And then I just rewrote that as beta 1 is equal to beta 2.
  • 00:30:37
    That's all.
  • 00:30:38
    Yep.
  • 00:30:39
    STUDENT: How often are we [INAUDIBLE] specific value
  • 00:30:42
    rather than the range?
  • 00:30:44
    And is there a [INAUDIBLE] against that?
  • 00:30:47
    SARA ELLISON: So yes and no.
  • 00:30:50
    So basically, if we're not interested in--
  • 00:30:56
    if we're interested in whether beta is in a range,
  • 00:30:59
    then what we might want to do is instead
  • 00:31:01
    of doing a hypothesis test, where
  • 00:31:03
    the null was a single value and the alternative
  • 00:31:06
    was everything else, we might want
  • 00:31:07
    to do, say, a one-sided test, where the null is that beta
  • 00:31:11
    is less than some value and the alternative
  • 00:31:14
    is that it's greater than some value.
  • 00:31:15
    We can do those.
  • 00:31:17
    We can't do them in this framework.
  • 00:31:19
    So I'll talk about that in a second.
  • 00:31:21
    The other thing that you might be suggesting
  • 00:31:23
    is instead of doing hypothesis testing,
  • 00:31:26
    we might want to just report confidence intervals as well.
  • 00:31:29
    So remember that really hypothesis
  • 00:31:32
    testing and constructing confidence intervals
  • 00:31:34
    are kind of the same thing.
  • 00:31:36
    It's just reporting the same information in different forms.
  • 00:31:39
    And so it can just be a matter of style or preference.
  • 00:31:45
    Instead of reporting hypothesis tests,
  • 00:31:48
    you report confidence intervals.
  • 00:31:50
    And that's perfectly fine.
  • 00:31:52
    Yeah.
  • 00:31:53
    ESTHER DUFLO: So confidence interval,
  • 00:31:54
    it can be harder to say whether between the [INAUDIBLE]
  • 00:31:58
    minus [? 4 ?] is [INAUDIBLE].
  • 00:32:02
    I mean, it's kind of hard to see.
  • 00:32:04
    They don't really add up.
  • 00:32:05
    SARA ELLISON: Yeah, and I guess the other more fundamental
  • 00:32:09
    answer to your question is that sometimes we actually do--
  • 00:32:13
    there might be a theory that says beta
  • 00:32:16
    should be equal to this number.
  • 00:32:18
    And in order to test that theory,
  • 00:32:20
    we want to perform a hypothesis test that beta
  • 00:32:22
    is equal to that number.
  • 00:32:24
    So that does come up, not every case.
  • 00:32:28
    But yeah, it is relevant.
  • 00:32:31
    Other questions?
  • 00:32:32
    No.
  • 00:32:34
    STUDENT: You could also use it to if somebody came out
  • 00:32:37
    with a paper today describing the treatment of malaria
  • 00:32:40
    [INAUDIBLE] wanted to see if that was true or not,
  • 00:32:42
    just take that beta and test for it and do hypothesis testing?
  • 00:32:47
    SARA ELLISON: Yeah.
  • 00:32:48
    STUDENT: OK.
  • 00:32:48
  • 00:32:51
    SARA ELLISON: OK, oh, here's part
  • 00:32:55
    of the answer to your question.
  • 00:32:56
    One thing you can't do in this framework
  • 00:32:58
    is test one-sided hypotheses.
  • 00:33:00
    We'll get back to those.
  • 00:33:03
    So now we have this framework.
  • 00:33:07
    I mean, it's not really a framework, just
  • 00:33:09
    sort of a notation in some sense to deal
  • 00:33:13
    with hypotheses of all of the forms we just talked about.
  • 00:33:18
    And within the regression framework,
  • 00:33:21
    we have a super intuitive and cool way
  • 00:33:23
    to test these hypotheses.
  • 00:33:25
    So first of all, let's think of the null
  • 00:33:28
    as describing a set of restrictions on the model.
  • 00:33:31
    So let me just go back for a second.
  • 00:33:34
    So in this case, this null has three different restrictions,
  • 00:33:39
    that beta 1 is equal to beta 2, that beta 3 is equal to 5,
  • 00:33:42
    and that beta k is equal to minus 2.
  • 00:33:45
    And we think of the null as imposing restrictions
  • 00:33:48
    on the model.
  • 00:33:50
    Then here's how we perform the test.
  • 00:33:52
    We estimate the unrestricted model.
  • 00:33:55
    We impose the restrictions of the null
  • 00:33:57
    and estimate that model.
  • 00:34:00
    And then we compare the goodness of fit of those two models.
  • 00:34:04
    So that's why I love this test.
  • 00:34:05
    It seems really intuitive to me that if you
  • 00:34:10
    have a set of restrictions and they really bind,
  • 00:34:13
    and they really sort of affect how good your fit is, then
  • 00:34:18
    that tells you, well, maybe those restrictions are not
  • 00:34:22
    true.
  • 00:34:23
    If the restrictions on the other hand
  • 00:34:26
    don't really bind that much, if your model fits almost as
  • 00:34:33
    well with the restricted model as it
  • 00:34:36
    does with the unrestricted model, then
  • 00:34:38
    that tells you maybe these restrictions
  • 00:34:40
    are true or close to true.
  • 00:34:41
    And we don't want to reject them.
  • 00:34:44
    So that's the whole intuition and the idea behind this test.
  • 00:34:48
  • 00:34:51
    OK, so a couple of details before we
  • 00:34:57
    get to the distribution, the test statistic.
  • 00:34:59
    Estimating the unrestricted model is simple.
  • 00:35:02
    Just run the regression.
  • 00:35:03
    But how do we estimate the restricted model?
  • 00:35:06
    Well, it depends on what form the restrictions take.
  • 00:35:10
    So let's say we're testing hypothesis,
  • 00:35:12
    where just a bunch of the betas are equal to 0.
  • 00:35:15
    How do we run the restricted model?
  • 00:35:18
  • 00:35:22
    Yes.
  • 00:35:23
    STUDENT: We just think the [INAUDIBLE]
  • 00:35:25
    is kind of on the diagonal 1 so that [INAUDIBLE]..
  • 00:35:28
    SARA ELLISON: Yes, exactly.
  • 00:35:31
    So practically speaking, what we do is we just run
  • 00:35:34
    the regression leaving out all of those x's.
  • 00:35:37
    So that's the way we constrain the coefficients to be
  • 00:35:40
    equal to 0.
  • 00:35:42
    So we have the unrestricted.
  • 00:35:45
    The unrestricted regression is just
  • 00:35:47
    all of the x's are in there.
  • 00:35:49
    If we want to restrict that certain betas are equal to 0,
  • 00:35:51
    we just run another regression, where we leave out the x's
  • 00:35:56
    associated with the betas that we want to have equal to 0.
  • 00:36:00
    So that's our restricted model.
  • 00:36:05
    Then let's say the restriction is
  • 00:36:09
    that the two betas are equal.
  • 00:36:10
    We have beta 1 equals beta 2, or something like that.
  • 00:36:13
    That's our null restriction.
  • 00:36:16
    Then how do we impose that restriction on a linear model?
  • 00:36:20
    Well, actually, it might help if I write the linear model.
  • 00:36:25
    So we have y sub i equals beta 0 plus beta 1 x1.
  • 00:36:35
    I hope I'm using the same notation.
  • 00:36:36
  • 00:36:40
    Do I have my subscripts in the same order?
  • 00:36:43
    I hope so.
  • 00:36:43
  • 00:36:48
    OK, so let's suppose this is our unrestricted model.
  • 00:36:56
    We want to restrict beta 1 to be equal to beta 2.
  • 00:36:59
    Well, what do we do?
  • 00:37:01
    We just create a new variable that's the sum of these two.
  • 00:37:06
    So this is called x1i plus x2i, just a new variable.
  • 00:37:16
    And then we only estimate one coefficient on that.
  • 00:37:23
    So our restricted model is just that we
  • 00:37:27
    don't include this variable as a regressor or this variable
  • 00:37:30
    as a regressor.
  • 00:37:31
    We include their sum as a regressor.
  • 00:37:35
    And that's how we're imposing the null restriction
  • 00:37:37
    because when we include their sum,
  • 00:37:39
    we're making their two coefficients equal.
  • 00:37:42
    We're forcing their two coefficients to be equal.
  • 00:37:46
    Yeah.
  • 00:37:47
    STUDENT: So are we testing the first beta
  • 00:37:49
    1 and the second beta 1 are the same?
  • 00:37:51
    SARA ELLISON: Exactly, yes.
  • 00:37:53
    This is testing the hypothesis that beta 1 is equal to beta 2.
  • 00:38:00
    Yep.
  • 00:38:02
    STUDENT: Is that different from testing the beta 2 [? at 0? ?]
  • 00:38:05
    SARA ELLISON: Yeah, it's definitely different.
  • 00:38:07
    So here, these betas could be anything.
  • 00:38:12
    They could be a million.
  • 00:38:14
    We're just testing the hypothesis that they're equal.
  • 00:38:17
  • 00:38:20
    Yes.
  • 00:38:21
  • 00:38:24
    STUDENT: But if they're equal, wouldn't it
  • 00:38:26
    be like a linear combination of the other [INAUDIBLE]?
  • 00:38:29
    You know how they cannot be like a linear sum?
  • 00:38:34
  • 00:38:38
    SARA ELLISON: I'm not sure if I understand your question.
  • 00:38:40
    So basically, what I'm trying to do here
  • 00:38:43
    is impose just this hypothesis, but not
  • 00:38:46
    impose anything else about what the betas might be equal to.
  • 00:38:51
    STUDENT: The identification restriction
  • 00:38:54
    is not on the betas.
  • 00:38:55
    It's on the x's.
  • 00:38:57
    Beta can have whatever medium conditions.
  • 00:39:01
    SARA ELLISON: Ah, you were confused
  • 00:39:02
    about the identification assumption.
  • 00:39:04
    Yep, yep, yep, that's right.
  • 00:39:05
  • 00:39:08
    ESTHER DUFLO: You can reask your question
  • 00:39:09
    saying, if you could post the sum on just x
  • 00:39:13
    and it turned out that in fact they were equal,
  • 00:39:16
    you can guess what the beta [INAUDIBLE] following x.
  • 00:39:20
  • 00:39:23
    SARA ELLISON: OK, what if the restriction
  • 00:39:25
    is that some beta is equal to a constant c?
  • 00:39:29
    How would we impose that restriction
  • 00:39:32
    and then re-estimate the restricted model?
  • 00:39:35
    Well, let's suppose this is just a constant.
  • 00:39:39
    So we impose that this one is equal to a constant.
  • 00:39:42
    So then here, there's no parameter in this term
  • 00:39:47
    that we need to estimate under the null.
  • 00:39:50
    So we just subtract the constant times x1
  • 00:39:54
    from the dependent variable and rerun that regression.
  • 00:39:58
    And that's our restricted regression.
  • 00:40:01
    Does that make sense?
  • 00:40:03
    OK, so going back, we estimate the unrestricted model.
  • 00:40:10
    We impose restrictions and estimate that model.
  • 00:40:13
    And then we compare the goodness of fit.
  • 00:40:15
    And if the goodness of fit is not very different,
  • 00:40:19
    we don't reject the null.
  • 00:40:20
    If it's very different, we reject the null.
  • 00:40:23
    In particular, this test statistic,
  • 00:40:27
    which is basically the numerator has the difference
  • 00:40:31
    between the restricted and unrestricted sums of squares
  • 00:40:36
    and then in the denominator has the unrestricted sum
  • 00:40:39
    of squares.
  • 00:40:40
    This is how we form the test statistic.
  • 00:40:42
    And it turns out that has an F distribution under the null.
  • 00:40:47
    And we reject the null for large values of this test statistic.
  • 00:40:53
Tags
  • multivariate linear model
  • matrix notation
  • identification assumptions
  • error behavior
  • multicollinearity
  • hypothesis testing
  • goodness of fit
  • dummy variables
  • variance-covariance matrix
  • beta hat