Lecture 18: The Multivariate Model

00:40:59
https://www.youtube.com/watch?v=Y9Awwqqc288

Summary

TL;DR: Sara Ellison develops the multivariate linear model in matrix notation, discussing the identification assumptions necessary for estimating the model and the assumptions on error behavior. Key points include the need for linearly independent regressors, the problems posed by perfect and near-perfect multicollinearity, and real-world implications illustrated through examples. A framework for hypothesis testing is introduced, in which restricted and unrestricted models are estimated and compared through their goodness of fit. The overall emphasis is on how violations of the assumptions affect the reliability of regression coefficients and the resulting statistical inferences.

Takeaways

  • 📊 Multivariate linear models use matrix notation for clarity.
  • 📝 Identification assumptions require more observations than regressors.
  • 🔍 Linear independence of regressors is essential for estimating the model.
  • 🚫 Perfect multicollinearity prevents separate identification of parameter effects.
  • ⚠️ High variance in coefficient estimates indicates multicollinearity issues.
  • 🔗 Use correlation coefficients to detect linear relationships between regressors.
  • 📉 Restricted models help assess the impact of hypotheses on fitting the data.
  • 📊 The F distribution is crucial for determining statistical significance in tests.
  • 🧮 Coefficients in matrix notation allow for elegant derivations in regression.
  • 🔁 Hypothesis tests can capture various forms of relationships between parameters.

Timeline

  • 00:00:00 - 00:05:00

    In the introduction, Sara Ellison outlines a more general linear model using matrix notation, describing the representation of observations, errors, and explanatory variables. The multivariate linear model is summarized by the equation y = xβ + ε, where y is the dependent variable, x consists of explanatory variables and a column of ones, β represents coefficients, and ε denotes error terms.
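
    In stacked form (with x_ij denoting observation i on regressor j; the layout follows the lecture's verbal description):

    \[
    \underbrace{\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}}_{n \times 1}
    =
    \underbrace{\begin{pmatrix}
      1 & x_{11} & \cdots & x_{1k} \\
      \vdots & \vdots & & \vdots \\
      1 & x_{n1} & \cdots & x_{nk}
    \end{pmatrix}}_{n \times (k+1)}
    \underbrace{\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}}_{(k+1) \times 1}
    +
    \underbrace{\begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix}}_{n \times 1}
    \]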

  • 00:05:00 - 00:10:00

    Ellison presents the assumptions necessary for identification in the multivariate linear model. These include having more observations than explanatory variables and ensuring that the matrix of explanatory variables (X) is of full rank, meaning regressors must be linearly independent to avoid perfect multicollinearity. This is crucial for estimating the model accurately.
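
    A minimal numpy sketch of these two checks, on made-up data (the variable names and values below are illustrative, not from the lecture):

      import numpy as np

      # Made-up design matrix: a column of ones plus k = 2 regressors, n = 5 observations.
      X = np.column_stack([
          np.ones(5),
          [12, 16, 12, 18, 14],   # e.g. years of schooling
          [3, 1, 8, 2, 5],        # e.g. years of work experience
      ])

      n, k_plus_1 = X.shape
      print(n > k_plus_1)                           # more observations than regressors plus one
      print(np.linalg.matrix_rank(X) == k_plus_1)   # full column rank <=> X'X is invertible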

  • 00:10:00 - 00:15:00

    The necessity for positive sample variation in regressors is emphasized. If a regressor does not vary, it cannot help estimate the relationship between the dependent variable and the regressors. Perfect linear relationships between regressors prevent the separate identification of their effects, highlighting the importance of avoiding multicollinearity.
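
    (Zero sample variation is itself a special case of the rank condition: a regressor that never varies, x_j = c for every observation, is just c times the column of ones, so the columns of X are linearly dependent and the full-rank assumption fails. This connection is implicit in the lecture; the phrasing is mine.)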

  • 00:15:00 - 00:20:00

    Two examples are given to illustrate the concept of perfect multicollinearity. The first discusses estimating the impact of schooling, experience, and age on salary, where these variables may perfectly correlate in certain datasets, preventing accurate estimation. The second uses dummy variables for mutually exclusive categories (e.g., pet ownership) that cannot all be included in the regression simultaneously, as this leads to collinearity issues.
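
    A small numpy illustration of the first example, with made-up values: when age = schooling + experience + 6 holds exactly in the sample, the design matrix loses a rank and X'X is no longer invertible, so the regression cannot be run.

      import numpy as np

      schooling  = np.array([12, 16, 12, 18, 14])
      experience = np.array([3, 1, 8, 2, 5])
      age        = schooling + experience + 6            # exact linear relationship from the example

      X = np.column_stack([np.ones(5), schooling, experience, age])
      print(np.linalg.matrix_rank(X))                    # 3, not 4: X'X is singular, so the model is not identified

      X_dropped = np.column_stack([np.ones(5), schooling, experience])
      print(np.linalg.matrix_rank(X_dropped))            # 3 = number of columns: dropping a regressor restores identification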

  • 00:20:00 - 00:25:00

    Ellison delineates the assumptions regarding error behavior in the multivariate linear model, paralleling those in bivariate models. She discusses the expectation of the error term (ε) and its covariance structure, underscoring the requirement for the errors to have a mean of zero and be homoscedastic (constant variance across observations).
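
    In symbols, the two assumptions described here are

    \[
    E[\varepsilon] = \mathbf{0}, \qquad
    \operatorname{Cov}(\varepsilon) = E[\varepsilon \varepsilon'] = \sigma^2 I_n =
    \begin{pmatrix}
      \sigma^2 & 0 & \cdots & 0 \\
      0 & \sigma^2 & \cdots & 0 \\
      \vdots & \vdots & \ddots & \vdots \\
      0 & 0 & \cdots & \sigma^2
    \end{pmatrix},
    \]

    i.e. a constant variance on the diagonal (homoscedasticity) and zeros off the diagonal (no serial correlation).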

  • 00:25:00 - 00:30:00

    Next, the presentation formulates the way to derive β̂ (the estimated coefficients) that minimizes the sum of squared errors, leading to the elegant matrix representation of the least squares estimator. She highlights the unbiased nature of the estimator and the covariance structure, thus revealing insights into the properties of the estimators derived from the multivariate linear model.
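
    A minimal numpy sketch of these formulas on simulated data (all names and numbers are illustrative): the least squares estimator beta_hat = (X'X)^{-1} X'y, the unbiased estimate of sigma^2, and the estimated variance-covariance matrix sigma^2_hat (X'X)^{-1}.

      import numpy as np

      rng = np.random.default_rng(0)
      n, k = 200, 3
      X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # column of ones plus k regressors
      beta_true = np.array([1.0, 2.0, -0.5, 0.3])
      y = X @ beta_true + rng.normal(scale=0.5, size=n)

      beta_hat = np.linalg.solve(X.T @ X, X.T @ y)         # (X'X)^{-1} X'y, without forming the inverse explicitly
      residuals = y - X @ beta_hat
      sigma2_hat = residuals @ residuals / (n - k - 1)     # divide by n minus the number of estimated coefficients
      cov_beta_hat = sigma2_hat * np.linalg.inv(X.T @ X)   # estimated variance-covariance matrix of beta hat
      std_errors = np.sqrt(np.diag(cov_beta_hat))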

  • 00:30:00 - 00:35:00

    Ellison transitions into the topic of hypothesis testing for the estimated coefficients (β), stressing the common interest in understanding their significance within the model. She introduces a flexible framework for testing hypotheses involving linear combinations of parameters while also noting the limitations of one-sided tests under this framework.
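
    The null hypotheses in this framework take the form

    \[
    H_0 : R\beta = c \qquad \text{vs.} \qquad H_1 : R\beta \neq c,
    \]

    where R has one row per restriction and one column per coefficient. For example, testing beta_1 = 0 uses R = (0 1 0 ... 0) and c = 0, and testing the single restriction beta_1 = beta_2 uses R = (0 1 -1 0 ... 0) and c = 0; one-sided hypotheses cannot be written this way.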

  • 00:35:00 - 00:40:59

    The discussion culminates in the intuitive method of hypothesis testing through the comparison of unrestricted and restricted models. By estimating both models and assessing their goodness of fit, researchers can discern the validity of their null hypotheses based on the significance of the restrictions imposed.
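
    A sketch of that comparison on simulated data, using the standard form of the F statistic (the degrees-of-freedom scaling is assumed here; the summary above only describes the statistic loosely): for q restrictions, F = [(SSR_restricted - SSR_unrestricted)/q] / [SSR_unrestricted/(n - k - 1)], and the null is rejected for large values.

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(1)
      n = 200
      x1, x2, x3 = rng.normal(size=(3, n))
      y = 1.0 + 2.0 * x1 + rng.normal(size=n)              # data generated with beta2 = beta3 = 0

      def ssr(X, y):
          # Sum of squared residuals from an OLS fit of y on X.
          beta = np.linalg.lstsq(X, y, rcond=None)[0]
          resid = y - X @ beta
          return resid @ resid

      X_unrestricted = np.column_stack([np.ones(n), x1, x2, x3])
      X_restricted   = np.column_stack([np.ones(n), x1])   # impose beta2 = beta3 = 0 by dropping x2 and x3
      q   = 2                                              # number of restrictions in the null
      dof = n - X_unrestricted.shape[1]                    # residual degrees of freedom
      F   = ((ssr(X_restricted, y) - ssr(X_unrestricted, y)) / q) / (ssr(X_unrestricted, y) / dof)
      p_value = stats.f.sf(F, q, dof)                      # reject the null for large F (small p)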



Video Q&A

  • What is the multivariate linear model?

    A model that describes the relationship between multiple explanatory variables and a dependent variable using matrix notation.

  • What are identification assumptions?

    Assumptions that ensure the model can be estimated: more observations than regressors (n > k + 1) and a regressor matrix of full column rank, i.e., linearly independent regressors.

  • What does it mean for regressors to be linearly independent?

    It means no regressor can be expressed as a linear combination of other regressors.

  • What is perfect multicollinearity?

    A situation where one regressor is an exact linear function of other regressors, making it impossible to identify their individual effects separately.

  • How can one identify multicollinearity in data?

    By calculating correlation coefficients of regressors or observing large standard errors in estimated coefficients.
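
    A quick check along those lines (illustrative data only): compute the pairwise correlations of the regressors before running the regression.

      import numpy as np

      schooling  = np.array([12, 16, 12, 18, 14, 13])
      experience = np.array([3, 1, 8, 2, 5, 4])
      age        = schooling + experience + np.array([6, 6, 7, 6, 6, 5])   # close to, but not exactly, collinear

      corr = np.corrcoef(np.column_stack([schooling, experience, age]), rowvar=False)
      print(np.round(corr, 2))   # entries near +/-1 are a warning sign, though pairwise correlations
                                 # will not always reveal a linear relationship among three or more regressors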

  • What is the sum of squares in hypothesis testing?

    The sum of squared residuals from a fitted model; comparing it between the restricted and unrestricted models measures how much the null hypothesis's restrictions worsen the fit.

  • How do you run a restricted model?

    For restrictions that set coefficients to zero, by excluding the corresponding regressors; other linear restrictions are imposed by transforming the regressors (for example, entering their sum) or the dependent variable.

  • What is the F distribution?

    A probability distribution used to determine the significance of the test statistic in hypothesis testing.

  • What must the distribution of beta hat be for inference?

    It is normally distributed when the errors themselves are assumed to be normally distributed.
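
    That is, under the stronger assumption of normal errors,

    \[
    \varepsilon \sim N(\mathbf{0}, \sigma^2 I_n) \;\Longrightarrow\; \hat{\beta} \sim N\!\left(\beta,\; \sigma^2 (X'X)^{-1}\right).
    \]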

  • What happens when variables have near perfect multicollinearity?

    It can cause high variance in coefficient estimates, making them unstable.
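
    A small simulation in the same spirit (hypothetical data): adding a regressor that is nearly a linear combination of an existing one inflates the standard errors read off the diagonal of sigma^2_hat (X'X)^{-1}.

      import numpy as np

      def std_errors(X, y):
          # OLS standard errors: square roots of the diagonal of sigma^2_hat * (X'X)^{-1}.
          beta = np.linalg.solve(X.T @ X, X.T @ y)
          resid = y - X @ beta
          sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
          return np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

      rng = np.random.default_rng(2)
      n = 100
      x1 = rng.normal(size=n)
      x2 = x1 + rng.normal(scale=0.01, size=n)        # nearly collinear with x1
      y = 1.0 + 2.0 * x1 + rng.normal(size=n)

      se_without = std_errors(np.column_stack([np.ones(n), x1]), y)
      se_with    = std_errors(np.column_stack([np.ones(n), x1, x2]), y)
      print(se_without)   # modest standard errors
      print(se_with)      # the coefficients on x1 and x2 now have far larger standard errors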


Transcript
  • 00:00:00
  • 00:00:12
    SARA ELLISON: OK, so last time right
  • 00:00:14
    at the end of the lecture, I had introduced
  • 00:00:19
    a more general linear model, the multivariate linear model.
  • 00:00:25
    And I had just gone through the first couple of these slides,
  • 00:00:30
    saying let's analyze this model using a different notation,
  • 00:00:36
    in particular matrix notation, because the summation
  • 00:00:41
    notation was just too clunky.
  • 00:00:42
    It wasn't up for the job.
  • 00:00:44
    And so let me just go through quickly.
  • 00:00:48
    Let's see.
  • 00:00:49
    This was, I think, the next to last slide I had up.
  • 00:00:53
    So if we let y be the column vector of all
  • 00:00:59
    of the observations on the dependent variable,
  • 00:01:02
    then let epsilon be the column vector of all of the errors,
  • 00:01:06
    and then let x be the matrix, where
  • 00:01:11
    across the rows of the matrix, we
  • 00:01:13
    have first the column of ones, and then a column of each
  • 00:01:20
    of the explanatory variables.
  • 00:01:23
    And then sort of down the matrix,
  • 00:01:29
    we have observations on each of the--
  • 00:01:33
    well, we have each of the observations.
  • 00:01:35
    Each observation corresponds to a row.
  • 00:01:37
    So if we define this matrix and vectors this way,
  • 00:01:43
    then we can write our multivariate linear model
  • 00:01:48
    in the following very parsimonious fashion.
  • 00:01:51
    y equals x beta plus epsilon.
  • 00:01:56
    OK, so now, I'm basically going to go
  • 00:02:03
    through the same assumptions, but in a slightly more general
  • 00:02:06
    way from when I discussed assumptions
  • 00:02:10
    in the bivariate model.
  • 00:02:11
    The assumptions have to be discussed in a more general way
  • 00:02:15
    now because actually, they're a little bit more complicated
  • 00:02:17
    with the multivariate model.
  • 00:02:22
    So I'm going to condense all of the assumptions
  • 00:02:26
    into two basic categories.
  • 00:02:28
    One is the identification assumptions.
  • 00:02:30
    And two is the assumptions on the error behavior.
  • 00:02:33
    OK, so in the multivariate linear model,
  • 00:02:38
    in order to have identification, in order
  • 00:02:40
    to be able to estimate our model,
  • 00:02:42
    we have to have n greater than k plus 1.
  • 00:02:46
    That just means we have to have more observations than we have
  • 00:02:51
    explanatory variables, plus 1.
  • 00:02:55
    And x has to have full column rank of k plus 1.
  • 00:03:02
    And what does that mean?
  • 00:03:04
    In other words, it means that the regressors have
  • 00:03:07
    to be linearly independent.
  • 00:03:09
    Or another way of saying this is that the matrix x prime x, or x
  • 00:03:13
    transpose x, is invertible.
  • 00:03:16
    I'm going to go through this in some more
  • 00:03:17
    detail in just a second.
  • 00:03:20
    And then the second main assumption or main category
  • 00:03:23
    of assumptions are on the error behavior.
  • 00:03:25
    And these actually are exactly like the assumptions
  • 00:03:28
    we saw before.
  • 00:03:29
    I'm just using matrix notation to express them.
  • 00:03:33
    So here we have the expectation of epsilon here.
  • 00:03:37
    Epsilon is a vector.
  • 00:03:38
    And that's the vector of zeros.
  • 00:03:41
    The expectation of epsilon, epsilon transpose
  • 00:03:45
    is equal to sigma squared times the n by n identity matrix.
  • 00:03:53
    And this is in fact--
  • 00:03:56
    this matrix here is, in fact, just the matrix
  • 00:04:03
    that we denote covariance of epsilon, which I'll show you
  • 00:04:08
    a picture of it in a second.
  • 00:04:09
    It's just a matrix that has the variances
  • 00:04:12
    of epsilon along the diagonal and the covariances
  • 00:04:15
    on the off diagonal.
  • 00:04:18
    And so what we're saying here is that the diagonal
  • 00:04:21
    is equal to sigma squared.
  • 00:04:25
    And the off diagonals are zeros.
  • 00:04:27
  • 00:04:30
    And a stronger version of this is that the epsilon vector
  • 00:04:37
    has this multivariate normal distribution
  • 00:04:40
    with this variance/covariance matrix.
  • 00:04:43
    I'll go into more detail in both of these in just a second.
  • 00:04:50
    So let's do the--
  • 00:04:51
    oh, n by n identity matrix.
  • 00:04:55
    OK, so let's take a closer look at these assumptions.
  • 00:04:58
    So assumption one, the identification assumption,
  • 00:05:02
    what exactly does this mean?
  • 00:05:05
    Well, we need to have more observations than regressors.
  • 00:05:07
    That shouldn't come as a surprise,
  • 00:05:09
    especially if we think about the bivariate model.
  • 00:05:12
    We have one regressor.
  • 00:05:13
    You have to have at least two observations or else
  • 00:05:16
    you can't draw a line.
  • 00:05:19
    So this is just sort of generalizes
  • 00:05:21
    that to higher dimensions.
  • 00:05:24
    We can't have any regressors that do not
  • 00:05:26
    have positive sample variation.
  • 00:05:29
    So we saw this assumption in the bivariate case.
  • 00:05:34
    Remember I had a picture that looked like--
  • 00:05:37
  • 00:05:48
    I had a picture that looked like this.
  • 00:05:50
    And I said if all of our observations
  • 00:05:52
    are on the same value of x, we can't
  • 00:05:57
    identify how the conditional mean of y changes with x.
  • 00:06:06
    Well, again, in the multivariate regression,
  • 00:06:10
    or the multivariate case, we can't
  • 00:06:13
    identify a particular parameter if we
  • 00:06:17
    have a regressor that doesn't have positive sample variation.
  • 00:06:19
    So all of our regressors have to have positive sample variation.
  • 00:06:23
    And then the third one-- and this
  • 00:06:24
    is the one that actually trips people up sometimes-- is
  • 00:06:28
    that we can't have any regressors that
  • 00:06:30
    are linear functions of one or more other regressors.
  • 00:06:35
    And in matrix notation, that's another way
  • 00:06:40
    to say that is the regressors are linearly independent.
  • 00:06:43
    And that turns out to be equivalent to x
  • 00:06:46
    prime x being invertible.
  • 00:06:47
    Yep.
  • 00:06:48
    STUDENT: Can you give an example of that?
  • 00:06:50
    SARA ELLISON: I will give two examples in fact.
  • 00:06:54
    OK, so here is one example.
  • 00:06:58
    Let's imagine a case where we want
  • 00:07:00
    to estimate the effect of schooling, work
  • 00:07:03
    experience, and age on salary.
  • 00:07:06
    And we have individual level data.
  • 00:07:10
    So we have sort of a data set.
  • 00:07:11
    And we have a bunch of different salaries.
  • 00:07:14
    And then we also have each person's years of schooling,
  • 00:07:18
    each person's years of work experience, each person's age,
  • 00:07:22
    and maybe some other stuff too.
  • 00:07:24
    Doesn't matter.
  • 00:07:26
    Well, it could be in our particular sample,
  • 00:07:29
    it's quite possible that everyone in our sample
  • 00:07:32
    started school at age six, went to school
  • 00:07:35
    until he or she finished school, and then started working.
  • 00:07:39
    Wouldn't be crazy if that happened.
  • 00:07:41
    Well, if in fact that was the case, then
  • 00:07:45
    the years of schooling plus the years of work experience
  • 00:07:48
    plus 6 is equal to the age.
  • 00:07:51
  • 00:07:54
    So if that's the case, we can't estimate this regression
  • 00:07:59
    equation.
  • 00:08:00
    And it sort of makes logical sense too in the sense
  • 00:08:05
    that there's nothing that helps us separately
  • 00:08:12
    identify what the effect of schooling,
  • 00:08:15
    and work experience, and age are.
  • 00:08:18
    There's no variation that allows us to separately figure out
  • 00:08:22
    the effects of all three of those, if, in fact, they're
  • 00:08:26
    collinear in our sample.
  • 00:08:29
    So we can't estimate such a model.
  • 00:08:32
    Is that clear?
  • 00:08:33
    STUDENT: So you would drop a regressor.
  • 00:08:35
    SARA ELLISON: Exactly, you have to drop one of the regressors.
  • 00:08:37
  • 00:08:39
    Does this make sense?
  • 00:08:42
    Yes.
  • 00:08:43
    STUDENT: If you drop age it still wouldn't work,
  • 00:08:45
    like if you those?
  • 00:08:47
    SARA ELLISON: If you drop age, it still wouldn't work?
  • 00:08:49
    Yes.
  • 00:08:50
    STUDENT: It still wouldn't work or--
  • 00:08:52
    SARA ELLISON: It would.
  • 00:08:54
    It wouldn't work if everyone went to school until age 18.
  • 00:09:00
    So we would still need to have--
  • 00:09:04
    we couldn't have a perfect linear relationship
  • 00:09:07
    between number of years of schooling--
  • 00:09:12
    well, actually, that would just be no sample variation in years
  • 00:09:17
    of schooling.
  • 00:09:20
    But if some people went to school until age 18,
  • 00:09:23
    and some people went to age 20, and some went to age 25,
  • 00:09:26
    then if we dropped age from this regression,
  • 00:09:29
    then we could estimate it.
  • 00:09:32
    STUDENT: You say why this doesn't
  • 00:09:36
    hold is that if we took x1, x2, and x3,
  • 00:09:40
    we could get values of beta 1, beta 2, and all of these, which
  • 00:09:43
    would essentially make y 0.
  • 00:09:45
    So 1, 1, and minus 1, for example.
  • 00:09:48
    So then this equation would go [INAUDIBLE]
  • 00:09:50
    and then y would be 0 in that case.
  • 00:09:53
    SARA ELLISON: So that's not the intuition I have.
  • 00:09:56
    That may be correct in some way.
  • 00:09:59
    But that's certainly not the intuition I have.
  • 00:10:02
    I would just say that my intuition is just that we don't
  • 00:10:08
    have any variation to separately identify
  • 00:10:10
    the effects of these three things
  • 00:10:12
    if they're perfectly linearly associated with one another.
  • 00:10:17
    Yes.
  • 00:10:18
    STUDENT: So what if the two regressor
  • 00:10:20
    are somewhat relating, but they are not perfectly relating?
  • 00:10:25
    SARA ELLISON: So that's an excellent question.
  • 00:10:27
    So the question was, what if they are closely related?
  • 00:10:31
    So maybe this doesn't quite hold in our sample.
  • 00:10:34
    But it comes close to holding.
  • 00:10:36
    Like we had a few people who went to school at age
  • 00:10:38
    five instead of age six.
  • 00:10:40
    And we had like a couple people who took a year off and didn't
  • 00:10:43
    work.
  • 00:10:44
    So this linear relationship is close to holding,
  • 00:10:47
    but not quite.
  • 00:10:48
    That's something that Esther might be
  • 00:10:50
    able to talk about next time.
  • 00:10:52
    I'm not sure.
  • 00:10:53
    So basically, in that case, you can--
  • 00:10:57
    maybe, maybe not.
  • 00:10:59
    But anyhow, I'll tell you the answer.
  • 00:11:01
    In that case, you can estimate this equation.
  • 00:11:04
    But you end up sort of having trouble separately identifying
  • 00:11:11
    the coefficients on these variables,
  • 00:11:14
    on these three variables.
  • 00:11:15
    And in fact, they're going to be coefficients that have very--
  • 00:11:19
    that your estimator is going to be a very high variance
  • 00:11:23
    estimator.
  • 00:11:24
    And so in that case, what you might want to do
  • 00:11:26
    is still drop one of them.
  • 00:11:28
    That's going to give you much lower variance estimators
  • 00:11:32
    for the remaining two.
  • 00:11:33
    It's going to introduce a little bit of bias,
  • 00:11:36
    if that regressor belongs in there.
  • 00:11:38
    It's going to introduce a little bit of bias.
  • 00:11:40
    But you might be willing to accept that bias
  • 00:11:42
    to have much lower variance estimators.
  • 00:11:46
    Yes.
  • 00:11:46
    STUDENT: [INAUDIBLE] that digression of--
  • 00:11:51
    what if you had a large data set with a lot of variables.
  • 00:11:54
    And you don't know that there are regressions in there
  • 00:11:58
    that do have [INAUDIBLE]?
  • 00:11:59
    What are the things that can insinuate from this,
  • 00:12:02
    could be causing problems in my analysis or--
  • 00:12:05
    SARA ELLISON: Yeah, so first of all,
  • 00:12:08
    if I tried to run this regression
  • 00:12:10
    and this linear relationship existed in my data set,
  • 00:12:14
    R would throw up its hands and say, you can't do that.
  • 00:12:19
    So I can't even do it.
  • 00:12:23
    So you would find that out.
  • 00:12:24
    If this relationship didn't quite exist,
  • 00:12:29
    it was close to existing in the data set,
  • 00:12:31
    but not quite, R would go ahead and give you
  • 00:12:34
    the results of this.
  • 00:12:36
    But one thing that you could do is
  • 00:12:39
    you could compute the correlation coefficients
  • 00:12:44
    for all of your regressors before you run the regression
  • 00:12:46
    and see if any are really highly correlated.
  • 00:12:49
    That wouldn't necessarily pick up a linear relationship
  • 00:12:53
    like this.
  • 00:12:54
    But the other thing you could do is
  • 00:12:55
    after you run the regression, if you
  • 00:12:57
    have really large standard errors, that
  • 00:13:00
    could be a signal to you that you
  • 00:13:03
    could have this situation that's close to perfect collinearity.
  • 00:13:08
    Yes.
  • 00:13:09
    STUDENT: So if we have two regressors, x1 and x2,
  • 00:13:12
    could we use the ratio in some ways to [INAUDIBLE] regression,
  • 00:13:15
    as a third regressor?
  • 00:13:16
    Or would that also be [INAUDIBLE]??
  • 00:13:18
    SARA ELLISON: So that wouldn't induce
  • 00:13:23
    this sort of perfect collinearity problem.
  • 00:13:29
    You could.
  • 00:13:30
    You might not want to do it for other reasons.
  • 00:13:33
    But yeah.
  • 00:13:35
    ESTHER DUFLO: There's one more.
  • 00:13:36
    Continuing the question on if you had many [INAUDIBLE]
  • 00:13:39
    and you didn't know which one to pick,
  • 00:13:42
    [INAUDIBLE] fall away from traditional econometrics,
  • 00:13:46
    it becomes then this [INAUDIBLE] we're
  • 00:13:49
    going to talk about when we talk about machine learning.
  • 00:13:51
    If you really don't want--
  • 00:13:53
    traditional econometrics assumes that you
  • 00:13:55
    have a model that you are trying to test so you don't go
  • 00:13:58
    on a giant fishing expedition.
  • 00:14:01
    If you want to go on a giant fishing expedition,
  • 00:14:04
    there are techniques for that.
  • 00:14:05
    And that's [INAUDIBLE].
  • 00:14:06
    That's what we are going to introduce.
  • 00:14:09
  • 00:14:12
    SARA ELLISON: Good.
  • 00:14:13
    OK, so that's one example.
  • 00:14:15
    A second example of this perfect multicollinearity, and one
  • 00:14:23
    that sort of researchers run afoul of all the time
  • 00:14:27
    is when they use dummy variables to indicate, say, observations
  • 00:14:33
    falling into an exhaustive and mutually
  • 00:14:37
    exclusive set of classes.
  • 00:14:39
    So here's an example.
  • 00:14:41
    Let's suppose I have a data set.
  • 00:14:42
    Let's say I go talk to all my friends
  • 00:14:46
    in the dorm to collect my data set for 14.31.
  • 00:14:48
    And I ask them what pets they have at home.
  • 00:14:50
    And let's say all of them have pets.
  • 00:14:54
    We could have a category for no pet as well.
  • 00:14:56
    But anyhow, let's say all of them have pets.
  • 00:14:58
    But they either have a cat, a dog, or a fish.
  • 00:15:01
    And so then I create three different dummy variables.
  • 00:15:04
    One is equal to 1 if they have a cat, and 0 otherwise.
  • 00:15:08
    One is equal to 1 if they have a dog, and 0 otherwise.
  • 00:15:11
    And one is equal to 1 if they have a fish, and 0 otherwise.
  • 00:15:14
    And everyone has exactly one of those pets.
  • 00:15:20
    I cannot include all three of those dummy variables
  • 00:15:24
    in the regression because if we add up those three dummy
  • 00:15:29
    variables, we get a column of ones.
  • 00:15:31
    And that's perfectly collinear with our column
  • 00:15:35
    of ones that allows us to estimate the intercept.
  • 00:15:40
    Now, there are other ways.
  • 00:15:44
    You can, in fact, decide to include all three
  • 00:15:49
    dummy variables and not include an intercept,
  • 00:15:51
    not estimate an intercept in this regression.
  • 00:15:54
    It's entirely equivalent.
  • 00:15:57
    It would be a little troubling if it wasn't equivalent.
  • 00:15:59
    But it is in fact entirely equivalent.
  • 00:16:01
    You just have to interpret the coefficient estimates
  • 00:16:03
    a different way.
  • 00:16:04
    But you can't have both an intercept in your regression
  • 00:16:06
    and a set of dummy variables that
  • 00:16:09
    are a full set of exhaustive and mutually exclusive classes.
  • 00:16:15
    And like I said, R will not let you do this anyhow.
  • 00:16:18
  • 00:16:21
    OK, so now the second assumption or second-- yes.
  • 00:16:26
    STUDENT: [INAUDIBLE] data?
  • 00:16:29
    SARA ELLISON: It only changes the interpretation.
  • 00:16:32
    So basically, I think I'll leave that question for Esther
  • 00:16:38
    because she will be giving examples
  • 00:16:40
    of how to use dummy variables in regressions
  • 00:16:43
    and how to interpret the coefficients.
  • 00:16:44
    So we'll leave that till later.
  • 00:16:48
    OK, so then the second assumption
  • 00:16:51
    was about the error behavior.
  • 00:16:53
    And as I said before, these assumptions
  • 00:16:56
    aren't different for the multivariate model.
  • 00:17:00
    It's just that I've expressed them using matrix notation.
  • 00:17:03
    So let me just go through these exactly the same assumptions
  • 00:17:08
    we saw in the bivariate model.
  • 00:17:10
    Here I'm using matrix notation.
  • 00:17:12
    So I'll just go through and show you what they mean.
  • 00:17:15
    So first of all, expectation of epsilon is equal to 0.
  • 00:17:22
    Epsilon is a vector.
  • 00:17:23
    And so it's just equal to a vector of zeros.
  • 00:17:27
    And then for some reason, we often
  • 00:17:31
    write instead of writing the assumption
  • 00:17:35
    as the covariance matrix of epsilon
  • 00:17:38
    equals sigma squared times the identity, the n
  • 00:17:42
    by n identity matrix, we write it
  • 00:17:44
    as the expectation of epsilon epsilon transpose
  • 00:17:48
    is equal to that.
  • 00:17:49
    Well, it turns out because the expectation of epsilon
  • 00:17:53
    is identically equal to 0, this matrix is equal to this matrix.
  • 00:18:01
    You can do the calculations.
  • 00:18:02
    It's just two lines to convince yourself.
  • 00:18:04
    But I could have expressed this assumption
  • 00:18:08
    by just saying that this matrix is equal to this.
  • 00:18:13
    But for whatever reason, we often
  • 00:18:16
    see it written as this matrix is equal to this, same thing.
  • 00:18:22
    Does everyone understand why this matrix equaling this
  • 00:18:28
    is exactly the same assumptions we saw before?
  • 00:18:32
    So this matrix is just simply a matrix
  • 00:18:36
    containing the variances of the epsilons on the diagonal.
  • 00:18:40
    So remember before we said each epsilon had
  • 00:18:43
    variance sigma squared?
  • 00:18:45
    That was our homoscedasticity assumption.
  • 00:18:47
    So each variance is sigma squared.
  • 00:18:50
    And then all the covariances were 0.
  • 00:18:53
    That was our no serial correlation assumption.
  • 00:18:56
    All of the off diagonals here are 0.
  • 00:18:59
    So it's the same thing, just in matrix form.
  • 00:19:01
  • 00:19:09
    Yeah, I think I said this verbally.
  • 00:19:12
    But this thing is denoted covariance of epsilon.
  • 00:19:17
    And it's called the variance-covariance matrix
  • 00:19:19
    of epsilon.
  • 00:19:19
  • 00:19:23
    OK, fine.
  • 00:19:25
    We've got this linear model.
  • 00:19:26
    We've got these assumptions.
  • 00:19:27
    Just like before, we're going to now ask the question,
  • 00:19:30
    how do we get beta hat?
  • 00:19:32
    And what distribution does beta hat have?
  • 00:19:34
    And the answers are not going to be surprising.
  • 00:19:37
    But they're going to be more beautiful than they
  • 00:19:39
    were last time.
  • 00:19:40
    So what is beta hat?
  • 00:19:42
    Well, it's a vector that minimizes
  • 00:19:44
    the sum of squared errors.
  • 00:19:45
    So we've got a vector of residuals transpose
  • 00:19:51
    times a vector of residuals.
  • 00:19:53
    And that's sort of expanded out what it looks like.
  • 00:20:00
    OK, so we want to choose the beta hat that
  • 00:20:03
    minimizes that thing.
  • 00:20:04
    So what we do is we take the derivative with respect
  • 00:20:07
    to beta, set it equal to 0, and obtain this.
  • 00:20:13
    If you're not used to doing calculus
  • 00:20:15
    with vectors and matrices, in the notes that I posted online,
  • 00:20:20
    I write this out in more detail.
  • 00:20:23
    And you can take a look at that if you want.
  • 00:20:25
    But basically, we get this sort of equation set equal to 0.
  • 00:20:33
    And this is going to tell us what
  • 00:20:35
    the beta hat that minimizes the sum of squared residuals is.
  • 00:20:40
    Then we solve for beta hat.
  • 00:20:42
    The negative 2 we can just divide both sides
  • 00:20:46
    by negative 2.
  • 00:20:47
    And so that goes away.
  • 00:20:48
    Then we write this equation as x prime y,
  • 00:20:54
    or x transpose y equals x transpose x times beta hat.
  • 00:20:59
    And then if this is invertible--
  • 00:21:04
    and remember, that was one of our assumptions.
  • 00:21:06
    That was our identification assumption,
  • 00:21:08
    that that thing was invertible.
  • 00:21:09
    If that's invertible, then we get that beta hat
  • 00:21:13
    is just equal to x prime x inverse x prime y.
  • 00:21:17
  • 00:21:20
    Beautiful.
  • 00:21:21
  • 00:21:31
    I mean, that was literally the derivation
  • 00:21:33
    of the least squares estimators in matrix notation.
  • 00:21:37
    If you look at my notes that I've posted online,
  • 00:21:39
    I mean, it's just pages of algebra using that summation
  • 00:21:43
    notation.
  • 00:21:44
    So this is why we love doing it in matrix notation.
  • 00:21:49
    What do we want to know about beta hat?
  • 00:21:51
    What do we always want to know about an estimator,
  • 00:21:54
    so we can do inference?
  • 00:21:56
    It's distribution.
  • 00:21:58
    Oh, it's right up there.
  • 00:22:01
    OK, fine.
  • 00:22:02
    So the expectation of beta hat is equal to beta.
  • 00:22:07
    Again, in matrix notation, it's very simple.
  • 00:22:10
    I haven't included the four lines or something like that.
  • 00:22:13
    But if you treat the x's as fixed,
  • 00:22:17
    then that makes the sort of proof very simple.
  • 00:22:21
    They come outside the expectation operator
  • 00:22:23
    and basically just falls out.
  • 00:22:27
    So it is unbiased.
  • 00:22:29
    And the covariance of beta hat, remember
  • 00:22:32
    this is the variance-covariance matrix.
  • 00:22:36
    So it's the matrix that has along the diagonal
  • 00:22:39
    the variances of each of the beta hats
  • 00:22:41
    and on the off diagonals, the covariances between them.
  • 00:22:44
    That is just equal to sigma squared times
  • 00:22:48
    x prime x inverse.
  • 00:22:51
    So again, very elegant, very beautiful, and not too hard
  • 00:22:56
    to show if you treat the x's as fixed.
  • 00:22:59
    And you can look on the website if you're interested.
  • 00:23:01
  • 00:23:05
    And finally, we often don't know what sigma squared is.
  • 00:23:09
    For inference, we need to know what sigma squared is
  • 00:23:12
    or we need an estimate for sigma squared.
  • 00:23:15
    So this is our unbiased estimate for sigma squared.
  • 00:23:20
    And as Esther anticipated, here we
  • 00:23:25
    have to subtract off a k instead of a 2
  • 00:23:28
    because instead of it being a bivariate model,
  • 00:23:33
    it's a multivariate model.
  • 00:23:36
  • 00:23:39
    And then finally, if we're willing to impose the more
  • 00:23:44
    strict assumption on the error distribution,
  • 00:23:47
    the errors are normally distributed,
  • 00:23:50
    then the beta hats are also normally distributed.
  • 00:23:53
    So sometimes we want to impose that.
  • 00:23:55
    Sometimes we want to be less proscriptive.
  • 00:24:01
    OK, so now, finally we get to inference.
  • 00:24:04
    So typically, in the linear model,
  • 00:24:07
    we're going to want to test hypotheses involving the betas.
  • 00:24:11
    That's where the real action is.
  • 00:24:12
    I mean, I can dream up hypotheses involving sigma
  • 00:24:15
    squared and things like that.
  • 00:24:17
    And that's fine.
  • 00:24:18
    And occasionally, you might want to test a hypothesis involving
  • 00:24:21
    sigma squared.
  • 00:24:22
    But really we care about the betas
  • 00:24:24
    because the betas are the parameters
  • 00:24:25
    in our conditional mean function of our outcome variable.
  • 00:24:29
    And the questions that we usually
  • 00:24:31
    want to answer using linear regression
  • 00:24:33
    are about the nature of this conditional mean function.
  • 00:24:37
    So sometimes we might only be interested in one of the betas.
  • 00:24:42
    Other times we might want to simultaneously test hypotheses
  • 00:24:45
    about a whole bunch of them.
  • 00:24:48
    And as we saw in the output that I showed you last lecture,
  • 00:24:52
    statistical packages typically perform some standard tests
  • 00:24:57
    on the betas for free and just report them with the output.
  • 00:25:03
    And that's fine.
  • 00:25:05
    And we can use those.
  • 00:25:06
    And they're often quite handy.
  • 00:25:08
    But there may be other ones that we need to do ourselves.
  • 00:25:11
    So they don't perform every conceivable test
  • 00:25:14
    we might be interested in.
  • 00:25:17
    OK, so let's start with a pretty general framework
  • 00:25:21
    for testing hypotheses about beta.
  • 00:25:24
    And it's not only quite general and flexible.
  • 00:25:27
    It's also super intuitive.
  • 00:25:28
    It's one of my favorite tests.
  • 00:25:30
    I really like it.
  • 00:25:31
    OK, so let's consider hypotheses of the following form.
  • 00:25:36
    A matrix r times beta is equal to a vector c.
  • 00:25:40
    That's the null hypothesis.
  • 00:25:42
    The alternative is that it's not equal to the vector c.
  • 00:25:46
    So what is this matrix r?
  • 00:25:49
    It's a matrix of restrictions.
  • 00:25:52
    And its dimensions are r by k plus 1.
  • 00:25:56
    So it has the number of--
  • 00:25:59
    so the number of columns equal to the number
  • 00:26:03
    of parameters, the number of betas
  • 00:26:05
    that we're estimating in the linear model.
  • 00:26:07
    And then the number of rows is the number of restrictions
  • 00:26:10
    that we want to impose in our null hypothesis,
  • 00:26:14
    the number of restrictions we want to test.
  • 00:26:18
    So we could have a matrix, where r is equal to 1.
  • 00:26:22
    And then we're just testing one restriction.
  • 00:26:24
    So that would correspond to something
  • 00:26:26
    like beta 1 is equal to 0.
  • 00:26:28
  • 00:26:33
    Oh, so let me just say this.
  • 00:26:34
    I'll get to some examples in a minute.
  • 00:26:37
    So almost any hypothesis involving
  • 00:26:39
    beta you can dream up in the context of a linear model
  • 00:26:42
    can be captured in this framework, not
  • 00:26:44
    quite any, but most of them.
  • 00:26:46
    You can test whether individual parameters are equal to 0.
  • 00:26:49
    You can test whether individual parameters
  • 00:26:51
    are equal to something other than 0.
  • 00:26:53
    You can test multiple hypotheses simultaneously.
  • 00:26:56
    You can test hypotheses about linear combinations
  • 00:26:59
    of parameters.
  • 00:27:00
    The world is your oyster.
  • 00:27:02
    So let me show you a few examples of these
  • 00:27:05
    and exactly what the r matrix looks like
  • 00:27:08
    and what the c vector looks like in these examples.
  • 00:27:12
    OK, so let's say, for instance, that we set up the matrix
  • 00:27:19
    r to be just a row vector with a 0 in the first spot,
  • 00:27:25
    and then a 1, and then the rest 0s.
  • 00:27:28
    So what that matrix is doing is it's picking out beta 1.
  • 00:27:32
    Remember, this spot corresponds to beta 0.
  • 00:27:38
    So being in the second spot, it's picking out beta 1.
  • 00:27:41
    And c is just what beta 1 is equal to under the null.
  • 00:27:49
    So that r and this c corresponds to the hypothesis
  • 00:27:54
    that beta 1 is equal to 0.
  • 00:27:55
  • 00:27:59
    Let's suppose instead that we want
  • 00:28:03
    to test a whole bunch of hypotheses
  • 00:28:06
    simultaneously, that beta 1 is equal to 0,
  • 00:28:09
    and beta 2 is equal to 0, and beta 3
  • 00:28:11
    is equal to 0, et cetera.
  • 00:28:14
    Well, then this is what our matrix would look like.
  • 00:28:17
    So it would basically be an identity matrix with a column
  • 00:28:22
    of 0's tacked on the front.
  • 00:28:24
    And the reason why the column of 0's is tacked on the front
  • 00:28:26
    is because that corresponds to the intercept.
  • 00:28:29
    And we're not interested at least here
  • 00:28:32
    in testing a hypothesis about the intercept.
  • 00:28:36
    And then the c vector is just a vector of 0's.
  • 00:28:40
  • 00:28:45
    So I do want to emphasize, even though I've sort of written
  • 00:28:49
    this as like one equation, this is actually
  • 00:28:52
    we're testing k hypotheses simultaneously here.
  • 00:28:55
    So we have k equal signs.
  • 00:28:57
  • 00:29:02
    OK, so here's a more complicated example.
  • 00:29:06
    If our r matrix has in the first row a 1 and a negative 1,
  • 00:29:13
    and then the rest 0's, sorry, 0, and then 1, negative 1,
  • 00:29:16
    the rest 0's.
  • 00:29:17
    And then the second row there's a 1
  • 00:29:21
    in the fourth spot, et cetera.
  • 00:29:26
    And then the c vector looks like this.
  • 00:29:29
    What does this correspond to in terms
  • 00:29:31
    of a hypothesis we might want a test or a series of hypotheses?
  • 00:29:36
    Well, here, the first row gives us the hypothesis
  • 00:29:40
    that beta 1 minus beta 2 is equal to 0.
  • 00:29:45
    So I could just write that as beta 1 is equal to beta 2.
  • 00:29:50
    The second row corresponds to beta 3--
  • 00:29:53
    this is beta 3 here-- being equal to 5.
  • 00:29:57
    And the third row corresponds to beta k
  • 00:30:01
    being equal to negative 2.
  • 00:30:04
    Yes.
  • 00:30:04
    STUDENT: Can you explain the beta 1 equals beta 2?
  • 00:30:08
    SARA ELLISON: OK, so if I just multiply the matrix, the r
  • 00:30:18
    matrix, by beta and sort of wrote these out as equations,
  • 00:30:22
    I would get beta 1 minus--
  • 00:30:26
    so this is beta 0 here.
  • 00:30:28
    So here's beta 1 minus beta 2 is equal to 0.
  • 00:30:34
    And then I just rewrote that as beta 1 is equal to beta 2.
  • 00:30:37
    That's all.
  • 00:30:38
    Yep.
  • 00:30:39
    STUDENT: How often are we [INAUDIBLE] specific value
  • 00:30:42
    rather than the range?
  • 00:30:44
    And is there a [INAUDIBLE] against that?
  • 00:30:47
    SARA ELLISON: So yes and no.
  • 00:30:50
    So basically, if we're not interested in--
  • 00:30:56
    if we're interested in whether beta is in a range,
  • 00:30:59
    then what we might want to do is instead
  • 00:31:01
    of doing a hypothesis test, where
  • 00:31:03
    the null was a single value and the alternative
  • 00:31:06
    was everything else, we might want
  • 00:31:07
    to do, say, a one-sided test, where the null is that beta
  • 00:31:11
    is less than some value and the alternative
  • 00:31:14
    is that it's greater than some value.
  • 00:31:15
    We can do those.
  • 00:31:17
    We can't do them in this framework.
  • 00:31:19
    So I'll talk about that in a second.
  • 00:31:21
    The other thing that you might be suggesting
  • 00:31:23
    is instead of doing hypothesis testing,
  • 00:31:26
    we might want to just report confidence intervals as well.
  • 00:31:29
    So remember that really hypothesis
  • 00:31:32
    testing and constructing confidence intervals
  • 00:31:34
    are kind of the same thing.
  • 00:31:36
    It's just reporting the same information in different forms.
  • 00:31:39
    And so it can just be a matter of style or preference.
  • 00:31:45
    Instead of reporting hypothesis tests,
  • 00:31:48
    you report confidence intervals.
  • 00:31:50
    And that's perfectly fine.
  • 00:31:52
    Yeah.
  • 00:31:53
    ESTHER DUFLO: So confidence interval,
  • 00:31:54
    it can be harder to say whether between the [INAUDIBLE]
  • 00:31:58
    minus [? 4 ?] is [INAUDIBLE].
  • 00:32:02
    I mean, it's kind of hard to see.
  • 00:32:04
    They don't really add up.
  • 00:32:05
    SARA ELLISON: Yeah, and I guess the other more fundamental
  • 00:32:09
    answer to your question is that sometimes we actually do--
  • 00:32:13
    there might be a theory that says beta
  • 00:32:16
    should be equal to this number.
  • 00:32:18
    And in order to test that theory,
  • 00:32:20
    we want to perform a hypothesis test that beta
  • 00:32:22
    is equal to that number.
  • 00:32:24
    So that does come up, not every case.
  • 00:32:28
    But yeah, it is relevant.
  • 00:32:31
    Other questions?
  • 00:32:32
    No.
  • 00:32:34
    STUDENT: You could also use it to if somebody came out
  • 00:32:37
    with a paper today describing the treatment of malaria
  • 00:32:40
    [INAUDIBLE] wanted to see if that was true or not,
  • 00:32:42
    just take that beta and test for it and do hypothesis testing?
  • 00:32:47
    SARA ELLISON: Yeah.
  • 00:32:48
    STUDENT: OK.
  • 00:32:48
  • 00:32:51
    SARA ELLISON: OK, oh, here's part
  • 00:32:55
    of the answer to your question.
  • 00:32:56
    One thing you can't do in this framework
  • 00:32:58
    is test one-sided hypotheses.
  • 00:33:00
    We'll get back to those.
  • 00:33:03
    So now we have this framework.
  • 00:33:07
    I mean, it's not really a framework, just
  • 00:33:09
    sort of a notation in some sense to deal
  • 00:33:13
    with hypotheses of all of the forms we just talked about.
  • 00:33:18
    And within the regression framework,
  • 00:33:21
    we have a super intuitive and cool way
  • 00:33:23
    to test these hypotheses.
  • 00:33:25
    So first of all, let's think of the null
  • 00:33:28
    as describing a set of restrictions on the model.
  • 00:33:31
    So let me just go back for a second.
  • 00:33:34
    So in this case, this null has three different restrictions,
  • 00:33:39
    that beta 1 is equal to beta 2, that beta 3 is equal to 5,
  • 00:33:42
    and that beta k is equal to minus 2.
  • 00:33:45
    And we think of the null as imposing restrictions
  • 00:33:48
    on the model.
  • 00:33:50
    Then here's how we perform the test.
  • 00:33:52
    We estimate the unrestricted model.
  • 00:33:55
    We impose the restrictions of the null
  • 00:33:57
    and estimate that model.
  • 00:34:00
    And then we compare the goodness of fit of those two models.
  • 00:34:04
    So that's why I love this test.
  • 00:34:05
    It seems really intuitive to me that if you
  • 00:34:10
    have a set of restrictions and they really bind,
  • 00:34:13
    and they really sort of affect how good your fit is, then
  • 00:34:18
    that tells you, well, maybe those restrictions are not
  • 00:34:22
    true.
  • 00:34:23
    If the restrictions on the other hand
  • 00:34:26
    don't really bind that much, if your model fits almost as
  • 00:34:33
    well with the restricted model as it
  • 00:34:36
    does with the unrestricted model, then
  • 00:34:38
    that tells you maybe these restrictions
  • 00:34:40
    are true or close to true.
  • 00:34:41
    And we don't want to reject them.
  • 00:34:44
    So that's the whole intuition and the idea behind this test.
  • 00:34:48
  • 00:34:51
    OK, so a couple of details before we
  • 00:34:57
    get to the distribution, the test statistic.
  • 00:34:59
    Estimating the unrestricted model is simple.
  • 00:35:02
    Just run the regression.
  • 00:35:03
    But how do we estimate the restricted model?
  • 00:35:06
    Well, it depends on what form the restrictions take.
  • 00:35:10
    So let's say we're testing hypothesis,
  • 00:35:12
    where just a bunch of the betas are equal to 0.
  • 00:35:15
    How do we run the restricted model?
  • 00:35:18
  • 00:35:22
    Yes.
  • 00:35:23
    STUDENT: We just think the [INAUDIBLE]
  • 00:35:25
    is kind of on the diagonal 1 so that [INAUDIBLE]..
  • 00:35:28
    SARA ELLISON: Yes, exactly.
  • 00:35:31
    So practically speaking, what we do is we just run
  • 00:35:34
    the regression leaving out all of those x's.
  • 00:35:37
    So that's the way we constrain the coefficients to be
  • 00:35:40
    equal to 0.
  • 00:35:42
    So we have the unrestricted.
  • 00:35:45
    The unrestricted regression is just
  • 00:35:47
    all of the x's are in there.
  • 00:35:49
    If we want to restrict that certain betas are equal to 0,
  • 00:35:51
    we just run another regression, where we leave out the x's
  • 00:35:56
    associated with the betas that we want to have equal to 0.
  • 00:36:00
    So that's our restricted model.
  • 00:36:05
    Then let's say the restriction is
  • 00:36:09
    that the two betas are equal.
  • 00:36:10
    We have beta 1 equals beta 2, or something like that.
  • 00:36:13
    That's our null restriction.
  • 00:36:16
    Then how do we impose that restriction on a linear model?
  • 00:36:20
    Well, actually, it might help if I write the linear model.
  • 00:36:25
    So we have y sub i equals beta 0 plus beta 1 x1.
  • 00:36:35
    I hope I'm using the same notation.
  • 00:36:36
  • 00:36:40
    Do I have my subscripts in the same order?
  • 00:36:43
    I hope so.
  • 00:36:43
  • 00:36:48
    OK, so let's suppose this is our unrestricted model.
  • 00:36:56
    We want to restrict beta 1 to be equal to beta 2.
  • 00:36:59
    Well, what do we do?
  • 00:37:01
    We just create a new variable that's the sum of these two.
  • 00:37:06
    So this is called x1i plus x2i, just a new variable.
  • 00:37:16
    And then we only estimate one coefficient on that.
  • 00:37:23
    So our restricted model is just that we
  • 00:37:27
    don't include this variable as a regressor or this variable
  • 00:37:30
    as a regressor.
  • 00:37:31
    We include their sum as a regressor.
  • 00:37:35
    And that's how we're imposing the null restriction
  • 00:37:37
    because when we include their sum,
  • 00:37:39
    we're making their two coefficients equal.
  • 00:37:42
    We're forcing their two coefficients to be equal.
  • 00:37:46
    Yeah.
  • 00:37:47
    STUDENT: So are we testing the first beta
  • 00:37:49
    1 and the second beta 1 are the same?
  • 00:37:51
    SARA ELLISON: Exactly, yes.
  • 00:37:53
    This is testing the hypothesis that beta 1 is equal to beta 2.
  • 00:38:00
    Yep.
  • 00:38:02
    STUDENT: Is that different from testing the beta 2 [? at 0? ?]
  • 00:38:05
    SARA ELLISON: Yeah, it's definitely different.
  • 00:38:07
    So here, these betas could be anything.
  • 00:38:12
    They could be a million.
  • 00:38:14
    We're just testing the hypothesis that they're equal.
  • 00:38:17
  • 00:38:20
    Yes.
  • 00:38:21
  • 00:38:24
    STUDENT: But if they're equal, wouldn't it
  • 00:38:26
    be like a linear combination of the other [INAUDIBLE]?
  • 00:38:29
    You know how they cannot be like a linear sum?
  • 00:38:34
  • 00:38:38
    SARA ELLISON: I'm not sure if I understand your question.
  • 00:38:40
    So basically, what I'm trying to do here
  • 00:38:43
    is impose just this hypothesis, but not
  • 00:38:46
    impose anything else about what the betas might be equal to.
  • 00:38:51
    STUDENT: The identification restriction
  • 00:38:54
    is not on the betas.
  • 00:38:55
    It's on the x's.
  • 00:38:57
    Beta can have whatever medium conditions.
  • 00:39:01
    SARA ELLISON: Ah, you were confused
  • 00:39:02
    about the identification assumption.
  • 00:39:04
    Yep, yep, yep, that's right.
  • 00:39:05
  • 00:39:08
    ESTHER DUFLO: You can reask your question
  • 00:39:09
    saying, if you could post the sum on just x
  • 00:39:13
    and it turned out that in fact they were equal,
  • 00:39:16
    you can guess what the beta [INAUDIBLE] following x.
  • 00:39:20
  • 00:39:23
    SARA ELLISON: OK, what if the restriction
  • 00:39:25
    is that some beta is equal to a constant c?
  • 00:39:29
    How would we impose that restriction
  • 00:39:32
    and then re-estimate the restricted model?
  • 00:39:35
    Well, let's suppose this is just a constant.
  • 00:39:39
    So we impose that this one is equal to a constant.
  • 00:39:42
    So then here, there's no parameter in this term
  • 00:39:47
    that we need to estimate under the null.
  • 00:39:50
    So we just subtract the constant times x1
  • 00:39:54
    from the dependent variable and rerun that regression.
  • 00:39:58
    And that's our restricted regression.
  • 00:40:01
    Does that make sense?
  • 00:40:03
    OK, so going back, we estimate the unrestricted model.
  • 00:40:10
    We impose restrictions and estimate that model.
  • 00:40:13
    And then we compare the goodness of fit.
  • 00:40:15
    And if the goodness of fit is not very different,
  • 00:40:19
    we don't reject the null.
  • 00:40:20
    If it's very different, we reject the null.
  • 00:40:23
    In particular, this test statistic,
  • 00:40:27
    which is basically the numerator has the difference
  • 00:40:31
    between the restricted and unrestricted sums of squares
  • 00:40:36
    and then in the denominator has the unrestricted sum
  • 00:40:39
    of squares.
  • 00:40:40
    This is how we form the test statistic.
  • 00:40:42
    And it turns out that has an F distribution under the null.
  • 00:40:47
    And we reject the null for large values of this test statistic.
  • 00:40:53
Tags
  • multivariate linear model
  • matrix notation
  • identification assumptions
  • error behavior
  • multicollinearity
  • hypothesis testing
  • goodness of fit
  • dummy variables
  • variance-covariance matrix
  • beta hat