00:00:00
00:00:12
SARA ELLISON: OK,
so last time right
00:00:14
at the end of the
lecture, I had introduced
00:00:19
a more general linear model,
the multivariate linear model.
00:00:25
And I had just gone through the
first couple of these slides,
00:00:30
saying let's analyze this model
using a different notation,
00:00:36
in particular matrix notation,
because the summation
00:00:41
notation was just too clunky.
00:00:42
It wasn't up for the job.
00:00:44
And so let me just
go through quickly.
00:00:48
Let's see.
00:00:49
This was, I think, the next
to last slide I had up.
00:00:53
So if we let y be the
column vector of all
00:00:59
of the observations on
the dependent variable,
00:01:02
then let epsilon be the column
vector of all of the errors,
00:01:06
and then let x be
the matrix, where
00:01:11
across the rows
of the matrix, we
00:01:13
have first the column of ones,
and then a column of each
00:01:20
of the explanatory variables.
00:01:23
And then sort of
down the matrix,
00:01:29
we have observations
on each of the--
00:01:33
well, we have each
of the observations.
00:01:35
Each observation
corresponds to a row.
00:01:37
So if we define this matrix
and vectors this way,
00:01:43
then we can write our
multivariate linear model
00:01:48
in the following very
parsimonious fashion.
00:01:51
y equals x beta plus epsilon.
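As a sketch of what that compact equation unpacks to (assuming the convention just described, with x sub ji denoting observation i of regressor j):

```latex
\underbrace{\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}}_{y}
=
\underbrace{\begin{pmatrix}
1 & x_{11} & x_{21} & \cdots & x_{k1} \\
1 & x_{12} & x_{22} & \cdots & x_{k2} \\
\vdots & \vdots & \vdots &  & \vdots \\
1 & x_{1n} & x_{2n} & \cdots & x_{kn}
\end{pmatrix}}_{X\;(n \times (k+1))}
\underbrace{\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}}_{\beta}
+
\underbrace{\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}}_{\varepsilon}
```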
00:01:56
OK, so now, I'm
basically going to go
00:02:03
through the same assumptions,
but in a slightly more general
00:02:06
way from when I
discussed assumptions
00:02:10
in the bivariate model.
00:02:11
The assumptions have to be
discussed in a more general way
00:02:15
now because actually, they're
a little bit more complicated
00:02:17
with the multivariate model.
00:02:22
So I'm going to condense
all of the assumptions
00:02:26
into two basic categories.
00:02:28
One is the identification
assumptions.
00:02:30
And two is the assumptions
on the error behavior.
00:02:33
OK, so in the
multivariate linear model,
00:02:38
in order to have
identification, in order
00:02:40
to be able to
estimate our model,
00:02:42
we have to have n
greater than k plus 1.
00:02:46
That just means we have to have
more observations than we have
00:02:51
explanatory variables, plus 1.
00:02:55
And x has to have full
column rank of k plus 1.
00:03:02
And what does that mean?
00:03:04
In other words, it means
that the regressors have
00:03:07
to be linearly independent.
00:03:09
Or another way of saying this
is that the matrix x prime x, or x
00:03:13
transpose x, is invertible.
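As a rough sketch of what that condition looks like in practice (the variable names and data here are made up purely for illustration), you can build X and check its column rank directly in R:

```r
# Made-up regressors, purely for illustration
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)

X <- cbind(1, x1, x2, x3)   # column of ones, then one column per regressor

qr(X)$rank == ncol(X)       # TRUE when X has full column rank k + 1
solve(t(X) %*% X)           # succeeds (no error) exactly when X'X is invertible
```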
00:03:16
I'm going to go through
this in some more
00:03:17
detail in just a second.
00:03:20
And then the second main
assumption or main category
00:03:23
of assumptions are on
the error behavior.
00:03:25
And these actually are
exactly like the assumptions
00:03:28
we saw before.
00:03:29
I'm just using matrix
notation to express them.
00:03:33
So here we have the
expectation of epsilon here.
00:03:37
Epsilon is a vector.
00:03:38
And that's the vector of zeros.
00:03:41
The expectation of
epsilon, epsilon transpose
00:03:45
is equal to sigma squared times
the n by n identity matrix.
00:03:53
And this is in fact--
00:03:56
this matrix here is, in
fact, just the matrix
00:04:03
that we denote covariance of
epsilon, which I'll show you
00:04:08
a picture of in a second.
00:04:09
It's just a matrix
that has the variances
00:04:12
of epsilon along the
diagonal and the covariances
00:04:15
on the off diagonal.
00:04:18
And so what we're saying
here is that the diagonal
00:04:21
is equal to sigma squared.
00:04:25
And the off diagonals are zeros.
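Written out, that assumption on the variance-covariance matrix is:

```latex
\operatorname{Cov}(\varepsilon) \;=\; E[\varepsilon \varepsilon'] \;=\;
\begin{pmatrix}
\sigma^2 & 0        & \cdots & 0 \\
0        & \sigma^2 & \cdots & 0 \\
\vdots   & \vdots   & \ddots & \vdots \\
0        & 0        & \cdots & \sigma^2
\end{pmatrix}
\;=\; \sigma^2 I_n
```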
00:04:27
00:04:30
A stronger version of this
is that the epsilon vector
00:04:37
has this multivariate
normal distribution
00:04:40
with this
variance/covariance matrix.
00:04:43
I'll go into more detail in
both of these in just a second.
00:04:50
So let's do the--
00:04:51
oh, n by n identity matrix.
00:04:55
OK, so let's take a closer
look at these assumptions.
00:04:58
So assumption one, the
identification assumption,
00:05:02
what exactly does this mean?
00:05:05
Well, we need to have more
observations than regressors.
00:05:07
That shouldn't
come as a surprise,
00:05:09
especially if we think
about the bivariate model.
00:05:12
We have one regressor.
00:05:13
You have to have at least
two observations or else
00:05:16
you can't draw a line.
00:05:19
So this just
sort of generalizes
00:05:21
that to higher dimensions.
00:05:24
We can't have any
regressors that do not
00:05:26
have positive sample variation.
00:05:29
So we saw this assumption
in the bivariate case.
00:05:34
Remember I had a picture
that looked like--
00:05:37
00:05:48
I had a picture that
looked like this.
00:05:50
And I said if all
of our observations
00:05:52
are on the same
value of x, we can't
00:05:57
identify how the conditional
mean of y changes with x.
00:06:06
Well, again, in the
multivariate regression,
00:06:10
or the multivariate
case, we can't
00:06:13
identify a particular
parameter if we
00:06:17
have a regressor that doesn't
have positive sample variation.
00:06:19
So all of our regressors have to
have positive sample variation.
00:06:23
And then the third
one-- and this
00:06:24
is the one that actually
trips people up sometimes-- is
00:06:28
that we can't have
any regressors that
00:06:30
are linear functions of one
or more other regressors.
00:06:35
And in matrix notation,
another way
00:06:40
to say that is that the regressors
are linearly independent.
00:06:43
And that turns out
to be equivalent to x
00:06:46
prime x being invertible.
00:06:47
Yep.
00:06:48
STUDENT: Can you give
an example of that?
00:06:50
SARA ELLISON: I will give
two examples in fact.
00:06:54
OK, so here is one example.
00:06:58
Let's imagine a
case where we want
00:07:00
to estimate the effect
of schooling, work
00:07:03
experience, and age on salary.
00:07:06
And we have
individual level data.
00:07:10
So we have sort of a data set.
00:07:11
And we have a bunch
of different salaries.
00:07:14
And then we also have each
person's years of schooling,
00:07:18
each person's years of work
experience, each person's age,
00:07:22
and maybe some other stuff too.
00:07:24
Doesn't matter.
00:07:26
Well, in
our particular sample,
00:07:29
it's quite possible that
everyone in our sample
00:07:32
started school at age
six, went to school
00:07:35
until he or she finished school,
and then started working.
00:07:39
Wouldn't be crazy
if that happened.
00:07:41
Well, if in fact that
was the case, then
00:07:45
the years of schooling plus
the years of work experience
00:07:48
plus 6 is equal to the age.
00:07:51
00:07:54
So if that's the case, we
can't estimate this regression
00:07:59
equation.
00:08:00
And it sort of makes logical
sense too in the sense
00:08:05
that there's nothing
that helps us separately
00:08:12
identify what the
effects of schooling,
00:08:15
and work experience,
and age are.
00:08:18
There's no variation that allows
us to separately figure out
00:08:22
the effects of all three of
those, if, in fact, they're
00:08:26
collinear in our sample.
00:08:29
So we can't estimate
such a model.
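Here is a minimal sketch of that situation with made-up data. (In practice, R's lm() does not so much refuse as drop the offending regressor and report NA for its coefficient, which amounts to the same thing.)

```r
# Made-up data in which age = schooling + experience + 6 holds exactly
n          <- 200
schooling  <- sample(10:20, n, replace = TRUE)
experience <- sample(0:30,  n, replace = TRUE)
age        <- schooling + experience + 6            # exact linear relationship
salary     <- 20 + 2 * schooling + 1.5 * experience + rnorm(n, sd = 5)

summary(lm(salary ~ schooling + experience + age))   # one coefficient comes back NA
```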
00:08:32
Is that clear?
00:08:33
STUDENT: So you would
drop a regressor.
00:08:35
SARA ELLISON: Exactly, you have
to drop one of the regressors.
00:08:37
00:08:39
Does this make sense?
00:08:42
Yes.
00:08:43
STUDENT: If you drop age
it still wouldn't work,
00:08:45
like if you drop those?
00:08:47
SARA ELLISON: If you drop
age, it still wouldn't work?
00:08:49
Yes.
00:08:50
STUDENT: It still
wouldn't work or--
00:08:52
SARA ELLISON: It would.
00:08:54
It wouldn't work if everyone
went to school until age 18.
00:09:00
So we would still need to have--
00:09:04
we couldn't have a perfect
linear relationship
00:09:07
between number of
years of schooling--
00:09:12
well, actually, that would just
be no sample variation in years
00:09:17
of schooling.
00:09:20
But if some people went
to school until age 18,
00:09:23
and some people went to age
20, and some went to age 25,
00:09:26
then if we dropped age
from this regression,
00:09:29
then we could estimate it.
00:09:32
STUDENT: You say
why this doesn't
00:09:36
hold is that if we
took x1, x2, and x3,
00:09:40
we could get values of beta 1,
beta 2, and all of these, which
00:09:43
would essentially make y 0.
00:09:45
So 1, 1, and minus
1, for example.
00:09:48
So then this equation
would go [INAUDIBLE]
00:09:50
and then y would
be 0 in that case.
00:09:53
SARA ELLISON: So that's
not the intuition I have.
00:09:56
That may be correct in some way.
00:09:59
But that's certainly not
the intuition I have.
00:10:02
I would just say that my
intuition is just that we don't
00:10:08
have any variation to
separately identify
00:10:10
the effects of
these three things
00:10:12
if they're perfectly linearly
associated with one another.
00:10:17
Yes.
00:10:18
STUDENT: So what if
the two regressors
00:10:20
are somewhat related, but they
are not perfectly related?
00:10:25
SARA ELLISON: So that's
an excellent question.
00:10:27
So the question was, what
if they are closely related?
00:10:31
So maybe this doesn't
quite hold in our sample.
00:10:34
But it comes close to holding.
00:10:36
Like we had a few people
who went to school at age
00:10:38
five instead of age six.
00:10:40
And we had like a couple people
who took a year off and didn't
00:10:43
work.
00:10:44
So this linear relationship
is close to holding,
00:10:47
but not quite.
00:10:48
That's something
that Esther might be
00:10:50
able to talk about next time.
00:10:52
I'm not sure.
00:10:53
So basically, in
that case, you can--
00:10:57
maybe, maybe not.
00:10:59
But anyhow, I'll
tell you the answer.
00:11:01
In that case, you can
estimate this equation.
00:11:04
But you end up sort of having
trouble separately identifying
00:11:11
the coefficients
on these variables,
00:11:14
on these three variables.
00:11:15
And in fact, your estimators
of those coefficients are--
00:11:19
they're going
to be very high variance
00:11:23
estimators.
00:11:24
And so in that case,
what you might want to do
00:11:26
is still drop one of them.
00:11:28
That's going to give you much
lower variance estimators
00:11:32
for the remaining two.
00:11:33
It's going to introduce
a little bit of bias,
00:11:36
if that regressor
belongs in there.
00:11:38
It's going to introduce
a little bit of bias.
00:11:40
But you might be willing
to accept that bias
00:11:42
to have much lower
variance estimators.
00:11:46
Yes.
00:11:46
STUDENT: [INAUDIBLE]
that digression of--
00:11:51
what if you had a large data
set with a lot of variables.
00:11:54
And you don't know that there
are regressors in there
00:11:58
that do have [INAUDIBLE]?
00:11:59
What are the things that
can indicate that this
00:12:02
could be causing problems
in my analysis or--
00:12:05
SARA ELLISON: Yeah,
so first of all,
00:12:08
if I tried to run
this regression
00:12:10
and this linear relationship
existed in my data set,
00:12:14
R would throw up its hands
and say, you can't do that.
00:12:19
So I can't even do it.
00:12:23
So you would find that out.
00:12:24
If this relationship
didn't quite exist,
00:12:29
it was close to existing
in the data set,
00:12:31
but not quite, R would
go ahead and give you
00:12:34
the results of this.
00:12:36
But one thing that
you could do is
00:12:39
you could compute the
correlation coefficients
00:12:44
for all of your regressors
before you run the regression
00:12:46
and see if any are
really highly correlated.
00:12:49
That wouldn't necessarily
pick up a linear relationship
00:12:53
like this.
00:12:54
But the other thing
you could do is
00:12:55
after you run the
regression, if you
00:12:57
have really large
standard errors, that
00:13:00
could be a signal
to you that you
00:13:03
could have this situation that's
close to perfect collinearity.
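A sketch of those two checks with made-up, nearly (but not perfectly) collinear data:

```r
# Age is almost, but not exactly, schooling + experience + 6
n          <- 200
schooling  <- sample(10:20, n, replace = TRUE)
experience <- sample(0:30,  n, replace = TRUE)
age        <- schooling + experience + 6 + sample(-1:1, n, replace = TRUE)
salary     <- 20 + 2 * schooling + 1.5 * experience + 0.1 * age + rnorm(n, sd = 5)

cor(cbind(schooling, experience, age))               # look for very high correlations
summary(lm(salary ~ schooling + experience + age))   # look for very large standard errors
```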
00:13:08
Yes.
00:13:09
STUDENT: So if we have
two regressors, x1 and x2,
00:13:12
could we use the ratio in some
ways to [INAUDIBLE] regression,
00:13:15
as a third regressor?
00:13:16
Or would that also
be [INAUDIBLE]??
00:13:18
SARA ELLISON: So
that wouldn't induce
00:13:23
this sort of perfect
collinearity problem.
00:13:29
You could.
00:13:30
You might not want to
do it for other reasons.
00:13:33
But yeah.
00:13:35
ESTHER DUFLO: There's one more.
00:13:36
Continuing the question on
if you had many [INAUDIBLE]
00:13:39
and you didn't know
which one to pick,
00:13:42
[INAUDIBLE] fall away from
traditional econometrics,
00:13:46
it becomes then this
[INAUDIBLE] we're
00:13:49
going to talk about when we
talk about machine learning.
00:13:51
If you really don't want--
00:13:53
traditional econometrics
assumes that you
00:13:55
have a model that you are
trying to test so you don't go
00:13:58
on a giant fishing expedition.
00:14:01
If you want to go on a
giant fishing expedition,
00:14:04
there are techniques for that.
00:14:05
And that's [INAUDIBLE].
00:14:06
That's what we are
going to introduce.
00:14:09
00:14:12
SARA ELLISON: Good.
00:14:13
OK, so that's one example.
00:14:15
A second example of this perfect
multicollinearity, and one
00:14:23
that researchers sort of
run afoul of all the time
00:14:27
is when they use dummy variables
to indicate, say, observations
00:14:33
falling into an
exhaustive and mutually
00:14:37
exclusive set of classes.
00:14:39
So here's an example.
00:14:41
Let's suppose I have a data set.
00:14:42
Let's say I go talk
to all my friends
00:14:46
in the dorm to collect
my data set for 14.31.
00:14:48
And I ask them what
pets they have at home.
00:14:50
And let's say all
of them have pets.
00:14:54
We could have a category
for no pet as well.
00:14:56
But anyhow, let's say
all of them have pets.
00:14:58
But they either have a
cat, a dog, or a fish.
00:15:01
And so then I create three
different dummy variables.
00:15:04
One is equal to 1 if they
have a cat, and 0 otherwise.
00:15:08
One is equal to 1 if they
have a dog, and 0 otherwise.
00:15:11
And one is equal to 1 if they
have a fish, and 0 otherwise.
00:15:14
And everyone has exactly
one of those pets.
00:15:20
I cannot include all three
of those dummy variables
00:15:24
in the regression because if
we add up those three dummy
00:15:29
variables, we get
a column of ones.
00:15:31
And that's perfectly
collinear with our column
00:15:35
of ones that allows us to
estimate the intercept.
00:15:40
Now, there are other ways.
00:15:44
You can, in fact, decide
to include all three
00:15:49
dummy variables and not
include an intercept,
00:15:51
not estimate an intercept
in this regression.
00:15:54
It's entirely equivalent.
00:15:57
It would be a little troubling
if it wasn't equivalent.
00:15:59
But it is in fact
entirely equivalent.
00:16:01
You just have to interpret
the coefficient estimates
00:16:03
a different way.
00:16:04
But you can't have both an
intercept in your regression
00:16:06
and a set of dummy
variables that
00:16:09
are a full set of exhaustive
and mutually exclusive classes.
00:16:15
And like I said, R will
not let you do this anyhow.
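A minimal sketch of this dummy variable trap with made-up data; by default lm() with a factor keeps the intercept and drops one category, and including all three dummies alongside the intercept leaves one of them NA:

```r
# Made-up pet data, purely for illustration
pet     <- factor(sample(c("cat", "dog", "fish"), 50, replace = TRUE))
y       <- rnorm(50)
is_cat  <- as.numeric(pet == "cat")
is_dog  <- as.numeric(pet == "dog")
is_fish <- as.numeric(pet == "fish")    # is_cat + is_dog + is_fish is a column of ones

lm(y ~ is_cat + is_dog + is_fish)       # one dummy is NA: collinear with the intercept
lm(y ~ pet)                             # default: intercept plus two of the three dummies
lm(y ~ is_cat + is_dog + is_fish - 1)   # equivalent no-intercept version with all three
```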
00:16:18
00:16:21
OK, so now the second
assumption or second-- yes.
00:16:26
STUDENT: [INAUDIBLE] data?
00:16:29
SARA ELLISON: It only
changes the interpretation.
00:16:32
So basically, I think I'll
leave that question for Esther
00:16:38
because she will
be giving examples
00:16:40
of how to use dummy
variables in regressions
00:16:43
and how to interpret
the coefficients.
00:16:44
So we'll leave that till later.
00:16:48
OK, so then the
second assumption
00:16:51
was about the error behavior.
00:16:53
And as I said before,
these assumptions
00:16:56
aren't different for
the multivariate model.
00:17:00
It's just that I've expressed
them using matrix notation.
00:17:03
So let me just go through these--
they're exactly the same assumptions
00:17:08
we saw in the bivariate model.
00:17:10
Here I'm using matrix notation.
00:17:12
So I'll just go through and
show you what they mean.
00:17:15
So first of all, expectation
of epsilon is equal to 0.
00:17:22
Epsilon is a vector.
00:17:23
And so it's just equal
to a vector of zeros.
00:17:27
And then for some
reason, instead of
00:17:31
writing the assumption
00:17:35
as the covariance
matrix of epsilon
00:17:38
equals sigma squared
times the identity, the n
00:17:42
by n identity
matrix, we often write it
00:17:44
as the expectation of
epsilon epsilon transpose
00:17:48
is equal to that.
00:17:49
Well, it turns out because
the expectation of epsilon
00:17:53
is identically equal to 0, this
matrix is equal to this matrix.
00:18:01
You can do the calculations.
00:18:02
It's just two lines
to convince yourself.
00:18:04
But I could have
expressed this assumption
00:18:08
by just saying that this
matrix is equal to this.
00:18:13
But for whatever
reason, we often
00:18:16
see it written as this matrix
is equal to this, same thing.
00:18:22
Does everyone understand why
this matrix equaling this
00:18:28
is exactly the same
assumptions we saw before?
00:18:32
So this matrix is
just simply a matrix
00:18:36
containing the variances of
the epsilons on the diagonal.
00:18:40
So remember before we
said each epsilon had
00:18:43
variance sigma squared?
00:18:45
That was our
homoscedasticity assumption.
00:18:47
So each variance
is sigma squared.
00:18:50
And then all the
covariances were 0.
00:18:53
That was our no serial
correlation assumption.
00:18:56
All of the off
diagonals here are 0.
00:18:59
So it's the same thing,
just in matrix form.
00:19:01
00:19:09
Yeah, I think I
said this verbally.
00:19:12
But this thing is denoted
covariance of epsilon.
00:19:17
And it's called the
variance-covariance matrix
00:19:19
of epsilon.
00:19:19
00:19:23
OK, fine.
00:19:25
We've got this linear model.
00:19:26
We've got these assumptions.
00:19:27
Just like before, we're going
to now ask the question,
00:19:30
how do we get beta hat?
00:19:32
And what distribution
does beta hat have?
00:19:34
And the answers are not
going to be surprising.
00:19:37
But they're going to be
more beautiful than they
00:19:39
were last time.
00:19:40
So what is beta hat?
00:19:42
Well, it's a vector
that minimizes
00:19:44
the sum of squared errors.
00:19:45
So we've got a vector
of residuals transpose
00:19:51
times a vector of residuals.
00:19:53
And that's sort of expanded
out what it looks like.
00:20:00
OK, so we want to
choose the beta hat that
00:20:03
minimizes that thing.
00:20:04
So what we do is we take
the derivative with respect
00:20:07
to beta, set it equal
to 0, and obtain this.
00:20:13
If you're not used
to doing calculus
00:20:15
with vectors and matrices, in
the notes that I posted online,
00:20:20
I write this out in more detail.
00:20:23
And you can take a look
at that if you want.
00:20:25
But basically, we get this sort
of equation set equal to 0.
00:20:33
And this is going
to tell us what
00:20:35
the beta hat that minimizes the
sum of squared residuals is.
00:20:40
Then we solve for beta hat.
00:20:42
The negative 2-- we can
just divide both sides
00:20:46
by negative 2.
00:20:47
And so that goes away.
00:20:48
Then we write this
equation as x prime y,
00:20:54
or x transpose y equals x
transpose x times beta hat.
00:20:59
And then if this is invertible--
00:21:04
and remember, that was
one of our assumptions.
00:21:06
That was our
identification assumption,
00:21:08
that that thing was invertible.
00:21:09
If that's invertible,
then we get that beta hat
00:21:13
is just equal to x prime
x inverse x prime y.
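A quick sketch of that formula at work, with made-up data, checked against lm():

```r
# Least squares "by hand" with the matrix formula, versus lm()
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 3 * x2 + rnorm(n)

X        <- cbind(1, x1, x2)
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y   # (X'X)^{-1} X'y
cbind(beta_hat, coef(lm(y ~ x1 + x2)))         # the two columns agree
```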
00:21:17
00:21:20
Beautiful.
00:21:21
00:21:31
I mean, that was
literally the derivation
00:21:33
of the least squares
estimators in matrix notation.
00:21:37
If you look at my notes
that I've posted online,
00:21:39
I mean, it's just pages of
algebra using that summation
00:21:43
notation.
00:21:44
So this is why we love
doing it in matrix notation.
00:21:49
What do we want to
know about beta hat?
00:21:51
What do we always want to
know about an estimator,
00:21:54
so we can do inference?
00:21:56
Its distribution.
00:21:58
Oh, it's right up there.
00:22:01
OK, fine.
00:22:02
So the expectation of
beta hat is equal to beta.
00:22:07
Again, in matrix notation,
it's very simple.
00:22:10
I haven't included the proof-- it's four
lines or something like that.
00:22:13
But if you treat
the x's as fixed,
00:22:17
then that makes the sort
of proof very simple.
00:22:21
They come outside the
expectation operator
00:22:23
and everything basically just falls out.
00:22:27
So it is unbiased.
00:22:29
And the covariance
of beta hat, remember
00:22:32
this is the
variance-covariance matrix.
00:22:36
So it's the matrix that
has along the diagonal
00:22:39
the variances of
each of the beta hats
00:22:41
and on the off diagonals,
the covariances between them.
00:22:44
That is just equal to
sigma squared times
00:22:48
x prime x inverse.
00:22:51
So again, very elegant, very
beautiful, and not too hard
00:22:56
to show if you treat
the x's as fixed.
00:22:59
And you can look on the
website if you're interested.
00:23:01
00:23:05
And finally, we often don't
know what sigma squared is.
00:23:09
For inference, we need to
know what sigma squared is
00:23:12
or we need an estimate
for sigma squared.
00:23:15
So this is our unbiased
estimate for sigma squared.
00:23:20
And as Esther
anticipated, here we
00:23:25
have to subtract off
a k plus 1 instead of a 2
00:23:28
because instead of it
being a bivariate model,
00:23:33
it's a multivariate model.
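Here is a small self-contained sketch with made-up data that puts the last two slides together: with k regressors plus an intercept, the unbiased estimate of sigma squared divides the sum of squared residuals by n minus (k plus 1), and plugging it into sigma squared times x prime x inverse reproduces the standard errors that summary(lm(...)) reports.

```r
# Made-up data, purely for illustration
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 3 * x2 + rnorm(n)
X  <- cbind(1, x1, x2)

beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y
e        <- y - X %*% beta_hat                  # residuals
k        <- ncol(X) - 1                         # number of regressors
s2       <- sum(e^2) / (n - k - 1)              # unbiased estimate of sigma^2
se       <- sqrt(diag(s2 * solve(t(X) %*% X)))  # standard errors of the beta hats
cbind(se, coef(summary(lm(y ~ x1 + x2)))[, "Std. Error"])   # should match
```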
00:23:36
00:23:39
And then finally, if we're
willing to impose the more
00:23:44
strict assumption on
the error distribution,
00:23:47
the errors are
normally distributed,
00:23:50
then the beta hats are
also normally distributed.
00:23:53
So sometimes we
want to impose that.
00:23:55
Sometimes we want to
be less prescriptive.
00:24:01
OK, so now, finally
we get to inference.
00:24:04
So typically, in
the linear model,
00:24:07
we're going to want to test
hypotheses involving the betas.
00:24:11
That's where the real action is.
00:24:12
I mean, I can dream up
hypotheses involving sigma
00:24:15
squared and things like that.
00:24:17
And that's fine.
00:24:18
And occasionally, you might want
to test a hypothesis involving
00:24:21
sigma squared.
00:24:22
But really we care
about the betas
00:24:24
because the betas
are the parameters
00:24:25
in our conditional mean function
of our outcome variable.
00:24:29
And the questions
that we usually
00:24:31
want to answer using
linear regression
00:24:33
are about the nature of this
conditional mean function.
00:24:37
So sometimes we might only be
interested in one of the betas.
00:24:42
Other times we might want to
simultaneously test hypotheses
00:24:45
about a whole bunch of them.
00:24:48
And as we saw in the output
that I showed you last lecture,
00:24:52
statistical packages typically
perform some standard tests
00:24:57
on the betas for free and just
report them with the output.
00:25:03
And that's fine.
00:25:05
And we can use those.
00:25:06
And they're often quite handy.
00:25:08
But there may be other ones
that we need to do ourselves.
00:25:11
So they don't perform
every conceivable test
00:25:14
we might be interested in.
00:25:17
OK, so let's start with a
pretty general framework
00:25:21
for testing
hypotheses about beta.
00:25:24
And it's not only quite
general and flexible.
00:25:27
It's also super intuitive.
00:25:28
It's one of my favorite tests.
00:25:30
I really like it.
00:25:31
OK, so let's consider hypotheses
of the following form.
00:25:36
A matrix r times beta
is equal to a vector c.
00:25:40
That's the null hypothesis.
00:25:42
The alternative is that it's
not equal to the vector c.
00:25:46
So what is this matrix r?
00:25:49
It's a matrix of restrictions.
00:25:52
And its dimensions
are r by k plus 1.
00:25:56
So it has the number of--
00:25:59
so the number of columns
equal to the number
00:26:03
of parameters, the
number of betas
00:26:05
that we're estimating
in the linear model.
00:26:07
And then the number of rows
is the number of restrictions
00:26:10
that we want to impose
in our null hypothesis,
00:26:14
the number of restrictions
we want to test.
00:26:18
So we could have a matrix,
where r is equal to 1.
00:26:22
And then we're just
testing one restriction.
00:26:24
So that would
correspond to something
00:26:26
like beta 1 is equal to 0.
00:26:28
00:26:33
Oh, so let me just say this.
00:26:34
I'll get to some
examples in a minute.
00:26:37
So almost any
hypothesis involving
00:26:39
beta you can dream up in the
context of a linear model
00:26:42
can be captured in
this framework, not
00:26:44
quite any, but most of them.
00:26:46
You can test whether individual
parameters are equal to 0.
00:26:49
You can test whether
individual parameters
00:26:51
are equal to something
other than 0.
00:26:53
You can test multiple
hypotheses simultaneously.
00:26:56
You can test hypotheses
about linear combinations
00:26:59
of parameters.
00:27:00
The world is your oyster.
00:27:02
So let me show you a
few examples of these
00:27:05
and exactly what the
r matrix looks like
00:27:08
and what the c vector looks
like in these examples.
00:27:12
OK, so let's say, for instance,
that we set up the matrix
00:27:19
r to be just a row vector
with a 0 in the first spot,
00:27:25
and then a 1, and
then the rest 0s.
00:27:28
So what that matrix is doing
is it's picking out beta 1.
00:27:32
Remember, this spot
corresponds to beta 0.
00:27:38
So being in the second spot,
it's picking out beta 1.
00:27:41
And c is just what beta 1
is equal to under the null.
00:27:49
So that r and this c
corresponds to the hypothesis
00:27:54
that beta 1 is equal to 0.
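Written out for a model with k regressors plus an intercept, that single-restriction case is:

```latex
R = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \end{pmatrix}, \qquad
c = \begin{pmatrix} 0 \end{pmatrix}, \qquad
R\beta = \beta_1 = 0 .
```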
00:27:55
00:27:59
Let's suppose
instead that we want
00:28:03
to test a whole
bunch of hypotheses
00:28:06
simultaneously, that
beta 1 is equal to 0,
00:28:09
and beta 2 is equal
to 0, and beta 3
00:28:11
is equal to 0, et cetera.
00:28:14
Well, then this is what
our matrix would look like.
00:28:17
So it would basically be an
identity matrix with a column
00:28:22
of 0's tacked on the front.
00:28:24
And the reason why the column
of 0's is tacked on the front
00:28:26
is because that corresponds
to the intercept.
00:28:29
And we're not
interested at least here
00:28:32
in testing a hypothesis
about the intercept.
00:28:36
And then the c vector
is just a vector of 0's.
00:28:40
00:28:45
So I do want to emphasize, even
though I've sort of written
00:28:49
this as like one
equation, actually
00:28:52
we're testing k hypotheses
simultaneously here.
00:28:55
So we have k equal signs.
00:28:57
00:29:02
OK, so here's a more
complicated example.
00:29:06
If our r matrix has in the
first row a 1 and a negative 1,
00:29:13
and then the rest 0's, sorry,
0, and then 1, negative 1,
00:29:16
the rest 0's.
00:29:17
And then the second
row there's a 1
00:29:21
in the fourth spot, et cetera.
00:29:26
And then the c vector
looks like this.
00:29:29
What does this
correspond to in terms
00:29:31
of a hypothesis we might want a
test or a series of hypotheses?
00:29:36
Well, here, the first row
gives us the hypothesis
00:29:40
that beta 1 minus
beta 2 is equal to 0.
00:29:45
So I could just write that
as beta 1 is equal to beta 2.
00:29:50
The second row
corresponds to beta 3--
00:29:53
this is beta 3 here--
being equal to 5.
00:29:57
And the third row
corresponds to beta k
00:30:01
being equal to negative 2.
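As a sketch, with the columns ordered beta 0 through beta k as before, that example is:

```latex
R\beta =
\begin{pmatrix}
0 & 1 & -1 & 0 & \cdots & 0 \\
0 & 0 & 0  & 1 & \cdots & 0 \\
0 & 0 & 0  & 0 & \cdots & 1
\end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}
=
\begin{pmatrix} \beta_1 - \beta_2 \\ \beta_3 \\ \beta_k \end{pmatrix}
=
\begin{pmatrix} 0 \\ 5 \\ -2 \end{pmatrix}
= c .
```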
00:30:04
Yes.
00:30:04
STUDENT: Can you explain
the beta 1 equals beta 2?
00:30:08
SARA ELLISON: OK, so if I just
multiply the matrix, the r
00:30:18
matrix, by beta and sort of
wrote these out as equations,
00:30:22
I would get beta 1 minus--
00:30:26
so this is beta 0 here.
00:30:28
So here's beta 1 minus
beta 2 is equal to 0.
00:30:34
And then I just rewrote that
as beta 1 is equal to beta 2.
00:30:37
That's all.
00:30:38
Yep.
00:30:39
STUDENT: How often are we
[INAUDIBLE] specific value
00:30:42
rather than the range?
00:30:44
And is there a
[INAUDIBLE] against that?
00:30:47
SARA ELLISON: So yes and no.
00:30:50
So basically, if we're
not interested in--
00:30:56
if we're interested in
whether beta is in a range,
00:30:59
then what we might
want to do is instead
00:31:01
of doing a hypothesis
test, where
00:31:03
the null was a single
value and the alternative
00:31:06
was everything
else, we might want
00:31:07
to do, say, a one-sided test,
where the null is that beta
00:31:11
is less than some value
and the alternative
00:31:14
is that it's greater
than some value.
00:31:15
We can do those.
00:31:17
We can't do them
in this framework.
00:31:19
So I'll talk about
that in a second.
00:31:21
The other thing that
you might be suggesting
00:31:23
is instead of doing
hypothesis testing,
00:31:26
we might want to just report
confidence intervals as well.
00:31:29
So remember that
really hypothesis
00:31:32
testing and constructing
confidence intervals
00:31:34
are kind of the same thing.
00:31:36
It's just reporting the same
information in different forms.
00:31:39
And so it can just be a
matter of style or preference.
00:31:45
Instead of reporting
hypothesis tests,
00:31:48
you report confidence intervals.
00:31:50
And that's perfectly fine.
00:31:52
Yeah.
00:31:53
ESTHER DUFLO: So
confidence interval,
00:31:54
it can be harder to say
whether between the [INAUDIBLE]
00:31:58
minus [? 4 ?] is [INAUDIBLE].
00:32:02
I mean, it's kind
of hard to see.
00:32:04
They don't really add up.
00:32:05
SARA ELLISON: Yeah, and I guess
the other more fundamental
00:32:09
answer to your question is
that sometimes we actually do--
00:32:13
there might be a
theory that says beta
00:32:16
should be equal to this number.
00:32:18
And in order to
test that theory,
00:32:20
we want to perform a
hypothesis test that beta
00:32:22
is equal to that number.
00:32:24
So that does come
up, not every case.
00:32:28
But yeah, it is relevant.
00:32:31
Other questions?
00:32:32
No.
00:32:34
STUDENT: You could also use
it to if somebody came out
00:32:37
with a paper today describing
the treatment of malaria
00:32:40
[INAUDIBLE] wanted to see
if that was true or not,
00:32:42
just take that beta and test for
it and do hypothesis testing?
00:32:47
SARA ELLISON: Yeah.
00:32:48
STUDENT: OK.
00:32:48
00:32:51
SARA ELLISON: OK,
oh, here's part
00:32:55
of the answer to your question.
00:32:56
One thing you can't
do in this framework
00:32:58
is test one-sided hypotheses.
00:33:00
We'll get back to those.
00:33:03
So now we have this framework.
00:33:07
I mean, it's not really
a framework, just
00:33:09
sort of a notation
in some sense to deal
00:33:13
with hypotheses of all of the
forms we just talked about.
00:33:18
And within the
regression framework,
00:33:21
we have a super
intuitive and cool way
00:33:23
to test these hypotheses.
00:33:25
So first of all, let's
think of the null
00:33:28
as describing a set of
restrictions on the model.
00:33:31
So let me just go
back for a second.
00:33:34
So in this case, this null has
three different restrictions,
00:33:39
that beta 1 is equal to beta
2, that beta 3 is equal to 5,
00:33:42
and that beta k is
equal to minus 2.
00:33:45
And we think of the null
as imposing restrictions
00:33:48
on the model.
00:33:50
Then here's how we
perform the test.
00:33:52
We estimate the
unrestricted model.
00:33:55
We impose the
restrictions of the null
00:33:57
and estimate that model.
00:34:00
And then we compare the goodness
of fit of those two models.
00:34:04
So that's why I love this test.
00:34:05
It seems really intuitive
to me that if you
00:34:10
have a set of restrictions
and they really bind,
00:34:13
and they really sort of affect
how good your fit is, then
00:34:18
that tells you, well, maybe
those restrictions are not
00:34:22
true.
00:34:23
If the restrictions
on the other hand
00:34:26
don't really bind that much,
if your model fits almost as
00:34:33
well with the
restricted model as it
00:34:36
does with the
unrestricted model, then
00:34:38
that tells you maybe
these restrictions
00:34:40
are true or close to true.
00:34:41
And we don't want
to reject them.
00:34:44
So that's the whole intuition
and the idea behind this test.
00:34:48
00:34:51
OK, so a couple of
details before we
00:34:57
get to the distribution,
the test statistic.
00:34:59
Estimating the unrestricted
model is simple.
00:35:02
Just run the regression.
00:35:03
But how do we estimate
the restricted model?
00:35:06
Well, it depends on what
form the restrictions take.
00:35:10
So let's say we're
testing a hypothesis
00:35:12
where just a bunch of
the betas are equal to 0.
00:35:15
How do we run the
restricted model?
00:35:18
00:35:22
Yes.
00:35:23
STUDENT: We just
think the [INAUDIBLE]
00:35:25
is kind of on the diagonal
1 so that [INAUDIBLE]..
00:35:28
SARA ELLISON: Yes, exactly.
00:35:31
So practically speaking,
what we do is we just run
00:35:34
the regression leaving
out all of those x's.
00:35:37
So that's the way we constrain
the coefficients to be
00:35:40
equal to 0.
00:35:42
So we have the unrestricted.
00:35:45
The unrestricted
regression is just
00:35:47
all of the x's are in there.
00:35:49
If we want to restrict that
certain betas are equal to 0,
00:35:51
we just run another regression,
where we leave out the x's
00:35:56
associated with the betas that
we want to have equal to 0.
00:36:00
So that's our restricted model.
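A minimal sketch with made-up variables; anova() on the two nested fits reports exactly the kind of comparison described here:

```r
# Testing H0: beta1 = beta2 = 0 by comparing restricted and unrestricted fits
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
y  <- 1 + 0.3 * x1 + 2 * x3 + rnorm(n)

unrestricted <- lm(y ~ x1 + x2 + x3)
restricted   <- lm(y ~ x3)          # leave x1 and x2 out to impose the restrictions
anova(restricted, unrestricted)     # compares the two sums of squared residuals (F test)
```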
00:36:05
Then let's say
the restriction is
00:36:09
that the two betas are equal.
00:36:10
We have beta 1 equals beta
2, or something like that.
00:36:13
That's our null restriction.
00:36:16
Then how do we impose that
restriction on a linear model?
00:36:20
Well, actually, it might help
if I write the linear model.
00:36:25
So we have y sub i equals
beta 0 plus beta 1 x1.
00:36:35
I hope I'm using
the same notation.
00:36:36
00:36:40
Do I have my subscripts
in the same order?
00:36:43
I hope so.
00:36:43
00:36:48
OK, so let's suppose this
is our unrestricted model.
00:36:56
We want to restrict beta
1 to be equal to beta 2.
00:36:59
Well, what do we do?
00:37:01
We just create a new variable
that's the sum of these two.
00:37:06
So this is called x1i plus
x2i, just a new variable.
00:37:16
And then we only estimate
one coefficient on that.
00:37:23
So our restricted
model is just that we
00:37:27
don't include this variable as
a regressor or this variable
00:37:30
as a regressor.
00:37:31
We include their
sum as a regressor.
00:37:35
And that's how we're
imposing the null restriction
00:37:37
because when we
include their sum,
00:37:39
we're making their two
coefficients equal.
00:37:42
We're forcing their two
coefficients to be equal.
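A sketch of that restricted regression with made-up variables (x3 stands in for whatever other regressors are in the model):

```r
# Imposing beta1 = beta2 by including the sum x1 + x2 as a single regressor
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
y  <- 1 + 2 * x1 + 2 * x2 + 0.5 * x3 + rnorm(n)

unrestricted <- lm(y ~ x1 + x2 + x3)
restricted   <- lm(y ~ I(x1 + x2) + x3)   # one shared coefficient on x1 and x2
anova(restricted, unrestricted)           # F test of H0: beta1 = beta2
```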
00:37:46
Yeah.
00:37:47
STUDENT: So are we
testing the first beta
00:37:49
1 and the second
beta 1 are the same?
00:37:51
SARA ELLISON: Exactly, yes.
00:37:53
This is testing the hypothesis
that beta 1 is equal to beta 2.
00:38:00
Yep.
00:38:02
STUDENT: Is that different from
testing the beta 2 [? at 0? ?]
00:38:05
SARA ELLISON: Yeah, it's
definitely different.
00:38:07
So here, these betas
could be anything.
00:38:12
They could be a million.
00:38:14
We're just testing the
hypothesis that they're equal.
00:38:17
00:38:20
Yes.
00:38:21
00:38:24
STUDENT: But if they're
equal, wouldn't it
00:38:26
be like a linear combination
of the other [INAUDIBLE]??
00:38:29
You know how they cannot
be like a linear sum?
00:38:34
00:38:38
SARA ELLISON: I'm not sure if
I understand your question.
00:38:40
So basically, what
I'm trying to do here
00:38:43
is impose just this
hypothesis, but not
00:38:46
impose anything else about what
the betas might be equal to.
00:38:51
STUDENT: The
identification restriction
00:38:54
is not on the betas.
00:38:55
It's on the x's.
00:38:57
Beta can have whatever
medium conditions.
00:39:01
SARA ELLISON: Ah,
you were confused
00:39:02
about the identification
assumption.
00:39:04
Yep, yep, yep, that's right.
00:39:05
00:39:08
ESTHER DUFLO: You can
reask your question
00:39:09
saying, if you could
post the sum on just x
00:39:13
and it turned out that
in fact they were equal,
00:39:16
you can guess what the beta
[INAUDIBLE] following x.
00:39:20
00:39:23
SARA ELLISON: OK, what
if the restriction
00:39:25
is that some beta is
equal to a constant c?
00:39:29
How would we impose
that restriction
00:39:32
and then re-estimate
the restricted model?
00:39:35
Well, let's suppose
this is just a constant.
00:39:39
So we impose that this one
is equal to a constant.
00:39:42
So then here, there's no
parameter in this term
00:39:47
that we need to
estimate under the null.
00:39:50
So we just subtract
the constant times x1
00:39:54
from the dependent variable
and rerun that regression.
00:39:58
And that's our
restricted regression.
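A sketch of that case with made-up variables, taking 5 as the arbitrary constant in the null:

```r
# Imposing beta1 = 5: move 5 * x1 to the left-hand side and re-estimate without x1
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 5 * x1 - 2 * x2 + rnorm(n)

restricted <- lm(I(y - 5 * x1) ~ x2)   # no coefficient on x1 is estimated
# equivalently: lm(y ~ x2 + offset(5 * x1))
```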
00:40:01
Does that make sense?
00:40:03
OK, so going back, we estimate
the unrestricted model.
00:40:10
We impose restrictions
and estimate that model.
00:40:13
And then we compare
the goodness of fit.
00:40:15
And if the goodness of
fit is not very different,
00:40:19
we don't reject the null.
00:40:20
If it's very different,
we reject the null.
00:40:23
In particular, this
test statistic,
00:40:27
which basically has in the
numerator the difference
00:40:31
between the restricted
and unrestricted sums of squares
00:40:36
and then in the denominator
has the unrestricted sum
00:40:39
of squares.
00:40:40
This is how we form
the test statistic.
00:40:42
And it turns out that has an
F distribution under the null.
00:40:47
And we reject the null for large
values of this test statistic.
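Written out, with r restrictions as above and SSR denoting sums of squared residuals (the slide's exact symbols may differ), the statistic is:

```latex
F \;=\;
\frac{\bigl(SSR_{\text{restricted}} - SSR_{\text{unrestricted}}\bigr)/r}
     {SSR_{\text{unrestricted}}/\bigl(n-(k+1)\bigr)}
\;\sim\; F_{\,r,\;n-(k+1)} \quad \text{under the null.}
```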
00:40:53