00:00:00
In this video, I want to
show you a really cool way
00:00:02
to solve a constrained
optimization problem, which
00:00:05
is a problem where
you're trying to find
00:00:07
the minimum or the
maximum of a function,
00:00:09
like J of x, and it's subject to
a constraint, c of x equals 0.
00:00:14
And the solution involves
this equation down here,
00:00:17
where the inner portion
is called the Lagrangian.
00:00:20
And lambda is called
the Lagrange multiplier.
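Written out, the setup and that on-screen equation are:

  minimize J(x) subject to c(x) = 0
  solve:   ∇[ J(x) + λ c(x) ] = 0

where the bracketed term, L(x, λ) = J(x) + λ c(x), is the Lagrangian,
and λ is the Lagrange multiplier.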
00:00:24
So what is this equation doing?
00:00:26
And how does it help us solve
constrained optimization?
00:00:29
Well, it's actually rather
intuitive and kind of amazing.
00:00:32
So I hope you stick around for
this really quick explanation.
00:00:36
I'm Brian.
00:00:36
And welcome to a
Matlab Tech Talk.
00:00:40
Let's look at an
optimization problem
00:00:42
with an objective function, J
of x equals x squared, which is
00:00:46
plotted over here on this graph.
00:00:48
And the question is, what value
of x produces the minimum J?
00:00:53
And obviously, looking at
this graph, it's x equals 0.
00:00:56
But what we want to
know is, how can we
00:00:57
determine this mathematically?
00:01:00
Well, one thing we can notice is
that the slope of the function
00:01:03
is nonzero everywhere, except
at the minimum point, where
00:01:07
it's flat.
00:01:08
In fact, minimum
and maximum points
00:01:11
must exist where
the slope is flat.
00:01:13
Because if the function was still
increasing or decreasing,
00:01:17
well, then that point
wouldn't be the most extreme.
00:01:20
And we can tell
that a flat point is
00:01:23
a minimum if the rate of
change of the slope is positive,
00:01:27
that is, if everything is curving
up and away from it.
00:01:30
And it's a maximum point if
everything is curving down.
00:01:35
Therefore, for J
equals x squared,
00:01:37
we can solve for
where the slope is
00:01:39
0 by taking the derivative
of the function with respect
00:01:42
to x and then just setting
that derivative equal to 0.
00:01:46
And then we can determine that
the rate of change of the slope
00:01:49
is positive by just looking
at the second derivative
00:01:52
at this point.
00:01:55
Now, similarly,
we can show that x
00:01:57
equals 0 is a
maximum point for J
00:01:59
equals negative x squared
since the second derivative is
00:02:03
negative.
00:02:04
And therefore, we can deduce
that the output is curving down
00:02:07
from this one point, x equals 0.
00:02:10
And then, finally, we can
show that the flat part of J
00:02:12
equals x cubed is neither a
maximum nor a minimum point,
00:02:15
since the second derivative
is 0 at this point;
00:02:19
it's neither
positive nor negative.
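If you'd like to verify those three cases yourself, here is a minimal
symbolic sketch; it assumes you have the Symbolic Math Toolbox, and
the variable names are my own.

  syms x
  solve(diff(x^2, x) == 0, x)    % stationary point of x^2: x = 0
  diff(x^2, x, 2)                % second derivative:  2, so a minimum
  diff(-x^2, x, 2)               % second derivative: -2, so a maximum
  subs(diff(x^3, x, 2), x, 0)    % second derivative at 0 is 0: neither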
00:02:22
And the idea of
looking at the slope
00:02:24
can be expanded to
higher-dimensional functions as well.
00:02:26
For example, take this
function, J equals
00:02:29
x1 squared plus x2 squared.
00:02:31
We can look at this
and, once again,
00:02:33
pretty easily see that
the minimum J occurs
00:02:35
when x1 and x2 are both 0.
00:02:37
Now, to find this
point, we once again
00:02:40
look at where the derivative
is 0, or more precisely,
00:02:43
where the gradient is 0.
00:02:45
The gradient of J of x
is the vector of partial derivatives
00:02:48
of the function with
respect to x1 and x2.
00:02:52
And when we do this, we
get the vector 2x1 and 2x2.
00:02:57
So, really, the gradient
returns a vector field
00:03:00
that denotes the direction
and the magnitude
00:03:02
of the steepest slope for
every possible x combination.
00:03:07
And for our
bowl-shaped function,
00:03:08
the gradient looks like this.
00:03:10
There is 0 slope at the
very bottom of the bowl.
00:03:12
And then it gradually
becomes steeper
00:03:14
as we move towards the
edge in all directions.
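As a quick symbolic check of that picture (again assuming the
Symbolic Math Toolbox):

  syms x1 x2
  J = x1^2 + x2^2;
  gradJ = gradient(J, [x1; x2])    % the vector field [2*x1; 2*x2]
  s = solve(gradJ == 0, [x1, x2]); % zero slope only at the bottom
  [s.x1, s.x2]                     % returns [0, 0]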
00:03:18
And this is the basis of
all optimization problems.
00:03:21
We set up an objective
function and then solve for,
00:03:24
or, in a lot of cases, we
search for the locations
00:03:27
of 0 gradient.
00:03:29
And I say "search for" because,
often, objective functions
00:03:32
aren't as
straightforward as this.
00:03:34
And we have to hunt
for the optimal point
00:03:36
with gradient
descent-type algorithms.
00:03:39
And these are algorithms
that find the optimal point
00:03:42
numerically, with an iterative
approach by descending
00:03:46
the gradient.
00:03:47
That is, they go in
the opposite direction
00:03:50
of the arrows until they
reach a stationary point.
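As a rough sketch of that idea for our bowl-shaped function, where
the starting point and step size are my own picks:

  x = [2; -1.5];               % an arbitrary starting point
  alpha = 0.1;                 % step size, chosen by hand
  for k = 1:200
      g = [2*x(1); 2*x(2)];    % gradient of J = x1^2 + x2^2
      x = x - alpha*g;         % step opposite the gradient arrows
  end
  disp(x)                      % converges toward [0; 0]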
00:03:55
All right, this is
probably a good time
00:03:56
to bring up the issue of
local and global minima.
00:03:59
If we're looking for stationary
points using gradient descent,
00:04:03
then where that algorithm
starts its search
00:04:05
becomes really important.
00:04:07
If there are multiple basins,
then a gradient descent
00:04:10
algorithm will get
stuck in a local minimum
00:04:12
if the starting state is within
that basin of attraction.
00:04:16
And there are different methods
for addressing this, including
00:04:19
just picking many
different starting points,
00:04:21
and then running gradient
descent on each one,
00:04:23
and seeing where
they all end up.
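Here is a sketch of that multi-start idea; the two-basin objective
is a made-up stand-in, not the function from the video, and I'm
using fminsearch as the local search for simplicity:

  J = @(x) x.^4 - 2*x.^2 + 0.5*x;    % hypothetical two-basin objective
  bestJ = inf;
  for x0 = linspace(-2, 2, 9)        % many different starting points
      [xs, fs] = fminsearch(J, x0);  % local search from each start
      if fs < bestJ
          bestJ = fs;
          bestX = xs;                % keep the lowest result so far
      end
  end
  fprintf('global minimum near x = %.3f\n', bestX)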
00:04:26
But what's really nice about
the analytical approach
00:04:29
of taking the gradient
of the function
00:04:30
is that each of the
stationary points
00:04:32
can be identified mathematically
by setting the gradient to 0
00:04:36
and solving for x.
00:04:38
Here, we have three stationary
points corresponding
00:04:41
to the local minimum, the global
minimum, and the local maximum
00:04:44
that occurs between them.
00:04:46
So now, we can just check the
value of the objective function
00:04:49
at each of these
stationary points
00:04:51
and then pick the lowest one
to find that global minimum.
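For instance, with a made-up one-variable objective that has two
basins (not the exact surface shown on screen), the analytical
version looks like this:

  syms x
  J = x^4 - 2*x^2 + 0.5*x;
  xs = vpasolve(diff(J, x) == 0, x)   % the three stationary points
  Jvals = double(subs(J, x, xs));     % objective value at each one
  [~, i] = min(Jvals);
  xs(i)                               % the global minimum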
00:04:55
00:04:58
All right, so that's the
unconstrained optimization.
00:05:01
That's where we're free
to choose any value that
00:05:03
minimizes our function.
00:05:05
But this video is on
constrained optimization.
00:05:08
And with constrained
optimization,
00:05:10
the objective
function is subject
00:05:11
to an additional constraint.
00:05:13
In this case, let's say
that the constraint is
00:05:16
x1 plus x2 plus 3 must equal 0.
00:05:19
So that means that
we can only choose
00:05:21
combinations of x1 and x2
that lie along this line.
00:05:25
And now, if I
raise this line up,
00:05:27
this means that
we're really looking
00:05:28
for the minimum value along
this intersecting parabola.
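As a side check, a constraint this simple also lets you substitute
x2 = -x1 - 3 straight into J, giving J = x1^2 + (x1 + 3)^2; its
derivative, 4*x1 + 6, is 0 at x1 = -1.5, which is exactly where this
parabola bottoms out. But that elimination trick only works when the
constraint is easy to solve for one variable.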
00:05:33
So now, how do we
find this location?
00:05:36
Well, let's change the surface
plot into a contour plot.
00:05:40
And so, now, we can imagine that
we're hiking along this black
00:05:43
line on a surface
that has this contour.
00:05:45
And so, at the
beginning, you're hiking
00:05:47
down pretty steeply as you
cross the contour lines quickly.
00:05:50
And then it gradually
gets less steep.
00:05:52
And then, right here,
right when your path
00:05:54
is exactly tangent
to a contour line,
00:05:56
you've reached the lowest point.
00:05:58
And then you start
heading back up again.
00:06:01
So the lowest point is
where the path is parallel
00:06:04
to the contour.
00:06:05
Or another way of
putting it, it's
00:06:07
where the path is
perpendicular to the gradient.
00:06:11
And this makes sense, right?
00:06:12
I mean, if the gradient
is the direction
00:06:14
of the steepest travel, and
you're walking perpendicularly
00:06:17
to that, then you're not going
up or down right at that point.
00:06:21
Your path is flat.
00:06:24
Now, to calculate this point,
we can do something clever.
00:06:27
We can compare the gradient
of the objective function
00:06:30
with the gradient
of the constraint.
00:06:32
And if the gradients,
or the directions
00:06:35
of the steepest slope for the
two functions, are parallel,
00:06:39
then this is a stationary point.
00:06:41
And therefore, it's either
a maximum, a minimum,
00:06:43
or some kind of a saddle point
like we had with x cubed.
00:06:47
So the gradient
of the constraint
00:06:49
is the vector of partial derivatives with
respect to x1 and x2,
00:06:53
which, it turns out, is
just 1 for both directions.
00:06:57
And if I plot that vector
field on our graph,
00:07:00
we get these arrows that
are all perpendicular
00:07:03
to the black line.
00:07:05
And notice that, at the minimum
point, the gradient of J
00:07:09
is parallel to
the gradient of c.
00:07:12
Now, they are pointing
in opposite directions.
00:07:14
But they are parallel.
00:07:17
So to make them
equivalent, we can simply
00:07:18
multiply the gradient of the constraint
by a constant, lambda.
00:07:22
And choosing different
lambdas is essentially
00:07:25
scaling the gradient, either
positively or negatively,
00:07:28
like I'm showing here.
00:07:29
And the idea is that we want
to find the Lagrange multiplier
00:07:32
that makes the two gradients
equal right at this point where
00:07:36
they are parallel.
00:07:38
OK, so, now, let me
rearrange this equation.
00:07:41
And what we're left with is the
equation that we started with.
00:07:46
Now, you're going to
notice that if you
00:07:47
do this rearranging yourself,
it should be a minus lambda
00:07:51
c. But to make the
equation look better,
00:07:53
we just roll that negative into
the constant, lambda, which
00:07:56
can be positive or negative.
00:07:59
All right, so the
solution to the problem
00:08:02
exists where the gradient
of the Lagrangian equals 0.
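In symbols, the chain of reasoning is:

  ∇J(x) = λ ∇c(x)                (the two gradients are parallel)
  ∇J(x) - λ ∇c(x) = 0            (rearranged)
  ∇[ J(x) + λ c(x) ] = ∇L = 0    (after rolling the minus sign into λ)

And notice that the partial derivative of L with respect to λ is just
c(x), so setting that component to 0 automatically enforces the
constraint.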
00:08:08
And again, this doesn't
tell us if we're
00:08:10
at a maximum, a minimum,
or a saddle point.
00:08:13
We know that we need some
form of a second derivative
00:08:15
to determine that.
00:08:17
And generally speaking,
that second derivative
00:08:19
comes in the form of a
bordered Hessian matrix.
00:08:22
And I'm not going
to cover this here,
00:08:24
since I just wanted to talk
about the Lagrangian itself.
00:08:27
And also, in our
problem, we know
00:08:29
that the point is a minimum
due to our understanding
00:08:31
of the functions.
00:08:32
But I did leave some good
references for it down below.
00:08:37
All right, so now
that we understand
00:08:38
why the Lagrangian
is important, let's
00:08:41
use it to solve the constrained
optimization problem.
00:08:44
Our Lagrangian is
x1 squared plus x2
00:08:47
squared plus lambda
times x1 plus x2 plus 3.
00:08:51
And the gradient gives
us these three equations,
00:08:54
which we can use to solve
for the three unknowns
00:08:57
by just setting each of
the equations to zero.
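Written out, those three equations are:

  ∂L/∂x1 = 2*x1 + λ = 0
  ∂L/∂x2 = 2*x2 + λ = 0
  ∂L/∂λ  = x1 + x2 + 3 = 0

The first two give x1 = x2 = -λ/2, and substituting that into the
third gives -λ + 3 = 0.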
00:08:59
And we find that the optimal
point is at minus 1.5
00:09:03
and minus 1.5.
00:09:04
And the Lagrange multiplier
has a value of 3.
00:09:09
And how cool is that?
00:09:11
I mean, it's just
such a simple idea
00:09:13
that we can find
minima and maxima
00:09:16
of constrained
optimization problems
00:09:18
just by looking at where the
gradients of the two functions
00:09:21
are parallel.
00:09:22
00:09:24
All right, so now, before
I wrap up this video,
00:09:26
I want to show you
how we can actually
00:09:28
solve this example in Matlab
since, more often than not,
00:09:31
you're not going to
be solving it by hand.
00:09:32
00:09:35
I have this live script that
uses the well-named function
00:09:39
optimproblem to solve
our optimization problem.
00:09:43
And here, I've defined
the objective function,
00:09:45
x1 squared plus x2 squared, and
then the constraint, x1 plus x2
00:09:49
plus 3 equals 0.
00:09:51
The function, solve,
solves the problem.
00:09:54
And it returns several
different outputs.
00:09:56
And when it runs, I'm
displaying the optimal point
00:09:59
x and the Lagrange
multiplier lambda.
00:10:02
And over here, we see that x
is minus 1.5 and minus 1.5.
00:10:07
And lambda is 3, just
as we calculated.
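And if you want to reproduce the script, a minimal sketch of that
problem-based workflow looks something like this; the variable and
constraint names are my own, and the exact layout of the lambda
output depends on the solver:

  x = optimvar('x', 2);                    % the two decision variables
  prob = optimproblem('Objective', x(1)^2 + x(2)^2);
  prob.Constraints.line = x(1) + x(2) + 3 == 0;
  [sol, fval, ~, ~, lambda] = solve(prob); % lambda holds the multipliers
  disp(sol.x)                              % [-1.5; -1.5]
  disp(lambda)                             % the multiplier, 3, lives here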
00:10:11
All right, so that's where
I'm going to leave this video.
00:10:13
Hopefully, you have a
better understanding
00:10:15
of how the Lagrangian
plays a role
00:10:17
in constrained
optimization problems
00:10:19
and what the Lagrange
multiplier is actually doing.
00:10:22
And I've left links below to
several different resources
00:10:25
and Matlab examples if
you want to learn more.
00:10:28
Also, you can find all
of the Tech Talk videos
00:10:30
across many different
topics nicely organized
00:10:33
at MathWorks.com.
00:10:34
And if you like
this video, then you
00:10:36
might be interested to
learn about a popular type
00:10:38
of constrained
optimization that we
00:10:40
use for control problems called
the linear quadratic regulator.
00:10:45
All right, thanks for watching.
00:10:47
And I'll see you next time.