Constrained Optimization: Intuition behind the Lagrangian

00:10:48
https://www.youtube.com/watch?v=GR4ff0dTLTw

Summary

TLDRThe video provides an insightful exploration of constrained optimization problems and how they can be solved using the Lagrangian method. By introducing the Lagrangian and Lagrange multipliers, it demonstrates how these concepts help find minimum or maximum values of functions subjected to constraints. The host, Brian, explains with examples how gradients and derivatives are used to identify these extreme points analytically and illuminate the importance of parallel gradients in constrained scenarios. He shows how MATLAB can be utilized to automate and solve such problems, emphasizing the significance of methodical approaches in practical applications. The video concludes with references and resources available for further exploration into constrained optimization problems.

Takeaways

  • 🎯 The Lagrangian method is key for solving constrained optimization problems.
  • 🔗 Lagrange multipliers help adjust constraints to find extremum points.
  • 📉 Stationary points occur where the gradient is zero in optimization.
  • 🔀 Optimization can be for minimum or maximum values of functions.
  • 🧭 Gradient descent helps in numerical optimization for complex functions.
  • 📊 MATLAB facilitates solving optimization problems programmatically.
  • ❗ Distinction between local and global optima is crucial in optimization.
  • 🚥 The analytical approach involves setting gradients to zero mathematically.
  • 🔄 MATLAB provides functions like 'optimproblem' for solution finding.
  • 📚 Further learning resources are offered for deeper understanding.

Timeline

  • 00:00:00 - 00:05:00

    The video begins with an introduction to solving constrained optimization problems by employing the Lagrangian method. The host uses an objective function, J(x), which is typically optimized either to a minimum or a maximum, given a constraint like c(x) = 0. The Lagrangian introduces a Lagrange multiplier, lambda, which enables the solution of such problems. The process is intuitive: it locates where the gradient of the function is zero, which indicates potential minimum or maximum points. For example, in the case of J(x) = x², the derivative equals zero at the minimum point, x = 0. The explanation extends to a higher-order function J(x1, x2) = x1² + x2², identifying its minimum point through the gradient being zero, illustrating foundational optimization principles and introducing gradient descent for complex functions.

  • 00:05:00 - 00:10:48

    The video transitions from unconstrained optimization to constrained optimization by demonstrating with a constraint equation x1 + x2 + 3 = 0, modifying the problem to find the minimum along a parabola intersect. This involves switching to a contour plot, where the path's tangent zeros out the gradient, pinpointing the minimum through the Lagrange multipliers where the gradient of the objective function matches the constraint's gradient. The video explains that solving the gradient of the Lagrangian gives the solution and recognizes whether it's maximum, minimum, or a saddle point. The key takeaway is the practical application of Lagrangian in finding where gradients of functions align. It concludes with a Matlab demonstration using optimproblem function to solve for the optimal points, affirming the previously calculated results of x1 and x2 as -1.5 and the Lagrange multiplier as 3.

Mind Map

Video Q&A

  • What is a constrained optimization problem?

    A constrained optimization problem involves finding the minimum or maximum of a function subject to a constraint.

  • What are Lagrange multipliers?

    Lagrange multipliers are values that adjust the constraints into the optimization problem, making gradients parallel to find stationary points.

  • How is the gradient related to optimization problems?

    The gradient indicates the direction of the steepest ascent or descent, and optimization involves finding points where this gradient is zero.

  • What is the purpose of a bordered Hessian matrix?

    A bordered Hessian matrix is used to determine if a stationary point is a minimum, maximum, or saddle point.

  • Can non-analytical functions be optimized easily?

    Non-analytical functions often require numerical methods like gradient descent to find optimal points.

  • What's the role of MATLAB in solving optimization problems?

    MATLAB can solve optimization problems programmatically using functions like 'optimproblem' to simplify the process.

  • What happens if multiple stationary points exist?

    Multiple stationary points, including local minima or maxima, require testing to find global optima.

  • What is an unconstrained optimization problem?

    Unconstrained optimization allows choosing any values to minimize or maximize a function without additional constraints.

  • How does MATLAB output the solution to an optimization problem?

    MATLAB outputs optimal points and Lagrange multipliers as part of its solution when using functions like 'optimproblem'.

  • What signifies the end of a search in gradient descent?

    Gradient descent ends when it reaches a stationary point, where the gradient is zero.

View more video summaries

Get instant access to free YouTube video summaries powered by AI!
Subtitles
en
Auto Scroll:
  • 00:00:00
  • 00:00:00
    In this video, I want to show you a really cool way
  • 00:00:02
    to solve a constrained optimization problem, which
  • 00:00:05
    is a problem where you're trying to find
  • 00:00:07
    the minimum or the maximum of a function,
  • 00:00:09
    like J of x, and it's subject to a constraint, c of x equals 0.
  • 00:00:14
    And the solution involves this equation down here,
  • 00:00:17
    where the inner portion is called the Lagrangian.
  • 00:00:20
    And lambda is called the Lagrange multiplier.
  • 00:00:24
    So what is this equation doing?
  • 00:00:26
    And how does it help us solve constrained optimization?
  • 00:00:29
    Well, it's actually rather intuitive and kind of amazing.
  • 00:00:32
    So I hope you stick around for this really quick explanation.
  • 00:00:36
    I'm Brian.
  • 00:00:36
    And welcome to a Matlab Tech Talk.
  • 00:00:40
    Let's look at an optimization problem
  • 00:00:42
    with an objective function, J of x equals x squared, which is
  • 00:00:46
    plotted over here on this graph.
  • 00:00:48
    And the question is, what value of x produces the minimum J?
  • 00:00:53
    And obviously, looking at this graph, it's x equals 0.
  • 00:00:56
    But what we want to know is, how can we
  • 00:00:57
    determine this mathematically?
  • 00:01:00
    Well, one thing we can notice is that the slope of the function
  • 00:01:03
    is non-0 everywhere, except at the minimum point, where
  • 00:01:07
    it's flat.
  • 00:01:08
    In fact, minimum and maximum points
  • 00:01:11
    must exist where the slope is flat.
  • 00:01:13
    Because if the function was increasing or decreasing still,
  • 00:01:17
    well, then that point, wouldn't be the most extreme.
  • 00:01:20
    And we can tell if a flat point is
  • 00:01:23
    a minimum if the rate of change of the slope is positive
  • 00:01:27
    or if everything is curving up and away from it.
  • 00:01:30
    And it's a maximum point if everything is curving down.
  • 00:01:35
    Therefore, for J equals x squared,
  • 00:01:37
    we can solve for where the slope is
  • 00:01:39
    0 by taking the derivative of the function with respect
  • 00:01:42
    to x and then just setting that derivative equal to 0.
  • 00:01:46
    And then we can determine that the rate of change of the slope
  • 00:01:49
    is positive by just looking at the second derivative
  • 00:01:52
    at this point.
  • 00:01:55
    Now, similarly, we can show that x
  • 00:01:57
    equals 0 is a maximum point for J
  • 00:01:59
    equals negative x squared since the second derivative is
  • 00:02:03
    negative.
  • 00:02:04
    And therefore, we can deduce that the output is curving down
  • 00:02:07
    from this one point, x equals 0.
  • 00:02:10
    And then, finally, we can show that the flat part of J
  • 00:02:12
    equals x cubed is neither a maximum or a minimum point.
  • 00:02:15
    Since the second derivative is 0 at this point.
  • 00:02:19
    It's neither positive or negative.
  • 00:02:22
    And the idea of looking at the slope
  • 00:02:24
    can be expanded to higher-order functions as well.
  • 00:02:26
    For example, this function, J equals
  • 00:02:29
    x1 squared plus x2 squared.
  • 00:02:31
    We can look at this and, once again,
  • 00:02:33
    pretty easily see that the minimum J occurs
  • 00:02:35
    when x1 and x2 are both 0.
  • 00:02:37
    Now, to find this point, we once again
  • 00:02:40
    look at where the derivative is 0, or more precisely,
  • 00:02:43
    where the gradient is 0.
  • 00:02:45
    The gradient of J of x is the partial derivative
  • 00:02:48
    of the function with respect to both x1 and x2.
  • 00:02:52
    And when we do this, we get the vector 2x1 and 2x2.
  • 00:02:57
    So, really, the gradient returns a vector field
  • 00:03:00
    that denotes the direction and the magnitude
  • 00:03:02
    of the steepest slope for every possible x combination.
  • 00:03:07
    And for our bowl-shaped function,
  • 00:03:08
    the gradient looks like this.
  • 00:03:10
    There is 0 slope at the very bottom of the bowl.
  • 00:03:12
    And then it gradually becomes steeper
  • 00:03:14
    as we move towards the edge in all directions.
  • 00:03:18
    And this is the basis of all optimization problems.
  • 00:03:21
    We set up an objective function and then solve for,
  • 00:03:24
    or, in a lot of cases, we search for the locations
  • 00:03:27
    of 0 gradient.
  • 00:03:29
    And I say "search for" because, often, objective functions
  • 00:03:32
    aren't as straightforward as this.
  • 00:03:34
    And we have to hunt for the optimal point
  • 00:03:36
    with gradient descent-type algorithms.
  • 00:03:39
    And these are algorithms that find the optimal point
  • 00:03:42
    numerically, with an iterative approach by descending
  • 00:03:46
    the gradient.
  • 00:03:47
    That is, it's going in the opposite direction
  • 00:03:50
    of the arrows until you reach a stationary point.
  • 00:03:55
    All right, this is probably a good time
  • 00:03:56
    to bring up the issue of local and global minimum.
  • 00:03:59
    If we're looking for stationary points using gradient descent,
  • 00:04:03
    then where that algorithm starts its search
  • 00:04:05
    becomes really important.
  • 00:04:07
    If there are multiple basins, then a gradient descent
  • 00:04:10
    algorithm will get stuck in a local minimum
  • 00:04:12
    if the starting state is within that basin of attraction.
  • 00:04:16
    And there's different methods for addressing this, including
  • 00:04:19
    just picking many different starting points,
  • 00:04:21
    and then running gradient descent on each one,
  • 00:04:23
    and seeing where they all end up.
  • 00:04:26
    But what's really nice about the analytical approach
  • 00:04:29
    of taking the gradient of the function
  • 00:04:30
    is that each of the stationary points
  • 00:04:32
    can be identified mathematically by setting the gradient to 0
  • 00:04:36
    and solving for x.
  • 00:04:38
    Here, we have three stationary points corresponding
  • 00:04:41
    to the local minimum, the global minimum, and the local maximum
  • 00:04:44
    that occurs between them.
  • 00:04:46
    So now, we can just check the value of the objective function
  • 00:04:49
    at each of these stationary points
  • 00:04:51
    and then pick the lowest one to find that global minimum.
  • 00:04:55
  • 00:04:58
    All right, so that's the unconstrained optimization.
  • 00:05:01
    That's where we're free to choose any value that
  • 00:05:03
    minimizes our function.
  • 00:05:05
    But this video is on constrained optimization.
  • 00:05:08
    And with constrained optimization,
  • 00:05:10
    the objective function is subject
  • 00:05:11
    to an additional constraint.
  • 00:05:13
    In this case, let's say that the constraint is
  • 00:05:16
    x1 plus x2 plus 3 must equal 0.
  • 00:05:19
    So that means that we can only choose
  • 00:05:21
    combinations of x1 and x2 that lie along this line.
  • 00:05:25
    And now, if I erase this line up,
  • 00:05:27
    this means that we're really looking
  • 00:05:28
    for the minimum value along this intersecting parabola.
  • 00:05:33
    So now, how do we find this location?
  • 00:05:36
    Well, let's change the surface plot into a contour plot.
  • 00:05:40
    And so, now, we can imagine that we're hiking along this Black.
  • 00:05:43
    Line on a surface that has this contour.
  • 00:05:45
    And so, at the beginning, you're hiking
  • 00:05:47
    down pretty steeply as you cross the contour lines quickly.
  • 00:05:50
    And then it gradually gets less steep.
  • 00:05:52
    And then, right here, right when your path
  • 00:05:54
    is exactly tangent to a contour line,
  • 00:05:56
    you've reached the lowest point.
  • 00:05:58
    And then you start heading back up again.
  • 00:06:01
    So the lowest point is where the path is parallel
  • 00:06:04
    to the contour.
  • 00:06:05
    Or another way of putting it, it's
  • 00:06:07
    where the path is perpendicular to the gradient.
  • 00:06:11
    And this makes sense, right?
  • 00:06:12
    I mean, if the gradient is the direction
  • 00:06:14
    of the steepest travel, and you're walking perpendicularly
  • 00:06:17
    to that, then you're not going up or down right at that point.
  • 00:06:21
    Your path is flat.
  • 00:06:24
    Now, to calculate this point, we can do something clever.
  • 00:06:27
    We can compare the gradient of the objective function
  • 00:06:30
    with the gradient of the constraint.
  • 00:06:32
    And if the gradients or the directions
  • 00:06:35
    of the steepest slope for the two functions are parallel,
  • 00:06:39
    then this is a stationary point.
  • 00:06:41
    And therefore, it's either a maximum, a minimum,
  • 00:06:43
    or some kind of a saddle point like we had with x cubed.
  • 00:06:47
    So the gradient of the constraint
  • 00:06:49
    is the partial derivative with respect to both x1 and x2,
  • 00:06:53
    which, it turns out, is just 1 for both directions.
  • 00:06:57
    And if I plot that vector field on our graph,
  • 00:07:00
    we get these arrows that are all perpendicular
  • 00:07:03
    to the black line.
  • 00:07:05
    And notice that, at the minimum point, the gradient of J
  • 00:07:09
    is parallel to the gradient of c.
  • 00:07:12
    Now, they are pointing in different directions.
  • 00:07:14
    But they are parallel.
  • 00:07:17
    So to make them equivalent, we can simply
  • 00:07:18
    multiply the constraint by a constant, lambda.
  • 00:07:22
    And choosing different lambdas is essentially
  • 00:07:25
    scaling the gradient, either positively or negatively,
  • 00:07:28
    like I'm showing here.
  • 00:07:29
    And the idea is that we want to find the Lagrange multiplier
  • 00:07:32
    that makes the two gradients equal right at this point where
  • 00:07:36
    they are parallel.
  • 00:07:38
    OK, so, now, let me rearrange this equation.
  • 00:07:41
    And what we're left with is the equation that we started with.
  • 00:07:46
    Now, you're going to notice that if you
  • 00:07:47
    do this rearranging yourself, it should be a minus lambda
  • 00:07:51
    C. But to make the equation look better,
  • 00:07:53
    we just roll that negative into the constant, lambda, which
  • 00:07:56
    can be positive or negative.
  • 00:07:59
    All right, so the solution to the problem
  • 00:08:02
    exists where the gradient of the Lagrangian equals 0.
  • 00:08:08
    And again, this doesn't tell us if we're
  • 00:08:10
    at a maximum, a minimum, or a saddle point.
  • 00:08:13
    We know that we need some form of a second derivative
  • 00:08:15
    to determine that.
  • 00:08:17
    And generally speaking, that second derivative
  • 00:08:19
    comes in the form of a bordered Hessian matrix.
  • 00:08:22
    And I'm not going to cover this here.
  • 00:08:24
    Since I just wanted to talk about the Lagrangian itself.
  • 00:08:27
    And also, in our problem, we know
  • 00:08:29
    that the point is a minimum due to our understanding
  • 00:08:31
    of the functions.
  • 00:08:32
    But I did leave some good references for it down below.
  • 00:08:37
    All right, so now that we understand
  • 00:08:38
    why the Lagrangian is important, let's
  • 00:08:41
    use it to solve the constraint optimization problem.
  • 00:08:44
    Our Lagrangian is x1 squared plus x2
  • 00:08:47
    squared plus lambda times x1 plus x2 plus 3.
  • 00:08:51
    And the gradient gives us these three equations,
  • 00:08:54
    which we can use to solve for the three unknowns
  • 00:08:57
    by just setting each of the equations to zero.
  • 00:08:59
    And we find that the optimal point is at minus 1.5
  • 00:09:03
    and minus 1.5.
  • 00:09:04
    And the Lagrange multiplier has a value of 3.
  • 00:09:09
    And how cool is that?
  • 00:09:11
    I mean, it's just such a simple idea
  • 00:09:13
    that we can find minima and maxima
  • 00:09:16
    of constrained optimization problems
  • 00:09:18
    just by looking at where the gradients of the two functions
  • 00:09:21
    are parallel.
  • 00:09:22
  • 00:09:24
    All right, so now, before I wrap up this video,
  • 00:09:26
    I want to show you how we can actually
  • 00:09:28
    solve this example in Matlab since, more often than not,
  • 00:09:31
    you're not going to be solving it by hand.
  • 00:09:32
  • 00:09:35
    I have this live script that uses the well-named function
  • 00:09:39
    optimproblem to solve our optimization problem.
  • 00:09:43
    And here, I've defined the objective function,
  • 00:09:45
    x1 squared plus x2 squared, and then the constraint, x1 plus x2
  • 00:09:49
    plus 3 equals 0.
  • 00:09:51
    The function, sol, solves the problem.
  • 00:09:54
    And it returns several different outputs.
  • 00:09:56
    And when it runs, I'm displaying the optimal point
  • 00:09:59
    x and the Lagrange multiplier lambda.
  • 00:10:02
    And over here, we see that x is minus 1.5 and minus 1.5.
  • 00:10:07
    And lambda is 3, just as we calculated.
  • 00:10:11
    All right, so that's where I'm going to leave this video.
  • 00:10:13
    Hopefully, you have a better understanding
  • 00:10:15
    of how the Lagrangian plays a role
  • 00:10:17
    in constrained optimization problems
  • 00:10:19
    and what the Lagrange multiplier is actually doing.
  • 00:10:22
    And I've left links below to several different resources
  • 00:10:25
    and Matlab examples if you want to learn more.
  • 00:10:28
    Also, you can find all of the Tech Talk videos
  • 00:10:30
    across many different topics nicely organized
  • 00:10:33
    at MathWorks.com.
  • 00:10:34
    And if you like this video, then you
  • 00:10:36
    might be interested to learn about a popular type
  • 00:10:38
    of constrained optimization that we
  • 00:10:40
    use for control problems called the linear quadratic regulator.
  • 00:10:45
    All right, thanks for watching.
  • 00:10:47
    And I'll see you next time.
Tags
  • constrained optimization
  • Lagrangian
  • Lagrange multiplier
  • gradient
  • MATLAB
  • unconstrained optimization
  • derivative
  • gradient descent
  • stationary points
  • bordered Hessian matrix