Lagrange multipliers
In mathematical optimization problems, Lagrange multipliers are a method for dealing with constraints. Suppose the question as given is to find local extrema of a function of several variables subject to one or more constraints given by setting further functions of the variables to given values. The method introduces a new unknown scalar variable, the Lagrange multiplier, for each constraint; and forms a linear combination involving the multipliers as coefficients. This reduces the constrained problem to an unconstrained problem. It may then be solved, for example by the usual gradient method.
To explain why this has a chance of working, consider a two-dimensional case. Suppose we have a function f(x,y) to maximise, subject to
- g(x,y) = c,
- f(x,y) = d
A familiar example can be obtained from weather maps, with their contours for temperature and pressure: the constrained extrema will occur where the superposed maps show touching lines (isopleths).
Geometrically we translate the tangency condition to saying that the gradients of f and g are parallel vectors at the maximum. Introducing an unknown scalar λ, the gradient of
- f − λg
- f − λg
| Table of contents |
|
2 Example 3 External links |
The method of Lagrange multipliers
Let f (x1, x2, … xn) be a function defined on an n-dimensional open set.
We suppose f has continuous partial derivatives on that set.
A constraint is defined by the hypersurface which satisfies g (x1, x2, … xn) = C within the open set; where C is a constant.
Additionally,
g ≠ 0 everywhere on the curve (where
is the gradient).
Given that there is a local maximum (or minimum) of f(x) on the set {x ∈ Rn | g (x) = C}, by setting the derivative of f along the hypersurface to 0, one can show that there exists a real number λ (the Lagrange multiplier) such that
- .
In general one can apply the reasoning to several constraints, by introducing multipliers λ, μ, ... for each.
The constraint g (x) = C will have to be to be used again, in applying the new criterion to find the solutions, since as it stands the particular value of C isn't even mentioned.
Example
Suppose we wish to find the discrete probability distribution with maximal information entropy. Then
The method of Lagrange multipliers is generalized by the Kuhn-Tucker Theorem.
