When applying Lagrange multipliers, why is the influence of the constraints always assumed to be "linear" and independent?


This might seem like a dumb question, but when I was taught to apply Lagrange multipliers in class, the first step was always to set up the Lagrangian function, for a single constraint and for multiple constraints respectively:

$$\mathcal{L}(x,y,\lambda)=f(x,y)-\lambda\,\varphi(x,y)$$

$$\mathcal{L}\left(x_{1},\ldots,x_{n},\lambda_{1},\ldots,\lambda_{M}\right)=f\left(x_{1},\ldots,x_{n}\right)-\sum_{k=1}^{M}\lambda_{k}\,\varphi_{k}\left(x_{1},\ldots,x_{n}\right)$$

My initial understanding is that ${\mathcal {L}}$ represents the function $f$ under a weighted sum of influences from the constraint(s).


But on second thought, if that is the case, why do we always assume the influence of the constraint $\varphi(x,y)$ to be "linear"? I mean, what is the problem if I choose to set up ${\mathcal {L}}$ as below? $${\mathcal {L}}(x,y,\lambda )={[f(x,y)]^m}-\lambda {[\varphi(x,y)]}^n$$


Furthermore, when there are multiple constraints, why do we assume that the influence of each $\varphi_k$ on $f$ is independent, so that the influences can be combined as a linear sum?

Best answer

Consider the manifold $M$ defined by $\varphi_i(x_1,\ldots,x_n)=0$ for all $i=1,\ldots,m$. Since it is a manifold (of dimension $n-m$, provided the gradients $\nabla\varphi_i$ are linearly independent), there is a local parametrization $X:q_\alpha\mapsto x_j$ with $\alpha=1,\ldots,n-m$, and all points $x_j(q_\alpha)$ belong to $M$ by construction. Thus the equivalent problem is to find the $q_\alpha$ that optimize $f(x_j(q_\alpha))$: $$ \nabla_{q_\alpha}\left(f(x_j(q_\alpha))\right) = 0,\\ \sum_j\frac{\partial f}{\partial x_j}\frac{\partial x_j}{\partial q_\alpha} = 0. $$

Let's analyse what we have here. The vector $\nabla f = \partial f/\partial x_j$ is the gradient of $f$. The vectors $(\partial x_j/\partial q_1,\ldots,\partial x_j/\partial q_{n-m})$ span the tangent space of the manifold $M$. The equation says that the scalar product of $\nabla f$ with each of them is zero, for every $\alpha$. But if $\nabla f$ is perpendicular to all tangent vectors, it belongs to the perpendicular (normal) subspace, which is the span of the vectors $(\nabla \varphi_1,\ldots,\nabla \varphi_m)$: each $\nabla\varphi_i$ is perpendicular to $M$ because $\varphi_i$ is constant on $M$. In other words, there exist coefficients $-\lambda_1,\ldots,-\lambda_m$ such that: $$ \nabla f = -\sum_i\lambda_i\nabla\varphi_i,\qquad\text{or}\qquad \nabla\left(f+\sum_i\lambda_i\varphi_i\right)=0. $$ The sign of the $\lambda_i$ is only a convention: replacing $\lambda_i$ with $-\lambda_i$ gives the form $f-\sum_i\lambda_i\varphi_i$ used in the question.
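As a rough numerical illustration of the span argument above, here is a small sketch. The example problem is my own assumption, not part of the question: optimize $f=x+z$ on the unit sphere $\varphi_1=x^2+y^2+z^2-1=0$ intersected with the plane $\varphi_2=x-y=0$. The helper names `grad_f`, `grad_phis` and `multipliers` are hypothetical. It fits $\nabla f=-\sum_i\lambda_i\nabla\varphi_i$ by least squares and reports how far $\nabla f$ is from the span of the constraint gradients.

```python
import numpy as np

def grad_f(p):
    # Assumed example objective: f(x, y, z) = x + z
    return np.array([1.0, 0.0, 1.0])

def grad_phis(p):
    # Assumed example constraints:
    #   phi_1 = x^2 + y^2 + z^2 - 1  (unit sphere)
    #   phi_2 = x - y                (plane)
    x, y, z = p
    return np.array([[2 * x, 2 * y, 2 * z],
                     [1.0, -1.0, 0.0]])

def multipliers(p):
    """Least-squares fit of grad f = -sum_i lambda_i grad phi_i at the point p.

    Returns the fitted lambdas and the norm of the leftover part of grad f
    that is NOT in the span of the constraint gradients.
    """
    G = grad_phis(p).T  # columns are the constraint gradients
    lam, *_ = np.linalg.lstsq(G, -grad_f(p), rcond=None)
    residual = np.linalg.norm(G @ lam + grad_f(p))
    return lam, residual

s = 1.0 / np.sqrt(6.0)
p_opt = np.array([s, s, 2 * s])      # constrained maximiser of f (worked out by hand)
p_other = np.array([0.0, 0.0, 1.0])  # also satisfies both constraints, but is not stationary

lam_opt, res_opt = multipliers(p_opt)
lam_other, res_other = multipliers(p_other)

print("optimum:     lambda =", lam_opt, " residual =", res_opt)
print("other point: lambda =", lam_other, " residual =", res_other)
```

At the optimum the residual should vanish up to rounding, i.e. $\nabla f$ lies in the span of $\nabla\varphi_1,\nabla\varphi_2$. At the other feasible point it does not, so no choice of $\lambda_i$ works there.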

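So, to put it short: if the functions $f,\varphi_i$ are good enough (smooth, with linearly independent $\nabla\varphi_i$), then near the point of interest we can replace $f$ with its linear approximation and the constraints $\varphi_i=0$ with hyperplanes. The stationarity condition involves only this first-order (linear) information, and the simplest way to combine linear functions and linear constraints is a linear sum.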
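To see what goes wrong with the modified form proposed in the question, here is a short worked check. It assumes $m,n$ are positive integers and uses the question's sign convention. Taking the gradient with respect to $(x,y)$ and the derivative with respect to $\lambda$ gives

$$ \nabla\left([f]^m-\lambda[\varphi]^n\right)=m f^{m-1}\,\nabla f-\lambda\, n\,\varphi^{n-1}\,\nabla\varphi=0,\qquad -[\varphi]^n=0. $$

The second equation still enforces $\varphi=0$. But for $n\ge 2$ that makes $\varphi^{n-1}=0$, so the $\nabla\varphi$ term drops out and the first equation reduces to $m f^{m-1}\nabla f=0$. This is the unconstrained condition $\nabla f=0$ (or $f=0$ when $m\ge 2$), which in general has no solution on the constraint set. For $n=1$ we get $m f^{m-1}\nabla f=\lambda\nabla\varphi$, which wherever $f\neq 0$ is just $\nabla f=\lambda'\nabla\varphi$ with $\lambda'=\lambda/(m f^{m-1})$. So the power $m$ adds nothing new, and only the first power of $\varphi$ reproduces the geometric condition derived above.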