I am trying to understand Lagrange multipliers. I think I have grasped the theory and can follow the easier examples, but I feel I am still missing a full understanding. As I understand it, to optimize a function $f(\mathbf{x})$ subject to a constraint $g(\mathbf{x})=C$, I can build a new function, the Lagrangian, as follows:
$$L(\mathbf{x},\lambda)=f(\mathbf{x})-\lambda \left(g(\mathbf{x})-C\right)$$
If I take the gradient of the Lagrangian, I get a vector of $d+1$ partial derivatives, where $d$ is the dimension of $\mathbf{x}$:
$$ \nabla L(\mathbf{x},\lambda)= \begin{bmatrix} \frac{\partial L(\mathbf{x},\lambda)}{\partial x_1} \\ \vdots\\ \frac{\partial L(\mathbf{x},\lambda)}{\partial x_d} \\ \frac{\partial L(\mathbf{x},\lambda)}{\partial \lambda} \\ \end{bmatrix} = \begin{bmatrix} \frac{\partial f(\mathbf{x})}{\partial x_1}-\lambda \left[ \frac{\partial }{\partial x_1} \left(g(\mathbf{x})-C \right) \right]\\ \vdots\\ \frac{\partial f(\mathbf{x})}{\partial x_d}-\lambda \left[ \frac{\partial }{\partial x_d} \left(g(\mathbf{x})-C \right) \right]\\ -\left(g(\mathbf{x})-C\right) \\ \end{bmatrix} =0 $$
Problem.
I believe the gradient of the Lagrangian is set to 0 because we want to find the point(s) $\mathbf{x}_{0}$ at which the gradients of $f$ and $g$ are proportional, i.e. have the same direction. I have followed a couple of explanations of the theory, but I cannot figure out why we want the gradients to have the same direction, that is, why we look for a point where the gradients are proportional. And if I add multiple constraints, each constraint contributes its own gradient, so how do we find "the point of the same gradient" when there are several gradients from different constraints?
Example.
Also, I tried a very simple example and could not make sense of its result. I guess that is because I don't fully understand how to apply constraints using Lagrange multipliers. For instance:
$$\begin{aligned} f(x)&=(x-1)(x-5)=x^2-6x+5 \\ g(x)&=x-3 \\ L(x,\lambda)&=x^2-6x+5 - \lambda (x-3) \end{aligned}$$
This is a parabola, and I was looking for its maximum value. Without a constraint, the function is unbounded above, so it has no maximum. I tried to apply the line $g(x)=x-3$ as a constraint. I built the Lagrangian, computed its derivatives and set them all equal to 0, which gives the following:
$\begin{cases} \frac{\partial L(x,\lambda)}{\partial x} = 2x-6-\lambda = 0\\ \frac{\partial L(x,\lambda)}{\partial \lambda} = -(x-3) = 0\\ \end{cases}$
This is awkward because, without $\lambda$, the first and the second equations are the same. Also, substituting the second equation into the first gives $\lambda=0$. When I draw the parabola and the line, I see two points of intersection. I don't know why the Lagrange multiplier method doesn't work here and give me a point $x_0 \approx 5.5$.
Thanks
The Lagrangian function is a purely formal object not having any intuitive interpretation. Setting the ${\bf x}$-derivatives of $L$ to zero, together with the constraint, furnishes the points ${\bf x}\in S$ (the manifold defined by the constraint) where $$\nabla f({\bf x})=\lambda\nabla g({\bf x})\tag{1}$$ for some factor $\lambda$. These points are the conditionally stationary points of $f$ on $S$.
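As a small illustration of condition $(1)$, here is a minimal Python sketch for the one-dimensional example from the question, reading the constraint as $g(x)=x-3=0$ (an assumption about what was intended). Under that reading $S$ consists of the single point $x=3$, so there is nothing left to optimize, and $\lambda=0$ merely records that $x=3$ also happens to be the vertex of the parabola, where $f'(3)=0$. The function names `f`, `df`, `dg` are chosen here for illustration only.

```python
def f(x):
    # Objective from the question: f(x) = (x - 1)(x - 5)
    return x**2 - 6*x + 5

def df(x):
    # Derivative of the objective
    return 2*x - 6

def dg(x):
    # Derivative of the constraint function g(x) = x - 3
    return 1.0

# The constraint g(x) = 0 leaves exactly one admissible point.
x0 = 3.0

# Condition (1) in one dimension: f'(x0) = lambda * g'(x0)
lam = df(x0) / dg(x0)

print("x0 =", x0, "lambda =", lam, "f(x0) =", f(x0))
```

The points near $x\approx 5.56$ and $x\approx 1.44$ are where the graph of the line $y=x-3$ crosses the graph of the parabola; they are not points of the constraint set $\{x : g(x)=0\}$, which is why the method does not return them.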
Now the condition $(1)$ has an intuitive geometric meaning: Consider a point ${\bf p}\in S$. Since $S$ is a level surface of the constraint function $g$ the gradient $\nabla g({\bf p})$ (assumed to be $\ne{\bf 0}$) is orthogonal to $S$ at ${\bf p}$, or more precisely: is the normal of the tangent hyperplane $S_{\bf p}$.
On the other hand, if ${\bf p}$ is a conditionally stationary point of $f$, then the directional derivative $$\lim_{t\to0+}{f({\bf p}+t{\bf A})-f({\bf p})\over t}=\nabla f({\bf p})\cdot{\bf A}$$ of $f$ at ${\bf p}$ vanishes in all allowed directions, i.e., in all directions ${\bf A}\in S_{\bf p}$. This means that $\nabla f({\bf p})\perp S_{\bf p}$. Since the vectors orthogonal to the hyperplane $S_{\bf p}$ are exactly the multiples of its normal $\nabla g({\bf p})$, it follows that $\nabla f({\bf p})$ is parallel to $\nabla g({\bf p})$, and this is what $(1)$ is saying.
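The geometric statement can be checked numerically on a toy problem. The sketch below is only an illustration under assumed choices that are not part of the question: $f(x,y)=x+y$ and the constraint $g(x,y)=x^2+y^2=2$, so that $S$ is a circle. At the point $(1,1)$ the derivative of $f$ along the tangent direction is zero and the two gradients are parallel; at $(\sqrt2,0)$, which also lies on $S$, neither is true.

```python
import math

def grad_f(x, y):
    # Gradient of the assumed objective f(x, y) = x + y
    return (1.0, 1.0)

def grad_g(x, y):
    # Gradient of the assumed constraint function g(x, y) = x**2 + y**2
    return (2.0 * x, 2.0 * y)

def tangent(x, y):
    # Rotate grad g by 90 degrees: a direction tangent to the circle S
    gx, gy = grad_g(x, y)
    return (-gy, gx)

def directional_derivative(x, y):
    # grad f . A for the tangent direction A at (x, y)
    fx, fy = grad_f(x, y)
    ax, ay = tangent(x, y)
    return fx * ax + fy * ay

def cross(x, y):
    # 2-D cross product of grad f and grad g; zero iff they are parallel
    fx, fy = grad_f(x, y)
    gx, gy = grad_g(x, y)
    return fx * gy - fy * gx

stationary = (1.0, 1.0)          # satisfies (1) with lambda = 1/2
other = (math.sqrt(2.0), 0.0)    # on S, but not stationary

for p in (stationary, other):
    print(p, directional_derivative(*p), cross(*p))
```

With several constraints the same picture would apply with the tangent space replaced by the set of directions orthogonal to all constraint gradients, but that case is not covered by this sketch.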