I'm currently trying to understand the method of Lagrange multipliers. The explanation I'm looking at says something along the lines of:
"Suppose we wish to minimise the function $f(x,y)$ subject to the constraint $g(x,y)=0$, and that this minimum is attained at the point $(x_{0}, y_{0})$. Then $\nabla f(x_{0}, y_{0})$ is normal to the level curve of $f$ through this point. Furthermore, the normal vectors to the level curves of $f$ and $g$ at this point are parallel. Thus, $\nabla f(x_{0}, y_{0})=\lambda \nabla g(x_{0}, y_{0})$."
(Source: http://www.slimy.com/~steuard/teaching/tutorials/Lagrange.html)
I really don't understand why the normal vectors of $f$ and $g$ are parallel, or how this gives rise to the equation $\nabla f(x_{0}, y_{0})=\lambda \nabla g(x_{0}, y_{0})$.
Could someone please explain this to me?
Many thanks.
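I’ve always thought of it in terms of differentials and tangent vectors instead of gradients. If $f(\mathbf P)$, restricted to the curve $g(\mathbf P)=\text{constant}$, has an extremum at the point $\mathbf P_0$, then $\mathrm df_{\mathbf P_0}(\mathbf v)=0$ for any vector $\mathbf v$ that’s tangent to that curve at $\mathbf P_0$. This curve is a level curve of $g$, so $\mathbf v$ also satisfies $\mathrm dg_{\mathbf P_0}(\mathbf v)=0$. In the plane, the tangent vectors at $\mathbf P_0$ form a line, and (provided $\mathrm dg_{\mathbf P_0}\ne0$) that line is exactly the set of vectors on which $\mathrm dg_{\mathbf P_0}$ vanishes. So $\mathrm df_{\mathbf P_0}$ is a linear functional that vanishes wherever $\mathrm dg_{\mathbf P_0}$ does, and a linear functional that vanishes on the kernel of a nonzero one must be a multiple of it. That means that at $\mathbf P_0$ we have $\mathrm df_{\mathbf P_0}=\lambda\,\mathrm dg_{\mathbf P_0}$ for some scalar $\lambda$.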
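To make the differential argument concrete, here is a small numerical check in Python on an example of my own choosing (not from the question): $f(x,y)=x+2y$ on the unit circle $g(x,y)=x^2+y^2-1=0$, whose constrained maximum is $\mathbf P_0=(1/\sqrt5,\,2/\sqrt5)$. It's only a sketch under those assumptions, and the helper names are made up for illustration. Each differential at $\mathbf P_0$ is represented by its gradient components; the code checks that both vanish on a tangent vector and then recovers $\lambda$.

```python
import math

# Hypothetical example (not from the post): f(x, y) = x + 2y,
# constraint g(x, y) = x^2 + y^2 - 1 = 0 (the unit circle).
def grad_f(x, y):
    return (1.0, 2.0)

def grad_g(x, y):
    return (2.0 * x, 2.0 * y)

def evaluate(differential, v):
    # A differential at a point acts on a vector v as a dot product
    # with the gradient components at that point.
    return differential[0] * v[0] + differential[1] * v[1]

# Constrained maximum of f on the circle, found by hand: P0 = (1/sqrt5, 2/sqrt5)
x0, y0 = 1 / math.sqrt(5), 2 / math.sqrt(5)
df = grad_f(x0, y0)
dg = grad_g(x0, y0)

# A tangent vector to the level curve g = 0 at P0 (perpendicular to the radius)
v = (-y0, x0)

print(evaluate(dg, v))  # ~0: v is tangent to the level curve of g
print(evaluate(df, v))  # ~0: f is stationary in the tangent direction

# Both functionals vanish on the same tangent line, so df = lam * dg
lam = df[0] / dg[0]
print(lam, math.sqrt(5) / 2)     # the two values agree
print(df[1] - lam * dg[1])       # ~0: the same lam works for the second component
```

The same $\lambda=\sqrt5/2$ fits both components, which is exactly the statement $\mathrm df_{\mathbf P_0}=\lambda\,\mathrm dg_{\mathbf P_0}$ for this example.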
This translates pretty directly to gradients, because $\mathrm df(\mathbf v)=\nabla f\cdot\mathbf v$: a differential vanishing on a tangent vector means the gradient is orthogonal to that vector. The gradient of a function is always normal to its level curves, so $\nabla g$ is everywhere normal to the constraint curve $g(x,y)=0$. Now, the gradient of $f$ at a point gives the direction of fastest increase. The rate of change of $f$ in any other direction is $\nabla f\cdot\mathbf u$, where $\mathbf u$ is a unit vector that specifies the direction. That dot product is $0$ exactly when $\mathbf u$ is orthogonal to $\nabla f$. So, for $f$’s value along some curve to be stationary at a point (as it must be at a local extremum), its rate of change in the direction of the curve’s unit tangent must be $0$, i.e., $\nabla f$ must be normal to the curve there. In the plane, any two vectors normal to the same curve at the same point are parallel, so (as long as $\nabla g\ne\mathbf 0$ there) $\nabla f$ must be some scalar multiple of $g$’s gradient, i.e., $\nabla f=\lambda\nabla g$.
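The gradient version can be checked numerically too. This sketch assumes a hypothetical example, $f(x,y)=x+2y$ on the unit circle $g(x,y)=x^2+y^2-1=0$ parametrised as $(\cos t,\sin t)$, and all function names are illustrative. It locates the maximum of $f$ along the curve by a brute-force grid search (a crude stand-in for a real optimiser, not part of the Lagrange method itself), then checks that the rate of change $\nabla f\cdot\mathbf u$ along the unit tangent $\mathbf u$ is near $0$ there and that $\nabla f$ is parallel to $\nabla g$.

```python
import math

# Hypothetical example: f(x, y) = x + 2y on the unit circle g(x, y) = x^2 + y^2 - 1 = 0
def f(x, y):
    return x + 2 * y

def grad_f(x, y):
    return (1.0, 2.0)

def grad_g(x, y):
    return (2 * x, 2 * y)

def curve(t):
    # Parametrisation of the constraint curve g = 0
    return (math.cos(t), math.sin(t))

def tangent(t):
    # Unit tangent vector: the derivative of curve(t) with respect to t
    return (-math.sin(t), math.cos(t))

def rate_along_curve(t):
    # Rate of change of f along the curve: grad f . u, with u the unit tangent
    gx, gy = grad_f(*curve(t))
    ux, uy = tangent(t)
    return gx * ux + gy * uy

def cross(a, b):
    # 2D cross product; zero exactly when a and b are parallel
    return a[0] * b[1] - a[1] * b[0]

# Brute-force grid search for the maximum of f along the curve
n = 100_000
t_best = max((2 * math.pi * k / n for k in range(n)), key=lambda t: f(*curve(t)))
p = curve(t_best)

print(rate_along_curve(t_best))          # close to 0: f is stationary along the curve
print(cross(grad_f(*p), grad_g(*p)))     # close to 0: grad f is parallel to grad g
print(grad_f(*p)[0] / grad_g(*p)[0])     # lambda, close to sqrt(5)/2
print(rate_along_curve(0.0))             # 2.0 at the point (1, 0): not stationary there
```

At a point that is not an extremum, such as $(1,0)$, $\nabla f$ still has a component along the curve, so $f$ keeps changing as you move along it; only where that tangential component disappears can $\nabla f$ line up with $\nabla g$.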