I'm trying to get my head around an explanation for the Lagrange Multiplier method in $2D$ or $3D$. Here is an excerpt from my applied math textbook.
Assume we find the extreme values of $f(x,y)$ subject to the constraint $g(x,y) = c$, where $c$ is a constant. If $M$ is an extreme value of $f$ at $(x_0,y_0)$, then the level curve $f(x,y) = M$ and $g(x,y) = c$ share the same tangent line at $(x_0,y_0)$. (And then the explanation proceeds...)
The question is, why exactly do the level curves $f(x,y) = M$ and $g(x,y) = c$ share the same tangent line at $(x_0,y_0)$? When I draw a picture it seems so, but I'd like a more concrete explanation.
In multivariate optimization, the first-order condition says that all the "admissible" directional derivatives of $f$ should be zero. "Admissible" means that these directional derivatives are taken along directions in which you can move while preserving the constraints. So here, you can move tangent to the level curve of $g$. The reason for the first-order condition is that if it weren't true, then you could move a small distance $d$ in some admissible direction $u$ with $\nabla f \cdot u \neq 0$, and $f$ would change by approximately $(\nabla f \cdot u) d$, as you know from directional derivatives. Moving along $u$ would then increase $f$ and moving along $-u$ would decrease it (or vice versa), so the point could not be an extremum.
Now the tangent of the level curve of $g$ is perpendicular to the gradient of $g$. This is because, if it were not, then you could move a small distance $d$ along the tangent $T$ and the value of $g$ would change by approximately $(\nabla g \cdot T) d \neq 0$, whereas $g$ stays equal to $c$ along its level curve. So the directions in which you're allowed to move are all perpendicular to the gradient of $g$. This is the only way they're constrained.
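So we want $\nabla f$ to be perpendicular to all vectors which are perpendicular to $\nabla g$. Provided $\nabla g \neq 0$, this can only happen if $\nabla f$ is parallel to $\nabla g$, that is, $\nabla f = \lambda \nabla g$ for some scalar $\lambda$, which is the usual way we state the Lagrange multiplier condition. Since the tangent line of each level curve is perpendicular to the corresponding gradient, parallel gradients mean the curves $f(x,y) = M$ and $g(x,y) = c$ have the same tangent line at $(x_0,y_0)$ (when $\nabla f \neq 0$ there).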
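To make this concrete, here is a small numerical check on an example of my own choosing (it is not from the textbook): $f(x,y) = x + y$ on the circle $g(x,y) = x^2 + y^2 = 2$. The function names and the brute-force search over the angle are just illustrative assumptions. The sketch locates the constrained maximum and checks that the gradients are parallel there and not parallel at another point of the circle.

```python
import math

# Illustrative example (an assumption, not from the textbook):
# maximize f(x, y) = x + y subject to g(x, y) = x^2 + y^2 = 2.
R = math.sqrt(2.0)  # radius of the constraint circle

def f(x, y):
    return x + y

def grad_f(x, y):
    return (1.0, 1.0)

def grad_g(x, y):
    return (2.0 * x, 2.0 * y)

def cross2(a, b):
    # 2D cross product; it is zero exactly when a and b are parallel.
    return a[0] * b[1] - a[1] * b[0]

def point(theta):
    # Parametrization of the constraint curve.
    return (R * math.cos(theta), R * math.sin(theta))

# Brute-force search for the constrained maximum over a grid of angles.
n = 100000
thetas = [2.0 * math.pi * k / n for k in range(n)]
best_theta = max(thetas, key=lambda t: f(*point(t)))
x0, y0 = point(best_theta)

# At the maximum the gradients should be (nearly) parallel.
cross_at_max = cross2(grad_f(x0, y0), grad_g(x0, y0))

# At a non-extremal point, here theta = 0, they are not parallel.
x1, y1 = point(0.0)
cross_elsewhere = cross2(grad_f(x1, y1), grad_g(x1, y1))

print("constrained maximizer:", (x0, y0))
print("cross product at maximizer:", cross_at_max)
print("cross product at theta = 0:", cross_elsewhere)
```

The maximizer comes out close to $(1,1)$, where $\nabla f = (1,1)$ and $\nabla g = (2,2)$, so $\nabla f = \tfrac12 \nabla g$. At the point $(\sqrt2, 0)$ the cross product is clearly nonzero, and indeed you can increase $f$ there by sliding along the circle.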
In actuality the picture is a little more complicated, because we can't actually move along the level curve by going off in the direction of the tangent: that takes us off the curve. We'd need to go off in the direction of the tangent and then return along the direction of the normal. The idea behind resolving this issue is that if you go a short enough distance $d$ along the tangent, then the distance you have to go to return along the normal is much smaller than $d$ (of order $d^2$ for a smooth curve). So if $\nabla f$ is not parallel to $\nabla g$, the change in $f$ from the move in the normal direction is much smaller than the change from the move in the tangent direction. In particular, the total change and the change in the tangent direction end up having the same sign.
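Here is a rough numerical illustration of that "tangent step, then return" argument, again on an example I picked myself: $f(x,y) = x + y$ on the circle $x^2 + y^2 = 2$, starting from the non-extremal point $(\sqrt2, 0)$. The helper `step_and_return` is hypothetical, and returning radially is a simplification that works because the normal to a circle is radial.

```python
import math

# Illustrative example (an assumption): f(x, y) = x + y on the circle
# x^2 + y^2 = 2, which has radius R = sqrt(2).
R = math.sqrt(2.0)

def f(x, y):
    return x + y

def step_and_return(theta, d):
    """Step a distance d along the unit tangent at angle theta,
    then return to the circle radially (the normal direction)."""
    px, py = R * math.cos(theta), R * math.sin(theta)  # start on the curve
    tx, ty = -math.sin(theta), math.cos(theta)         # unit tangent there
    qx, qy = px + d * tx, py + d * ty                  # after the tangent step
    norm_q = math.hypot(qx, qy)
    rx, ry = qx * R / norm_q, qy * R / norm_q          # back on the curve
    return_dist = norm_q - R                           # length of the normal move
    df_tangent = f(qx, qy) - f(px, py)                 # change from tangent step
    df_normal = f(rx, ry) - f(qx, qy)                  # change from the return
    return return_dist, df_tangent, df_normal

results = {}
for d in (0.1, 0.01, 0.001):
    results[d] = step_and_return(0.0, d)
    return_dist, df_tangent, df_normal = results[d]
    print(d, return_dist, df_tangent, df_normal, df_tangent + df_normal)
```

As $d$ shrinks by a factor of $10$, the return distance and the normal change in $f$ shrink by roughly a factor of $100$, while the tangent change shrinks only by a factor of $10$. So the sign of the total change is decided by the tangent step, which is what the argument needs.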