In my calculus III textbook, the following sentence is causing trouble for me and preventing me from understanding the theory behind Lagrange multipliers.
"Since the gradient vector for a given function is orthogonal to its level curves at any given point, for a level curve of $f$ to be tangent to the constraint curve $g(x,y) = 0$, the gradients of $f$ and $g$ must be parallel"
There are bits and pieces I understand, but I'm missing the holistic picture that will put my mind at ease. I'm quite certain that I understand that for a given function $f$, its gradient is orthogonal to the level surface $f(x,y,z) = k$, because the directional derivative of $f$ in any direction tangent to that surface is $0$. Specifically, I'm hung up on the idea that the gradients must be parallel: I cannot directly see why the case where they are anti-parallel isn't possible. Furthermore, I'm not sure why the constraint curve $g(x,y)$ is set to $0$ in this explanation. If someone could explain in detail the ideas behind this sentence, I would appreciate it.
The constraint curve can be given by $g(x,y)=c,$ where $c$ is any constant. But if you set $h(x,y)=g(x,y)-c,$ the same curve is given by $h(x,y)=0,$ and $\nabla h=\nabla g.$ So you can always assume that the constraint has the form $g(x,y)=0.$
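Now, $g(x,y)=0$ along the curve. Assume that in a neighbourhood of a point the curve can be written as $y=y(x)$ (this is possible when $\partial g/\partial y\neq 0$ there, by the implicit function theorem). Differentiating $g(x,y(x))=0$ with respect to $x$ by the chain rule gives $$\frac{\partial g}{\partial x}+\frac{\partial g}{\partial y}\frac{dy}{dx}=0.$$ The vector $T=\left(1,\frac{dy}{dx}\right)$ is tangent to the curve, and the left-hand side above is exactly $T\cdot \nabla g,$ so $T\cdot\nabla g=0;$ that is, $\nabla g$ is perpendicular to the curve.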
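As a quick numerical sanity check of this step (a sketch on a hypothetical example, not part of the original argument), take the unit circle $g(x,y)=x^2+y^2-1=0$, solve its upper half explicitly as $y=\sqrt{1-x^2}$, approximate $dy/dx$ with a central finite difference, and confirm that $T\cdot\nabla g$ comes out numerically zero. The function names and the step size `h` are illustrative choices.

```python
import math

def g(x, y):
    # Hypothetical example constraint: the unit circle x^2 + y^2 - 1 = 0
    return x**2 + y**2 - 1

def grad_g(x, y):
    # Exact partial derivatives (dg/dx, dg/dy)
    return (2 * x, 2 * y)

def y_of_x(x):
    # Upper half of the circle, written locally as y = y(x)
    return math.sqrt(1 - x**2)

def tangent(x, h=1e-6):
    # T = (1, dy/dx), with dy/dx from a central finite difference
    dydx = (y_of_x(x + h) - y_of_x(x - h)) / (2 * h)
    return (1.0, dydx)

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

for x in (-0.8, -0.3, 0.0, 0.5, 0.9):
    y = y_of_x(x)
    # g should be ~0 (we are on the curve) and T . grad g should be ~0 (perpendicularity)
    print(f"x={x:+.1f}  g={g(x, y):+.1e}  T.grad_g={dot(tangent(x), grad_g(x, y)):+.1e}")
```

The dot products are zero up to finite-difference error, which is what the chain-rule identity predicts.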
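Now, if we consider a level curve of $f$, the same argument shows that $\nabla f$ is perpendicular to that curve. So, if a level curve of $f$ is tangent to the curve $g(x,y)=0$ at some point $(x_0,y_0),$ then both curves have the same tangent vector $T$ at $(x_0,y_0).$ We have $\nabla f\perp T$ (since $\nabla f$ is perpendicular to the tangent vector of its level curves) and $\nabla g\perp T,$ as we have seen. In the plane, all vectors perpendicular to a nonzero vector $T$ lie on a single line, so the only possibility is $\nabla f\parallel\nabla g,$ that is, $\nabla f=\lambda\nabla g$ for some scalar $\lambda$ (assuming $\nabla g\neq 0$). Note that "parallel" here includes the anti-parallel case: $\lambda$ may be negative, which just means the two gradients point in opposite directions along that line.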
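To see both the parallel and the anti-parallel case concretely, here is a small sketch on a hypothetical example (not from the original post): extremize $f(x,y)=x+y$ on the unit circle $g(x,y)=x^2+y^2-1=0$. A brute-force scan of the circle locates the constrained maximum and minimum; at each one the cross product of $\nabla f$ and $\nabla g$ vanishes (the gradients are parallel), and the multiplier $\lambda$ with $\nabla f=\lambda\nabla g$ is positive at the maximum and negative at the minimum.

```python
import math

# Hypothetical example: extremize f(x, y) = x + y on g(x, y) = x^2 + y^2 - 1 = 0
def f(x, y):
    return x + y

def grad_f(x, y):
    return (1.0, 1.0)

def grad_g(x, y):
    return (2 * x, 2 * y)

def cross(u, v):
    # z-component of u x v; zero exactly when two plane vectors are parallel or anti-parallel
    return u[0] * v[1] - u[1] * v[0]

def multiplier(u, v):
    # lambda with u = lambda * v, via projection (assumes v != 0 and u parallel to v)
    return (u[0] * v[0] + u[1] * v[1]) / (v[0] ** 2 + v[1] ** 2)

# Brute-force scan of the constraint curve to locate the constrained max and min of f
n = 100_000
points = [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n)) for k in range(n)]
p_max = max(points, key=lambda p: f(*p))
p_min = min(points, key=lambda p: f(*p))

for name, (x, y) in (("max", p_max), ("min", p_min)):
    gf, gg = grad_f(x, y), grad_g(x, y)
    print(f"{name}: point=({x:+.4f}, {y:+.4f})  cross={cross(gf, gg):+.2e}  lambda={multiplier(gf, gg):+.4f}")
```

At the maximum $(1/\sqrt2,1/\sqrt2)$ the gradients point the same way ($\lambda=\sqrt2/2$); at the minimum $(-1/\sqrt2,-1/\sqrt2)$ they point in opposite directions ($\lambda=-\sqrt2/2$). Both are valid constrained extrema, which is why the condition is stated as "parallel" with $\lambda$ of either sign.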