A simplified version of the Lagrange multiplier theorem is:
Given an objective $f: A \to \mathbb{R}^1$ and a constraint $g: A \to \mathbb{R}^1$ with $A \subseteq \mathbb{R}^n$, let $C = \{ x \in A \mid g(x) = 0 \}$ be the constraint region. If $f$ and $g$ have continuous partial derivatives on $C$ and $\nabla g \neq 0$ on $C$, then at any local maximum or minimum of $f$ constrained to $C$ we have $\nabla f = \lambda \nabla g$ for some $\lambda \in \mathbb{R}$.
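As a hedged numerical illustration of the statement, the sketch below uses a hypothetical example that is not part of the question: maximize $f(x,y)=x+y$ on the unit circle $g(x,y)=x^2+y^2-1=0$, where $\nabla g \neq 0$ everywhere and the maximum is at $(1/\sqrt{2}, 1/\sqrt{2})$. The helper `lagrange_multiplier` is a name made up for this illustration; it checks whether $\nabla f$ and $\nabla g$ are parallel at a point and, if so, returns $\lambda$.

```python
import math

# Hypothetical example (not from the question):
# maximize f(x, y) = x + y subject to g(x, y) = x^2 + y^2 - 1 = 0.

def grad_f(x, y):
    # f(x, y) = x + y has constant gradient (1, 1)
    return (1.0, 1.0)

def grad_g(x, y):
    # g(x, y) = x^2 + y^2 - 1
    return (2.0 * x, 2.0 * y)

def lagrange_multiplier(p, tol=1e-12):
    """Return lam with grad f(p) = lam * grad g(p), or None if no such lam exists."""
    fx, fy = grad_f(*p)
    gx, gy = grad_g(*p)
    if math.hypot(gx, gy) < tol:
        # grad g(p) = 0: lam * grad g(p) = 0, so only grad f(p) = 0 works
        return 0.0 if math.hypot(fx, fy) < tol else None
    # in 2D, two vectors are parallel iff their 2x2 determinant vanishes
    if abs(fx * gy - fy * gx) > tol:
        return None
    # lam = <grad f, grad g> / |grad g|^2
    return (fx * gx + fy * gy) / (gx * gx + gy * gy)

p_max = (1 / math.sqrt(2), 1 / math.sqrt(2))
print(lagrange_multiplier(p_max))       # approximately 0.7071 (= 1/sqrt(2))
print(lagrange_multiplier((1.0, 0.0)))  # None: not a constrained critical point
```

The determinant test is only a convenience for this two-variable case; with more variables or constraints one would compare ranks instead.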
Why do we require $\nabla g \neq 0$? Or, when there are several constraints, that the rank of the Jacobian be as large as it can be? Thank you.
You need the rank to be maximal so that $g=0$ defines an $(n-r)$-dimensional surface, where $r$ is the number of constraints $g = (g_1, \dots, g_r)$; on such a surface you can choose $n-r$ linearly independent tangent vectors and corresponding curves through the point. See, for example, the argument here: http://www.math.cmu.edu/~gautam/sj/teaching/2016-17/269-vector-analysis/pdfs/lagrange.pdf
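You need that to conclude that $\nabla f(P)$ is orthogonal to the tangent vectors of all these curves at the extremum $P$ (since $f$ restricted to each curve has a local extremum at $P$), hence to the tangent space of the surface. Since the orthogonal complement of that tangent space is spanned by the $r$ linearly independent gradients, $\nabla f(P)$ belongs to the span of $\nabla g_1(P), \dots, \nabla g_r(P)$ (it is parallel to $\nabla g(P)$ if you have just one constraint).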
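The rank argument can be checked numerically. The following is a minimal sketch, assuming a hypothetical example with $n=3$ and $r=2$: $g_1 = x^2+y^2+z^2-1$ and $g_2 = z$ (so $C$ is the unit circle in the $xy$-plane) with $f = x$, which has a constrained maximum at $P=(1,0,0)$. The helpers `rank`, `jacobian_g` and `gradient_in_span` are illustrative names, not from the answer. The test used is that $\nabla f(P)$ lies in the span of the rows of the Jacobian exactly when appending it as an extra row leaves the rank unchanged.

```python
def rank(rows, tol=1e-10):
    """Rank of a matrix given as a list of rows, via Gaussian elimination."""
    m = [[float(v) for v in row] for row in rows]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    r = 0  # number of pivots found so far
    for c in range(n_cols):
        if r == n_rows:
            break
        # partial pivoting: largest entry in column c at or below row r
        pivot = max(range(r, n_rows), key=lambda i: abs(m[i][c]))
        if abs(m[pivot][c]) < tol:
            continue  # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, n_rows):
            factor = m[i][c] / m[r][c]
            for j in range(c, n_cols):
                m[i][j] -= factor * m[r][j]
        r += 1
    return r

# Hypothetical constraints g1 = x^2 + y^2 + z^2 - 1, g2 = z; objective f = x.
def jacobian_g(p):
    x, y, z = p
    return [[2 * x, 2 * y, 2 * z],  # grad g1
            [0.0, 0.0, 1.0]]        # grad g2

def grad_f(p):
    return [1.0, 0.0, 0.0]

def gradient_in_span(p):
    J = jacobian_g(p)
    if rank(J) < len(J):
        raise ValueError("rank condition fails: constraint gradients are dependent")
    # grad f(p) is in the span of the rows of J iff adding it
    # as an extra row does not increase the rank
    return rank(J + [grad_f(p)]) == rank(J)

print(gradient_in_span((1.0, 0.0, 0.0)))  # True: P is the constrained maximum
print(gradient_in_span((0.0, 1.0, 0.0)))  # False: not a constrained critical point
```

If the rank check fails, the span test says nothing about extrema, which is why the sketch raises an error instead of returning a result.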
An example where the condition $\nabla g\neq 0$ fails and the conclusion of the theorem is false: in 3D space take $g(x,y,z)=x^2+y^2$ and $f(x,y,z)=(x-1)^2+y^2$. Clearly, $g=0$ exactly on the $z$-axis (a line, not a surface) and $\nabla g=(2x,2y,0)=0$ there. On the other hand, $f=1$ on that line, so every point is a constrained maximum (and minimum). However, $\nabla f=(2(x-1),2y,0)$, so $\nabla f=(-2,0,0)\neq 0$ on $g=0$, while $\lambda\nabla g=0$ there for every $\lambda$. Hence the condition $\nabla f=\lambda\nabla g$ does not hold at any point of $g=0$.
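A short sketch that evaluates the quantities in this counterexample at a few points of the $z$-axis; the sample points and the helper `has_multiplier` are illustrative choices, not part of the original answer. It confirms that $f=1$, $\nabla g=0$ and $\nabla f=(-2,0,0)$ there, so no $\lambda$ satisfies $\nabla f=\lambda\nabla g$.

```python
# Counterexample: g = x^2 + y^2 and f = (x - 1)^2 + y^2 in R^3.
def f(x, y, z):
    return (x - 1) ** 2 + y ** 2

def grad_f(x, y, z):
    return (2 * (x - 1), 2 * y, 0.0)

def grad_g(x, y, z):
    return (2 * x, 2 * y, 0.0)

def has_multiplier(p, tol=1e-12):
    """True if grad f(p) = lam * grad g(p) for some real lam."""
    gf, gg = grad_f(*p), grad_g(*p)
    sq_norm_gg = sum(c * c for c in gg)  # squared norm of grad g(p)
    if sq_norm_gg < tol:
        # grad g(p) = 0, so lam * grad g(p) = 0 for every lam
        return all(abs(c) < tol for c in gf)
    lam = sum(a * b for a, b in zip(gf, gg)) / sq_norm_gg
    return all(abs(a - lam * b) < tol for a, b in zip(gf, gg))

for z in (-2.0, 0.0, 3.5):
    p = (0.0, 0.0, z)  # a point on the z-axis, where g = 0
    print(f(*p), grad_g(*p), grad_f(*p), has_multiplier(p))
# each line: 1.0 (0.0, 0.0, 0.0) (-2.0, 0.0, 0.0) False
```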