I understand the intuition behind the Lagrange multiplier with a single equality constraint $g(x) = 0$: at the optimum, the gradient $\nabla f(x)$ must be parallel to $\nabla g(x)$, so $\nabla f(x) = \lambda \nabla g(x)$.
However, I am confused about the case of multiple equality constraints, where the following condition holds: $$\nabla f(x) = \sum_i \lambda_i \nabla g_i(x) $$
This means that, at the optimum, the gradient of $f(x)$ is a linear combination of the gradients of the constraint functions $g_i$, each of which is normal to its level set $g_i(x)=0$.
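As a concrete instance of this condition, here is a minimal Python sketch on a toy problem. The problem itself (the choice of $f$, $g_1$, $g_2$) and all helper names are my own assumptions for illustration and are not taken from the book. It solves for $\lambda_1, \lambda_2$ at the hand-computed optimum and checks that $\nabla f$ is reproduced by the combination of the two constraint gradients.

```python
# Toy problem (assumed for illustration, not from the book):
#   minimize   f(x, y, z) = x^2 + y^2 + z^2
#   subject to g1 = x + y - 1 = 0  and  g2 = y + z - 1 = 0

def grad_f(p):
    x, y, z = p
    return (2 * x, 2 * y, 2 * z)

def g1(p):
    return p[0] + p[1] - 1

def g2(p):
    return p[1] + p[2] - 1

# Both constraints are linear, so their gradients are constant.
GRAD_G1 = (1.0, 1.0, 0.0)
GRAD_G2 = (0.0, 1.0, 1.0)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def multipliers(p):
    """Least-squares fit of grad_f(p) = l1*GRAD_G1 + l2*GRAD_G2,
    solved through the 2x2 normal equations with Cramer's rule."""
    gf = grad_f(p)
    a11, a12, a22 = dot(GRAD_G1, GRAD_G1), dot(GRAD_G1, GRAD_G2), dot(GRAD_G2, GRAD_G2)
    b1, b2 = dot(GRAD_G1, gf), dot(GRAD_G2, gf)
    det = a11 * a22 - a12 * a12
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det

# Optimum found by hand: substitute x = 1 - y and z = 1 - y into f,
# then minimize 2(1 - y)^2 + y^2 over y, which gives y = 2/3.
x_star = (1 / 3, 2 / 3, 1 / 3)

lam1, lam2 = multipliers(x_star)
residual = tuple(
    gf - lam1 * a - lam2 * b
    for gf, a, b in zip(grad_f(x_star), GRAD_G1, GRAD_G2)
)

print("g1(x*), g2(x*):", g1(x_star), g2(x_star))
print("lambda1, lambda2:", lam1, lam2)
print("grad f - sum(lambda_i * grad g_i):", residual)
```

In this example both multipliers come out as $2/3$, so they are nonzero even though $g_1(x^*) = g_2(x^*) = 0$, and $\nabla f(x^*)$ is not parallel to either constraint gradient on its own.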
The book by Kochenderfer and Wheeler (Algorithms for Optimization) has a derivation for this, but I am not able to follow it. It says that, for a problem with two constraints, we can consider the combined constraint $$g_{comb}(x)=g_1^2(x)+cg_2^2(x)=0$$ It then goes on to show that $\nabla f(x)=\lambda \nabla g_{comb}(x)$, which means $$ \nabla f(x)= 2\lambda g_1(x)\nabla g_1(x)+2\lambda c g_2(x)\nabla g_2(x)$$ $$ \implies \nabla f(x)= \lambda_1\nabla g_1(x)+\lambda_2\nabla g_2(x)$$ I don't see how they can set $2\lambda g_1(x)=\lambda_1$ and $2\lambda c g_2(x)=\lambda_2$, since $\lambda_1$ and $\lambda_2$ would then depend on $g_1$ and $g_2$. Also, at the optimum $x^*$, $g_1(x^*)$ and $g_2(x^*)$ are zero, so aren't $\lambda_1$ and $\lambda_2$ zero as well?
They then generalize this to $l$ multipliers, which is where I get confused. Can someone please explain why this proof is correct, and why the gradient of $f$ is a linear combination of the gradients of all the constraint functions at the optimum?
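To make my concern about $g_{comb}$ concrete, here is a small Python sketch that evaluates $\nabla g_{comb} = 2 g_1 \nabla g_1 + 2 c\, g_2 \nabla g_2$ at a feasible point and at an infeasible one. The two linear constraints, the value $c = 1$, and the sample points are assumptions I picked for illustration; they are not from the book.

```python
# Assumed toy constraints (not from the book):
#   g1 = x + y - 1,  g2 = y + z - 1,  g_comb = g1^2 + C * g2^2
C = 1.0  # weight c in g_comb; the value is chosen arbitrarily

def g1(p):
    return p[0] + p[1] - 1

def g2(p):
    return p[1] + p[2] - 1

GRAD_G1 = (1.0, 1.0, 0.0)  # constant because g1 is linear
GRAD_G2 = (0.0, 1.0, 1.0)  # constant because g2 is linear

def grad_g_comb(p):
    """Chain rule: grad(g1^2 + C*g2^2) = 2*g1*grad(g1) + 2*C*g2*grad(g2)."""
    w1 = 2 * g1(p)
    w2 = 2 * C * g2(p)
    return tuple(w1 * a + w2 * b for a, b in zip(GRAD_G1, GRAD_G2))

feasible = (1 / 3, 2 / 3, 1 / 3)  # satisfies g1 = g2 = 0
infeasible = (1.0, 0.5, 0.0)      # g1 = 0.5, g2 = -0.5

grad_at_feasible = grad_g_comb(feasible)
grad_at_infeasible = grad_g_comb(infeasible)

print("grad g_comb at feasible point:  ", grad_at_feasible)
print("grad g_comb at infeasible point:", grad_at_infeasible)
```

Up to rounding error, the gradient of $g_{comb}$ vanishes at the feasible point, because both weights $2 g_1$ and $2 c\, g_2$ are zero there. So for any finite $\lambda$, the products $2\lambda g_1(x^*)$ and $2\lambda c\, g_2(x^*)$ are zero as well, which is exactly what puzzles me about identifying them with $\lambda_1$ and $\lambda_2$.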
