Proposition: Let $f,g_1,\dots,g_k: \mathbb R^n \to \mathbb R$ be $C^1$ functions, $M = \{x \in \mathbb R^n: g_1(x) = \dots=g_k(x) = 0\}$, $x_0 \in M$ be a conditional extremum of $f$ with respect to constraints $g_1, \dots, g_k$, and lastly suppose that $\nabla g_1(x_0), \dots, \nabla g_k(x_0)$ are linearly independent vectors.
Then $\exists \lambda_1, \dots, \lambda_k \in \mathbb R$ such that $\nabla f(x_0) = \sum_{i=1}^k\lambda_i\nabla g_i(x_0)$.
Proof:
Define $H:\mathbb R^n \to \mathbb R^{k+1}$ by $H(x) = \begin{pmatrix}f(x) \\ g_1(x) \\ \vdots \\ g_k(x)\end{pmatrix}$.
Notice that $H(x_0) = \begin{pmatrix}f(x_0) \\ 0 \\ \vdots \\ 0\end{pmatrix}$, and that $D_H(x_0) = \begin{pmatrix}\nabla f(x_0) \\ \nabla g_1(x_0) \\ \vdots \\ \nabla g_k(x_0)\end{pmatrix}$.
We assumed the gradients of the $g$ functions are linearly independent at $x_0$, so the rank of this differential is either $k$ or $k+1$.
Assume it is $k+1$.
Let $U_{x_0}$ be an open neighborhood of $x_0$. WLOG suppose $x_0$ is a conditional minimum, so that $\forall x \in M \cap U_{x_0}: f(x) \geq f(x_0).$
Since the differential has full rank, by the open mapping theorem we know $H(x_0) = \begin{pmatrix}f(x_0) \\ 0 \\ \vdots \\ 0\end{pmatrix}\in H(U_{x_0})^{\circ}$, the interior of $H(U_{x_0})$.
But then $\exists x \in U_{x_0}$ such that $H(x) = \begin{pmatrix}a \\ 0 \\ \vdots \\ 0\end{pmatrix}$ and $a<f(x_0)$.
In other words, $\exists x \in M$ such that $f(x) = a < f(x_0)$, which contradicts $x_0$ being a conditional minimum. Hence the rank has to be $k$, which proves our point.
My problem
I don't understand why the very last line is true. Just because $x \in U_{x_0}$ doesn't mean it is in $M$. In fact, it is possible that $M \cap U_{x_0} = \{x_0\}$, no? That case seems problematic in our proof.
I would emphasize the constrained critical point idea. You have some $k$ independent gradient vectors for the $g_i$. These are all orthogonal to $M$, and span the normal space to $M$ at the point of interest.
If the gradient of $f$ is actually zero, or more generally is a linear combination of the $g$ gradients, then it is orthogonal to $M$; it has no component tangent to $M$.
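One way to make this precise is an orthogonal decomposition. This is only a sketch, and the symbols $N$, $T$ and $w$ are my own notation: let $N = \operatorname{span}\{\nabla g_1(x_0), \dots, \nabla g_k(x_0)\}$ be the normal space and $T = \{v \in \mathbb R^n : \nabla g_i(x_0) \cdot v = 0 \text{ for all } i\}$ its orthogonal complement, which is the tangent space to $M$ at $x_0$ when the gradients are linearly independent. Then

$$\nabla f(x_0) = \sum_{i=1}^k \lambda_i \nabla g_i(x_0) + w, \qquad w \in T,$$

for some $\lambda_1, \dots, \lambda_k \in \mathbb R$, and the Lagrange condition says exactly that the tangential part $w$ is zero.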
If the gradient of $f$ is not a linear combination of the $g$ gradients, the projection of the gradient of $f$ onto the tangent space at the point of interest is nonzero. So there is a tangent vector $v$ at the supposed constrained extremum $x_0$ such that $v \cdot \nabla f(x_0) \neq 0$. Furthermore, there is a curve $\gamma(t)$ contained within $M$ with $\gamma(0) = x_0$ and $\gamma'(0) = v$. The derivative of $f$ along $\gamma$ is then nonzero at $t = 0$, so $x_0$ cannot be a conditional extremum.
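Spelled out with the chain rule, as a sketch: assume a $C^1$ curve $\gamma : (-\varepsilon, \varepsilon) \to M$ with $\gamma(0) = x_0$ and $\gamma'(0) = v$, where $\varepsilon$ is my notation and the existence of such a curve comes from the implicit function theorem together with the linear independence of the $\nabla g_i(x_0)$. Then

$$\frac{d}{dt}\, f(\gamma(t))\Big|_{t=0} = \nabla f(x_0) \cdot \gamma'(0) = \nabla f(x_0) \cdot v \neq 0.$$

Since $t \mapsto f(\gamma(t))$ is $C^1$ with nonzero derivative at $t = 0$, it is strictly monotone near $0$. So $f$ takes values both above and below $f(x_0)$ at points of $M$ arbitrarily close to $x_0$, and $x_0$ is neither a conditional minimum nor a conditional maximum.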