Suppose $\vec{x} \in \mathbb{R}^n$ and we wish to optimize $f(\vec{x})$ subject to $k$ constraints: $g_i(\vec{x}) = c_i \quad \forall i \in \{1,\dots,k\}$.
Now it is clear that at a local max/min, say $\vec{x}^*$, the gradient $\nabla f(\vec{x}^*)$ is orthogonal to any curve $r$ through $\vec{x}^*$ that satisfies the $g_i$ constraints (i.e. $\nabla f(\vec{x}^*) \cdot r^{'}(\vec{x}^*) = 0$). $\quad\quad \textbf{(1)}$
It is also clear to me that if $\nabla f(\vec{x}^*) = \sum_{i=1}^k \lambda_i \nabla g_i(\vec{x}^*)$, then $\textbf{(1)}$ will hold (since each of the $\nabla g_i(\vec{x}^*)$ is itself orthogonal to the curve). However, I am confused as to why the converse is true.
Why is it that if $\nabla f(\vec{x}^*) \cdot r^{'}(\vec{x}^*) = 0$ for every such curve, then $\nabla f(\vec{x}^*)$ MUST be in the span of the gradients of the $g_i$?
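To make the setup concrete, here is a small numerical sketch of both conditions on a toy problem. Everything in it is an assumption chosen for illustration and is not part of the general statement: $f(x,y,z) = x + y + z$, the constraints $x^2 + y^2 + z^2 = 1$ and $z = 0$, the known maximizer $\vec{x}^* = (1/\sqrt{2}, 1/\sqrt{2}, 0)$, and the helper names `grad_f` and `constraint_jacobian`. It also assumes the constraint gradients are linearly independent at $\vec{x}^*$, so that the tangent directions of feasible curves can be taken as the null space of the matrix whose rows are the $\nabla g_i(\vec{x}^*)$.

```python
import numpy as np

# Toy problem (an assumption for illustration only):
#   maximize f(x, y, z) = x + y + z
#   subject to g_1 = x^2 + y^2 + z^2 = 1 and g_2 = z = 0.

def grad_f(x):
    """Gradient of f(x, y, z) = x + y + z."""
    return np.ones(3)

def constraint_jacobian(x):
    """Rows are grad g_1(x) and grad g_2(x)."""
    return np.array([
        [2.0 * x[0], 2.0 * x[1], 2.0 * x[2]],  # grad of x^2 + y^2 + z^2
        [0.0, 0.0, 1.0],                       # grad of z
    ])

# Known constrained maximizer of the toy problem.
x_star = np.array([1.0 / np.sqrt(2.0), 1.0 / np.sqrt(2.0), 0.0])
J = constraint_jacobian(x_star)

# Tangent directions r' of feasible curves: the null space of J.
# With full SVD, the rows of vt beyond the rank span that null space.
_, s, vt = np.linalg.svd(J)
rank = int(np.sum(s > 1e-12))
tangent_basis = vt[rank:]

# Condition (1): grad f is orthogonal to every tangent direction.
tangent_dot = tangent_basis @ grad_f(x_star)

# Span condition: solve J^T lambda = grad f in the least-squares sense.
lambdas, *_ = np.linalg.lstsq(J.T, grad_f(x_star), rcond=None)
span_residual = np.linalg.norm(J.T @ lambdas - grad_f(x_star))

print("grad f . tangent:", tangent_dot)
print("lambdas:", lambdas)
print("span residual:", span_residual)
```

In this example both conditions hold at once: the dot product with the tangent direction vanishes, and the least-squares fit reproduces $\nabla f(\vec{x}^*)$ exactly with $\lambda = (1/\sqrt{2}, 1)$. It only checks one instance and does not prove the converse in general.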