Let $A\subset\mathbb{R}^n$ be open and let $g\in C^1 (A, \mathbb{R} )$ be a function. Fix a value $\alpha$ in the image set $g(A)$, and consider the level set of $g$ corresponding to $\alpha$: $S = \{ \vec{x} \in A \mid g( \vec{x}) = \alpha \}$.
Assume that $S$ has the following property:
(*)$\left\{ \begin{array}{c} \mbox{For every $\vec{a} \in S$, the linear span of the vectors tangent} \\ \mbox{ to $S$ at $\vec{a}$ is a linear subspace of dimension $n-1$ in $\mathbb{R}^n$. } \end{array} \right.$
Now suppose that $f$ is another function in $C^1 ( A, \mathbb{R} )$, and that $\vec{a} \in S$ is a point of local extremum for $f$ on $S$. Prove that there exists a $\lambda \in \mathbb{R}$ such that $( \nabla f ) ( \vec{a} ) = \lambda ( \nabla g ) ( \vec{a} )$.
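To get a feel for the statement, here is a quick numerical sanity check on a hypothetical instance of my own (not part of the problem): $f(x,y)=x+y$ on the unit circle $S=\{g(x,y)=x^2+y^2=1\}$. At the constrained maximum the theorem predicts $\nabla f = \lambda \nabla g$, i.e. the two gradients are parallel, so their $2\times 2$ determinant should vanish:

```python
import math

# Hypothetical instance (not part of the problem): f(x, y) = x + y on
# S = {(x, y) : g(x, y) = x^2 + y^2 = 1}.  Parameterize S by
# t -> (cos t, sin t) and locate the maximum of f on a fine grid.
ts = [2.0 * math.pi * k / 20000 for k in range(20000)]
t_max = max(ts, key=lambda t: math.cos(t) + math.sin(t))
a = (math.cos(t_max), math.sin(t_max))  # approx (1/sqrt(2), 1/sqrt(2))

grad_f = (1.0, 1.0)                # gradient of f is constant: (1, 1)
grad_g = (2.0 * a[0], 2.0 * a[1])  # gradient of g is (2x, 2y) at a

# grad_f = lambda * grad_g for some lambda  <=>  the determinant vanishes:
det = grad_f[0] * grad_g[1] - grad_f[1] * grad_g[0]
print(abs(det) < 1e-3)  # prints True
```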
I have some difficulty proving this result. Here's what I've tried:
It has been previously proved that, since $\vec{a}$ is a point of local extremum for $f$ on $S$, we have $\langle \nabla f(\vec{a}),\vec{v} \rangle=0$ for every vector $\vec{v}$ in the tangent space of $S$ at $\vec{a}$. We also know from linear algebra, using (*), that the orthogonal complement of the tangent space, $\{\vec{u}\in\mathbb{R}^n : \langle \vec{u}, \vec{v} \rangle=0 \mbox{ for every tangent vector } \vec{v}\}$, has dimension $1$, and that $\nabla f(\vec{a})$ belongs to it.
Now, $\vec{a}$ is not necessarily a local extremum point of $g$, so we can't derive the same inner-product identity for $g$ in that way (and we probably don't need to); in particular, we can't simply equate something like $\langle \nabla f(\vec{a}), \vec{v} \rangle=\langle \nabla g(\vec{a}), \vec{v} \rangle$ to get the desired result. So, what do we do? How is the dimension relevant here? Is the idea that, since $\nabla f(\vec{a})$ and $\nabla g(\vec{a})$ each lie in a one-dimensional subspace orthogonal to the tangent vectors, the two gradients must be proportional? Yet a one-dimensional subspace of an $n$-dimensional space need not coincide with another one-dimensional subspace. So how do we reconcile all of the above?
Would appreciate your hints.
You need to use the implicit function theorem. Consider the equation $g(\overline{x})-\alpha=0$. To apply the implicit function theorem you need to show that $\frac{\partial g}{\partial x_i} (\overline{a})\ne 0$ for some $i$. So assume by contradiction that $\frac{\partial g}{\partial x_i} (\overline{a})= 0$ for all $i$. This contradicts the hypothesis on the tangent space at $\overline{a}$: the tangent vectors $\overline{t}$ to $S$ at $\overline{a}$ are exactly those for which $\langle\nabla g(\overline{a}),\overline{t}\rangle=0$, and if $\nabla g(\overline{a})=0$ this condition is satisfied by every vector, so the tangent space would have dimension $n$ instead of $n-1$. Hence one of the partial derivatives is not zero; say the last one, $\frac{\partial g}{\partial x_n}(\overline{a})\ne 0$. Then by the implicit function theorem, near $\overline{a}$ you can write $S$ as a graph $x_n=h(x_1,\dots,x_{n-1})$ with $h\in C^1$. In turn, the function $p(x_1,\dots,x_{n-1}):=f(x_1,\dots,x_{n-1},h(x_1,\dots,x_{n-1}))$ has an unconstrained local extremum at the point $\overline{a}':=(a_1,\dots,a_{n-1})$, and so the gradient of $p$ (with respect to the $n-1$ variables $x_1,\dots,x_{n-1}$) must be zero at $\overline{a}'$. Now play with the chain rule.
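The reduction can be checked on a toy instance of my own choosing (not from the problem): take $g(x,y)=x^2+y^2$, $\alpha=1$, $f(x,y)=x+y$, $\overline{a}=(1/\sqrt{2},1/\sqrt{2})$. Here $\frac{\partial g}{\partial y}(\overline{a})=\sqrt{2}\ne 0$, so near $\overline{a}$ the level set is the graph $y=h(x)=\sqrt{1-x^2}$, and $p(x)=f(x,h(x))=x+h(x)$ should have vanishing derivative at $a_1=1/\sqrt{2}$:

```python
import math

# Toy instance (assumed, not from the problem): g(x, y) = x^2 + y^2,
# alpha = 1, f(x, y) = x + y.  Near a = (1/sqrt(2), 1/sqrt(2)) we have
# dg/dy = 2y != 0, so the implicit function theorem gives the local
# graph y = h(x) = sqrt(1 - x^2), and p(x) = f(x, h(x)) = x + h(x).
def p(x):
    return x + math.sqrt(1.0 - x * x)

a1 = 1.0 / math.sqrt(2.0)
eps = 1e-6
# Central-difference approximation of p'(a1); the argument says it is 0.
dp = (p(a1 + eps) - p(a1 - eps)) / (2.0 * eps)
print(abs(dp) < 1e-6)  # prints True
```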
EDIT Adding more details. You have $$0=\frac{\partial p}{\partial x_i}(\overline{a}')=\frac{\partial f}{\partial x_i}(\overline{a})+\frac{\partial f}{\partial x_n}(\overline{a})\frac{\partial h}{\partial x_i}(\overline{a}').$$ On the other hand, differentiating the identity $g(x_1,\dots,x_{n-1},h(x_1,\dots,x_{n-1}))=\alpha$ you get $$0=\frac{\partial g}{\partial x_i}(\overline{a})+\frac{\partial g}{\partial x_n}(\overline{a})\frac{\partial h}{\partial x_i}(\overline{a}').$$ So $\frac{\partial h}{\partial x_i}(\overline{a}')=-\frac{\frac{\partial g}{\partial x_i}(\overline{a})}{\frac{\partial g}{\partial x_n}(\overline{a})}$, and substituting this in the equation with $f$ you get $$0=\frac{\partial f}{\partial x_i}(\overline{a})-\frac{\partial f}{\partial x_n}(\overline{a})\frac{\frac{\partial g}{\partial x_i}(\overline{a})}{\frac{\partial g}{\partial x_n}(\overline{a})},$$ that is, $$\frac{\partial f}{\partial x_i}(\overline{a})=\frac{\frac{\partial f}{\partial x_n}(\overline{a})}{\frac{\partial g}{\partial x_n}(\overline{a})}\frac{\partial g}{\partial x_i}(\overline{a})$$ for every $i=1,\ldots,n-1$. Take $\lambda=\frac{\frac{\partial f}{\partial x_n}(\overline{a})}{\frac{\partial g}{\partial x_n}(\overline{a})}$; the remaining component $i=n$ holds trivially, since $\frac{\partial f}{\partial x_n}(\overline{a})=\lambda\frac{\partial g}{\partial x_n}(\overline{a})$ by the very choice of $\lambda$, and so $\nabla f(\overline{a})=\lambda\nabla g(\overline{a})$.
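As a final sanity check of the closed-form multiplier, take the hypothetical instance $f(x,y)=x+y$, $g(x,y)=x^2+y^2$, $\alpha=1$, $\overline{a}=(1/\sqrt{2},1/\sqrt{2})$ (my own choice, not from the problem). With $x_n=y$, the value $\lambda=\frac{\partial f/\partial y}{\partial g/\partial y}$ should make $\frac{\partial f}{\partial x}=\lambda\frac{\partial g}{\partial x}$ hold at $\overline{a}$:

```python
import math

# Hypothetical instance (not from the problem): f(x, y) = x + y,
# g(x, y) = x^2 + y^2, alpha = 1, extremum a = (1/sqrt(2), 1/sqrt(2)).
# Here the "last" variable x_n is y.
ax = ay = 1.0 / math.sqrt(2.0)
fx, fy = 1.0, 1.0            # gradient of f is (1, 1) everywhere
gx, gy = 2.0 * ax, 2.0 * ay  # gradient of g is (2x, 2y) at a

lam = fy / gy                # lambda = (df/dx_n) / (dg/dx_n)
residual = fx - lam * gx     # should vanish: df/dx_1 = lambda * dg/dx_1
print(lam, abs(residual) < 1e-12)
```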