A special case of the Lagrange multiplier theorem may be stated as: Let $S, T \subset \mathbb{R}^{n}$ be open. Let $f: S \to \mathbb{R}$ be differentiable on $S$ and $g: T \to \mathbb{R}$ differentiable on $T$ such that $S \cap g^{-1}\{ 0 \}$ is not empty. If $f(x)$ is an extremum of $f$ for some $x \in S \cap g^{-1}\{ 0 \}$ and $\nabla g (x) \neq 0,$ then there is a $\lambda \in \mathbb{R}$ such that $$\nabla f (x) = \lambda \nabla g(x).$$
However, since $f(x)$ is an extremum and $S \cap g^{-1} \{ 0 \} \subset S,$ should we not have $\nabla f(x) = 0$?
The usual Fermat theorem gives you $Df(x)=0$ simply because you can test the function along every direction around $x$. Here you can move only along directions that belong to the tangent space at $x$ to $S \cap g^{-1}\{ 0 \}$, so you cannot conclude that $Df(x)$ must vanish. So to speak, in constrained optimization you lack too many directions to deduce that $f$ must be "flat" at an extremum: you can only conclude that the gradient of $f$ must be parallel to the gradient of the constraint, and this is precisely the content of Lagrange's multiplier rule.
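A concrete example (with $f$ and $g$ chosen purely for illustration) may help: take $f(x,y) = x + y$ and $g(x,y) = x^2 + y^2 - 1$, so the constraint set $g^{-1}\{0\}$ is the unit circle. The multiplier condition $\nabla f(x,y) = \lambda \nabla g(x,y)$ reads
$$1 = 2\lambda x, \qquad 1 = 2\lambda y,$$
and combined with $x^2 + y^2 = 1$ this forces $x = y = \pm \tfrac{1}{\sqrt{2}}$ with $\lambda = \pm \tfrac{1}{\sqrt{2}}$. At the constrained maximum $\left(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right)$ we have $\nabla f = (1,1) \neq 0$: the gradient of $f$ does not vanish there, it is merely parallel to $\nabla g = (\sqrt{2}, \sqrt{2})$. Indeed, $f$ keeps increasing in the direction $(1,1)$, but that direction points off the circle, so it is not available for testing the extremum.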