In a calculus course at our university, the students learn how to find stationary points of a function $f$ subject to a constraint. For the necessary condition $\nabla \mathcal{L} = 0$ (where $\mathcal{L}$ is the Lagrangian of the system), they get the intuitive explanation that at a local extreme point the constraint curve $g(x, y) = 0$ and the corresponding level curve of $f$ have to be tangent, which leads to the well-known equation $\nabla f = \lambda \nabla g$.
So recently they looked at $f(x, y) = xy^2$ on the unit circle $x^2 + y^2 = 1$. With the necessary condition, one easily calculates the stationary points
$$(\pm 1, 0), \quad (\pm 1 / \sqrt{3}, \pm \sqrt{2/3}).$$
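For reference, here is a sketch of that computation, writing $g(x, y) = x^2 + y^2 - 1$ for the constraint (the choice of $g$ is my notation for this example):

$$\begin{aligned}
\nabla f = \lambda \nabla g \quad &\Longleftrightarrow \quad y^2 = 2\lambda x, \qquad 2xy = 2\lambda y, \\
y = 0 \quad &\Longrightarrow \quad x = \pm 1, \quad \lambda = 0, \\
y \neq 0 \quad &\Longrightarrow \quad \lambda = x, \quad y^2 = 2x^2, \quad 3x^2 = 1.
\end{aligned}$$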
Then one visualizes the situation and identifies the type of each point (extreme point or saddle point). This is shown in the picture below. A student in this course then asked me: "But at the points $(\pm 1, 0)$, there are no level lines which are tangent to the unit circle?" (Note that in the picture one can see very well that the level lines are tangent to the circle at the other four points.)
After thinking a bit about this, I could not give him a satisfying answer. The level lines corresponding to the $0$-level are $x = 0$ and $y = 0$. The line $x = 0$ would be tangent to the circle if we shifted it one unit to the right (respectively, to the left), but we can't do this.
My idea is that the problem may come from the fact that $x^2 + y^2 - 1 = 0$ does not define $y$ as an implicit function of $x$ near $(\pm 1, 0)$ by the implicit function theorem. Looking in my notes, I further found that $\nabla g(x, y) \neq 0$ has to hold in order to apply the necessary condition, but this should be the case here. Nevertheless, $(1, 0)$ is a local minimum and $(-1, 0)$ is a local maximum.
So my questions are:
- Am I right with my idea about the problem? If not, what's going on here?
- Is there a possibility to explain the situation geometrically?

The intuition "both level curves must be tangent to each other" fails in the case where $\nabla f(p) = 0$, where $f$ is the function you want to study on the constraint set $g^{-1}(c)$. This is exactly the situation in your example, since $\nabla f = (y^2, 2xy)$ vanishes at $(\pm 1, 0)$. There are a few reasons for the failure. First of all, the level set $f^{-1}(c')$ through $p$ might not even be a "nice" level set near $p$ (i.e., it may not be a manifold, or may not have the correct dimension, etc.). But let's explore further what exactly happens.
Let's recall one of the proofs that if $p$ is a point of max/min of $f|_{g^{-1}(c)}$ (and $\nabla g(p) \neq 0$), then $\nabla f(p) = \lambda \nabla g(p)$ for some $\lambda$ and see what is happening.
For that, pick a curve $\gamma$ that is contained in $g^{-1}(c)$ and passes through $p$ at time $0$. Since $p$ is a maximum (wlog), we have that $(f \circ \gamma)'(0)=0$. By the chain rule, this tells us that $\langle \nabla f(p), \gamma'(0)\rangle=0$. Since this holds for an arbitrary curve on $g^{-1}(c)$, we have that $\nabla f(p)$ is orthogonal to the tangent space of $g^{-1}(c)$ at $p$. But we also know that $\nabla g(p)$ is orthogonal to the same space, and the orthogonal complement of this $(n-1)$-dimensional space is a line. So $\nabla f(p)$ lies on the line spanned by $\nabla g(p)$, and is therefore a scalar multiple of it. That multiple can very well be zero, in which case $\nabla f(p) = 0$. Note that all we know about $\nabla f(p)$ is that it is orthogonal to the tangent space of the level set of $g$.
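To see this argument at work in the example from the question, one can take the parametrization $\gamma(t) = (\cos t, \sin t)$ of the unit circle (one possible choice of curve, assumed here for illustration) and compute with $f(x, y) = xy^2$:

$$\begin{aligned}
(f \circ \gamma)(t) &= \cos t \, \sin^2 t, \\
(f \circ \gamma)'(t) &= \langle \nabla f(\gamma(t)), \gamma'(t) \rangle = \langle (\sin^2 t, \, 2\cos t \sin t), \, (-\sin t, \cos t) \rangle = \sin t \, (2\cos^2 t - \sin^2 t).
\end{aligned}$$

The derivative vanishes when $\sin t = 0$, which gives $(\pm 1, 0)$, or when $\tan^2 t = 2$, which gives $(\pm 1/\sqrt{3}, \pm\sqrt{2/3})$. These are the same six points that the Lagrange condition produces.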
In the case that $\nabla f(p)$ is not zero, we then have a well-defined smooth level set of $f$ through $p$ whose tangent space at $p$ is orthogonal to $\nabla f(p)$. Then we have two smooth level sets which are "tangent", meaning that they intersect at $p$ and have the same tangent space there. But this is a posteriori: it relies on the fact that an $(n-1)$-plane in $\mathbb{R}^n$ is uniquely determined by the line orthogonal to it, and that $\nabla f(p)$ and $\nabla g(p)$ determine the same line.
To summarize: when we solve $\nabla f = \lambda \nabla g$, we are actually searching for the points where $\nabla f$ is orthogonal to the tangent space of $g^{-1}(c)$, since those are the candidates for our minima/maxima. (Assuming, of course, no "boundary" conditions on our constraint.) Only if $\nabla f$ is non-zero at such a point can we guarantee that the two level curves are tangent there.
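Checking this against the example from the question, with $f(x, y) = xy^2$ and the constraint written as $g(x, y) = x^2 + y^2 - 1 = 0$ (a sketch by direct computation, shown for one of the four symmetric points):

$$\begin{aligned}
\nabla f(x, y) &= (y^2, \, 2xy), & \nabla g(x, y) &= (2x, \, 2y), \\
\nabla f(\pm 1, 0) &= (0, 0) = 0 \cdot \nabla g(\pm 1, 0), & \nabla g(\pm 1, 0) &= (\pm 2, 0), \\
\nabla f\left(\tfrac{1}{\sqrt 3}, \sqrt{\tfrac{2}{3}}\right) &= \tfrac{2}{3}(1, \sqrt 2) = \tfrac{1}{\sqrt 3} \, \nabla g\left(\tfrac{1}{\sqrt 3}, \sqrt{\tfrac{2}{3}}\right), & \nabla g\left(\tfrac{1}{\sqrt 3}, \sqrt{\tfrac{2}{3}}\right) &= \tfrac{2}{\sqrt 3}(1, \sqrt 2).
\end{aligned}$$

So at $(\pm 1, 0)$ the condition holds with $\lambda = 0$, and the level set $f^{-1}(0) = \{x = 0\} \cup \{y = 0\}$ simply crosses the circle along the $x$-axis there. At the other four points $\nabla f \neq 0$, and the level curves of $f$ are genuinely tangent to the circle.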