This question is related to these two questions of mine: Intuition or motivation for the definition of an hypersurface. What are we actually trying to define? and Understanding this very generic divergence theorem where the open set have border $C^k$
The definition of a hypersurface requires an open set $U_0$ around each point $x_0$ of the surface $S$. This open set $U_0$ must be diffeomorphic to something, and the intersection of $U_0$ with the surface (in this case a 'circle') must have image $0$.
My book then says that the vector $\nabla \phi(x)$ is perpendicular to $S$ at $x$. This is the part I don't understand.
In my drawing I drew a cube which is diffeomorphic to the sphere $U_0$ via a function $\phi$ such that the point $\phi(x_0) = 0$ lies on the cube. Below the cube I've drawn the points of the cube whose height is $0$, so this square contains $\phi(x_0)$.
The user timur asked me to imagine the image under $X = \phi^{-1}$ of the square, that is, of the points whose height is $0$. So it's $\phi^{-1}(\{(t_1,t_2,t_3) | t_3 = 0\})$. He asked me to imagine this inverse image when I move only $t_1$ and leave $t_2$ fixed. Here's what is confusing me: I can imagine one possible parametrization that goes from the square to the circle, where one side of the square (horizontal) represents $r$, the radius, and the other side of the square (vertical) represents $\theta$, the angle, so that I have a polar parametrization. If I fix $r$ and vary $\theta$, I get a smaller circle inside the circle, and the derivative of this path will indeed be perpendicular to the boundary. Now if I fix $\theta$ and vary $r$, I get a line, and its derivative will not be perpendicular to the boundary, I guess. Also, there can be lots of other parametrizations; how do I know they will be orthogonal?

In general, assume you have a differentiable map $\phi: \mathbb{R}^n \rightarrow \mathbb{R}$, and a submanifold $S \subset \mathbb{R}^n$ defined as the inverse image of a regular value $t_0$ of $\phi$, $S=\phi^{-1}(t_0)$. Then you want to prove that $\nabla\phi(x_0)\bot T_{x_0}S$, i.e., that $\nabla\phi(x_0)$ is orthogonal to the velocity vector at $x_0$ of any curve in $S$ through $x_0$ (hence to the tangent space to $S$ at $x_0$, which consists of exactly these velocity vectors).
Just take any differentiable curve $\gamma: (-1,1)\rightarrow S$ with $\gamma(0)=x_0$. Since $\phi$ is constant, equal to $t_0$, when restricted to $S$, we have:
$(\phi \circ \gamma )(t)=t_0, \forall t$.
Now take derivatives at $t=0$, applying the Chain Rule to get:
$0=\phi'(\gamma(0))\cdot \gamma'(0)=\nabla \phi(x_0)\cdot \gamma'(0)$.
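So the gradient of $\phi$ at any point $x_0 \in S$ is orthogonal to $S$ at $x_0$. The last equality holds because the derivative $\phi'(x_0)$ is the $1\times n$ row matrix of partial derivatives of $\phi$, so its matrix product with the column vector $\gamma'(0)$ coincides with the Euclidean inner product $\nabla \phi(x_0)\cdot \gamma'(0)$. Note that the argument uses only that $\gamma$ stays inside $S$; it does not depend on which parametrization of $S$ the curve comes from.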
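
If it helps to see this numerically, here is a small sanity check (a sketch, not part of the proof). The choices are mine and purely illustrative: I take $\phi(x,y,z)=x^2+y^2+z^2$, so that the unit sphere is the level set $\phi=1$, pick an arbitrary point `x0` on it, and approximate $\gamma'(0)$ by central differences. The names `along_latitude`, `along_meridian` and `radial` are hypothetical helpers, not anything from your book.

```python
import math

def grad_phi(p):
    """Gradient of phi(x, y, z) = x^2 + y^2 + z^2, computed by hand."""
    x, y, z = p
    return (2 * x, 2 * y, 2 * z)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def velocity(curve, h=1e-6):
    """Central-difference approximation of curve'(0)."""
    fwd, bwd = curve(h), curve(-h)
    return tuple((f - b) / (2 * h) for f, b in zip(fwd, bwd))

# Arbitrary point on the unit sphere, given in spherical angles (assumed values).
A, B = 0.3, 1.1
x0 = (math.sin(B) * math.cos(A), math.sin(B) * math.sin(A), math.cos(B))

def along_latitude(t):
    """Curve inside the sphere: move the angle A, keep B fixed."""
    return (math.sin(B) * math.cos(A + t), math.sin(B) * math.sin(A + t), math.cos(B))

def along_meridian(t):
    """Another curve inside the sphere: move B, keep A fixed."""
    return (math.sin(B + t) * math.cos(A), math.sin(B + t) * math.sin(A), math.cos(B + t))

def radial(t):
    """A path through x0 that leaves the sphere (the radius changes)."""
    return tuple((1 + t) * c for c in x0)

g = grad_phi(x0)
dot_latitude = dot(g, velocity(along_latitude))
dot_meridian = dot(g, velocity(along_meridian))
dot_radial = dot(g, velocity(radial))

print("latitude curve:", dot_latitude)
print("meridian curve:", dot_meridian)
print("radial path:   ", dot_radial)
```

The two curves that stay on the sphere give a dot product that vanishes up to discretization error, whichever of the two coordinate directions is used. The radial path does not stay in $S$, so $\phi\circ\gamma$ is not constant along it and the argument above simply does not apply to it; there the dot product is nonzero. This is the situation of "fix $\theta$ and vary $r$" in the question.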