I think I'm confused about some elementary concepts.
I'm studying the book "Calculus of Several Variables" by Lang. I realize the book is not completely rigorous, but the following left me questioning some things while I was trying to understand the author's argument about Lagrange multipliers.
See the screenshots below. The author argues that the gradient vector is the unique normal (orthogonal) vector to a surface at a point because it is normal to every differentiable curve lying in that surface.
But what if your "surface" were a line in $\mathbb{R}^3$, say the x-axis? Then you could have two linearly independent vectors normal to the "surface" (e.g. along the y and z axes). But I realize the x-axis cannot be put in the form $g(x, y, z) = c$.
So what's going on here? Does a set of points satisfying a constraint such as $g(x, y, z) = c$ always have a unique normal direction at every point, unlike a line in $\mathbb{R}^3$ (in the sense that the vector is orthogonal to every curve in the set passing through that point)? Can this be proven?
Is a line not a "surface" in $\mathbb{R}^3$?
I hope what I'm saying makes sense.


The word "surface" in this context is defined in the first sentence of your screenshot. It is a result of differential geometry that such a set of points locally has exactly the $2$-dimensional shape that you have in mind when you hear the word "surface". It cannot be a line. See this question: in your case, since the target space is $\mathbb{R}$, $\operatorname{grad} f(X)$ being non-zero is equivalent to $\operatorname{d}_X f$ being surjective, hence the preimage is locally a $3-1=2$-dimensional manifold.
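If it helps, here is a symbolic sanity check of the orthogonality argument with my own example surface (the unit sphere, not from the book): the gradient of $g$ is orthogonal to the velocity of a curve lying in the level set $g = 1$.

```python
import sympy as sp

x, y, z, t = sp.symbols('x y z t')

# Example surface (mine, not from the book): the unit sphere g = 1,
# where g(x, y, z) = x^2 + y^2 + z^2.
g = x**2 + y**2 + z**2
grad_g = [sp.diff(g, v) for v in (x, y, z)]

# A differentiable curve lying entirely in the level set g = 1.
curve = [sp.cos(t), sp.sin(t), sp.Integer(0)]
velocity = [sp.diff(c, t) for c in curve]

# Evaluate grad g along the curve and dot it with the curve's velocity.
subs = dict(zip((x, y, z), curve))
dot = sum(gi.subs(subs) * vi for gi, vi in zip(grad_g, velocity))
print(sp.simplify(dot))  # 0: the gradient is orthogonal to the curve
```

The same computation works for any smooth curve in the level set, which is exactly the property the book uses.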
Note also that the gradient cannot be uniquely characterized by the property you mention: there are infinitely many vectors normal to a surface at a given point (namely, all scalar multiples of the gradient).
I am quite surprised by your sentence "I realize the book is not completely rigorous". What do you mean? What is not rigorous?
Edit: A linear form is what you call a linear transformation in the special case where the codomain has dimension $1$. Say the domain has dimension $n$. By the rank-nullity theorem, the kernel has dimension either $n$ (if the form is trivial) or $n-1$. There is only one direction orthogonal to an $(n-1)$-dimensional subspace, and for a non-trivial linear form there is a unique vector $v_0$ such that the form is the inner product $v \mapsto \left< v, v_0 \right>$. Its entries are exactly those of the $1\times n$ matrix that represents the linear map.
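A minimal numerical sketch of this representation, with arbitrary example values and the standard inner product on $\mathbb{R}^3$:

```python
import numpy as np

# A non-trivial linear form on R^3, given as a 1x3 matrix A
# (arbitrary example entries).
A = np.array([[2.0, -1.0, 3.0]])

# The unique vector v0 with A @ v == <v, v0> for every v: its entries
# are exactly those of the 1xn matrix A.
v0 = A.ravel()

# Rank-nullity: rank(A) = 1, so the kernel has dimension 3 - 1 = 2.
rank = np.linalg.matrix_rank(A)
print(rank)  # 1

# An orthonormal basis of the kernel via the SVD: the right singular
# vectors beyond the rank span the null space of A.
_, _, Vt = np.linalg.svd(A)
kernel_basis = Vt[rank:]
print(kernel_basis.shape[0])              # 2: the kernel is a plane
print(np.allclose(kernel_basis @ v0, 0))  # True: the kernel is orthogonal to v0
```

This is the linear-algebra fact behind the gradient: the kernel is an $(n-1)$-dimensional subspace, and $v_0$ spans the one direction orthogonal to it.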
In your case, the linear form is the differential $\operatorname{d}_p f$ of a smooth function at a given point $p$; provided that $\operatorname{d}_p f$ is surjective (hence has an $(n-1)$-dimensional kernel), this unique vector is what you call $\operatorname{grad} f(p)$.
By smoothness, $\operatorname{d}f$ stays surjective in a neighbourhood of $p$, so the gradient remains non-zero there. In such a neighbourhood, $f$ is close to a linear map and its level sets $f^{-1}(c)$ look like $(n-1)$-dimensional affine subspaces, in the sense that they are affine subspaces up to a smooth change of variables.
This is known as the submersion theorem - see for instance these lecture notes, Theorem 1.
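This picture can be checked symbolically in a small case (my own example, not from the notes): for $f(x, y) = x^2 + y^2$, the differential at $p = (1, 0)$ is surjective, and near $p$ the level set $f = 1$ is the graph of a smooth function of $y$, hence a $1$-dimensional "surface" in $\mathbb{R}^2$.

```python
import sympy as sp

x, y = sp.symbols('x y')

# Example (mine): f(x, y) = x^2 + y^2, with level set f = 1 (the unit circle).
f = x**2 + y**2

# At p = (1, 0) the differential d_p f = (2, 0) is non-zero, hence surjective
# onto R, so p is a regular point of f.
p = {x: 1, y: 0}
dpf = [sp.diff(f, v).subs(p) for v in (x, y)]
print(dpf)  # [2, 0]

# Since df/dx != 0 at p, the level set is locally a graph x = h(y):
# solve f = 1 for x and pick the branch passing through p.
branches = sp.solve(sp.Eq(f, 1), x)
branch = [s for s in branches if s.subs(y, 0) == 1][0]
print(branch)  # sqrt(1 - y**2): a smooth graph near p, i.e. a 1-manifold
```

This is exactly the submersion theorem in miniature: surjectivity of the differential lets you solve the constraint for one variable in terms of the others.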