Guillemin and Pollack give a definition of the preimage orientation that I find quite confusing (see below).
I don't understand the part starting from the last display. Namely:
- How exactly does the last display follow?
- Why does $T_xS$ contain $\ker df_x$ and how does it follow from this that $df_x$ restricted to $N_x(S;X)$ is injective (I believe that's what is claimed)?
- At which point does this proof use the fact that every vector from $N_x(S; X)$ is orthogonal to every vector from $T_x S$?

It's not important that $N_x(S;X)$ is orthogonal to $T_xS$. It's only important that they are in direct sum. But you may find it useful for the sake of visualization/concreteness.
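The reason that $T_xS$ contains the kernel of $df_x$ is that $T_xS$ is the full preimage of $T_zZ$ under $df_x$, and $T_zZ$ contains the zero vector. Intuitively, a curve in $X$ that $f$ maps identically to $z$ stays inside $S$, so its tangent vector at $x$ lies in both $\ker df_x$ and $T_xS$.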
Since the kernel of $df_x$ is contained in $T_xS$ and $T_xS$ is in direct sum with $N_x$, we get $\ker df_x \cap N_x \subseteq T_xS \cap N_x = \{0\}$, so the restriction of $df_x$ to $N_x$ is injective.
The last display consists of two assertions. One is that every element of $T_zY$ is the sum of an element of $df_xN_x$ and an element of $T_zZ$. The second is that the intersection of $T_zZ$ and $df_xN_x$ is just the zero vector. These two assertions follow from the previous equation, the fact that $T_xS$ is exactly the set of vectors mapped into $T_zZ$ by $df_x$, and the fact that $T_xS$ is in direct sum with $N_x$.
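
Here is a sketch of how those two assertions can be checked. I am assuming that the "previous equation" is the transversality condition $df_x T_xX + T_zZ = T_zY$, and that $T_xS = (df_x)^{-1}(T_zZ)$, as in Guillemin and Pollack. The names $w, v, n, s, u$ are just labels I introduce for the argument.

$$
\begin{aligned}
&\textbf{Sum.}\ \text{Let } w \in T_zY. \text{ By transversality, } w = df_x(v) + u \text{ with } v \in T_xX,\ u \in T_zZ.\\
&\text{Write } v = n + s \text{ with } n \in N_x,\ s \in T_xS. \text{ Then}\\
&\qquad w = df_x(n) + \bigl(df_x(s) + u\bigr), \qquad df_x(n) \in df_xN_x,\quad df_x(s) + u \in T_zZ.\\[4pt]
&\textbf{Intersection.}\ \text{Let } n \in N_x \text{ with } df_x(n) \in T_zZ. \text{ Then } n \in (df_x)^{-1}(T_zZ) = T_xS,\\
&\text{so } n \in N_x \cap T_xS = \{0\}, \text{ hence } df_x(n) = 0.
\end{aligned}
$$

Note that orthogonality is never used here, only $T_xX = N_x \oplus T_xS$.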