Theorem 8.5
If $F : N \rightarrow M$ and $G : M \rightarrow P$ are smooth maps of manifolds and $p \in N$, then
$$(G \circ F)_{*, p} = G_{*, F(p)} \circ F_{*, p}$$
Proof
Let $X_p \in T_p N$ and $f$ be a smooth function at $G(F(p))$ in $P$. Then:
$$((G \circ F)_* X_p)f = X_p (f \circ G \circ F)$$
$$((G_* \circ F_*) X_p)f = (G_* (F_* X_p))f = (F_* X_p)(f \circ G) = X_p (f \circ G \circ F)$$
I just don't understand this:
As $X_p$ is a tangent vector, why do we write it before the functions (and not after)? What I mean is, should it rather be:
$$(f \circ G \circ F) X_p$$ for example? The same for our function $f$, shouldn't it be
$$f((G \circ F)_* X_p)$$ for example?
It seems like a simple proof, but I just can't understand the notation here, as it's strange to see the argument first and then the function.
If $T_pN$ is defined as the set of equivalence classes of curves through $p$, then the pushforward $F_{*,p}: T_pN \to T_{F(p)}M$ is given by composition: if $X_p = [\gamma]$ (where $[\gamma]$ denotes the equivalence class containing $\gamma$), then $$ F_{*, p} X_p = [F\circ \gamma].$$ Note that $F\circ \gamma$ is a curve through $F(p)$, so $[F\circ \gamma]\in T_{F(p)}M$.
With this definition, the chain rule is just a consequence of the associativity of function composition: \begin{align} (G\circ F)_{*,p} X_p &= (G\circ F)_{*,p} [\gamma] \\ &= [(G\circ F)\circ \gamma] \\ &= [G\circ (F\circ \gamma)] \\ &= G_{*, F(p)} [F\circ \gamma] \\ &= G_{*, F(p)} F_{*,p}[\gamma] \\ &=G_{*, F(p)} F_{*,p} X_p. \end{align}
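In Euclidean space (where the identity chart can be used), a class $[\gamma]$ can be represented by any curve in it and compared with other classes through the velocity $\gamma'(0)$. The Python sketch below assumes $N = M = \mathbb R^2$, $P = \mathbb R$ and the hypothetical maps `F` and `G`; the helpers `velocity` and `pushforward_curve` are illustrative names, and the velocity is only approximated by central differences. It pushes a curve forward by composition and checks that both sides of the chain rule give the same velocity.

```python
import math

H = 1e-5  # step for the central-difference approximation (a choice of this sketch)


def velocity(curve, h=H):
    """Approximate curve'(0) by central differences; a curve maps a float t to a tuple."""
    plus, minus = curve(h), curve(-h)
    return tuple((a - b) / (2 * h) for a, b in zip(plus, minus))


def pushforward_curve(F, curve):
    """F_*[curve] = [F o curve]: push the representative curve forward by composing with F."""
    return lambda t: F(curve(t))


# Hypothetical maps with N = M = R^2 and P = R^1 (points are tuples).
def F(x):
    return (x[0] * x[1], x[0] + math.sin(x[1]))


def G(y):
    return (y[0] ** 2 - y[1],)


def G_after_F(x):
    return G(F(x))


p = (1.0, 2.0)
gamma = lambda t: (p[0] + 3.0 * t, p[1] - t)  # a curve through p with velocity (3, -1)

lhs = velocity(pushforward_curve(G_after_F, gamma))                # (G o F)_*[gamma]
rhs = velocity(pushforward_curve(G, pushforward_curve(F, gamma)))  # G_*(F_*[gamma])
print(lhs, rhs)
```

Both sides end up evaluating $G(F(\gamma(t)))$, which is the whole content of the curve-based proof: the chain rule reduces to associativity of composition.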
In the text, $T_pN$ is defined differently, namely as a space of derivations. Starting from the curve definition, for each $v=[\gamma] \in T_pN$ one can define a linear map $D_v : C^\infty(N) \to \mathbb R$ by
$$ D_v f : = \frac{d}{dt}\bigg|_{t=0} (f\circ \gamma)(t). $$
There is a space $\mathscr D_pN$ of derivations at $p$: each $X\in \mathscr D_pN$ is a linear map $X : C^\infty (N) \to \mathbb R$ which satisfies the product rule $$X(fg) = f(p) Xg + g(p) Xf.$$ It turns out that the map $v = [\gamma] \mapsto D_v$ is a one-to-one correspondence between $T_pN$ and $\mathscr D_pN$.
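The map $v = [\gamma]\mapsto D_v$ can be mimicked with a higher-order function. In the sketch below, `derivation_from_curve` is a hypothetical helper that returns a function taking $f$ and returning a number, with the exact derivative replaced by a central difference; $N = \mathbb R^2$ and the sample functions are assumptions for illustration. Note that the call is written `D(f)`, with the derivation first and the function as its argument, which is exactly the notation $Xf$. The sketch checks the product rule and that two curves with the same velocity at $p$ give the same derivation.

```python
import math

H = 1e-5  # central-difference step (a choice of this sketch)


def derivation_from_curve(curve, h=H):
    """Return D_v for v = [curve]: f |-> d/dt|_{t=0} f(curve(t)), approximated numerically."""
    def D(f):
        return (f(curve(h)) - f(curve(-h))) / (2 * h)
    return D


p = (1.0, 2.0)
gamma = lambda t: (p[0] + 3.0 * t, p[1] - t)
# Another curve through p with the same velocity (3, -1), so [delta] = [gamma].
delta = lambda t: (p[0] + 3.0 * t + t ** 2, p[1] - t + math.sin(t) ** 2)

f = lambda x: x[0] ** 2 * x[1]
g = lambda x: math.exp(x[0] - x[1])
fg = lambda x: f(x) * g(x)

D = derivation_from_curve(gamma)
print(D(fg), f(p) * D(g) + g(p) * D(f))       # product rule: X(fg) = f(p)Xg + g(p)Xf
print(D(f), derivation_from_curve(delta)(f))  # same class, same derivation
```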
Under this correspondence $T_pN \leftrightarrow \mathscr D_pN$, let's see how the pushforward $w = F_{*,p}v$ acts on functions. Let $v= [\gamma]$. Then $$ w = F_{*,p}[ \gamma] = [F\circ\gamma],$$ and thus, for each $g\in C^\infty (M)$,
\begin{align} D_w g &= \frac{d}{dt}\bigg|_{t=0} (g \circ (F\circ \gamma)) (t) \\ &=\frac{d}{dt}\bigg|_{t=0}((g \circ F)\circ \gamma) (t)\\ &= D_v (g\circ F). \end{align}
That is, the pushforward $F_{*,p} : \mathscr D_p N \to \mathscr D_{F(p)} M$ is defined, for each $X\in \mathscr D_pN$ and $g \in C^\infty(M)$, by $$ (F_{*,p} X)g = X( g\circ F).$$
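Treating derivations as functions that take functions, the rule $(F_{*,p}X)g = X(g\circ F)$ becomes a one-line higher-order function. The sketch below (hypothetical helpers `derivation_from_curve` and `pushforward`, sample maps on $N = M = \mathbb R^2$, $P = \mathbb R$, derivatives approximated by central differences) checks the identity from Tu's proof, $((G\circ F)_*X)f = (G_*(F_*X))f$, and that the curve computation $D_{[F\circ\gamma]}\,g$ agrees with $(F_*D_{[\gamma]})\,g$.

```python
import math

H = 1e-5  # central-difference step (a choice of this sketch)


def derivation_from_curve(curve, h=H):
    """The derivation D_[curve]: f |-> d/dt|_{t=0} f(curve(t)), approximated numerically."""
    return lambda f: (f(curve(h)) - f(curve(-h))) / (2 * h)


def pushforward(F, X):
    """(F_* X) g = X(g o F): the pushed-forward derivation feeds g o F to X."""
    return lambda g: X(lambda x: g(F(x)))


# Hypothetical maps with N = M = R^2 and P = R^1 (points are tuples).
F = lambda x: (x[0] * x[1], x[0] + math.sin(x[1]))
G = lambda y: (y[0] ** 2 - y[1],)
G_after_F = lambda x: G(F(x))

p = (1.0, 2.0)
gamma = lambda t: (p[0] + 3.0 * t, p[1] - t)
X = derivation_from_curve(gamma)

f = lambda z: math.exp(z[0])  # a test function on P
g = lambda y: y[0] * y[1]     # a test function on M

lhs = pushforward(G_after_F, X)(f)          # ((G o F)_* X) f = X(f o G o F)
rhs = pushforward(G, pushforward(F, X))(f)  # (G_*(F_* X)) f
print(lhs, rhs)
# D_w g with w = [F o gamma], computed from the curve, versus (F_* X) g = X(g o F).
print(derivation_from_curve(lambda t: F(gamma(t)))(g), pushforward(F, X)(g))
```

In every call the derivation comes first and the function is its argument, so `X(lambda x: g(F(x)))` is a literal transcription of $X(g\circ F)$.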
To sum up, when you think of $X \in T_pN$ as an equivalence class of curves, $Xf$ really means (1) sending $X$ to the derivation $D_X$, and (2) setting $Xf := D_X f$.
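In Tu's book, $\mathscr D_pN$ IS the definition of the tangent space (they simply denote it by $T_pN$), hence the confusion. There, a tangent vector $X_p$ is itself a linear map on functions, so $X_p f$ simply means $X_p$ evaluated at $f$.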
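A one-dimensional computation makes the notation concrete. Take $N = M = \mathbb R$, the map $F(x) = x^2$ (chosen only for illustration), and $X_p = \frac{d}{dx}\big|_p \in T_p\mathbb R$. Since $X_p$ is an operator on functions, it is written in front of the function it is applied to, just as in $\frac{d}{dx}f$. For $g \in C^\infty(\mathbb R)$, with $y$ the coordinate on $M$:
\begin{align} (F_{*,p} X_p)\, g &= X_p (g\circ F) \\ &= \frac{d}{dx}\bigg|_{x=p} g(x^2) \\ &= 2p\, g'(p^2) \\ &= \left(2p \frac{d}{dy}\bigg|_{y=p^2}\right) g, \end{align}
so $F_{*,p}\left(\frac{d}{dx}\big|_p\right) = 2p \frac{d}{dy}\big|_{p^2}$.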