What does derivation $(df_p(v))(g)=v(g \circ f)(p)$ really mean?
It's said that $df_p(v) \in T_{f(p)}(N)$ is treated as a derivation which, when applied to $g$, gives the directional derivative of $g$. By the R.H.S., this derivative is defined by forming $g \circ f$ and "computing its directional derivative at $p$ using $v$". In other words, the directional derivative of $g$ on $N$ is defined as the directional derivative of $g \circ f$ on $M$.
I don't understand how $v(g \circ f)(p)$ is a directional derivative. Is it perhaps shorthand for
$$\lim_{h \rightarrow 0} \frac{(g \circ f)(x+vh, y+vh,\dots)-(g \circ f)(x,y,\dots)}{h},$$
where $p=(x,y,\dots)$?
From your comments, it seems to me that you are confusing the standard definition of a tangent vector in differential geometry with the isomorphism $T_pV\cong V$ for a vector space $V$.
Def. (tangent vector) Let $M$ be a smooth manifold and $C^\infty(M)$ the space of smooth functions on $M$. A linear functional $v$ on $C^\infty(M)$ is called a tangent vector at the point $p\in M$ if $$v(fg) = v(f)g(p)+f(p)v(g)$$ for all $f,g\in C^\infty(M)$.
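From this definition it follows that $v(g\circ f)$ should be interpreted as the linear functional $v\colon C^\infty(M) \to \mathbb R$ evaluated at the function $g\circ f\in C^\infty(M)$. The result is a real number; the point $p$ is already built into $v$, so the extra $(p)$ in your formula only records the base point.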
In the special case when the manifold is a (finite-dimensional) vector space $V$, there is an isomorphism $V\to T_pV$ that sends a vector $v\in V$ to the linear functional $f\mapsto \frac d{dt}f(p + tv)|_{t=0}$, which is precisely the directional derivative $D_vf(p)$. It is easy to check that this satisfies the Leibniz rule.
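To spell out that check, here is a short sketch using only the ordinary product rule for functions of one real variable $t$, with $f,g$ smooth functions on $V$ as in the definition above:
$$\begin{aligned}
D_v(fg)(p) &= \frac{d}{dt}\Big[f(p+tv)\,g(p+tv)\Big]_{t=0} \\
&= \Big(\frac{d}{dt}f(p+tv)\Big|_{t=0}\Big)\,g(p) + f(p)\,\Big(\frac{d}{dt}g(p+tv)\Big|_{t=0}\Big) \\
&= D_vf(p)\,g(p) + f(p)\,D_vg(p).
\end{aligned}$$
So $D_v(\cdot)(p)$ is a tangent vector at $p$ in the sense of the definition.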
Finally, for a smooth map $f\colon M\to N$, we can define a linear map $df_p\colon T_pM\to T_{f(p)}N$ in the following way: if $v\in T_pM$, then $df_p(v)$ should be a linear functional on $C^\infty(N)$, so for a smooth function $g\colon N\to \mathbb R$, we define $$(df_p(v))(g) = v(g\circ f).$$
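To connect this with directional derivatives, here is a sketch in the vector space case, assuming $M=\mathbb R^m$, $N=\mathbb R^n$ and identifying $v\in\mathbb R^m$ with $D_v(\cdot)(p)$ as above. By the chain rule,
$$(df_p(v))(g) = \frac{d}{dt}\,g\big(f(p+tv)\big)\Big|_{t=0} = Dg(f(p))\,Df(p)\,v = D_{Df(p)v}\,g\,(f(p)),$$
so $df_p(v)$ is the directional derivative at $f(p)$ in the direction $Df(p)v$. As an illustration (the particular $f$, $g$, $p$ are my own choices, not from the question), take $f(x,y)=(x^2,xy)$, $g(a,b)=a+b$ and $p=(1,2)$, so that $f(p)=(1,2)$. Then $(g\circ f)(x,y)=x^2+xy$ and
$$\begin{aligned}
v(g\circ f) &= \frac{d}{dt}\Big[(1+tv_1)^2+(1+tv_1)(2+tv_2)\Big]_{t=0} = 4v_1+v_2, \\
Df(p)\,v &= \begin{pmatrix} 2 & 0 \\ 2 & 1 \end{pmatrix}\begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} 2v_1 \\ 2v_1+v_2 \end{pmatrix}, \\
D_{Df(p)v}\,g\,(f(p)) &= 1\cdot 2v_1 + 1\cdot(2v_1+v_2) = 4v_1+v_2.
\end{aligned}$$
Both computations agree, as the definition says they should.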