I am getting confused by the following statement: "$\nabla_{X_p}Y$ is the directional derivative of the vector field $Y$ along the tangent vector $X_p$."
When we speak of the directional derivative of the vector field $Y$ along a curve $\gamma$, that is $\nabla_{\dot{\gamma}(t)}Y$, we fix a point $\gamma(t)$ on the curve in the manifold and choose some other point $p$, where we have the tangent vector $Y_p$. We then parallel translate the tangent vector at $\gamma(t)$, say $X_{\gamma(t)}$, to the point $p$. Then we look at the change and define the covariant derivative accordingly.
My question is: in the case of the directional derivative along the curve, we moved the vector at $\gamma(t)$ to $Y$ along the curve, but what does the directional derivative along the tangent vector $X_p$ mean geometrically? I mean, along what are we moving our point?
When computing the derivative of a vector field, you want to somehow construct the difference quotient between $Y_p\in T_pM$ and $Y_{γ(t)}\in T_{γ(t)}M$, where $γ$ is a curve with $γ(0)=p$.
Now the tangent spaces at different points are not the same. They form a bundle over the manifold, which implies some continuity, but that does not allow one to identify one with the other in a unique way.
So what is needed is some map $A_t:T_{γ(t)}M\to T_pM$ such that the difference quotient $\frac{A_tY_{γ(t)}-Y_p}{t}$ makes sense, has a limit as $t\to 0$, and that limit depends, besides $Y$, only on $γ'(0)$.
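To connect this with the notation in the question, here is a sketch under the assumption that $\gamma(0)=p$ and $\gamma'(0)=X_p$, and that $A_t$ comes from a connection with Christoffel symbols $\Gamma^k_{ij}$ in local coordinates (these symbols are notation introduced here, not used above). The limit of the difference quotient is then the covariant derivative:

$$
\nabla_{X_p}Y \;=\; \lim_{t\to 0}\frac{A_tY_{\gamma(t)}-Y_p}{t},
\qquad
\left(\nabla_{X_p}Y\right)^k \;=\; X_p^i\,\partial_iY^k(p)\;+\;\Gamma^k_{ij}(p)\,X_p^i\,Y^j(p).
$$

The coordinate expression involves the curve only through $X_p=\gamma'(0)$, so any curve through $p$ with that initial velocity gives the same result. In that sense "along the tangent vector $X_p$" means "along any curve that leaves $p$ with velocity $X_p$".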
The question thus becomes how to construct such maps so that they are uniquely defined for every path $γ$ in some consistent way, independent of the coordinate system used in the construction.
If the maps $A_t$ are isometries with respect to some Riemannian metric on $M$ and, in some sense, $t\mapsto A_t$ realizes the "least torsion", one can call it "parallel transport along the path". By extension, non-metric variants are also called parallel transport, especially on general vector bundles that have no inherent relation to the metric on the manifold.
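As a hedged illustration in local coordinates, assume the transport is induced by a connection with Christoffel symbols $\Gamma^k_{ij}$ (notation introduced here for the sketch). A vector field $V$ along $\gamma$ is then called parallel if it solves the linear ODE

$$
\dot V^k(t)\;+\;\Gamma^k_{ij}\bigl(\gamma(t)\bigr)\,\dot\gamma^i(t)\,V^j(t)\;=\;0,
$$

and $A_t$ sends $v\in T_{\gamma(t)}M$ to $V(0)\in T_pM$, where $V$ is the parallel field with $V(t)=v$. In this language, the isometry condition says that $g(V,W)$ is constant along $\gamma$ for any two parallel fields $V,W$, and for the Levi-Civita connection "least torsion" means vanishing torsion, that is $\Gamma^k_{ij}=\Gamma^k_{ji}$.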