Let $E\overset{\pi}{\twoheadrightarrow}M$ be a fiber bundle, and let $\sigma$ be a smooth local section, i.e. $\pi\circ\sigma=\mathrm{id}$. For any $X\in T_{x}M$, if $E$ is trivial, then one can talk about the directional derivative $d\sigma(x)(X)$, where $d\sigma(x)$ is the tangent map $T_{x}M\rightarrow T_{\sigma(x)}E$. This is because there is an obvious way to parallel transport each element of $E$.
For example, let $\gamma(t)$ be a curve in $M$ such that $\gamma(0)=x$ and $\gamma^{\prime}(0)=X$. Then, since $E\overset{\pi}{\twoheadrightarrow}M$ is trivial, one has $$d\sigma(x)(X)=\frac{d}{dt}\Bigg|_{t=0}\sigma(\gamma(t))=\lim_{t\rightarrow 0}\frac{\sigma(\gamma(t))-\sigma(\gamma(0))}{t},$$
where $\sigma(\gamma(t))\in E_{\gamma(t)}$. It makes sense because $E_{\gamma(t)}$ is isomorphic to $E_{\gamma(0)}$ in a canonical way.
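To make the trivial case concrete, here is a small worked example. It assumes (purely for illustration) that $E=M\times F$ for some fiber $F$ and that the section is written as $\sigma(x)=(x,f(x))$ for a smooth map $f:M\rightarrow F$; the symbols $F$ and $f$ are not part of the original setup. Under that assumption,
$$d\sigma(x)(X)=\big(X,\;df(x)(X)\big)\in T_{x}M\oplus T_{f(x)}F,$$
so the difference quotient above computes the fiber component $df(x)(X)$, while the base component is simply $X$.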
In general, if $E$ is nontrivial, then one has to use the covariant differential $\nabla\sigma(x)$ to define the tangent map. The covariant derivative is usually defined by introducing the parallel transport:
For each curve $\gamma(t)$ in the manifold $M$, a parallel transport is a collection of diffeomorphisms $$\Gamma(\gamma)_{s}^{t}: E_{\gamma(s)}\rightarrow E_{\gamma(t)}$$ such that \begin{align} &1.\,\,\,\Gamma(\gamma)_{s}^{s}=\mathrm{Id}_{E_{\gamma(s)}} \\ &2.\,\,\,\Gamma(\gamma)_{\epsilon}^{t}\circ\Gamma(\gamma)_{s}^{\epsilon}=\Gamma(\gamma)_{s}^{t} \\ &3.\,\,\,\Gamma(\gamma)_{s}^{t}\text{ depends smoothly on }\gamma,\ s,\text{ and }t. \end{align}
Then, for a given curve $\gamma(t)$ in $M$ such that $\gamma(t)=x$ and $\gamma^{\prime}(t)=X$, one defines the covariant derivative $$\nabla\sigma(x)(X)=\nabla\sigma(x)(\gamma^{\prime}(t))\equiv\nabla_{X}\sigma(x)\equiv\frac{d}{d\epsilon}\Bigg|_{\epsilon=0}\Gamma(\gamma)_{t+\epsilon}^{t}\circ\sigma(\gamma(t+\epsilon)).$$
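The definition can be checked numerically in the simplest possible setting. The sketch below assumes a trivial line bundle over $\mathbb{R}$ with the curve $\gamma(t)=t$ and a hypothetical connection whose parallel transport is $\Gamma_{s}^{t}(v)=e^{-(t-s)}v$; the sample section $\sigma(q)=\sin q$ is also chosen only for illustration. For this choice the derivative in the definition should equal $\sigma'(t)+\sigma(t)$.

```python
import math

def transport(s, t, v):
    """Gamma_s^t for the assumed connection on the trivial line bundle over R."""
    return math.exp(-(t - s)) * v

def section(q):
    """A sample section sigma(q) = sin(q), chosen only for illustration."""
    return math.sin(q)

def covariant_derivative(t, eps=1e-5):
    """Central-difference approximation of
    d/d(eps) Gamma_{t+eps}^t sigma(gamma(t+eps)) at eps = 0."""
    forward = transport(t + eps, t, section(t + eps))
    backward = transport(t - eps, t, section(t - eps))
    return (forward - backward) / (2 * eps)

t = 0.7
numeric = covariant_derivative(t)
# Closed form for this toy connection: sigma'(t) + sigma(t).
exact = math.cos(t) + math.sin(t)

# Property 2 of the parallel transport: composing two transports
# along the curve agrees with the direct transport.
composed = transport(0.3, 1.1, transport(0.0, 0.3, 2.0))
direct = transport(0.0, 1.1, 2.0)
```

Both arguments of the difference quotient lie in the same fiber $E_{\gamma(t)}$, which is what makes the subtraction meaningful; the result is a vector tangent to that fiber.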
What has confused me recently is the following. Under the diffeomorphism $\Gamma(\gamma)_{t+\epsilon}^{t}: E_{\gamma(t+\epsilon)}\rightarrow E_{\gamma(t)}$, the value $\sigma(\gamma(t+\epsilon))\in E_{\gamma(t+\epsilon)}$ is mapped to an element $\varsigma_{\epsilon}(\gamma(t))\in E_{\gamma(t)}$, with $\varsigma_{0}(\gamma(t))=\sigma(\gamma(t))$. For convenience, I denote $$\frac{d}{d\epsilon}\Bigg|_{\epsilon=0}\varsigma_{\epsilon}(\gamma(t))=\xi(t)\in T_{\sigma(\gamma(t))}E.$$
Then, from the canonical projection $E\overset{\pi}{\twoheadrightarrow}M$, which locally gives $$\pi\circ\sigma=\pi\circ\varsigma_{\epsilon}=\mathrm{id},$$
one has $$d\pi(\xi(t))=\frac{d}{d\epsilon}\Bigg|_{\epsilon=0}\pi(\varsigma_{\epsilon}(\gamma(t)))=\frac{d}{d\epsilon}\Bigg|_{\epsilon=0}\gamma(t)=0.$$
In other words, it seems to me that the pushforward vanishes: $d\pi(\nabla\sigma(x)(X))=0$.
Or, if one views the covariant differential $\nabla\sigma(x)$ as a tangent map $T_{x}M\rightarrow T_{\sigma(x)}E$, then the above calculation shows that the covariant derivative actually maps the tangent vector $X\in T_{x}M$ to the vertical subspace $V_{\sigma(x)}E$, i.e. $$\nabla\sigma(x): T_{x}M\rightarrow V_{\sigma(x)}E.$$ Is that correct?
I found the same claim in several sources. For example, in these lecture notes, Chris Wendl also claims that the covariant derivative is the vertical part of the tangent map. So I believe that my understanding of the covariant derivative is correct.
In the following, I will prove the result for a vector bundle from a different perspective. From this Wikipedia page, given a connection $\nabla$ on a vector bundle $E$ over $M$, and $\gamma(t)$ a smooth curve in $M$, a section $\sigma$ of $E$ along $\gamma$ is called parallel if $$\nabla_{\dot{\gamma}(t)}\sigma=0. \tag{0}$$
This is supposed to be equivalent to the definition of $\Gamma(\gamma)_{s}^{t}$ of parallel transport in a vector bundle. I will use this equivalence to prove that for a generic section, its covariant derivative is indeed vertical.
Starting from equation (0), locally, in a neighborhood $U\subset M$ with local coordinates $\left\{q^{\mu}\right\}$, the parallel section $\sigma$ can be expressed as $\sigma|_{U}=\sigma^{i}e_{i}$, where $\left\{e_{i}\right\}_{i=1,\cdots,m}$ is a local frame. For convenience, I will denote $\sigma^{i}(\gamma(t))\equiv\sigma(t)^{i}$ and $q^{\mu}(\gamma(t))\equiv q^{\mu}(t)$. Denote the connection $1$-form of the vector bundle $E$ by $A_{i}^{j}=A_{i\mu}^{j}dq^{\mu}$. Then one has $$\nabla e_{i}=A_{i}^{j}e_{j}.$$
Using the Leibniz rule, one has $$\nabla_{\dot{\gamma}(t)}\sigma=\nabla_{\dot{\gamma}(t)}(\sigma^{i}e_{i})=\left(\frac{d\sigma^{i}}{dt}+\sigma^{j}\frac{dq^{\mu}}{dt}A_{j\mu}^{i}\right)e_{i}. \tag{1}$$
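For completeness, the intermediate steps behind (1) can be spelled out as follows, using only $\nabla e_{i}=A_{i}^{j}e_{j}$ and $A_{i}^{j}(\dot{\gamma})=A_{i\mu}^{j}\frac{dq^{\mu}}{dt}$, and relabeling $i\leftrightarrow j$ in the second term:
$$\nabla_{\dot{\gamma}(t)}(\sigma^{i}e_{i})=\frac{d\sigma^{i}}{dt}e_{i}+\sigma^{i}\nabla_{\dot{\gamma}(t)}e_{i}=\frac{d\sigma^{i}}{dt}e_{i}+\sigma^{i}\frac{dq^{\mu}}{dt}A_{i\mu}^{j}e_{j}=\left(\frac{d\sigma^{i}}{dt}+\sigma^{j}\frac{dq^{\mu}}{dt}A_{j\mu}^{i}\right)e_{i}.$$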
So the section $\sigma$ is parallel along $\gamma$ exactly when its components solve the differential equation $$\frac{d\sigma^{i}}{dt}+\sigma^{j}\frac{dq^{\mu}}{dt}A_{j\mu}^{i}=0. \tag{2}$$
For any prescribed initial value $\sigma^{i}(0)$, a unique solution exists by the existence and uniqueness theorem for linear ordinary differential equations.
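Equation (2) can also be integrated numerically. The following is a minimal sketch, assuming a trivial rank-2 bundle over $\mathbb{R}^{2}$ with a hypothetical connection $A_{1}=0$, $A_{2}=q^{1}J$ (where $J$ is the rotation generator) and the unit circle as the curve; none of these choices come from the setup above. It uses a classical fourth-order Runge-Kutta step written in plain Python.

```python
import math

def connection(q):
    """Hypothetical coefficients A[mu][i][j] = A^i_{j mu} at the point q."""
    q1, _ = q
    zero = [[0.0, 0.0], [0.0, 0.0]]
    a2 = [[0.0, -q1], [q1, 0.0]]  # q^1 times the rotation generator J
    return [zero, a2]

def curve(t):
    """The unit circle gamma(t) = (cos t, sin t)."""
    return (math.cos(t), math.sin(t))

def velocity(t):
    """Components dq^mu/dt of the tangent vector of the curve."""
    return (-math.sin(t), math.cos(t))

def rhs(t, sigma):
    """Equation (2) solved for the derivative:
    d(sigma^i)/dt = -sigma^j (dq^mu/dt) A^i_{j mu}."""
    A = connection(curve(t))
    v = velocity(t)
    return [
        -sum(sigma[j] * v[mu] * A[mu][i][j]
             for j in range(2) for mu in range(2))
        for i in range(2)
    ]

def parallel_transport(sigma0, t0, t1, steps=2000):
    """Integrate equation (2) from t0 to t1 with classical RK4 steps."""
    h = (t1 - t0) / steps
    t, s = t0, list(sigma0)
    for _ in range(steps):
        k1 = rhs(t, s)
        k2 = rhs(t + h / 2, [s[i] + h / 2 * k1[i] for i in range(2)])
        k3 = rhs(t + h / 2, [s[i] + h / 2 * k2[i] for i in range(2)])
        k4 = rhs(t + h, [s[i] + h * k3[i] for i in range(2)])
        s = [s[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i])
             for i in range(2)]
        t += h
    return s

# Transport the frame vector e_1 once around the circle.
transported = parallel_transport([1.0, 0.0], 0.0, 2 * math.pi)
norm = math.hypot(transported[0], transported[1])
```

For this particular connection, equation (2) reduces to a rotation with angular speed $\cos^{2}t$, so the total rotation angle over one loop is $\int_{0}^{2\pi}\cos^{2}t\,dt=\pi$ and the transported vector should come back close to $(-1,0)$. The nontrivial holonomy illustrates why the fibers cannot be identified canonically once the connection is curved.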
The above statement implies that the tangent space $T_{\sigma}E$ has a decomposition
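Assuming the intended decomposition is the usual splitting into vertical and horizontal parts, in the coordinates $(q^{\mu},\sigma^{i})$ on $E|_{U}$ it would read
$$T_{\sigma}E=V_{\sigma}E\oplus H_{\sigma}E,\qquad V_{\sigma}E=\ker d\pi=\mathrm{span}\left\{\frac{\partial}{\partial\sigma^{i}}\right\},\qquad H_{\sigma}E=\mathrm{span}\left\{\frac{\partial}{\partial q^{\mu}}-A_{j\mu}^{i}\sigma^{j}\frac{\partial}{\partial\sigma^{i}}\right\}.$$
The horizontal vectors here are exactly the velocities of curves $(q^{\mu}(t),\sigma^{i}(t))$ in $E$ that solve equation (2), since then $\dot{q}^{\mu}\partial_{q^{\mu}}+\dot{\sigma}^{i}\partial_{\sigma^{i}}=\dot{q}^{\mu}\left(\partial_{q^{\mu}}-A_{j\mu}^{i}\sigma^{j}\partial_{\sigma^{i}}\right)$.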
Then, finally, one has the following theorem:
The above theorem implies that the covariant differential, viewed as a tangent map, is indeed vertical on a vector bundle. It acts on a section by taking the ordinary differential and then eating up the horizontal component. Unsurprisingly, if the section is parallel (i.e. horizontal), its covariant derivative vanishes.