In many physics books I have read that a valid way to introduce the covariant derivative and the connection intuitively is the following (the example is from GR, but the same holds for gauge theories):
Let $TM$ be the tangent bundle over a manifold $M$. I define the "weird" derivative $\partial^{\mathfrak w}$ of a vector field $V$ (taking only the element of the tangent space, obtained by projecting onto the second component) at $\gamma(t_o)$ as $$\partial^{\mathfrak w}_WV=\lim_{h\to 0} \frac{V_{\gamma(t_o+h)}-V_{\gamma(t_o)}}{h}$$ where $W_{\gamma(t_o)}=\dot\gamma(t_o)$.
This derivative does not behave like a vector. If the limit is understood as a "limit of components", then the transition function (the Jacobian) loses its meaning. If instead it is understood as a difference of differential operators, independent of coordinates, then it does not make sense, because the two operators act on functions at different points of $M$.
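To make the component reading explicit, here is a short sketch. The chart coordinates $x^\mu$ and $x'^\mu$, and the shorthand $\dot x^\beta$ for the components of $\dot\gamma$, are notation I am assuming here. Along $\gamma$ the components transform as $$V'^\nu(\gamma(t))=\frac{\partial x'^\nu}{\partial x^\alpha}\bigg|_{\gamma(t)}V^\alpha(\gamma(t)),$$ so differentiating with respect to $t$ gives $$\frac{dV'^\nu}{dt}=\frac{\partial x'^\nu}{\partial x^\alpha}\frac{dV^\alpha}{dt}+\frac{\partial^2 x'^\nu}{\partial x^\beta\,\partial x^\alpha}\,\dot x^\beta V^\alpha .$$ The second term is what spoils the vector transformation law; it drops out, for instance, when the Jacobian is constant along the curve.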
But all of this happens only because I am leaving the fiber.
At this point these books usually say that you cannot add vectors at different points because they belong to different tangent spaces (fibers), so you introduce covariant differentiation. But we know that the tangent bundle is a vector bundle and thus admits a local trivialization, so the fibers are locally isomorphic to $\mathbb R^n$, and the two spaces are indeed isomorphic.
$\textbf{Question}$: I would say that I can define such a weird derivative in the sense of components in $\mathbb R^n$; I would simply be leaving the fiber, and thus I could no longer talk about vectors and changes of chart. What do you think? (By the way, in components this derivative would be the ordinary derivative $\partial_\mu$ that appears in the covariant derivative $\nabla_\mu V^\nu=\partial_\mu V^\nu+\Gamma_{\mu\alpha}^\nu V^\alpha$.)
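For concreteness, assuming $V^\nu$ are the components of $V$ in a chart and $W^\mu$ those of $\dot\gamma(t_o)$ (labels I am introducing here), the chain rule gives $$\partial^{\mathfrak w}_WV^\nu=\frac{d}{dt}V^\nu(\gamma(t))\bigg|_{t=t_o}=W^\mu\partial_\mu V^\nu ,$$ which is the first term of $$W^\mu\nabla_\mu V^\nu=W^\mu\partial_\mu V^\nu+\Gamma^\nu_{\mu\alpha}W^\mu V^\alpha .$$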