Let $P$ be a principal bundle, and $\omega$ a vector-valued 1-form on it. In standard textbooks, one defines the exterior covariant derivative of $\omega$ to be
$$D\omega(X_1, X_2) = d\omega(X_1^h, X_2^h),$$
where $X_1, X_2\in TP$, and $^h$ means taking the horizontal part of the vector.
Further, one can show that in Yang-Mills theory the field strength is the curvature two-form pulled back by a section:
$$\mathcal{F}=\sigma^*(D\omega)$$
I have zero intuition about why one takes the horizontal part in the definition of the exterior covariant derivative. Why would one even think of this in the first place? Why does this definition make sense?
In the seemingly straightforward (from a physics point of view) definition of the Yang-Mills field strength
$$\mathcal{F}=dA+A\wedge A$$
how could I tell that, to formulate this in the language of connections on fibre bundles, one would need that "horizontal part" in the definition?
The way I think about horizontality of such a form is as follows. If we have a principal $G$-bundle $P\to M$, we can take the curvature $F_A$ of a connection $A$ on $P$. But now we have information which lives on $P$, namely $F_A\in\Omega^2(P,\mathfrak{g})$, where $\mathfrak{g}$ is the Lie algebra of $G$. We want to have this information on the base manifold $M$. You should be able to find the following theorem in any textbook that discusses gauge theory: there exists an isomorphism between $\Omega^k(M,\text{Ad}(P))$ and $\Omega^k_\text{basic}(P,\mathfrak{g})$, where the basic forms are the Ad-equivariant horizontal $k$-forms on $P$.
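As a rough sketch of how that isomorphism works (the notation $\bar\alpha$, $\tilde v_i$ and the convention used for $\text{Ad}(P)$ are my own choices for illustration, not taken from a particular textbook): write $\text{Ad}(P)=P\times_G\mathfrak{g}$, with $[p,\xi]=[pg,\text{Ad}_{g^{-1}}\xi]$. Given $\alpha\in\Omega^k_\text{basic}(P,\mathfrak{g})$, define a form on $M$ by
$$\bar\alpha_x(v_1,\dots,v_k)=\big[p,\ \alpha_p(\tilde v_1,\dots,\tilde v_k)\big],\qquad p\in\pi^{-1}(x),\quad d\pi_p(\tilde v_i)=v_i.$$
Horizontality, meaning that $\alpha$ vanishes as soon as one argument is vertical, makes this independent of the lifts $\tilde v_i$, since two lifts of the same $v_i$ differ by a vertical vector. Equivariance, $R_g^*\alpha=\text{Ad}_{g^{-1}}\alpha$, makes it independent of the point $p$ chosen in the fibre:
$$\big[pg,\ \alpha_{pg}(dR_g\tilde v_1,\dots,dR_g\tilde v_k)\big]=\big[pg,\ \text{Ad}_{g^{-1}}\alpha_p(\tilde v_1,\dots,\tilde v_k)\big]=\big[p,\ \alpha_p(\tilde v_1,\dots,\tilde v_k)\big].$$
Note that "horizontal" for a form in this sense does not require a choice of connection; only the vertical directions, which are canonical, enter.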
So if we keep in mind that we actually want data on the base manifold $M$, it makes sense to consider horizontal (and equivariant) forms. But there is generally no canonical way of choosing a horizontal sub-bundle of $TP$, and so we are left with this "gauge freedom" of choosing our own horizontal distribution. Note that the Yang-Mills equation is in fact phrased in terms of such forms on the base manifold, because $M$ is where we would have a Riemannian metric - not $P$.
As an example, consider the trivial bundle $S^1\times S^1$. Then the way I think about a horizontal distribution on this total space, i.e. the torus, is as a foliation of $S^1\times S^1$, where the leaves of the foliation are the integral submanifolds of the horizontal distribution. Each of these leaves projects onto the base space $S^1$ by a local diffeomorphism (for the product connection each leaf is simply a copy $S^1\times\{g\}$ of the base), and the tangent space of each of these leaves is the horizontal distribution restricted to the leaf. If $\omega\in\Omega^1_\text{basic}(P,\mathfrak{g})$, then we can restrict this to a form $\iota^*\omega\in\Omega^1(L,\mathfrak{g})$ where $L$ denotes a leaf of the foliation. The fact that $\omega$ is $\text{Ad}$-equivariant implies that this restriction does not depend on the choice of leaf in the foliation, and the fact that $\omega$ is horizontal means we lose nothing by considering it as a $1$-form on this leaf, because any component of an input vector which is not tangent to the leaf will be killed off.
When I say that $\iota^*\omega$ does not depend on the choice of leaf, I mean the following. Suppose we are at a point $x\in S^1$, i.e. on the base space. If we have two points $p_1,p_2$ in the fibre $\pi^{-1}(x)$, then we can evaluate $\omega$ on the leaf through $p_1$ or the leaf through $p_2$, and these yield the same result. Thus, from the $1$-form on $P$, we have now obtained a $1$-form on $S^1$, which is where we actually wanted it.
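To make the torus picture concrete, here is a sketch in coordinates. The coordinates $(x,\theta)$, the function $a$, and the identification of the Lie algebra of $U(1)$ with $\mathbb{R}$ are assumptions I am making purely for illustration. Take $G=U(1)$ acting by shifts of $\theta$, and a connection form
$$A=d\theta+a(x)\,dx .$$
Its horizontal distribution is $\ker A=\text{span}\{\partial_x-a(x)\,\partial_\theta\}$, and the leaves are the curves
$$\theta(x)=\theta_0-\int_0^x a(s)\,ds .$$
A $1$-form $\omega$ that kills the vertical vector $\partial_\theta$ has the form $f(x,\theta)\,dx$, and since $\text{Ad}$ is trivial for an abelian group, equivariance just says that $f$ is invariant under shifts of $\theta$, so $\omega=f(x)\,dx$. Restricting to the leaf through $\theta_0$ and using $x$ as the coordinate along the leaf gives
$$\iota^*\omega=f(x)\,dx,$$
with no dependence on $\theta_0$. This is the statement that the result does not depend on the leaf. One can also check that $dA=a'(x)\,dx\wedge dx=0$ here, so this connection is flat, as it must be over a one-dimensional base.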
As a final note: a word of caution. The above picture helps (me, at least) with understanding how connections and horizontality make sense, but for a general connection there will be no integral submanifolds through each point - this is because integrability of the horizontal distribution is equivalent to the connection being flat.
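To tie this back to the formulas in the question, here is a short derivation sketch. It writes $\omega$ for the connection form, as in the question, and assumes the convention $d\omega(X,Y)=X\,\omega(Y)-Y\,\omega(X)-\omega([X,Y])$ (some textbooks include an extra factor of $\tfrac12$). Since $\omega$ vanishes on horizontal vector fields,
$$D\omega(X,Y)=d\omega(X^h,Y^h)=-\omega([X^h,Y^h]).$$
So the curvature is exactly the vertical part of the bracket of two horizontal fields, which by the Frobenius theorem is the obstruction to integrability of the horizontal distribution. Checking the cases of two horizontal, two vertical, and one horizontal and one vertical argument separately, one finds the structure equation
$$D\omega(X,Y)=d\omega(X,Y)+[\omega(X),\omega(Y)].$$
Pulling back by a section $\sigma$, with $A=\sigma^*\omega$ now denoting the gauge field on the base and $[A\wedge A](u,v)=2[A(u),A(v)]$, gives
$$\mathcal{F}=\sigma^*(D\omega)=dA+\tfrac12[A\wedge A],$$
which for a matrix group is the familiar $dA+A\wedge A$. In this sense, the "horizontal part" in the definition is what produces the quadratic term.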