Let $E$ be a vector bundle over a connected manifold $M$ and let $\nabla$ be a connection on $E$, i.e. a map $\nabla: \Gamma^{\infty}(M,E) \to \Omega^1(M) \otimes \Gamma^{\infty}(M,E)$. It can be extended to act on the whole of $\Omega(M,E):=\Omega^{\bullet}(M) \otimes \Gamma^{\infty}(M,E)$. Although $\nabla$ is not $C^{\infty}(M)$-linear, its square $\nabla^2$ (understood as the composition $\nabla \circ \nabla$ of the original connection with the extended one) is $C^{\infty}(M)$-linear. A bundle $E$ is called flat if $\nabla^2=0$. I would like to understand why
Flatness is equivalent to the condition that the transition functions of $E$ are constant.
I know that this was discussed already, e.g. here; however, the explanation there is rather sketchy, and I would like to see a more detailed argument (and also understand the converse implication).
You should be careful: as written, your claim is not quite correct. The zero-curvature condition $\nabla^2 = 0$ is the definition of flatness of the connection $\nabla$. We say the bundle $E$ is flat if it admits some flat connection; but there are many connections on any bundle, most of which will not be flat.
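Similarly, there are many bundle atlases for any given bundle, which will have different collections of transition functions; so the correct claim is the following:
A bundle $E$ admits a flat connection if and only if it admits a bundle atlas whose transition functions are constant.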
We can think of a bundle atlas as a collection $(U_i, e_{(i)})$ where the $U_i$ are open sets covering $M$ and each $e_{(i)}$ is a frame field for $E|_{U_i}$, i.e. a map $e_{(i)} : U_i \times \mathbb R^k \to E|_{U_i}$ such that for each $p$, $e_{(i)}(p) : v \mapsto e_{(i)}(p,v)$ is a linear isomorphism $\mathbb R^k \to E_p$. The transition functions are $$\phi_{ij}(p) = e_{(i)}(p)^{-1} \circ e_{(j)}(p) \in GL(k,\mathbb R).$$ The correspondence between flat atlases (those with constant transition functions) and flat connections is simple: the frame fields in the atlas should be parallel. If you're given a flat connection $\nabla$, then you can build a flat atlas like so:
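One way to spell this construction out, as a sketch: the choice of simply connected neighbourhoods and the shorthand $e\,w$ for the section $p \mapsto e(p, w(p))$ are assumptions introduced here for illustration. Around each $p \in M$ pick a simply connected open neighbourhood $U_p$ (a coordinate ball, say) and a basis of $E_p$, and extend the basis over $U_p$ by parallel transport along paths in $U_p$. Since $\nabla^2 = 0$, parallel transport along a path in $U_p$ depends only on its endpoints, so this yields a smooth frame field $e_{(p)}$ with $\nabla\big(e_{(p)} v\big) = 0$ for every constant $v \in \mathbb R^k$. For any frame $e$ with this property and any smooth $w : U \to \mathbb R^k$, the Leibniz rule gives $\nabla(e\,w) = e\,dw$, where $e\,dw$ abbreviates $\sum_a dw^a \otimes e\,\epsilon_a$ for the standard basis $\epsilon_a$ of $\mathbb R^k$. Now if $e_{(i)}$ and $e_{(j)}$ are two such parallel frames, then $e_{(j)} = e_{(i)}\phi_{ij}$ on $U_i \cap U_j$, and so
$$0 = \nabla\big(e_{(j)} v\big) = \nabla\big(e_{(i)}\,\phi_{ij} v\big) = e_{(i)}\,(d\phi_{ij})\,v \quad \text{for all constant } v \in \mathbb R^k.$$
Hence $d\phi_{ij} = 0$, i.e. $\phi_{ij}$ is constant on each connected component of $U_i \cap U_j$.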
On the other hand, suppose you are given a flat atlas $(U_i,e_{(i)}).$ On each $U_i$ we can define a flat connection $\nabla_{(i)}$ by declaring that $e_{(i)}$ is a parallel frame. The fact that $\phi_{ij}$ is constant is then exactly the condition you need to ensure that $\nabla_{(i)}$ and $\nabla_{(j)}$ agree on $U_i \cap U_j,$ since it tells you that the vectors comprising $e_{(j)}$ can be written as a constant linear combination of those comprising $e_{(i)}$. Thus we get a well-defined global flat connection by gluing all the $\nabla_{(i)}$ together.
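The agreement on overlaps can be checked by a short computation. The following is a sketch, where $t : U_i \cap U_j \to \mathbb R^k$ is an arbitrary smooth coefficient function (notation introduced here) and $e\,t$ denotes the section $p \mapsto e(p, t(p))$. Declaring $e_{(i)}$ parallel means $\nabla_{(i)}(e_{(i)} s) = e_{(i)}\,ds$ for every smooth $s : U_i \to \mathbb R^k$, and the definition of the transition functions gives $e_{(j)} = e_{(i)}\phi_{ij}$, so
$$\begin{aligned}
\nabla_{(i)}\big(e_{(j)} t\big) &= \nabla_{(i)}\big(e_{(i)}\phi_{ij} t\big) = e_{(i)}\,d(\phi_{ij} t) = e_{(i)}\big(d\phi_{ij}\,t + \phi_{ij}\,dt\big) = e_{(j)}\big(\phi_{ij}^{-1}d\phi_{ij}\,t + dt\big),\\
\nabla_{(j)}\big(e_{(j)} t\big) &= e_{(j)}\,dt.
\end{aligned}$$
So in the frame $e_{(j)}$ the difference $\nabla_{(i)} - \nabla_{(j)}$ is the matrix of one-forms $\phi_{ij}^{-1}d\phi_{ij}$, which vanishes exactly when $\phi_{ij}$ is locally constant. Each $\nabla_{(i)}$ is flat because in the frame $e_{(i)}$ it is just the exterior derivative $d$, and $d^2 = 0$.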