This Wikipedia article explains how the Riemann curvature tensor measures the failure of a tangent vector to return to itself under parallel transport around an infinitesimally small loop. The article takes $X,Y,Z\in\Gamma(TM)$ and lets $tX$ and $tY$ denote the integral curves of $X$ and $Y$ respectively. It lets $$\tau_{tX}:T_pM\to T_pM \ \ \ \ \ \ \ \text{ and } \ \ \ \ \ \ \ \tau_{tY}:T_pM\to T_pM$$ denote the parallel transport maps along the respective integral curves.
I am hoping someone can explain the computation giving this equality:
$$\left.\frac{d}{ds}\frac{d}{dt}\right|_{t,s=0}\tau_{sX}^{-1}\tau_{tY}^{-1}\tau_{sX}\tau_{tY}Z=\left(\nabla_X\nabla_Y-\nabla_Y\nabla_X-\nabla_{[X,Y]}\right)Z.$$
I understand that if $P_{\gamma(t)}$ denotes parallel transport along a curve $\gamma$ from $\gamma(0)$ to $\gamma(t)$, then $$\nabla_{\gamma^\prime(0)}Z=\left.\frac{d}{dt}\right|_{t=0}P_{\gamma(t)}^{-1}Z_{\gamma(t)},$$ but there is some crazy chain rule stuff going on in the curvature expression that I can't seem to get right.
This question is old but since the above answer was not very helpful for me, I figured I should post the one I got.
I denote $\tau_{tX}$ by $\tau_t^X$. Note that $\left(\tau_t^X\right)^{-1} = \tau_{-t}^X$. Also, denote $g(t_1,t_2,t_3,t_4) = \tau^X_{t_1}\tau_{t_2}^Y\tau_{t_3}^X\tau_{t_4}^YZ$. Then:
\begin{align*}
\left.\frac{d}{dt}\right\vert_0\left.\frac{d}{ds}\right\vert_0 \left(\tau^X_t\right)^{-1}\left(\tau_s^Y\right)^{-1}\tau_t^X\tau_s^YZ
=& \left.\frac{d}{dt}\right\vert_0\left.\frac{d}{ds}\right\vert_0 g(-t,-s,t,s)\\
=& \left.\frac{d}{dt}\right\vert_0 \left[-\frac{\partial g}{\partial t_2}(-t,0,t,0) + \frac{\partial g}{\partial t_4}(-t,0,t,0)\right]\\
=& \frac{\partial^2 g}{\partial t_1\partial t_2}(0) - \frac{\partial^2 g}{\partial t_3\partial t_2}(0) - \frac{\partial^2 g}{\partial t_1\partial t_4}(0) + \frac{\partial^2 g}{\partial t_3\partial t_4}(0)\\
=& \left.\frac{d}{dt}\right\vert_0\left.\frac{d}{ds}\right\vert_0 \Big[\tau^X_{t}\tau_{s}^Y\tau_{0}^X\tau_{0}^YZ - \tau^X_{0}\tau_{s}^Y\tau_{t}^X\tau_{0}^YZ\\
&- \tau^X_{t}\tau_{0}^Y\tau_{0}^X\tau_{s}^YZ +\tau^X_{0}\tau_{0}^Y\tau_{t}^X\tau_{s}^YZ \Big]\\
=& \left.\frac{d}{dt}\right\vert_0\left.\frac{d}{ds}\right\vert_0 \Big[\tau^X_{-t}\tau_{-s}^Y\tau_{0}^X\tau_{0}^YZ - \tau^X_{0}\tau_{-s}^Y\tau_{-t}^X\tau_{0}^YZ\\
&- \tau^X_{-t}\tau_{0}^Y\tau_{0}^X\tau_{-s}^YZ +\tau^X_{0}\tau_{0}^Y\tau_{-t}^X\tau_{-s}^YZ \Big](-1)^2\\
=& \nabla_X\nabla_Y Z-\nabla_Y\nabla_X Z-\nabla_X\nabla_Y Z+\nabla_X\nabla_Y Z\\
=& \nabla_X\nabla_Y Z-\nabla_Y\nabla_X Z = R(X,Y)Z,
\end{align*}
where we recall that in the Wikipedia article, it is assumed that the vector fields commute, i.e. $[X,Y]=0$. In the second-to-last step, each term is evaluated using $\tau_0^X=\tau_0^Y=\mathrm{id}$ together with $\left.\frac{d}{dt}\right\vert_0\tau^X_{-t}W=\nabla_XW$ for a vector field $W$ (and likewise for $Y$), so that for example $\left.\frac{d}{dt}\right\vert_0\left.\frac{d}{ds}\right\vert_0\tau^X_{-t}\tau^Y_{-s}Z=\nabla_X\nabla_YZ$ and $\left.\frac{d}{dt}\right\vert_0\left.\frac{d}{ds}\right\vert_0\tau^Y_{-s}\tau^X_{-t}Z=\nabla_Y\nabla_XZ$.
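As a sanity check, the identity can be tested numerically on one simple example. The sketch below is my own illustration, not something from the Wikipedia article, and it rests on several assumptions: the manifold is the unit sphere with coordinates $(\theta,\varphi)$, the commuting fields are $X=\partial_\theta$ and $Y=\partial_\varphi$, vectors are stored as components in the orthonormal frame $e_\theta=\partial_\theta$, $e_\varphi=\frac{1}{\sin\theta}\partial_\varphi$, and the composition $\tau^X_{-t}\tau^Y_{-s}\tau^X_t\tau^Y_s$ is read as carrying $Z(p)$ around the coordinate rectangle (forward along $Y$, forward along $X$, back along $Y$, back along $X$). All function names are hypothetical. The mixed derivative is approximated by a central finite difference and compared with $R(X,Y)Z=\langle Y,Z\rangle X-\langle X,Z\rangle Y$, which is the curvature of the unit sphere.

```python
import math


def transport_theta(v, theta_start, theta_end):
    """Parallel transport along a meridian (phi fixed).

    A meridian is a geodesic, so components in the orthonormal frame
    (e_theta, e_phi) do not change. The endpoints are accepted only to
    keep the call sites readable.
    """
    return v


def transport_phi(v, theta, dphi):
    """Parallel transport along the latitude circle at `theta` through angle dphi.

    In the orthonormal frame the components rotate by the angle cos(theta) * dphi.
    """
    a, b = v
    c = math.cos(theta) * dphi
    return (a * math.cos(c) + b * math.sin(c),
            -a * math.sin(c) + b * math.cos(c))


def loop_transport(z, theta0, t, s):
    """Carry z around the rectangle: +Y by s, +X by t, -Y by s, -X by t."""
    v = transport_phi(z, theta0, s)
    v = transport_theta(v, theta0, theta0 + t)
    v = transport_phi(v, theta0 + t, -s)
    v = transport_theta(v, theta0 + t, theta0)
    return v


def mixed_derivative(z, theta0, h=1e-3):
    """Central finite-difference estimate of d/dt d/ds at t = s = 0."""
    pp = loop_transport(z, theta0, h, h)
    pm = loop_transport(z, theta0, h, -h)
    mp = loop_transport(z, theta0, -h, h)
    mm = loop_transport(z, theta0, -h, -h)
    return tuple((pp[i] - pm[i] - mp[i] + mm[i]) / (4 * h * h) for i in range(2))


def curvature(z, theta0):
    """R(X, Y)Z = <Y, Z> X - <X, Z> Y on the unit sphere, in the orthonormal frame.

    Here X = e_theta and Y = sin(theta0) * e_phi.
    """
    a, b = z
    return (math.sin(theta0) * b, -math.sin(theta0) * a)


if __name__ == "__main__":
    z, theta0 = (0.3, -0.7), 1.0
    print("finite difference:", mixed_derivative(z, theta0))
    print("R(X, Y)Z:         ", curvature(z, theta0))
```

The two printed vectors should agree up to the finite-difference error, which is of order $h^2$. Because the transport here is written in closed form, this only checks the sign and ordering conventions on the sphere; it is not a proof of the general identity.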