The following is from Bergmann's Introduction to the Theory of Relativity. An image of the original text is included in case I screwed up the transcription.
A definition of parallel displacement is actually possible in a comparatively simple way. Of course the value of the displaced vector depends on the original vector itself and on the direction of displacement. Let us first consider a Euclidean space, where we can introduce a Cartesian coordinate system. With respect to such a coordinate system, the law of parallel displacement takes the form
$$ \delta a_{i}\equiv a_{i,k}\delta x_{k}=0\text{, (5.76)} $$
where $\delta x_{k}$ represents the infinitesimal displacement. Let us now introduce an arbitrary coordinate transformation (6.46) $\left[\xi^{i}=f^{i}\left(x_{1},\dots,x_{n}\right),i=1\dots n\right]$. The vector components with respect to that new coordinate system may be denoted by a prime. Then we have
\begin{align*} a_{i}= & \frac{\partial\xi^{r}}{\partial x_{i}}a_{r}^{\prime},\\ \frac{\partial a_{i}}{\partial x_{k}}= & \frac{\partial\xi^{s}}{\partial x_{k}}\frac{\partial}{\partial\xi^{s}}\left(\frac{\partial\xi^{r}}{\partial x_{i}}a_{r}^{\prime}\right)\\ = & \frac{\partial\xi^{s}}{\partial x_{k}}\frac{\partial\xi^{r}}{\partial x_{i}}\frac{\partial a_{r}^{\prime}}{\partial\xi^{s}}+\frac{\partial\xi^{s}}{\partial x_{k}}\frac{\partial x_{l}}{\partial\xi^{s}}\frac{\partial^{2}\xi^{r}}{\partial x_{i}\partial x_{l}}a_{r}^{\prime}. \end{align*}
The $\delta x_{k}$ transforms according to eq. (5.48) $\left[dx_{k}=\frac{\partial x_{k}}{\partial\xi^{i}}d\xi^{i}\right]$, and we obtain
$$ 0=a_{i,k}\delta x_{k}=\left\{ \frac{\partial\xi^{r}}{\partial x_{i}}\frac{\partial a_{r}^{\prime}}{\partial\xi^{s}}+\frac{\partial^{2}\xi^{r}}{\partial x_{i}\partial x_{l}}\frac{\partial x_{l}}{\partial\xi^{s}}a_{r}^{\prime}\right\} \delta\xi^{s}\text{. (5.76a)} $$
$\frac{\partial a_{r}^{\prime}}{\partial\xi^{s}}\delta\xi^{s}$ is the actual increment of $a_{r}^{\prime}$ as a result of the displacement, and shall be denoted $\delta a_{r}^{\prime}.$ Multiplying the right-hand side of eq. (5.76a) by $\frac{\partial x_{i}}{\partial\xi^{t}},$ we get finally
$$ \delta a_{t}^{\prime}=-\frac{\partial x_{i}}{\partial\xi^{t}}\frac{\partial x_{l}}{\partial\xi^{s}}\frac{\partial^{2}\xi^{r}}{\partial x_{i}\partial x_{l}}a_{r}^{\prime}\delta\xi^{s}\text{. (5.77)} $$
The final expression looks like a simple application of the chain rule which would contract to give
$$0=\frac{\partial x_{i}}{\partial\xi^{t}}\frac{\partial x_{l}}{\partial\xi^{s}}\frac{\partial^{2}\xi^{r}}{\partial x_{i}\partial x_{l}}=\frac{\partial^{2}\xi^{r}}{\partial\xi^{t}\partial\xi^{s}}.$$
I typically explain away this kind of situation by using a parallel transported tangent plane and claim the $\xi^r$ being differentiated is a coordinate in the tangent plane, whereas the differentiating coordinates are in the manifold. But that's pretty hand-wavy, and doesn't have any merit in this circumstance since I can't parallel transport a tangent plane until parallel transportation has been defined. And that is the purpose of the entire derivation.


Why do you think $\frac{\partial x_{i}}{\partial\xi^{t}}\frac{\partial x_{l}}{\partial\xi^{s}}\frac{\partial^{2}\xi^{r}}{\partial x_{i}\partial x_{l}}=\frac{\partial^{2}\xi^{r}}{\partial\xi^{t}\partial\xi^{s}}$? What you can say is
\begin{align} \frac{\partial x_{i}}{\partial\xi^{t}}\frac{\partial x_{l}}{\partial\xi^{s}}\frac{\partial^{2}\xi^{r}}{\partial x_{i}\partial x_{l}}&=\frac{\partial x_i}{\partial \xi^t}\frac{\partial x_l}{\partial\xi^s}\frac{\partial^2\xi^r}{\partial x_l\partial x_i}\tag{Schwarz's rule}\\ &=\frac{\partial x_i}{\partial \xi^t}\frac{\partial}{\partial \xi^s}\left(\frac{\partial\xi^r}{\partial x_i}\right). \end{align}
But now what? You definitely cannot commute the $\frac{\partial}{\partial \xi^s}$ with $\frac{\partial}{\partial x_i}$. The two orderings, $\frac{\partial}{\partial \xi^s}\left(\frac{\partial\xi^r}{\partial x_i}\right)$ and $\frac{\partial}{\partial x_i}\left(\frac{\partial\xi^r}{\partial \xi^s}\right)$, are completely different expressions, and there's no reason to expect equality in general. Schwarz's rule for symmetry of mixed partials only holds when you're working in a single coordinate system.
Even more precisely, you should think of being given a function $f:M\to\Bbb{R}$ on a smooth manifold, and two charts $(U,\alpha=(x_1,\dots, x_n))$ and $(V,\beta=(\xi^1,\dots, \xi^n))$. The symbol $\frac{\partial f}{\partial x_i}$ stands for the function $[\partial_i(f\circ\alpha^{-1})]\circ\alpha:U\to\Bbb{R}$, i.e., you consider the chart representative $f\circ\alpha^{-1}$, take its $i$-th partial derivative (which is then a function $\alpha[U]\subset\Bbb{R}^n\to\Bbb{R}$), and then pull it back to a function on $U\subset M$ by composing with $\alpha$. The reason Schwarz's rule doesn't apply when you mix charts is that you don't just have one function! There are two different functions involved in the game: $f\circ \alpha^{-1}$ (the local representative with respect to one chart) and $f\circ\beta^{-1}$ (the local representative with respect to the other chart). Even more explicitly:
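Spelled out from the definition above, on $U\cap V$ the two orders of differentiation read as follows (this is a reconstruction from that definition, with $\partial_i$ and $\partial_s$ denoting ordinary partial derivatives of functions on open subsets of $\Bbb{R}^n$):

- $\dfrac{\partial}{\partial \xi^s}\left(\dfrac{\partial f}{\partial x_i}\right)=\Big[\partial_s\Big(\big[\partial_i(f\circ\alpha^{-1})\big]\circ(\alpha\circ\beta^{-1})\Big)\Big]\circ\beta$
- $\dfrac{\partial}{\partial x_i}\left(\dfrac{\partial f}{\partial \xi^s}\right)=\Big[\partial_i\Big(\big[\partial_s(f\circ\beta^{-1})\big]\circ(\beta\circ\alpha^{-1})\Big)\Big]\circ\alpha$

In the first expression the inner function is $f\circ\alpha^{-1}$ and the transition map $\alpha\circ\beta^{-1}$ sits between the two derivatives; in the second the inner function is $f\circ\beta^{-1}$ and the transition map is $\beta\circ\alpha^{-1}$.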
So clearly, you're not just doing $\partial_i\partial_sg=\partial_s\partial_ig$ for a single function $g$. There are different functions involved, which is why Schwarz's rule doesn't work across different coordinate systems.
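For a concrete check, here is a small worked example using polar coordinates in the plane (my choice of illustration, not taken from Bergmann). Take $x_1=x$, $x_2=y$ and $\xi^1=\rho=\sqrt{x^2+y^2}$, $\xi^2=\theta$, so that $x=\rho\cos\theta$, $y=\rho\sin\theta$. Then
$$\frac{\partial x}{\partial\theta}=-y,\qquad \frac{\partial y}{\partial\theta}=x,\qquad \frac{\partial^2\rho}{\partial x^2}=\frac{y^2}{\rho^3},\qquad \frac{\partial^2\rho}{\partial x\,\partial y}=-\frac{xy}{\rho^3},\qquad \frac{\partial^2\rho}{\partial y^2}=\frac{x^2}{\rho^3}.$$
Choosing $r=1$ and $t=s=2$ in the contracted expression gives
$$\frac{\partial x_i}{\partial\theta}\frac{\partial x_l}{\partial\theta}\frac{\partial^2\rho}{\partial x_i\,\partial x_l}=\frac{(-y)^2y^2+2(-y)(x)(-xy)+x^2x^2}{\rho^3}=\frac{(x^2+y^2)^2}{\rho^3}=\rho,$$
whereas $\frac{\partial^2\rho}{\partial\theta\,\partial\theta}=0$, because $\rho$ and $\theta$ are independent coordinates. The two sides of the proposed identity differ, so the contraction cannot be a chain-rule identity.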