Let $(M,g)$ be a Lorentzian 4-dimensional manifold, in other words, $g$ has signature $(-,+,+,+)$. If $U$ is a geodesically convex set, one defines $\sigma : U\times U\to \mathbb{R}$ in the following way: since $U$ is geodesically convex, pick $x$, then there is a unique geodesic $\gamma_{x,y} : [a,b]\to U$ connecting it to $y\in U$. In that case,
$$\sigma(x,y)=\dfrac{1}{2}\int_{a}^b g(\gamma_{x,y}(t))(\gamma_{x,y}'(t),\gamma_{x,y}'(t))dt.$$
Now I want to differentiate this $\sigma$ rigorously with respect to the two parameters, namely, as a function $C^\infty(M\times M)$, in order to yield a two-point vector field.
In Physics works there is a rather non-rigorous way, taking about infinitesimal variations $\delta x^\mu$ of the curve. It is clumsy, it is not rigorous, and I really don't like it.
I want one rigorous way to differentiate this. My main problem is that the dependence on $x,y$ is implicit in $\gamma_{x,y}$.
The result in the end will be: let $\phi^\mu$ be coordinate functions on $U$. Since this is a function on $M\times M$ we have to make a difference between the first factor and the second. Adopting the admitedely bad notation of using the first half of the greek alphabet to refer to the first factor and the second half to the second factor, differentiating along $\phi^\alpha$, we have
$$g^{\alpha\beta}\nabla_\alpha \sigma(x,y)= (b-a)v^\beta$$
Where $v^\beta$ are the components of the tangent vector to $\gamma_{x,y}$ in the coordinate basis. This is terribly odd, I'm differentiating a map, and end up with dependence on tangent vector to a curve and also to the interval of parametrization of the curve.
I need some explanation on how all this can be put on rigorous grounds to make some sense. What is going on here? How does one differentiate this two-point scalar correctly?
I believe the way uses calculus of variations, but I'm unsure how.
The terrible oddness
$\def\R\{\mathbb{R}}\def\pd#1#2{\frac{\partial #1}{\partial #2}}$To see why the given formula makes sense, I think it's best to start with the Euclidean case to get a feel for what's going on. (You could do exactly the same thing with Minkowski space, but since I'm going for intuition here I'm going to stick to the most familiar.)
Thus we suppose for now that $U = M = \R^n$ with the standard metric, so that we can explicitly write down the squared distance function: it's the scalar $\sigma : \R^n \times \R^n \to \R$ defined by $$\sigma(x,y) = \frac12 |x-y|^2 = \frac12 \sum_i (x^i - y^i)^2.$$
Since we have this explicit formula, we immediately know that $\sigma$ is differentiable and we can write down its partial derivatives $$\pd{\sigma}{x^i} = x^i-y^i,\;\pd{\sigma}{y^i}=y^i-x^i.$$
To connect this with the general formula you give, we need to change our perspective and think in terms of directional derivatives instead. Since $\sigma$ is defined on $M \times M$, the space of directions we can differentiate in at a point $(x,y) \in M\times M$ is the tangent space $$T_{(x,y)}(M\times M) = T_x M \times T_y M,$$ which for the Euclidean case is again simply $\R^n \times \R^n.$ If we take $(v,w)$ in this space (so we're simultaneously varying $x$ in the direction $v$ and $y$ in the direction $w$) then the corresponding directional derivative is (from the usual formula in terms of partial derivatives) $$D_{(x,y)}\sigma(v,w) = \pd\sigma{x^i}v^i+\pd\sigma{y^i}w^i= (x^i - y^i)(v^i - w^i) = (x-y)\cdot(v-w).$$ We can recover the formula you gave by choosing $w=0$ (i.e. only varying the point $x$) and noting that if we require unit-speed parametrization on $[a,b]$, the unit tangent vector is $(x-y)/(b-a)$. For intuition, think about what this implies about the point distance function $d_y(x) = \sqrt{2\sigma(x,y)}$: the chain rule makes this become $$D_xd_y (v) = \frac{x-y}{|x-y|}\cdot v,$$ i.e. the projection of $v$ on to the line through $x$ and $y$. If you draw a picture you should be able to get some geometric intuition for this. The fact that the formula for $D\sigma$ depends on the length of $(b-a)$ is just a consequence of the squaring: as you move $x$ further from $y$, all slopes of the parabola $\sigma$ are becoming steeper.
The general proof
The key is knowing the first variation formula. Here's the version for the energy functional $E(\gamma) = \frac12 \int |\dot \gamma|^2,$ which works for general semi-Riemannian manifolds:
Now, by your convexity assumption we know that geodesics joining pairs of points are unique; so if we produce a geodesic $\gamma_s$ joining $x_s$ to $y_s$ we can immediately conclude $\sigma(x_s,y_s) = E(\gamma_s).$ Thus, to determine $D_{(x,y)}\sigma(v,w)$ we can just compute the first variation of a family of geodesics$^1$ $\gamma_s$ such that $\gamma_0$ is the geodesic from $x$ to $y$, $V(a) = v$ and $V(b) = w$. It turns out this information is all we need to compute the derivative - using the formula I quoted above we get (noting $\ddot \gamma_0 = 0$ since it's a geodesic) $$D_{(x,y)}\sigma(v,w) = g(w, \dot \gamma_0(b)) - g(v, \dot \gamma_0(a)).$$ In the case $w=0$, choosing $v = \partial_\beta$ we can write this in your index notation as $$\nabla_\beta \sigma = -g_{\beta \alpha} \dot \gamma_0^\alpha(a).$$ We've lost the scale $b-a$ here, which I believe is due to the (in general incorrect) conflation of the squared length $\frac12 \left(\int_a^b |\dot \gamma|\right)^2$ and the energy $\frac12 \int_a^b |\dot \gamma|^2$. For geodesics (which have constant $|\dot \gamma|$) these differ exactly by the multiplicative constant $b-a$. Alternatively you can just fix $[a,b] = [0,1]$ and everything is fine.