Now we have several independent and identically distributed random variables following the uniform distribution on the interval [0, 1].They are denoted as $x_1, x_2, x_3, ..., x_m$ and $y_1, y_2, ..., y_n$ respectively.Let $$ D_{i,j} = \begin{cases} x_i-y_j, & x_i\ge y_j \\\\ 1+x_i-y_j, & x_i<y_j \end{cases} $$
So, in reality, these points are distributed on a circle with a length of 1 and can only move clockwise. $$ \begin{aligned} F_{D_{ij}}(d) &= P(D_{ij} \leq d) \\\\ &= P(D \leq d | x_i \geq y_j)P(x_i \geq y_j) + P(D \leq d | x_i < y_j)P(x_i < y_j) \\\\ & = P(x_i-y_j \leq d , x_i \geq y_j) + P(1+x_i-y_j \leq d,x_i < y_j) \\\\ & = \iint\limits_{0\leq x_i-y_j \leq d} f(x_i,y_j)\mathrm{d}x_i \mathrm{d}y_j +\iint\limits_{x_i-y_j \leq d-1,x_i-y_j<0} f(x_i,y_j)\mathrm{d}x_i \mathrm{d}y_j \\\\ & = \frac{1}{2}-\frac{(1-d)^2}{2}+\frac{d^2}{2} \\\\ & = d,\ \ \ 0\leq d\leq 1 \\\\ \end{aligned} $$ Therefore, $D_{ij}$ also follows a uniform distribution on the interval [0, 1]. However, unfortunately, $D_{ij}$, for all $i$ and $j$, are not independent. For example, suppose $D_{11} = x_1 - y_1$ and $D_{12} = x_1 - y_2$. Then $E(D_{11}D_{12}) = E((x_1 - y_1)(x_1 - y_2)) = E(x_1^2) - E(x_1y_1) - E(x_1y_2) + E(y_1y_2) = \frac{1}{12} \neq 0$.
So it's difficult to determine the distribution function of the random variable $Y = \min(D_{11}, D_{12}, ..., D_{1n}, D_{21}, D_{22}, ..., D_{2n}, ..., D_{m1}, D_{m2}, ..., D_{mn})$ because each element inside it is not independent.
For the above question, a friendly and highly intelligent guy provided a fantastic answer in this link. That is for Y=$min(D_{11}, D_{12}, ..., D_{1n}, D_{21}, D_{22}, ..., D_{2n}, ..., D_{m1}, D_{m2}, ..., D_{mn})$, $$ P(Y\ge d) = \tbinom{m+n-1}{n}^{-1}\sum_{k=1}^{min(m,n)}\tbinom{m}{k}\tbinom{n-1}{k-1}max(0,1-kd)^{m+n-1} $$ and $$ E(Y) = \frac{1}{mn}(1-\tbinom{m+n}{n}^{-1}) $$ After numerical experiments, it has been verified that this answer is essentially completely correct.
Furthermore, I would like to investigate the expectation of the sum of distances obtained by continuously performing greedy matching in an $m \times n$ distance matrix. Let's assume that $m ≥ n$ for the purpose of this discussion.
Similarly, I would like to start exploring possible patterns by considering a small-scale example. Therefore, let's assume $n = 2$.
So for the first greedy match("GM"),the $E(GM_1) = \frac{1}{2m}(1-\tbinom{m+2}{2}^{-1})=\frac{m+3}{2(m+1)(m+2)}$.For an $m \times 2$ distance matrix, after finding the minimum value, we match the corresponding $x-point$ and $y-point$, and then remove the respective row and column because they can only be matched once.
This operation leaves us with a $(m-1) \times 1$ matrix (generally, it would be $(m-1) \times (n-1) \ matrix)$. We then continue greedily searching for the minimum value among the remaining elements in this residual matrix, which becomes the $GM_2$.
If there are no additional constraints, the distribution function of $GM_2$ should be similar to $GM_1$ because $GM_2$ can be considered as the minimum value between $(m-1)$ $x$-points and 1 $y$-point.However, it is evident that the matrix corresponding to $GM_2$ is a subset of the matrix corresponding to $GM_1$. So, in reality, what I want to solve is $E(GM_2|GM_2\ge GM_1)$.
(# I'm not particularly certain about the approach I used to calculate the expected value of the remaining matrix after removing the row and column containing the minimum value, as highlighted in bold. I would appreciate it if someone could provide some clarification on this matter. Thank you very much.)
I think I can start by solving for $E(GM_2|GM_2\ge GM_1, GM_1=g)$,which is equivalent to $E(GM_2|GM_2\ge g)$.
Without any conditions, $P(GM_2\ge d)=\tbinom{m-1}{1}^{-1}\tbinom{m-1}{1}(1-d)^{m-1}=(1-d)^{m-1}$
so $f_{GM_2}(d)=(m-1)(1-d)^{m-2}$
$$ \begin{aligned} E(GM_2|GM_2\ge g) & =\frac{\int_{g}^1 x \mathrm{d} F_{GM_2}(x) }{P(GM_2\ge g)} \\\\ & = \frac{m^{-1}(1-g)^{m-1}(gm-g+1)}{(1-g)^{m-1}}\\\\ & = \frac{gm-g+1}{m} \end{aligned} $$
because $$ P(GM_1\ge d) = \tbinom{m+n-1}{n}^{-1}\sum_{k=1}^{min(m,n)}\tbinom{m}{k}\tbinom{n-1}{k-1}max(0,1-kd)^{m+n-1} $$ so $$ f_{GM_1}(d)=-P'(GM_1\ge d) = \tbinom{m+n-1}{n}^{-1}\sum_{k=1}^{min(m,n)}\tbinom{m}{k}\tbinom{n-1}{k-1}k(m+n-1)max(0,(1-kd))^{m+n-2} $$
let $n=2$, $$ f_{GM_1}(d)=2[(1-d)^m+(m-1)max(0,1-2d)^m] $$ so $$ \begin{aligned} E(GM_2|GM_2\ge GM_1) &= \int_{0}^{1}\frac{gm-g+1}{m}f_{GM_1}(g)\mathrm{d}g\\\\ & = \int_{0}^{1}\frac{gm-g+1}{m}2[(1-g)^m+(m-1)max(0,1-2g)^m]\mathrm{d}g\\\\ & =2[\int_{0}^{1}\frac{gm-g+1}{m}(1-g)^m\mathrm{d}g + \int_{0}^{1/2}\frac{gm-g+1}{m}(m-1)(1-2g)^m\mathrm{d}g]\\\\ & = 2*\{\frac{1}{m}[\frac{(m-1)}{m^2+3m+2}+\frac{1}{m+1}] + \frac{m-1}{m}[\frac{m-1}{4(m+1)(m+2)}+\frac{1}{2(m+1)}]\} \\\\ & = \frac{3m^2+8m+1}{2m(m+1)(m+2)} \end{aligned} $$
Unfortunately, the obtained result does not match the results from experimental simulations, indicating that there may be an error. Could someone please point out where the mistake lies? Thank you very much!