I am trying to understand how the Recursion Probability Formula of Mann Whitney U table under the null hypothesis is derived in this Paper, Equation 1 from section 4(given below).
$$p_{n,m}(U) = \begin{cases} \mathbb 1(U = 0), & n = 0 \text{ or } m = 0 \\\\ \displaystyle\frac{n p_{n-1,m}(U-m) + m p_{n,m-1}(U)}{n+m}, & n, m > 0, \end{cases}$$ where $n, m$ denote the group sample sizes, and $U$ is the value of the $U$ statistic. Then $p_{n,m}(U)$ describes the probability of observing $U$ for the given $n, m$.
Could someone please help? Thanks!
An Idea of how to proceed will be of great help as well.
The statistic $U$ is the number times a $y_i$ precedes an $x_j$ for groups $x_1,\dots,x_n$ and $y_1,\dots,y_m$ when they are collectively listed in ascending order. This statistic depends only on the relative ordering of $x$s and $y$s and not their actual values, so just represent the sample by a string of "x"s and "y"s with $n$ "x"s and $m$ "y"s.
There are $(m+n)!/m!n!$ such strings. Under the null hypothesis, these are all equally likely. The value $p_{nm}(U)$ is the fraction of such strings where a "y" precedes an "x" exactly $U$ times. This is made up of two subsets of strings, those that end with "x" and those that end with "y".
If we consider first the subset that end with "y", note that the initial prefixes of these strings before the final "y" have the same value of U. Because the last "y" precedes no "x"s, removing it does not change the value of the $U$ statistic. Of course, these are strings with one less $y$, so the number of them is the fraction $p_{n,m-1}(U)$ of the $(m+n-1)!/(m-1)!n!$ such strings.
Similarly, for the subset that end with "x", note that the initial prefixes have a $U$-statistic equal to $(U-m)$ because each of the $m$ "y"s in the prefix precede this final "x", and by removing it, we reduce the $U$-statistic by $m$. So, the number of such prefixes is the fraction $p_{n-1,m}(U-m)$ of the $(m+n-1)!/m!(n-1)!$ such strings.
Setting the total number of strings equal to the number of prefixes in the subsets, we have: $$p_{nm}(U)\frac{(m+n)!}{m!n!} = p_{n,m-1}(U)\frac{(m+n-1)!}{(m-1)!n!} + p_{n-1,m}(U-m)\frac{(m+n-1)!}{m!(n-1)!}$$ from which the recursion follows.