I have 2 labels ('A' and 'B') in a dataset of size 100.
50 of the instances have label 'A' and 50 have label 'B'.
The instances of both labels come from the same distribution. I am trying to find the probability that the Wilcoxon rank sum of 'A' equals t (for any t).
How can I calculate that (I need the formula)?
Here is what I understand right now. For each item, you assign the label 'A' with some probability, say $p$, and if not 'A', you assign 'B'. We are assuming the decision for each item is made independently of the others. You are trying to find the probability that there are exactly $t$ items labeled 'A' in a group of $n=100$ items.

Then, let $X$ be the number of items labeled 'A' in a group of $n$. To find the probability that $X=t$, first choose which $t$ of the $n$ items are labeled 'A': how many ways are there to do that? The probability that those $t$ items are all labeled 'A' is $p^t$, since each is $p$ and there are $t$ of them in total. What is the probability that the remaining $n-t$ items are all labeled 'B' (i.e. not 'A')?

For the Wilcoxon Rank Sum statistic, let $S$ denote the sum of the ranks of all elements labeled 'A'. Note first that $$ 1 + 2 + \ldots + 50 = \frac{50 \cdot 51}{2} = 1275 $$ Clearly, $\frac{50 \cdot 51}{2} \le S \le \frac{50 \cdot 51}{2} + 50 \cdot 50$, i.e. $1275 \le S \le 3775$, with the lower bound achieved when the 'A' labels occupy ranks $1$ through $50$ and the upper bound when they occupy ranks $51$ through $100$.

Because we are assuming the same distribution for 'A' and 'B' to begin with (and no ties among the values), every choice of $50$ of the $100$ ranks for 'A' is equally likely, so the probability of a particular sum is proportional to the number of rank choices that yield it. So for each $t \in [1275,3775]$, let $n_t$ denote the number of integer solutions to $x_1 + x_2 + \ldots + x_{50} = t$ where $1 \le x_1 < x_2 < \ldots < x_{50} \le 100$. Then, $$ \mathbb{P}[S=t] = \frac{n_t}{\sum_{s=1275}^{3775} n_s} = \frac{n_t}{\binom{100}{50}}. $$
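
There is no simple closed form for $n_t$, but it can be counted with a small dynamic program. The following is a minimal sketch in Python, assuming no ties in the data; the function names `rank_sum_counts` and `rank_sum_pmf` are hypothetical, chosen here for illustration. It builds the counts one rank at a time, using the fact that every new subset created at a given step has the current rank as its largest element.

```python
from fractions import Fraction
from math import comb


def rank_sum_counts(n_a: int, n_total: int) -> dict[int, int]:
    """Return {t: n_t}, the number of n_a-element subsets of
    {1, ..., n_total} whose elements sum to t (assumes no ties)."""
    # Largest possible sum: the n_a highest ranks.
    max_sum = n_a * n_total - n_a * (n_a - 1) // 2
    # ways[k][s] = number of k-element subsets of the ranks seen so far with sum s
    ways = [[0] * (max_sum + 1) for _ in range(n_a + 1)]
    ways[0][0] = 1
    for rank in range(1, n_total + 1):
        # Descend in k so ways[k - 1] still excludes the current rank.
        for k in range(min(rank, n_a), 0, -1):
            prev, cur = ways[k - 1], ways[k]
            # Sums reachable by k distinct ranks whose largest is `rank`.
            lo = rank + (k - 1) * k // 2
            hi = k * rank - k * (k - 1) // 2
            for s in range(lo, hi + 1):
                cur[s] += prev[s - rank]
    return {s: c for s, c in enumerate(ways[n_a]) if c}


def rank_sum_pmf(t: int, n_a: int = 50, n_total: int = 100) -> Fraction:
    """Exact P[S = t] under the null: n_t / C(n_total, n_a)."""
    counts = rank_sum_counts(n_a, n_total)
    return Fraction(counts.get(t, 0), comb(n_total, n_a))
```

For example, `rank_sum_pmf(1275)` gives the probability of the smallest possible sum, which is $1/\binom{100}{50}$ since only one choice of ranks achieves it. Exact fractions are used rather than floats because the counts are very large integers.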