How to find the probability of an occurrence in WRS test

I have 2 labels ('A' and 'B') in a dataset of size 100.

50 of the instances have label 'A' and 50 have label 'B'.

The instances of both labels come from the same distribution. I am trying to find the probability that the Wilcoxon rank sum of 'A' equals t (for any t).

How can I calculate that? I need the formula.

There is 1 best solution below

Here is what I understand right now. For each item, you will assign the label A with some probability, say $p$, and if not A, you will assign B. We are assuming the decision for each item is made independently of the others. You are trying to find the probability that there are exactly $t$ items labeled A in a group of $n=100$ items.

Let $X$ be the number of items labeled A in a group of $n$. To find the probability that $X = t$, you could:

  1. Pick $t$ places out of $n$ to single out the items we will mark A. How many ways are there to do that?
  2. The weight of all of those turning out A is $p^t$, since each has probability $p$ and there are $t$ of them in total.
  3. What is the weight of all the other $n - t$ items coming out B (i.e. not A)?
  4. Multiply (1), (2), and (3) together for the final answer.
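The four steps above can be sketched in a few lines of Python. This is only an illustration under the independence assumption stated earlier; the function name `binomial_pmf` and its parameters are made up for this example.

```python
from math import comb


def binomial_pmf(t: int, n: int, p: float) -> float:
    """P[X = t] when each of n items is independently labeled A with probability p."""
    ways = comb(n, t)              # step 1: choose which t of the n places are A
    weight_a = p ** t              # step 2: those t items all come out A
    weight_b = (1 - p) ** (n - t)  # step 3: the remaining n - t items all come out B
    return ways * weight_a * weight_b  # step 4: multiply the three factors
```

For example, `binomial_pmf(50, 100, 0.5)` gives the chance of exactly 50 A labels among 100 items when each label is a fair coin flip.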

For the Wilcoxon Rank Sum statistic, let $S$ denote the sum of the ranks of all elements labeled A. Note first that $$ 1 + 2 + \ldots + 50 = \frac{50 \cdot 51}{2} = 1275 $$ Clearly, $\frac{50 \cdot 51}{2} \le S \le \frac{50 \cdot 51}{2} + 50 \cdot 50 = 3775$, with the lower bound achieved when the A labels occupy the $50$ smallest ranks ($1$ through $50$), and the upper bound when they occupy the $50$ largest ranks ($51$ through $100$).

Because we are assuming the same distribution for A and B to begin with, every assignment of $50$ of the $100$ ranks to label A is equally likely (assuming no ties), so the probability of a particular sum is proportional to the number of rank assignments that yield it. For each integer $t$ with $1275 \le t \le 3775$, let $n_t$ denote the number of integer solutions to $x_1 + x_2 + \ldots + x_{50} = t$ with $1 \le x_1 < x_2 < \ldots < x_{50} \le 100$, so that each set of ranks is counted once. Then, $$ \mathbb{P}[S=t] = \frac{n_t}{\sum_{s=1275}^{3775} n_s} = \frac{n_t}{\binom{100}{50}}. $$
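There is no simple closed form for $n_t$, but it can be computed with a subset-sum style dynamic program. The sketch below is one possible way to do that, assuming no ties among the observations. It counts each set of ranks once (ignoring the order of the $x_i$), which leaves the ratio unchanged. The names `rank_sum_counts` and `rank_sum_pmf` are made up for this example.

```python
from fractions import Fraction
from math import comb


def rank_sum_counts(n_total: int, n_a: int) -> dict:
    """Return {t: n_t}: the number of n_a-element subsets of {1..n_total} with sum t."""
    # Largest possible sum: the n_a largest ranks.
    max_sum = sum(range(n_total - n_a + 1, n_total + 1))
    # ways[k][s] = number of k-element subsets of the ranks seen so far with sum s
    ways = [[0] * (max_sum + 1) for _ in range(n_a + 1)]
    ways[0][0] = 1
    for rank in range(1, n_total + 1):
        # Go through k in decreasing order so each rank is used at most once.
        for k in range(min(rank, n_a), 0, -1):
            prev, cur = ways[k - 1], ways[k]
            for s in range(rank, max_sum + 1):
                if prev[s - rank]:
                    cur[s] += prev[s - rank]
    return {s: c for s, c in enumerate(ways[n_a]) if c}


def rank_sum_pmf(t: int, n_total: int = 100, n_a: int = 50) -> Fraction:
    """P[S = t] as an exact fraction, with every rank subset equally likely."""
    counts = rank_sum_counts(n_total, n_a)
    return Fraction(counts.get(t, 0), comb(n_total, n_a))
```

Calling `rank_sum_pmf(t)` with the defaults corresponds to the 50/50 split in the question. It rebuilds the whole table on each call, so for many values of $t$ it is cheaper to call `rank_sum_counts(100, 50)` once and reuse the result.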