Probability that sum of discrete random variables is smaller than other sum

573 Views Asked by At

Given an examplary discrete density function $f$:

Value probability
    1   15/50
    2   10/50
    3   7/50
    4   13/50
    5   5/50

What is the probability that $\sum\limits_{i=1}\limits^n\mathcal{X}_i > \sum\limits_{i=n+1}\limits^{2n}\mathcal{X}_i$ for 2n random independent variables $\mathcal{X}_i$ sampled in order $i=1,2,...,2n$ from that density?

My thought process: Let $\mathcal{X} = \sum\limits_{i=1}\limits^n\mathcal{X}_i$ and $\mathcal{Y}=\sum\limits_{i=n+1}\limits^{2n}\mathcal{X}_i$. These are two new random variables, and their density function $g$ can be computed through convolution of $f$ $n$ times with itself, e.g. for n=3 it would be $g=f*f*f$.

Now let $v_{min}=n*min(values_f)$ and $v_{max}=n*max(values_f)$. The resulting possible values of the new random variables are all $k\in[v_{min},v_{max}]$; since the values of $f$ all have a distance of 1 to the next one all values in this intervall are possible.

Now I think that $P(\mathcal{Y} < \mathcal{X})=\sum\limits_{k=v_{min}}\limits^{v_{max}}P(\mathcal{Y}<\mathcal{X} | \mathcal{Y}=k)$ = $\sum\limits_{k=v_{min}}\limits^{v_{max}}\dfrac{\sum\limits_{y=v_{min}}\limits^{k}\sum\limits_{x=v_{min}}\limits^{y}g(x)g(y)}{\sum\limits_{y=v_{min}}\limits^{k} g(y)}$.

Is that correct? I feel like I made a mistake somewhere, but can't quite catch it. Thank you!

2

There are 2 best solutions below

6
On BEST ANSWER

$P(\mathcal X < \mathcal Y) = (1 - P(\mathcal X = \mathcal Y))/2$, where $P(\mathcal X = \mathcal Y)$ is the coefficient of $x^0$ in $$A_n(x) = \left( \sum_j p_j x^j\right)^n \left(\sum_j p_j x^{-j}\right)^n$$

0
On

Comment: The following simulation in R is for $n =1000$. It simulates 100,000 $(X, Y)$ pairs. The simulated distributions of $X$ and $Y$ are very similar. The actual distributions are discrete, but nearly normal. As Commented, $P(X = Y)$ is very small, so $P(X < Y)$ is very nearly $1/2.$

m = 10^5;  x = y = numeric(m); p = c(15,10,7,13,5)/50
for(i in 1:m){
  s1 = sample(1:5, 1000, prob=p, repl=T)
  x[i] = sum( s1 )
  s2 = sample(1:5, 1000, prob=p, repl=T)
  y[i] = sum( s2 )
}
mean(x < y)
## 0.49681
mean(x == y)
## 0.0062

Below are histograms of the simulated distributions of $X, Y,$ and $X - y$. For large $n,$ normal approximations for $P(X < Y)$ and $P(X = Y)$ should be good enough to show the essence of the model for many purposes. "Best-fitting" normal densities are shown.

enter image description here