Easy way to compute $Pr[\sum_{i=1}^t X_i \geq z]$

253 Views Asked by At

We have a set of $t$ independent random variables $X_i \sim \mathrm{Bin}(n_i, p_i)$. We know that $$\mathrm{Pr}[X_i \geq z] = \sum_{j=z}^{\infty} { n_i \choose j } p_i^j (1-p_i)^{n_i -j}.$$

But is there an easy way to compute:

$$\mathrm{Pr}\left[\sum_{i=1}^t X_i \geq z\right]?$$

MY IDEAS: This should have to do something with convolution, but I am not sure.

Is it easier to compute $$\mathrm{Pr}\left[\sum_{i=1}^t X_i \leq z \right]?$$

What I thought of is maybe: $$\mathrm{Pr}\left[\sum_{i=1}^t X_i \leq z \right]=\sum_{j=1}^{z} \mathrm{Pr}\left[\sum_{i=1}^t X_i = z \right]$$ but this seems to be quite hard with $t$ random variables.

Would appreciate any hint and if you don't write an answer I am interested in whether it is too difficult or too easy?! thank you..

2

There are 2 best solutions below

7
On BEST ANSWER

The CLT is indeed applicable here as an approximation to the CDF of a sum of binomial random variables (in particular, the Lindeberg-Feller CLT) as long as we assume that the $\mathrm{Var}(X_i)=p_i(1-p_1)n_i$ are bounded from above or grow slow enough that individual variances become small compared to the variance of the entire sum (see Lindeberg Condition). This is satisfied as long as the $n_i$ don't drastically increase with $t$.

Given the above assumption (which is usually reasonable), we can use the Lindeberg-Feller CLT to approximate the sum of t binomial random variables:

Let $X_i \sim \mathrm{Bin}(n_i,p_i)$ and $s_t= \sqrt{\sum\limits_{i=1}^t n_ip_i(1-p_i)}$. Then the Lindeberg-Feller CLT states that:

$$\frac{\sum\limits_{i=1}^t (X_i-n_ip_i)}{s_t}\xrightarrow{d}\mathcal{N}(0,1).$$

Therefore, we can approximate your sum of binomials with a normal distribution:

$\frac{\sum\limits_{i=1}^t (X_i-n_ip_i)}{s_t}\xrightarrow{d}\mathcal{N}(0,1)\implies \sum\limits_{i=1}^t (X_i-n_ip_i)\xrightarrow{d} \mathcal{N}(0,s_t^2) \implies \sum\limits_{i=1}^t X_i \xrightarrow{d}\mathcal{N}\left(\sum\limits_{i=1}^t n_ip_i,s_t^2\right)$.

Thus, $\lim\limits_{t\rightarrow \infty}P\left(\sum\limits_{i=1}^t X_i \leq z\right)=\Phi\left(\frac{z-\sum\limits_{i=1}^t n_ip_i}{s_t}\right)$.

Now, the sum of these binomials is really only defined for integer values of z. Therefore, to get an approximation to $P\left(\sum_{i=1}^t X_i \geq z\right)$, just use the above approximation to estimate $1-P\left(\sum_{i=1}^t X_i < z\right)\approx 1-\Phi\left(\frac{z-1-\sum\limits_{i=1}^t n_ip_i}{s_t}\right)$

6
On

I'll give you a general idea. If you want the details, look up an article by Tomas Woersch (here).

So you have $$ S_m(n) = \sum_{k=0}^{m} \binom{n}{k}x^k = 1 + \binom{n}{1}x +\ldots + \binom{n}{m}x^m\\ \frac{S_m(n)}{\binom{n}{m}x^m} = 1 + \ldots + \frac{\binom{n}{1}x}{\binom{n}{m}x^m} + \frac{1}{\binom{n}{m}x^m} = 1 + \ldots a_{m-1}x^{1-m} +a_m x^{-m} $$ Now you need to do the following: obtain the ration of binomial coefficients I denoted with $a_k$ and then approximate then using Stirling approach. Use the fact that $$(1 - o(1))\sqrt{2 \pi n} (\frac{n}{e})^n \leq n! \leq (1+o(1))\sqrt{2 \pi n} (\frac{n}{e})^n$$