Probability that a binomial random variable with $n=10.8\cdot 10^6$ and $p=1.1\cdot 10^{-5}$ is at least $52$


Given $n=10.8\cdot 10^{6}$ independent identically distributed (i.i.d.) random variables $$X_1,\dots, X_n\sim\text{Bernoulli}(p=11\cdot10^{-6}),$$ what is the following probability? $$\mathsf P \left( X_1 + \cdots + X_n \ge 52 \right)$$


Motivation

Warning: the following contains material that may cause discomfort to some readers.

According to the United Nations Office on Drugs and Crime crime statistics for 2015, the rate of police-recorded instances of sexual intercourse without valid consent in Greece in 2015 was $1.1$ per $100{,}000$ people, and the population of Greece is around $10.8\cdot 10^6$.


There are 3 best solutions below

2
On

The expected number of occurrences would be $10800000\cdot1.1/100000\approx119$. So having at least 52 is very likely. In fact, R tells me:

> binom.test(52,10800000,1.1/100000, alternative="greater")

        Exact binomial test

data:  52 and 10800000
number of successes = 52, number of trials = 10800000, p-value = 1
alternative hypothesis: true probability of success is greater than 1.1e-05
…

This is somewhat confusing, since you said (before editing your question, if I remember correctly) that there has been an increase in the number of occurrences. If 52 is the observed count, this scenario suggests that there has actually been a very significant drop:

> binom.test(52,10800000,1.1/100000)

        Exact binomial test

data:  52 and 10800000
number of successes = 52, number of trials = 10800000, p-value = 8.528e-12
alternative hypothesis: true probability of success is not equal to 1.1e-05
…
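
For readers without R, here is a minimal Python sketch of the two p-values above. It rests on an assumption about R rather than on anything stated in this answer: to my understanding, the two-sided `binom.test` p-value is the total probability of all outcomes that are no more likely than the observed one, with a small relative tolerance of `1e-7`. The helper name `binom_pmf_table` and the cutoff of 600 are hypothetical choices made for this illustration.

```python
import math

def binom_pmf_table(n, p, k_max):
    """Return [P(X = 0), ..., P(X = k_max)] for X ~ Binomial(n, p)."""
    pmf = [math.exp(n * math.log1p(-p))]  # P(X = 0) = (1 - p)^n
    odds = p / (1.0 - p)
    for k in range(k_max):
        # P(X = k + 1) = P(X = k) * (n - k) / (k + 1) * p / (1 - p)
        pmf.append(pmf[-1] * (n - k) / (k + 1) * odds)
    return pmf

n, p, observed = 10_800_000, 1.1 / 100_000, 52

# Assumption: 600 is an arbitrary cutoff far above the mean of about 119;
# the probability mass beyond it is negligible at double precision.
pmf = binom_pmf_table(n, p, 600)

# One-sided ("greater"): P(X >= observed).
p_greater = sum(pmf[observed:])

# Two-sided (assumed R convention): add up every outcome whose probability
# does not exceed that of the observed count, up to a relative tolerance.
threshold = pmf[observed] * (1 + 1e-7)
p_two_sided = sum(q for q in pmf if q <= threshold)

print(f"greater:   {p_greater:.6f}")
print(f"two-sided: {p_two_sided:.3e}")
```

If the assumption about R's convention holds, the two printed values should be close to the p-values in the R output above.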
3
On

I will just focus on the mathematical aspect of the question and not comment on if and how it applies to the real-world question you mentioned above (there are some important issues raised in the comments to your question).


The formal question is this: Given $n=10.8\cdot 10^{6}$ identically distributed, independent random variables $$X_1,\dots, X_n\sim\text{Bernoulli}(p=11\cdot10^{-6}),$$ what is $$\mathsf P(X_1+\dots+X_n\ge 52)\text{?}$$

Note that $$\mathsf P(X_1+\dots+X_n\ge 52)=1-\sum_{k=0}^{51}\mathsf P(X_1+\dots+X_n=k).$$

Since $X_1+\dots+X_n$ has a binomial distribution (cf. Proof that a sum of Bernoulli rvs has Binomial distribution), $$\mathsf P(X_1+\dots+X_n=k)=\binom nk p^{k}(1-p)^{n-k}.$$

A numerical computation gives $$\mathsf P(X_1+\dots+X_n\ge 52)\ge 0.9999999999981=1-1.9 \cdot 10^{-12}.$$


If you instead change the formal question to use only half the population, the computations remain unchanged, except that now $n=5.4\cdot 10^6$, so $$\mathsf P(X_1+\dots+X_n\ge 52)\in[0.847, 0.848].$$
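
The sums above can be checked with a short Python sketch. This is one way to do the numerical computation, not necessarily the one used for the figures in this answer; the function name `binom_lower_tail` is hypothetical. It builds each term from the previous one with the recurrence $\mathsf P(X=k+1)=\mathsf P(X=k)\cdot\frac{n-k}{k+1}\cdot\frac{p}{1-p}$, which avoids evaluating huge binomial coefficients directly.

```python
import math

def binom_lower_tail(k_max, n, p):
    """Return P(X <= k_max) for X ~ Binomial(n, p) by summing the pmf."""
    term = math.exp(n * math.log1p(-p))  # P(X = 0) = (1 - p)^n
    total = term
    odds = p / (1.0 - p)
    for k in range(k_max):
        # Step from P(X = k) to P(X = k + 1).
        term *= (n - k) / (k + 1) * odds
        total += term
    return total

p = 11e-6
lower_full = binom_lower_tail(51, 10_800_000, p)  # whole population
lower_half = binom_lower_tail(51, 5_400_000, p)   # half the population

# The full-population tail is reported as P(X <= 51) rather than as
# 1 - P(X <= 51), because the subtraction would discard most of the
# significant digits of such a small number.
print(f"n = 10.8e6: P(X <= 51) = {lower_full:.3e}")
print(f"n =  5.4e6: P(X >= 52) = {1.0 - lower_half:.4f}")
```

The first printed value should be consistent with the bound $1.9\cdot 10^{-12}$ on the complementary probability, and the second should fall in the interval $[0.847, 0.848]$ quoted above.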

7
On

Since $\operatorname E(X)=\operatorname E(X_1+\cdots+X_n) = 118.8$ and this is fairly big, approximating the binomial distribution with a normal distribution with the same expected value and the same variance should give good results, provided one bears in mind that the event $\big[X\ge52\big]$ is the same as the event $\big[X>51\big]$ and uses a continuity correction that seeks the probability that the normally distributed random variable is greater than $51.5$.

The variance is $npq = (10.8\times10^6)\times(11\times10^{-6})\times(1-11\times10^{-6}) = 118.7987.$ Hence $$ \Pr(X>51.5) = \Pr\left(\frac{X - 118.8}{\sqrt{118.7987}} > \frac{51.5-118.8}{\sqrt{118.7987}} \right) = \Pr(Z>-6.1746), $$ i.e. we seek the probability that a standard normal random variable exceeds a number that is more than six standard deviations below the mean. For all practical purposes that is $1.$

(A crude approximation to the standard deviation is $11.$)
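
The normal approximation with continuity correction can be reproduced with a few lines of Python. This is a sketch using only the standard library; the helper name `normal_upper_tail` is hypothetical, and the standard normal tail is obtained from the complementary error function via $\Pr(Z>z)=\tfrac12\operatorname{erfc}(z/\sqrt2)$.

```python
import math

def normal_upper_tail(z):
    """Return P(Z > z) for a standard normal random variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

n, p = 10_800_000, 11e-6
mean = n * p                   # E(X) = np
variance = n * p * (1.0 - p)   # Var(X) = npq

# Continuity correction: P(X >= 52) = P(X > 51) is approximated by
# the probability that the normal variable exceeds 51.5.
z = (51.5 - mean) / math.sqrt(variance)

upper = normal_upper_tail(z)   # approximates P(X >= 52)
lower = normal_upper_tail(-z)  # approximates P(X <= 51), by symmetry

print(f"z = {z:.4f}")
print(f"P(X >= 52) is approximately {upper:.12f}")
print(f"P(X <= 51) is approximately {lower:.3e}")
```

Printing the lower tail separately shows how far the answer is from $1$. That tail comes out on the order of $10^{-10}$, so the approximation supports the conclusion that the probability is $1$ for all practical purposes, but this far from the mean it should not be relied on for the exact size of the tiny complementary probability.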