Approximate Average of Variance of Many Bernoulli Distributions

65 Views Asked by At

$B_{1}, ... , B_{n}$ are many (~$10^6$) random variables following Bernoulli distributions.

$p_{i} = P(B_{i} = 1)$

Define $p_{avg} = \frac{1}{N}\sum_{n} p_{i}$

Can I say: $\frac{1}{N}\sum_{n}{p_{i}(1 - p_{i})} \approx p_{avg}(1 - p_{avg})$ ?

What is the error of this approximation?

Is this approximation unbiased?

Thank you.

1

There are 1 best solutions below

2
On BEST ANSWER

Since $f(x)=x(1-x)=x-x^2$ is a strictly concave function of $x$, you will have $$\frac{1}{N}\sum_{n}{p_{i}(1 - p_{i})} \le p_\text{avg}(1 - p_\text{avg})$$ with equality only when all the $p_i$ are equal, which answers your bias question

The worst case will be when half the $p_i$ are $0$ and the other half $1$, in which case $p_\text{avg}=\frac12$ and

  • $p_\text{avg}(1 - p_\text{avg})=\frac14$, the maximum possible
  • $\frac{1}{N}\sum_{n}{p_{i}(1 - p_{i})}=0$, the minimum possible