Distribution of sample variance of Bernoulli variables


I am facing the following problem: given a random sample $X_1, \dots, X_n$ of $\operatorname{Bernoulli}(\theta)$ variables, find the distribution of the sample variance $S^2 = \frac{1}{n} \sum_i(\bar{X} - X_i)^2$.

I have shown that $S^2 = \bar{X} (1 - \bar{X})$, and I know that $n\bar{X}$ has a $\operatorname{Binomial}(n, \theta)$ distribution, but I have not been able to deduce the distribution of $S^2$.
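For reference, the identity $S^2 = \bar{X}(1 - \bar{X})$ can be checked in a few lines. The only fact used is that $X_i^2 = X_i$ for variables taking values in $\{0, 1\}$:

$$ \begin{aligned} S^2 &= \frac{1}{n}\sum_i (\bar{X} - X_i)^2 = \frac{1}{n}\sum_i X_i^2 - \bar{X}^2 \\ &= \frac{1}{n}\sum_i X_i - \bar{X}^2 = \bar{X} - \bar{X}^2 = \bar{X}(1 - \bar{X}). \end{aligned} $$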


There are 2 best solutions below

5
On

You know the distribution of $\overline X$ and you’ve correctly determined that $S^2$ is a function of $\overline X$. Then the distribution of $S^2$ is just the composition of the two: The probability of a value of $S^2$ is the sum of the probabilities of the values of $\overline X$ that are its preimages; that is, for all integers $k$ from $0$ to $\left\lfloor\frac n2\right\rfloor$ we have:

$$ P\left(S^2=\frac kn\left(1-\frac kn\right)\right)=P\left(\overline X=\frac kn\lor\overline X=1-\frac kn\right)= \begin{cases} \binom nk\theta^k(1-\theta)^k&k=\frac n2\;,\\ \binom nk\left(\theta^k(1-\theta)^{n-k}+\theta^{n-k}(1-\theta)^k\right)&\text{otherwise}\;.\\ \end{cases} $$
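As a rough illustration of the formula above, here is a minimal Python sketch that tabulates the PMF by summing binomial probabilities over all $k$ that map to the same value of $S^2$. The function name `sample_variance_pmf` is made up for this example, and $\theta$ is passed as a `Fraction` so the arithmetic stays exact.

```python
from fractions import Fraction
from math import comb


def sample_variance_pmf(n, theta):
    """Map each attainable value of S^2 = Xbar * (1 - Xbar) to its probability.

    n     : sample size (positive integer)
    theta : Bernoulli success probability (a Fraction keeps results exact)
    """
    pmf = {}
    for k in range(n + 1):
        x_bar = Fraction(k, n)
        value = x_bar * (1 - x_bar)  # S^2 when exactly k of the X_i equal 1
        prob = comb(n, k) * theta**k * (1 - theta) ** (n - k)  # P(n * Xbar = k)
        # k and n - k give the same value of S^2, so their probabilities add up.
        pmf[value] = pmf.get(value, 0) + prob
    return pmf


if __name__ == "__main__":
    for value, prob in sorted(sample_variance_pmf(4, Fraction(1, 3)).items()):
        print(value, prob)
```

For $n = 4$ and $\theta = \frac13$ the attainable values are $0$, $\frac{3}{16}$ and $\frac14$, with probabilities $\frac{17}{81}$, $\frac{40}{81}$ and $\frac{8}{27}$, which agrees with the case formula (the last value is the $k = \frac n2$ case).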

0
On

Noting that $\overline{X} \approx \mathcal{N}(\theta,\frac{\theta(1-\theta)}{n})$ by the Central Limit Theorem, if we apply the second-order delta method to $g(\theta) = \theta(1-\theta)$, it follows that we can get a good continuous approximation to the PMF $f_{S^{2}}$ of $S^{2}$ via

$$ f_{S^{2}}(y) \approx \frac{|C|}{\sigma\sqrt{2\pi}}\sqrt{\frac{n}{ (\frac{B}{C})^{2} + 4(Cy - \frac{A}{C}) }}\left( e^{\frac{-n}{8\sigma^{2}}(\sqrt{(\frac{B}{C})^{2} + 4(Cy - \frac{A}{C}) } - \frac{B}{C})^{2} } + e^{\frac{-n}{8\sigma^{2}}(\sqrt{(\frac{B}{C})^{2} + 4(Cy - \frac{A}{C}) } + \frac{B}{C})^{2} } \right) 1_{0 < y < 1/4} $$

where $A = \overline{X}(1-\overline{X})$, $B = 1 - 2\overline{X}$, $C = -1$ and $\sigma^{2} = \overline{X}(1-\overline{X})$.

Though this is not the exact distribution of $S^{2}$ as given in Joriki's answer, you'll find this to be a very good continuous approximation to the sampling distribution of $S^{2}$, even when $n$ is quite small.
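To experiment with this approximation, the density can be transcribed into Python roughly as follows. This is only a sketch: the function name `approx_density` and the argument `p` (the value plugged in for $\overline{X}$ in $A$, $B$ and $\sigma^2$) are names chosen for this example, and `p` is assumed to lie strictly between 0 and 1.

```python
import math


def approx_density(y, n, p):
    """Delta-method density approximation for S^2, evaluated at y.

    n : sample size
    p : value plugged in for Xbar (hypothetical argument name), with 0 < p < 1
    """
    if not (0.0 < y < 0.25):  # indicator 1_{0 < y < 1/4}
        return 0.0
    A = p * (1.0 - p)
    B = 1.0 - 2.0 * p
    C = -1.0
    sigma2 = p * (1.0 - p)
    # Expression under the square root; with C = -1 it simplifies to 1 - 4y.
    D = (B / C) ** 2 + 4.0 * (C * y - A / C)
    root = math.sqrt(D)
    prefactor = abs(C) / math.sqrt(2.0 * math.pi * sigma2) * math.sqrt(n / D)
    scale = -n / (8.0 * sigma2)
    return prefactor * (
        math.exp(scale * (root - B / C) ** 2)
        + math.exp(scale * (root + B / C) ** 2)
    )


if __name__ == "__main__":
    # Midpoint-rule check that the density integrates to roughly 1 on (0, 1/4).
    n, p, steps = 50, 0.3, 20000
    h = 0.25 / steps
    total = sum(approx_density((i + 0.5) * h, n, p) for i in range(steps)) * h
    print(total)
```

Plotting this curve against a histogram of simulated values of $S^2$ is one way to judge how close the approximation is for a given $n$.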