Explanation of an Easy Proof of Variance of Bernoulli Trials


I am taking a course in Combinatorics, and I have two proofs I can use to support the variance formula for $n$ Bernoulli trials, $\operatorname{var}(X) = np(1-p)$. I would like to use the one that does not require the binomial formula and second derivatives.

Here is the explanation from the book:

[image: excerpt from the book]

Ok, the individual elements are independent. Got it. Now, when I go to write this out, everything does not quite hang together for me:

$$ \text{Variance is }E(X^2)-E^2(X) \text{. Additionally, for a family }F\text{ of independent random variables, } \operatorname{var}\Big(\sum_{X_i \in F} X_i\Big) = \sum_{X_i \in F}{\operatorname{var}(X_i)} $$

Now, I am stuck. In this case, $X$ counts the number of successes, so its value need not equal $n$. How do I connect all of this back to $np(1-p)$? In other words, what is the 'trivial' calculation? (Seriously, 'trivial', 'clearly', 'obviously', etc. = bad juju.)

Also, how do I go about showing that $\operatorname{var}(X_1 + X_2 + \dots + X_n) = \operatorname{var}(X_1) + \dots + \operatorname{var}(X_n)$ ? (although I am pretty sure that proving this would be deeper than is necessary for the class)
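(Not a proof, but here is a quick Monte Carlo sanity check of that additivity claim in Python; the success probabilities 0.3 and 0.7 are arbitrary choices:)

```python
import random

random.seed(0)
N = 200_000

# Two independent indicator variables: X1 ~ Bernoulli(0.3), X2 ~ Bernoulli(0.7)
x1 = [1 if random.random() < 0.3 else 0 for _ in range(N)]
x2 = [1 if random.random() < 0.7 else 0 for _ in range(N)]

def var(xs):
    """Sample variance (biased form is fine at this sample size)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

s = [a + b for a, b in zip(x1, x2)]
# Both should be close to 0.3*0.7 + 0.7*0.3 = 0.42
print(var(s), var(x1) + var(x2))
```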

There are 3 solutions below.

BEST ANSWER

Here is what I was having difficulty with (solved before I understood variance, expectation, indicator random variables, and convolution):

1) The notation.

The family $F = \{X_1,\ldots,X_n\}$ represents all the trials. The notation is overloaded: idiomatically, $X$ denotes the random variable counting the total number of successes, while $X_i$ denotes the indicator of an individual Bernoulli trial. So $X = X_1 + \cdots + X_n$, not the set $\{X_1,\ldots,X_n\}$; once I saw that the symbol $X$ was being abused in this way, the rest fell into place.


2) The application of the variance formula on a single Bernoulli trial.

The book actually covered my problems, but I had to search around...and after working through the problem and typing this up, I sorted it out.

The definition of the expectation of $X$, the random variable (but really, function) $X: S \rightarrow \mathbb{R}$, is:

$$ E(X) = \sum_{x \in S}{\operatorname{X}(x)\operatorname{P}(x)} $$

On a discrete sample space, this sum can be regrouped by the values $y$ that $X$ takes:

$$ E(X) = \sum_y{y \cdot P(X = y)} $$

Next, variance is defined as: $E((X-E(X))^2)$, which is non-trivially shown to be $E(X^2) - E(X)^2$.
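That identity can be verified exactly on a small example using Python's `fractions`; the three-point distribution below is an arbitrary choice:

```python
from fractions import Fraction as F

# A small discrete distribution: values -> probabilities (summing to 1)
dist = {0: F(1, 2), 1: F(1, 3), 2: F(1, 6)}

E = sum(x * p for x, p in dist.items())
var_def = sum((x - E) ** 2 * p for x, p in dist.items())  # E((X - E(X))^2)
var_alt = sum(x**2 * p for x, p in dist.items()) - E**2   # E(X^2) - E(X)^2
print(var_def, var_alt)  # both 5/9
```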

Now, I write:

$$ \operatorname{var}(X_i) = E(X_i^2) - E(X_i)^2 = \sum_{y \in \{0,1\}}{y^2 \, P(X_i = y)} - \Big(\sum_{y \in \{0,1\}}{y \, P(X_i = y)}\Big)^2 $$

Finally, since only the $y = 1$ terms survive, I pull out:

$$ 1^2 \cdot p - (1 \cdot p)^2 = p - p^2 = p(1-p) $$


Now that the notation issue is cleared up, I know there will be $n$ of these terms when I apply this formula to the family fixed at the start. Since the trials are independent, the variances add, so $\operatorname{var}(X) = np(1-p)$.
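The single-trial calculation can be checked exactly with `fractions` ($p = 3/10$ is an arbitrary choice):

```python
from fractions import Fraction as F

p = F(3, 10)  # arbitrary success probability
probs = {0: 1 - p, 1: p}  # distribution of a single Bernoulli trial

E  = sum(y * q for y, q in probs.items())     # = p
E2 = sum(y**2 * q for y, q in probs.items())  # = p, since 0^2 = 0 and 1^2 = 1
print(E2 - E**2, p * (1 - p))  # both 21/100
```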



Here is the solution I settled on after I understood variance, expectation, indicator variables, and convolutions:

Let $X = X_1 + X_2 + \cdots + X_n$ be a sum of indicator variables. By linearity of expectation,

$$ \operatorname{E}(X) = \operatorname{E}(X_1) + \operatorname{E}(X_2) + \ldots + \operatorname{E}(X_n)\\ $$

Then I use the independence of the indicator variables (this is the key step, and the one I wished were clearer): for $i \ne j$,

$$ \operatorname{E}(X_i X_j) = \operatorname{E}(X_i)\operatorname{E}(X_j) = p^2 $$

while for $i = j$, since an indicator satisfies $X_i^2 = X_i$,

$$ \operatorname{E}(X_i^2) = \operatorname{E}(X_i) = p $$

Next,

$$ \begin{align} \operatorname{var}(X) &= \operatorname{E}(X^2) - \operatorname{E}^2(X)\\ \operatorname{E}(X^2) &= \sum_{i=1}^n\sum_{j=1}^n\operatorname{E}(X_i X_j) = np + (n^2-n)p^2\\ \end{align} $$

since the double sum has $n$ diagonal terms equal to $p$ and $n^2 - n$ off-diagonal terms equal to $p^2$.

Finally, by linearity of expectation,

$$ \operatorname{E}^2(X) = (np)^2 = n^2 p^2 $$

So, variance rewrites as:

$$ np + (n^2-n)p^2 - n^2p^2 = np - np^2 = np(1-p) $$
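The double-sum value $\operatorname{E}(X^2) = np + (n^2-n)p^2$ can be confirmed by exact enumeration over all outcomes (here $n = 4$ and $p = 1/3$, both arbitrary):

```python
from fractions import Fraction as F
from itertools import product

n, p = 4, F(1, 3)

# Exact E(X^2) for X = X_1 + ... + X_n, with the X_i independent Bernoulli(p)
E_X2 = F(0)
for outcome in product((0, 1), repeat=n):
    prob = F(1)
    for b in outcome:
        prob *= p if b else 1 - p
    E_X2 += sum(outcome) ** 2 * prob

print(E_X2, n * p + (n * n - n) * p * p)  # both 8/3
```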

ANSWER

For the time being, let us take the formula $$\operatorname{Var}[X_1 + \cdots + X_n] \overset{\text{ind}}{=} \operatorname{Var}[X_1] + \cdots + \operatorname{Var}[X_n]$$ for granted. Since $$X = \sum_{i=1}^n X_i,$$ where the $X_i$ are IID Bernoulli, namely $$\Pr[X_i = 1] = p, \quad \Pr[X_i = 0] = 1-p,$$ it is straightforward to compute $$\operatorname{E}[X_i] = 0\Pr[X_i = 0] + 1\Pr[X_i = 1] = p,$$ and consequently $$\operatorname{Var}[X_i] = (0 - p)^2 \Pr[X_i = 0] + (1 - p)^2 \Pr[X_i = 1] = p^2(1-p) + (1-p)^2 p = p(1-p)(p + 1-p) = p(1-p).$$ Therefore, $$\operatorname{Var}[X] = np(1-p)$$ as claimed.


Clarification on the calculation of the variance of a Bernoulli random variable.

Recall that for a discrete random variable $X$ with support $\Omega$, $$\operatorname{E}[X]= \sum_{x \in \Omega} x \Pr[X = x];$$ the sum is taken over all of the elementary outcomes of $X$. If $X$ is Bernoulli, then $\Omega = \{0,1\}$, and the above sum becomes $$\operatorname{E}[X] = 0 \Pr[X = 0] + 1 \Pr[X = 1].$$ Similarly, $$\operatorname{Var}[X] = \operatorname{E}[(X - \operatorname{E}[X])^2] = \sum_{x \in \Omega} (x - \operatorname{E}[X])^2 \Pr[X = x].$$ But since we already calculated $\operatorname{E}[X] = p$, we substitute: $$\operatorname{Var}[X] = \sum_{x \in \Omega} (x - p)^2 \Pr[X = x].$$ And again, since $\Omega = \{0,1\}$, this sum becomes $$(0 - p)^2 \Pr[X = 0] + (1 - p)^2 \Pr[X = 1].$$
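These two sums over $\Omega = \{0,1\}$ translate directly into Python ($p = 0.3$ is an arbitrary choice):

```python
p = 0.3  # arbitrary; Omega = {0, 1} for a Bernoulli variable
pr = {0: 1 - p, 1: p}

E = sum(x * pr[x] for x in (0, 1))               # 0*(1-p) + 1*p = p
Var = sum((x - E) ** 2 * pr[x] for x in (0, 1))  # (0-p)^2 (1-p) + (1-p)^2 p
print(E, Var)  # 0.3 and 0.21 (up to float rounding)
```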

ANSWER

$$X=\sum_{i=1}^nX_i$$

$$X^2=\left(\sum_{i=1}^nX_i\right)\left(\sum_{j=1}^nX_j\right)=\sum_{i=1}^n\sum_{j=1}^nX_iX_j$$

$$i=j\implies E(X_iX_j)=E(X_i^2)=E(X_i)=P(X_i=1)=p$$

$$i\ne j\implies E(X_iX_j)=P(X_iX_j=1)=P(X_i=1\text{ and }X_j=1)=P(X_i=1)P(X_j=1)=p^2$$

$$E(X^2)=\sum_{i=1}^n\sum_{j=1}^nE(X_iX_j)=np+(n^2-n)p^2$$

$$\operatorname{Var}(X)=E(X^2)-E(X)^2=np+(n^2-n)p^2-(np)^2=np-np^2=np(1-p)$$
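A Monte Carlo check of the final formula (the values $n = 10$, $p = 0.4$ are arbitrary; the true variance is $np(1-p) = 2.4$):

```python
import random

random.seed(1)
n, p, N = 10, 0.4, 100_000

# Simulate N draws of X = number of successes in n Bernoulli(p) trials
draws = [sum(1 for _ in range(n) if random.random() < p) for _ in range(N)]

m = sum(draws) / N
v = sum((x - m) ** 2 for x in draws) / N
print(v, n * p * (1 - p))  # sample variance should be close to 2.4
```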