Variance in a Bernoulli distribution.


I'm having trouble understanding the definition of the variance of a Bernoulli distribution. I thought that the variance was the sum of each data point's squared distance from the mean, divided by the number of data points. However, I see a different definition of the variance for Bernoulli distributions:

In general, it is useful to think about a Bernoulli random variable as a random process with only two outcomes: a success or failure. Then we build our mathematical framework using the numerical labels 1 and 0 for successes and failures, respectively. If p is the true probability of a success, then the mean of a Bernoulli random variable X is given by:

$$ \mu = E[X] = P(X = 0) \cdot 0 + P(X = 1) \cdot 1 $$ $$ = (1 - p) \cdot 0 + p \cdot 1 = 0 + p = p $$

Similarly, the variance of X can be computed:

$$ \sigma^2 = P(X = 0)(0 - p)^2 + P(X = 1)(1 - p)^2 $$

$$ = (1 - p)p^2 + p(1 - p)^2 $$

$$ = p(1 - p) $$
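(A quick numeric sanity check of the algebra above, not part of the quoted text; $p = 0.3$ is an arbitrary illustrative value.)

```python
# Check that the two-term variance sum equals the closed form p(1 - p).
# p = 0.3 is an arbitrary illustrative choice.
p = 0.3
var_from_definition = (1 - p) * (0 - p) ** 2 + p * (1 - p) ** 2
var_closed_form = p * (1 - p)
print(var_from_definition, var_closed_form)  # the two values agree
```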

What is going on above? That doesn't seem like the standard definition of variance. Why are we taking the probability of $X = 0$ and multiplying it by the squared difference of p from 0?

I suspect you are confusing the sample variance with the variance. The average squared deviation of observed data points is an estimator of $\text{Var}(X)$, not $\text{Var}(X)$ itself.

Variance is defined as $$ \mathbb E[(X-\mu)^2] $$ where $\mu = \mathbb E[X]$. For a Bernoulli random variable, $\mu = p$, so you immediately get $$ \mathbb E[(X-p)^2] = \mathbb P(X = 0)(0-p)^2 + \mathbb P(X = 1)(1-p)^2, $$ which is exactly the computation you quoted.
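A short simulation makes the distinction concrete: the average squared deviation of simulated draws is only an estimate that hovers near the true value $p(1-p)$. (This sketch is my addition; $p = 0.4$ and the sample size are arbitrary choices.)

```python
import random

# Simulate Bernoulli(p) draws and compare the sample variance
# (average squared deviation) with the true variance p(1 - p).
# p = 0.4 and n = 200_000 are arbitrary illustrative choices.
random.seed(1)
p = 0.4
n = 200_000
xs = [1 if random.random() < p else 0 for _ in range(n)]

mu_hat = sum(xs) / n
sample_var = sum((x - mu_hat) ** 2 for x in xs) / n

print(sample_var)   # an estimate; varies with the sample
print(p * (1 - p))  # the true variance p(1 - p)
```

Note that the sample variance changes from sample to sample, while $p(1-p)$ is a fixed property of the distribution.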