Understanding the Difference of Variance Formulas


I have been catching up on my statistics studies and have come to think of the variance of a distribution as $\frac{\sum_i{(x_i-\mu)^2}}{N}$, which makes sense to me as an average of the squared distances from the mean. But now, studying the binomial and Bernoulli distributions, I am seeing the variance expressed as $$E[(X-E[X])^2] = \sum_i{(x_i-\sum_j{x_jp(x_j)})^2p(x_i)}$$ I understand that the sample mean is $\sum_i x_i/N$ while the expected value is a weighted version, $E[X] = \sum_i{x_ip(x_i)}$, in which probability is not evenly distributed. But after seeing the variance expressed in the first form for so long, I don't understand how this second equation makes sense, or how probability fits into variance.


So for any set of points sampled from some discrete pdf $p(x)$, the mean value is given by:

\begin{equation} E[X] = \sum_i{x_ip(x_i)} \end{equation}
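As a quick sanity check of this weighted-sum formula, here is a small Python sketch using a hypothetical example (a fair six-sided die, where every $p(x_i) = 1/6$):

```python
# Expected value of a discrete random variable as a probability-weighted sum:
# E[X] = sum_i x_i * p(x_i). Fair-die example, so each outcome has p = 1/6.
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

expected = sum(x * p for x, p in zip(values, probs))
print(expected)  # 3.5
```

Note that with a uniform $p(x_i) = 1/N$ this reduces exactly to the familiar sample-mean formula, which is the connection the question is asking about.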

see here: https://en.wikipedia.org/wiki/Expected_value#Finite_case

Now for the Bernoulli distribution in particular, where the outcomes are $x_0 = 0$ and $x_1 = 1$ with $p(x_1) = p$, the theoretical mean value is given by:

\begin{equation} E[X] = p \end{equation}

see here: https://en.wikipedia.org/wiki/Bernoulli_distribution

The variance for Bernoulli is :

\begin{equation} Var(X) = E[(X-E[X])^2] = p(1-p) \end{equation}
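Writing out the definition over the two Bernoulli outcomes makes this explicit, since $E[X] = p$:

\begin{equation} E[(X-E[X])^2] = (0-p)^2(1-p) + (1-p)^2 p = p(1-p)\big(p + (1-p)\big) = p(1-p) \end{equation}

This is exactly the question's second formula specialized to two outcomes: each squared deviation from the mean is weighted by the probability of the outcome that produced it.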

Ultimately I believe you are confusing an empirical approximation of these statistics with their true, theoretical values, which we get if we assume $X$ is distributed according to that pdf.

Empirically, the variance of any random variable $X$, whatever pdf it is distributed according to, is estimated from $N$ samples by

\begin{equation} \frac{\sum_i{(x_i-\mu)^2}}{N} \end{equation}

And in the hope that $X$ indeed came from a Bernoulli distribution, we expect that as $N\rightarrow\infty$ this empirical variance approaches $p(1-p)$.
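A quick simulation sketch illustrates this convergence (the success probability $p = 0.3$ and the sample size are illustrative choices, not from the question):

```python
import random

random.seed(0)

p = 0.3          # illustrative Bernoulli success probability
N = 100_000      # large sample size, standing in for N -> infinity

# Draw N Bernoulli(p) samples: 1 with probability p, else 0.
samples = [1 if random.random() < p else 0 for _ in range(N)]
mu = sum(samples) / N

# Empirical variance: average squared distance from the sample mean.
empirical_var = sum((x - mu) ** 2 for x in samples) / N

theoretical_var = p * (1 - p)  # 0.3 * 0.7 = 0.21
print(empirical_var, theoretical_var)
```

For large $N$ the printed empirical value sits close to $0.21$, which is the theoretical $p(1-p)$.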

I do believe you are conflating these two ideas, especially because they take a similar form, and particularly so for the Bernoulli.

As for "I don't get how probability fits into variance": for finite (discrete) distributions you will find that probability enters explicitly, because the outcomes occur discretely and must be weighted by how likely they are. Also remember that probability is just a model for our understanding of the world.

Implicitly, when you calculate the empirical variance, you are assuming each $x_i$ is sampled in proportion to how often it occurs (i.e. according to its unseen, underlying distribution). In real life this sampling is done for us, so we don't need to include it in our formulas: each $x_i$ we observe is implicitly handed to us according to some unseen $p(x_i)$. But in a theoretical model we have to describe that data-generation process ourselves, so a $p(x)$ appears explicitly.