Intuitive understanding of an alternative variance equation


I think I have a fairly okay understanding of what the variance of a random variable is.

It's the expected squared distance between the values the random variable takes and the average of that random variable.

$\mathrm{Var}(X)=\mathrm{E}[(X-\mathrm{E}[X])^2]$

In a sense, it tells us how "expected" the expected value really is.

However, there's an alternative formula I can't wrap my head around.

$\mathrm{Var}(X)=[\mathrm{E}(X^2)-(\mathrm{E}(X))^2]$
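For what it's worth, a quick numerical check (with a toy distribution I made up) confirms the two formulas agree:

```python
# Toy discrete distribution (values and probabilities chosen arbitrarily).
xs = [1.0, 2.0, 5.0]   # values X can take
ps = [0.2, 0.5, 0.3]   # their probabilities (sum to 1)

mean = sum(p * x for p, x in zip(ps, xs))                   # E[X]
ex2  = sum(p * x * x for p, x in zip(ps, xs))               # E[X^2]
var1 = sum(p * (x - mean) ** 2 for p, x in zip(ps, xs))     # E[(X - E[X])^2]
var2 = ex2 - mean ** 2                                      # E[X^2] - (E[X])^2

print(var1, var2)  # the two numbers match (up to floating point)
```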

I've seen the derivation, but...is there an intuitive way to understand it?

Maybe an intuitive example that shows it? I hate that I'm just memorizing it...

Thanks!


Edit:

Well, here's what I have so far. If someone can finish it, that would be great!

Note that this formula for the variance is super intuitive in the case that the random variable is centered. That is, in the case that $E(X)$ is $0$.

That’s because in that case, the squared distances from the mean (which is $0$) are simply $X^2$, so the variance is $\mathrm{E}(X^2)$ and the second term disappears.

The variance of a random variable shouldn't change if we shift it over by some constant $+k$. That is, a centered random variable and a shifted copy of it should have the same variance.

Okay, so let’s say we shift the distribution to the right by some amount $+\mu$. We add $\mu$ to every possible output of $X$.

The expected value of $X$ will now be $\mu$, which means that $(\mathrm{E}[X])^2=\mu^2$.

All that’s left to show is that $\mathrm{E}(X^2)$ also increases by exactly $\mu^2$ when $\mu$ is added to each value…


Second Edit:

Alright, got it!

Continuing from the last edit, we want to show that $\mathrm{E}(X^2)$ increases by exactly $\mu^2$ when $\mu$ is added to each possible value of $X$.

Let the values that $X$ can take on be $x_1, x_2, ..., x_n$ with probabilities $p_1, p_2, ..., p_n$.

Right now, we have $\mathrm{Var}[X]=E[X^2]$, since $\mathrm{E}[X]=0$.

Writing that out in full, we have:

$p_1x_1^2+p_2x_2^2+...+p_nx_n^2=\mathrm{Var}[X]$

Now, we shift our distribution by $+\mu$. We now want to calculate:

$\mathrm{E}([X+\mu]^2)$

Writing it out in full, and simplifying a bit, we'll get:

$p_1(x_1+\mu)^2+p_2(x_2+\mu)^2+...+p_n(x_n+\mu)^2$

$p_1(x_1^2+2x_1\mu+\mu^2)+p_2(x_2^2+2x_2\mu+\mu^2)+...+p_n(x_n^2+2x_n\mu+\mu^2)$

$p_1x_1^2+p_2x_2^2+...+p_nx_n^2+2p_1x_1\mu+2p_2x_2\mu+...+2p_nx_n\mu+p_1\mu^2+p_2\mu^2+...+p_n\mu^2$

$p_1x_1^2+p_2x_2^2+...+p_nx_n^2+2\mu(p_1x_1+p_2x_2+...+p_nx_n)+\mu^2(p_1+p_2+...+p_n)$

We now recognize the three separate parts of our equation. I enclosed each in brackets.

$[p_1x_1^2+p_2x_2^2+...+p_nx_n^2]+[2\mu(p_1x_1+p_2x_2+...+p_nx_n)]+[\mu^2(p_1+p_2+...+p_n)]$

1. The first part is just $\mathrm{E}[X^2]$, which equals $\mathrm{Var}[X]$ since $X$ is centered.

2. The second part is $2\mu\cdot\mathrm{E}[X]$. Since $X$ was originally centered, that's $2\mu\cdot 0=0$.

3. The third part is $\mu^2$ multiplied by the sum of the probabilities, which is $1$.

Using these three pieces of information, we can simplify to:

$\mathrm{E}[(X+\mu)^2]=\mathrm{Var}[X]+\mu^2$

Which is exactly what we wanted!!!!!!!!
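Here's a quick numerical check of that key step (a toy distribution of my own choosing, with probabilities picked so that $\mathrm{E}[X]=0$):

```python
# A centered toy distribution: probabilities chosen so that E[X] = 0.
xs = [-2.0, 1.0, 3.0]
ps = [0.5, 0.25, 0.25]

mean = sum(p * x for p, x in zip(ps, xs))
assert abs(mean) < 1e-12                          # X really is centered

var_x = sum(p * x * x for p, x in zip(ps, xs))    # Var[X] = E[X^2] here

mu = 7.0                                          # arbitrary shift amount
e_shifted_sq = sum(p * (x + mu) ** 2 for p, x in zip(ps, xs))  # E[(X+mu)^2]

print(e_shifted_sq, var_x + mu ** 2)  # equal: shifting added exactly mu^2
```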


Let's just conclude by bringing this full-circle.

Originally, we said that the equation $\mathrm{Var}(X)=[\mathrm{E}(X^2)-(\mathrm{E}(X))^2]$ made sense when $X$ was centered.

This was because in that case $(\mathrm{E}[X])^2=0^2=0$. The mean of our random variable was $0$.

Then, the variance was simply given by $\mathrm{Var}[X]=\mathrm{E}(X^2)$. The expected squared distance from $0$, the mean.

Next, we realized that if we shifted our distribution by some amount $+\mu$, the variance of the distribution shouldn't change.

The second term in our equation would turn into $\mu^2$. So, we would be left with $\mathrm{E}[(X+\mu)^2]-\mu^2=\mathrm{Var}[X]$.

That must've implied that $\mathrm{E}[(X+\mu)^2]=\mathrm{Var}[X]+\mu^2$

Finally, we showed that indeed, that was true!

Hooray! Intuitive enough for me!

Answer:

I think the barrier to an intuitive interpretation of $E(X) = \mu$ and $V(X) = E(X^2) -\mu^2$ is in finding an intuitive mental image for $E(X^2).$

It might help to consider Bernoulli $X$ with $P(X = 1) = p$ and $P(X = 0) = 1-p.$ Since $X$ only takes the values $0$ and $1$, we have $X^2 = X$, so $E(X^2) = E(X) = p$ and $V(X) = p - p^2 = p(1-p).$

If you think of this in terms of a possibly biased coin, $E(X)$ is the probability of Heads. $V(X)$ is very small if the coin is heavily biased toward almost all Heads or almost all Tails. The largest variance, $1/4$, is for an unbiased coin, which has minimal predictability.
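A small script (my own illustration) makes the $p(1-p)$ behavior concrete:

```python
# Bernoulli variance p(1 - p): tiny for heavily biased coins,
# maximal (1/4) for a fair coin.
def bernoulli_var(p):
    e_x  = p           # E[X] = 1*p + 0*(1-p)
    e_x2 = p           # X^2 = X for a 0/1 variable, so E[X^2] = E[X]
    return e_x2 - e_x ** 2   # = p - p^2 = p(1 - p)

for p in (0.01, 0.5, 0.99):
    print(p, bernoulli_var(p))
```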