Why square the term $X - \mu$ in the definition of the variance?


Why is the variance $\operatorname{Var}(X)$ of a random variable $X$ defined to be $\operatorname{E}[(X-\mu)^2]$? My professor said that you want the variance to be positive, but why not go for $\operatorname{E}[|X-\mu|]$ then? (Just starting on this subject, so maybe a stupid question.)



BEST ANSWER

$\newcommand{\v}{\operatorname{var}}$Because if $X_1,\ldots,X_n$ are independent then $$\v(X_1+\cdots+X_n) = \v(X_1) + \cdots + \v(X_n). \tag 1$$

Nothing like that works for the mean absolute deviation.
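A quick numeric check of both claims, by exact enumeration of two independent fair coin flips (plain Python; the helper names `var` and `mad` are mine):

```python
from itertools import product

# Two independent fair coin flips, Bernoulli(1/2) each.
# Each of the 4 outcomes has probability 1/4, so plain averages are exact.
outcomes = list(product([0, 1], repeat=2))

def var(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def mad(values):
    m = sum(values) / len(values)
    return sum(abs(v - m) for v in values) / len(values)

x1 = [a for a, b in outcomes]
x2 = [b for a, b in outcomes]
s  = [a + b for a, b in outcomes]

# Variance is additive for independent variables:
print(var(s), var(x1) + var(x2))   # 0.5 and 0.5
# Mean absolute deviation is not:
print(mad(s), mad(x1) + mad(x2))   # 0.5 vs 1.0
```

So $\operatorname{var}(X_1+X_2) = \operatorname{var}(X_1)+\operatorname{var}(X_2)$ holds exactly, while the mean absolute deviation of the sum ($0.5$) is strictly less than the sum of the individual mean absolute deviations ($1.0$).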

Observe what this makes possible. Toss a fair coin $n$ times and ask how many times you get heads. The expected number is $n/2$, but what is the variance? By additivity it's $n\cdot\tfrac14 = n/4.$ Simple. But what's the mean absolute deviation? With a fair coin, this has a known answer, but it's complicated, and with a biased coin or with some other distribution it may have no tractable form at all. So what's the probability that the number of heads is in some range? $$ \lim_{n\to\infty} \Pr\left( \frac{(\text{number of heads}) - n/2}{\sqrt{n/4\,\,}} \in A \right) = \frac 1 {\sqrt{2\pi}} \int_A e^{-x^2/2} \,dx. $$ Without this, (a special case of) the central limit theorem, the problem would be much harder and maybe intractable, and we needed to know the variance of the number of heads to apply it. Having a measure of dispersion like the variance that satisfies line $(1)$ above makes this possible.
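One can check numerically how good this normal approximation already is at $n = 1000$: compare the exact binomial probability of being within about one standard deviation above $n/2$ against the normal integral, using $\sqrt{n/4}$ as the scale (standard library only; `phi` is my name for the standard normal CDF, and the continuity correction is a standard refinement, not part of the limit statement above):

```python
import math

n = 1000
sigma = math.sqrt(n / 4)           # standard deviation of the number of heads

def phi(x):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

k = 515                            # roughly one standard deviation above n/2
exact  = sum(math.comb(n, j) for j in range(k + 1)) / 2 ** n
approx = phi((k + 0.5 - n / 2) / sigma)   # with continuity correction
print(exact, approx)               # the two agree to a few decimal places
```

The exact tail sum and the normal approximation agree closely, and all the approximation needed from the binomial distribution was its mean $n/2$ and variance $n/4$.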

The special case of the central limit theorem that applies to fair coins was discovered in the first half of the 18th century by Abraham de Moivre, except that he computed the normalizing constant numerically without at first knowing that it is $1/\sqrt{2\pi}.$ James Stirling discovered that and communicated it to de Moivre, who I think then included it in a later edition of his book, The Doctrine of Chances.

(If I'm not mistaken, the mean absolute deviation of the number of heads in $n$ independent tosses of a fair coin is the same for $n$ as for $n+1,$ if $n$ has a certain parity -- either even or odd, but I don't remember which.)
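The parity claim in the parenthesis can be settled by exact computation. A sketch using exact rational arithmetic (`mad_binomial` is my name for the mean absolute deviation of the number of heads in $n$ fair tosses):

```python
from fractions import Fraction
import math

def mad_binomial(n):
    """Exact mean absolute deviation of Binomial(n, 1/2)."""
    half = Fraction(n, 2)
    return sum(Fraction(math.comb(n, k), 2 ** n) * abs(k - half)
               for k in range(n + 1))

for n in range(1, 8):
    print(n, mad_binomial(n))
# 1 and 2 give 1/2; 3 and 4 give 3/4; 5 and 6 give 15/16, ...
```

For these values the mean absolute deviation is the same for $n$ and $n+1$ exactly when $n$ is odd, consistent with the remark above.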


Variance works well with sums, as already mentioned in Michael Hardy's answer.

There is more that can be said: variance is (*) the quadratic form associated with an inner product, namely covariance, a measure of how two variables vary in relation to each other.

That is, covariance $\operatorname{Cov}(X,Y) := E((X-EX)(Y-EY))$ is sort of like the dot product $a\cdot b$ of two vectors $a$ and $b$, and then variance $\operatorname{Var}(X) = \operatorname{Cov}(X,X)$ is sort of like the squared length of a vector $a$, that is, $\|a\|^2$. Note that for vectors we have $a\cdot a = \|a\|^2$.

Now consider the "Pythagorean theorem". In the geometric case, we have $\|a+b\|^2 = \|a\|^2 + \|b\|^2$ if $a$ is orthogonal to $b$, i.e. if $a\cdot b = 0$. Accordingly, for random variables we have $\operatorname{Var}(X+Y) = \operatorname{Var}(X) + \operatorname{Var}(Y)$ if $X$ and $Y$ are uncorrelated, i.e. if $\operatorname{Cov}(X,Y) = 0$ (note that independent variables are uncorrelated, but not vice versa).
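A small exact check of this "Pythagorean theorem", including a pair of variables that are uncorrelated without being independent (Python with exact rationals; the example $Y = X^2$ with $X$ uniform on $\{-1,0,1\}$ is a standard one, and `E` is my shorthand for expectation):

```python
from fractions import Fraction as F

# X uniform on {-1, 0, 1}; Y = X^2 is dependent on X but uncorrelated with it.
xs = [-1, 0, 1]
p = F(1, 3)

def E(f):
    """Expectation of f(X) under the uniform distribution on xs."""
    return sum(p * f(x) for x in xs)

EX, EY = E(lambda x: x), E(lambda x: x * x)
cov     = E(lambda x: (x - EX) * (x * x - EY))
var_x   = E(lambda x: (x - EX) ** 2)
var_y   = E(lambda x: (x * x - EY) ** 2)
var_sum = E(lambda x: (x + x * x - EX - EY) ** 2)

print(cov)                      # 0: X and Y are uncorrelated
print(var_sum, var_x + var_y)   # 8/9 and 8/9: the "Pythagorean theorem" holds
```

Here $Y$ is a deterministic function of $X$, so the two are certainly not independent, yet $\operatorname{Cov}(X,Y)=0$ is all that the variance identity needs.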

This motivates why variance should be "quadratic". The length in this analogy, as opposed to the squared length (variance), is called the standard deviation, $\operatorname{sd}(X) = \sqrt{\operatorname{Var}(X)}$.


(*) ...if we identify variables that are equal almost surely.