Explanation behind 'Variance' in statistics


I know there are already some questions asked regarding this, but mine is a little different. I know that variance is calculated to measure how spread out the data is with respect to the mean.

So calculating the variance amounts to taking the squared differences between each value and the mean, summing them, and dividing by the number of data points we have.

Now, I have 2 questions:

1) Why don't we use absolute values of the differences instead of squaring them? (Maybe absolute values are not differentiable, or something like that?)

2) Why do we use 'N-1' instead of 'N' when dividing?
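To make the two questions concrete, here is a minimal sketch (with made-up data) of the three quantities involved: the variance with an $N$ divisor, the variance with an $N-1$ divisor, and the absolute-value alternative:

```python
# Hypothetical data, just for illustration
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean = sum(data) / len(data)  # 5.0

# Variance with divisor N (population form)
var_n = sum((x - mean) ** 2 for x in data) / len(data)        # 4.0
# Variance with divisor N-1 (sample form, question 2)
var_n1 = sum((x - mean) ** 2 for x in data) / (len(data) - 1) # ~4.571
# Mean absolute deviation (the alternative in question 1)
abs_dev = sum(abs(x - mean) for x in data) / len(data)        # 1.5

print(var_n, var_n1, abs_dev)
```

The two variance versions differ only by the divisor, which is exactly what question 2 is about.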

For (1), you would lose certain useful properties. For example, if $X$ and $Y$ are two independent random variables, then $\mathbb{V}ar(X+Y)=\mathbb{V}ar(X)+\mathbb{V}ar(Y)$. However, if we instead define the mean absolute deviation $V(A)=\mathbb{E}[|A-\mathbb{E}[A]|]$, then $V(A)+V(B) = \mathbb{E}[|A-\mathbb{E}[A]|] + \mathbb{E}[|B-\mathbb{E}[B]|]=\mathbb{E}[|A-\mathbb{E}[A]|+|B-\mathbb{E}[B]|]$, and there is no clean identity relating this to $V(A+B)$: by the triangle inequality we only get $V(A+B) \le V(A)+V(B)$, even when $A$ and $B$ are independent.
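A quick simulation illustrates the additivity point. This is a sketch using synthetic standard-library data (two independent normal samples); the helper names `var` and `mad` are my own:

```python
import random

random.seed(0)

def var(xs):
    """Population variance: mean squared deviation from the mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def mad(xs):
    """Mean absolute deviation from the mean (the V(.) in the answer)."""
    m = sum(xs) / len(xs)
    return sum(abs(x - m) for x in xs) / len(xs)

n = 200_000
X = [random.gauss(0, 1) for _ in range(n)]
Y = [random.gauss(0, 2) for _ in range(n)]  # independent of X
Z = [x + y for x, y in zip(X, Y)]

# Variance is additive for independent variables: Var(X+Y) ~ Var(X) + Var(Y) ~ 5
print(var(Z), var(X) + var(Y))
# Mean absolute deviation is not: V(X+Y) is strictly below V(X) + V(Y) here
print(mad(Z), mad(X) + mad(Y))
```

For these normal samples $V(X+Y)$ comes out well below $V(X)+V(Y)$, consistent with the triangle-inequality bound being strict.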

For (2), suppose you want to estimate the true variance $\sigma^2$ of a distribution. You collect a set of data, say $\{X_i\}_{i=1}^n$, and construct the statistic $S=\frac{1}{n-1}\sum_{i=1}^n {{(X_i-\overline{X})}}^2$. A theorem (this choice of divisor is known as Bessel's correction) tells you that $\mathbb{E}[S]=\sigma^2$, i.e. $S$ is an unbiased estimator. Intuitively, the deviations are measured from the sample mean $\overline{X}$, which is itself fitted to the data, so they are on average slightly smaller than deviations from the true mean; dividing by $n-1$ compensates. If you divide by $n$ instead, then $\mathbb{E}[S] = \frac{n-1}{n}\sigma^2 \ne \sigma^2$, and the estimator is biased low.
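The bias is easy to see by simulation. A minimal sketch, assuming a uniform distribution on $[0,1]$ (true variance $1/12$) and averaging both estimators over many small samples:

```python
import random

random.seed(1)

def sample_var(xs, ddof):
    """Sum of squared deviations from the sample mean, divided by n - ddof."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - ddof)

true_var = 1 / 12   # variance of Uniform(0, 1)
n = 5               # small samples make the bias visible
trials = 100_000

est_n = 0.0   # running average of the 1/n estimator
est_n1 = 0.0  # running average of the 1/(n-1) estimator
for _ in range(trials):
    xs = [random.random() for _ in range(n)]
    est_n += sample_var(xs, 0) / trials
    est_n1 += sample_var(xs, 1) / trials

print(est_n1)  # close to 1/12: unbiased
print(est_n)   # close to (n-1)/n * 1/12: biased low
```

The $1/(n-1)$ version averages to the true variance, while the $1/n$ version averages to about $\frac{n-1}{n}\sigma^2$, exactly as the expectation calculation predicts.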