Why square the result of $x_i - \bar{x}$ in the standard deviation?
I don't understand why it is necessary to square the result of $x_i - \bar{x}$ in $$\sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N-1}}.$$ In fact, I don't even understand why $N - 1$ appears in the denominator instead of just $N$. Could someone explain this or recommend a good text on the subject? Every book on error theory or statistics that I have found is either too abstract or too simplistic. Thanks in advance.
Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail). There are 5 answers below.
It is necessary to square the deviations from the mean because you want both positive and negative deviations to contribute to the spread (note that $\sum (x_i - \bar{x})$ is always zero). Another possibility is to take absolute values, but the formula above turns out to have nicer properties (such as additivity of variances, as pointed out by Arkamis).
Regarding the $N-1$ in the denominator: you would underestimate the standard deviation when dividing by $N$, since the true mean is not as close to $x_1, \ldots, x_N$ as the sample mean $\bar{x}$ is (in fact, $\bar{x}$ is calculated to be ''as close'' to the data points as possible). That $N$ has to be replaced by $N-1$ can be derived by working out the expected value of your formula for the variance (the square of the SD). It turns out that with $N-1$ the expected value equals the population variance, i.e. the sample variance is an unbiased estimator of the true variance.
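This underestimation is easy to check numerically. The following sketch (in Python, an illustrative choice not taken from the answer itself) draws many small samples from a population with known variance $1$ and averages the two competing estimators:

```python
import random

random.seed(0)

# Population: standard normal, so the true variance is exactly 1.0.
N = 5             # a small sample size makes the bias clearly visible
trials = 100_000

biased_total = 0.0
unbiased_total = 0.0
for _ in range(trials):
    sample = [random.gauss(0.0, 1.0) for _ in range(N)]
    xbar = sum(sample) / N
    ss = sum((x - xbar) ** 2 for x in sample)
    biased_total += ss / N          # divide by N
    unbiased_total += ss / (N - 1)  # divide by N - 1 (Bessel's correction)

print(biased_total / trials)    # ≈ (N-1)/N * 1.0 = 0.8, i.e. biased low
print(unbiased_total / trials)  # ≈ 1.0, i.e. unbiased
```

The divide-by-$N$ average settles near $0.8 = (N-1)/N$ of the true variance, exactly the shortfall the answer describes.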
The square is used to remove the effect of the sign of $x_i - \overline{x}$. Suppose your mean was 0, and you had measurements at -2 and +2. These would cancel, but squaring gets rid of that issue.
Now, you might ask, "why not use absolute value?" Great question! The reason is that with the squared definition, variances of independent variables are additive: $\textrm{Var}(x_1 + x_2 + \cdots + x_m) = \textrm{Var}(x_1) + \textrm{Var}(x_2) + \cdots + \textrm{Var}(x_m)$. If we used absolute values instead, this property would be lost.
As far as the $n-1$ term goes, it has to do with the fact that with $n$ data points, only $n-1$ degrees of freedom remain once the sample mean has been computed. Dividing by $n-1$ rather than $n$ removes the resulting bias in the variance estimate.
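Both halves of this answer can be checked with a short simulation. This sketch (Python; the uniform distribution is an arbitrary choice for illustration) shows that variance is additive for independent variables while the mean absolute deviation is not:

```python
import random
import statistics

random.seed(1)
n = 100_000

# Two large independent samples from Uniform(-1, 1).
x = [random.uniform(-1, 1) for _ in range(n)]
y = [random.uniform(-1, 1) for _ in range(n)]
s = [a + b for a, b in zip(x, y)]   # their sum, element-wise

def mad(data):
    """Mean absolute deviation from the mean."""
    m = sum(data) / len(data)
    return sum(abs(d - m) for d in data) / len(data)

var_sum = statistics.pvariance(s)
var_parts = statistics.pvariance(x) + statistics.pvariance(y)
print(var_sum, var_parts)        # nearly equal: variance is additive

print(mad(s), mad(x) + mad(y))   # clearly different: MAD is not additive
```

Here $\mathrm{Var}(X+Y) \approx \mathrm{Var}(X) + \mathrm{Var}(Y) \approx 2/3$, while the mean absolute deviation of the sum ($\approx 0.67$) falls well short of the sum of the individual deviations ($\approx 1.0$).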
Hint: You can measure the spread by the mean absolute deviation $|x - \bar{x}|$ or by the squared differences. Take the sequence $-3, 0, 3$; its mean $\bar{x}$ is $0$. If you just summed $x - \bar{x}$ without taking absolute values, the spread would come out as $0$, because the deviations cancel. The mean square avoids this situation and gives you an objective measure of spread (to bring the unit of spread back to that of the original measurements, you take the square root of it). As for dividing by $N-1$: it is the number of observations minus the number of parameters estimated from the data (the degrees of freedom used up). Here you calculate $\bar{x}$, which is one estimator, and hence subtract one.
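The $-3, 0, 3$ example from this hint, worked out in a few lines of Python (an illustrative sketch, not part of the original answer):

```python
data = [-3, 0, 3]
xbar = sum(data) / len(data)            # mean is 0.0

raw = sum(x - xbar for x in data)       # signed deviations cancel -> 0.0
abs_dev = sum(abs(x - xbar) for x in data) / len(data)   # 2.0

sq = sum((x - xbar) ** 2 for x in data)  # 9 + 0 + 9 = 18
sample_var = sq / (len(data) - 1)        # divide by N-1 = 2 -> 9.0
sd = sample_var ** 0.5                   # square root restores units -> 3.0
print(raw, abs_dev, sample_var, sd)
```

The raw sum of deviations is useless ($0$ for any data set), while both the absolute and squared versions report a genuine spread.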
Squaring the Deviations
The variance of a sample measures the spread of the values in a sample or distribution. We could do this with any function of $|x_k-\bar{x}|$. The reason that we use $(x_k-\bar{x})^2$ is because the variance computed this way has very nice properties. Here are a couple:
$1$. The variance of the sum of independent variables is the sum of their variances.
Since $X$ and $Y$ are independent, their joint probabilities multiply. Therefore, $$ \begin{align} \hspace{-1cm}\mathrm{Var}(X+Y) &=\sum_{i=1}^n\sum_{j=1}^m\Big[(x_i+y_j)-(\bar{x}+\bar{y})\Big]^2p_iq_j\\ &=\sum_{i=1}^n(x_i-\bar{x})^2p_i+\sum_{j=1}^m(y_j-\bar{y})^2q_j+2\sum_{i=1}^n(x_i-\bar{x})p_i\sum_{j=1}^m(y_j-\bar{y})q_j\\ &=\sum_{i=1}^n(x_i-\bar{x})^2p_i+\sum_{j=1}^m(y_j-\bar{y})^2q_j\\ &=\mathrm{Var}(X)+\mathrm{Var}(Y)\tag{1} \end{align} $$ The cross term vanishes because $\sum_{i=1}^n(x_i-\bar{x})p_i=0$.
$2$. The mean is the point from which the mean square deviation is minimized: $$ \begin{align} \sum_{i=1}^n(x_i-a)^2p_i &=\sum_{i=1}^n(x_i^2-2ax_i+a^2)p_i\\ &=\sum_{i=1}^n\left(x_i^2-2\bar{x}x_i+\bar{x}^2+(\bar{x}-a)(2x_i-\bar{x}-a)\right)p_i\\ &=\left(\sum_{i=1}^n(x_i-\bar{x})^2p_i\right)+(\bar{x}-a)^2\tag{2} \end{align} $$
Dividing by $\mathbf{n-1}$
Considering $(2)$, it can be seen that the mean square of a sample measured from the mean of the sample will be smaller than the mean square of the sample measured from the mean of the distribution. Quantifying this idea by computing the expected value shows that $$ \mathrm{E}[v_s]=\frac{n{-}1}{n}v_d\tag{3} $$ where $\mathrm{E}[v_s]$ is the expected value of the sample variance (computed with $n$ in the denominator) and $v_d$ is the distribution variance. $(3)$ explains why we estimate the distribution variance as $$ v_d=\frac1{n-1}\sum_{i=1}^n(x_i-\bar{x})^2\tag{4} $$ where $\bar{x}$ is the sample mean.
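Property $(2)$ above can be verified numerically. A minimal sketch in Python, with values and probabilities made up purely for illustration: the mean square deviation from any point $a$ exceeds the one from $\bar{x}$ by exactly $(\bar{x}-a)^2$.

```python
# A made-up discrete distribution: values with probabilities summing to 1.
xs = [1.0, 2.0, 4.0, 7.0]
ps = [0.1, 0.4, 0.3, 0.2]

xbar = sum(x * p for x, p in zip(xs, ps))   # distribution mean -> 3.5

def msq(a):
    """Mean square deviation measured from the point a."""
    return sum((x - a) ** 2 * p for x, p in zip(xs, ps))

# Identity (2): msq(a) = msq(xbar) + (xbar - a)^2 for every a,
# so a = xbar is where the mean square deviation is smallest.
for a in [0.0, 1.5, xbar, 5.0]:
    assert abs(msq(a) - (msq(xbar) + (xbar - a) ** 2)) < 1e-12
print(xbar, msq(xbar))
```

Measuring from any point other than the mean always inflates the mean square, which is precisely why the sample mean (the closest point to its own sample) understates the spread, as $(3)$ quantifies.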
Squaring $x_i-\bar x$:
If we didn't square it, we would just be adding up $x_i - \bar x$, and that will always give us zero. What we want instead is to total "how far" each $x_i$ is from $\bar x$.
So, we need to make sure we're averaging some positive quantity representing how far $x_i$ is from $\bar x$; one good choice is $(x_i - \bar x)^2$. Another is $|x_i - \bar x|$, which leads to the average absolute deviation. It turns out that standard deviation tends to be "nicer" for most uses, though both are measurements of how "spread out" your data is.
Using $N-1$:
Using $N-1$ instead of $N$ is called Bessel's correction; the Wikipedia page on Bessel's correction gives several proofs of why dividing by $N-1$ yields a better estimate of the population standard deviation.
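In practice, statistics libraries expose both conventions side by side. For example, Python's standard-library `statistics` module (the data set here is made up for illustration) has a population version dividing by $N$ and a sample version applying Bessel's correction:

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Population standard deviation: divides by N.
print(statistics.pstdev(data))   # -> 2.0

# Sample standard deviation: divides by N - 1 (Bessel's correction).
print(statistics.stdev(data))    # -> sqrt(32/7) ≈ 2.138
```

The sample version is always the larger of the two, compensating for the fact that deviations are measured from the sample mean rather than the unknown population mean.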