Query on the standard deviation formula


It is widely known that the variance formula is:

$S^{2}=\frac{\sum_{i=1}^{n}\left ( X_{i} - \overline{X} \right )^{2}}{n-1}$

and that the standard deviation formula is:

$S=\sqrt{\frac{\sum_{i=1}^{n}\left ( X_{i} - \overline{X} \right )^{2}}{n-1}}$

But if the purpose of squaring the difference $\left ( X_{i} - \overline{X} \right )^{2}$ is to eliminate the effect of the sign, would it not be more logical for the standard deviation formula to undo the square only in the numerator, rather than over the whole expression? Like this:

$S=\frac{\sqrt{\sum_{i=1}^{n}\left ( X_{i} - \overline{X} \right )^{2}}}{n-1}$

Can someone explain why it is not defined like this? Thanks.

There are 3 best solutions below

BEST ANSWER

[This is about the "purpose of the variance" part of the question.]

The purpose of squaring the error, i.e., $X_i-\bar X$, is not to eliminate sign effects; any other non-negative function of the error would do that just as well.

Gauss (1821) chose to square the error, and he admitted that this decision

"is made arbitrarily without a strong necessity"

Laplace proposed the absolute value instead, but Gauss argued against it: following Laplace, a doubled error would count only as much as the same error committed twice.

But his main reason was that the absolute value is not differentiable at zero. He states that

"This treatment [that of Laplace] opposes in a higher degree any analytic treatment whereas the results from our principle [squaring the error] distinguish in simplicity and in generality as well."

See https://archive.org/details/abhandlungenmet00gausrich/page/n17/mode/2up, p. 5f.

ANSWER

The most important reason is consistency. An estimator $\hat{\theta}_n$ of $\theta$ is consistent if for every $\epsilon>0$ $$ P(|\hat{\theta}_n-\theta|>\epsilon)\to 0,\quad n\to\infty, $$ and we write $\hat{\theta}_n\overset{P}{\to}\theta$. It can be proved that $S\overset{P}{\to}\sqrt{Var(X)}$, where $S$ is constructed from $n$ iid copies $X_1,...,X_n$ of $X$. This says that whenever our sample size is big we have a very good estimate of the real standard deviation.

If you instead define $S=\sqrt{\sum_{i=1}^n(X_i-\bar{X}_n)^2}/(n-1)$, then $S\overset{P}{\to}0$, which would make it useless as an estimator. This last claim can be proved with Slutsky's theorem.
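To see the difference concretely, here is a small simulation (a sketch in Python with NumPy; the seed and sample sizes are arbitrary choices) comparing the usual $S$ with the alternative definition as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 2.0  # true standard deviation

for n in (100, 10_000, 1_000_000):
    x = rng.normal(0.0, sigma, size=n)
    ss = np.sum((x - x.mean()) ** 2)
    s_usual = np.sqrt(ss / (n - 1))  # usual S: converges to sigma
    s_alt = np.sqrt(ss) / (n - 1)    # alternative: converges to 0
    print(n, s_usual, s_alt)
```

As $n$ grows, `s_usual` stabilizes near `sigma` while `s_alt` shrinks toward 0, which is exactly the inconsistency described above: the alternative behaves roughly like $\sigma/\sqrt{n}$.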

ANSWER

For a random sample $X_1, X_2, \dots, X_n$ from a normal population with variance $\sigma^2,$ the sample variance $S^2 = S_n^2 = \frac{1}{n-1}\sum_{i=1}^n(X_i -\bar X)^2$ has $E(S^2) = \sigma^2,\;$ $S_n^2 \stackrel{p}{\rightarrow} \sigma^2,\;$ and $\frac{(n-1)S^2}{\sigma^2} \sim \mathsf{Chisq}(\nu = n-1).$
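The chi-square relationship can be checked by simulation (a Python/NumPy sketch; seed, sample size, and replication count are arbitrary choices): the statistic $(n-1)S^2/\sigma^2$ should have mean $n-1$ and variance $2(n-1)$, matching a $\mathsf{Chisq}(n-1)$ distribution.

```python
import numpy as np

rng = np.random.default_rng(42)
mu, sigma, n, reps = 100.0, 15.0, 10, 100_000

xs = rng.normal(mu, sigma, size=(reps, n))
s2 = xs.var(axis=1, ddof=1)        # sample variances, divisor n-1
stat = (n - 1) * s2 / sigma**2     # should behave like Chisq(n-1)
print(stat.mean(), stat.var())     # near n-1 = 9 and 2(n-1) = 18
```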

The latter relationship gives rise to a 95% confidence interval for $\sigma^2$ of the form $\left(\frac{(n-1)S^2}{U},\frac{(n-1)S^2}{L}\right),$ where $L$ and $U$ are the .025 and .975 quantiles, respectively, of $\mathsf{Chisq}(n-1).$

Example: If a sample of size $n=500$ from $\mathsf{Norm}(\mu=100, \sigma=15)$ has $\bar X = 100.25$ and $S^2 = 205.77,$ then a 95% confidence interval for $\sigma^2$ is $(182.44, 233.89).$ Take square roots of the endpoints to get a CI for $\sigma:\,(13.51, 15.29).$

set.seed(426)  # to get same sample
x = rnorm(500, 100, 15)
mean(x)
[1] 100.2498
var(x)
[1] 205.7661
CI = 499*var(x)/qchisq(c(.975,.025), 499);  CI
[1] 182.4435 233.8900
sqrt(CI)
[1] 13.50716 15.29346

One advantage of this usual definition of $S^2,$ especially for normal data, is that there is a lot of distribution theory for using $S^2$ in testing hypotheses and making confidence intervals.

However, $E(S_n) < \sigma;$ the bias decreases as $n$ increases and is usually ignored for $n$ of moderate or large size. (Expectation is a linear operator and so does not necessarily survive nonlinear operations, such as taking square roots.)
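This downward bias is easy to see by simulation (a Python/NumPy sketch; the seed is arbitrary, and $n=5$ with $\sigma=1$ is chosen to make the bias visible):

```python
import numpy as np

rng = np.random.default_rng(7)
sigma, n, reps = 1.0, 5, 200_000

xs = rng.normal(0.0, sigma, size=(reps, n))
s = np.sqrt(xs.var(axis=1, ddof=1))  # sample standard deviations
print(s.mean())  # noticeably below sigma = 1 for n = 5
```

Even though $E(S^2)=\sigma^2$ exactly, taking the square root pulls the mean of $S$ below $\sigma$ (an instance of Jensen's inequality for a concave function).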

In particular, for normal data, $E(S_n) = \sigma\sqrt{\frac{2}{n-1}}\Gamma\left(\frac{n}{2}\right)/\Gamma\left(\frac{n-1}{2}\right).$ See Wikipedia for a discussion of the gamma function $\Gamma(\cdot).$ Here is a table made using R of multipliers of $\sigma$ for $n=5, 15, 25, 50:$

n = c(5, 15, 25, 50)
coef=sqrt(2/(n-1))*gamma(n/2)/gamma((n-1)/2)
cbind(n, coef)
      n      coef
[1,]  5 0.9399856
[2,] 15 0.9823162
[3,] 25 0.9896404
[4,] 50 0.9949113

If the population mean $\mu$ is known and the population variance is to be estimated from data, then the estimate $V=\frac{1}{n}\sum_{i=1}^n (X_i - \mu)^2$ is frequently used. Then $nV/\sigma^2 \sim \mathsf{Chisq}(n).$
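For the known-mean case, a quick simulation (Python/NumPy sketch; seed and parameters are arbitrary) confirms that the divisor $n$ is appropriate when $\mu$ is known, since $E(V)=\sigma^2$:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, reps = 10.0, 3.0, 8, 200_000

xs = rng.normal(mu, sigma, size=(reps, n))
v = ((xs - mu) ** 2).mean(axis=1)  # divisor n, since mu is known
print(v.mean())  # close to sigma**2 = 9
```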

Especially for nonnormal data, several alternative descriptions of the variability of a sample are in use, including

  • The sample interquartile range (difference between the 75th and 25th percentiles),
  • The sample range (largest sample value minus smallest), and
  • The mean absolute deviation $\frac{1}{n}\sum_{i=1}^n |X_i - \bar X|.$ Notice that this measure of variability 'gets rid of signs' without squaring.
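All three alternatives are straightforward to compute; here is a small Python/NumPy sketch on a made-up sample (the data values are arbitrary):

```python
import numpy as np

x = np.array([4.0, 7.0, 1.0, 9.0, 5.0, 3.0, 8.0])

q25, q75 = np.percentile(x, [25, 75])
iqr = q75 - q25                    # interquartile range
rng_ = x.max() - x.min()           # sample range
mad = np.abs(x - x.mean()).mean()  # mean absolute deviation
print(iqr, rng_, mad)
```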

Several other descriptions of sample variability are in use.