What is the difference between dispersion and mean root square deviation?


So the question is: why do we have the mean root square deviation (standard deviation)? Isn't it enough to use just the dispersion (variance)? Why is the standard deviation used if we already have the dispersion?

I'd be very happy to discuss this topic with you, guys! Thank you in advance!

There are 3 answers below.

The correct term is "root mean square", not "mean root square". "Dispersion" is a broad term: standard deviation is one measure of dispersion; mean absolute deviation is another; interquartile range is another, and so on. The main reason for using root-mean-square deviation is this: \begin{align} & \text{If } X_1,\ldots,X_n \text{ are independent} \\[6pt] & \text{then } \operatorname{var}(X_1+\cdots+X_n) = \operatorname{var}(X_1) + \cdots+\operatorname{var}(X_n). \end{align}
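This additivity is easy to check numerically. The sketch below (illustrative code, not from the answer; all names are made up) simulates three independent uniform(0, 1) variables, each with variance $1/12$, and compares the variance of their sum to the sum of their variances:

```python
import random

random.seed(0)
N = 200_000

def var(xs):
    """Population variance (divide by len, not len - 1)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Three independent uniform(0, 1) samples; each has variance 1/12.
a = [random.random() for _ in range(N)]
b = [random.random() for _ in range(N)]
c = [random.random() for _ in range(N)]
s = [x + y + z for x, y, z in zip(a, b, c)]

# var(a + b + c) should be close to var(a) + var(b) + var(c) = 3/12 = 0.25
print(var(s), var(a) + var(b) + var(c))
```

Note that no such clean identity holds for the standard deviation itself: in general $\operatorname{sd}(X_1+\cdots+X_n) \ne \operatorname{sd}(X_1)+\cdots+\operatorname{sd}(X_n)$.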


One reason for using the sample variance $S^2 = \frac{\sum_{i=1}^n(X_i-\bar X)^2}{n-1}$ is that $E(S^2) = \sigma^2,$ the population variance. The technical terminology is that $S^2$ is an unbiased estimator of $\sigma^2.$ You may want to look at a formal proof of this. [However, the sample standard deviation $S$ is not unbiased: we have $E(S) < \sigma.$ That is, on average the sample standard deviation $S$ slightly underestimates the population standard deviation $\sigma.$ Especially for small sample sizes $n,$ the shortfall may be noticeable. For $n = 10$ normal observations, $E(S) \approx 0.97\sigma;$ but for $n=25,$ $E(S) \approx 0.99\sigma.$]
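The bias of $S$ can be seen in a quick simulation (an illustrative sketch, not the answer's own code): draw many samples of size $n = 10$ from a standard normal, where $\sigma = 1$, and average the sample standard deviations.

```python
import random
import statistics

random.seed(1)
n, reps = 10, 50_000

# Average of the sample standard deviation S (n - 1 divisor) over many
# samples of size n from N(0, 1); this estimates E(S).
mean_S = sum(
    statistics.stdev([random.gauss(0, 1) for _ in range(n)])
    for _ in range(reps)
) / reps

print(mean_S)  # noticeably below sigma = 1
```

The average lands below 1, even though the average of $S^2$ over the same samples would be close to $\sigma^2 = 1$; squaring and then taking a square root do not commute with expectation.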

Moreover, for normal data, $S^2$ is the 'best' estimate of the population variance, according to widely accepted criteria for 'goodness'.

Also for normal data, one has $\frac{(n-1)S^2}{\sigma^2} \sim \mathsf{Chisq}(n-1).$ This provides a convenient method to make confidence intervals which give an idea how far from $\sigma^2$ the estimate $S^2$ might be.
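A minimal sketch of such an interval, assuming scipy is available for the chi-square quantiles (the data and names here are invented for illustration): invert $\frac{(n-1)S^2}{\sigma^2} \sim \mathsf{Chisq}(n-1)$ to get a 95% confidence interval for $\sigma^2.$

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)
n, sigma = 25, 3.0
x = rng.normal(0, sigma, size=n)
s2 = x.var(ddof=1)                       # sample variance S^2

# 95% CI for sigma^2: divide (n-1)S^2 by the upper/lower chi-square quantiles.
lo = (n - 1) * s2 / chi2.ppf(0.975, df=n - 1)
hi = (n - 1) * s2 / chi2.ppf(0.025, df=n - 1)
print(f"S^2 = {s2:.2f}, 95% CI for sigma^2: ({lo:.2f}, {hi:.2f})")
```

Note the interval is not symmetric about $S^2,$ since the chi-square distribution is skewed.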

By contrast, a major disadvantage of $S^2$ is that it is very sensitive to values of $X_i$ that are far from $\bar X.$ One unusually large 'deviation' from the mean $X_i - \bar X$ can cause $S^2$ itself to be unusually large. That is the reason one sometimes uses the other measures of dispersion mentioned in the Answer by @MichaelHardy. (The mean absolute deviation and interquartile range are sometimes called robust measures of dispersion.)
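The contrast is easy to demonstrate with made-up data (illustrative, not from the answer): replace one observation in a well-behaved sample with an outlier, and compare how $S^2$ and the interquartile range react.

```python
import statistics

clean = [9.8, 10.1, 9.9, 10.2, 10.0, 9.7, 10.3, 10.0, 9.9, 10.1]
with_outlier = clean[:-1] + [25.0]       # swap one value for an outlier

def iqr(xs):
    # quantiles(n=4) returns the three quartile cut points.
    q = statistics.quantiles(xs, n=4)
    return q[2] - q[0]

# The sample variance explodes; the IQR barely moves.
print(statistics.variance(clean), statistics.variance(with_outlier))
print(iqr(clean), iqr(with_outlier))
```

One aberrant point multiplies the variance by orders of magnitude, while the interquartile range is essentially unchanged; this is what "robust" means in practice.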

As an analogy, you may already be familiar with two different measures of 'centrality', the sample mean and sample median. The mean is often used, but there are situations in which the median is better.


What you are missing is that dispersion is a generic term, so there is no single formula for it: you cannot compute "the" dispersion.

Wikipedia currently lists seven different measures of dispersion (https://en.wikipedia.org/wiki/Statistical_dispersion#Measures), among them, of course, the standard deviation. One or another may be preferred for a particular property, such as efficiency, robustness, or ease of computation.
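To make the point concrete, here is a sketch (invented sample data) computing a few of those measures on the same dataset with Python's standard library; each is a legitimate answer to "how spread out is this data?", and they need not agree.

```python
import statistics

data = [2.0, 3.5, 4.1, 4.8, 5.2, 6.0, 7.3, 9.9]
mean = statistics.fmean(data)

std_dev = statistics.stdev(data)                              # standard deviation
mean_abs_dev = statistics.fmean(abs(x - mean) for x in data)  # mean absolute deviation
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                                                 # interquartile range
data_range = max(data) - min(data)                            # range

print(std_dev, mean_abs_dev, iqr, data_range)
```

All four are "the dispersion" in the generic sense; which one you report depends on what you want it to be good at.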