What's the point of variance?


The big question: Why should I use variance over standard deviation?

In what contexts should variance be used (on its own, not with SD)? I'm failing to understand the point of variance - anything I found on the internet about variance can easily be explained with SD.

3 Answers

Best Answer

Given a random sample $X_1, X_2, \dots, X_n$, the sample standard deviation $S$ is the (positive) square root of the sample variance $S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2,$ where $\bar X = \frac 1 n \sum_{i=1}^n X_i.$
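These definitions translate directly into code. A minimal sketch with made-up data values, using the $n-1$ (Bessel) denominator from the formula above:

```python
import math

def sample_variance(xs):
    """Sample variance S^2 with the n-1 (Bessel) denominator."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def sample_sd(xs):
    """Sample standard deviation S: the positive square root of S^2."""
    return math.sqrt(sample_variance(xs))

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical sample
print(sample_variance(data))  # 32/7, approximately 4.5714
print(sample_sd(data))        # approximately 2.1381
```

The standard library's `statistics.variance` and `statistics.stdev` use the same $n-1$ convention.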

Given a population and a continuous random variable $X$ with the population distribution, the standard deviation $\sigma$ is the (positive) square root of the population variance $\sigma^2 = \int_S (x-\mu)^2 f_X(x)\,dx,$ where $S$ is the support of the random variable, and $\mu = \int_S xf_X(x)\,dx.$ There are analogous formulas with sums (rather than integrals) for discrete random variables. [For continuous random variables or random variables taking a countable (non-finite) number of values, the population mean $\mu$ or variance $\sigma^2$ may not exist.]
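The discrete analogue mentioned above (sums in place of integrals) is easy to compute exactly. A sketch for a fair six-sided die, where each outcome has probability $1/6$:

```python
# Discrete analogue of the integral formulas: a fair six-sided die.
support = [1, 2, 3, 4, 5, 6]
p = 1 / 6  # each outcome equally likely

mu = sum(x * p for x in support)                  # population mean: 3.5
sigma2 = sum((x - mu) ** 2 * p for x in support)  # population variance: 35/12
```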

Thus standard deviations are defined in terms of variances. So if we had to do without one or the other it would be the standard deviations that would disappear.

For any distribution with finite variance, the sample variance is an unbiased estimator of the population variance: $E(S^2) = \sigma^2.$ However, unbiasedness of the sample variance does not imply unbiasedness of $S$ for estimating $\sigma.$ For example, in sampling from a normal population with standard deviation $\sigma,$ we have $E(S) = \sigma\sqrt{\frac{2}{n-1}}\Gamma(\frac n 2)/\Gamma(\frac{n-1}{2}) < \sigma.$
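The bias factor $E(S)/\sigma$ for normal samples (often called $c_4$ in quality-control texts) can be evaluated directly from the Gamma-function formula above:

```python
import math

def c4(n):
    """E(S)/sigma for a normal sample of size n, from the Gamma formula."""
    return math.sqrt(2 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)

for n in [2, 5, 10, 100]:
    print(n, c4(n))  # always < 1, approaching 1 as n grows
```

So $S$ systematically underestimates $\sigma$, though the bias vanishes for large $n$.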

For two independent, jointly distributed random variables $X$ and $Y,$ and $T = X+Y,$ one has $\sigma_T^2 = \sigma_X^2 + \sigma_Y^2$ and hence $\sigma_T = \sqrt{\sigma_X^2 + \sigma_Y^2}.$ Similarly, in sampling independently from two populations one has $\sigma_{\bar X}^2 = \sigma_X^2/n_1$ and $\sigma_{\bar Y}^2 = \sigma_Y^2/n_2.$ Thus $SD(\bar X - \bar Y) = \sqrt{\frac{\sigma_X^2}{n_1}+\frac{\sigma_Y^2}{n_2}}.$
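A quick numerical illustration of why these formulas are stated for variances. The population SDs below are hypothetical:

```python
import math

# Hypothetical population SDs for two independent random variables.
sigma_x, sigma_y = 3.0, 4.0

# Variances add for independent variables; SDs do not.
var_t = sigma_x ** 2 + sigma_y ** 2
sd_t = math.sqrt(var_t)
print(var_t, sd_t)  # 25.0, 5.0 -- note sd_t != sigma_x + sigma_y = 7.0

# SD of the difference of two sample means (n1, n2 are sample sizes).
n1, n2 = 30, 40
sd_diff = math.sqrt(sigma_x ** 2 / n1 + sigma_y ** 2 / n2)
```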

It is not possible in one page to give all of the preferences for variances over standard deviations, or the reverse. As a general statement:

  • Standard deviations are often preferred in applications because $S$ and $\sigma$ have the same units as observations from the population (e.g., cm). Variances have squared units (e.g., cm${}^2$). [See @Samuel's Answer (+1) for a few more words on this.]

  • Variances are often preferred in theory and derivations because the formulas tend to be simpler for variances.

  • Although the relationship between a variance and the corresponding standard deviation is simply a matter of taking the square root, there are many discussions in which it is convenient to use both variances and standard deviations.

Answer

Depends on what you are after.

Variance is how we calculate the standard deviation ($SD=\sqrt{VAR}$). Note that the units of variance are squared units. So, if we need to calculate a distance within our domain space, we take the square root to get back to unsquared units. This is why confidence-interval formulas use the SD: they describe a spread/distance in the domain space, which must have the same units as the domain. Hopefully this helps.
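To make the units point concrete, here is a sketch of a large-sample 95% confidence interval for a mean, with invented data; the margin of error is built from the SD, not the variance, precisely so that it is in the same units as the observations:

```python
import math
import statistics

# Hypothetical sample (say, measurements in cm).
data = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0]
n = len(data)
xbar = statistics.mean(data)
s = statistics.stdev(data)   # sample SD: same units (cm) as the data
z = 1.96                     # approximate 95% normal quantile
margin = z * s / math.sqrt(n)
ci = (xbar - margin, xbar + margin)
```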

Answer

The following answer is intentionally "hand-waving": it aims to give an intuitive answer rather than formal definitions. All the points can easily be translated into formal language, but I think that would better suit another thread with a theoretical/formal discussion.

  1. Instead of talking about "variance", I will talk about the mean squared error (MSE). Why? I guess you were motivated by applied considerations, so parameter estimation is probably what concerns you. In estimation theory there is a notion of Fisher information which, (very) roughly speaking, indicates the "amount" of information that a sample or estimator carries about a parameter of interest. In the so-called "regular cases" it can be defined as the variance of the score (the derivative of the log-likelihood), or equivalently as the expected curvature of the log-likelihood. Moreover, by the Cramér–Rao bound, the variance of an unbiased estimator is at least the inverse of the Fisher information; in this sense variance is an "opposite" of information, and lower bounds for the variance/MSE of estimators are expressed in terms of Fisher information.
    In all these settings, there is no need for the notion of SD.

  2. Another interesting instance of the use of variance is in linear regression and the notion of "explained variance", $R^2$. Only in the simple (one-predictor) model does its square root have a valuable meaning, as the correlation between the predictor and the response. Otherwise, it gives you nothing.

  3. Still in the context of regression: the "best" coefficients of the model, those that minimize the MSE, are the same ones that maximize the likelihood of the data (under Gaussian errors). Namely, MSE/variance can once again be seen as the natural notion, and SD is unnecessary.

  4. Another instance where the (sample) variance pops up is model selection. In "normal" (Gaussian) regression, minimizing the Akaike information criterion (AIC) is the same as minimizing the residual variance. Namely, the AIC, a criterion derived from information-theoretic motivations, happens to be related to the notion of variance.

  5. The variance operator generalizes immediately to higher dimensions (as the covariance matrix), while SD is more subtle and a pretty much unnecessary notion in high-dimensional settings.
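To make point 2 concrete, a minimal sketch of $R^2$ in simple linear regression, with invented data; the least-squares coefficients here are also the Gaussian maximum-likelihood ones, as point 3 notes:

```python
# Explained variance R^2 in simple linear regression (hypothetical data).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.1, 8.0, 9.9]

n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n

# Least-squares slope and intercept (also the Gaussian MLE).
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
sxx = sum((x - xbar) ** 2 for x in xs)
b = sxy / sxx
a = ybar - b * xbar

# R^2 = 1 - (residual sum of squares) / (total sum of squares),
# i.e., the fraction of the variance of y explained by the model.
ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
ss_tot = sum((y - ybar) ** 2 for y in ys)
r2 = 1 - ss_res / ss_tot
```

Everything here is phrased in variances (sums of squares); the SD never appears.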