Variance uses the squared difference: why not the 3rd or 4th power instead?


So there is this question about why variance is squared.

And the answer seems to be "because we get to do groovy maths when it is squared". Ok, that's cool, I can dig.

However, I'm sitting reading some financial maths, and a lot of the equations on pricing and risk are based on variance. "Because the maths is nicer" doesn't seem like the best justification for a formula used to price exotic vehicles worth millions (or billions).

To press the point: why not define a variance-like measure from the cube, the absolute cube, or the 4th power (or even a negative power)?

e.g.

$$\frac{1}{N}\sum_i |x_i - \bar x|^3$$

or

$$\frac{1}{N}\sum_i (x_i - \bar x)^4$$
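As a concrete sketch of what these alternative dispersion measures would compute, here is a quick comparison on simulated data (the variable names are mine; this just evaluates each formula side by side):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=100.0, scale=5.0, size=10_000)  # data with std dev ~5

dev = x - x.mean()

variance = np.mean(dev ** 2)           # the usual 2nd-power measure
abs_cubed = np.mean(np.abs(dev) ** 3)  # the proposed |.|^3 measure
fourth = np.mean(dev ** 4)             # the proposed 4th-power measure

print(variance, abs_cubed, fourth)
```

Note how the higher powers blow up much faster as the spread grows: for this data the 2nd-power measure is near $5^2 = 25$, while the 4th-power measure is near $3 \cdot 5^4 = 1875$, so the numbers are not on comparable scales.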

Would using a different power measurably alter pricings/valuations if the equations still used "variance" as usual, but with that variance calculated from the different power?

Is there a reason why we stopped at "power of 2", and are there any implications of using a variance concocted from a different (higher or lower) power?

There are 3 best solutions below

Answer 1:

In principle, decisions involving large amounts of money should be made using the nonlinear utility of money. However, that is subjective and hard to quantify.

Answer 2:

One reason why variance is a natural measure is that it is a special case of covariance, which by the simple arithmetic of multiplication measures how two variables tend to move together.

In the gathering of data, there are invariably errors. In any statistical estimate, we must accept errors. However, big errors are worse than small ones. Squaring errors magnifies big errors and reduces small ones. So, by minimizing the mean squared error of our estimate, we are avoiding big errors (while tolerating small ones) as far as possible.
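To make "minimizing the mean squared error" concrete: the constant estimate that minimizes the sum of squared errors is the sample mean, while minimizing the sum of absolute errors gives the median. A minimal brute-force sketch of my own (scanning a grid of candidate estimates):

```python
import numpy as np

x = np.array([1.0, 2.0, 2.0, 3.0, 50.0])  # one large "error" in the data

# Scan candidate constant estimates and keep the one minimizing each loss.
candidates = np.linspace(0, 60, 60001)
best_sq = min((np.sum((x - c) ** 2), c) for c in candidates)[1]
best_abs = min((np.sum(np.abs(x - c)), c) for c in candidates)[1]

# best_sq lands at the mean (11.6), best_abs at the median (2.0):
# the squared loss is pulled hard toward the big error, the absolute
# loss much less so.
```

This is the arithmetic behind "squaring errors magnifies big errors": the squared-loss minimizer gets dragged a long way by the single large observation.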

We could use fourth powers of errors or deviations, which would weight the measure of variability even more heavily on the largest errors. There might be applications where a case could be made for doing this, although the mathematics wouldn't be pleasant.
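To see how much more heavily a higher power weights the largest errors (simple arithmetic, my own illustration): if one deviation is ten times another, the absolute value weights it 10 times as heavily, the square 100 times, and the fourth power 10,000 times.

```python
small_dev, big_dev = 1.0, 10.0

abs_weight = abs(big_dev) / abs(small_dev)       # 10x
squared_weight = big_dev ** 2 / small_dev ** 2   # 100x
fourth_weight = big_dev ** 4 / small_dev ** 4    # 10,000x
```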

In finance, a good case could be made for an asymmetric error measure. For example, a negative error might equate to a loss, which we strongly wish to avoid, while we are happy with a positive error, which represents a gain. Thus, instead of minimizing $\sum_i(x_i-\bar x)^2$, we minimize (say) $\sum_i\exp(\bar x-x_i)$. This kind of revision would need rewriting the statistical theory from scratch, though.
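A tiny numerical sketch of that asymmetry (my own illustration of the penalty $\exp(\bar x - x_i)$ from above): for a 10% loss and a 10% gain around a zero mean, the squared error treats both identically, while the exponential penalty punishes the loss more than the gain.

```python
import numpy as np

mean_return = 0.0
outcomes = np.array([-0.10, 0.10])  # a 10% loss and a 10% gain

squared_penalty = (outcomes - mean_return) ** 2   # symmetric: equal for both
exp_penalty = np.exp(mean_return - outcomes)      # asymmetric: loss costs more
```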

Answer 3:

There are statistical quantities based on the third and fourth powers. They are called, respectively, skew and kurtosis.

Skew is relatively easy to demonstrate. It is an asymmetry in the two tails of the distribution (or the lack of a tail altogether on one side). For example, pick a chi-squared distribution with a small number of degrees of freedom; it has an obvious skew. (The skew is present but smaller for larger numbers of degrees of freedom.)
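That behaviour is easy to check by simulation. Below is a sketch (my own; the skew estimator is the standard third-central-moment form) showing that sampled chi-squared data with 2 degrees of freedom has much larger skew than with 50. For reference, the theoretical skew of a chi-squared distribution is $\sqrt{8/\mathrm{df}}$: 2.0 for df=2, about 0.4 for df=50.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_skew(x):
    # Third central moment, scaled by the standard deviation cubed.
    d = x - x.mean()
    return np.mean(d ** 3) / np.mean(d ** 2) ** 1.5

skew_small_df = sample_skew(rng.chisquare(df=2, size=200_000))
skew_large_df = sample_skew(rng.chisquare(df=50, size=200_000))
```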

Kurtosis measures how much a distribution tends to have outliers ("heavy tails").
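For example (my own sketch, using the standard fourth-central-moment estimator): a normal distribution has kurtosis about 3, while the heavier-tailed Laplace distribution has kurtosis about 6.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_kurtosis(x):
    # Fourth central moment, scaled by the variance squared (not "excess").
    d = x - x.mean()
    return np.mean(d ** 4) / np.mean(d ** 2) ** 2

k_normal = sample_kurtosis(rng.normal(size=200_000))    # near 3
k_laplace = sample_kurtosis(rng.laplace(size=200_000))  # heavy tails, near 6
```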

Neither one of these is intrinsically any better at explaining the movements of billions of dollars in the stock market than explaining the movement of handfuls of dollars at a blackjack table. You can just as easily get $10^{12}$ by squaring $10^6$ as by taking the fourth power of $10^3$, so the amount of money involved is pretty much irrelevant mathematically.

There is, in fact, an infinite series of central moments of a probability distribution, of which the variance is the second; the mean plays the role of the first moment (strictly, it is the first raw moment, since the first central moment is zero by construction). Skew and kurtosis are based on the third and fourth central moments. The reason you don't see much use of moments higher than the second is that, ironically, their effects are secondary to the effects of the variance, despite the higher exponents in their definitions. In fact, the first moment is in many ways the most important; that's why we call it the expected value.
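The pattern behind this family of quantities is one formula with a varying exponent. A minimal sketch (my own helper name) that computes the $n$-th central moment and checks the first few against a normal sample with mean 5 and standard deviation 2:

```python
import numpy as np

def central_moment(x, n):
    """n-th sample central moment: mean of (x - mean)^n."""
    return np.mean((x - x.mean()) ** n)

rng = np.random.default_rng(3)
x = rng.normal(loc=5.0, scale=2.0, size=100_000)

m1 = central_moment(x, 1)  # zero by construction
m2 = central_moment(x, 2)  # the variance, near 2^2 = 4
m3 = central_moment(x, 3)  # near 0 for a symmetric distribution
```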


As for using the absolute value in order to "correct" the third power: actually, one of the ways that people have tried to make statistics more robust (less susceptible to being overly influenced by a few rare "outlier" observations) is to take the absolute value of the linear deviation from the mean (or better still, deviation from the median). That is, the square is in some ways already too high a power of the deviation to do statistics as well as we might like. But the squared deviation has the advantage of several very convenient properties that make it much easier to work with than an absolute value of an odd power. Going to a higher power and putting an absolute value on it would combine all the disadvantages of variance and absolute deviation, magnified (literally).
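That robustness difference is easy to demonstrate (my own sketch): add one wild outlier to a small data set and compare how much the standard deviation inflates versus the mean absolute deviation from the median.

```python
import numpy as np

clean = np.array([9.8, 10.0, 10.1, 9.9, 10.2, 10.0])
dirty = np.append(clean, 100.0)  # one wild outlier

def mad_from_median(x):
    # Mean absolute deviation from the median.
    return np.mean(np.abs(x - np.median(x)))

std_ratio = dirty.std() / clean.std()                   # squared-deviation measure
mad_ratio = mad_from_median(dirty) / mad_from_median(clean)

# Both inflate, but the squared-deviation measure inflates far more:
# the square literally magnifies the outlier's influence.
```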