Does "Expected Absolute Deviation" or "Expected Absolute Deviation Range" exist in stats and have another name?


So everyone is familiar with Variance and Standard Deviation from high school, but it seems few people can give a philosophical justification for such weird, seemingly arbitrary measures. After all, why square everything? Just to make it positive?

Anyway, people use these measures to define what is meant by a particular value falling within "an expected range", or what counts as an "outlier" or "unusual data point". Now I'm looking to define the same concept, but this time without using some arbitrarily chosen measure - I want something with a logical foundation, i.e., something that actually means "expected range".

After some digging around I discovered the Mean Absolute Deviation: http://en.wikipedia.org/wiki/Absolute_deviation#Mean_absolute_deviation

But what I really want is not the Mean Absolute Deviation, but the Expected Absolute Deviation. It's easy to modify the formula on the wiki page to produce this, we just add in a p_i term so that each Absolute Deviation is multiplied by the probability of that x_i occurring.
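To make that concrete, here is a minimal sketch in plain Python of the quantity described above: each absolute deviation |x_i - mean| weighted by the probability p_i of that x_i. The function name is mine, just for illustration.

```python
# Sketch: "Expected Absolute Deviation" E[|X - mu|] for a discrete
# distribution given as values xs with probabilities ps.

def expected_abs_deviation(xs, ps):
    """Return E[|X - mu|], where P(X = xs[i]) = ps[i]."""
    mu = sum(x * p for x, p in zip(xs, ps))          # mean of the distribution
    return sum(p * abs(x - mu) for x, p in zip(xs, ps))

# Fair six-sided die: mu = 3.5, so the deviations are 2.5, 1.5, 0.5 (twice each)
xs = [1, 2, 3, 4, 5, 6]
ps = [1 / 6] * 6
print(expected_abs_deviation(xs, ps))  # ≈ 1.5
```

(Note that when the p_i come from the distribution itself rather than from sample frequencies, this is exactly the expectation E[|X - E[X]|].)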

But then, after quite a bit of thought, I realized that even that doesn't really define what an "outlier" or "reasonable result" means. This is because if we define the range of "normality" to be "Mean +/- Expected Absolute Deviation", we are ignoring the asymmetry of the distribution (if it's asymmetrical). We need to compute an Upper Expected Absolute Deviation and a Lower Expected Absolute Deviation, giving an Expected Absolute Deviation Range. For example, a value of Mean + 0.1 might be within expectation for some distribution X, but Mean - 0.1 might not be, due to an asymmetry.
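One subtlety worth flagging before sketching this: the *unconditional* one-sided deviations E[(X - mu)⁺] and E[(mu - X)⁺] are always equal to each other, because E[X - mu] = 0, so they cannot capture asymmetry. One natural reading that does capture it is the *conditional* mean deviation on each side, E[mu - X | X < mu] and E[X - mu | X > mu]. A sketch under that assumption (the function name is mine, not a standard term):

```python
def one_sided_deviations(xs, ps):
    """Return (lower, upper) where
       lower = E[mu - X | X < mu]  and  upper = E[X - mu | X > mu],
    for the discrete distribution P(X = xs[i]) = ps[i]."""
    mu = sum(x * p for x, p in zip(xs, ps))
    # Total probability mass on each side of the mean
    p_lo = sum(p for x, p in zip(xs, ps) if x < mu)
    p_hi = sum(p for x, p in zip(xs, ps) if x > mu)
    # Conditional mean deviation below / above the mean
    lower = sum(p * (mu - x) for x, p in zip(xs, ps) if x < mu) / p_lo
    upper = sum(p * (x - mu) for x, p in zip(xs, ps) if x > mu) / p_hi
    return lower, upper

# Right-skewed example: P(X=0)=0.5, P(X=1)=0.4, P(X=10)=0.1, so mu = 1.4
lower, upper = one_sided_deviations([0, 1, 10], [0.5, 0.4, 0.1])
# lower ≈ 0.956, upper ≈ 8.6: the "normal range" extends much
# further above the mean than below it.
```

So a range like [Mean - lower, Mean + upper] reflects the skew of the distribution, which is exactly the asymmetry issue described above.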

So my question is: does such a line of reasoning already exist in statistics, and if so, does it go by some other name? I'd be very interested to know whether such ranges have been derived for various common distributions, particularly the Binomial and Multinomial. Any recommended books, links, or literature would be much appreciated.

Thanks


So this is not directly relevant to your final question, but it may help you understand why Standard Deviation and Variance are such commonly used concepts.

Variance is one of the moments of a probability distribution. I'm not sure what your math background is, so I won't go into what moments really are, but what's important to understand is that if you have a probability distribution, the first moment is the mean, the second central moment is the variance, and the third central moment (after standardization) gives the skewness, which can help you deal with the asymmetry of a distribution.
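A small sketch of those three quantities, computed directly from their definitions for a discrete distribution (the function name is mine; skewness here is the usual standardized third central moment):

```python
def first_three_moments(xs, ps):
    """Return (mean, variance, skewness) of the discrete distribution
    P(X = xs[i]) = ps[i]. Skewness is the third central moment
    divided by the standard deviation cubed."""
    mu = sum(x * p for x, p in zip(xs, ps))                 # 1st moment
    var = sum(p * (x - mu) ** 2 for x, p in zip(xs, ps))    # 2nd central moment
    m3 = sum(p * (x - mu) ** 3 for x, p in zip(xs, ps))     # 3rd central moment
    return mu, var, m3 / var ** 1.5

# Binomial(n=2, p=0.25): P(X=0)=0.5625, P(X=1)=0.375, P(X=2)=0.0625
mu, var, skew = first_three_moments([0, 1, 2], [0.5625, 0.375, 0.0625])
# mu = 0.5, var = 0.375, skew ≈ 0.816 (right-skewed, as expected for p < 0.5)
```

A symmetric distribution (e.g. a fair die) would give skewness 0, while the right-skewed binomial above gives a positive value.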

The reason this is important is that if you have a probability distribution with bounded support, then its moments uniquely determine the distribution. Although the three moments above aren't enough to determine it uniquely, you can actually pin down a fair amount about the distribution with just those three values. If you used something other than variance to measure spread, it could be useful in many ways, but it would not feed into reconstructing the distribution function as directly.

As mentioned in symplectomorphic's comment, there are other reasons too but the ability to use the moments of the probability distribution to determine the distribution is one of them.