Implications of Standard Deviation > Mean


I have a numerical dataset in which the standard deviation is larger than the mean and I am interested to know what this fact can tell me about the dataset. I have drawn the following conclusions on my own:

1. The mean is skewed by negative values, extreme outliers or both
2. The dataset is not normally distributed
3. The median will provide a better estimate of a "typical" value than the mean

Can you find any flaw in them? Are there any other conclusions I can draw about this dataset from the fact that the standard deviation is larger than the mean? I'm interested in:

- implications for describing such a dataset, 
- deciding how "typical" a given value or subset of values from the dataset is,
- performing analyses on the dataset as a whole 

Sorry if this is overly vague or theoretical. I have many datasets that I can supply as examples, along with specific questions about each, but I'm choosing to ask a more general question in the hope of getting a deeper understanding of what this phenomenon means.

Best answer

(1) is meaningless because "and" and "or" do not have the same meaning.

(1) using "or" is trivially true because any normal distribution has negative values.

(1) using "and" is false because a distribution of a non-negative random variable might have a mean but infinite standard deviation, or because a normal distribution can have negative mean.
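As a rough illustration that neither negative values nor extreme outliers are needed, here is a minimal sketch with a made-up non-negative sample whose standard deviation exceeds its mean. The data values are an assumption chosen for easy arithmetic, and the example covers only the finite case, not the infinite-variance one mentioned above.

```python
import statistics

# Hypothetical binary sample: no negative values, nothing far from the rest.
data = [0, 0, 0, 1]

mean = statistics.fmean(data)   # 0.25
sd = statistics.pstdev(data)    # population SD, about 0.433

print(f"mean={mean:.3f}  sd={sd:.3f}  sd > mean: {sd > mean}")
```

Any 0/1 sample in which fewer than half the values are 1 behaves the same way, since the mean is p and the population standard deviation is sqrt(p(1 - p)).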

(2) [the median claim, which is (3) in the current list] is typically true, because the median is the 50th percentile and is robust, whereas the mean can be changed arbitrarily by even a single data point and thus often gives a misleading picture when the data is not normally distributed (income, for example).
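The robustness point can be sketched with a tiny made-up sample: changing a single value moves the mean as far as you like while the median stays put. The numbers below are hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical sample, and the same sample with one value replaced.
base = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 5000]

base_mean = statistics.fmean(base)
base_median = statistics.median(base)
outlier_mean = statistics.fmean(with_outlier)
outlier_median = statistics.median(with_outlier)

print(f"base:         mean={base_mean}, median={base_median}")
print(f"with outlier: mean={outlier_mean}, median={outlier_median}")
```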

[Okay now with the new list...]

(2) is still false because any normal distribution with negative mean will have a (positive) standard deviation which would be greater than the mean.
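A quick simulated check of this counterexample, assuming an arbitrary normal distribution with mean -5 and standard deviation 1 (both parameters, the sample size, and the seed are made up for the sketch):

```python
import random
import statistics

random.seed(0)  # arbitrary seed, only so the run is repeatable

# Hypothetical parameters: normally distributed data with a negative mean.
sample = [random.gauss(-5.0, 1.0) for _ in range(10_000)]

sample_mean = statistics.fmean(sample)
sample_sd = statistics.stdev(sample)

# The standard deviation is positive and the mean is negative,
# so SD > mean holds even though the data is normal.
print(f"mean={sample_mean:.3f}  sd={sample_sd:.3f}  sd > mean: {sample_sd > sample_mean}")
```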

Comparing the mean with the standard deviation is typically not meaningful.