Why are the statistical properties of the mode and median difficult to determine?


I read in a book that the statistical properties of the mode and the median are difficult to determine compared to those of the mean. I am not entirely sure why this is so, and the book does not provide further explanation.

Is it because the mean can be put into mathematical functions more easily than the mode and median? What is meant by "statistical properties", and what does it mean to determine them?

I would appreciate any further explanation or examples. Thanks for all the help!


There are 3 best solutions below

4
On BEST ANSWER

By statistical properties we generally mean how well a quantity computed from a sample gives us "reasonable" ideas about the population. The mean is better suited to this purpose (just as you mention), mainly because it has a simple mathematical expression.

A sample mean is an unbiased estimator of the population mean.
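To make "unbiased" concrete, here is a minimal sketch that enumerates every possible sample (with replacement) from a tiny population and averages the sample mean over all of them. The population `[1, 2, 9]` and the sample size are made-up values chosen only for illustration, and the comparison with the sample median is an added contrast, not something claimed above.

```python
from fractions import Fraction
from itertools import product
from statistics import median

# Hypothetical toy population and sample size, chosen only for illustration.
population = [1, 2, 9]
sample_size = 3

# Every equally likely sample drawn with replacement (3**3 = 27 of them).
samples = list(product(population, repeat=sample_size))


def exact_mean(values):
    """Arithmetic mean as an exact fraction (avoids floating-point error)."""
    return Fraction(sum(values), len(values))


def average_over_samples(statistic):
    """Expected value of a statistic over all equally likely samples."""
    return sum(Fraction(statistic(s)) for s in samples) / len(samples)


avg_sample_mean = average_over_samples(exact_mean)
avg_sample_median = average_over_samples(median)

print(avg_sample_mean, exact_mean(population))  # 4 4
print(avg_sample_median, median(population))    # 32/9 2
```

For this particular population the sample mean averages out to exactly the population mean, while the sample median does not average out to the population median. That is one small example, not a general statement about every distribution.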

Also, one of the main reasons why the mean is so important is the central limit theorem.

However, I must mention that the median has a nice geometric interpretation: it divides the data set into two equal parts, and the sum of the absolute distances to the data points is smallest when measured from the median. Even so, the median is generally much tougher to work with mathematically.
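The minimizing property can be checked numerically. The sketch below uses a made-up data set and searches a grid of integer candidates for the point with the smallest sum of absolute distances; the names `data`, `sum_abs_dev` and `candidates` are illustrative only.

```python
from statistics import median

# Hypothetical data set with an odd number of points, so the median is unique.
data = [1, 2, 3, 10, 50]


def sum_abs_dev(center):
    """Sum of absolute distances from `center` to every data point."""
    return sum(abs(x - center) for x in data)


# Try every integer from 0 to 50 as a candidate center.
candidates = range(0, 51)
best = min(candidates, key=sum_abs_dev)

print(best, median(data))  # 3 3
```

A grid search is only a sanity check, not a proof. It also hints at why the median is awkward to handle analytically: the absolute value function is not differentiable at the data points.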

For me, the mode is just a convenient "fast summary" of the data set, and again it is not very suitable for mathematical treatment.

I haven't been entirely rigorous in my answer, and my opinion about the mode is a matter of "taste"; I have yet to see many applications of the mode as a measure of central tendency, other than as an intuitive description of data.

0
On

I think the simplest explanation is that the mode and median don't tell you much of what you'd expect about a data set, and the mode in particular is very difficult to determine in a useful manner for some data sets. If the data are measured on a continuous scale and values rarely repeat, there is no meaningful mode, because every value has a frequency of either 0 or 1. The median is much easier and can usually be determined fairly well for any data set, but it doesn't tell you a huge amount about the overall layout of the data. Consider a data set with 49 points equal to 0, 2 equal to 1, and 49 equal to 16000: the median is 1 while the mean is 7840.02, and the mean gives you something more useful here as far as the overall value is concerned.
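The example data set is easy to reproduce with Python's standard `statistics` module. The second list, `timings`, is a made-up stand-in for finely measured data in which no value repeats.

```python
from statistics import mean, median, multimode

# The data set described above: 49 zeros, two ones, 49 values of 16000.
data = [0] * 49 + [1] * 2 + [16000] * 49

print(mean(data))       # 7840.02
print(median(data))     # 1.0
print(multimode(data))  # [0, 16000] (two values tie for most frequent)

# Hypothetical high-precision measurements with no repeated values.
timings = [12.031, 12.507, 13.244, 14.118]

# Every value occurs exactly once, so every value is reported as a mode.
print(multimode(timings) == timings)  # True
```

Note that the mode is not even unique for the first data set, and it carries no information at all for the second one.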

As for how easy they are to put into mathematical functions, the mean is significantly easier. For a discrete data set it is the sum of the values divided by the number of values; for a continuous distribution it is the corresponding integral. The median is much more difficult for continuous distributions. In the discrete case you find the point with exactly half of the data below it and half above it. In the continuous case you have to solve for the point at which the integral of the density reaches one-half, which is much more difficult. The mode is also very difficult. In a non-dense discrete data set (e.g. the time taken to do something, measured so precisely that there is likely no more than one data point at each value), all values will generally have a frequency of 1 or 0, so any definition of the mode would be trivial and tell you nothing about the data set. For a continuous distribution, finding the mode means finding the global maximum of the density function, which is arguably easier than finding the median.
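As a rough illustration of the continuous case, the sketch below finds the median of an exponential distribution by solving $F(m) = 1/2$ with bisection. The choice of distribution, the rate of 2.0 and the helper names are assumptions made for the example. The exponential case happens to have a closed-form median, $\ln 2 / \lambda$, which makes the numerical answer easy to check; many distributions do not.

```python
import math


def exponential_cdf(x, rate):
    """CDF of the exponential distribution: F(x) = 1 - exp(-rate * x)."""
    return 1.0 - math.exp(-rate * x)


def median_by_bisection(cdf, lo, hi, tol=1e-12):
    """Solve cdf(m) = 1/2 on [lo, hi], assuming cdf is increasing there."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < 0.5:
            lo = mid  # the median lies to the right of mid
        else:
            hi = mid  # the median lies at or to the left of mid
    return (lo + hi) / 2


rate = 2.0  # hypothetical rate parameter
numeric_median = median_by_bisection(lambda x: exponential_cdf(x, rate), 0.0, 100.0)
closed_form_median = math.log(2) / rate
distribution_mean = 1.0 / rate  # the mean is a one-line formula

print(numeric_median, closed_form_median, distribution_mean)
```

The mean comes straight out of an integral, whereas the median needs an equation to be solved, in general numerically.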

0
On

Under appropriate conditions, the mean depends continuously on the distribution, but the median and mode don't. For example, consider the Bernoulli distribution with parameter $p$. The mean is $p$, but the median and mode jump discontinuously from $0$ to $1$ (and are not uniquely defined at $p=1/2$).
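A small sketch of the Bernoulli example, evaluated just below, at, and just above $p = 1/2$. The function names are made up for illustration, and returning `None` at $p = 1/2$ is simply a convention chosen here to flag that the value is not unique.

```python
def bernoulli_mean(p):
    """Mean of Bernoulli(p); it varies continuously with p."""
    return p


def bernoulli_median(p):
    """Median of Bernoulli(p); None marks the non-unique case p = 1/2."""
    if p < 0.5:
        return 0
    if p > 0.5:
        return 1
    return None


def bernoulli_mode(p):
    """Mode of Bernoulli(p); None marks the tie at p = 1/2."""
    if p < 0.5:
        return 0
    if p > 0.5:
        return 1
    return None


for p in (0.49, 0.5, 0.51):
    print(p, bernoulli_mean(p), bernoulli_median(p), bernoulli_mode(p))
```

Moving $p$ from 0.49 to 0.51 changes the mean by 0.02, but flips both the median and the mode from 0 to 1.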