Mean vs. Median: When to Use?


I know the difference between the mean and the median.

  • The mean of a set of numbers is the sum of all the numbers divided by the cardinality.
  • The median of a set of numbers is the middle number, when the set is organized in ascending or descending order (and, when the set has an even cardinality, the mean of the middle two numbers).
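As a quick check of the two definitions above, here is a minimal sketch using Python's standard `statistics` module. The sample values and variable names are made up for illustration.

```python
from statistics import mean, median

odd_sample = [10, 1, 4]        # sorted: 1, 4, 10
even_sample = [20, 1, 7, 4]    # sorted: 1, 4, 7, 20

# Mean: sum of the values divided by how many there are.
odd_mean = mean(odd_sample)
even_mean = mean(even_sample)

# Median: the middle value after sorting, or the mean of the
# two middle values when the count is even.
odd_median = median(odd_sample)
even_median = median(even_sample)

print(odd_mean, odd_median)
print(even_mean, even_median)
```

In both samples the two summaries already disagree, because one value sits far from the rest.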

It seems to me that they're often used interchangeably, both to give a sense of what's going on in some data.

Do they mean (pun intended) different things? When should one be used over the other?


There are 2 best solutions below

0
On BEST ANSWER

Almost all analytic calculations on sets of data are more natural in terms of the mean than the median. For example, the $z$-test for the significance of a discrepancy relative to the null hypothesis deals with the sample mean and the sample standard deviation (taken from the unbiased variance estimate).
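For concreteness, the test statistic is commonly written as below. The symbols are not part of the original answer and are assumed here: $\bar{x}$ is the sample mean, $s$ the sample standard deviation, $n$ the sample size, and $\mu_0$ the mean under the null hypothesis. Using $s$ in place of a known population $\sigma$ is usually justified only for large $n$.

$$z = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

Nothing in this expression involves the median, which is the sense in which such calculations are more natural in terms of the mean.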

The median, and particularly the difference between the median and the mean, is useful to characterize how "skewed" the data is (although the skew, which depends on the third moment about the mean, is also useful for that).
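A rough sketch of both skewness indicators mentioned here, using only the standard library. The helper `skewness` and the two samples are hypothetical; the helper computes the population skewness, that is, the third central moment divided by $\sigma^3$.

```python
from statistics import mean, median, pstdev

def skewness(xs):
    """Population skewness: third central moment divided by sigma cubed."""
    m = mean(xs)
    s = pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 3, 4, 40]   # one large value stretches the right tail

for name, xs in [("symmetric", symmetric), ("right-skewed", right_skewed)]:
    gap = mean(xs) - median(xs)   # positive gap suggests a longer right tail
    print(name, gap, skewness(xs))
```

For the symmetric sample both indicators are zero; for the right-skewed sample both are positive. The mean-median gap is only a heuristic, and the two indicators need not agree in sign for every data set.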

The real use of the median comes when the data set may contain extreme outliers (perhaps due to errors in early processing of the sample numbers, or a serious bias in the sample gathering procedure). Then describing the distribution in terms of quartiles (with the median dividing the second from the third quartile) can be more informative than quoting $\mu$ and $\sigma$.
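To illustrate, here is a small sketch with invented measurements in which a single value is corrupted (a hypothetical processing error that drops a decimal point). It compares the mean and standard deviation with the quartiles from `statistics.quantiles`.

```python
from statistics import mean, median, quantiles, stdev

clean = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
# Hypothetical processing error: the last reading lost its decimal point.
corrupted = clean[:-1] + [103.0]

for name, xs in [("clean", clean), ("corrupted", corrupted)]:
    # n=4 returns three cut points; the middle one is the median.
    q1, q2, q3 = quantiles(xs, n=4)
    print(f"{name}: mean={mean(xs):.2f} sd={stdev(xs):.2f} "
          f"quartiles=({q1:.2f}, {q2:.2f}, {q3:.2f})")
```

With the corrupted value, the mean and standard deviation change drastically while the median does not move at all, which is why the quartile description stays informative.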

2
On

The median is particularly handy for describing data with a significant skew or a long tail. For example, if we looked at incomes, we would find a small number of rock stars, corporate executives, and hedge-fund managers each taking home multi-million-dollar salaries. These outliers carry more weight in the calculation of the mean than they do in the calculation of the median, so mean income is higher than median income. The median income would be closer to something we associate with the middle class.
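A toy version of the income example, with entirely made-up figures: nine ordinary earners and one very high earner.

```python
from statistics import mean, median

# Made-up annual incomes in dollars; the last entry is the outlier.
incomes = [32_000, 38_000, 41_000, 45_000, 48_000,
           52_000, 55_000, 61_000, 70_000, 5_000_000]

mean_income = mean(incomes)
median_income = median(incomes)

# How many people earn less than the mean?
below_mean = sum(x < mean_income for x in incomes)

print(mean_income, median_income, below_mean)
```

Here nine of the ten people earn less than the mean, while the median falls in the middle of the ordinary earners.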

Means are great when the distribution has been well studied and is well understood (e.g., when it is normally distributed). Then the mean and standard deviation tell us just about everything we care to know.