Why are there different methods for finding quartiles?


Different programs and devices use different algorithms to find quartiles. A TI-83 Plus will give an answer different from Excel, Minitab, and so on. Why are there so many different methods? Why is it acceptable for different programs to give different quartiles? I am getting a different answer from each method and am unsure which one, if any, is correct.


You are correct that slightly different algorithms for 'quartiles' are implemented in various calculators, spreadsheets, and statistical software packages.

Roughly speaking, the idea is that the lower quartile $(Q_1),$ the median, and the upper quartile $(Q_3)$ divide the sorted sample into four 'chunks' with (approximately) the same number of observations in each chunk. Particularly for small samples, there is no obvious way to do this, and compromises of some sort must be made. There are different algorithms simply because different people have different ideas about how to make the compromises. They may also have slightly different objectives in mind for how the quartiles will be used in practice.

For example, if you have ten sorted observations: 1, 3, 4, 6, 8, 9, 11, 11, 15, 19, there is no obvious way to find the quartiles.
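
To make the ambiguity concrete, here is the arithmetic for the lower quartile alone under three position rules that appear in common software. The notation is mine, not the question's: $n = 10,$ $p = 0.25,$ $x_{(k)}$ is the $k$th sorted value, and $h$ is the (possibly fractional) position assigned to the quartile.

$$
\begin{aligned}
h = np &= 2.5 &&\Rightarrow\ \text{somewhere between } x_{(2)} = 3 \text{ and } x_{(3)} = 4,\\
h = (n+1)p &= 2.75 &&\Rightarrow\ x_{(2)} + 0.75\,(x_{(3)} - x_{(2)}) = 3.75,\\
h = (n-1)p + 1 &= 3.25 &&\Rightarrow\ x_{(3)} + 0.25\,(x_{(4)} - x_{(3)}) = 4.5.
\end{aligned}
$$

Even the first rule splits in two: rounding the position up to the next observation gives 4, while interpolating halfway gives 3.5. None of these choices is wrong; they are simply different conventions.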

R statistical software (excellent, and free from www.r-project.org) gives you a choice of nine different algorithms; the sample printouts below show several of them applied to my example. Some 'types' insist on giving only exact sample values as quartiles, others can pick values between sample values. Some types supposedly work better with normal (or other) distributions than others. And so on.

So the answers to your questions are 'actual ambiguity' and 'pig-headed pride with inability to reach consensus'.

 x = c(1, 3, 4, 6, 8, 9, 11, 11, 15, 19)
 quantile(x, type=1)
 0%  25%  50%  75% 100% 
  1    4    8   11   19 
 quantile(x, type=2)
 0%  25%  50%  75% 100% 
1.0  4.0  8.5 11.0 19.0 
 quantile(x, type=3)
 0%  25%  50%  75% 100% 
  1    3    8   11   19 
 quantile(x, type=4)
 0%  25%  50%  75% 100% 
1.0  3.5  8.0 11.0 19.0 
 quantile(x) # 'type=7' is the default
 0%  25%  50%  75% 100% 
1.0  4.5  8.5 11.0 19.0 

Fortunately, in practice, quantiles are most often used for very large datasets. And for large datasets, differences among algorithms become less noticeable and are often unimportant.

Here are results using a sample of 1000 observations from a normal population with mean 100 and standard deviation 15, rounded to three decimal places. (You can pretend they are achievement test scores.)

 y = round(rnorm(1000, 100, 15), 3)
 quantile(y) # (type 7 again)   
       0%      25%      50%      75%     100% 
  48.7950  90.6825  98.8225 109.0528 146.6820 
 quantile(y, type=1)
      0%     25%     50%     75%    100% 
  48.795  90.663  98.808 109.044 146.682 
 quantile(y, type=3)
      0%     25%     50%     75%    100% 
  48.795  90.663  98.808 109.044 146.682 
 quantile(y, type=4)
      0%     25%     50%     75%    100% 
  48.795  90.663  98.808 109.044 146.682 
 quantile(y, type=6) # used by Minitab
      0%      25%      50%      75%     100% 
 48.7950  90.6695  98.8225 109.0702 146.6820 
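
If you want to check the large-sample claim yourself, the following is a minimal sketch, not a definitive study. It draws normal samples of increasing size and reports the largest disagreement in the lower quartile among R's nine types. The helper name spread_q1, the seed, and the sample sizes are my own arbitrary choices, and the numbers will change with a different seed.

```r
set.seed(1)  # arbitrary seed, used only so the run is reproducible

# Largest gap between the nine quantile() types for the 25% point
# of one simulated sample of size n (hypothetical helper, my naming).
spread_q1 <- function(n) {
  y <- rnorm(n, 100, 15)
  q1 <- sapply(1:9, function(t) quantile(y, 0.25, type = t, names = FALSE))
  diff(range(q1))  # max minus min across the nine types
}

sizes <- c(10, 100, 1000, 10000)
res <- sapply(sizes, spread_q1)
print(data.frame(n = sizes, max_gap = res))
```

Because every type places the quartile at or between a couple of neighbouring sorted values, the gap is limited by how far apart those neighbours are, and in a large sample they tend to sit close together.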

If you have access to R and are curious to pursue this further, you could look at the R help page (enter ?quantile) under the 'type' argument. But as a beginning statistician, you might do better just to understand that these differences exist and decide not to worry much about them. Follow the rule in your textbook and expect your calculator or software to give different answers sometimes.
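
As an illustration of what a textbook rule can look like, here is a minimal sketch of one convention taught in many introductory courses: take $Q_1$ and $Q_3$ as the medians of the lower and upper halves of the sorted data, leaving the overall median out of both halves when the sample size is odd. The function name halves_quartiles is hypothetical, and whether your own textbook or calculator uses exactly this rule is an assumption you should check against it.

```r
# Quartiles as medians of the lower and upper halves of the sorted data.
# For odd n, the middle observation is excluded from both halves
# (one common convention; other books include it in both).
halves_quartiles <- function(x) {
  x <- sort(x)
  n <- length(x)
  stopifnot(n >= 2)            # need at least one value in each half
  k <- n %/% 2                 # number of observations in each half
  lower <- x[seq_len(k)]       # smallest k values
  upper <- x[(n - k + 1):n]    # largest k values
  c(Q1 = median(lower), median = median(x), Q3 = median(upper))
}

x <- c(1, 3, 4, 6, 8, 9, 11, 11, 15, 19)
halves_quartiles(x)
```

For the ten observations above, the halves are 1, 3, 4, 6, 8 and 9, 11, 11, 15, 19, so this rule gives $Q_1 = 4,$ median 8.5, and $Q_3 = 11.$ That matches R's type 2 on this sample but not the default type 7.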