I have the following set of numbers: 1, 1, 8, 12, 13, 13, 14, 16, 19, 22, 27, 28, 31
I'm supposed to calculate the value of the 1st quartile (25th percentile) in this data set.
- Using the formula in my math book gives me 10.
- Using Excel's or Google Sheets' built-in QUARTILE function gives me 12.
- Using Wolfram Alpha https://www.wolframalpha.com/input/?i=first+quartile+(1,+1,+8,+12,+13,+13,+14,+16,+19,+22,+27,+28,+31) gives me 11.
Somebody has noted the same behavior in Python in a Stack Overflow post (https://stackoverflow.com/a/53551756).
Can someone please explain why the results differ in these different methods? What makes this very specific data set special so that these methods give different values? Which of these values is the correct / most correct one and why?
Many thanks in advance!
The issue here is that (unlike the median) there is no universally recognised definition of the quartiles for a sample. Wikipedia gives three possible methods of calculating the lower quartile.
For an even number of points, these will all give the same answer, which is the median of the smaller half of the sample. But if you have an odd number of points, they will often give three different answers.
This is because for an even number of points, the $1/4$ point of your data either hits one of the points exactly or falls exactly in between two points (in the same way that the halfway point does for any number of points). For an odd number of points, it falls nearer to one point than any other, but not exactly on any point. For $13$, as you have, you really want the $3.75$th data point. The three methods (in Wikipedia order) come down to taking the midpoint of the $3$rd and $4$th points, which gives $10$; the $4$th point itself, which gives $12$; and the point three quarters of the way from the $3$rd to the $4$th, which gives $11$.
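To make the comparison concrete, here is a minimal Python sketch of the three Wikipedia methods applied to the data from the question. The function names (`q1_exclusive`, `q1_inclusive`, `q1_weighted`) are my own labels, not standard library names, and the code is only an illustration of the definitions as I read them, not a reference implementation.

```python
from statistics import median

def q1_exclusive(data):
    """Method 1: median of the lower half, leaving out the overall median when n is odd."""
    xs = sorted(data)
    return median(xs[: len(xs) // 2])

def q1_inclusive(data):
    """Method 2: median of the lower half, keeping the overall median when n is odd."""
    xs = sorted(data)
    return median(xs[: (len(xs) + 1) // 2])

def q1_weighted(data):
    """Method 3: weighted average of two neighbouring points when n is odd."""
    xs = sorted(data)
    n = len(xs)
    if n % 2 == 0:
        # Even n: same as the other two methods.
        return median(xs[: n // 2])
    k, r = divmod(n, 4)
    if r == 1:
        # n = 4k + 1: 25% of the k-th value plus 75% of the (k+1)-th (1-indexed).
        return 0.25 * xs[k - 1] + 0.75 * xs[k]
    # n = 4k + 3: 75% of the (k+1)-th value plus 25% of the (k+2)-th (1-indexed).
    return 0.75 * xs[k] + 0.25 * xs[k + 1]

data = [1, 1, 8, 12, 13, 13, 14, 16, 19, 22, 27, 28, 31]
print(q1_exclusive(data), q1_inclusive(data), q1_weighted(data))
```

On this data set the three functions return 10, 12 and 11 respectively, matching the textbook, spreadsheet and Wolfram Alpha values in the question.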
I don't think any of these is "more correct" than any other; if one were, probably everyone would agree. If you're learning this for an exam, your exam board will presumably have a policy on which of these they give credit for.
Of course, the concept of percentile is most important for large samples and, in that context, these definitions all yield results that are close to one another. The differences, in fact, almost surely approach zero as the sample size increases.
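A rough way to see this shrinking gap is to draw samples of increasing size and measure how far apart the three lower-quartile values are. The sketch below assumes uniform random draws on $[0, 1)$ and sample sizes of the form $4k + 1$, both chosen by me purely for illustration; `q1_spread` is a hypothetical helper name.

```python
import random
from statistics import median

def q1_spread(sample):
    """Largest minus smallest of the three lower-quartile values, for len(sample) = 4k + 1."""
    xs = sorted(sample)
    n = len(xs)
    if n % 4 != 1:
        raise ValueError("this sketch only handles n = 4k + 1")
    k = n // 4
    exclusive = median(xs[: n // 2])           # lower half without the median
    inclusive = median(xs[: n // 2 + 1])       # lower half with the median
    weighted = 0.25 * xs[k - 1] + 0.75 * xs[k] # 25% / 75% mix of neighbours
    values = (exclusive, inclusive, weighted)
    return max(values) - min(values)

random.seed(0)  # fixed seed so the run is repeatable
for n in (13, 101, 1001, 10001):
    sample = [random.random() for _ in range(n)]
    print(n, q1_spread(sample))
```

For $n = 4k + 1$ the spread is half the distance between the $k$th and $(k+1)$th sorted values, so it shrinks as the points become more densely packed.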