How do you use cumulative frequency percentages to find percentiles?


I want to find the 25th, 50th, and 75th percentiles of this data set.

data-set: https://i.stack.imgur.com/FTLwu.jpg

I am not sure how to use the cumulative frequency percentages to work out what the 25th, 50th, and 75th percentiles are.

I am guessing for the 50th percentile: I know this is generally the median, or middle number, but I cannot be sure this is where 50% of the data clusters. My guess is that p_50 corresponds to 36.6% and would therefore be on line 5.

So the median would be at 5, or at 36.6%, with a cumulative frequency of 605.

Please help

Best answer

It is always difficult to deal with an image that you have to leave the main page to see.

Here is the main idea: In your data the lower quartile (25th percentile) is at 5 because the cumulative percent for (up to and including) 4 is 19.8% < 25%, and the cumulative percent for 5 is 36.6% > 25%.
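One way to write this rule down, as a sketch: let $F(v)$ be the cumulative percent at value $v$, that is, the percentage of observations at or below $v$. The symbols $F$ and $x_p$ are introduced here only for illustration. Under this rule the $p$th percentile is the smallest value whose cumulative percent reaches $p$:

$$x_p = \min\{\, v : F(v) \ge p \,\}.$$

With $F(4) = 19.8 < 25$ and $F(5) = 36.6 > 25$, this gives $x_{25} = 5$. Sources differ on what to do when $F(v)$ equals $p$ exactly, so treat the $\ge$ as one possible convention.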

You can use the same method to get the median (50th percentile) and the upper quartile (75th percentile). The median is $6$, as you say in a comment.

Below I will show another dataset (generated using R statistical software) and a summary that shows lower quartile, median, and upper quartile. You can use this example to check whether you understand the fundamental idea.

    x = sort(rbinom(60, 10, .7))
    x
     [1]  3  4  4  4  5  5  6  6  6  6  6  6  6  6  6
    [16]  6  6  6  7  7  7  7  7  7  7  7  7  7  7  7
    [31]  7  7  7  7  8  8  8  8  8  8  8  8  8  8  8
    [46]  8  8  8  8  8  8  8  8  8  8  9  9  9  9 10
    summary(x)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
       3.00    6.00    7.00    7.05    8.00   10.00

The lower quartile is $6$ because only 6 of the 60 observations (10% < 25%) are at or below $5$, whereas 18 of the 60 (30% > 25%) are at or below $6$. Can you see how R computed the median and the upper quartile?
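If it helps to check your reasoning, here is a minimal R sketch of the cumulative-percent rule applied to this sample. The frequencies are counted from the sorted values printed above. The helper name `percentile_from_table` is made up for illustration, and the rule "first value whose cumulative percent reaches p" is only one convention (it is not necessarily how `summary` computes its answer).

```r
# Frequencies counted from the sorted sample printed above (value -> count).
values <- 3:10
freq   <- c(1, 3, 2, 12, 16, 21, 4, 1)   # sums to 60

# Cumulative frequency and cumulative percent for each value.
cum_freq <- cumsum(freq)
cum_pct  <- 100 * cum_freq / sum(freq)
print(data.frame(values, freq, cum_freq, cum_pct = round(cum_pct, 1)))

# Hypothetical helper: smallest value whose cumulative percent reaches p.
percentile_from_table <- function(p) values[which(cum_pct >= p)[1]]

quartiles <- sapply(c(25, 50, 75), percentile_from_table)
print(quartiles)
```

For this sample the three results are 6, 7, and 8, which agree with the `summary` output above.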

Note: You should know that various texts and statistical programs have slightly different rules for defining percentiles. (Difficulties arise when a percentile falls in a 'gap' between two values and when there are many tied values. Various sources deal with these difficulties in different ways.) For large samples, these different methods give very similar results.
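To see the effect of different rules, here is a small sketch using the `type` argument of R's `quantile` function. The four-point vector `y` is a made-up example with a gap at the median. The vector `x` is rebuilt from the frequencies of the sample shown earlier.

```r
# Hypothetical four-point sample: the median falls in the gap between 2 and 3.
y <- c(1, 2, 3, 4)

# type = 1: inverse of the empirical CDF, always returns an observed value.
q1 <- quantile(y, probs = 0.5, type = 1)
# type = 7: R's default, interpolates linearly between neighboring values.
q7 <- quantile(y, probs = 0.5, type = 7)
print(c(type1 = unname(q1), type7 = unname(q7)))

# The 60-value sample from above, rebuilt from its frequencies.
x <- rep(3:10, times = c(1, 3, 2, 12, 16, 21, 4, 1))
x_type1 <- quantile(x, probs = c(0.25, 0.5, 0.75), type = 1)
x_type7 <- quantile(x, probs = c(0.25, 0.5, 0.75), type = 7)
print(rbind(type1 = x_type1, type7 = x_type7))
```

For `y`, type 1 gives 2 and type 7 gives 2.5. For `x`, both types give 6, 7, and 8, because each quartile falls inside a run of tied values rather than in a gap.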