Calculating which value in a set correlates with a given percentile value


Ok I've got some context and a definitive question (hopefully)...

(Improving upon https://math.stackexchange.com/questions/1426711/how-to-calculate-percentiles which I'll delete shortly)

I have a working example of how to calculate a value's percentile value within a range of values...

0.00, 0.00, 0.01, 0.30, 1.00, 2.00, 2.00, 2.00 is the range of values (the first data set).

2.00 is the value whose "percentile value" I need to find.

The formula is (((B + (0.5 * E)) / n) * 100) where B = 'number of scores below x', E = 'number of scores equal to x', n = 'number of scores'.

Applying this I get (((5 + (0.5 * 3)) / 8) * 100) = 81.25%

The percentile value of 2.00 within the above data set is 81.25%.
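A minimal sketch of the formula above, written in Python purely for illustration (the question names no language, and the function name `percentile_rank` is made up here):

```python
def percentile_rank(scores, x):
    """Mid-rank percentile of x within scores: ((B + 0.5 * E) / n) * 100."""
    n = len(scores)
    if n == 0:
        raise ValueError("scores must not be empty")
    below = sum(1 for s in scores if s < x)   # B: number of scores below x
    equal = sum(1 for s in scores if s == x)  # E: number of scores equal to x
    return (below + 0.5 * equal) / n * 100


data = [0.00, 0.00, 0.01, 0.30, 1.00, 2.00, 2.00, 2.00]
print(percentile_rank(data, 2.00))  # 81.25
```

Each call makes one pass over the data, so computing the rank of every value in a set this way costs on the order of n^2 comparisons.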

Now here is the problem / my question:

Given another data set, how do I find the value within it that would have the percentile value of 81.25%? I've found a long way to do it, which is to use the method above to find each value's percentile value and then work out which one corresponds to 81.25%. But this is quite intensive for a large set. Is there a quicker and more efficient way to do this? I.e., a kind of reverse of the formula above.

Thanks!

edit/PS: I'm a software developer so forgive my terminology, bad representation of formulas etc. If I need to clarify anything let me know.

Answer

First, yours is one of many methods of defining a percentile (quantile). Several methods are in common use, and not all statistical software packages agree on the formula. In R, the help page ?quantile describes nine different methods under the 'type' argument. In particular, notice that some types always return a value from the sample, while other types can return intermediate numbers. The default in R is type 7.

Because you are familiar with software, perhaps the following session in R will reveal some of the issues and help you find the 'type' of quantile method you want to implement for your purposes.

 > x = c(0.00, 0.00, 0.01, 0.30, 1.00, 2.00, 2.00, 2.00)
 > mean(x < 2.00) + .5*mean(x == 2.00)
 [1] 0.8125
 > quantile(x, .8125)
 81.25%
 2
 > quantile(x, .8)
 80%
 2
 > quantile(x, .9)
 90%
 2

 > y = sort(sample(1:10, 10, replace=TRUE));  y
 [1] 1 1 2 2 4 5 7 8 9 9
 > quantile(y, .8125)  # using R default 'type=7'
 81.25%
 8.3125
 > quantile(y, .9125, type=1)  # 'type=1' is inverse ECDF
 91.25%
 9
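The two types used in the session above can also be sketched outside R. The Python functions below are an illustration under stated assumptions, not R's actual implementation: the names `quantile_type1` and `quantile_type7` are hypothetical, `p` is assumed to lie in [0, 1], and the small tolerance R applies against floating-point rounding is left out. Both sort the data once and then compute a position directly, which is the kind of "reverse" lookup the question asks for.

```python
import math


def quantile_type1(values, p):
    """Inverse ECDF: smallest sorted value whose cumulative share is >= p."""
    s = sorted(values)
    k = max(math.ceil(len(s) * p), 1)  # 1-based rank; p = 0 maps to the minimum
    return s[k - 1]


def quantile_type7(values, p):
    """Linear interpolation between the order statistics around (n - 1) * p."""
    s = sorted(values)
    h = (len(s) - 1) * p          # fractional 0-based position
    lo = math.floor(h)
    hi = min(lo + 1, len(s) - 1)  # clamp so p = 1 stays in range
    return s[lo] + (h - lo) * (s[hi] - s[lo])


x = [0.00, 0.00, 0.01, 0.30, 1.00, 2.00, 2.00, 2.00]
y = [1, 1, 2, 2, 4, 5, 7, 8, 9, 9]

print(quantile_type7(x, 0.8125))  # 2.0
print(quantile_type7(y, 0.8125))  # 8.3125
print(quantile_type1(y, 0.9125))  # 9
```

After the one-time sort, each lookup is a constant-time index computation, so many percentiles can be read off the same sorted copy.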

In practice, quantiles are used in statistics mainly to describe very large datasets. For such datasets, the differences among the results returned by the various types become trivially small.

 > qnorm(.81, 100, 15)  # 81st percentile of NORM(100, sd=15)
 [1] 113.1684
 > w = rnorm(1000, 100, 15)
 > quantile(w, .81)
      81%
 112.5742
 > quantile(w, .81, type=1)
      81%
 112.5479
 > sort(w)[810]  # 'type=1' gives 810th element of sorted sample
 [1] 112.5479
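The same large-sample comparison can be tried without R. This is a rough Python sketch using only the standard library. The seed is arbitrary, so the simulated values will not match the R output above, and `statistics.quantiles(..., method="inclusive")` is used on the understanding that it interpolates at position (n - 1) * p, as R's type 7 does.

```python
import math
import random
from statistics import NormalDist, quantiles

random.seed(1)  # arbitrary seed, chosen only to make the run repeatable
n, p = 1000, 0.81

# Theoretical 81st percentile of NORM(100, sd=15), like qnorm(.81, 100, 15)
theoretical = NormalDist(mu=100, sigma=15).inv_cdf(p)

w = sorted(random.gauss(100, 15) for _ in range(n))

# Inverse-ECDF quantile: the 810th element of the sorted sample
rank = math.ceil(round(n * p, 9))  # rounding guards against float noise in n * p
q_type1 = w[rank - 1]

# Interpolated quantile: the 81st of the 99 percentile cut points
q_type7 = quantiles(w, n=100, method="inclusive")[80]

print(round(theoretical, 4))  # 113.1684
print(rank)                   # 810
print(q_type1, q_type7)       # values depend on the seed
```

The interpolated result always falls between the 810th and 811th sorted values, so with 1000 observations it can differ from the inverse-ECDF result by at most the gap between two neighbouring observations.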