Correction of -0.5 in percentile formula


My question is about how to calculate percentiles for a list of numbers. I found the following formula on the Internet, where i is the rank of a value in the sorted list and N is the number of values:

$$p_i=100·\frac{i-0.5}{N}$$

However, I don't understand the reason for the -0.5. For example, suppose I have the following ranked list of numbers:

$$1, 2, 4, 5, 100$$

In my opinion, 100 should be at the 100th percentile, not at:

$$p_5=100·\frac{5-0.5}{5} = 90\%$$

I am assuming that all the numbers have the same probability. I have the same problem with another formula that is commonly used for this type of calculation:

$$p_i=100·\frac{i}{N+1}$$
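
A quick way to see what the two formulas do is to evaluate both for every rank in the list 1, 2, 4, 5, 100. This is a minimal Python sketch; the function names `midpoint_percentile` and `n_plus_one_percentile` are hypothetical labels chosen here, not names used on either website.

```python
def midpoint_percentile(i, n):
    """Percentile rank of the i-th smallest of n values: 100 * (i - 0.5) / n."""
    return 100 * (i - 0.5) / n


def n_plus_one_percentile(i, n):
    """Percentile rank of the i-th smallest of n values: 100 * i / (n + 1)."""
    return 100 * i / (n + 1)


values = [1, 2, 4, 5, 100]  # already sorted, so rank i is list position + 1
n = len(values)
for i, v in enumerate(values, start=1):
    # value, (i - 0.5)/N rule, i/(N + 1) rule
    print(f"{v:>4}  {midpoint_percentile(i, n):6.2f}%  {n_plus_one_percentile(i, n):6.2f}%")
```

Neither rule ever reaches 100% for the largest value: the first tops out at 100(N - 0.5)/N (90% here) and the second at 100N/(N + 1) (about 83.33% here).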

I found these formulas on the following websites:

https://web.stanford.edu/class/archive/anthsci/anthsci192/anthsci192.1064/handouts/calculating%20percentiles.pdf

http://www.itl.nist.gov/div898/handbook/prc/section2/prc262.htm

Thanks for your help!

Answer 1

For your numbers 1, 2, 4, 5, 100, each number represents 20% of the distribution. The first number, 1, covers 0% to 20%, so we want a formula that gives the midpoint of that band, 10%, as the point value of its percentile. The final number, 100, covers 80% to 100%, so we want 90% as the result.
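
The reasoning above can be written as a short derivation, assuming (as the answer does) that the i-th of N equally weighted values occupies the band from (i - 1)/N to i/N of the distribution. The midpoint of that band is exactly the -0.5 formula:

$$p_i = 100·\frac{1}{2}\left(\frac{i-1}{N}+\frac{i}{N}\right) = 100·\frac{2i-1}{2N} = 100·\frac{i-0.5}{N}$$

$$p_1 = 100·\frac{0.5}{5} = 10\%, \qquad p_5 = 100·\frac{4.5}{5} = 90\%$$

So the -0.5 reports the centre of each value's 1/N share rather than its upper edge. Using the upper edge instead (100·i/N) would put 100 at 100%, but it would also put 1 at 20% rather than anywhere near 0%.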

Answer 2

Many different formulas for percentiles are in common use, and the differences among them are especially noticeable for small samples. An important reason for the different methods is that some focus on describing the sample, while others focus on making inferences about the population from which the sample was drawn.
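
One way to see why sample size matters is to look at the percentile each rule assigns to the largest observation (rank i = N). The sketch below compares the two formulas from the question with the plain i/N rule; the dictionary labels are just illustrative names, not standard identifiers.

```python
# Percentile assigned to the largest of N values (rank i = N) by three
# plotting-position rules; the gaps between them shrink as N grows.
rules = {
    "(i-0.5)/N": lambda i, n: 100 * (i - 0.5) / n,
    "i/(N+1)": lambda i, n: 100 * i / (n + 1),
    "i/N": lambda i, n: 100 * i / n,
}

for n in (5, 50, 500):
    row = {name: round(rule(n, n), 2) for name, rule in rules.items()}
    print(n, row)
```

For N = 5 the three rules disagree by up to about 17 percentage points on the maximum; for N = 500 they all land within a fraction of a point of 100%.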

R statistical software implements nine different 'types' of sample quantiles that are in common use by various textbook authors and software packages. According to R's documentation, Type 5 places the k-th smallest of n observations at plotting position (k - 0.5)/n, which is the formula you found, and Type 6 uses k/(n + 1), which is your second formula. I have posted output from some of the types below. [Without extra arguments, the R function `quantile` gives the 0%, 25% (lower quartile), 50% (median), 75% (upper quartile) and 100% quantiles, according to what it calls 'Type 7'.]

```r
x <- c(1, 2, 4, 5, 100)
quantile(x)                  # the 'default' method in R
  0%  25%  50%  75% 100% 
   1    2    4    5  100 
quantile(x, type=4)
    0%    25%    50%    75%   100% 
  1.00   1.25   3.00   4.75 100.00 
quantile(x, type=6)
   0%   25%   50%   75%  100% 
  1.0   1.5   4.0  52.5 100.0 
quantile(x, type=8)
        0%        25%        50%        75%       100% 
  1.000000   1.666667   4.000000  36.666667 100.000000 
quantile(x, type=9)
      0%      25%      50%      75%     100% 
  1.0000   1.6875   4.0000  34.6875 100.0000 
```
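
To make the R types concrete, here is a minimal Python sketch of the linear-interpolation rule behind Types 4 to 9, using the position formulas listed on R's `?quantile` help page (which cites Hyndman and Fan, 1996). The names `sample_quantile` and `POSITION` are hypothetical; this is not R's code. It uses exact `Fraction` arithmetic so that a position such as 3 is not computed as 2.9999... and floored to the wrong index.

```python
from fractions import Fraction
from math import floor

# 1-based (possibly fractional) position h of the p-th quantile in the sorted
# sample of size n, for the continuous quantile types 4-9.
POSITION = {
    4: lambda n, p: n * p,
    5: lambda n, p: n * p + Fraction(1, 2),
    6: lambda n, p: (n + 1) * p,
    7: lambda n, p: (n - 1) * p + 1,
    8: lambda n, p: (n + Fraction(1, 3)) * p + Fraction(1, 3),
    9: lambda n, p: (n + Fraction(1, 4)) * p + Fraction(3, 8),
}


def sample_quantile(data, p, qtype=7):
    """Linear-interpolation sample quantile for 0 <= p <= 1."""
    x = sorted(data)
    n = len(x)
    p = Fraction(str(p))  # exact rational, e.g. "0.25" -> 1/4
    h = POSITION[qtype](n, p)
    j = floor(h)
    if j < 1:             # position falls before the first observation
        return float(x[0])
    if j >= n:            # position falls at or after the last observation
        return float(x[-1])
    gamma = h - j         # fractional part used to interpolate x_j .. x_{j+1}
    return float(x[j - 1] + gamma * (x[j] - x[j - 1]))


if __name__ == "__main__":
    x = [1, 2, 4, 5, 100]
    for t in (4, 6, 7, 8, 9):
        print(t, [round(sample_quantile(x, p, t), 6) for p in (0, 0.25, 0.5, 0.75, 1)])
```

For the data 1, 2, 4, 5, 100 this reproduces the Type 4, 6, 7, 8 and 9 rows above. Type 5, whose position h = np + 1/2 places the k-th observation at p = (k - 0.5)/n, returns 100 both at p = 0.9 and at p = 1, so in that scheme 100 is the 90th percentile and also the 100th.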

Notice that all of them say that observation 100 (the maximum observation) is the 100th percentile. Of course, this does not guarantee that 100 is the maximum possible observation in the population from which your sample was drawn.

With so many different methods in use, and without knowing the credentials of the author of the Internet site where you found the formula, it would be foolish for me to say that the formula you found is "wrong"; it is one convention among several. If you don't like its results, you should certainly feel free to shop around for a method you like better.

Note: As a general policy, you should always be skeptical of technical information you find via Google searches. For statistical issues, it is best to rely on sites of government agencies, statistical organizations, and statistics departments at major universities. I have often found the online handbook of the US agency NIST to be helpful. For example, this page.