Understanding percentile computation


I understand percentiles in the context of test scores (e.g., your SAT score falls in the 99th percentile), but I am not sure I understand them in the following context. Imagine a model outputs probabilities (on some days there is a lot of new data and many output probabilities, on other days there is not). Suppose I want to compute the 99th percentile of the output probabilities. Here are the probabilities for today:

import numpy as np
a = np.array([0, 0.2, 0.4, 0.7, 1])
p = np.percentile(a,99)
print(p)

0.988

I don't understand how the 99th percentile is computed in this situation, where there are only 5 output probabilities. How was the result computed? Thanks!


There are 2 best solutions below

0
On BEST ANSWER

The correct result would be the number at position $5$: $a_5 =1$.

A $p$-th percentile $P_p$ is characterized by the following two properties:

  • At most $p\%$ of the data is less than $P_p$
  • At most $(100-p)\%$ of the data is greater than $P_p$
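As a quick check of these two properties, here is a minimal sketch in plain Python. The helper name `satisfies_percentile_properties` is made up for illustration, and it assumes $p$ is an integer so the comparison can be done without rounding issues.

```python
def satisfies_percentile_properties(data, p, candidate):
    """Check both defining properties of a p-th percentile for a candidate value.

    Hypothetical helper for illustration; assumes p is an integer in [0, 100].
    """
    n = len(data)
    below = sum(1 for x in data if x < candidate)  # items strictly less than the candidate
    above = sum(1 for x in data if x > candidate)  # items strictly greater than the candidate
    # "at most p%" and "at most (100 - p)%", written without division
    return below * 100 <= p * n and above * 100 <= (100 - p) * n


a = [0, 0.2, 0.4, 0.7, 1]
print(satisfies_percentile_properties(a, 99, 1))      # candidate a_5 = 1
print(satisfies_percentile_properties(a, 99, 0.988))  # candidate from the question
```

For the data in the question, the candidate $1$ passes both checks, while $0.988$ fails the second one: one item out of five ($20\%$) is greater than $0.988$, which is more than $1\%$.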

Let $n$ be the number of data items. There are two cases:

  • If $n\cdot\frac{p}{100}$ is not an integer, then $P_p$ is uniquely determined. Then, the value of the data item at position $\left\lceil n\cdot\frac{p}{100} \right\rceil$ (rounding up) is the $p$-th percentile. In your case $$5\cdot\frac{99}{100}=4.95 \longrightarrow \left\lceil n\cdot\frac{p}{100} \right\rceil = 5$$
  • If $n\cdot\frac{p}{100}$ is an integer, then any value from the data item at position $n\cdot\frac{p}{100}$ to the item at position $n\cdot\frac{p}{100}+1$ satisfies the characterization given above. This is the only case where interpolation might be applied.
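The two cases can be sketched in a few lines of plain Python. This is only an illustration: the function name `nearest_rank_percentile` is made up, and in the integer case it simply returns the lower of the two admissible items, which is one possible convention among the values allowed above.

```python
import math


def nearest_rank_percentile(data, p):
    """p-th percentile by the rounding-up rule; hypothetical helper, 0 < p <= 100."""
    if not 0 < p <= 100:
        raise ValueError("p must satisfy 0 < p <= 100")
    values = sorted(data)
    n = len(values)
    # 1-based position: ceil(n * p / 100). If n * p / 100 is already an integer,
    # ceil leaves it unchanged, so the lower admissible item is returned.
    position = math.ceil(n * p / 100)
    return values[position - 1]  # convert the 1-based position to a 0-based index


a = [0, 0.2, 0.4, 0.7, 1]
print(nearest_rank_percentile(a, 99))  # position ceil(4.95) = 5, i.e. the last item
```

With the data from the question, $n\cdot\frac{p}{100}=4.95$ is rounded up to position $5$, so the function returns the last item, $1$.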

Summary: By this definition, the default result of the percentile function in NumPy (np.percentile) is mathematically not correct.

0
On

HINT

Look at the documentation of your percentile function, and notice that it is using linear interpolation in places where the data was not available.

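Indeed, linear interpolation places the five sorted values at the evenly spaced fractions $0, 0.25, 0.5, 0.75, 1$. If the points $(0.75, 0.7)$ and $(1,1)$ are joined by a line, what will you get at $0.99$?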
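To make the hint concrete, here is a rough plain-Python sketch of linear interpolation between neighbouring sorted values. It is meant to mirror the kind of interpolation the documentation describes, not NumPy's actual implementation, and the name `linear_percentile` is made up for illustration.

```python
def linear_percentile(data, p):
    """Percentile by linear interpolation between sorted neighbours.

    Hypothetical helper; assumes 0 <= p <= 100 and non-empty data.
    """
    values = sorted(data)
    n = len(values)
    h = (n - 1) * p / 100           # fractional 0-based index into the sorted data
    lower = int(h)                  # index of the neighbour at or below h
    upper = min(lower + 1, n - 1)   # index of the next neighbour, clamped at the end
    fraction = h - lower            # how far h lies between the two neighbours
    return values[lower] + fraction * (values[upper] - values[lower])


a = [0, 0.2, 0.4, 0.7, 1]
result = linear_percentile(a, 99)
print(round(result, 6))
```

For the data in the question the fractional index is $4\cdot 0.99 = 3.96$, so the result is $0.7 + 0.96\cdot(1-0.7)$, which is about $0.988$, matching the printed output up to floating-point rounding.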