I understand percentiles in the context of test scores with many examples (e.g., your SAT score falls in the 99th percentile), but I am not sure I understand what is going on in the following context. Imagine a model outputs probabilities (on some days we have a lot of new data and output probabilities, and on some days we don't). Imagine I want to compute the 99th percentile of the output probabilities. Here are the probabilities for today:
import numpy as np
a = np.array([0, 0.2, 0.4, 0.7, 1])
p = np.percentile(a, 99)
print(p)
0.988
I don't understand how the 99th percentile is computed in this situation where there are only 5 outputted probabilities. How was the output computed? Thanks!
The correct result would be the number at position $5$: $a_5 =1$.
A $p$-th percentile $P_p$ is characterized by the following two properties:
Let $n$ be the number of data items. There are two cases:
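Summary: Judged against this definition, the value returned by NumPy's percentile function (np.percentile) is mathematically not correct. By default it interpolates linearly between neighbouring data items, so it can return a number such as 0.988 that is not one of the data items.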
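To make the difference concrete, here is a minimal sketch in plain Python that contrasts the two computations on the data from the question. The function names `linear_percentile` and `nearest_rank_percentile` are hypothetical helpers, not NumPy functions. The first is my reading of NumPy's default linear interpolation rule; the second assumes the "position" in the answer means the nearest-rank position $\lceil n \cdot p / 100 \rceil$, which is an assumption on my part.

```python
import math


def linear_percentile(data, p):
    """Percentile by linear interpolation between neighbouring sorted items.

    Hypothetical helper: uses the fractional index (n - 1) * p / 100,
    which is assumed here to match NumPy's default rule.
    """
    xs = sorted(data)
    n = len(xs)
    h = (n - 1) * p / 100        # fractional 0-based index into xs
    lo = math.floor(h)           # index of the item just below
    hi = min(lo + 1, n - 1)      # index of the item just above
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def nearest_rank_percentile(data, p):
    """Percentile as an actual data item at 1-based position ceil(n * p / 100).

    Hypothetical helper: the nearest-rank convention is an assumption
    about what the answer means by "position".
    """
    xs = sorted(data)
    n = len(xs)
    rank = max(1, math.ceil(n * p / 100))  # 1-based position, at least 1
    return xs[rank - 1]


a = [0, 0.2, 0.4, 0.7, 1]

# Fractional index is 4 * 0.99 = 3.96, so the result lies 96% of the way
# from 0.7 to 1, i.e. approximately 0.988.
print(linear_percentile(a, 99))

# Position is ceil(5 * 0.99) = ceil(4.95) = 5, so the result is a_5 = 1.
print(nearest_rank_percentile(a, 99))
```

With only five items the two conventions disagree noticeably at the 99th percentile. At the median both return 0.4 for this data, and the gap generally shrinks as the number of items grows.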