How does mathematical software, like LibreOffice Calc, manage to find percentiles when there are none in the first place?


From now on I will say "exclusive percentile" to refer to a percentile defined by the exclusive definition of percentiles, and likewise "inclusive percentile" for the inclusive definition.

Look at this screenshot:

[screenshot]

We have the dataset {1,2,3,4,5,6,7,8,9}. LibreOffice Calc says that 4.2 is the 40th inclusive percentile of the dataset and 4 is the 40th exclusive percentile of the same dataset. But there are just no percentiles in the first place (if we don't count the 0th exclusive percentile and the 100th inclusive percentile)! The only way to cut the dataset into equal parts is by dividing the data points into triplets. And neither 1/9, nor 2/9, nor 3/9, nor 4/9, nor 5/9, nor 6/9, nor 7/9, nor 8/9 proportions can be translated into percentiles (0/9 and 9/9 can, but they would be equal to the 0th exclusive percentile and the 100th inclusive percentile respectively).

Best answer

"And neither 1/9, nor 2/9, nor 3/9, nor 4/9, nor 5/9, nor 6/9, nor 7/9, nor 8/9 proportions can be translated into percentiles"

Why is that? We can define, for any real number $p\in[0,1],$ the $100p$-th percentile to be the minimum value $x$ such that at least $100p\%$ of the data is less than or equal to $x$. Note that this agrees with the usual percentile definition if we restrict $p$ to $\{0, 0.01, \dots, 1\}$. With this definition, percentiles are always well-defined.

By well-defined, I mean that given the dataset, the percentile is a unique number. As pointed out in the comments, indeed for the dataset $\{1,2,3\}$ this definition gives us that the $50$-th percentile is $2,$ and not any number between $2$ and $3.$

A point $x$ is, for example, the $(100/9)$-th percentile if it is the smallest value such that at least $(100/9)\%,$ i.e., $1/9$ of the data, is less than or equal to it. That is, $1$ is the $(100/9)$-th percentile. In your case, $(1,2,\dots,9)$ are the $(100/9,200/9,\dots,800/9,100)$-th percentiles respectively.
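If it helps to check these numbers, here is a minimal Python sketch of the definition above. The function name `nearest_rank_percentile` is my own (hypothetical), and `Fraction` is used only so that percentages such as $100/9$ are compared exactly rather than in floating point.

```python
from fractions import Fraction
import math


def nearest_rank_percentile(data, percent):
    """Smallest data value x such that at least `percent`% of the data is <= x."""
    values = sorted(data)
    n = len(values)
    p = Fraction(percent) / 100
    # Smallest 1-based rank k with k/n >= p; rank 1 is the minimum of the data.
    k = max(1, math.ceil(p * n))
    return values[k - 1]


data = range(1, 10)
for k in range(1, 10):
    # Each data point k should come back as the (100k/9)-th percentile.
    print(k, nearest_rank_percentile(data, Fraction(100 * k, 9)))
```

Each printed line should show the same number twice, matching the list of percentiles above.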

Now comes the question of inclusive or exclusive percentiles. The software outputs an inclusive $40$-th percentile based on interpolation. $4$ is the inclusive $\dfrac{(4-1)\times 100}{9-1}=37.5$-th percentile and $5$ is the inclusive $\dfrac{(5-1)\times 100}{9-1}=50$-th percentile. Interpolating linearly between them, the $40$-th percentile is $4+\dfrac{40-37.5}{50-37.5}\times(5-4)=4.2.$
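The interpolation can be sketched in Python as follows. This is only an illustration of the inclusive rule described here, where the $i$-th smallest of $n$ values sits at the $\frac{(i-1)\times 100}{n-1}$-th percentile; the name `inclusive_percentile` is hypothetical and is not taken from the software.

```python
import math


def inclusive_percentile(data, percent):
    """Linear interpolation with the minimum at 0% and the maximum at 100%."""
    values = sorted(data)
    n = len(values)
    # 0-based fractional position: index 0 is the 0th percentile,
    # index n-1 is the 100th percentile.
    pos = percent / 100 * (n - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, n - 1)
    frac = pos - lo
    return values[lo] + frac * (values[hi] - values[lo])


data = range(1, 10)
print(inclusive_percentile(data, 37.5))  # lands exactly on the data point 4
print(inclusive_percentile(data, 40))    # about 4.2, up to floating-point rounding
```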

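The exclusive percentile definition is to find the minimum value $x$ such that at least $40\%$ of the data is less than or equal to $x$. Only $3/9\approx 33.3\%$ of the data is less than or equal to $3$, while $4/9\approx 44.4\%$ is less than or equal to $4$, so this is first satisfied at $x=4.$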
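As a cross-check, spreadsheet documentation usually describes the exclusive variant as interpolation at the 1-based rank $p(n+1)$ rather than as a threshold rule. Assuming that convention (it is my assumption here, not something confirmed above, and the name `exclusive_percentile` is hypothetical), a rough Python sketch gives the same value for this dataset, since $0.4\times(9+1)=4$ lands exactly on the fourth data point.

```python
import math


def exclusive_percentile(data, percent):
    """Interpolate at 1-based rank p*(n+1); assumed spreadsheet-style convention."""
    values = sorted(data)
    n = len(values)
    rank = percent / 100 * (n + 1)
    if rank < 1 or rank > n:
        # Under this convention, percentiles too close to 0 or 100 are undefined.
        raise ValueError("percent is outside the range covered by the data")
    lo = math.floor(rank)
    if lo == n:
        return values[-1]
    frac = rank - lo
    # values[lo - 1] is the lo-th smallest value (ranks are 1-based).
    return values[lo - 1] + frac * (values[lo] - values[lo - 1])


print(exclusive_percentile(range(1, 10), 40))
```

Under this assumed rule the 0th and 100th percentiles are not defined at all, which is why the sketch raises an error for ranks outside the data.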