75th percentile: find top offenders

Imagine that a car company wants to analyze all of its late orders, i.e. orders that have not yet been delivered to customers even though the contractual delivery date has already passed. ID simply refers to the i-th order. Span is the number of days between today and the contractual delivery date; for example, Span = 4 means that the contractual delivery date expired 4 days ago. If I now compute P75 of Span, I get 154. The question is: given that I want P75 to be as low as possible, which orders are the top offenders? In other words, I'd like to find the orders that, once delivered to customers, yield the largest decrease in P75 (from 154 to something lower). Do I just need to start from the bottom of the table (row 230, with Span 731), i.e. the orders with the highest Span, and work my way up?

There is 1 best solution below

Best answer

Percentiles are a measure of a statistical distribution. Like the median, a percentile is not affected by outliers, unlike the mean and standard deviation. A percentile does not care about the specific location of each case; it cares only about the location such that (in this case) 75% of all cases lie to its left.

With this in mind, I think the result depends mostly on the cases around your percentile of interest. In particular, if the cases are sparse around your percentile, removing a single one can have a major impact.

If you remove a low case (left of your percentile), then your percentile will shift to the right, which you do not want. If you remove a high case (right of your percentile), then your percentile will shift to the left, which you do want.
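To make the direction of the shift concrete, here is a minimal Python sketch. It assumes the nearest-rank definition of the percentile (the $\lceil 0.75N\rceil$-th smallest value), and the `spans` list is made-up illustration data, not the asker's table. Spreadsheet or library percentile functions that interpolate may give slightly different numbers.

```python
import math

def p75(values):
    """Nearest-rank 75th percentile: the ceil(0.75 * n)-th smallest value (1-based)."""
    ordered = sorted(values)
    rank = math.ceil(0.75 * len(ordered))
    return ordered[rank - 1]

# Hypothetical spans (days late); not the data from the question.
spans = [4, 10, 25, 60, 90, 120, 154, 300, 731]

baseline = p75(spans)                                # 154
without_low = p75([s for s in spans if s != 4])      # 154: removing a low case never lowers P75
without_high = p75([s for s in spans if s != 731])   # 120: removing a high case can lower P75

print(baseline, without_low, without_high)
```

With 9 cases, delivering one high case already moves P75 down one position, while delivering a low case leaves it where it is. With a different number of cases the situation can be reversed: the high case leaves P75 unchanged and the low case pushes it up.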

Note: This methodology works if every case takes the same amount of effort. If higher cases take more time to solve, I can imagine that the tactic needs to change.

Suppose each case takes the same effort. I can define a series $X$ consisting of $N$ monotonically non-decreasing elements $X(i)$, where

$$X(i+1)\ge X(i) \quad \text{for } i=1,\dots,N-1$$

Index $i$ would be the ID, whereas $X(i)$ is the corresponding Span. I define the 75th percentile as $P_{75}=X(i^*)$, where $$i^*:=\lceil0.75N\rceil$$

If I remove element $i_a\gg i^*$, then $$P^{new}_{75}=X(\lceil0.75(N-1)\rceil)$$ This might still be the same $X(i^*)$, or $X(i^*-1)$. Thus, you may carry out 1 jump (downwards, which is what you want).

In the case that you remove element $i_a\ll i^*$, then $$P^{new}_{75}=X(1+\lceil0.75(N-1)\rceil)$$ Again, you may carry out 1 jump, towards $X(i^*+1)$.

The size of the jump does not depend on which particular element to the left or to the right of your percentile you remove.
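The index bookkeeping above can be checked by brute force. The sketch below is only an illustration under the same nearest-rank assumption, for a series of distinct values identified by their original 1-based indices; the helper names `rank75` and `jump_after_removal` are my own.

```python
import math

def rank75(n):
    """1-based nearest-rank position of the 75th percentile: ceil(0.75 * n)."""
    return math.ceil(0.75 * n)

def jump_after_removal(n, removed):
    """Shift of the percentile position, in original 1-based indices, after
    removing the element at original index `removed` from n sorted, distinct values."""
    remaining = [i for i in range(1, n + 1) if i != removed]
    new_index = remaining[rank75(n - 1) - 1]
    return new_index - rank75(n)

for n in (8, 9):
    i_star = rank75(n)
    # Distinct jumps seen when removing any element right / left of the percentile.
    right = {jump_after_removal(n, r) for r in range(i_star + 1, n + 1)}
    left = {jump_after_removal(n, r) for r in range(1, i_star)}
    print(n, i_star, "right:", right, "left:", left)
```

For $N=8$ every removal to the right gives a jump of 0 and every removal to the left gives +1; for $N=9$ the right side gives -1 and the left side gives 0. In both cases each side produces a single jump value, whichever element on that side is removed.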

If you remove a direct neighbour of $X(i^*)$, you might carry out two jumps.

Things get interesting if you remove the element $X(i^*)$ itself as you might 'win' or 'lose'.

Since your data is not strictly monotonic and has some plateaus, not much will happen if the percentile sits on a plateau.
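A small sketch of the plateau effect, again assuming the nearest-rank definition and using invented spans with a run of identical values at 154. It counts how many of the largest cases have to be removed before P75 actually drops.

```python
import math

def p75(values):
    """Nearest-rank 75th percentile: the ceil(0.75 * n)-th smallest value (1-based)."""
    ordered = sorted(values)
    return ordered[math.ceil(0.75 * len(ordered)) - 1]

# Hypothetical data with a plateau at 154 around the percentile.
spans = [10, 20, 30, 154, 154, 154, 154, 154, 700]
baseline = p75(spans)  # 154

remaining = sorted(spans)
removed = 0
# Deliver (remove) the largest remaining case until P75 falls below the baseline.
while remaining and p75(remaining) >= baseline:
    remaining.pop()
    removed += 1

print(removed, p75(remaining))  # 5 removals before P75 drops, then it falls to 30
```

The first four removals change the position of the percentile but not its value, because it keeps landing on another 154. Only once the plateau is exhausted does the value move, and then it moves a lot.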

As a general rule, I would always start by removing the case $X(i^*+1)$.