Say you work on a website. When someone opens the site in their browser, the page makes 5 additional calls to different services to load the remaining content. The problem: these service calls are slow, so it takes a long time until all content is loaded.
We want to improve this. People measure the response times of the service calls and compute the average and the 100th percentile.
But what should we optimise: reduce the average, or lower the 100th percentile?
One argument is that the 100th percentile is an outlier and therefore rare, while the average is what is "usually" experienced.

But if our website makes not 5 but 25, 50, or 150 calls, is "experiencing" the 100th-percentile response time really that uncommon? I tried to find a model to approximate how rare it is.
- On each service call you have a $1/100$ chance of landing in the highest percentile, since we group our response-time measurements into $100$ percentiles (from the 1st to the 100th).
- Equivalently, you have a $1 - 1/100$ chance of not being in the highest percentile.
- If you make $N$ independent service calls, you have a $(1 - 1/100)^N$ chance of never experiencing the highest-percentile response time.
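The steps above can be sketched directly as a small calculation (the function name is mine, just for illustration):

```python
# Probability of seeing at least one highest-percentile response time in n calls,
# assuming each call independently has probability p of landing in that percentile.
def p_at_least_one(n, p=0.01):
    return 1 - (1 - p) ** n

for n in (1, 5, 25, 50, 150):
    print(n, round(p_at_least_one(n), 3))
```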
Some examples:
| $N$ | P(not experiencing highest percentile) | P(experiencing highest percentile) |
| --- | --- | --- |
| 1 | 0.99 | 0.01 |
| 5 | 0.95 | 0.05 |
| 50 | 0.60 | 0.40 |
| ... | ... | ... |
We can see that the larger $N$ becomes, the more likely it is that we experience the highest-percentile load time at least once.
The model supports the idea that the highest percentile is not that rare if you make many calls. But I'm still wondering whether this is the correct way to calculate it. In particular the first assumption, that each call has a $1/100$ chance of hitting the highest percentile, makes me wonder whether one can reason about it this way.
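As a sanity check on that assumption, here is a small Monte Carlo sketch: latencies are drawn i.i.d. from a hypothetical log-normal distribution, the "highest percentile" is approximated by the empirical 99th percentile of a large reference sample, and we count how often a page load of $N$ calls includes at least one call above that threshold. Under i.i.d. draws this should come out close to $1 - (1 - 1/100)^N$.

```python
import random

random.seed(0)

# Reference sample to estimate the 99th-percentile latency (distribution is hypothetical).
ref = [random.lognormvariate(0, 1) for _ in range(100_000)]
threshold = sorted(ref)[int(0.99 * len(ref))]

N = 50          # calls per page load
trials = 20_000  # simulated page loads

hits = sum(
    any(random.lognormvariate(0, 1) > threshold for _ in range(N))
    for _ in range(trials)
)
rate = hits / trials
print(rate)  # analytically 1 - 0.99**50 is about 0.395
```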
I feel like you are mixing quantiles of a continuum of i.i.d. random variables with a finite number of draws in a way that leads to results that don't add up. It's like saying "all the children in Lake Wobegon are above average."
You might want to think about order statistics instead. There are $N$ calls in total, but a given person only makes $k$ of them, a fraction $p = k/N$. If you take $N$ large, each person makes a vanishingly small fraction of the total calls. The asymptotics will be very different from the computation you did, and will resemble the large-sample behaviour of quantiles (https://en.wikipedia.org/wiki/Order_statistic#Large_sample_sizes). You could model each person as making a fixed proportion of calls, like $p = k/N$, or use a Poisson distribution to randomize the number of calls they make; compute a distribution of times for fixed $(k, N)$, then take $N$ to infinity to get a distribution of wait times more in line with what you expected.
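A minimal sketch of the Poisson variant suggested above, with all distribution choices (Poisson mean, log-normal latencies) being assumptions of mine: each simulated user makes a Poisson-distributed number of calls, we record each user's worst latency, and compare it against the population's 99th percentile.

```python
import numpy as np

rng = np.random.default_rng(0)

n_users = 10_000
# Calls per user: Poisson(5), forced to at least 1 call.
calls = np.maximum(rng.poisson(5, size=n_users), 1)

# Each call's latency is i.i.d. log-normal; keep each user's worst latency.
worst = np.array([rng.lognormal(0, 1, k).max() for k in calls])

# Fraction of users whose worst latency exceeds the population 99th percentile.
global_p99 = np.quantile(rng.lognormal(0, 1, 100_000), 0.99)
frac = (worst > global_p99).mean()
print(frac)
```

With a mean of 5 calls per user, roughly $1 - \mathbb{E}[0.99^K] \approx 1 - e^{-0.05} \approx 5\%$ of users should see a top-percentile latency, which is much lower than what you would get by letting every user make all $N$ calls.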