Way to aggregate percentiles


I want to have information about HTTP response times: the average, and the 95th and 99th percentiles.

All the response times are collected on each web server and sent every second to a central server, which aggregates the info and generates a report every 2 seconds. That report includes the total number of requests, the average response time, and the 95th and 99th percentiles, but for size efficiency not all the raw data. The report is then stored and graphed.

Now the problem: I want to have info for the last minute, hour, day, and week. The idea is to aggregate the reports. Aggregating the average response time is easy: just weight each report's average by its number of requests. But for the percentiles, I have the feeling it is not as simple as averaging them weighted by the number of requests.
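For reference, the easy part (the count-weighted average) can be sketched like this; the report shape `(request_count, average_ms)` is my assumption, not something from the actual system:

```python
def weighted_average(reports):
    """reports: iterable of (request_count, average_response_ms) pairs,
    one per 2-second report. Returns the overall average response time."""
    reports = list(reports)
    total = sum(count for count, _ in reports)
    if total == 0:
        return 0.0
    # Each report's average contributes in proportion to its request count.
    return sum(count * avg for count, avg in reports) / total

# Example: three reports with different traffic levels
overall = weighted_average([(100, 120.0), (50, 300.0), (150, 80.0)])
```

This works for averages precisely because the mean is linear in the data; percentiles are not, which is the whole difficulty.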

Questions:

  • Is there any exact way to calculate an aggregated 95th percentile?
  • Is there an approximation good enough for this use case?

There are 2 answers below.

Answer 1 (score 0):

It would be very nice to have data for more percentiles. If you had the $90,91,92,\dots,99$ percentiles for each block of time you could do very well. Say you had this data, along with the number of requests, for each of the $168$ hours of a week, and you want to report the aggregate $95$ percentile for the week. Average all the $95$ percentiles as a starting point. Say that time is the $93$ percentile for the first hour; then $7\%$ of the requests that hour took longer. Total up the number of requests that took longer and compare to the total number of requests in the week. Say you find this time is the $96$ percentile. Shorten it a bit and try again. You can use bisection to guide your search.

If you don't have the additional data, I don't see what you can do better than averaging. If the need is just to report a number, that will be good enough. It will probably also be good enough to spot trends, but it may not be an accurate $95$ percentile. Do you need that?

Answer 2 (score 1):

No, there is no exact way to aggregate percentiles -- you need all the data, or at least a histogram of all the data.

Yes, there are approximations; Q-digest seems very promising.

In my spare time I hacked up a histogram of quantised log response times, with the domain limited to some arbitrary min and max. That answers my queries in log(N) space; considering the imposed limits, it's actually O(1). Roughly, the histogram looks like this:

buckets = 1ms|2ms|4ms|...|1h|2h|4h
counts = 0,0,12,145,...,1,0,0,0

In my case, I used the series [2**(0.1*i) for i in range(100)], which gives nice ~7% increments from 1 to about 1000; 1 being considered impossibly fast and 1000 unbearably slow.
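A minimal sketch of this histogram, using the answer's bucket series; the function names (`record`, `quantile`) and the clamping behaviour at the edges are my assumptions about the described design:

```python
from bisect import bisect_left

# Bucket upper bounds: the answer's series, ~7% apart, from 1 to ~955.
BOUNDS = [2 ** (0.1 * i) for i in range(100)]
counts = [0] * len(BOUNDS)

def record(response_time):
    """Find the first bucket whose bound covers the value; values past
    the last bound are clamped into the final bucket."""
    i = bisect_left(BOUNDS, response_time)
    counts[min(i, len(counts) - 1)] += 1

def quantile(q):
    """Approximate q-th quantile (0 < q <= 1): walk the histogram until
    the cumulative count reaches q * total, return that bucket's bound."""
    total = sum(counts)
    threshold = q * total
    running = 0
    for bound, c in zip(BOUNDS, counts):
        running += c
        if running >= threshold:
            return bound
    return BOUNDS[-1]
```

The error of any quantile estimate is bounded by the ~7% bucket width, which is usually plenty for latency reporting.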

Faster lookup is achieved by using a cumulative histogram, since binary search can then be used.
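That cumulative-histogram lookup might look like this; `quantile_fast` and the exact data layout are hypothetical, sketching only the binary-search idea from the answer:

```python
from bisect import bisect_left
from itertools import accumulate

def quantile_fast(bounds, cum_counts, q):
    """bounds: bucket upper limits; cum_counts: running totals of the
    per-bucket counts (non-decreasing by construction, so binary search
    applies). Each lookup is O(log N) instead of a linear scan."""
    total = cum_counts[-1]
    i = bisect_left(cum_counts, q * total)
    return bounds[min(i, len(bounds) - 1)]

# Build the cumulative histogram once from the raw bucket counts,
# then answer any number of quantile queries against it.
counts = [95, 0, 0, 5]          # per-bucket request counts (toy data)
cum = list(accumulate(counts))  # running totals
```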