I would like to calculate the 90th percentile (p90), for example for this dataset:
p90(1, 2, 3, 4, 5, 6, 7, 8, 9, 10) = 9.1
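The value 9.1 matches the common "linear interpolation" definition, where the 0-based rank is 0.9 * (n - 1) and the result interpolates between the two neighbouring sorted values. This is NumPy's default percentile method and the same as Excel's PERCENTILE.INC. Below is a minimal sketch that assumes this definition. The function name `p90` is only for illustration.

```python
import math

def p90(values):
    """90th percentile via linear interpolation between closest ranks
    (the definition that reproduces the numbers in this question)."""
    xs = sorted(values)
    if not xs:
        raise ValueError("p90 of an empty dataset is undefined")
    rank = 0.9 * (len(xs) - 1)      # fractional, 0-based position
    lo = math.floor(rank)           # index of the lower neighbour
    hi = min(lo + 1, len(xs) - 1)   # index of the upper neighbour
    frac = rank - lo                # how far to move towards xs[hi]
    return xs[lo] + frac * (xs[hi] - xs[lo])

print(round(p90(range(1, 11)), 6))  # 9.1
```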
The problem is that, due to technical limitations, I don't have the whole dataset available for the calculation. Instead, each of n workers processes only a partial subset (of random size) of the whole dataset and produces its own p90 subresult. For example, with 2 workers:
worker 1: p90(1, 2, 3) = 2.8
worker 2: p90(4, 5, 6, 7, 8, 9, 10) = 9.4
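The per-worker numbers can be reproduced with Python's standard library. `statistics.quantiles(data, n=10, method="inclusive")` returns the nine decile cut points, and the last of these is the p90 under the same interpolation rule as above. The two-chunk split below is a hypothetical stand-in for the real workers. On older Python versions `quantiles` raises `StatisticsError` for a chunk with fewer than two values.

```python
from statistics import quantiles

def p90(values):
    # quantiles(..., n=10) returns the 9 cut points between deciles;
    # the last one is the 90th percentile. method="inclusive" matches
    # the linear-interpolation numbers in the question.
    return quantiles(values, n=10, method="inclusive")[-1]

# Hypothetical split of the dataset across two workers (sizes are arbitrary).
workers = [[1, 2, 3], [4, 5, 6, 7, 8, 9, 10]]

for i, chunk in enumerate(workers, start=1):
    print(f"worker {i}: p90 = {p90(chunk):.1f}")

whole = [x for chunk in workers for x in chunk]
print(f"whole dataset: p90 = {p90(whole):.1f}")
```

The script prints 2.8 and 9.4 for the workers and 9.1 for the combined data. Those are exactly the figures above.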
Is it possible to calculate the p90 of the whole dataset from these worker p90 subresults somehow? I can have each worker generate additional metrics, e.g. a count, if that helps.
I can calculate the results of the sum/min/max/avg/count functions from the workers' sum/min/max/avg/count subresults, but I'm struggling with percentiles. Is it possible?
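For comparison, here is a minimal sketch of why sum/min/max/avg/count are easy to combine. The class and function names are hypothetical. Each worker emits a small partial record, and the records merge with associative operations. The average is derived from the merged sum and count rather than averaged directly. The last lines apply the same idea naively to the worker p90 values from the question, weighting each one by its worker's count. The result is 7.42 instead of 9.1. This only shows that this particular combination fails, not that no combination can work.

```python
from dataclasses import dataclass

@dataclass
class Partial:
    """Hypothetical per-worker subresult for aggregates that merge exactly."""
    count: int
    total: float
    minimum: float
    maximum: float

def summarize(values):
    xs = list(values)
    return Partial(len(xs), sum(xs), min(xs), max(xs))

def merge(a, b):
    # count/sum/min/max combine with simple associative operations.
    return Partial(a.count + b.count, a.total + b.total,
                   min(a.minimum, b.minimum), max(a.maximum, b.maximum))

parts = [summarize([1, 2, 3]), summarize([4, 5, 6, 7, 8, 9, 10])]
merged = parts[0]
for p in parts[1:]:
    merged = merge(merged, p)
avg = merged.total / merged.count   # avg comes from merged sum and count: 5.5

print(merged, avg)

# Naive attempt: count-weighted mean of the worker p90 values from the question.
worker_p90 = [(3, 2.8), (7, 9.4)]   # (worker count, worker p90)
naive = sum(n * p for n, p in worker_p90) / sum(n for n, _ in worker_p90)
print(f"{naive:.2f}")               # 7.42, not the true 9.1
```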