I have a data source that continuously produces measurement results. At some point I compute the mean. Luckily, I don't need to store all the values for this: I only keep the running sum and the number of results, and divide one by the other. This matters for efficiency, as I have loads of such sources operating concurrently.
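For reference, the constant-memory mean looks roughly like this. It is a minimal sketch; the class and method names are made up for illustration.

```python
class RunningMean:
    """Tracks the mean of a stream in O(1) memory (illustrative name)."""

    def __init__(self):
        self.total = 0.0  # running sum of all values seen so far
        self.count = 0    # number of values seen so far

    def add(self, value):
        self.total += value
        self.count += 1

    def mean(self):
        if self.count == 0:
            raise ValueError("no values seen yet")
        return self.total / self.count
```

One such object per source costs two numbers, which is what makes millions of concurrent sources affordable.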
The requirements have just changed, and I was asked for the median instead of the mean. Computing the exact median requires all the values from the data set, which in my case would exhaust the computer's memory, so I am looking for a cheaper approach. My idea is to store histograms and approximate the medians from them. However, creating a histogram object for each data source (of which I have millions) might not pay off for sources that produce very few values.
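To make the idea concrete, here is a rough sketch of what I have in mind, not a finished design. It assumes the measurements fall in a known range `[lo, hi)` and uses fixed-width bins; the class name, the `bins` and `threshold` parameters, and the choice to keep raw values until a source has produced more than `threshold` results are all hypothetical.

```python
class ApproxQuantiles:
    """Exact quantiles for small streams, histogram estimate for large ones.

    Hypothetical sketch: assumes values lie in [lo, hi); anything outside
    is clamped into the first or last bin.
    """

    def __init__(self, lo, hi, bins=64, threshold=32):
        self.lo, self.hi, self.bins = lo, hi, bins
        self.threshold = threshold  # switch to a histogram beyond this many values
        self.values = []            # raw values while the stream is still small
        self.counts = None          # bin counts, allocated lazily
        self.n = 0

    def _bin(self, x):
        i = int((x - self.lo) * self.bins / (self.hi - self.lo))
        return min(max(i, 0), self.bins - 1)

    def add(self, x):
        self.n += 1
        if self.counts is not None:
            self.counts[self._bin(x)] += 1
            return
        self.values.append(x)
        if len(self.values) > self.threshold:
            # Too many values to keep: fold them into a histogram.
            self.counts = [0] * self.bins
            for v in self.values:
                self.counts[self._bin(v)] += 1
            self.values = None

    def quantile(self, q):
        """Return the q-quantile, 0 <= q <= 1 (q=0.5 is the median)."""
        if self.n == 0:
            raise ValueError("no values seen yet")
        if self.counts is None:
            # Small stream: exact answer, interpolating between order statistics.
            vals = sorted(self.values)
            pos = q * (len(vals) - 1)
            i = int(pos)
            j = min(i + 1, len(vals) - 1)
            return vals[i] + (pos - i) * (vals[j] - vals[i])
        # Large stream: find the bin where the cumulative count reaches q * n
        # and interpolate linearly inside that bin.
        width = (self.hi - self.lo) / self.bins
        target = q * self.n
        cum = 0
        for i, c in enumerate(self.counts):
            if c and cum + c >= target:
                return self.lo + (i + (target - cum) / c) * width
            cum += c
        return self.hi
```

With this, quartiles would be `quantile(0.25)` and `quantile(0.75)`, and the error of the histogram estimate is bounded by roughly one bin width, as long as the assumed range actually covers the data.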
In short, I'd appreciate any comments on optimizing the computation of the median, quartiles, and median absolute deviation.