After reading "The Math Form: Re-Calculating the Standard Deviation", I'm curious which other statistics can be calculated or estimated in a similar manner.
The post details how, given just the mean, size, and standard deviation of a data set, one can recalculate the standard deviation after adding a new data point, without access to the original data.
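For concreteness, here is a minimal sketch of that kind of update in Python. It is not the code from the post: the function name `add_point` is hypothetical, and it assumes the population standard deviation (dividing by n); the sample version would divide by n - 1 instead.

```python
import math

def add_point(n, mean, sd, x):
    """Return (n, mean, sd) after adding x, using only the old summary.

    Assumes sd is the population standard deviation (divide by n).
    """
    m2 = sd * sd * n                  # recover the sum of squared deviations
    new_n = n + 1
    delta = x - mean
    new_mean = mean + delta / new_n   # shift the mean toward the new point
    m2 += delta * (x - new_mean)      # Welford-style update of the squared deviations
    return new_n, new_mean, math.sqrt(m2 / new_n)

# Summary of [2, 4, 4, 4, 5, 5, 7], then add the point 9.
old_mean = 31 / 7
old_sd = math.sqrt(151 / 7 - old_mean ** 2)
n, mean, sd = add_point(7, old_mean, old_sd, 9)
```

Only three numbers are stored between updates, which is why this works without the original data.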
It seems intuitive that there's no way to recalculate the median exactly from such a summary, for example (please correct me if I'm wrong!).
But is there some way to estimate percentiles without retaining the original data set? I'm mostly interested to see what can be done using as little storage as possible, so solutions that rely on indexing are also interesting!
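To illustrate what I mean by trading accuracy for storage, here is a rough sketch of one possible approach: a fixed-width histogram whose bin counts are used to interpolate a percentile. Everything here is an assumption made for illustration: the class name `HistogramSketch`, the requirement that the value range `[lo, hi]` is known in advance, and the choice of 100 bins. The error is bounded by roughly one bin width, and storage is one counter per bin regardless of how many points are seen.

```python
class HistogramSketch:
    """Approximate percentiles from bin counts (hypothetical helper)."""

    def __init__(self, lo, hi, bins=100):
        self.lo, self.hi, self.bins = lo, hi, bins
        self.counts = [0] * bins
        self.n = 0

    def add(self, x):
        # Map x to a bin index; values outside [lo, hi) are clamped to the edge bins.
        i = int((x - self.lo) * self.bins / (self.hi - self.lo))
        i = min(self.bins - 1, max(0, i))
        self.counts[i] += 1
        self.n += 1

    def quantile(self, q):
        if self.n == 0:
            raise ValueError("no data")
        target = q * self.n
        width = (self.hi - self.lo) / self.bins
        seen = 0
        for i, c in enumerate(self.counts):
            if c and seen + c >= target:
                # Interpolate linearly inside the bin that contains the target rank.
                return self.lo + (i + (target - seen) / c) * width
            seen += c
        return self.hi

sketch = HistogramSketch(lo=0, hi=1000, bins=100)
for value in range(1000):
    sketch.add(value)
p50 = sketch.quantile(0.50)
p90 = sketch.quantile(0.90)
```

Two such histograms with the same bounds can be combined by adding their counts element-wise, which is one reason this layout is convenient. More refined streaming estimators exist (the P² algorithm and t-digest, for example), but I haven't sketched them here.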
I came across some code that does just what I'm asking when browsing the source for ptaoussanis/tufte, a Clojure performance profiling tool.
ptaoussanis implements a function to merge two sets of statistics, including size, mean, median absolute deviation, and approximations of the 50th, 90th, 95th, and 99th percentiles.
He references Algorithms for calculating variance and a SO question/answer about calculating absolute deviation.
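As a point of comparison, the size, mean, and variance part of such a merge can be sketched in Python using the pairwise (parallel) combination formula described in "Algorithms for calculating variance". This is my own illustration, not tufte's implementation; the names `summarize` and `merge` are hypothetical, and each summary is assumed to be a tuple of size, mean, and the sum of squared deviations from the mean (often called M2).

```python
def summarize(data):
    """Build an (n, mean, m2) summary, where m2 = sum((x - mean) ** 2)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data)
    return n, mean, m2

def merge(a, b):
    """Combine two (n, mean, m2) summaries without the underlying data."""
    na, mean_a, m2_a = a
    nb, mean_b, m2_b = b
    n = na + nb
    delta = mean_b - mean_a
    mean = mean_a + delta * nb / n
    # The cross term accounts for the gap between the two group means.
    m2 = m2_a + m2_b + delta * delta * na * nb / n
    return n, mean, m2

merged = merge(summarize([2, 4, 4, 4]), summarize([5, 5, 7, 9]))
population_variance = merged[2] / merged[0]
```

Size, mean, and variance merge exactly this way. Percentiles do not, so any merged percentile has to be an approximation.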
Here's the relevant code snippet from tufte, for anyone interested: