Aggregating standard deviation to a summary point

9.9k Views Asked by At

I have a range of data (server performance statistics) is formatted as follows, for each server:

Time            | Average |  Min  |  Max  | StdDev  | SampleCount |
-------------------------------------------------------------------
Monday 1st      |    125  |   15  |  220  | 12.56   |     5       |
Tuesday 2nd     |    118  |   11  |  221  | 13.21   |     4       |
Wednesday 3rd   |    118  |   11  |  221  | 13.21   |     3       |
....            |    ...  |   ..  |  ...  | .....   |     .       |
and so on...

These data points are calculated from data that has a finer resolution (e.g. hourly data).

I need to aggregate this data into a single summary point so the end result is a list of servers and an aggregate average, min, max, standard deviation.

For average, I take the average of all the averages. For min, we take the minimum min. For max, we take the maximum max.

However, I'm not sure what method I should be using to aggregate standard deviation? I've seen various answers including square roots and variance but I really need a concrete answer on this - can anyone help?

1

There are 1 best solutions below

9
On

You say you take the average of all the averages, but I notice that you have a sample count column. Are these averages over different sample sizes? If so, then you would probably want a weighted average for your aggregate average: $$\text{Aggregate Average} =\frac{\sum_i (\text{sample size})_i(\text{average})_i}{\sum_i (\text{sample size})_i}$$

But without knowing more about the data, I cannot say for sure.

Now standard deviation is just the square root of the average variance. Over entire populations, it is defined by $$\sigma = \sqrt\frac{\sum_{i=1}^N(x_i - \bar x)^2}{N}$$ where $\bar x = \left(\sum_{i=1}^N x_i\right) /N$ is the average. Note that the variance $\sigma^2$ is just an average itself. So you can combine them just like you do other averages. Assuming that I am right about needing to include sample sizes, you want: $$ \text{Aggregate } \sigma^2 = \frac{\sum_i (\text{sample size})_i\sigma^2_i}{\sum_i (\text{sample size})_i}$$ Then you just take the square root to get the aggregate standard deviation. (If I am wrong about needing to include sample size, then you use the same equation with each sample size $= 1$.)