I have a range of data (server performance statistics) is formatted as follows, for each server:
Time | Average | Min | Max | StdDev | SampleCount |
-------------------------------------------------------------------
Monday 1st | 125 | 15 | 220 | 12.56 | 5 |
Tuesday 2nd | 118 | 11 | 221 | 13.21 | 4 |
Wednesday 3rd | 118 | 11 | 221 | 13.21 | 3 |
.... | ... | .. | ... | ..... | . |
and so on...
These data points are calculated from data that has a finer resolution (e.g. hourly data).
I need to aggregate this data into a single summary point so the end result is a list of servers and an aggregate average, min, max, standard deviation.
For average, I take the average of all the averages. For min, we take the minimum min. For max, we take the maximum max.
However, I'm not sure what method I should be using to aggregate standard deviation? I've seen various answers including square roots and variance but I really need a concrete answer on this - can anyone help?
You say you take the average of all the averages, but I notice that you have a sample count column. Are these averages over different sample sizes? If so, then you would probably want a weighted average for your aggregate average: $$\text{Aggregate Average} =\frac{\sum_i (\text{sample size})_i(\text{average})_i}{\sum_i (\text{sample size})_i}$$
But without knowing more about the data, I cannot say for sure.
Now standard deviation is just the square root of the average variance. Over entire populations, it is defined by $$\sigma = \sqrt\frac{\sum_{i=1}^N(x_i - \bar x)^2}{N}$$ where $\bar x = \left(\sum_{i=1}^N x_i\right) /N$ is the average. Note that the variance $\sigma^2$ is just an average itself. So you can combine them just like you do other averages. Assuming that I am right about needing to include sample sizes, you want: $$ \text{Aggregate } \sigma^2 = \frac{\sum_i (\text{sample size})_i\sigma^2_i}{\sum_i (\text{sample size})_i}$$ Then you just take the square root to get the aggregate standard deviation. (If I am wrong about needing to include sample size, then you use the same equation with each sample size $= 1$.)