Standard deviation of a pairwise set average

How do the standard deviations of two or more sets transform if those sets are averaged together in a pairwise manner?

For example, let $X=[a_0,a_1,\dots,a_n]$ and $Y=[b_0,b_1,\dots,b_n]$. What is the standard deviation of the "pairwise set average" of these sets: $$Z=\left[\frac{a_0+b_0}2,\dots,\frac{a_n+b_n}2\right]$$ given the standard deviations of $X$ and $Y$? Can this be represented as a function, and if not why?

The Cross Validated post here gives such a function, but for the pairwise difference; the MSE post here deals with a non-pairwise addition. But this is a different operation.

There is 1 solution below

You numbered the values $0$ through $n$ rather than the more usual $1$ through $n$, so there are $n+1$ values of $a$ and $n+1$ values of $b$.

Let $\bar a = (a_0+\cdots+a_n)/(n+1)$ and $\bar b = (b_0+\cdots+b_n)/(n+1)$.

Often one sees a statement that "variance" means $$ \frac 1 n \sum_{k=0}^n (a_k - \bar a)^2, $$ where one divides by $1$ less than the number of observed values. However, that is used ONLY when one estimates a population variance from a sample. When the $n+1$ values $a_0,\ldots,a_n$ are the whole population, the variance is $$ \frac 1 {n+1} \sum_{k=0}^n (a_k - \bar a)^2, $$ and that is the variance that is well behaved in ways that justify the use of standard deviation rather than the simpler mean absolute deviation.

One of those good behaviors is that if the two variables were independent, then the variance of the sum would be the sum of the variances. Independence in this case would mean you'd have a variable whose values are all $(n+1)^2$ combinations: $$ \begin{array}{ccccc} (a_0+b_0)/2, & (a_0+b_1)/2, & (a_0+b_2)/2, & \ldots & (a_0+b_n)/2 \\ (a_1+b_0)/2, & (a_1+b_1)/2, & (a_1+b_2)/2, & \ldots & (a_1+b_n)/2 \\ (a_2+b_0)/2, & (a_2+b_1)/2, & (a_2+b_2)/2, & \ldots & (a_2+b_n)/2 \\ \vdots & \vdots & \vdots & & \vdots \\ (a_n+b_0)/2, & (a_n+b_1)/2, & (a_n+b_2)/2, & \ldots & (a_n+b_n)/2 \end{array} $$

But that's not what we have here; we have only the $n+1$ matched pairs $$ \frac{a_0+b_0} 2,\ \frac{a_1+b_1} 2,\ \frac{a_2+b_2} 2,\ \ldots,\ \frac{a_n+b_n} 2. $$ First, notice that the variance of this list is $1/4$ that of the following: $$ a_0+b_0,\ a_1+b_1,\ a_2+b_2,\ \ldots,\ a_n+b_n. $$
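To make the contrast between the full grid and the matched pairs concrete, here is a small sketch in Python. The data values are made up for illustration, and `statistics.pvariance` is the standard-library population variance (it divides by the number of values, not one less).

```python
from itertools import product
from statistics import pvariance

# Illustrative data only; any two lists of equal length would do.
a = [1.0, 2.0, 4.0, 7.0]
b = [3.0, 1.0, 8.0, 2.0]

# "Independent" case: every a_i averaged with every b_j (the full grid).
grid = [(x + y) / 2 for x, y in product(a, b)]

# Pairwise case from the question: a_k averaged only with b_k.
pairwise = [(x + y) / 2 for x, y in zip(a, b)]

var_grid = pvariance(grid)
var_pairwise = pvariance(pairwise)
var_independent = (pvariance(a) + pvariance(b)) / 4

print(var_grid, var_independent)  # these two agree
print(var_pairwise)               # this one generally differs
```

For the grid, the variance is exactly a quarter of the sum of the two variances. For the matched pairs it is not, because how the $a_k$ and $b_k$ happen to line up matters.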

The variance of that sum is the sum of the two variances plus $2$ times the covariance, i.e. $$ \operatorname{var}(a+b) = \operatorname{var}(a) + \operatorname{var}(b) + 2\operatorname{cov}(a,b), $$ where $$ \operatorname{cov}(a,b) = \frac 1 {n+1} \sum_{k=0}^n (a_k - \bar a)(b_k - \bar b). $$ You can prove that by a bit of algebra applied to the definitions of variance and covariance.
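The identity can also be checked numerically. The sketch below uses made-up data and hypothetical helper names (`mean`, `pcov`, `pvar`), all of which divide by the number of values, matching the population definitions above. It also checks that the standard deviation of the pairwise averages is half the standard deviation of the pairwise sums.

```python
def mean(xs):
    return sum(xs) / len(xs)

def pcov(xs, ys):
    """Population covariance: divide by the number of values, not one less."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def pvar(xs):
    """Population variance, as the covariance of a list with itself."""
    return pcov(xs, xs)

# Illustrative data only.
a = [1.0, 2.0, 4.0, 7.0]
b = [3.0, 1.0, 8.0, 2.0]

s = [x + y for x, y in zip(a, b)]  # pairwise sums
z = [v / 2 for v in s]             # pairwise averages

lhs = pvar(s)
rhs = pvar(a) + pvar(b) + 2 * pcov(a, b)

sd_z = pvar(z) ** 0.5            # computed directly from the averages
sd_z_formula = 0.5 * rhs ** 0.5  # computed from variances and covariance

print(lhs, rhs)
print(sd_z, sd_z_formula)
```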

The covariance is positive if on average $b$ increases as $a$ increases, and negative if on average $b$ decreases as $a$ increases.
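To connect this back to the question, here is a short worked step. The symbols $\sigma_a$, $\sigma_b$, $\sigma_Z$ and $\rho$ are notation introduced here, not in the question: $\sigma_a$ and $\sigma_b$ are the two standard deviations, and $\rho = \operatorname{cov}(a,b)/(\sigma_a\sigma_b)$ is the correlation, assuming neither standard deviation is $0$. Dividing the variance of the sum by $4$ and taking the square root gives $$ \sigma_Z = \frac 1 2 \sqrt{\sigma_a^2 + \sigma_b^2 + 2\operatorname{cov}(a,b)} = \frac 1 2 \sqrt{\sigma_a^2 + \sigma_b^2 + 2\rho\,\sigma_a\sigma_b}. $$ Since $-1 \le \rho \le 1$, the two standard deviations alone only confine the answer to a range: $$ \frac{|\sigma_a - \sigma_b|} 2 \le \sigma_Z \le \frac{\sigma_a + \sigma_b} 2. $$ So the standard deviation of the pairwise averages cannot be written as a function of $\sigma_a$ and $\sigma_b$ only; one also needs the covariance (or the correlation), which depends on how the values are paired. For instance, reordering the $b_k$ leaves $\sigma_b$ unchanged but can change $\operatorname{cov}(a,b)$, and hence $\sigma_Z$.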