I am trying to work out the variance of a sum of two sets of random variables $X_1,\ldots,X_n$ and $Y_1,\ldots,Y_m$ for a paper I'm working on. The variables $X_i$ are pairwise independent and also independent of the $Y_j$. The variables $Y_j$ are not independent of each other.
The solution I derived for the general case is this. First, let $C=(X_1,\ldots,X_n,Y_1,\ldots,Y_m)$, so that $C_i$ denotes its $i$-th element.
$$
\begin{align*}
Var\left[ \sum_{i=1}^{n+m} C_i \right] &= \sum_{i=1}^{n+m} Var[C_i] + 2\sum_{i=1}^{n+m}\sum_{j=i+1}^{n+m} Cov(C_i,C_j) \\
&= \sum_{i=1}^{n} Var[X_i] + 2\sum_{i=1}^{n}\sum_{j=i+1}^{n+m} Cov(X_i,C_j) + \sum_{j=1}^{m} Var[Y_j] + 2\sum_{i=1}^{m}\sum_{j=i+1}^{m} Cov(Y_i,Y_j) \\
&= \sum_{i=1}^{n} Var[X_i] + \sum_{j=1}^{m} Var[Y_j] + 2\sum_{i=1}^{m}\sum_{j=i+1}^{m} Cov(Y_i,Y_j) \\
&= \sum_{i=1}^{n} Var[X_i] + Var\left[ \sum_{j=1}^m Y_j \right]
\end{align*}
$$
(corrections are appreciated). N.B.: the first equality can be found in the book "Probability and Computing" by Michael Mitzenmacher and Eli Upfal -- Cambridge University Press, 2005, in particular in exercise 3.14.
Assuming the derivation above is correct, my questions are:
- Is it necessary to put this in the paper (given that there are length restrictions)?
- Is this some sort of trivial (or straightforward) result that need not be stated? (restrictions might drop, so I need to know whether this is "well-known" or not).
- In case it is not well-known, can this be found somewhere (papers/books) which I can cite so that I don't have to write it?
Thank you.
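It's somewhat trivial. If you write $S_X = X_1 + \cdots + X_n$ and $S_Y = Y_1 + \cdots + Y_m$, then by your assumptions $S_X$ and $S_Y$ are independent (being uncorrelated is all that is needed), and thus $Var(S_X+S_Y) = Var(S_X) + Var(S_Y)$. Then, since the $X$'s are pairwise independent, you have $Var(S_X) = Var(X_1) + \cdots + Var(X_n)$. The $Y$'s are not independent, so $Var(S_Y)$ does not split the same way; it stays as $Var\left[\sum_{j=1}^m Y_j\right]$, or expands into the variances plus twice the pairwise covariances of the $Y$'s. QED.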
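If you want a quick sanity check, here is a minimal sketch that verifies the identity exactly on a toy example. The model is my own assumption, not something from your paper: three independent fair bits, with $X_1$ and $X_2$ taken as the first two bits and $Y_1 = Y_2$ both equal to the third, so the $Y$'s are as dependent as they can be. The helper name `variance` is hypothetical too.

```python
from fractions import Fraction
from itertools import product


def variance(dist, f):
    """Exact variance of f(outcome) under a finite distribution {outcome: prob}."""
    mean = sum(p * f(w) for w, p in dist.items())
    return sum(p * (f(w) - mean) ** 2 for w, p in dist.items())


# Assumed toy model: outcomes are triples (x1, x2, z) of independent fair bits.
half = Fraction(1, 2)
dist = {w: half ** 3 for w in product((0, 1), repeat=3)}

# X_1, X_2 are independent; Y_1 and Y_2 are the same bit, hence dependent.
xs = [lambda w: w[0], lambda w: w[1]]
ys = [lambda w: w[2], lambda w: w[2]]

total = variance(dist, lambda w: sum(f(w) for f in xs + ys))
sum_var_x = sum(variance(dist, f) for f in xs)
var_sum_y = variance(dist, lambda w: sum(f(w) for f in ys))
sum_var_y = sum(variance(dist, f) for f in ys)

# Var[sum of everything] == sum of Var[X_i] + Var[sum of Y_j]
print(total == sum_var_x + var_sum_y)
# Var[sum of Y_j] is NOT the sum of Var[Y_j] here, because of the covariance term
print(var_sum_y == sum_var_y)
```

Using `Fraction` keeps the arithmetic exact, so the comparison is a genuine equality check and not a floating-point approximation. One example is of course not a proof, but it does show why the $Y$ part has to stay as the variance of the sum.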