I have recently been trying to analyze the variance of two related random variables $G$ and $F$, hoping for a decrease in variance when switching to the second ($F$). However, either my computations are wrong, or my intuition is -- it looks like the variance goes up. At this point, I am hoping either to find a mistake (so that it does go down) or to understand why it doesn't.
Let $(p_1,\dots,p_n)\in[0,1]^n$ be parameters, and let $q_i=q\stackrel{\rm{}def}{=}\frac{1}{n}\sum_{j=1}^n p_j \in (0,1)$ for all $i\in[n]$. Let $X_1,\dots, X_n$ be independent Poisson random variables with $X_i\sim\operatorname{Poisson}(p_i)$; and let $Y_1,\dots,Y_n$ be independent random variables (also independent of the $X_i$'s) with $Y_i\sim\operatorname{Poisson}(q_i)$.
I also have a random variable $Y\sim\operatorname{Poisson}(q)$, independent of $(X_i)_{1\leq i \leq n}$. ($Y$ and all the $Y_i$'s are identically distributed).
Let $$G\stackrel{\rm{}def}{=} \sum_{i=1}^n \left( (X_i-Y_i)^2 - X_i - Y_i \right)$$ and $$F\stackrel{\rm{}def}{=} \sum_{i=1}^n \left( (X_i-Y)^2 - X_i - Y \right).$$
We have* $\mathbb{E} G = \mathbb{E} F = \sum_{i=1}^n (p_i-q_i)^2$. What I am interested in is the variance of both $F$ and $G$ (much messier to compute for $F$, as the terms of the sum are no longer independent): a closed-form expression, or at least an upper bound. I have one for $G$, but "intuitively" feel that one should get $$ \operatorname{Var} F \leq \operatorname{Var} G \tag{$\dagger$} $$ since instead of many $Y_i$'s inducing fluctuations, there is only one random variable $Y$. Trying to compute it, using properties of the Poisson distribution and breaking the variance into $\mathbb{E}[ G^2] - (\mathbb{E} G)^2$ (expanding the square of the sum in the first term) is (a) horrendous; and (b) the result I got seems to contradict my "conjecture" $(\dagger)$.
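For reference, the claimed expectation can be checked termwise: if $X\sim\operatorname{Poisson}(p)$ and $Y\sim\operatorname{Poisson}(q)$ are independent, then
$$\mathbb{E}\left[(X-Y)^2 - X - Y\right] = \operatorname{Var}(X-Y) + \left(\mathbb{E}[X-Y]\right)^2 - (p+q) = (p+q) + (p-q)^2 - (p+q) = (p-q)^2,$$
which applies to each term of $G$ and of $F$ alike, since $Y$ and the $Y_i$'s have the same distribution.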
Is there a simple way to prove/disprove what I thought, or (even better) to elegantly compute the variance to do so?
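(A quick Monte Carlo sanity check may also help settle the direction of $(\dagger)$ numerically. Here is a sketch; the choices $n=20$ and uniformly drawn $p_i$'s are arbitrary illustrative parameters, not part of the problem.)

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
# Arbitrary illustrative parameters: any p_i in [0,1] would do.
p = rng.uniform(0.1, 0.9, size=n)
q = p.mean()

trials = 200_000
X = rng.poisson(p, size=(trials, n))    # X_i ~ Poisson(p_i), independent
Yi = rng.poisson(q, size=(trials, n))   # Y_i ~ Poisson(q), independent across i
Y = rng.poisson(q, size=(trials, 1))    # single shared Y ~ Poisson(q), broadcast over i

G = ((X - Yi) ** 2 - X - Yi).sum(axis=1)
F = ((X - Y) ** 2 - X - Y).sum(axis=1)

print("E[G] ~", G.mean(), "  E[F] ~", F.mean(), "  target:", ((p - q) ** 2).sum())
print("Var(G) ~", G.var(), "  Var(F) ~", F.var())
```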
Best,
Clément.
*(unless I messed up with the statement of the problem)
My intuition is the other way around, because using multiple i.i.d. samples often reduces variance. Define:
\begin{eqnarray*} G_i &=& (X_i-Y_i)^2 - (X_i+Y_i)\\ F_i &=& (X_i-Y)^2 - (X_i+Y) \end{eqnarray*} Then $G=\sum_{i=1}^n G_i$ and $F = \sum_{i=1}^nF_i$. The $\{G_1, \ldots, G_n\}$ variables are independent, while the $\{F_1, \ldots, F_n\}$ variables are not. However, for each $i \in \{1, \ldots, n\}$, $F_i$ and $G_i$ are identically distributed. So $Var(F_i)=Var(G_i)$ and:
$Var(G) = \sum_{i=1}^nVar(G_i)$
$Var(F) = \sum_{i=1}^nVar(G_i) + \sum_{i\neq j} E[(F_i-E[F_i])(F_j-E[F_j])]$
Thus, $Var(F)>Var(G)$ when the $F_i$ variables are positively correlated with each other, as they probably are in this case.
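In fact, the pairwise covariance can be computed in closed form (a sketch, worth double-checking). Write $F_i = (X_i^2 - X_i) - 2X_iY + (Y^2-Y)$. Since $X_i$, $X_j$, $Y$ are mutually independent, only the terms involving the shared $Y$ contribute, and for $i\neq j$:
$$Cov(F_i,F_j) = 4p_ip_j\,Var(Y) - 2(p_i+p_j)\,Cov(Y,\,Y^2-Y) + Var(Y^2-Y).$$
For $Y\sim\operatorname{Poisson}(q)$, the factorial moments give $Var(Y)=q$, $Cov(Y,\,Y^2-Y)=2q^2$, and $Var(Y^2-Y)=2q^2+4q^3$, which simplifies nicely:
$$Cov(F_i,F_j) = 4q(p_i-q)(p_j-q) + 2q^2.$$
Summing over $i\neq j$ and using $\sum_{i=1}^n (p_i-q)=0$,
$$\sum_{i\neq j} Cov(F_i,F_j) = 2q^2n(n-1) - 4q\sum_{i=1}^n (p_i-q)^2,$$
which is positive whenever $\sum_{i=1}^n(p_i-q)^2 \leq \frac{qn(n-1)}{2}$, i.e. in essentially any non-degenerate regime -- so $Var(F)>Var(G)$ there.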
You can reduce variance if you find a way of making $F_i$ samples that are negatively correlated with each other.
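For instance (a sketch, assuming `scipy` is available; the rate `q = 0.4` is an arbitrary illustrative choice): pushing antithetic uniforms through the Poisson quantile function yields samples with the correct marginal but negative pairwise correlation.

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(1)
q = 0.4                           # arbitrary illustrative rate
u = rng.uniform(size=500_000)
y1 = poisson.ppf(u, q)            # Y ~ Poisson(q) via the inverse CDF
y2 = poisson.ppf(1 - u, q)        # antithetic partner: same marginal distribution
print("corr(y1, y2) =", np.corrcoef(y1, y2)[0, 1])
```

Since the quantile function is monotone, `y1` and `y2` are counter-monotonic, hence negatively correlated; extending this beyond pairs (e.g. via stratified uniforms) takes more care.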