Let $N \gg k\geq 1$, and $\gamma \in (0,1/2)$ be such that $\gamma N$ is an integer. I have a set of $N$ colored balls (3 colors in total), of which:
- $\gamma N$ are red;
- $\gamma N$ are blue;
- $(1-2\gamma)N$ are green.
I pick a uniformly random subset $S$ of $[N]$ of size $k$, and would like to know the expectation and variance of the quantity $$ (X-Y)^2 $$ where $X$ (resp. $Y$) is the number of red (resp. blue) balls in $S$. (Ideally, the full distribution of $X-Y$ would be nice, but the above is all I really need, I think.)
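For concreteness, here is the quick Monte Carlo sanity check I use to test candidate formulas (a sketch using only the standard library; the function name `simulate_mean_sq` is just illustrative):

```python
import random

def simulate_mean_sq(N, k, gamma, trials=100_000, seed=0):
    """Monte Carlo estimate of E[(X - Y)^2] when drawing k balls
    without replacement from gamma*N red, gamma*N blue, rest green."""
    rng = random.Random(seed)
    r = round(gamma * N)  # number of red (and of blue) balls
    balls = ["R"] * r + ["B"] * r + ["G"] * (N - 2 * r)
    total = 0
    for _ in range(trials):
        s = rng.sample(balls, k)  # sampling WITHOUT replacement
        total += (s.count("R") - s.count("B")) ** 2
    return total / trials
```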
Now, this does feel an awful lot like a variant of balls-and-bins, or at least like something I should know. But every attempt I have made today relies either on (a) the (not very well justified) approximation of independence (i.e., sampling with replacement instead of without); or (b) on horrible-looking sums involving binomial coefficients.
Does the problem above have a name, or a long history that I am missing because of not knowing the right keywords? Or, if not, is there a very simple analysis of the two quantities mentioned above?
Two possible techniques.
First technique: use other people's work. The distribution in question appears to be, from @BGM's comment, a multivariate hypergeometric distribution, which is a very enjoyable name to pronounce very quickly 5 times in a row.
More precisely, $(X,Y,k-X-Y)\sim \operatorname{MultivHypergeom}_3(\underbrace{(\gamma N,\gamma N,(1-2\gamma) N)}_{(K_1,K_2,K_3)}, N, k)$, from which we get
$$ \mathbb{E}X = \mathbb{E}Y = k\frac{\gamma N}{N} = \gamma k \tag{1} $$ $$ \operatorname{Var} X = \operatorname{Var} Y = \frac{\gamma N}{N}\left(1-\frac{\gamma N}{N}\right)k\frac{N-k}{N-1}= k\gamma\left(1-\gamma\right)\frac{N-k}{N-1} \tag{2} $$ and $$ \operatorname{Cov} (X,Y) = -k\frac{K_1K_2}{N^2}\frac{N-k}{N-1}= -k\gamma^2\frac{N-k}{N-1} \tag{3} $$ from which we can compute $$\begin{align} \mathbb{E}[(X-Y)^2] &= \operatorname{Var} (X-Y) + \mathbb{E}[X-Y]^2 = \operatorname{Var} (X-Y) = \operatorname{Var} X + \operatorname{Var} Y - 2\operatorname{Cov} (X,Y) \\ &= 2\left( \operatorname{Var} X - \operatorname{Cov} (X,Y) \right) = 2\left( k\gamma\left(1-\gamma\right)\frac{N-k}{N-1} + k\gamma^2\frac{N-k}{N-1} \right) \\ &= 2k\gamma \frac{N-k}{N-1}. \end{align}$$ Which, intuitively, seems to make sense (something around $2k\gamma$, plus a correction due to the lack of independence). I expect (no pun intended) one can also leverage this approach to obtain the expression of $\mathbb{E}[(X-Y)^4]$.
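The final expression can be double-checked exactly by brute-force summation over the multivariate hypergeometric pmf; a minimal Python sketch (the name `exact_mean_sq` is mine, and `r` stands for $\gamma N$):

```python
from math import comb

def exact_mean_sq(N, k, r):
    """Exact E[(X - Y)^2] by summing over the multivariate
    hypergeometric pmf: r red, r blue, N - 2r green; k draws."""
    total = 0.0
    denom = comb(N, k)
    for x in range(min(r, k) + 1):
        for y in range(min(r, k - x) + 1):
            g = k - x - y  # number of green balls drawn
            if g > N - 2 * r:  # not enough green balls
                continue
            p = comb(r, x) * comb(r, y) * comb(N - 2 * r, g) / denom
            total += p * (x - y) ** 2
    return total
```

For small parameters with $\gamma N$ integral, this agrees with $2k\gamma\frac{N-k}{N-1}$ to floating-point precision.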
Second technique: use someone else's ideas. Basically, @Slowpoke's answer, with the following fix: the conditional distribution of $X$ given $X+Y$ is hypergeometric, not binomial. That is, writing $U\stackrel{\rm def}{=}X+Y$, we have $$ X\mid U \sim \operatorname{Hypergeom}(U, \gamma N, 2\gamma N) $$ where $U\sim \operatorname{Hypergeom}(k, 2\gamma N, N)$, from which we can try to compute $$ \mathbb{E}[(X-Y)^2] = \mathbb{E}[\mathbb{E}[(X-Y)^2\mid U]] = \mathbb{E}[\mathbb{E}[(2X-U)^2\mid U]] $$ and $$ \mathbb{E}[(X-Y)^4] = \mathbb{E}[\mathbb{E}[(X-Y)^4\mid U]] = \mathbb{E}[\mathbb{E}[(2X-U)^4\mid U]]. $$ This is a bit tedious, but doable, and e.g. using Mathematica:
we obtain $$ \begin{align} \mathbb{E}[(X-Y)^2] &= 2k\gamma \frac{N-k}{N-1} \operatorname*{\sim}_{N\to\infty} 2k\gamma \\ \mathbb{E}[(X-Y)^4] &= \frac{2\gamma k (N-k)\left[ N\left(N+1-6\gamma(N-1)\right) + 6k(N-k)(\gamma N-1) \right]}{(N-1)(N-2)(N-3)} \\ &\operatorname*{\sim}_{N\to\infty} 12 \gamma^2 k^2 + 2(1 - 6 \gamma) \gamma k \end{align} $$
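The tower-law sums above can also be evaluated numerically, which is a handy check on the closed forms; a minimal Python sketch, assuming $\gamma N$ is an integer (the name `moment_by_conditioning` is mine, with `r` standing for $\gamma N$):

```python
from math import comb

def moment_by_conditioning(N, k, r, m):
    """E[(X - Y)^m] via the tower law: condition on U = X + Y,
    with U ~ Hypergeom(k; 2r, N) and X | U = u ~ Hypergeom(u; r, 2r)."""
    total = 0.0
    for u in range(min(k, 2 * r) + 1):
        if k - u > N - 2 * r:  # not enough green balls for the rest
            continue
        p_u = comb(2 * r, u) * comb(N - 2 * r, k - u) / comb(N, k)
        # inner expectation E[(2X - u)^m | U = u]
        inner = 0.0
        for x in range(max(0, u - r), min(r, u) + 1):
            p_x = comb(r, x) * comb(r, u - x) / comb(2 * r, u)
            inner += p_x * (2 * x - u) ** m
        total += p_u * inner
    return total
```

Running it for small $(N, k, \gamma)$ with $m = 2$ and $m = 4$ matches the two closed-form expressions above to floating-point precision.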