I have a deck of $N$ cards. $n$ of the cards are red circles and $N-n$ cards are blue circles. I also have an unlimited supply of red square and blue square cards. I play the following game and want to know the probability distribution of the outcome:
- Repeat the following process until no circle cards are left:
- Pick one of the circle cards from the deck at random and replace it with a square card of the same color (let's call this color $c$). Then pick another card from the deck (it could be a circle or square card of any color) at random and replace it with another square card of color $c$.
Note that in each step of the above process, the second card picked could be the very square card that was placed in the first half of that step.
Each repetition of this process replaces one or two circle cards with square cards, so in the end we have $N$ square cards. I want to know the probability $P(n'|n)$ of ending with $n'$ red squares and $N-n'$ blue squares, given $n$ initial red circles.
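For reference, the game as stated can be checked with a quick Monte Carlo simulation. This is a minimal sketch (the function name `play` and the parameter values are my own choices, not from any answer):

```python
import random

def play(N, n, rng):
    # color[k] is True for red, False for blue; all N cards start as circles
    color = [True] * n + [False] * (N - n)
    circles = set(range(N))              # indices of cards that are still circles
    while circles:
        i = rng.choice(tuple(circles))   # first pick: a circle, uniformly at random
        circles.discard(i)               # it becomes a square of its own color c
        j = rng.randrange(N)             # second pick: any card, possibly i itself
        color[j] = color[i]              # that card becomes a square of color c
        circles.discard(j)
    return sum(color)                    # n': the final number of red squares

rng = random.Random(0)
N, n, trials = 40, 16, 20000
samples = [play(N, n, rng) for _ in range(trials)]
mean = sum(samples) / trials
var = sum((s - mean) ** 2 for s in samples) / trials
print(mean, var)  # mean should be near n = 16, var near 3n(N-n)/(4N) = 7.2
```

With these parameters the empirical mean and variance land close to the conjectured $\langle n'\rangle = n$ and $\text{Var}(n') \approx \frac{3n(N-n)}{4N}$ discussed below.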
What do I want:
- Ideally $P(n'|n)$.
- If that's not easy, a large-$N$ approximation would be just as helpful.
- Alternatively, the mean and the variance of $n'$ would also suffice. I'm guessing the mean of $n'$ is $n$.
- The mean and variance of $n'$ for large $N$ would also be good enough if nothing else works.
Update 1: Numerically, it looks like $\left\langle n'\right\rangle = n$, and for large $N$, $\text{Var}(n') \approx \frac{3n(N-n)}{4N}$.
Update 2: antkam's answer gives a beautiful symmetry proof for $\left\langle n'\right\rangle = n$. Now, all I need is:
Prove $\text{Var}(n') \approx \frac{3n(N-n)}{4N}$ for large $N$.
Update 3: Some more numerical results that may help to find a proof. As in antkam's answer, we can define the variable $X_i\in\{0,1,2\}$ to be the number of square cards resulting from the $i$'th circle card, so that $$n'=\sum_{i=1}^n X_i.$$ We can write $\text{Var}(n')$ as $$ \begin{align} \text{Var}(n') &= \left\langle\left(\sum_{i=1}^n X_i\right)^2\right\rangle - \left\langle\sum_{i=1}^n X_i\right\rangle^2\\ &=n\left(\text{Var}(X_1)+(n-1)\text{Cov}(X_1,X_2)\right). \end{align} $$ Numerically, I have found (Edit: now both of these partial results are proven in joriki's answer below):
- $\text{Var}(X_1) = 3/4$
- The probabilities are $P(X=0) = P(X=2) = 3/8$ and $P(X=1) = 1/4$.
Observations:
- The equality $P(X=0) = P(X=2)$ can be justified by the following observation: since $X_i\in\{0,1,2\}$ and $\sum_{i=1}^N X_i = N$, in every realization the number of indices with $X_i=0$ must exactly equal the number of indices with $X_i=2$.
- For $\text{Var}(n')$ to be $\frac{3n(N-n)}{4N}$, the value of $\text{Cov}(X_1,X_2)$ should be $-\frac{3}{4N}$ to first order in $1/N$.
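The claimed distribution of $X$ can be checked by simulating the game while tracking, for each circle, how many of the final squares resulted from it. This is a sketch; the `owner` bookkeeping is my own reading of antkam's $X_i$, in which a square that is later repainted is re-attributed to the circle that repainted it:

```python
import random
from collections import Counter

def sample_X(N, rng):
    # owner[k] records which circle the square at position k resulted from;
    # colors are irrelevant here, since the picking dynamics ignore them
    owner = [None] * N
    circles = set(range(N))
    while circles:
        i = rng.choice(tuple(circles))   # first pick: a remaining circle
        circles.discard(i)
        owner[i] = i                     # circle i turns into a square from i
        j = rng.randrange(N)             # second pick: any card, possibly i
        owner[j] = i                     # re-attributed (and recolored) to i
        circles.discard(j)
    counts = Counter(owner)
    return [counts.get(i, 0) for i in range(N)]   # X_i in {0, 1, 2}

rng = random.Random(1)
N, trials = 50, 4000
hist = Counter()
for _ in range(trials):
    for x in sample_X(N, rng):
        hist[x] += 1
total = N * trials
probs = {k: hist[k] / total for k in sorted(hist)}
print(probs)  # expect roughly {0: 3/8, 1: 1/4, 2: 3/8}
```

Note that the empirical frequencies of $X=0$ and $X=2$ come out exactly equal, in line with the counting observation above.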
Sketch of the proof for $\text{Var}(n') \approx \frac{3n(N-n)}{4(N-1)}$ at large $N$:
As in antkam's answer, we can define the variable $X_i\in\{0,1,2\}$ to be the number of square cards resulting from the $i$'th circle card, so that $$n'=\sum_{i=1}^n X_i.$$ We can write $\text{Var}(n')$ as $$ \begin{align} \text{Var}(n') &= \left\langle\left(\sum_{i=1}^n X_i\right)^2\right\rangle - \left\langle\sum_{i=1}^n X_i\right\rangle^2\\ &=n\left(\text{Var}(X_1)+(n-1)\text{Cov}(X_1,X_2)\right), \end{align} $$ where we have used the fact that, by symmetry, $\text{Var}(X_i)$ is the same for all $i$ and $\text{Cov}(X_i,X_j)$ is the same for all $i\neq j$.
Joriki's answer proved that $\text{Var}(X_i) = 3/4$. All that is left is to prove that $\text{Cov}(X_i,X_j) = -\frac1{N-1} \text{Var}(X_i)$, and then we will have $$ \begin{align} \text{Var}(n') &=n\,\text{Var}(X_1)\left(1-\frac{n-1}{N-1}\right) = \frac34\frac{n(N-n)}{N-1}. \end{align} $$ To do this, let us first define the centered variables $\delta X_i \equiv X_i - \langle X_i\rangle$. The covariance can be written in terms of the $\delta X_i$ as $$ \begin{align} \text{Cov}(X_i,X_j) &= \sum_{\delta x_i}\sum_{\delta x_j} \delta x_i\, \delta x_j\, P(\delta x_i,\delta x_j)\\ &=\sum_{\delta x_j} \delta x_j \underbrace{\left(\sum_{\delta x_i} \delta x_i\, P(\delta x_i\mid\delta x_j)\right)}_{\langle \delta X_i\mid\delta x_j\rangle} P(\delta x_j). \end{align} $$ Since $\sum_i \delta X_i = 0$, summing $\langle \delta X_i\mid\delta x_j\rangle$ over the $N-1$ indices $i\neq j$ gives $-\delta x_j$, and by exchangeability of the $X_i$ all of these terms are equal, so $$ \langle \delta X_i\mid\delta x_j\rangle=-\frac{1}{N-1}\, \delta x_j. $$ So we have $$ \begin{align} \text{Cov}(X_i,X_j) &= -\frac1{N-1}\sum_{\delta x_j}\left(\delta x_j\right)^2 P(\delta x_j)\\ &= -\frac1{N-1}\,\text{Var}(X_j). \end{align} $$ (Equivalently, since $\sum_{i=1}^N X_i = N$ is constant, $0 = \text{Var}\!\left(\sum_{i=1}^N X_i\right) = N\,\text{Var}(X_1) + N(N-1)\,\text{Cov}(X_1,X_2)$, which gives the same identity.)
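The missing link, $\text{Cov}(X_i,X_j) = -\frac1{N-1}\text{Var}(X_i)$, can at least be checked numerically with the same attribution bookkeeping. A sketch (`sample_X12` and the chosen parameters are mine; it tracks $X_1$ and $X_2$ for two fixed circles across many games):

```python
import random
from collections import Counter

def sample_X12(N, rng):
    # Same game as in the problem statement, tracking which circle each
    # final square resulted from; returns (X_1, X_2) for two fixed circles.
    owner = [None] * N
    circles = set(range(N))
    while circles:
        i = rng.choice(tuple(circles))   # first pick: a remaining circle
        circles.discard(i)
        owner[i] = i
        j = rng.randrange(N)             # second pick: any card, possibly i
        owner[j] = i
        circles.discard(j)
    counts = Counter(owner)
    return counts.get(0, 0), counts.get(1, 0)

rng = random.Random(2)
N, trials = 20, 40000
pairs = [sample_X12(N, rng) for _ in range(trials)]
m1 = sum(x for x, _ in pairs) / trials
m2 = sum(y for _, y in pairs) / trials
cov = sum((x - m1) * (y - m2) for x, y in pairs) / trials
print(cov)  # expect about -0.75/(N-1), i.e. roughly -0.039 at N = 20
```

The empirical covariance is small and negative, consistent with $-\frac{3}{4(N-1)}$.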