Probability of the outcome of this simple card game


I have a deck of $N$ cards. $n$ of the cards are red circles and $N-n$ cards are blue circles. I also have an unlimited supply of red square and blue square cards. I play the following game and want to know the probability distribution of the outcome:

  • Repeat the following process until no circle cards are left:
  • Pick one of the circle cards from the deck at random and replace it with a square card of the same color (let's call this color $c$). Then pick another card from the deck (it could be a circle or square card of any color) at random and replace it with another square card of color $c$.

Note that in each repeated step of the above process, the second card picked could be the very square card placed in the first half of that step.

Every time you repeat this process, you replace one or two of the circle cards with square cards. In the end, we will have $N$ square cards. I want to know the probability $P(n'|n)$ of having $n'$ red squares and $N-n'$ blue squares at the end, given $n$ initial red circles.

What do I want:

  • Ideally $P(n'|n)$.
  • If that's not easy, a large-$N$ approximation would be equally helpful.
  • Alternatively, the mean and the variance of $n'$ would suffice. I'm guessing the mean of $n'$ is $n$.
  • The mean and variance of $n'$ for large $N$ would also be good enough if nothing else works.

Update 1: Numerically, it looks like $\left\langle n'\right\rangle = n$, and for large $N$, $\text{var}(n') \approx\frac{3n(N-n)}{4N}$.
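These numerics are easy to reproduce with a short Monte Carlo simulation (a sketch; function names are mine, and only the counts of red circles, blue circles and red squares need to be tracked, since the blue-square count is determined by them):

```python
import random

def play(N, n, rng):
    """Play one game; return the final number of red squares n'."""
    rc, bc, rs = n, N - n, 0                # red circles, blue circles, red squares
    while rc + bc:
        red = rng.randrange(rc + bc) < rc   # first pick: uniform among circle cards
        if red:
            rc -= 1
            rs += 1                         # replaced by a red square
        else:
            bc -= 1                         # replaced by a blue square
        u = rng.randrange(N)                # second pick: uniform over all N cards
        if u < rc:                          # hit a red circle
            rc -= 1
            if red: rs += 1                 # it becomes a square of colour c
        elif u < rc + bc:                   # hit a blue circle
            bc -= 1
            if red: rs += 1
        elif u < rc + bc + rs:              # hit a red square
            if not red: rs -= 1             # repainted blue
        else:                               # hit a blue square
            if red: rs += 1                 # repainted red
    return rs

def moments(N, n, trials, seed=0):
    """Sample mean and variance of n' over repeated games."""
    rng = random.Random(seed)
    vals = [play(N, n, rng) for _ in range(trials)]
    mean = sum(vals) / trials
    var = sum((v - mean) ** 2 for v in vals) / trials
    return mean, var
```

For instance, `moments(100, 50, 4000)` should give a mean close to $50$ and a variance close to $\frac{3\cdot 50\cdot 50}{4\cdot 100}=18.75$.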

Update 2: antkam's answer gives a beautiful symmetry proof for $\left\langle n'\right\rangle = n$. Now, all I need is:

Prove $\text{var}(n') \approx\frac{3n(N-n)}{4N}$ for large N.

Update 3: Some more numerical results that may help to find a proof: as in antkam's answer, we can define the variable $X_i\in\{0,1,2\}$ to be the number of square cards resulting from the $i$'th circle card. Then $n'$ can be written as $$n'=\sum_{i=1}^n X_i.$$ We can write $\text{Var}(n')$ as $$ \begin{align} \text{Var}(n') &= \left\langle\left(\sum_{i=1}^n X_i\right)^2\right\rangle - \left\langle\sum_{i=1}^n X_i\right\rangle^2\\ &=n\left(\text{Var}(X_1)+(n-1)\text{Cov}(X_1,X_2)\right). \end{align} $$ Numerically, I have found (Edit: both of these partial results are now proven in joriki's answer below):

  • $\text{Var}(X_1) = 3/4$
  • Probabilities are $P(X=0) = P(X=2) = 3/8$ and $P(X=1)=1/4$.

Observations:

  • The equality $P(X=0) = P(X=2)$ can be justified by the following observation: for $\sum_{i=1}^N X_i = N$ to hold, the number of indices with $X_i=0$ must exactly equal the number with $X_j=2$.
  • For $\text{Var}(n')$ to be $\frac{3n(N-n)}{4N}$, the value of $\text{Cov}(X_1,X_2)$ should be $-\frac{3}{4N}$ to first order in $1/N$.

Accepted answer:

Sketch of the proof for $\text{Var}(n') \approx \frac{3n(N-n)}{4(N-1)}$ at large $N$:

As in antkam's answer, we can define the variable $X_i\in\{0,1,2\}$ to be the number of square cards resulting from the $i$'th circle card. Then $n'$ can be written as $$n'=\sum_{i=1}^n X_i.$$ We can write $\text{Var}(n')$ as $$ \begin{align} \text{Var}(n') &= \left\langle\left(\sum_{i=1}^n X_i\right)^2\right\rangle - \left\langle\sum_{i=1}^n X_i\right\rangle^2\\ &=n\left(\text{Var}(X_1)+(n-1)\text{Cov}(X_1,X_2)\right), \end{align} $$ where we have used the fact that, by symmetry, $\text{Var}(X_i)$ is the same for all $i$ and $\text{Cov}(X_i,X_j)$ is the same for all $i\neq j$.

Joriki's answer proved that $\text{Var}(X_i) = 3/4$. All that is left is to prove that $\text{Cov}(X_i,X_j) = -\frac1{N-1} \text{Var}(X_i)$, and then we will have $$ \begin{align} \text{Var}(n') &=n\text{Var}(X_1)\left(1-\frac{n-1}{N-1}\right) = \frac34\frac{n(N-n)}{N-1}. \end{align} $$ To do this, let us first define the variables $\delta X_i \equiv X_i - \langle X_i\rangle $. Then $\text{Cov}(X_i,X_j)$ can be written in terms of the $\delta X_i$ as (forgive the sloppy notation) $$ \begin{align} \text{Cov}(X_i,X_j) &= \int\int \delta X_i \delta X_j P(\delta X_i,\delta X_j) d\delta X_i d\delta X_j\\ &=\int\int \delta X_i \delta X_j P(\delta X_i|\delta X_j) P(\delta X_j) d\delta X_i d\delta X_j\\ &=\int\delta X_j\underbrace{\left(\int \delta X_i P(\delta X_i|\delta X_j)d\delta X_i \right)}_{\langle \delta X_i|\delta X_j\rangle} P(\delta X_j) d\delta X_j\\ \end{align} $$ Note that since $\sum_i \delta X_i = 0$, by symmetry the expected value of $\delta X_i$ given $\delta X_j$ should be $-\frac{1}{N-1} \delta X_j$: $$ \langle \delta X_i|\delta X_j\rangle=-\frac{1}{N-1} \delta X_j $$ So we have: $$ \begin{align} \text{Cov}(X_i,X_j) &= \int\delta X_j\langle \delta X_i|\delta X_j\rangle P(\delta X_j) d\delta X_j\\ &= -\frac1{N-1}\int\left(\delta X_j\right)^2 P(\delta X_j) d\delta X_j\\ &= -\frac1{N-1}\text{Var}(\delta X_j). \end{align} $$ Since $\text{Var}(\delta X_j)=\text{Var}(X_j)=3/4$, this gives $\text{Cov}(X_i,X_j) = -\frac{3}{4(N-1)}$, as required.

Answer:

You can model this as a discrete-time Markov chain with states $(\text{rc},\text{bc},\text{rs},\text{bs})\in S$, where $\text{rc}\ge 0$ is the number of red circles, etc., and $\text{rc}+\text{bc}+\text{rs}+\text{bs}=N$. Given transition probabilities $p_{i,j}$, the absorption probabilities $\pi_{i,j} \ge 0$ of eventually ending up in state $j\in S$ starting from state $i\in S$ can be computed from the linear system: \begin{align} \pi_{i,j} &= \sum_{k\in S} p_{i,k} \pi_{k,j} &&\text{for all $i$ and $j$}\\ \pi_{i,i} &= 1 &&\text{if state $i$ has no circles}\\ \sum_{j \in S} \pi_{i,j} &= 1 &&\text{for all $i$} \end{align} Then $$P(n'|n) = \sum_{\substack{i \in S\\\text{$i$ has $n$ red circles}}} \sum_{\substack{j \in S\\\text{$j$ has $n'$ red squares}}} \pi_{i,j}.$$

For example, the values of $P(n'|n)$ for $N=8$ are as follows:

n\n'        0         1         2         3         4         5         6         7         8 
0   1.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 
1   0.3603253 0.2793493 0.3603253 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 
2   0.1027300 0.2067804 0.3809791 0.2067804 0.1027300 0.0000000 0.0000000 0.0000000 0.0000000 
3   0.0196712 0.0874329 0.2453533 0.2950851 0.2453533 0.0874329 0.0196712 0.0000000 0.0000000 
4   0.0015393 0.0191598 0.1008081 0.2233022 0.3103814 0.2233022 0.1008081 0.0191598 0.0015393 
5   0.0000000 0.0000000 0.0196712 0.0874329 0.2453533 0.2950851 0.2453533 0.0874329 0.0196712 
6   0.0000000 0.0000000 0.0000000 0.0000000 0.1027300 0.2067804 0.3809791 0.2067804 0.1027300 
7   0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.3603253 0.2793493 0.3603253 
8   0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 1.0000000 

From computations for small $N$, it does appear that $E(n'|n)=n$.
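Since the chain only ever moves toward fewer circle cards, the absorption probabilities can equivalently be computed by a memoized recursion over the states $(\text{rc},\text{bc},\text{rs})$, with $\text{bs}=N-\text{rc}-\text{bc}-\text{rs}$ implied. A sketch (function names are mine):

```python
from functools import lru_cache

def distribution(N, n):
    """Exact P(n'|n), recursing over states (rc, bc, rs); bs = N - rc - bc - rs."""
    @lru_cache(maxsize=None)
    def dist(rc, bc, rs):
        bs = N - rc - bc - rs
        if rc + bc == 0:
            return ((rs, 1.0),)             # absorbed: rs red squares remain
        out = {}
        def add(state, p):
            if p:                           # skip impossible branches (p == 0)
                for v, q in dist(*state):
                    out[v] = out.get(v, 0.0) + p * q
        if rc:                              # first pick is a red circle
            p1 = rc / (rc + bc)
            # deck during the second pick: rc-1 red circles, rs+1 red squares
            add((rc - 2, bc, rs + 2), p1 * (rc - 1) / N)   # second: red circle
            add((rc - 1, bc - 1, rs + 2), p1 * bc / N)     # second: blue circle
            add((rc - 1, bc, rs + 1), p1 * (rs + 1) / N)   # second: red square
            add((rc - 1, bc, rs + 2), p1 * bs / N)         # second: blue square
        if bc:                              # first pick is a blue circle
            p1 = bc / (rc + bc)
            add((rc, bc - 2, rs), p1 * (bc - 1) / N)       # second: blue circle
            add((rc - 1, bc - 1, rs), p1 * rc / N)         # second: red circle
            add((rc, bc - 1, rs - 1), p1 * rs / N)         # second: red square
            add((rc, bc - 1, rs), p1 * (bs + 1) / N)       # second: blue square
        return tuple(sorted(out.items()))
    return dict(dist(n, N - n, 0))
```

For $N=2$, $n=1$ this gives exactly $P(0)=P(2)=3/8$, $P(1)=1/4$, and for $N=8$ it should reproduce the table above, e.g. `distribution(8, 4)[4]` close to $0.3103814$.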

Answer:

It is indeed true that $E[n'] = n$ for any $n \ge 1$.

The idea is to look at the random variable $c_R(t)+s_R(t)$ for each $t$, where $c_R(t)$ is the number of red circles and $s_R(t)$ is the number of red squares after $t$ steps of the process. $c_B(t)$ and $s_B(t)$ are defined analogously.

By doing casework on whether a red circle or blue circle was chosen, we have (reading each right-hand side as a conditional expectation given the state at time $t$, and then taking expectations) $$E[c_R(t+1)+s_R(t+1)] = $$ $$\frac{c_R(t)}{c_R(t)+c_B(t)}\left[\frac{c_R(t)-1}{N}(c_R(t)-2+s_R(t)+2)+\frac{s_R(t)+1}{N}(c_R(t)-1+s_R(t)+1)+\frac{c_B(t)}{N}(c_R(t)-1+s_R(t)+2)+\frac{s_B(t)}{N}(c_R(t)-1+s_R(t)+2)\right]$$ $$+\frac{c_B(t)}{c_R(t)+c_B(t)}\left[\frac{c_B(t)-1}{N}(c_R(t)+s_R(t))+\frac{s_B(t)+1}{N}(c_R(t)+s_R(t))+\frac{c_R(t)}{N}(c_R(t)+s_R(t)-1)+\frac{s_R(t)}{N}(c_R(t)+s_R(t)-1)\right],$$ which, after some computation using $c_R(t)+s_R(t)+c_B(t)+s_B(t) \equiv N$, gives $$E[c_R(t+1)+s_R(t+1)] = E[c_R(t)+s_R(t)]\left(1-\frac{1}{N}\right)+E\left[\frac{c_R(t)}{c_R(t)+c_B(t)}\right].$$ Similarly, $$E\left[\frac{c_R(t+1)}{c_R(t+1)+c_B(t+1)}\right] = $$ $$\frac{c_R(t)}{c_R(t)+c_B(t)}\left[\frac{c_R(t)-1}{N}\left(\frac{c_R(t)-2}{c_R(t)+c_B(t)-2}\right)+\frac{c_B(t)}{N}\left(\frac{c_R(t)-1}{c_R(t)+c_B(t)-2}\right)+\frac{s_R(t)+1+s_B(t)}{N}\left(\frac{c_R(t)-1}{c_R(t)+c_B(t)-1}\right)\right]$$ $$+ \frac{c_B(t)}{c_R(t)+c_B(t)}\left[\frac{c_B(t)-1}{N}\left(\frac{c_R(t)}{c_R(t)+c_B(t)-2}\right)+\frac{c_R(t)}{N}\left(\frac{c_R(t)-1}{c_R(t)+c_B(t)-2}\right)+\frac{s_R(t)+s_B(t)+1}{N}\left(\frac{c_R(t)}{c_R(t)+c_B(t)-1}\right)\right],$$ which, after a much longer computation also using $s_R(t)+s_B(t) \equiv N-c_R(t)-c_B(t)$, gives $$E\left[\frac{c_R(t+1)}{c_R(t+1)+c_B(t+1)}\right] = E\left[\frac{c_R(t)}{c_R(t)+c_B(t)}\right].$$ Since $E\left[\frac{c_R(0)}{c_R(0)+c_B(0)}\right] = \frac{n}{N}$, we see that $$E\left[\frac{c_R(t)}{c_R(t)+c_B(t)}\right] = \frac{n}{N}$$ for all $t \ge 0$. Therefore, $$E[c_R(t+1)+s_R(t+1)] = \left(1-\frac{1}{N}\right)E[c_R(t)+s_R(t)]+\frac{n}{N}.$$ Since $E[c_R(0)+s_R(0)] = n$, induction shows that $$E[c_R(t)+s_R(t)] = n$$ for all $t \ge 0$. In particular, we have $$E[n'] = n,$$ as desired.
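The invariant $E[c_R(t)+s_R(t)] = n$ for every $t$ is also easy to spot-check numerically. The sketch below (names are mine) runs the game for $t$ steps and returns $c_R+s_R$; averaging over many runs should give $n$ for any $t$:

```python
import random

def red_total(N, n, t, rng):
    """Run t steps (or until no circles remain); return c_R + s_R."""
    rc, bc, rs = n, N - n, 0                # c_R, c_B, s_R; s_B = N - rc - bc - rs
    for _ in range(t):
        if rc + bc == 0:
            break
        red = rng.randrange(rc + bc) < rc   # first pick: uniform among circle cards
        if red:
            rc -= 1
            rs += 1                         # replaced by a red square
        else:
            bc -= 1                         # replaced by a blue square
        u = rng.randrange(N)                # second pick: uniform over all N cards
        if u < rc:                          # hit a red circle
            rc -= 1
            if red: rs += 1
        elif u < rc + bc:                   # hit a blue circle
            bc -= 1
            if red: rs += 1
        elif u < rc + bc + rs:              # hit a red square
            if not red: rs -= 1             # repainted blue
        else:                               # hit a blue square
            if red: rs += 1                 # repainted red
    return rc + rs
```

For example, with $N=12$ and $n=5$, the average of `red_total` over many runs stays near $5$ whether $t$ is $1$, $3$, or large enough to finish the game.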

Answer (antkam):

A symmetry-based proof that $E[n'] = n$, plus a possible approach for approximating (though not exactly calculating) $Var(n')$.


Imagine the original $N$ circle cards are in fact $N$ different colors, $c_1, ..., c_N$. The same process is done to the deck, using as many square cards (of the same $N$ colors) as you need.

When the process is finished (i.e. no more circle cards), let random variable $X_j$ be the number of final cards of color $c_j$. Clearly:

  • $E[X_1] = E[X_2] = \dots = E[X_N]$ by symmetry,

  • $X_1 + X_2 + \dots + X_N = N$ by conservation of the total number of cards,

  • $E[X_1] + E[X_2] + \dots + E[X_N] = N$ combining the above and using linearity of expectation,

  • and therefore: $E[X_j] = 1$ for all $j$.

Now imagine you are partially colorblind and cannot distinguish between $c_1, ..., c_n$, thinking of all of them as "red". The final number of "red" cards (to your eyes) will be $n' \equiv X_1 + \dots + X_n,$ and so we have $E[n'] = n$. QED


Further thoughts on $Var(n')$:

Note that each of the $N$ colors can have at most $2$ final cards, i.e. $X_j \in \{0, 1, 2\}$. It might be possible (but difficult/tedious) to calculate $Var(X_j)$ by careful case analysis, e.g. $X_j = 2$ iff the $c_j$ circle card was picked as the first card of some round $t$ and neither of the $2$ square cards of color $c_j$ added to the deck was subsequently replaced in rounds $> t$.

If we know $Var(X_j)$, then based on $n' \equiv X_1 + \dots + X_n,$ a decent approximation might be $Var(n') \approx n \times Var(X_j)$. This would be exact if the $X_j$'s were independent, but of course they are actually dependent. However, in the regime $1 \ll n \ll N$ (large $n$ but much larger $N$), the approximation might be quite good.

The OP said numerically it seems $Var(n') \approx {3n(N-n) \over 4N}$. This would be consistent with my thoughts above if $Var(X_j) = \frac34,$ in the $1 \ll n \ll N$ limit.
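The $N$-distinct-colors construction above can be simulated directly, and doing so reproduces the OP's numerical values $P(X=0)=P(X=2)\approx 3/8$ and hence $Var(X_j)\approx 3/4$ for moderately large $N$. A sketch (function names are mine):

```python
import random

def final_counts(N, rng):
    """Run the game on N circle cards of N distinct colours; return [X_1, ..., X_N]."""
    color = list(range(N))                  # colour of the card in each deck slot
    square = [False] * N                    # whether each slot already holds a square
    circles = N
    while circles:
        i = rng.choice([s for s in range(N) if not square[s]])  # first pick: a circle
        c = color[i]
        square[i] = True                    # replaced by a square of its own colour c
        circles -= 1
        j = rng.randrange(N)                # second pick: any slot (may equal i)
        if not square[j]:
            square[j] = True
            circles -= 1
        color[j] = c                        # replaced by a square of colour c
    counts = [0] * N
    for c in color:
        counts[c] += 1
    return counts                           # counts[j] is X_{j+1}

def x_stats(N, trials, seed=0):
    """Estimate P(X_1 = k) for k = 0, 1, 2 over repeated games."""
    rng = random.Random(seed)
    freq = [0, 0, 0]                        # X_1 can only be 0, 1 or 2
    for _ in range(trials):
        freq[final_counts(N, rng)[0]] += 1
    return [f / trials for f in freq]
```

Since $E[X_1]=1$ exactly, $Var(X_1)=P(X_1=0)+P(X_1=2)$; for, say, $N=40$ and a few thousand trials this comes out near $3/4$.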

Answer (joriki):

Here’s a derivation of the variance.

Let $r(t)$ denote the proportion of circle cards left after time $t$, where a step takes time $\frac1N$. In each step, we remove one circle card, and then another circle card with probability $r$. Thus, in the limit $N\to\infty$, $r$ satisfies the differential equation

$$ r'=-(1+r)\;. $$

The general solution is

$$ r=c\mathrm e^{-t}-1\;, $$

and with $r(0)=1$ we have $c=2$, so

$$ r=2\mathrm e^{-t}-1\;, $$

and thus

$$ t=-\log\frac{r+1}2\;. $$

The process ends at $r=0$, and thus at $t=\log2$, so we expect it to take $N\log2$ steps.
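The prediction that the game lasts about $N\log 2$ steps is easy to check by simulating just the circle count (a sketch; the colours are irrelevant here):

```python
import math
import random

def steps_to_finish(N, rng):
    """Count steps until no circle cards remain."""
    circles, steps = N, 0
    while circles:
        circles -= 1                        # the first pick always removes a circle
        if rng.randrange(N) < circles:      # second pick hits a circle w.p. circles/N
            circles -= 1
        steps += 1
    return steps

rng = random.Random(0)
avg = sum(steps_to_finish(1000, rng) for _ in range(300)) / 300
# avg should be close to 1000 * math.log(2), i.e. about 693
```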

Now focus on some particular circle card. By symmetry it is equally likely to be the $k$th circle card removed for any $k$, so in the limit it is replaced by a square card at some $r$ uniformly distributed over $[0,1]$. In the first half-step, it is replaced with probability $\frac1{rN}$, and in the second half-step it is replaced with probability $\frac1N$. Thus, conditional on the circle card being replaced at $r$, the probability that it’s replaced in the first half-step, and thus produces two square cards of its colour, is $\frac1{r+1}$. (The probability that the same card is selected in both half-steps is of order $\frac1N$ and thus negligible in the limit.)

If the circle card is replaced by $2$ square cards, there remain $N\left(\log2+\log\frac{r+1}2\right)=N\log(r+1)$ steps in which these $2$ square cards could be replaced by other square cards. The probability for neither of them to be replaced in any of these steps is

\begin{eqnarray} \left(1-\frac2N\right)^{N\log(r+1)} &\to_{N\to\infty}& \exp(-2\log(r+1)) \\ &=& (r+1)^{-2}\;. \end{eqnarray}

Integrating over the uniform distribution with respect to $r$, we obtain the probability that the circle card leaves behind $2$ square cards (i.e. produces $2$ square cards, neither of which is replaced) as

$$ \int_0^1(r+1)^{-3}\mathrm dr=\frac38\;. $$

As shown in antkam’s answer, the expected number of square cards that a circle card leaves behind is $1$, so the probability that it leaves behind $0$ cards must also be $\frac38$. For the deviation $D = X_j - 1\in\{-1,0,1\}$ of this count from its mean, the variance is easily calculated as

$$\mathsf{Var}[D]=\mathsf E[D^2]-\mathsf E[D]^2=\mathsf E[D^2]=\mathsf P(D\ne0)\;,$$

so the variance in the number of cards left behind ($\mathsf{Var}[X_j]$ in antkam’s answer) is $\frac38+\frac38=\frac34$.

Your own answer shows that this implies the result for the total variance that you had obtained numerically. This is the somewhat hand-waving argument I gave before you posted your answer: with probability $\frac38+\frac38=\frac34$, a given circle card changes the count of its colour by $\pm1$, and that colour is red with probability $\frac nN$ (blue with probability $\frac{N-n}N$). The variance of the sum of these $\frac34N$ approximately independent Bernoulli variables is

$$ \frac34N\cdot\frac nN\left(1-\frac nN\right)=\frac{3n(N-n)}{4N}\;, $$

in agreement with your numerical results.

I haven’t made any attempt to rigorously show that the errors all vanish in the limit $N\to\infty$, but this seems very plausible. For instance, the standard deviation of the number of steps should be $O(\sqrt N)$, and $\left(1-\frac2N\right)^{\sqrt N}\to1$. The correlation among the Bernoulli trials should also go to zero, as it typically does in such scenarios.