Solving a riddle with Markov chains - extracting pairs of socks

274 Views Asked by At

I would like to solve the following riddle:

A sack contains $2N$ socks, of $N$ colors. What is the expected number of socks you should draw to obtain a matching pair?

I tried to solve it using the expected time to absorption of a Markov chain. The chain is built in this way:

  1. the state where I have matching pairs is called $F$, and it is the only absorbing state.
  2. the state where I already got $i$ socks is called just $i$, where $1\leq i\leq N$, because $i=N+1$ corresponds to the state $F$ (by pidgeon's hole rule).

I setup the problem this way:

  1. the time to reach absorption from state $i$ is $t_i$, and $t_F=0$.
  2. the time $t_i$ is computed as $$ t_i=1+\sum_{j=1}^{N} p_{i,j}t_j,$$ being $p_{i,j}$ the transition probability from state $i$ to state $j$.
  3. $p_{i,j}=0\,\forall j\neq i+1$, and $$p_{i,i+1}=\frac{2N-2i}{2N-i},$$ as it is the chance to draw any of the socks of a color not matching the color of any of the $i$ socks I already drawn, over all the remaining socks.
  4. $p_{i,F}=1-p_{i,i+1}=\frac i{2N-i}$, or drawing any of the $i$ socks pairing with any of the $i$ I already drawn.

I get: $$ t_N = 1,\,t_{N-1}=1+\frac1{N+1},\,t_{N-2}=1+\frac 4{N+1},..$$ that by induction suggests (I did not prove) $t_{N-i}=1+\frac{i^2}{N+1}$.

In any case, I can explicitly determine all the $t_i$'s. At this point, I would compute the expected number of drawns as $1+\sum_{j=1}^N t_j P_j$, where $P_j$ is the probability associated with every state.

I have two specific questions:

  1. does the reasoning sound correct to you?
  2. how do I determine all the $P_j$'s? At first, I would pose $P_1=1$, as I can always reach that state, but it seems weird to me. Anyway, I would pose $P_{i+1}=P_i p_{i,i+1}$, and this sounds reasonable to me.
1

There are 1 best solutions below

1
On BEST ANSWER

IMHO your confusion comes from the fact that you thought $t_i$ are random variables. Instead, they are already expected values. Consider the equation:

$$t_4 = 1 + p_{4,5} \cdot t_5 = 1 + {2N-8 \over 2N - 4} t_5$$

In this equation, $t_4$ should be interpreted as the expected time to finish given you have $4$ different colors, and $t_5$ should be interpreted as the expected time to finish given you have $5$ different colors. So the overall answer you seek is simply $t_0$. (There is no need for your $P_j$ unless the game starts at a random state according to the prior distribution $P_j$.)

Another way to tell that your $t_4$ is not a random variable: We all agree that the time to finish from $i$ colors is an integer-valued random variable, right? But in the equation above, ${2N - 8 \over 2N-4}$ is generally not an integer, so how can it relate two integer r.v.s? Instead, it can (and does) relate two expected values because those can be non-integral.

BTW while I agree that $t_N = 1$ (obviously), I get $t_{N-1} = 1 + {2 \over N+1}$, so your solution to the recurrence is also wrong.