Like all combinatoric problems, this one is probably equivalent to another, well-known one, but I haven't managed to find such an equivalent problem (and OEIS didn't help), so I offer this one as being possibly new and possibly interesting.
Problem statement
I have $2N$ socks in a laundry basket, and I am hanging them on the hot pipes to dry. To make life easier later, I want to hang them in pairs. Since it is dark where the pipes are, I adopt the following algorithm:
- Take a sock at random from the basket.
- If it matches one that is already on my arm, hang them both on the pipes: the one in my hand and the matching one taken from my arm.
- If it does not match one that is already on my arm, hang it on my arm with the others.
- Do this $2N$ times.
The question is: How long does my arm have to be?
Clearly, the minimum length is $1$, for instance if the socks come out in the order $AABBCC$. Equally clearly, the maximum length is $N$, for instance if the socks come out as $ABCABC$. But what is the likeliest length? Or the average length? Or what sort of distribution do the required lengths have?
It turns out to be easiest to parameterise the results not by $2N$, the number of socks, but by $2N-1$, which I will call $M$.
The first few results
(Notation: $n!!$ is the semifactorial, the factorial including only odd numbers; thus $7!!=7\times 5\times 3\times 1$).
In each case I provide the frequency for each possible arm length, starting with a length of 1. I use frequencies rather than probabilities because they are easier to type, but you can get the probabilities by dividing by $M!!$.
$$ \begin{array}{c|rrrrr} M \\ \hline 1 & 1 \\ 3 & 1 & 2 \\ 5 & 1 & 8 & 6 \\ 7 & 1 & 30 & 50 & 24 \\ 9 & 1 & 148 & 340 & 336 & 120 \\ \end{array} $$ It would be good to know (for example) if these frequencies tend to some sort of known distribution as $M\to\infty$, just as the binomial coefficients do.
But, as I said at the beginning, this may just be a re-encoding of a known combinatorial problem, carrying a lot of previously worked out results along with it. I thought, for instance, of the lengths of random walks in $N$ dimensions with only one step forward and one step back being allowed in each dimension – but that looked too complicated to give any straightforward direction to follow.
Background: methods
In case it is interesting or helpful, I obtained the results above by means of a two-dimensional generating function, in which the coefficient of $y^n$ identified the arm length needed and the coefficient of $x^n$ identified how many socks had been retrieved at the [first] time that this length was reached. Calling the resulting generating function $A_M(x,y)$, the recurrence I used was:
$$A_M=MxyA_{M-2}+x^2(x-y)\frac\partial{\partial x}A_{M-2}+(1-x^2)xy$$
which is based on sound first principles and matches the results of manual calculation up to $M=5$. Having found a polynomial, I substitute $x=1$ and the numbers in the table above are then the coefficients of the powers of $y$.
But, mathematics being close to comedy, all this elaboration may be an unnecessarily complicated way to get to a result too trivial to be found even in OEIS. Is it?
I did some Monte Carlo with this interesting problem and came to some interesting conclusions. If you have $N$ pairs of socks the expected maximum arm length is slightly above $N/2$.
First, I made 1,000,000 experiments with 100 pairs of socks and recorded maximum arm length reached in each one. For example, maximum arm length of 54 was reached about 90,000 times. And it all looks like a normal distribution to me. The average value of maximum arm length was 53.91, confirmed several times in a row.
Nothing changed with 100 pairs of socks and 10,000,000 experiments. Average value remained the same. So it looks like you need about a million runs to draw up a meaningful conclusion.
Here is what I got when I doubled the number of socks to 200 pairs. Maximum arm length on average was 105.12, still above 50%. I got the same value in several repeated experiments ($\pm0.01$).
Finally, I decided to check expected maximum arm length for different number of sock pairs, from 10 to 250. Each number of pairs was tested 2,000,000 times before the average value was calculated. Here are the results:
$$ \begin{array}{c|rr} \textbf{Pairs} & \textbf{Arm Length} & \textbf{Increment} \\ \hline 10 & 6.49 & \\ 20 & 12.03 & 5.54 \\ 30 & 17.41 & 5.38 \\ 40 & 22.71 & 5.30 \\ 50 & 27.97 & 5.26 \\ 60 & 33.20 & 5.23 \\ 70 & 38.40 & 5.20 \\ 80 & 43.59 & 5.19 \\ 90 & 48.75 & 5.16 \\ 100 & 53.91 & 5.16 \\ 110 & 59.07 & 5.16 \\ 120 & 64.20 & 5.13 \\ 130 & 69.33 & 5.13 \\ 140 & 74.46 & 5.13 \\ 150 & 79.58 & 5.12 \\ 160 & 84.69 & 5.11 \\ 170 & 89.80 & 5.11 \\ 180 & 94.91 & 5.11 \\ 190 & 100.02 & 5.11 \\ 200 & 105.11 & 5.09 \\ 210 & 110.20 & 5.09 \\ 220 & 115.29 & 5.09 \\ 230 & 120.38 & 5.09 \\ 240 & 125.47 & 5.09 \\ 250 & 130.56 & 5.09 \end{array} $$
It looks like a straight line but it's actually an arc, slightly bended downwards (take a look at the increment column).
Finally, here is the Java code that I used for my experiments.
EDIT: For N=500, the average value of maximum arm length after 2,000,000 tests is 257.19. For N=1000, the result is 509.23.
It seems that for $N\to\infty$, the result goes down to $N/2$. I don't know how to prove this.