I have a set of $N$ elements and I iterate the following procedure (all the elements at the beginning are not marked):
- I pick an element at random.
- If it is marked, I stop.
- If it is not marked, I mark it and I repeat the procedure.
How many times on average do I iterate the procedure before stopping because I am picking something I have already picked?
My best shot at solving it
The probability of picking one more, having already picked one is
\begin{equation*} p_i = \frac{N - i}{N} \end{equation*}
The probability of picking at least $i$ is therefore:
\begin{equation*} s_i = \frac{N!}{(N - i)! N^i} \end{equation*}
The probability of picking exactly $i$ is
\begin{equation*} f_i = s_i - s_{i + 1} = \frac{N!}{(N - i)!N^i}\frac{i}{N} \end{equation*}
So far so good. Now, I use Stirling's approximation ($N$ is very large and the probability to have a large $i$ is very small) and (if I didnt' get any calculation wrong) I get
\begin{equation*} f_i = \frac{i}{N} \left(\frac{N}{N - i}\right)^{(N - i + 1/2)} \end{equation*}
The average number of picks $<i>$ is therefore:
\begin{equation*} <i> = \sum_j^N \frac{j^2}{N} e^{-j} \left(\frac{N}{N - j}\right)^{(N - j + 1/2)} \end{equation*}
But here I have no clue on how to go on. I can imagine this is a fairly known problem, but I wouldn't know where to look for, does it have a commonly known name?
This has been researched. It is a variant of the Birthday Problem. It seems you're on the right track according to Wikipedia, but your formula is a little off. Knuth gives $1+\sum_{i=1}^n\frac{N!}{(N - i)! N^i}$ as the answer. His proof is in The Art of Computer Programming. Ramanujan also offers an asymptotic approximation in the link, which is also called "Ramanujan's Q function."