Think of the following scenario:
We are a group of $42$ people. I tag you, and you tag another person. This other person tags another person, etc. The "chain" of tagging stops when a person has been tagged twice. (A person can't tag the person that tagged them).
I am trying to find a general formula to calculate the expected number of people who are tagged. I looked at the mean of a hypergeometric distribution, but it doesn't seem to be what I need, since it's used to determine the expected number of successes in n trials. In my case, the trials stop after the first failure.
My second guess was a negative hypergeometric distribution, but apart from a disputed wikipedia article and this Encyclopedia of math article, information seems very scarce.
Following the latter, I calculated the mean as follows: $E[X] = m(N-M)/(M+1)$
Where $M = 1$ (number of tagged people), $N-M = 41$ (untagged), $N = 42$ (total), and $m = 1$ (number of failures before stopping). This yields $E[X] = 20.5$.
Is this accurate ? Is there a way this formula can be derived using only the hypergeometric distribution (which is widely more documented than the negative hypergeometric distribution).
Thank you.


Original poster here. I think I have got it figured out. I assumed that every "tagging" trial is independant, with a varying probability. (41/42, 40/42, 39/42 and so on).
The expected number of "successes" before the first failure is simply the sum of all probabilities. See https://en.wikipedia.org/wiki/Poisson_binomial_distribution
This yields an expected number of 20.5 taggings.