An urn has $N$ red balls. We constantly sample $k$ random balls from it (group drawing with fixed size $k$), repaint the balls to white balls, and return $k$ white balls to the urn. I wonder the expected number of drawings $t$ until we have a draw with all white balls.
I guess this is a variant of the famous coupon collector problem (with group drawings), but instead of asking the rounds of replacing all $N$ red balls with white balls, this problem asks the number of drawings until we have a draw filled with all white balls.
Related coupon collector problem: https://mathoverflow.net/questions/229060/batched-coupon-collector-problem
This can be modeled as a finite-state absorbing Markov chain. We have $N+2$ states, where states $0$ through $N$ are transient states corresponding to the number of white balls, and state $N+1$ is the absorbing state, meaning that there has been a draw consisting solely of white balls. Then the expectation we seek is simply the average time to absorption, which may be computed as detailed in the wiki article.
Suppose that the chain is in transient state $w$, that is, that there are $w$ white balls and that a draw comprising only white balls has not yet occurred. The probability of drawing $r$ red balls, for $0< r\leq\min(k, N-w)$ is $$p_{w,r}:=\frac{\binom{N-w}{r}\binom{w}{k-r}}{\binom Nk}$$ when $r<k,\ p_{w,r}$ is the transition probability from state $w$ to state $w+r$. When $r=0,\ p_{w,0}$ is the transition probability from state $w$ to state $N+1$, the absorbing state. In practice, we would construct the $N+1\times N+1$ fundamental matrix directly, so we wouldn't need the $r=0$ case.
It seems unlikely to me that we could arrive at a simple formula for the expected time to absorption. I'm going to write a little script to compute some sample values, to see if a pattern emerges.
EDIT
Here's the script I wrote:
The
expect0computes the expectation using integer arithmetic, and theexpect1function does exactly the same thing, but uses floating point.expect0returns both the exact value and a floating-point approximation.The script computes the values for $N=10$ and $1\leq k\leq10$. Here is the output:
If there's a pattern here, it's buried too deep for the likes of me. For any extensive calculation I would delete (or comment out) the imports from sympy and
expect0, and just useexpect1.It's interesting that generally, the expectation decreases as $k$ increases, but it is greater when $k=2$ than when $k=1$. This has been the case for the other (very few) values of $N$ that I have tried. I wonder if it is true in general, and if so how one would prove it.