Expectation rounds of group drawings of urn balls


An urn contains $N$ red balls. We repeatedly sample $k$ balls from it uniformly at random (group drawing with fixed size $k$), repaint any red balls white, and return all $k$ balls to the urn. What is the expected number of drawings $t$ until we get a draw consisting entirely of white balls?

I guess this is a variant of the famous coupon collector problem (with group drawings), but instead of asking for the number of rounds needed to replace all $N$ red balls with white ones, this problem asks for the number of drawings until a draw consists entirely of white balls.

Related coupon collector problem: https://mathoverflow.net/questions/229060/batched-coupon-collector-problem


This can be modeled as a finite-state absorbing Markov chain. We have $N+2$ states, where states $0$ through $N$ are transient states corresponding to the number of white balls, and state $N+1$ is the absorbing state, meaning that there has been a draw consisting solely of white balls. The expectation we seek is then simply the expected time to absorption, which may be computed as detailed in the Wikipedia article on absorbing Markov chains.

Suppose that the chain is in transient state $w$, that is, that there are $w$ white balls and a draw comprising only white balls has not yet occurred. The probability of drawing exactly $r$ red balls, for $0\leq r\leq\min(k, N-w)$, is $$p_{w,r}:=\frac{\binom{N-w}{r}\binom{w}{k-r}}{\binom Nk}.$$ When $r>0$, $p_{w,r}$ is the transition probability from state $w$ to state $w+r$. When $r=0$, $p_{w,0}$ is the transition probability from state $w$ to state $N+1$, the absorbing state. In practice, we construct the $(N+1)\times(N+1)$ fundamental matrix directly, so we don't need the $r=0$ case.
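As a quick sanity check on this formula (a throwaway sketch; `p` is an ad-hoc helper, not part of the script below), the probabilities $p_{w,r}$ over $0\leq r\leq k$ should sum to $1$ for every state $w$, which is just Vandermonde's identity:

```python
from fractions import Fraction
from math import comb

def p(N, k, w, r):
    # p_{w,r}: probability of drawing exactly r red balls in a draw of
    # size k from an urn holding N - w red and w white balls
    return Fraction(comb(N - w, r) * comb(w, k - r), comb(N, k))

# each row of the full transition matrix sums to 1 (Vandermonde's identity)
N, k = 10, 3
for w in range(N + 1):
    assert sum(p(N, k, w, r) for r in range(k + 1)) == 1
```

(`math.comb` conveniently returns $0$ when the lower index exceeds the upper one, so the out-of-range terms vanish on their own.)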

It seems unlikely to me that we could arrive at a simple formula for the expected time to absorption. I'm going to write a little script to compute some sample values, to see if a pattern emerges.

EDIT

Here's the script I wrote:

import numpy as np
from math import factorial
from sympy import Matrix, binomial, zeros, eye

'''
A bin contains N red balls.  Repeatedly, we draw $k$ balls from the 
bin, paint any red balls white, and return all the balls to the bin.
What is the average number of draws until we get a draw
comprising only white balls?

We model this as an absorbing Markov chain with N+2 states, where
states 0 through N correspond to the number of white balls, and state
N+1 is the absorbing state, meaning that a draw comprising solely white
balls has occurred.
'''
def choose(n,m):
    if m > n or m < 0:
        return 0
    return factorial(n)//(factorial(n-m)*factorial(m))

def expect1(N, k):
    # floating point
    
    if k > N:
        raise ValueError('k must not exceed N')
    Q = np.zeros((N+1, N+1))
    for white in range(N+1):
        for red in range(1, min(N-white, k)+1):
            Q[white, white+red] = choose(N-white, red)*choose(white, k-red)
    draws = choose(N, k)
    Q /= draws
    I = np.eye(N+1)
    F = np.linalg.inv(I-Q)
    return sum(F[0,:])

def expect0(N, k):
    #exact
    
    Q = zeros(N+1,N+1)
    for white in range(N+1):
        for red in range(1, min(N-white,k)+1):
            Q[white, white+red]  = binomial(N-white,red)*binomial(white, k-red)        
    draws = binomial(N,k)
    Q /= draws
    I = eye(N+1)
    F = (I-Q).inv()
    answer = sum(F[0,:])
    return answer, float(answer)

for k in range(1,11):
    e = expect0(10, k)
    print(f'{k}: {e[0]} {e[1]}')

The expect0 function computes the expectation using exact rational arithmetic, and the expect1 function does exactly the same thing using floating point. expect0 returns both the exact value and a floating-point approximation.

The script computes the values for $N=10$ and $1\leq k\leq10$. Here is the output:

1: 7281587/1562500 4.66021568
2: 4808795186/922640625 5.211991598570679
3: 28894847/5760000 5.016466493055556
4: 154103/33075 4.659198790627362
5: 2446715/571536 4.28094643207077
6: 34281/8750 3.917828571428571
7: 10739/3000 3.5796666666666668
8: 734/225 3.2622222222222224
9: 29/10 2.9
10: 2 2.0     

If there's a pattern here, it's buried too deep for the likes of me. For any extensive calculation I would delete (or comment out) the imports from sympy and expect0, and just use expect1.
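As a further sanity check, a quick Monte Carlo simulation reproduces the exact values to within sampling error, e.g. about $5.016$ for $N=10$, $k=3$. (A rough sketch; `simulate` is an ad-hoc helper, not part of the script above.)

```python
import random

def simulate(N, k, trials=50_000, seed=1):
    # Estimate the expected number of draws by direct simulation.
    # Balls 0..white-1 are white, the rest red; since balls are
    # exchangeable, only the count of white balls matters.
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        white, draws = 0, 0
        while True:
            draws += 1
            red_drawn = sum(1 for b in rng.sample(range(N), k) if b >= white)
            if red_drawn == 0:      # all-white draw: absorbed
                break
            white += red_drawn      # repaint the red balls just drawn
        total += draws
    return total / trials

print(simulate(10, 3))  # should land close to the exact value 5.01646...
```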

It's interesting that generally, the expectation decreases as $k$ increases, but it is greater when $k=2$ than when $k=1$. This has been the case for the other (very few) values of $N$ that I have tried. I wonder if it is true in general, and if so how one would prove it.
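For what it's worth, the anomaly is easy to probe numerically; here is the floating-point method restated so the snippet is self-contained (`expected_draws` is just a local name for the same computation as expect1). In the handful of cases this prints, $k=2$ beats $k=1$, but that is evidence, not a proof.

```python
import numpy as np
from math import comb

def expected_draws(N, k):
    # expected time to absorption from state 0, as in expect1
    Q = np.zeros((N + 1, N + 1))
    for w in range(N + 1):
        for r in range(1, min(N - w, k) + 1):
            Q[w, w + r] = comb(N - w, r) * comb(w, k - r)
    Q /= comb(N, k)
    F = np.linalg.inv(np.eye(N + 1) - Q)
    return F[0].sum()

# compare k=1 and k=2 for a few values of N
for N in (5, 10, 20, 40):
    e1, e2 = expected_draws(N, 1), expected_draws(N, 2)
    print(N, e1, e2, e2 > e1)
```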