Closed form for a biased hypergeometric distribution

59 Views Asked by At

The question I have is as follows:

I have $n$ balls, each of a different color, say $c_1, c_2, ... , c_n$ inside a bag. I take out one ball at a time, without replacement, till I get the ball with the color $c_1$. (arbitrary)

However, each ball is not equally likely to be picked, they instead have a probability, $p_1, p_2, ... , p_n$ of being picked. Obviously, sum of all ${p_i}'s$ is 1. However, whenever any one ball I take out, say $c_i$, the probabilities of the remaining balls get divided by $1 - p_i$ since they are each more likely to be picked now.

What I am trying to find now is the expected number of balls I will have to take out to get the ball with color $c_1$.

I expected this to be some sort of a traditional probability problem. But when I tried to solve it, things got hairy real fast.

for $n = 1$, $$E_1 = 1$$ $n = 2$, $$E_2 = p_1 * 1 + p_2 * \frac{p_1}{1 - p_2} * 2 = p_1 + 2 * p_2$$ $n = 3$, $$E_3 = p_1 * 1 + p_2 * \frac{p_1}{1 - p_2} * 2 + p_3 * \frac{p_1}{1 - p_3} * 2 + p_2 * \frac{p_3}{1 - p_2} * \frac{p_1}{1 - p_2 - p_3} * 3 + p_3 * \frac{p_2}{1 - p_3} * \frac{p_1}{1 - p_3 - p_2} * 3$$

for $n = 3$ itself the expression gets pretty ugly. And for something like $n = 10$, it will be a nightmare to calculate by hand. Computer simulations will get too costly for $n > 10$ assuming a very loose upper bound of $n!$ runtime, which I have tried.

I googled about biased hypergeometric distributions which led me to Fischer's and Wallenius' non-central hypergeometric distributions which both seem to fit here, since the number of balls of each color is 1 to begin with.

After consulting what little literature I was able to find on these distributions online I feel like I am in the right place but I have absolutely no idea how to proceed. I would appreciate any help with this or just a nudge in the right direction.

Thanks!