Suppose we have an urn with $N$ different colored balls and $b$ balls of each color, for $Nb$ balls total in the urn. From this we randomly draw $r$ balls without replacement. What is the probability that these $r$ balls will contain exactly $k$ unique colors?
Clearly it must first be that for $Nb>r>0$ and $r>k>1$ to have a positive probability. Beyond this, the closest I can get is:
\begin{equation} P(k|r)=\frac{\binom{N}{k} \binom{b}{1}^k \binom{kb-k}{r-k}}{\binom{Nb}{r}} \end{equation}
But I know it's not right. My reasoning for the above equation is that there are $\binom{Nb}{r}$ ways can choose the $r$ balls overall (though even this I'm not sure about -- should this take into account the unique colors? Something along the lines of the adjusting the number of permutations of a word with multiple letters?). And for the numerator, we select the $\binom{N}{k}$ colors, for each of these colors we select one of the $b$ balls (so $k$ balls total) to ensure that each color has at least one ball. Then we select the remaining $r-k$ balls from the remaining $kb-k$ balls of the $k$ colors.
But this isn't quite correct, which I think is either due to the denominator being wrong. Or to the fact that the $\binom{kb-k}{r-k}$ should likewise take into account the unique ways that the $r-k$ balls can be binned among the $k$ colors.
Any thoughts? Thanks!
What is certainly correct is the number to choose $r$ of $Nb$ balls, and $k$ of $N$ colors. Thus the error can be hidden only in the way to choose $r$ of $kb$ balls subject to the restriction that at least one ball of each of $k$ colors have to be chosen.
Let's try inclusion exclusion principle. We first take all possible combinations to choose $r$ of $kb$ balls, then subtract those which have at least one color missing, then add those which have at least two colors missing and so on. This results in the expression: $$ {b,k\brack r}=\sum_{i=0}^k (-1)^i\binom{k}{i}\binom{b(k-i)}{r}, $$ with the final result: $$ P(k|r)=\frac{\binom{N}{k}{b,k\brack r}}{\binom{Nb}{r}}. $$