n*m distinguishable balls with m different colors, the probability of randomly choosing k balls containing all balls from at least 2 different colors

230 Views Asked by At

Assume we have m groups of n balls and the balls in the same group have the same color. So there are m*n balls in total. Now, suppose we randomly choose k>(2*n) balls from the set of m*n balls. How much is the probability that the chosen k balls contain all balls of at least two different colors (entirely all balls of two groups)?

In other words, the unchosen set of balls contains the balls of different m-2 colors at most (instead of m colors).

To grasp better, notice the picture of 3*4 ( = n*m) balls. Each group of 3 balls has the same color. The probability I am looking for is to choose k balls containing the balls of two entire groups. For example, choosing balls 1, 5, 9, 3, 7, 11, 8 (contains all yellow and blue balls).

4 groups of 3 balls with the same colors: n = 3, m = 4

I hope I could explain the problem clearly. I have implemented a simulator for testing different scenarios. Then I tested the simulated results with different combinatorial/binomial solutions. But I get different results every time and now I'm lost.

This is my simulator in python testing different choices many times:

from random import sample
from collections import Counter
m = 4
n = 3
k = 5
it = 100000
balls = range(m*n)
cf = 0
for i in range(it):
    choices = sample(balls, k)
    samecolors = map(lambda x:x%m, choices)
    cnt = Counter(samecolors)
    mc = cnt.most_common(2)
    if (mc[-1][-1] == n): // if the second most common chosen color has *n* balls
        cf += 1
print(float(cf)/float(it))
2

There are 2 best solutions below

3
On BEST ANSWER

The overall number of ways to choose $k$ out of $mn$ balls is $\binom{mn}{k}$, where all balls are assumed to be distinguishable. Among them naively there are $$\binom mr\binom{mn-rn}{k-rn}$$ combinations consisting of at least $r$ full sets of balls of the same color. However if $k\ge(r+1)n$ the above expression will double-count all combinations consisting of more than $r$ full sets, which should be accounted for. The correct way for this is the generalised inclusion-exclusion principle: $$ \nu_r=\sum_{j\ge r}(-1)^{j-r}\binom jr\binom mj\binom{mn-jn}{k-jn}, $$ which gives the number of combinations with exactly $r$ full sets.

To obtain the number of combinations with at least $r$ full sets one should sum the above expressions: $$\begin{align} N_r=\sum_{i\ge r}\nu_i&=\sum_{i\ge r}\sum_{j\ge i}(-1)^{j-i}\binom ji\binom mj\binom{mn-jn}{k-jn}\\ &=\sum_{j\ge i}(-1)^j\binom mj\binom{mn-jn}{k-jn}\sum_{i\ge r}(-1)^{i}\binom ji\\ &=\sum_{j\ge r}(-1)^{j-r}\binom{j-1}{r-1}\binom mj\binom{mn-jn}{k-jn}. \end{align}$$

Thus, the probability in question reads (with $r=2$): $$ p_r=\frac {\sum_{j\ge r}(-1)^{j-r}\binom{j-1}{r-1}\binom mj\binom{mn-jn}{k-jn}}{\binom{mn}k}. $$

5
On

Using multinomial theorem, the total number of cases are the number of solutions of the equation $$x_1 + x_2 + x_3 + ...+ x_m = k, x_i \in \{0, 1, ... n\}$$ The number of solutions is coefficient of $x^k$ in $(1-x^{n+1})^m \times (1-x)^{-m}$. This will have to be evaluated depending on the value of $k$. Let this number be $A$

The number of required cases are the number of solutions of the equation $$y_1 + y_2 + ... + y_{m-2} = k-2n$$ Note that we have 2 complete sets of $n$ elements selected, so we are finding the number of ways to select the remaining $k-2n$ elements. The number of solutions is coefficient of $x^{k-2n}$ in $(1-x^{n+1})^{m-2} \times (1-x)^{-(m-2)}$. Let this number be $B$.

The net probability is $$P = \frac{B}{A}$$

$B$ and $A$ would have to be calculated based on a numeric value of $k$.