Probability of having 'k' elements that occur only once in a multiset filled by sampling with replacement

700 Views Asked by At

Let's say that I have a set of unique elements, $P$, and a multiset $M$ that I fill with $N \leq ||P||$ elements by sampling with replacement from $P$. What is the probability that the multiset $M$ contains $0 \leq k \leq N$ elements that occur once (i.e. elements with only a single copy in the multiset)?

The probability of a single element in a multiset of cardinality $N$ occurring only once should be equivalent to tossing $N$ balls into $||P||$ bins, and finding the probability that a particular ball is by itself in a bin.


To get started...

From pg. 95 of "Probability and computing: Randomized algorithms and probabilistic analysis" by Michael Mitzenmacher and Eli Upfal, when we toss $N$ balls into $||P||$ bins, the probability that a specific bin has exactly $r$ balls, P[$r$], is given as:

P[$r$] = ${N \choose r}$ $(\frac{1}{||P||})^r(1-\frac{1}{||P||})^{(N-r)}$

By linearity of expectation, we can now write an expression for the expected number of balls that exist in a bin of $r$ balls as: E[X] = $||P||*r*{N \choose r}$ $(\frac{1}{||P||})^r(1-\frac{1}{||P||})^{(N-r)}$

For $r = 1$, $E[X] = N*(1-\frac{1}{||P||})^{N-1}$

1

There are 1 best solutions below

12
On BEST ANSWER

This is another variant on the birthday problem and the coupon collector's problem. Let's say there are $p$ elements of $P$ (possible birthdays) and a sample size with replacement of $n$ (people in the room). $k$ is then the number of people who do not share a birthday with anyone else in the room.

The expected number who do not share a birthday (i.e. $E[K]$) is not as difficult to state:
$$n \left(1-\frac{1}{p}\right)^{n-1}$$ while the variance is: $$n \left(1-\frac{1}{p}\right)^{n-1} + n (n-1) \left(1-\frac{1}{p}\right) \left(1-\frac{2}{p}\right)^{n-2} - n^2 \left(1-\frac{1}{p}\right)^{2(n-1)}.$$

These are similar in form to the expressions I gave on a similar question for number of distinct birthdays (distinct elements of $M$ in your question).

Added:

I think you can calculate the numbers using the following integer recurrence:

$$f(a,b,c,d) = (d-b+1) f(a-1,b-1,c-1,d) + (b-a) f(a,b,c-1,d) + (a+1) f(a+1,b,c-1,d)$$

starting at $f(a,b,0,d)=0$ except that $f(0,0,0,d)=1$. The probabilities you are looking for will then be

$$Pr(K=k|n,p) = p^{-n} \sum_b f(k,b,n,p).$$

As an example, looking at $n=5$ and $p=10$, you should find $f(0,1,5,10) = 10$, $f(0,2,5,10) = 900$, $f(1,2,5,10) = 450$, $f(1,3,5,10) = 10800$, $f(2,3,5,10) = 7200$, $f(3,4,5,10) = 50400$, and $f(5,5,5,10) = 30240$. This will lead to probabilities for the number of unique elements of

  • $Pr(K=0|n=5, p=10) = 0.0091$,
  • $Pr(K=1|n=5, p=10) = 0.1125$,
  • $Pr(K=2|n=5, p=10) = 0.0720$,
  • $Pr(K=3|n=5, p=10) = 0.5040$,
  • $Pr(K=4|n=5, p=10) = 0$, and
  • $Pr(K=5|n=5, p=10) = 0.3024$.

Given the lack of an obvious pattern in these probabilities as $k$ increases, I think there is unlikely to be a simple closed form in general.