What is the expected number of distinct drink types a group of $k$ people will order if there are $n$ choices?

201 Views Asked by At

Assuming each of the $n$ choices is equally likely and everybody orders one. I tried answering the question - what's the probability that it's exactly equal to some $i \in [1,n]$ and I get this expression:

$$Prob(X=i) = \frac{{n \choose i} i^k}{n^k}$$

because the total number of choices if we list them out for $k$ people are $n^k$ ($n$ choices for each) and if there's going to be $i$ distinct drinks served then we have to pick which $i$ (so ${n \choose i}$) and then each of the $k$ people must choose freely from the $i$ options.

I thus get this explicit expression for the expected value:

$$E(X) = \sum_{i=1}^n i P(X=i) = \sum_{i=1}^n i \frac{{n \choose i} i^k}{n^k} = \sum_{i=1}^n i {n \choose i} \left(\frac{i}{n}\right)^k$$

I don't see how I can simplify that summation further, though - so maybe i'm not on the right track here?

One alternative way to think about it is as a state diagram where we have a step $i$ for each of the $n$ choices, and it represents "unique drinks so far". Then as each person orders a drink we can either transition to the right (one more distinct drink) or remain in the same state (but never transition left). With this representation, if I let $n$ be $4$, for example, and we start in state $1$ (first row, first drink ordered is always unique) and as people place their orders the transition probabilities are:

$$M = \left( \begin{array}{cccc} 1/4 & 3/4 & 0 & 0\\ 0 & 1/2 & 1/2 & 0\\ 0 & 0 & 3/4 & 1/4\\ 0 & 0 & 0 & 1\end{array} \right)$$

then if we want to know the probabilities after $k-1$ iterations (for a total of $k$ people) we just need to look at $M^{k-1}$ and our expected value should be the inner product of the top-most row with the vector $(1,2,3,4)^t$. I'm not sure how to generalize this either to any $n$ but it feels doable just from the structure in the matrix (and the values look reasonable when I try it in R)

Are both of these approaches correct? If so, which one and how would get to the bottom of this? I also get different numerical answers for $n=4$ and $k=4$ using both approaches. The transition matrix approach gives me $2.73$ while the explicit summation yields $2.66$

2

There are 2 best solutions below

1
On BEST ANSWER

A quick answer can be found using the method of indicator random variables. Let the drink types be labelled from $1$ to $n$. For each $i$ from $1$ to $n$, let random variable $X_i$ be equal to $1$ if at least one person has a drink of Type $i$, and let $X_i=0$ otherwise.

Then the number $Y$ of different types drunk is given by $Y=X_1+\cdots+X_n$. Thus $$E(Y)=E(X_1)+E(X_2)+\cdots+E(X_n).$$ Note that $\Pr(X_i=0)=\left(1-\frac{1}{n}\right)^k$. It follows that $$E(X_i)=\Pr(X_i=1)=1-\left(1-\frac{1}{n}\right)^k.$$ Multiply by $n$ to find $E(Y)$.

3
On

Assume $|A|=k$ and $|B|=n$. We want to compute the average size of $f(A)$, with $f$ ranging over all the functions from $A$ to $B$. For any $j\in\{1,\ldots,k\}$, there are (look at Stirling numbers of the second kind) $$ {k\brace j}\cdot j!\cdot\binom{n}{j}={k \brace j}(n)_j\tag{1} $$ functions such that $|f(A)|=j$. Our expected value is so: $$ \frac{1}{n^k}\sum_{j=1}^{k}{k\brace j}(n)_j\cdot j \tag{2}$$ but: $$ j{k\brace j} = {k+1\brace j}-{k\brace j-1}\tag{3}$$ hence our expected value is: $$\frac{1}{n^k}\left(\sum_{j=1}^{k}{k+1\brace j}(n)_j-n\cdot\sum_{j=0}^{k-1}{k\brace j}(n-1)_{j}\right)\tag{4}$$ that is: $$ \frac{1}{n^k}\left(n^{k+1}-(n)_{k+1}-n\cdot\left((n-1)^k-(n-1)_k\right)\right)=\color{red}{n\cdot\left(1-\left(1-\frac{1}{n}\right)^k\right)}.\tag{5}$$

That is much easier to find through a double-counting argument, but I will let you mumble about it - ok, André Nicolas has just written it.